Excel format file might be the most common one you will face in the business or accouting job. Here are some tips on how to combine excels files using R.
There are two packages we need-
readxl all created by Hadley Wickham
Let’s Do It!
1. Get the Path of Excel Files
#Always remember to use forward slash "/" in R > excel_path <- list.files('C:/Blog example',pattern = '*.xlsx',full.names = TRUE)  "C:/Blog example/example1.xlsx" "C:/Blog example/example2.xlsx"  "C:/Blog example/example3.xlsx"
list.files function will help you easily list the path or names of files in a folder. If you only want the names of the file, just change to
full.names = FALSE.
Now we have the full path of the files.
Open and Combine
> combine_excel <- map_df(excel_path,~read_xlsx(.,sheet = 1,col_types = c('text')))
The function we use here is
purrr(included in tidyverse)
read_xlsx(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max))
The main arguments here are
path is the full path of the Excel file. If you are too lazy to get the full path, you can use
file.choose() function to pop up a file select window.
sheet can be the sheet index or name.
You can also use
excel_sheets() to get the sheet name like this.
> excel_sheets('C:/Blog example/example1.xlsx')  "Orders"
Sometimes you also need to specify the
read_xlsx() will guess the first couple rows to determine the type of that column. Refer to my experience, it can easily trigger problem. I always import all of them as text.
map_df() is one of the
map family function in
purrr package, which is a nice functional programming toolkit for R.
There are couple funtions of
map family like
map_dbl() etc. You might notice the difference the last 3 characters of each function. It means the type of output expected. Here we expect the Excel can be combined in one big dataframe(tibble), we choose using
map family have similar arguments
.x is the list of arguments that you will put in the function applied. Here the
.x is the
.f is the function you are going to apply on
x. If you don’t have any arguments need to pass to the function,you can simply use the name of function(do not include brackets).
But what if we need to pass another or more same arguments for
.f and specify the arguments inside the
.f. And remember use
. to occupy the postion of
map_df(excel_path,~read_xlsx(.,sheet = 1,col_types = c('text')))
When R read there 3 Excel files in
read_xlsx() will use same arguments
You can also do this in the old traditional way though
for loop. But the functional programming way will make your code more easy to read and save your time to focus on data rather than coding.