Category: R

Excel Array Formulas, Alteryx, VBA and R

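After starting work at PwC this week, I went back to my old trade for the first time: writing VBA and building (Excel-based) templates.

When building a template, the most painful part is designing it. On one hand, you need to give users as much useful information as possible; on the other, you have to think about how users will actually use your template (I learned a lot about this from Casey when I worked at the University of Sydney). PwC also cares a great deal about efficiency, so you have to account for the time and effort that adding or removing features will cost later. All in all, it gave me a deep appreciation of the pain of coding in a non-developer department.

  1. No product design manual
  2. Unreasonable expectations
    Coding is not a cure-all, especially in such a fast-paced environment, where development often has to be finished within 4 or 5 hours, which is hardly realistic. Not to mention the crappy coding experience in VBA, which is pure agony.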




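I came across a very interesting package: drake, "A Pipeline Toolkit for Reproducible Computation at Scale".

First, here is the official manual. The authors were even thoughtful enough to write a whole book that teaches you how to use it.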





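# Write a few dummy functions
library(data.table)
library(drake)

# Add an ID column to any data.table
# (the original body is cut off in this excerpt; the body below is an assumed placeholder)
add_id <- function(dt){
  dt <- copy(dt)
  dt[, id := .I]
  dt
}

# (body also cut off in this excerpt; assumed placeholder: per-species minimum of each measure)
get_min_measures <- function(dt){
  dt[, lapply(.SD, min), by = Species]
}

# Build the workflow plan

my_plan <- drake_plan(raw_data = fread(file_in('iris.csv')), # read the input; file_in() tells drake this is an input file
                      indexed_dt = add_id(raw_data), # use the name of the previous target as the argument
                      min_measures_species = get_min_measures(raw_data),
                      output = fwrite(indexed_dt, file_out('iris100.csv'))) # likewise, file_out() tells drake this is an output file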
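A plan by itself only describes the steps; nothing runs until it is passed to make(). The following is a minimal, self-contained sketch of that step, not the code from the original post: the helper with_row_id(), the plan name demo_plan, and the use of the built-in iris data written to a temporary folder are all assumptions made for illustration.

```r
library(drake)
library(data.table)

# Work in a throwaway directory so the .drake cache and the CSV stay out of the project
old_wd <- setwd(tempdir())
fwrite(iris, "iris.csv")  # the built-in iris data stands in for the real input

# Hypothetical helper: return a copy of the table with a row-number column
with_row_id <- function(dt) {
  out <- copy(dt)
  out[, id := .I]
  out[]
}

# A two-target plan: drake infers that indexed_dt depends on raw_data
demo_plan <- drake_plan(
  raw_data = fread(file_in("iris.csv")),
  indexed_dt = with_row_id(raw_data)
)

make(demo_plan)              # builds raw_data, then indexed_dt
result <- readd(indexed_dt)  # fetch a finished target from the cache
make(demo_plan)              # nothing changed, so up-to-date targets are skipped

setwd(old_wd)
```

The second make() call is the point of the package: targets whose code and inputs are unchanged are not rebuilt.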

Functional Programming in R (Using purrr Package)

I wrote a short article about the purrr package before.

Now I think it’s time to write a better article introducing the purrr package.

You can find the official website through this link.

Map Family

The map family is used to apply a function to each element of a list or vector.

The “primary” function is the map function.

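# Remember: map() always returns a list rather than a vector
# (the excerpt cuts off after the first element; b, c and the map() call
#  are assumed so that the result matches the output shown below)
test_list <- list(a=c(1,2,3),
                  b=c(2,3,4),
                  c=c(3,4,5))
map(test_list, mean)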

#> $a
#> [1] 2
#> $b
#> [1] 3
#> $c
#> [1] 4
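To make the "always a list" point concrete, here is a small sketch that contrasts map() with two of its typed variants, map_int() and map_dbl(), which return atomic vectors instead. The input list `widths` is made up for illustration.

```r
library(purrr)

# Hypothetical input, chosen only for illustration
widths <- list(x = c(1, 2, 3), y = c(10, 20))

lengths_list <- map(widths, length)      # a named list of length-one integers
lengths_int  <- map_int(widths, length)  # a named integer vector
means_dbl    <- map_dbl(widths, mean)    # a named double vector

# Formula shorthand for an anonymous function; the result is still a list
doubled <- map(widths, ~ .x * 2)

str(lengths_list)
lengths_int
means_dbl
```

The typed variants raise an error if the function does not return a single value of the expected type, which is a handy check when a plain vector is what you want.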

Datacamp Certificates

I will show some certificates from Datacamp.

Datacamp is a really good website for studying data science, whether you want to learn R or Python.

Certificates: (The last course was finished on 21 Jan 2019. 26 courses were finished in total.)

Dingrui’s Useful R scripts

This post will be updated from time to time. Please check back regularly.

All scripts will be based on the following packages. I really appreciate the authors who develop these packages that make my life and work both interesting and easy.
  • tidyverse
  • data.table
  • readxl
  • writexl
  • lubridate
  • RMySQL
  • RSelenium (If you have trouble installing RSelenium, please go to this link for further reference.)

R Scripts for Combining Excel Files

Excel files are probably the most common format you will face in a business or accounting job. Here are some tips on how to combine Excel files using R.


We need two packages, tidyverse and readxl, both originally created by Hadley Wickham.

If you want to learn more about them, see the readxl and tidyverse documentation.

Let’s Do It!
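As a rough sketch of the idea (not necessarily the script from the full post), the snippet below reads every .xlsx file in a folder and stacks them into one table. It assumes all workbooks share the same columns and that only the first sheet matters. The folder, file names and data are made up, and writexl is used only to create the demo files.

```r
library(readxl)
library(purrr)   # part of the tidyverse
library(dplyr)   # part of the tidyverse
library(writexl) # only needed here to create the demo workbooks

# Create two small example workbooks in a temporary folder (hypothetical data)
excel_dir <- file.path(tempdir(), "excel_demo")
dir.create(excel_dir, showWarnings = FALSE)
write_xlsx(data.frame(id = 1:2, amount = c(10, 20)), file.path(excel_dir, "jan.xlsx"))
write_xlsx(data.frame(id = 3:4, amount = c(30, 40)), file.path(excel_dir, "feb.xlsx"))

# List every .xlsx file in the folder and name each path by its file name
files <- list.files(excel_dir, pattern = "\\.xlsx$", full.names = TRUE)
names(files) <- basename(files)

# Read the first sheet of each file, then stack the tables;
# .id = "source_file" records which workbook each row came from
combined <- bind_rows(map(files, read_excel), .id = "source_file")

combined
```

Keeping the source file name in its own column makes it easy to trace a suspicious row back to the workbook it came from.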

