Dingrui’s Blog

Data Science · Accounting & Finance · Random Thoughts

Tag: r

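An Interesting Package I Found: drake – A Pipeline Toolkit for Reproducible Computation at Scale

First, here is the official manual. The authors have also thoughtfully written a whole book that teaches you how to use it.

I have only been using this package for a short time, so what follows covers roughly 5% of what it can do.

The three functions I use most are drake_plan, make, and vis_drake_graph.

Here is a simple example.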

library(drake)
library(data.table)

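# Some dummy functions.
# As far as possible, make the data frame the only parameter.

# Add an ID column to any data.table.
# copy() keeps `:=` from modifying the caller's table by reference.
add_id <- function(dt){
  return(copy(dt)[, ID := .I])
}

# For the iris data set, take the minimum of each measurement per species.
get_min_measures <- function(dt){
  return(dt[, lapply(.SD, min), by = .(Species), .SDcols = c(1:4)])
}

# Build the workflow plan.

my_plan <- drake_plan(
  raw_data = fread(file_in('iris.csv')),  # read the input; file_in() tells drake this is an input file
  indexed_dt = add_id(raw_data),          # use the previous target's name as the argument
  min_measures_species = get_min_measures(raw_data),
  output = fwrite(indexed_dt, file_out('iris100.csv'))  # likewise, file_out() tells drake this is an output file
)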
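The plan above only declares the targets; nothing runs until it is passed to make. The following is a minimal sketch of that next step, not the continuation of the original post. It assumes that writing the built-in iris data set to 'iris.csv' is an acceptable stand-in for the input file, and that the installed drake release lets vis_drake_graph take the plan directly (older releases expected drake_config(my_plan) instead).

```r
library(drake)
library(data.table)

# Assumption: use the built-in iris data as the input file for this sketch.
fwrite(iris, "iris.csv")

# Same helper functions as in the plan above.
add_id <- function(dt) {
  copy(dt)[, ID := .I]
}

get_min_measures <- function(dt) {
  dt[, lapply(.SD, min), by = .(Species), .SDcols = c(1:4)]
}

my_plan <- drake_plan(
  raw_data = fread(file_in("iris.csv")),
  indexed_dt = add_id(raw_data),
  min_measures_species = get_min_measures(raw_data),
  output = fwrite(indexed_dt, file_out("iris100.csv"))
)

# Build every target in dependency order; results are stored in drake's cache.
make(my_plan)

# Read a finished target back from the cache.
min_by_species <- readd(min_measures_species)
print(min_by_species)

# Draw the dependency graph (needs the visNetwork package).
# Assumption: recent drake releases accept the plan directly here.
if (interactive() && requireNamespace("visNetwork", quietly = TRUE)) {
  vis_drake_graph(my_plan)
}
```

Running make(my_plan) a second time without changing the functions or 'iris.csv' should skip all targets, because drake only rebuilds what is out of date.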

Read More

A Simple Way to Call VBA Macro and Pass Arguments From R and Python

Sometimes you inherit a large body of legacy VBA code that you are reluctant to rewrite in another language (e.g. R or Python), yet you still want to run it as part of your workflow. There is a simple solution for both R and Python. (Tested on Windows.)

Sample VBA code

We have the following macro.


Public Sub Test_Add(Arg1, Arg2)
    Sheets(1).Range("a1").Value = Arg1 + Arg2
End Sub

We would like to pass two numbers to this macro and have it write their sum into the Excel workbook.

For R

Read More

Functional Programming in R (Using purrr Package)

I wrote a short article about the purrr package before.

Now I think it’s time to write a better article introducing the purrr package.

You can find the official website through this link.

Map Family

The map family applies one or more functions to each element of a list or vector.

The “primary” function is the map function.

library(purrr)
# Remember: map always returns a list rather than a vector
test_list <- list(a=c(1,2,3),
                  b=c(2,3,4),
                  c=c(3,4,5))

map(test_list,mean)
#> $a
#> [1] 2
#> 
#> $b
#> [1] 3
#> 
#> $c
#> [1] 4
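
Since map always returns a list, the typed members of the map family are useful when a plain vector is wanted. A small sketch using the same test_list (the variable names `means` and `doubled` are only for illustration):

```r
library(purrr)

test_list <- list(a = c(1, 2, 3),
                  b = c(2, 3, 4),
                  c = c(3, 4, 5))

# map_dbl returns a named numeric vector instead of a list
means <- map_dbl(test_list, mean)
means
#> a b c 
#> 2 3 4

# A formula is shorthand for an anonymous function; .x is the current element
doubled <- map(test_list, ~ .x * 2)
doubled$a
#> [1] 2 4 6
```

map_dbl stops with an error if any result is not a single number, which makes it a safer choice than unlisting the output of map.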

Read More

DataCamp Certificates

I will show some certificates from DataCamp.

DataCamp is a really good website for studying data science, whether you want to learn R or Python.

Certificates: (The last course was finished on 21 Jan 2019; 26 courses were finished in total.)

Dingrui’s Useful R scripts

This post will be updated from time to time, so please check back regularly.

All scripts will be based on the following packages. I really appreciate the authors who developed these packages, which make both my life and my work interesting and easy.
  • tidyverse
  • data.table
  • readxl
  • writexl
  • lubridate
  • RMySQL
  • RSelenium (If you have trouble installing RSelenium, please go to this link for further reference.)

Read More

R Scripts for Combining Excel Files

Excel files are probably the most common format you will encounter in a business or accounting job. Here are some tips on how to combine Excel files using R.

Preparation

We need two packages, tidyverse and readxl, both created by Hadley Wickham and his collaborators.

If you want to learn more about them, see the documentation for readxl and tidyverse.

Let’s Do It!
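
As a rough sketch of the idea (not necessarily the approach taken in the full post), the function below reads the first sheet of every Excel file in a folder and stacks them into one table. The function name `combine_excel_files`, the `source_file` column and the assumption that all files share the same column layout are illustrative choices.

```r
library(tidyverse)
library(readxl)

# Hypothetical helper: combine the first sheet of every .xls/.xlsx file in `dir`.
# Assumes all files have compatible columns.
combine_excel_files <- function(dir, pattern = "\\.xlsx?$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)

  files %>%
    set_names(basename(files)) %>%       # name each path by its file name
    map(read_excel) %>%                  # read the first sheet of each file
    bind_rows(.id = "source_file")       # stack rows, recording the source file
}
```

Calling combine_excel_files("path/to/folder") returns a single tibble, with the source_file column showing which workbook each row came from.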

Read More