Dingrui’s Blog

Data Science · Accounting & Finance · Random Thoughts

Category: Data Science

Excel Array Formulas, Alteryx, VBA and R

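After starting work at PwC this week, I went back to my old trade for the first time in a while: writing VBA and building Excel-based templates.

The most painful part of building a template is designing it. On one hand, it has to give users as much useful information as possible; on the other, you have to think about how users will actually use it (I learned a lot about this from my colleague Casey when I worked at the University of Sydney). PwC also cares a great deal about efficiency, so you have to account for the time and effort that adding or removing features later will cost. All in all, it gave me a deep appreciation of how painful coding in a non-developer department can be.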

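  1. No product specification
    Let's lower the bar: forget a formal specification document, it would already be good if the requirements were stated clearly, let alone written down properly with a guarantee that nobody goes back on them later. The more I say, the sadder it gets.
  2. Unreasonable expectations
    Coding is not a cure-all, especially in such a fast-paced environment, where development is often expected to finish within 4 or 5 hours, which is hardly realistic. Not to mention how awful coding in VBA feels; it is pure agony.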

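Rant over. Now for the parts I found interesting.

Array Formulas

I had not written array formulas in a long time, and only in the past few days did it dawn on me that writing them feels much like writing R, especially if you are used to R's vectorized style.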
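
As a rough illustration of that similarity, here is how a few common array-formula patterns might look in vectorized R. The vectors `a` and `b` and the Excel ranges in the comments are made up for this sketch and do not come from a real workbook.

```r
# Hypothetical data standing in for the Excel ranges A1:A5 and B1:B5
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)

# Excel array formula: {=SUM(IF(A1:A5>2, B1:B5))}
sum_if <- sum(b[a > 2])

# Excel: =SUMPRODUCT(A1:A5, B1:B5)
sum_product <- sum(a * b)

# Excel array formula: {=IF(A1:A5>2, B1:B5, 0)}, evaluated element by element
masked <- ifelse(a > 2, b, 0)

print(sum_if)       # 120
print(sum_product)  # 550
print(masked)       # 0 0 30 40 50
```

In both cases the condition is applied to the whole range at once, with no explicit loop.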

Read More

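A Very Interesting Package, drake: A Pipeline Toolkit for Reproducible Computation at Scale

First, here is the official manual. The authors were even thoughtful enough to write a whole book that teaches you how to use it.

Since I have only been using this package for a short time, what follows covers roughly 5% of what it can do.

The three functions I use most are drake_plan, make, and vis_drake_graph.

Here is a simple example.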

library(drake)
library(data.table)

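# A few dummy functions.
# Try to keep the data frame as each function's only parameter.

# Add an ID column to any data.table.
# Work on a copy so that := does not modify the upstream table by reference.
add_id <- function(dt){
  return(copy(dt)[,ID:=.I])
}

# For the iris data set, take the minimum of each measurement per species
get_min_measures <- function(dt){
  return(dt[,lapply(.SD,min),by=.(Species),.SDcols=c(1:4)])
}

# Build the workflow plan

my_plan <- drake_plan(raw_data = fread(file_in('iris.csv')), # read the input; file_in() tells drake this is an input file
                      indexed_dt = add_id(raw_data), # use the previous target's name as the argument
                      min_measures_species = get_min_measures(raw_data),
                      output = fwrite(indexed_dt,file_out('iris100.csv'))) # likewise, file_out() tells drake this is an output file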
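
The plan only describes the workflow; nothing runs until make() is called. A minimal sketch of running it might look like the following. It assumes the drake and data.table packages are installed (plus visNetwork for the graph), and it writes the built-in iris data to iris.csv in a temporary directory so the example is self-contained. The temporary directory, the final variables, and the use of copy() in add_id are my additions for illustration.

```r
library(drake)
library(data.table)

# Work in a scratch directory so iris.csv, iris100.csv and the .drake cache
# do not clutter the project (an assumption made for this sketch).
old_wd <- setwd(tempdir())
fwrite(iris, "iris.csv")

# Helper functions as in the example above; add_id works on a copy so the
# upstream table is not modified by reference.
add_id <- function(dt) {
  copy(dt)[, ID := .I]
}
get_min_measures <- function(dt) {
  dt[, lapply(.SD, min), by = .(Species), .SDcols = c(1:4)]
}

my_plan <- drake_plan(
  raw_data = fread(file_in("iris.csv")),
  indexed_dt = add_id(raw_data),
  min_measures_species = get_min_measures(raw_data),
  output = fwrite(indexed_dt, file_out("iris100.csv"))
)

# Build every target in dependency order; a second call skips targets
# that are already up to date.
make(my_plan)

# Read finished targets back from the cache
min_measures <- readd(min_measures_species)
indexed <- readd(indexed_dt)
output_written <- file.exists("iris100.csv")
print(min_measures)

# Draw the dependency graph (interactive widget; needs visNetwork)
if (interactive() && requireNamespace("visNetwork", quietly = TRUE)) {
  vis_drake_graph(my_plan)
}

setwd(old_wd)
```

Editing one of the functions and calling make(my_plan) again should rebuild only the targets that depend on it.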

Read More

A Simple Way to Call VBA Macro and Pass Arguments From R and Python

Sometimes you inherit a large body of legacy VBA code and are reluctant to redevelop it in another language (e.g. R or Python), yet you still want to run it as part of your workflow. There is a simple solution for both R and Python. (Tested on Windows.)

Sample VBA code

We have the following macro.

Public Sub Test_Add(Arg1, Arg2)
    Sheets(1).Range("a1").Value = Arg1 + Arg2
End Sub

We would like to pass two numbers to this macro and have it write their sum to cell A1 of the first sheet in the Excel workbook.

For R

Read More

Functional Programming in R (Using purrr Package)

I wrote a short article about the purrr package before.

Now I think it’s time to write a better article introducing the purrr package.

You can find the official website through this link.

Map Family

The map family applies a function (or several functions) to each element of a list or vector.

The “primary” function is the map function.

library(purrr)
# Remember: map() always returns a list rather than an atomic vector
test_list <- list(a=c(1,2,3),
                  b=c(2,3,4),
                  c=c(3,4,5))

map(test_list,mean)
#> $a
#> [1] 2
#> 
#> $b
#> [1] 3
#> 
#> $c
#> [1] 4
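
Because map() always returns a list, purrr also provides typed variants that return atomic vectors instead. A short sketch, reusing the same test_list (the result variable names are mine):

```r
library(purrr)

test_list <- list(a = c(1, 2, 3),
                  b = c(2, 3, 4),
                  c = c(3, 4, 5))

# map_dbl() returns a named numeric vector instead of a list
means <- map_dbl(test_list, mean)
print(means)
#> a b c 
#> 2 3 4 

# map_chr() returns a character vector; the function is given as a formula shorthand
labels <- map_chr(test_list, ~ paste(.x, collapse = "-"))

# map_lgl() returns a logical vector
all_above_one <- map_lgl(test_list, ~ all(.x > 1))
```

The typed variants raise an error if a result does not fit the requested type, which makes them a safer choice than calling unlist() on the output of map().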

Read More

Dingrui’s Useful VBA Scripts

This post will be updated from time to time. Please check back regularly.

Here are some useful VBA scripts (macros in Excel).

You can download them through the following links.

Please enable macros before using the following scripts. (Click this link for more information.)

  1. Web Scraping Using VBA

This script drives Internet Explorer (IE) on Windows, so if you are using macOS, please skip this one.
Please click the button that matches your location, since Google is blocked in mainland China.
An IE window will pop up showing either a Google or a Baidu search results page.

Read More

DataCamp Certificates

Here are some of my certificates from DataCamp.

DataCamp is a really good website for studying data science, whether you want to learn R or Python.

Certificates: (The last course was finished on 21 Jan 2019; 26 courses were finished in total.)

Dingrui’s Useful R scripts

This post will be updated from time to time. Please check back regularly.

All scripts rely on the following packages. I am very grateful to the authors of these packages, which make both my life and my work more interesting and easier.
  • tidyverse
  • data.table
  • readxl
  • writexl
  • lubridate
  • RMySQL
  • RSelenium (If you have trouble installing RSelenium, please go to this link for further reference.)

Read More

The Key Value of Data Analysis

A funny joke

Some people ask, ‘What kind of data analysis is best? Big data? Small data? Or structured data?’

Honestly, I still can’t answer this question.

But I will ask another question: ‘Which is better, the black cat or the white cat?’

Read More

R Scripts for Combining Excel Files

Excel files may be the most common file format you will face in a business or accounting job. Here are some tips on how to combine Excel files using R.

Preparation

We need two packages, tidyverse and readxl, both from Hadley Wickham and his collaborators.

If you want to learn more about them, feel free to read their documentation: readxl and tidyverse.

Let’s Do It!
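
As a minimal sketch of the general idea (not necessarily the exact approach taken in the full post), the snippet below reads every .xlsx file in a folder and stacks them into one data frame. It loads purrr and dplyr, which are part of the tidyverse, and uses writexl only to create two small demo workbooks; the folder, file names and columns are hypothetical.

```r
library(readxl)
library(writexl)  # only used here to create demo files
library(purrr)
library(dplyr)

# Create a scratch folder with two small workbooks (made-up data)
demo_dir <- tempfile("excel_demo_")
dir.create(demo_dir)
write_xlsx(data.frame(id = 1:2, amount = c(10, 20)), file.path(demo_dir, "jan.xlsx"))
write_xlsx(data.frame(id = 3:4, amount = c(30, 40)), file.path(demo_dir, "feb.xlsx"))

# List every Excel file in the folder
files <- list.files(demo_dir, pattern = "\\.xlsx$", full.names = TRUE)

# Read the first sheet of each file and stack the results,
# keeping the file name in a source_file column
combined <- files %>%
  set_names(basename(files)) %>%
  map(read_excel) %>%
  bind_rows(.id = "source_file")

print(combined)
```

Keeping the file name as a column makes it easy to trace each row back to its workbook after the files are combined.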

Read More