Chun-Jie Liu

Arrow, DuckDB, data.table and vroom

Chun-Jie Liu · 2022-08-03

R packages for loading and wrangling tabular data

Fastest way to read and tidy tabular data, then import it into an embedded database.

vroom is really fast at reading big tabular data (> 10 million observations) into the R working environment, and it is indeed the best choice for loading data into memory. However, wrangling the rows or columns into a tidy data frame afterwards is very slow and takes a lot of memory. Even the simplest query, or filtering a single observation, takes an unexpectedly long time.
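To check this behaviour on your own data, a minimal sketch like the one below times the read step and a single-row filter separately. The file, column names (`id`, `group`, `value`) and row count are hypothetical stand-ins, far smaller than the > 10 million observations discussed here, so the timings it prints only illustrate the method and are not a benchmark.

```r
library(vroom)
library(dplyr)

# Hypothetical stand-in data: replace `path` with your own large file.
path <- tempfile(fileext = ".tsv")
n <- 1e5
demo <- data.frame(
  id = seq_len(n),
  group = sample(letters, n, replace = TRUE),
  value = rnorm(n)
)
vroom_write(demo, path, delim = "\t")

# Step 1: read the file. vroom indexes the file first and can defer
# parsing of values until they are actually used.
read_time <- system.time(
  tbl <- vroom(path, delim = "\t", show_col_types = FALSE)
)

# Step 2: filter a single observation. This forces the values of the
# filtered column to be available, so part of the cost shows up here.
filter_time <- system.time(
  hit <- filter(tbl, id == 42)
)

print(read_time)
print(filter_time)
print(nrow(hit))
```

Timing the two steps separately makes it easier to see whether the cost sits in loading or in wrangling before deciding to move the data elsewhere.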

Apache Arrow

DuckDB vs. SQLite

data.table and dtplyr

Best choice for practical demands

My problem is to load