[datatable-help] New function fread() in v1.8.7
Matthew Dowle
mdowle at mdowle.plus.com
Fri Dec 21 19:28:58 CET 2012
Hi datatablers,
Feedback and bug reports much appreciated :
=====
New function fread(), a fast and friendly file reader.
* header, skip, nrows, sep and colClasses are all auto detected.
* integers>2^31 are detected and read natively as bit64::integer64.
* accepts filenames, URLs and literal text such as "A,B\n1,2\n3,4" directly.
* new implementation entirely in C
* with a 50MB .csv, 1 million rows x 6 columns :
read.csv("test.csv") # 30-60 sec
read.table("test.csv",<all known tricks, known nrows>) # 10 sec
fread("test.csv") # 3 sec
* airline data: 658MB csv (7 million rows x 29 columns)
read.table("2008.csv",<all known tricks, known nrows>) # 360 sec
fread("2008.csv") # 50 sec
See ?fread. Many thanks to Chris Neff and Garrett See for ideas,
discussions and beta testing.
=====
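For anyone wanting to reproduce the read.table() baseline: the timings above do not spell out which "known tricks" were used, so the sketch below is only an assumption about the usual ones (declaring colClasses, giving nrows, switching off comment and quote scanning). The temporary file, its size and its column names are made up for illustration and are not the benchmark files.

```r
# Hypothetical stand-in for test.csv: a small file written to a temp path.
set.seed(1)
n   <- 1000L
tmp <- tempfile(fileext = ".csv")
df  <- data.frame(id = seq_len(n),
                  x  = rnorm(n),
                  g  = sample(letters, n, replace = TRUE))
# quote = FALSE so the file has no quote characters to scan for later.
write.csv(df, tmp, row.names = FALSE, quote = FALSE)

# Plain call: read.csv() has to guess column types and grow its buffers.
slow <- read.csv(tmp)

# Assumed "tricks": tell read.table() everything it would otherwise detect.
fast <- read.table(tmp,
                   header       = TRUE,
                   sep          = ",",
                   quote        = "",   # no quote scanning
                   comment.char = "",   # no comment scanning
                   nrows        = n,    # known row count
                   colClasses   = c("integer", "numeric", "character"))

str(fast)
```

Wrapping either call in system.time() gives a rough comparison on your own data; the numbers will depend on the file and the machine.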
1.8.7 is passing checks on Unix and Windows (but not Mac yet) :
install.packages("data.table", repos="http://R-Forge.R-project.org")
require(data.table)
?fread
fread("your biggest baddest file")
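If you have no large file to hand, a quick way to try the auto detection is to pass literal text. This is a minimal sketch, assuming data.table and bit64 are both installed; the column names and values are invented for illustration.

```r
library(data.table)

# Literal text containing "\n" is read as data rather than as a file name.
# No header, sep or colClasses arguments are given; fread() detects them.
dt <- fread("A,B\n1,2\n3,4")
print(sapply(dt, class))

# Values too large for a 32-bit integer: assumed to come back as
# bit64::integer64 (needs the bit64 package for printing and arithmetic).
big <- fread("id,v\n3000000000,1\n3000000001,2")
print(class(big$id))
```

?fread remains the reference for the exact detection rules.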
Oddly, R-Forge appears to be compiling Win64 with -O2 optimization rather
than -O3 (but -O3 on Win32 ok), so speedups might not be as great on Win64
until that can be resolved on R-Forge, unless you compile yourself. -O3
has some optimizations that fread may benefit from. But interested to hear.
Season's greetings!
Matthew