[datatable-help] New function fread() in v1.8.7

Matthew Dowle mdowle at mdowle.plus.com
Fri Dec 21 19:28:58 CET 2012


Hi datatablers,

Feedback and bug reports are much appreciated:

=====
New function fread(), a fast and friendly file reader.
* header, skip, nrows, sep and colClasses are all auto detected.
* integers>2^31 are detected and read natively as bit64::integer64.
* accepts filenames, URLs and "A,B\n1,2\n3,4" directly
* new implementation entirely in C
* with a 50MB .csv, 1 million rows x 6 columns :
     read.csv("test.csv")                                   # 30-60 sec
     read.table("test.csv",<all known tricks, known nrows>) #    10 sec
     fread("test.csv")                                      #     3 sec
* airline data: 658MB csv (7 million rows x 29 columns)
     read.table("2008.csv",<all known tricks, known nrows>) #   360 sec
     fread("2008.csv")                                      #    50 sec
See ?fread. Many thanks to Chris Neff and Garrett See for ideas,
discussions and beta testing.
=====
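
As a minimal sketch of the auto detection listed above: the column names
and values are made up for illustration, and the second call is only an
assumption about how one might override the detected sep and header by
hand. See ?fread for the arguments your version actually supports.

```r
library(data.table)

# Literal text containing "\n" is read directly, no file needed.
# The header row, the separator and the column types are all detected.
DT <- fread("A,B\n1,2\n3,4")
print(DT)
print(sapply(DT, class))

# Same data with a different separator, this time stated explicitly
# rather than left to auto detection.
DT2 <- fread("A;B\n1;2\n3;4", sep = ";", header = TRUE)
print(DT2)
```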

1.8.7 is passing checks on Unix and Windows (but not Mac yet) :

   install.packages("data.table", repos="http://R-Forge.R-project.org")
   require(data.table)
   ?fread
   fread("your biggest baddest file")
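
If you don't have a big file to hand, a rough way to compare timings is
sketched below. The generated data (100,000 rows, columns id, x and g) and
the temporary file are hypothetical stand-ins, not the benchmark files
quoted above, and timings will vary by machine and compiler flags.

```r
library(data.table)

# Hypothetical test data: 100,000 rows x 3 columns in a temporary .csv
n <- 1e5
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = seq_len(n),
                     x  = runif(n),
                     g  = sample(letters, n, replace = TRUE)),
          tmp, row.names = FALSE)

# Time the base reader and fread on the same file
t_base  <- system.time(DF <- read.csv(tmp))
t_fread <- system.time(DT <- fread(tmp))

# Elapsed seconds for each reader
print(c(read.csv = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

unlink(tmp)  # remove the temporary file
```

Increase n to get closer to the file sizes quoted above.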

Oddly, R-Forge appears to be compiling Win64 with -O2 optimization rather
than -O3 (Win32 is fine with -O3), so speedups might not be as great on
Win64 until that is resolved on R-Forge, unless you compile it yourself.
-O3 has some optimizations that fread may benefit from. Either way, I'd
be interested to hear how it goes.

Season's greetings!

Matthew



