[datatable-help] New function fread() in v1.8.7

Hideyoshi Maeda hideyoshi.maeda at gmail.com
Mon Dec 24 12:04:51 CET 2012


Hi Matthew,

I am using the new `data.table` `fread()` function to read my CSV files. When read with `read.csv()`, they have the following format:

            Date.and.Time Open High  Low Close Volume
    1 2007/01/01 22:51:00 5683 5683 5673  5673     64
    2 2007/01/01 22:52:00 5675 5676 5674  5674     17
    3 2007/01/01 22:53:00 5674 5674 5673  5674     42

The first column holds the whole timestamp, e.g. `2007/01/01 22:53:00`; the next 5 columns are separated by commas.

But when I read the same file using `fread()`, I get the following output:

        V1 V2                                             V3
    1 2007  1 01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
    2 2007  1 01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
    3 2007  1 01 22:53:00,5674.00,5674.00,5673.00,5674.00,42

This is because the autodetect is using the "/" as a separator...

I tried overriding this with the `sep=","` argument, but it does not seem to be used anywhere in the function.

Furthermore, the output I get with verbose suggests that I was right in thinking that "/" is used as the separator rather than ",".

Is there any way to fix this, so that it correctly reads all 6 columns separately?
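
A minimal sketch of the intended call, for reference. It assumes a data.table release in which `fread()` honours an explicit `sep` argument (in v1.8.7 it appears to be ignored, as described above). The header line, the three sample rows and the temporary file are hypothetical stand-ins for the real file, and `read.csv()` is included only as a fallback that does not depend on `fread()`.

```r
library(data.table)

# Hypothetical sample in the layout described above: one date-time field
# followed by five comma-separated numeric fields. The header text is an
# assumption inferred from the read.csv() column names.
csv_text <- paste(
  "Date and Time,Open,High,Low,Close,Volume",
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64",
  "2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17",
  "2007/01/01 22:53:00,5674.00,5674.00,5673.00,5674.00,42",
  sep = "\n"
)

tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Force the comma separator instead of relying on auto-detection.
dt <- fread(tmp, sep = ",")

# Fallback: base R reader, which splits on commas by default.
df <- read.csv(tmp)

print(dt)
print(df)
```

If `dt` still comes back with fewer than 6 columns, adding `verbose = TRUE` to the `fread()` call should show which separator was chosen.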

Thanks

HLM

On 21 Dec 2012, at 18:28, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

> 
> Hi datatablers,
> 
> Feedback and bug reports much appreciated :
> 
> =====
> New function fread(), a fast and friendly file reader.
> * header, skip, nrows, sep and colClasses are all auto detected.
> * integers>2^31 are detected and read natively as bit64::integer64.
> * accepts filenames, URLs and "A,B\n1,2\n3,4" directly
> * new implementation entirely in C
> * with a 50MB .csv, 1 million rows x 6 columns :
>    read.csv("test.csv")                                   # 30-60 sec
>    read.table("test.csv",<all known tricks, known nrows>) #    10 sec
>    fread("test.csv")                                      #     3 sec
> * airline data: 658MB csv (7 million rows x 29 columns)
>    read.table("2008.csv",<all known tricks, known nrows>) #   360 sec
>    fread("2008.csv")                                      #    50 sec
> See ?fread. Many thanks to Chris Neff and Garrett See for ideas,
> discussions and beta testing.
> =====
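
The direct-string input mentioned in the list above can be tried without any file on disk. This is only a small sketch, assuming a data.table release that includes `fread()`; the variable name `dt` is arbitrary.

```r
library(data.table)

# A string containing newlines is treated as the data itself rather than
# as a file name; header and separator are left to auto-detection.
dt <- fread("A,B\n1,2\n3,4")

print(dt)
str(dt)
```

`str(dt)` shows the column types that were chosen, which is a quick way to check the colClasses detection on a small case.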
> 
> 1.8.7 is passing checks on Unix and Windows (but not Mac yet) :
> 
>  install.packages("data.table", repos="http://R-Forge.R-project.org")
>  require(data.table)
>  ?fread
>  fread("your biggest baddest file")
> 
> Oddly, R-Forge appears to be compiling Win64 with -O2 optimization rather
> than -O3 (but -O3 on Win32 ok), so speedups might not be as great on Win64
> until that can be resolved on R-Forge, unless you compile yourself. -O3
> has some optimizations that fread may benefit from. But interested to hear.
> 
> Season's greetings!
> 
> Matthew


