[datatable-help] fread colClasses or skip

Hideyoshi Maeda hideyoshi.maeda at gmail.com
Fri Jul 5 15:55:38 CEST 2013


Hi,

I would like to skip a column when reading a csv into R via fread. The csv I am reading in has no column headers, which appears to be a problem for fread. Is there a way to just specify that I don't want specific columns?

To give an example…

I downloaded the data from the following URL

http://www.truefx.com/dev/data/2013/JUNE-2013/AUDUSD-2013-05.zip

unzipped it…

and read the csv into R using fread. The csv has pretty much the same file name as the zip, just with the csv extension.

> system.time(pp <- fread("AUDUSD-2013-05.csv",sep=","))
   user  system elapsed 
 16.427   0.257  16.682 
> head(pp)
        V1                    V2      V3      V4
1: AUD/USD 20130501 00:00:04.728 1.03693 1.03721
2: AUD/USD 20130501 00:00:21.540 1.03695 1.03721
3: AUD/USD 20130501 00:00:33.789 1.03694 1.03721
4: AUD/USD 20130501 00:00:37.499 1.03692 1.03724
5: AUD/USD 20130501 00:00:37.524 1.03697 1.03719
6: AUD/USD 20130501 00:00:39.789 1.03697 1.03717
> str(pp)
Classes ‘data.table’ and 'data.frame':	4060762 obs. of  4 variables:
 $ V1: chr  "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
 $ V2: chr  "20130501 00:00:04.728" "20130501 00:00:21.540" "20130501 00:00:33.789" "20130501 00:00:37.499" ...
 $ V3: num  1.04 1.04 1.04 1.04 1.04 ...
 $ V4: num  1.04 1.04 1.04 1.04 1.04 ...
 - attr(*, ".internal.selfref")=<externalptr> 

I tried using the new(ish) colClasses and skip arguments to leave out the first column, which holds the same value in every row and is unnecessary.

but doing:

pp1 <- fread("AUDUSD-2013-05.csv",sep=",",skip=1)

doesn't stop the first column from being read in

and using colClasses leads to the following error:

pp1 <- fread("AUDUSD-2013-05.csv",sep=",",colClasses=list(NULL,"character","numeric","numeric"))

Error in fread("AUDUSD-2013-05.csv", sep = ",", colClasses = list(NULL,  : 
  colClasses is type list but has no names
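
The error message suggests that a list passed to colClasses needs names. A minimal sketch of that named form follows, run on two made-up lines in the same layout as the file rather than on the real csv. It assumes a data.table release whose fread accepts the text and drop arguments, which as far as I can tell is later than v1.8.9, so treat it as an illustration rather than something verified against that version.

```r
library(data.table)

# Two made-up lines in the same layout as the TrueFX file (no header row).
# Assumption: this fread supports `text`, which 1.8.9 may not.
sample_lines <- c(
  "AUD/USD,20130501 00:00:04.728,1.03693,1.03721",
  "AUD/USD,20130501 00:00:21.540,1.03695,1.03721"
)

# Named list form of colClasses: each name is a class, each value the
# column positions that should be read as that class.
pp_named <- fread(text = sample_lines, sep = ",", header = FALSE,
                  colClasses = list(character = 1:2, numeric = 3:4))

# Assumption: `drop` is available; it leaves out columns by position,
# so the first column is never materialised.
pp_drop <- fread(text = sample_lines, sep = ",", header = FALSE, drop = 1)

str(pp_named)
str(pp_drop)
```

Note that skip counts lines at the top of the file, not columns, which would explain why skip=1 leaves the first column in place.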

Are there any suggestions for speeding up the read by omitting the first column?

Also, perhaps a bit much to ask, but is it possible to read a zip file directly, rather than unzipping it first and then reading in the csv?

Oh, and if it wasn't clear, I'm using v1.8.9.

As always, thanks for all of your help, effort and advice in advance.

HLM

