[datatable-help] fread colClasses or skip

Hideyoshi Maeda hideyoshi.maeda at gmail.com
Fri Jul 5 16:24:45 CEST 2013


One other thought: might it be better to just preallocate column names, if need be, and then read the file in?


On 5 Jul 2013, at 14:55, Hideyoshi Maeda <hideyoshi.maeda at gmail.com> wrote:

> Hi,
> 
> I would like to skip a column when reading a file into R via fread. But the CSV I am reading in has no column headers, which appears to be a problem for fread. Is there a way to just specify that I don't want specific columns?
> 
> To give an example…
> 
> I downloaded the data from the following URL
> 
> http://www.truefx.com/dev/data/2013/JUNE-2013/AUDUSD-2013-05.zip
> 
> unzipped it…
> 
> and read the CSV into R using fread; it has pretty much the same file name, just with the .csv extension.
> 
>> system.time(pp <- fread("AUDUSD-2013-05.csv",sep=","))
>   user  system elapsed 
> 16.427   0.257  16.682 
>> head(pp)
>        V1                    V2      V3      V4
> 1: AUD/USD 20130501 00:00:04.728 1.03693 1.03721
> 2: AUD/USD 20130501 00:00:21.540 1.03695 1.03721
> 3: AUD/USD 20130501 00:00:33.789 1.03694 1.03721
> 4: AUD/USD 20130501 00:00:37.499 1.03692 1.03724
> 5: AUD/USD 20130501 00:00:37.524 1.03697 1.03719
> 6: AUD/USD 20130501 00:00:39.789 1.03697 1.03717
>> str(pp)
> Classes ‘data.table’ and 'data.frame':	4060762 obs. of  4 variables:
> $ V1: chr  "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
> $ V2: chr  "20130501 00:00:04.728" "20130501 00:00:21.540" "20130501 00:00:33.789" "20130501 00:00:37.499" ...
> $ V3: num  1.04 1.04 1.04 1.04 1.04 ...
> $ V4: num  1.04 1.04 1.04 1.04 1.04 ...
> - attr(*, ".internal.selfref")=<externalptr> 
> 
> I tried using the new(ish) colClasses and skip arguments to leave out the first column, which holds the same value in every row and is therefore unnecessary.
> 
> but doing:
> 
> pp1 <- fread("AUDUSD-2013-05.csv",sep=",",skip=1)
> 
> doesn't omit the first column,
> 
> and using colClasses leads to the following error:
> 
> pp1 <- fread("AUDUSD-2013-05.csv",sep=",",colClasses=list(NULL,"character","numeric","numeric"))
> 
> Error in fread("AUDUSD-2013-05.csv", sep = ",", colClasses = list(NULL,  : 
>  colClasses is type list but has no names
> 
> Are there any suggestions to be able to speed up the reading in of data by omitting the first column?
> 
> Also, perhaps a bit much to ask, but is it possible to read a zip file directly rather than unzipping it first and then reading in the CSV?
> 
> Oh, and in case it wasn't clear, I'm using v1.8.9.
> 
> As always, thanks for all of your help, effort and advice in advance.
> 
> HLM
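
For the column-skipping question quoted above, here is a minimal sketch. It assumes a data.table release newer than the v1.8.9 mentioned, one in which fread accepts the `drop` and `select` arguments; whether either is available in v1.8.9 is not something this thread confirms. The temporary file and its two rows are a hypothetical stand-in for the headerless AUDUSD file, built from the rows shown in the head(pp) output.

```r
library(data.table)

# Hypothetical stand-in for the headerless tick file: four columns, no header row.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "AUD/USD,20130501 00:00:04.728,1.03693,1.03721",
  "AUD/USD,20130501 00:00:21.540,1.03695,1.03721"
), tmp)

# Assumption: this fread supports `drop`. Skip the first column by position;
# header = FALSE makes fread assign its default V1, V2, ... column names.
pp_drop <- fread(tmp, sep = ",", header = FALSE, drop = 1)

# Assumption: this fread supports `select`. Keep only columns 2 to 4,
# which should give the same result as dropping column 1.
pp_select <- fread(tmp, sep = ",", header = FALSE, select = 2:4)

print(names(pp_drop))
print(pp_select)
```

Selecting by position avoids the need for names in the file, which is the sticking point with the unnamed colClasses list above.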


