[datatable-help] fread colClasses or skip

Hideyoshi Maeda hideyoshi.maeda at gmail.com
Fri Jul 5 16:34:12 CEST 2013


Sorry, there was also an error in the URL. It should be:

>> http://www.truefx.com/dev/data/2013/MAY-2013/AUDUSD-2013-05.zip

On 5 Jul 2013, at 15:24, Hideyoshi Maeda <hideyoshi.maeda at gmail.com> wrote:

> One other thought: might it be better to just assign the column names up front, if need be, and then read the file in?
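
A minimal sketch of that idea, assuming a data.table release whose fread accepts a `col.names` argument (the v1.8.9 mentioned in this thread may not). The sample rows mirror the file layout shown in this thread, and the column names are hypothetical labels chosen for illustration only.

```r
library(data.table)

# Small sample in the same layout as the file discussed in this thread.
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "AUD/USD,20130501 00:00:04.728,1.03693,1.03721",
  "AUD/USD,20130501 00:00:21.540,1.03695,1.03721",
  "AUD/USD,20130501 00:00:33.789,1.03694,1.03721"
), csv)

# Hypothetical column names; the file itself has no header row,
# so header = FALSE tells fread to treat the first line as data.
pp_named <- fread(csv, sep = ",", header = FALSE,
                  col.names = c("pair", "timestamp", "price1", "price2"))
```

Naming the columns this way only relabels them. It does not by itself stop the first column from being read.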
> 
> 
> On 5 Jul 2013, at 14:55, Hideyoshi Maeda <hideyoshi.maeda at gmail.com> wrote:
> 
>> Hi,
>> 
>> I would like to skip a column when reading a file into R with fread. However, the csv I am reading has no column headers, which appears to be a problem for fread. Is there a way to specify that I don't want specific columns?
>> 
>> To give an example:
>> 
>> I downloaded the data from the following URL
>> 
>> http://www.truefx.com/dev/data/2013/JUNE-2013/AUDUSD-2013-05.zip
>> 
>> I unzipped it and read the csv into R using fread. The csv has pretty much the same file name as the zip, just with the csv extension.
>> 
>>> system.time(pp <- fread("AUDUSD-2013-05.csv",sep=","))
>>  user  system elapsed 
>> 16.427   0.257  16.682 
>>> head(pp)
>>       V1                    V2      V3      V4
>> 1: AUD/USD 20130501 00:00:04.728 1.03693 1.03721
>> 2: AUD/USD 20130501 00:00:21.540 1.03695 1.03721
>> 3: AUD/USD 20130501 00:00:33.789 1.03694 1.03721
>> 4: AUD/USD 20130501 00:00:37.499 1.03692 1.03724
>> 5: AUD/USD 20130501 00:00:37.524 1.03697 1.03719
>> 6: AUD/USD 20130501 00:00:39.789 1.03697 1.03717
>>> str(pp)
>> Classes ‘data.table’ and 'data.frame':	4060762 obs. of  4 variables:
>> $ V1: chr  "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
>> $ V2: chr  "20130501 00:00:04.728" "20130501 00:00:21.540" "20130501 00:00:33.789" "20130501 00:00:37.499" ...
>> $ V3: num  1.04 1.04 1.04 1.04 1.04 ...
>> $ V4: num  1.04 1.04 1.04 1.04 1.04 ...
>> - attr(*, ".internal.selfref")=<externalptr> 
>> 
>> I tried using the new(ish) colClasses and skip arguments to leave out the first column, which holds the same value in every row and is unnecessary.
>> 
>> But doing:
>> 
>> pp1 <- fread("AUDUSD-2013-05.csv",sep=",",skip=1)
>> 
>> doesn't stop the first column from being read in,
>> 
>> and using colClasses leads to the following error:
>> 
>> pp1 <- fread("AUDUSD-2013-05.csv",sep=",",colClasses=list(NULL,"character","numeric","numeric"))
>> 
>> Error in fread("AUDUSD-2013-05.csv", sep = ",", colClasses = list(NULL,  : 
>> colClasses is type list but has no names
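
For reference, `skip` in fread refers to lines of the file rather than columns, which is why it leaves the first column in place. A minimal sketch of dropping a column by position follows, assuming a data.table release whose fread has the `drop` and `select` arguments (the v1.8.9 used in this thread may predate them). The sample rows are copied from the output above and written to a temporary file so the example is self-contained.

```r
library(data.table)

# Small sample in the same layout as AUDUSD-2013-05.csv (no header row).
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "AUD/USD,20130501 00:00:04.728,1.03693,1.03721",
  "AUD/USD,20130501 00:00:21.540,1.03695,1.03721",
  "AUD/USD,20130501 00:00:33.789,1.03694,1.03721"
), csv)

# Drop the first column by position so it is never materialised.
pp_drop <- fread(csv, sep = ",", header = FALSE, drop = 1L)

# The same result expressed as "keep only columns 2 to 4".
pp_select <- fread(csv, sep = ",", header = FALSE, select = 2:4)
```

Whether this shortens the read time for the full 4-million-row file is not something this sketch measures; it only shows how the column can be left out.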
>> 
>> Are there any suggestions for speeding up the read by omitting the first column?
>> 
>> Also, this is perhaps a bit much to ask, but is it possible to read a zip file directly rather than unzipping it first and then reading in the csv?
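
One possible workaround, sketched under the assumption that extracting to a temporary directory is acceptable: base R's `unzip()` can do the extraction from within R, after which fread reads the extracted file. The helper name `fread_zip` and its `member` parameter are hypothetical, not part of data.table.

```r
library(data.table)

# Hypothetical helper: extract one member of a zip archive into a
# temporary directory, read it with fread, then delete the extracted copy.
fread_zip <- function(zip_path, member = NULL, ...) {
  if (is.null(member)) {
    # Default to the first file listed in the archive.
    member <- unzip(zip_path, list = TRUE)$Name[1L]
  }
  out_dir <- tempfile("unz")
  dir.create(out_dir)
  on.exit(unlink(out_dir, recursive = TRUE), add = TRUE)
  # unzip() returns the path(s) of the extracted file(s).
  csv_path <- unzip(zip_path, files = member, exdir = out_dir)
  fread(csv_path, ...)
}

# Usage with the archive name from this thread:
# pp <- fread_zip("AUDUSD-2013-05.zip", sep = ",", header = FALSE)
```

This still unzips the data to disk behind the scenes, so it saves a manual step rather than the extraction itself.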
>> 
>> Oh, and in case it wasn't clear, I'm using v1.8.9.
>> 
>> As always, thanks for all of your help, effort and advice in advance.
>> 
>> HLM
> 


