[datatable-help] fread colClasses or skip
Hideyoshi Maeda
hideyoshi.maeda at gmail.com
Fri Jul 5 16:34:12 CEST 2013
Sorry, there is also an error in the URL; it should be:
>> http://www.truefx.com/dev/data/2013/MAY-2013/AUDUSD-2013-05.zip
On 5 Jul 2013, at 15:24, Hideyoshi Maeda <hideyoshi.maeda at gmail.com> wrote:
> One other thought: perhaps it would be better to specify the column names in advance, if need be, and then read the file in?
>
>
> On 5 Jul 2013, at 14:55, Hideyoshi Maeda <hideyoshi.maeda at gmail.com> wrote:
>
>> Hi,
>>
>> I would like to skip a column when reading a file into R with fread. However, the CSV I am reading has no column headers, which appears to be a problem for fread. Is there a way to specify just the columns I don't want?
>>
>> To give an example, I downloaded the data from the following URL:
>>
>> http://www.truefx.com/dev/data/2013/JUNE-2013/AUDUSD-2013-05.zip
>>
>> unzipped it,
>>
>> and read the CSV into R using fread (it has essentially the same file name, just with a .csv extension):
>>
>>> system.time(pp <- fread("AUDUSD-2013-05.csv",sep=","))
>>    user  system elapsed
>>  16.427   0.257  16.682
>>> head(pp)
>>         V1                    V2      V3      V4
>> 1: AUD/USD 20130501 00:00:04.728 1.03693 1.03721
>> 2: AUD/USD 20130501 00:00:21.540 1.03695 1.03721
>> 3: AUD/USD 20130501 00:00:33.789 1.03694 1.03721
>> 4: AUD/USD 20130501 00:00:37.499 1.03692 1.03724
>> 5: AUD/USD 20130501 00:00:37.524 1.03697 1.03719
>> 6: AUD/USD 20130501 00:00:39.789 1.03697 1.03717
>>> str(pp)
>> Classes ‘data.table’ and 'data.frame': 4060762 obs. of 4 variables:
>> $ V1: chr "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
>> $ V2: chr "20130501 00:00:04.728" "20130501 00:00:21.540" "20130501 00:00:33.789" "20130501 00:00:37.499" ...
>> $ V3: num 1.04 1.04 1.04 1.04 1.04 ...
>> $ V4: num 1.04 1.04 1.04 1.04 1.04 ...
>> - attr(*, ".internal.selfref")=<externalptr>
>>
>> I tried using the new(ish) colClasses and skip arguments to avoid reading the first column, which holds the same value on every row and is unnecessary.
>>
>> But doing:
>>
>> pp1 <- fread("AUDUSD-2013-05.csv",sep=",",skip=1)
>>
>> does not stop the first column from being read in,
>>
>> and using colClasses leads to the following error:
>>
>> pp1 <- fread("AUDUSD-2013-05.csv",sep=",",colClasses=list(NULL,"character","numeric","numeric"))
>>
>> Error in fread("AUDUSD-2013-05.csv", sep = ",", colClasses = list(NULL, :
>> colClasses is type list but has no names
>>
>> Are there any suggestions for speeding up the read by omitting the first column?
>>
>> Also, perhaps this is a bit much to ask, but is it possible to read a zip file directly rather than unzipping it first and then reading in the CSV?
>>
>> Oh, and in case it wasn't clear, I'm using data.table v1.8.9.
>>
>> As always, thanks in advance for all of your help, effort and advice.
>>
>> HLM
>
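A minimal sketch of skipping the first column, assuming a later data.table release than the v1.8.9 mentioned above (as far as I know, fread's `drop` and `select` arguments were added after v1.8.9). It also shows the named-list form of `colClasses`: the list names are class names and the elements are column numbers in the file, which is why an unnamed `list(NULL, ...)` triggers "colClasses is type list but has no names". The three sample rows are copied from the `head(pp)` output; the temporary file is a stand-in for AUDUSD-2013-05.csv.

```r
library(data.table)

# Stand-in for AUDUSD-2013-05.csv: no header; columns are
# currency pair, timestamp, bid, ask (rows copied from head(pp)).
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "AUD/USD,20130501 00:00:04.728,1.03693,1.03721",
  "AUD/USD,20130501 00:00:21.540,1.03695,1.03721",
  "AUD/USD,20130501 00:00:33.789,1.03694,1.03721"
), csv)

# drop = 1 skips the first column by position (assumes a data.table
# version with the drop argument). In the list form of colClasses the
# names are classes and the numbers refer to columns in the file,
# not to positions after dropping.
pp1 <- fread(csv, sep = ",", header = FALSE, drop = 1,
             colClasses = list(character = 2, numeric = 3:4))
str(pp1)
```

`select = 2:4` is the complementary form (keep the listed columns instead of dropping the others). Note that `skip` skips lines at the top of the file, not columns. How much time dropping the column saves on the full four-million-row file is not measured here.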
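For reading the zip file without a manual unzip step, one approach that does not depend on the fread version is to extract the CSV into a temporary directory with `utils::unzip()` and pass that path to fread. `read_zipped_csv()` is a hypothetical helper written for illustration, not part of data.table, and the member name `AUDUSD-2013-05.csv` is assumed from the description of the archive above.

```r
library(data.table)

# Hypothetical helper (not part of data.table): extract one member of a
# zip archive into a fresh temporary directory using R's internal unzip,
# read it with fread(), then delete the extracted copy on exit.
read_zipped_csv <- function(zipfile, member, ...) {
  exdir <- tempfile("unzipped_")
  dir.create(exdir)
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  path <- utils::unzip(zipfile, files = member, exdir = exdir)
  fread(path, ...)  # extra arguments (sep, header, ...) go to fread
}

# Usage, with file names as described above (assumed member name):
# pp <- read_zipped_csv("AUDUSD-2013-05.zip", "AUDUSD-2013-05.csv",
#                       sep = ",", header = FALSE)
```

Recent data.table releases also accept a shell command through the `cmd` argument, e.g. `fread(cmd = "unzip -p AUDUSD-2013-05.zip")`, which requires the `unzip` utility on the PATH.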