[datatable-help] fread colClasses or skip
Hideyoshi Maeda
hideyoshi.maeda at gmail.com
Fri Jul 5 16:24:45 CEST 2013
One other thought: perhaps it would be better to just preallocate the column names, if need be, and then read the file in?
On 5 Jul 2013, at 14:55, Hideyoshi Maeda <hideyoshi.maeda at gmail.com> wrote:
> Hi,
>
> I would like to skip a column when reading a file into R via fread. The csv I am reading has no column headers, which appears to be a problem for fread. Is there a way to specify that I don't want specific columns?
>
> To give an example…
>
> I downloaded the data from the following URL
>
> http://www.truefx.com/dev/data/2013/JUNE-2013/AUDUSD-2013-05.zip
>
> unzipped it, and read the resulting csv (same file name, just with the csv extension) into R using fread:
>
>> system.time(pp <- fread("AUDUSD-2013-05.csv",sep=","))
>    user  system elapsed
>  16.427   0.257  16.682
>> head(pp)
>         V1                    V2      V3      V4
> 1: AUD/USD 20130501 00:00:04.728 1.03693 1.03721
> 2: AUD/USD 20130501 00:00:21.540 1.03695 1.03721
> 3: AUD/USD 20130501 00:00:33.789 1.03694 1.03721
> 4: AUD/USD 20130501 00:00:37.499 1.03692 1.03724
> 5: AUD/USD 20130501 00:00:37.524 1.03697 1.03719
> 6: AUD/USD 20130501 00:00:39.789 1.03697 1.03717
>> str(pp)
> Classes ‘data.table’ and 'data.frame': 4060762 obs. of 4 variables:
> $ V1: chr "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
> $ V2: chr "20130501 00:00:04.728" "20130501 00:00:21.540" "20130501 00:00:33.789" "20130501 00:00:37.499" ...
> $ V3: num 1.04 1.04 1.04 1.04 1.04 ...
> $ V4: num 1.04 1.04 1.04 1.04 1.04 ...
> - attr(*, ".internal.selfref")=<externalptr>
>
> I tried using the new(ish) colClasses and skip arguments to leave out the first column, which is identical in every row and therefore unnecessary.
>
> but doing:
>
> pp1 <- fread("AUDUSD-2013-05.csv",sep=",",skip=1)
>
> does not stop the first column from being read in
>
> and using colClasses leads to the following error
>
> pp1 <- fread("AUDUSD-2013-05.csv",sep=",",colClasses=list(NULL,"character","numeric","numeric"))
>
> Error in fread("AUDUSD-2013-05.csv", sep = ",", colClasses = list(NULL, :
> colClasses is type list but has no names
>
> Are there any suggestions for speeding up the read by omitting the first column?
>
> Also perhaps a bit much to ask, but is it possible to directly read a zip file rather than unzipping it first and then reading in the csv?
>
> Oh, and in case it wasn't clear, I'm using v1.8.9.
>
> As always, thanks for all of your help, effort and advice in advance.
>
> HLM
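
A minimal sketch of one way to approach both questions, assuming a more recent data.table release than the v1.8.9 mentioned above. It relies on the drop, select and cmd arguments of fread, which may not be available in that version. The two-row sample file and the column names timestamp, bid and ask are made up for illustration, and the zip step assumes an external unzip tool is on the PATH. Note that skip counts lines to skip at the top of the file, not columns, which would explain why skip=1 left the first column in place.

```r
library(data.table)

# Build a tiny headerless CSV shaped like the file in the question
# (illustrative data only, copied from the head(pp) output).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "AUD/USD,20130501 00:00:04.728,1.03693,1.03721",
  "AUD/USD,20130501 00:00:21.540,1.03695,1.03721"
), csv_path)

# Option 1: drop the first column by position while reading.
pp1 <- fread(csv_path, sep = ",", header = FALSE, drop = 1)

# Option 2: keep only columns 2 to 4, then assign names afterwards.
# The names below are hypothetical; the file itself carries no header.
pp2 <- fread(csv_path, sep = ",", header = FALSE, select = 2:4)
setnames(pp2, c("timestamp", "bid", "ask"))

# Reading straight from the zip by streaming it through `unzip -p`.
# Assumption: `unzip` is installed and the zip is in the working directory.
zip_path <- "AUDUSD-2013-05.zip"
if (file.exists(zip_path) && nzchar(Sys.which("unzip"))) {
  pp_zip <- fread(cmd = paste("unzip -p", shQuote(zip_path)),
                  sep = ",", header = FALSE, drop = 1)
}
```

Dropping by position avoids any dependence on column names, which suits a file without a header row. Whether this actually shortens the read time on the full file would need to be measured.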