[datatable-help] fread: coercion of class from integer to character due to NA string.
Vivianne Vilar
viviannevilar at gmail.com
Thu Apr 25 02:38:28 CEST 2013
Hi there,
I think this is probably a known issue, but just in case, here it is.
I am trying to use fread to read a very large csv file, but I am having
problems due to the fact that NAs in a numeric column are represented with
some letters. For example, in my column of SIC codes I have "Z" to
represent NAs. Even though I explicitly set those to be NAs in the command:
    data6281 <- fread("data6281.csv", header=TRUE,
                      na.strings=c("C",".","B","Z",""))
I get the warning message that that column was changed to be character even
though it is supposed to be integer.
With read.csv I have no problem when I use the command
    data6281 <- data.table(read.csv("data6281.csv", header=TRUE,
        colClasses=c("integer","integer","integer","integer","integer","factor","character","factor","numeric","numeric","integer"),
        na.strings=c("C",".","B","Z","")))
but fread does not allow me to set the column classes since it doesn't
accept the argument colClasses.
A shame really. fread is much faster, and I love that it shows the %
progress.
I don't suppose there is a way around this, but if there is I would be
glad to know.
I would also be happy to provide an example if that's necessary.
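For what it's worth, here is a minimal sketch of the kind of example I have in mind, together with a conversion after reading that might serve as a stopgap. The file contents and the column names (id, sic) are made up for illustration and are not my real data. It is not a real fix, since the column is still read as character first and only converted afterwards.

```r
library(data.table)

# Hypothetical stand-in for data6281.csv: column names and values are made up.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,sic", "1,6281", "2,Z", "3,6282"), tmp)

# The letter code "Z" makes fread treat the whole sic column as character.
dt <- fread(tmp, header = TRUE)

# Codes that stand for missing values, same vector as in na.strings above.
na_codes <- c("C", ".", "B", "Z", "")

# Blank out the codes first, then convert the column to integer in place.
dt[sic %in% na_codes, sic := NA_character_]
dt[, sic := as.integer(sic)]

str(dt)
```

On the real file this would have to be repeated for each affected column, so it only helps if the extra pass over the data is cheap compared to the time saved by fread.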
Cheers,
Vivianne Siqueira Campos Vilar
----------------------------------------------
“Don't worry about the world coming to an end today. It is already tomorrow
in Australia.”