[datatable-help] fread: coercion of class from integer to character due to NA string.

Vivianne Vilar viviannevilar at gmail.com
Thu Apr 25 02:38:28 CEST 2013


Hi there,

I think this is probably a known issue, but just in case, here it is.

I am trying to use fread to read a very large csv file, but I am having
problems due to the fact that NAs in a numeric column are represented with
some letters. For example, in my column of SIC codes I have "Z" to
represent NAs. Even though I explicitly set those to be NAs in the command:

data6281 <- fread("data6281.csv",header=TRUE,
na.strings=c("C",".","B","Z",""))

I get a warning that the column was coerced to character, even
though it is supposed to be integer.

With the read.csv I have no problem when I use the command

data6281 <- data.table(read.csv("data6281.csv",header=TRUE,
colClasses=c("integer","integer","integer","integer","integer","factor","character","factor","numeric","numeric","integer"),
na.strings=c("C",".","B","Z","")))

but fread does not allow me to set the column classes since it doesn't
accept the argument colClasses.

A shame really. fread is much faster, and I love that it shows the %
progress.

I don't suppose there is a way around this, but if there is I would be
glad to know.
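For reference, one possible workaround is to let fread read the column as character and convert it afterwards. The sketch below is only an illustration under assumptions: the column name `sic` and the four-row file are hypothetical stand-ins for data6281.csv, and nothing here is confirmed as the recommended data.table approach.

```r
library(data.table)

# Hypothetical miniature stand-in for data6281.csv; the column name
# `sic` and these rows are assumptions made for illustration only.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,sic", "1,6281", "2,Z", "3,6282", "4,."), tmp)

# Read without na.strings, so the letter codes push `sic` to character.
dt <- fread(tmp, header = TRUE)

# The same NA markers used in the original fread call.
na_codes <- c("C", ".", "B", "Z", "")

# Only convert if fread actually fell back to character for this column.
if (is.character(dt$sic)) {
  dt[sic %in% na_codes, sic := NA_character_]  # blank out the NA markers
  dt[, sic := as.integer(sic)]                 # then coerce the column to integer
}

str(dt)
```

The conversion happens by reference with `:=`, so the table is not copied, which matters for a very large file. The `is.character` guard leaves the column alone if fread has already read it as integer.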

I would also be happy to provide an example if that's necessary.

Cheers,

Vivianne Siqueira Campos Vilar
----------------------------------------------
“Don't worry about the world coming to an end today. It is already tomorrow
in Australia.”