[datatable-help] fread() coercing to character when seeing NA
Julien Barnier
julien.barnier at ens-lyon.fr
Mon Sep 30 16:06:31 CEST 2013
Hi,
> dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"), colClasses=c(a="integer"))
I think that running fread with verbose=TRUE answers your question:
R> dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"),colClasses=c(a="integer"),
verbose=TRUE)
... <snip> ...
Column 1 ('a') has been detected as type 'character'. Ignoring request from
colClasses to read as 'integer' (a lower type) since NAs would result.
0.000s ( 0%) Memory map (rerun may be quicker)
0.000s ( 0%) sep and header detection
0.000s ( 0%) Count rows (wc -l)
0.000s ( 0%) Column type detection (first, middle and last 5 rows)
0.000s ( 0%) Allocation of 4x1 result (xMB) in RAM
0.000s ( 0%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if
triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.000s ( 0%) Changing na.strings to NA
0.000s Total
As your «a» column contains the character string "?", fread determines that
the column is of type character. colClasses is then ignored, because reading
the column as integer would produce possibly unwanted NA values. All of this,
as I understand it, is because the replacement of na.strings by NA happens as
the last step of fread, after the column type has already been set.
So it seems that the only workarounds are either to change your data and
replace your missing value code with a numerical value (like -9999 or anything
else), or to convert the column to integer yourself after using fread.
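For the second workaround, a minimal sketch could look like the following.
Reading the column explicitly as character through colClasses is my own
choice here, so that the result does not depend on the point at which fread
applies na.strings; the conversion itself only uses := and as.integer().

```r
library(data.table)

# Read the column as text on purpose, so that "?" cannot change the outcome
# of the type detection.
dt3 <- fread("a\n2\n4\n?\n5", na.strings = "?",
             colClasses = c(a = "character"))

# Turn any "?" still present into NA, then convert the whole column to
# integer by reference. %in% is used because it never returns NA.
dt3[, a := as.integer(replace(a, a %in% "?", NA))]

str(dt3$a)
```

The same two lines can be applied to several columns in a loop if more than
one column uses "?" as its missing value code.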
Regards,
Julien
--
Julien Barnier
Centre Max Weber
ENS de Lyon