[datatable-help] fread() coercing to character when seeing NA

Julien Barnier julien.barnier at ens-lyon.fr
Mon Sep 30 16:06:31 CEST 2013


Hi,

> dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"), colClasses=c(a="integer"))

I think that running fread with verbose=TRUE answers your question:

R> dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"),colClasses=c(a="integer"), 
verbose=TRUE)
... <snip> ...
Column 1 ('a') has been detected as type 'character'. Ignoring request from 
colClasses to read as 'integer' (a lower type) since NAs would result.
   0.000s (  0%) Memory map (rerun may be quicker)
   0.000s (  0%) sep and header detection
   0.000s (  0%) Count rows (wc -l)
   0.000s (  0%) Column type detection (first, middle and last 5 rows)
   0.000s (  0%) Allocation of 4x1 result (xMB) in RAM
   0.000s (  0%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if 
triggered
   0.000s (  0%) Coercing data already read in type bumps (if any)
   0.000s (  0%) Changing na.strings to NA
   0.000s        Total

Because your «a» column contains the character string "?", fread detects 
the column as character. The colClasses request is then ignored, since 
reading the column as integer would produce possibly unwanted NA values. 
As I understand it, all of this happens because fread replaces na.strings 
with NA as its last step, after the column type has already been set.

So it seems that the only workarounds are either to change your data so 
that the missing value code is a numeric value (like -9999 or anything 
else), or to convert the column to numeric after using fread.
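
A minimal sketch of the second workaround, assuming the data.table 
package is installed. Here the column is read explicitly as character, the 
"?" code is set to NA by hand, and the column is then converted to integer. 
Reading as character first is my own choice, made so that the result does 
not depend on how a given fread version handles na.strings; other versions 
of fread may behave differently from the verbose output above.

```r
library(data.table)

# Read column 'a' as character so that "?" is kept as-is
dt3 <- fread("a\n2\n4\n?\n5", colClasses = c(a = "character"))

# Replace the missing value code with NA, then convert to integer
dt3[a == "?", a := NA_character_]
dt3[, a := as.integer(a)]

str(dt3)
```

The same pattern works with as.numeric() if the column should be double 
rather than integer.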

Regards,

Julien

-- 
Julien Barnier
Centre Max Weber
ENS de Lyon

