[datatable-help] fread() coercing to character when seeing NA
Matthew Dowle
mdowle at mdowle.plus.com
Mon Sep 30 20:58:10 CEST 2013
Yes, exactly. On the bug list is #2660, "Improve fread na.strings
handling":
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2660&group_id=240&atid=975
which points to:
http://stackoverflow.com/questions/15784138/bad-interpretation-of-n-a-using-fread
Matthew
On 30/09/13 15:06, Julien Barnier wrote:
> Hi,
>
>> dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"), colClasses=c(a="integer"))
> I think that running fread with verbose=TRUE answers your
> question:
>
> R> dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"),colClasses=c(a="integer"),
> verbose=TRUE)
> ... <snip> ...
> Column 1 ('a') has been detected as type 'character'. Ignoring request from
> colClasses to read as 'integer' (a lower type) since NAs would result.
> 0.000s ( 0%) Memory map (rerun may be quicker)
> 0.000s ( 0%) sep and header detection
> 0.000s ( 0%) Count rows (wc -l)
> 0.000s ( 0%) Column type detection (first, middle and last 5 rows)
> 0.000s ( 0%) Allocation of 4x1 result (xMB) in RAM
> 0.000s ( 0%) Reading data
> 0.000s ( 0%) Allocation for type bumps (if any), including gc time if
> triggered
> 0.000s ( 0%) Coercing data already read in type bumps (if any)
> 0.000s ( 0%) Changing na.strings to NA
> 0.000s Total
>
> As your «a» column contains the character string "?", fread determines that
> this column is character. colClasses is then ignored, because reading the
> column as integer would produce a possibly unwanted NA value. All of this,
> as I understand it, is because the replacement of na.strings by NA happens
> as the last step of fread, after the column type has been set.
>
> So it seems that the only workarounds are either to replace the missing
> value code in your data with a numeric value (like -9999 or anything
> else), or to convert your column back to numeric after using fread.
>
> Regards,
>
> Julien
>
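
A minimal sketch of the second workaround Julien describes (converting the
column after reading), assuming the data.table package is installed. The
input string and the "?" placeholder come from the example in this thread;
reading the column explicitly as character and the replace-then-convert
steps are one illustrative way to do it, not something prescribed by fread.

```r
library(data.table)

# Read the column as character so the "?" placeholder survives parsing.
dt3 <- fread("a\n2\n4\n?\n5", header = TRUE, colClasses = c(a = "character"))

# Replace the placeholder with a character NA (assumption: "?" is the only
# missing-value code in the column).
dt3[a == "?", a := NA_character_]

# Convert the whole column to integer by reference.
dt3[, a := as.integer(a)]

str(dt3)
```

Replacing "?" before calling as.integer() avoids the "NAs introduced by
coercion" warning that a direct conversion of the raw strings would raise.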