<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Yes, exactly. On the bug list is #2660
"Improve fread na.strings handling" :<br>
<br>
<a
href="https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2660&group_id=240&atid=975">https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2660&group_id=240&atid=975</a><br>
<br>
which points to :<br>
<br>
<a
href="http://stackoverflow.com/questions/15784138/bad-interpretation-of-n-a-using-fread">http://stackoverflow.com/questions/15784138/bad-interpretation-of-n-a-using-fread</a><br>
<br>
Matthew<br>
<br>
On 30/09/13 15:06, Julien Barnier wrote:<br>
</div>
<blockquote cite="mid:5223628.upPkjNS379@l018198" type="cite">
<pre wrap="">Hi,
</pre>
<blockquote type="cite">
<pre wrap="">dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"), colClasses=c(a="integer"))
</pre>
</blockquote>
<pre wrap="">
I think that running fread with verbose=TRUE answers your question:
R> dt3 <- fread( "a\n2\n4\n?\n5", na.strings=c("?"),colClasses=c(a="integer"),
verbose=TRUE)
... <snip> ...
Column 1 ('a') has been detected as type 'character'. Ignoring request from
colClasses to read as 'integer' (a lower type) since NAs would result.
0.000s ( 0%) Memory map (rerun may be quicker)
0.000s ( 0%) sep and header detection
0.000s ( 0%) Count rows (wc -l)
0.000s ( 0%) Column type detection (first, middle and last 5 rows)
0.000s ( 0%) Allocation of 4x1 result (xMB) in RAM
0.000s ( 0%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if
triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.000s ( 0%) Changing na.strings to NA
0.000s Total
As your «a» column contains the character string "?", fread detects this
column as character, and colClasses is ignored because honouring it could
produce unwanted NA values. All of this, as I understand it, is because
the replacement of na.strings by NA happens as the last step of fread, after
the column type has been set.
So it seems that the only workarounds are either to change your data,
replacing your missing-value code with a numeric value (like -9999 or anything
else), or to convert the column to numeric after using fread.
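
A minimal sketch of the second workaround, assuming the data.table package
is installed: read the column without colClasses, then coerce it by
reference. On fread versions that already detect the column as integer, the
as.integer() call should simply leave it unchanged.

```r
library(data.table)

# Read without colClasses; "?" is declared as the missing-value code.
dt3 <- fread("a\n2\n4\n?\n5", na.strings = "?")

# Coerce the column to integer by reference after reading.
# Values already replaced by NA stay NA.
dt3[, a := as.integer(a)]

str(dt3$a)
```

The same pattern should apply to as.numeric() if a double column is wanted.
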
Regards,
Julien
</pre>
</blockquote>
<br>
</body>
</html>