<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html><body>
<p> </p>
<p>Excellent, thanks for confirming. Looking at it again with fresh eyes, I've raised a new feature request:</p>
<p>FR#2456: rbindlist should choose the highest type per column, not the first</p>
<p> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2456&group_id=240&atid=978</p>
<p>where 'highest' means in this hierarchy: <span>LGLSXP &lt; INTSXP &lt; REALSXP &lt; CPLXSXP &lt; STRSXP</span></p>
<p>That would be easy and wouldn't hurt performance at all.</p>
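<p>A minimal base-R sketch of that rule, assuming every item of L has the same column names and only atomic columns. highest_types() and bind_promoted() are hypothetical helpers, not data.table's API, and do.call(rbind, ...) stands in for rbindlist here. typeof() reports the five SEXPTYPEs above as "logical", "integer", "double", "complex" and "character".</p>

```r
# Hypothetical helpers illustrating FR#2456; not part of data.table.
# typeof() names for LGLSXP < INTSXP < REALSXP < CPLXSXP < STRSXP, lowest first.
type_order <- c("logical", "integer", "double", "complex", "character")

# For each column (matched by name across items), return the highest type
# found in any item of L. Assumes atomic columns of the five types above.
highest_types <- function(L) {
  cols <- names(L[[1L]])
  vapply(cols, function(col) {
    types <- vapply(L, function(item) typeof(item[[col]]), character(1L))
    type_order[max(match(types, type_order))]
  }, character(1L))
}

# Promote every item's columns to that highest type, then bind. Promotion
# only moves up the hierarchy, so no value is turned into NA on the way.
bind_promoted <- function(L) {
  target <- highest_types(L)
  promoted <- lapply(L, function(item) {
    for (col in names(target)) {
      item[[col]] <- as.vector(item[[col]], mode = target[[col]])
    }
    item
  })
  do.call(rbind, promoted)
}

# Example: column a is integer in item 1 but double in item 2.
L <- list(data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE),
          data.frame(a = c(3.5, 4), b = c("z", "w"), stringsAsFactors = FALSE))
str(bind_promoted(L))
```

<p>With first-item typing, column a would be forced to integer; with the highest type it becomes double and keeps 3.5 intact.</p>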
<p> </p>
<p>On 04.01.2013 23:18, patricknic wrote:</p>
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">
<div>Some output:</div>
<div>
<div>## NAs in bound data</div>
<div>> dt
<div>Warning messages:</div>
<div>1: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>2: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>3: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>4: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>5: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>6: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>## No NAs in list of data.tables</div>
<div>> sapply(dtlist, function(x) sum(is.na(x)))</div>
<div> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0</div>
<div>[32] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0</div>
<div>## Summary of bound data.table</div>
<div>> summary(dt)</div>
<pre>
  blockfips           land_area           water_area
 Length:11083767    Min.   :0.000e+00   Min.   :0.000e+00
 Class :character   1st Qu.:8.098e+03   1st Qu.:0.000e+00
 Mode  :character   Median :2.478e+04   Median :0.000e+00
                    Mean   :7.470e+05   Mean   :5.782e+04
                    3rd Qu.:1.788e+05   3rd Qu.:0.000e+00
                    Max.   :2.133e+09   Max.   :2.112e+09
                    NA's   :183         NA's   :14
      long              lat
 Min.   :-179.13   Min.   :18.91
 1st Qu.: -99.74   1st Qu.:34.18
 Median : -90.09   Median :38.64
 Mean   : -93.01   Mean   :38.11
 3rd Qu.: -82.07   3rd Qu.:41.73
 Max.   : 179.75   Max.   :71.40
</pre>
</div>
<div> </div>
<blockquote class="gmail_quote" style="border-left: 2px solid #CCCCCC; padding: 0 1em;">Many thanks. I'll take a look. If you can find a way to narrow down the problem then it might be quicker to resolve. Does it happen with the first 2 items passed to rbindlist, or the first 10? Which one causes the NA? If each item is chopped to the first 2 rows, does it still happen?</blockquote>
<div>
<div>> lapply(seq_along(dtlist), function(x) dtlist[[x]][, tab := x])</div>
<div>> dt2
<div>Warning messages:</div>
<div>1: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>2: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>3: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>4: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>5: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>6: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>> dt2[which(apply(is.na(dt2), 1, any)), table(tab)]</div>
<div>tab</div>
<div> 2 13 23 45 50 </div>
<div>183 1 10 1 2 </div>
</div>
<div>So most of the NA rows (183 of 197) come from the second data.table in the list.</div>
<div>> dtlist.first2
<div>
<div>> dtlist.first10
<div>> dtlist.first100
<div>> dtlist.first1000
<div>> dt.first2
<div>> dt.first10
<div>> dt.first100
<div>Warning message:</div>
<div>In rbindlist(dtlist.first100) : NAs introduced by coercion</div>
<div>> dt.first1000
<div>Warning messages:</div>
<div>1: In rbindlist(dtlist.first1000) : NAs introduced by coercion</div>
<div>2: In rbindlist(dtlist.first1000) : NAs introduced by coercion</div>
</div>
<div>And NAs start appearing once each data.table is cut to somewhere between 10 and 100 rows, which seems really low.</div>
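<div>The tag-and-count check above, combined with chopping each item to its first n rows, can be wrapped in one small helper. This is a base-R sketch: na_rows_per_item() is a hypothetical name, binder defaults to do.call(rbind, ...) and could be data.table::rbindlist instead, and each item is assumed to be a data.frame or data.table without an existing tab column.</div>

```r
# Hypothetical helper: bind L (optionally keeping only the first n rows of
# each item) and count, per originating item, the rows that contain an NA.
na_rows_per_item <- function(L, n = NULL,
                             binder = function(x) do.call(rbind, x)) {
  if (!is.null(n)) L <- lapply(L, head, n)
  # Record each item's position in a new "tab" column before binding.
  tagged <- Map(function(item, i) {
    item$tab <- i
    item
  }, L, seq_along(L))
  bound <- binder(tagged)
  has_na <- !complete.cases(bound)
  table(tab = bound$tab[has_na])
}

# Illustrative data: only item 2 carries NAs, and only beyond its first row.
L <- list(data.frame(x = 1:3), data.frame(x = c(4L, NA, NA)), data.frame(x = 7L))
print(na_rows_per_item(L))          # all rows of every item
print(na_rows_per_item(L, n = 1))   # first row of each item only
```

<div>Calling it with n = 2, 10, 100, ... narrows down both which item and how many rows it takes before NAs appear.</div>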
<div> </div>
<blockquote class="gmail_quote" style="border-left: 2px solid #CCCCCC; padding: 0 1em;">Also, if the list of data.table/data.frame passed to rbindlist is called L, and rbindlist(L) returns an NA column, does lapply(L, sapply, class) reveal any type differences?</blockquote>
<div>
<div>> do.call("rbind", lapply(dtlist, sapply, class))</div>
<div> blockfips land_area water_area long lat </div>
<div> [1,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [2,] "character" "numeric" "numeric" "numeric" "numeric"</div>
<div> [3,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [4,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [5,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [6,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [7,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [8,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [9,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[10,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[11,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[12,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[13,] "character" "numeric" "integer" "numeric" "numeric"</div>
<div>[14,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[15,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[16,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[17,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[18,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[19,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[20,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[21,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[22,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[23,] "character" "integer" "numeric" "numeric" "numeric"</div>
<div>[24,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[25,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[26,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[27,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[28,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[29,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[30,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[31,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[32,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[33,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[34,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[35,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[36,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[37,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[38,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[39,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[40,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[41,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[42,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[43,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[44,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[45,] "character" "numeric" "integer" "numeric" "numeric"</div>
<div>[46,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[47,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[48,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[49,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[50,] "character" "integer" "numeric" "numeric" "numeric"</div>
<div>[51,] "character" "integer" "integer" "numeric" "numeric"</div>
</div>
<div>And there's the problem: in the problem items (2, 13, 23, 45 and 50), column 2 or 3 (land_area or water_area) is numeric instead of integer.</div>
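<div>The same comparison can be made programmatically, which helps when L has many items. A sketch, assuming every item has the same columns in the same order; type_mismatches() is a hypothetical helper that compares each item's column classes against the first item, whose types rbindlist used as the target here. The example data is made up, with the same type pattern as items 1, 2 and 23 above.</div>

```r
# Hypothetical helper: list every (item, column) whose class differs from
# the class of the same column in the first item of L.
type_mismatches <- function(L) {
  # One row per item, one column per column: first class of each column.
  classes <- do.call(rbind, lapply(L, function(item) {
    vapply(item, function(col) class(col)[1L], character(1L))
  }))
  ref <- classes[1L, ]
  differs <- classes != matrix(ref, nrow(classes), ncol(classes), byrow = TRUE)
  hits <- which(differs, arr.ind = TRUE)
  hits <- hits[order(hits[, "row"]), , drop = FALSE]
  data.frame(
    item     = unname(hits[, "row"]),
    column   = colnames(classes)[hits[, "col"]],
    found    = unname(classes[hits]),
    expected = unname(ref[hits[, "col"]]),
    stringsAsFactors = FALSE
  )
}

# Illustrative items: integer/integer, numeric/numeric, integer/numeric.
L <- list(
  data.frame(land_area = 1L,  water_area = 0L),
  data.frame(land_area = 2.5, water_area = 1.5),
  data.frame(land_area = 3L,  water_area = 2.5)
)
print(type_mismatches(L))
```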
<div> </div>
<blockquote class="gmail_quote" style="border-left: 2px solid #CCCCCC; padding: 0 1em;">It does sound like rbindlist should be issuing a warning, or at least being more helpful, anyway.<br />Hm. It seems I put it in but commented it out:<pre>
if (TYPEOF(thiscol) != TYPEOF(target)) {
    thiscol = PROTECT(coerceVector(thiscol, TYPEOF(target)));
    coerced = TRUE;
    // TO DO: options(datatable.pedantic=TRUE) to issue this warning :
    // warning("Column %d of item %d is type '%s', inconsistent with column %d of item %d's type ('%s')",j+1,i+1,type2char(TYPEOF(thiscol)),j+1,first+1,type2char(TYPEOF(target)));
}
</pre>Likely that coerce is creating the NA. Types are taken from the first item of L. If a column there is 'numeric' and in a later item of L it's character, that'll give rise to an NA.<br />Thinking about it, it can probably coerce the target to cope with the later item ...</blockquote>
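<div>A small base-R illustration of how that coerce can create NAs. It assumes, without confirmation from the thread, that some of the numeric land_area/water_area values exceed .Machine$integer.max (2147483647), which is one way a double column turns into NAs when forced to the first item's integer type. The values below are made up, and as.integer() is used as a stand-in that mirrors coerceVector() from REALSXP to INTSXP.</div>

```r
# Illustrative double (REALSXP) values; 2.2e9 is chosen to exceed the
# largest R integer, .Machine$integer.max = 2147483647 (2^31 - 1).
land_area <- c(8098, 24780, 2.2e9)

# Forcing them down to INTSXP, as happens when the first item's column is
# integer: in-range values survive, the out-of-range value becomes NA,
# and R warns "NAs introduced by coercion ...".
demoted <- withCallingHandlers(
  as.integer(land_area),
  warning = function(w) {
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(demoted)

# Promoting in the other direction (INTSXP -> REALSXP) keeps every value,
# which is what choosing the highest type per column would do.
promoted <- c(as.double(c(1L, 2L)), land_area)
print(promoted)
```

<div>Note that fractional doubles such as 2.5 are truncated rather than turned into NA, so NAs from this path point at values outside the integer range (or non-numeric text when coercing character to numeric).</div>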
<div>> dtlist
<div>> dt
<div>
<div>> dt[, lapply(.SD, function(x) sum(is.na(x))), .SDcols=c("land_area", "water_area")]</div>
<div> land_area water_area</div>
<div>1: 0 0</div>
</div>
<div>And it's fixed. </div>
<div></div>
<div>Thanks,</div>
<div>Patrick</div>
<br /><br />
<div class="gmail_quote">On Fri, Jan 4, 2013 at 4:52 AM, Matthew Dowle [via R] wrote:<br />
<blockquote class="gmail_quote" style="border-left: 2px solid #CCCCCC; padding: 0 1em;">
<div class="HOEnZb">
<div class="h5">On 03.01.2013 20:30, patricknic wrote:</div>
</div>
<div>
<div>
<div class="h5">
<div class="shrinkable-quote"><br />> Apologies, I forgot to switch the directories in the code. Corrected <br />> on <br />> nabble and below. <br />> <br />> <br />> <br />> <br />> # Directories <br />> tempwd > setwd(tempwd) <br />> <br />> # Packages <br />> library(dataframe) <br />> library(data.table) <br />> library(foreign) <br />> <br />> # Get blocks and coordinates <br />> state.fips > 15:42, <br />> 44:51, 53:56)) <br />> tmpf > dtlist > cat("State", fips, ":\t") <br />> nm > dbfname > if (!file.exists(file.path(tempwd, dbfname))) { <br />> cat("Downloading...\t") <br />> url > paste0("<a href="http://www2.census.gov/geo/tiger/TIGER2011/TABBLOCK/">http://www2.census.gov/geo/tiger/TIGER2011/TABBLOCK/</a>", <br />> nm, ".zip") <br />> download.file(url, destfile=tmp, quiet=FALSE) <br />> unzip(tmp, exdir=tempwd) <br />> } <br />> del > invisible(lapply(del[grep("dbf", del, invert=TRUE)], file.remove)) <br />> cat("Reading...\t") <br />> df as.is=TRUE) <br />> dt > cat("Done\n") <br />> dt[, list(blockfips = GEOID, land_area = ALAND, water_area = <br />> AWATER, long <br />> = as.numeric(INTPTLON), <br />> lat = as.numeric(INTPTLAT))] <br />> }) <br />> b > <br />> ### No NA problem: <br />> dtlist2 > b2 > <br />> <br />> <br />> -- <br />> View this message in context: <br />> <br />> <a href="http://r.789695.n4.nabble.com/NAs-introduced-by-coercion-in-rbindlist-tp4654576p4654577.html">http://r.789695.n4.nabble.com/NAs-introduced-by-coercion-in-rbindlist-tp4654576p4654577.html</a></div>
> Sent from the datatable-help mailing list archive at Nabble.com. <br />> _______________________________________________ <br />> datatable-help mailing list </div>
</div>
> <a href="http://user/SendEmail.jtp?type=node&node=4654623&i=0">[hidden email]</a> <br />> <br />> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p> </p>
<div> </div>
</body></html>