<div>Some output:</div><div><br></div><div><div>## NAs in bound data</div><div>> dt <- rbindlist(dtlist)</div><div>Warning messages:</div><div>1: In rbindlist(dtlist) : NAs introduced by coercion</div><div>2: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>3: In rbindlist(dtlist) : NAs introduced by coercion</div><div>4: In rbindlist(dtlist) : NAs introduced by coercion</div><div>5: In rbindlist(dtlist) : NAs introduced by coercion</div><div>6: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div><br></div><div>## No NAs in list of data.tables</div><div>> sapply(dtlist, function(x) sum(is.na(x)))</div><div> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0</div><div>
[32] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0</div><div><br></div><div>## Summary of bound data.table</div><div>> summary(dt)</div><div> blockfips land_area water_area </div><div> Length:11083767 Min. :0.000e+00 Min. :0.000e+00 </div>
<div> Class :character 1st Qu.:8.098e+03 1st Qu.:0.000e+00 </div><div> Mode :character Median :2.478e+04 Median :0.000e+00 </div><div> Mean :7.470e+05 Mean :5.782e+04 </div><div> 3rd Qu.:1.788e+05 3rd Qu.:0.000e+00 </div>
<div> Max. :2.133e+09 Max. :2.112e+09 </div><div> NA's :183 NA's :14 </div><div> long lat </div><div> Min. :-179.13 Min. :18.91 </div>
<div> 1st Qu.: -99.74 1st Qu.:34.18 </div><div> Median : -90.09 Median :38.64 </div><div> Mean : -93.01 Mean :38.11 </div><div> 3rd Qu.: -82.07 3rd Qu.:41.73 </div><div> Max. : 179.75 Max. :71.40 </div>
</div><div><br></div><div> </div><blockquote style='border-left:2px solid #CCCCCC;padding:0 1em' class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Many thanks. I'll take a look. If you can find a way to narrow <br>
down the problem then it might be quicker to resolve. Does it <br>happen with the first 2 items passed to rbindlist, the first <br>10, which one causes the NA? If each item is chopped to the <br>first 2 rows, does it still happen? <br>
</blockquote><div><br></div><div><div>> lapply(seq_along(dtlist), function(x) dtlist[[x]][, tab := x])</div><div>> dt2 <- rbindlist(dtlist)</div><div>Warning messages:</div><div>1: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>2: In rbindlist(dtlist) : NAs introduced by coercion</div><div>3: In rbindlist(dtlist) : NAs introduced by coercion</div><div>4: In rbindlist(dtlist) : NAs introduced by coercion</div><div>5: In rbindlist(dtlist) : NAs introduced by coercion</div>
<div>6: In rbindlist(dtlist) : NAs introduced by coercion</div><div>> dt2[which(apply(is.na(dt2), 1, any)), table(tab)]</div><div>tab</div><div> 2 13 23 45 50 </div><div>183 1 10 1 2 </div>
</div><div><br></div><div>So most of the NAs (183 of the 197 affected rows) come from the second data.table in the list.</div><div><br></div><div>> dtlist.first2 <- lapply(dtlist, function(x) x[1:2])</div><div><div>> dtlist.first10 <- lapply(dtlist, function(x) x[1:10])</div>
<div>> dtlist.first100 <- lapply(dtlist, function(x) x[1:100])</div><div>> dtlist.first1000 <- lapply(dtlist, function(x) x[1:1000])</div><div>> dt.first2 <- rbindlist(dtlist.first2)</div><div>> dt.first10 <- rbindlist(dtlist.first10)</div>
<div>> dt.first100 <- rbindlist(dtlist.first100)</div><div>Warning message:</div><div>In rbindlist(dtlist.first100) : NAs introduced by coercion</div><div>> dt.first1000 <- rbindlist(dtlist.first1000)</div><div>
Warning messages:</div><div>1: In rbindlist(dtlist.first1000) : NAs introduced by coercion</div><div>2: In rbindlist(dtlist.first1000) : NAs introduced by coercion</div></div><div><br></div><div>So the NAs first appear somewhere between the 10-row and the 100-row subsets, which seems very early.</div>
<div><br></div><div> </div><blockquote style='border-left:2px solid #CCCCCC;padding:0 1em' class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Also if the list of data.table/data.frame passed to rbindlist <br>
is called L, and rbindlist(L) returns an NA column, does <br>lapply(L, sapply, class) reveal any type differences? <br></blockquote><div><br></div><div><div>> do.call("rbind", lapply(dtlist, sapply, class))</div>
<div> blockfips land_area water_area long lat </div><div> [1,] "character" "integer" "integer" "numeric" "numeric"</div><div> [2,] "character" "numeric" "numeric" "numeric" "numeric"</div>
<div> [3,] "character" "integer" "integer" "numeric" "numeric"</div><div> [4,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [5,] "character" "integer" "integer" "numeric" "numeric"</div><div> [6,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [7,] "character" "integer" "integer" "numeric" "numeric"</div><div> [8,] "character" "integer" "integer" "numeric" "numeric"</div>
<div> [9,] "character" "integer" "integer" "numeric" "numeric"</div><div>[10,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[11,] "character" "integer" "integer" "numeric" "numeric"</div><div>[12,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[13,] "character" "numeric" "integer" "numeric" "numeric"</div><div>[14,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[15,] "character" "integer" "integer" "numeric" "numeric"</div><div>[16,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[17,] "character" "integer" "integer" "numeric" "numeric"</div><div>[18,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[19,] "character" "integer" "integer" "numeric" "numeric"</div><div>[20,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[21,] "character" "integer" "integer" "numeric" "numeric"</div><div>[22,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[23,] "character" "integer" "numeric" "numeric" "numeric"</div><div>[24,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[25,] "character" "integer" "integer" "numeric" "numeric"</div><div>[26,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[27,] "character" "integer" "integer" "numeric" "numeric"</div><div>[28,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[29,] "character" "integer" "integer" "numeric" "numeric"</div><div>[30,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[31,] "character" "integer" "integer" "numeric" "numeric"</div><div>[32,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[33,] "character" "integer" "integer" "numeric" "numeric"</div><div>[34,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[35,] "character" "integer" "integer" "numeric" "numeric"</div><div>[36,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[37,] "character" "integer" "integer" "numeric" "numeric"</div><div>[38,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[39,] "character" "integer" "integer" "numeric" "numeric"</div><div>[40,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[41,] "character" "integer" "integer" "numeric" "numeric"</div><div>[42,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[43,] "character" "integer" "integer" "numeric" "numeric"</div><div>[44,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[45,] "character" "numeric" "integer" "numeric" "numeric"</div><div>[46,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[47,] "character" "integer" "integer" "numeric" "numeric"</div><div>[48,] "character" "integer" "integer" "numeric" "numeric"</div>
<div>[49,] "character" "integer" "integer" "numeric" "numeric"</div><div>[50,] "character" "integer" "numeric" "numeric" "numeric"</div>
<div>[51,] "character" "integer" "integer" "numeric" "numeric"</div></div><div><br></div><div>And there's the problem: in the affected data.tables (items 2, 13, 23, 45 and 50, the same ones that showed up in the tab counts above), land_area or water_area is numeric instead of integer.</div>
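<div><br></div><div>The comparison against the first item can also be done programmatically rather than by eye. A minimal sketch in base R, assuming a toy list L with made-up values standing in for dtlist (L, classes and mismatch are illustrative names, not part of the original script):</div>

```r
# Toy stand-in for dtlist (hypothetical values, not the census data):
# the second item stores land_area as numeric rather than integer.
L <- list(
  data.frame(blockfips = "a", land_area = 1L, stringsAsFactors = FALSE),
  data.frame(blockfips = "b", land_area = 3e9, stringsAsFactors = FALSE),
  data.frame(blockfips = "c", land_area = 2L, stringsAsFactors = FALSE)
)

# One row of column classes per list item, as in the output above
classes <- do.call("rbind", lapply(L, sapply, class))

# Indices of items whose classes differ from the first item in any column
mismatch <- which(apply(classes, 1, function(r) any(r != classes[1, ])))
print(mismatch)
```

<div>The same two lines should work on a list of data.tables, since sapply(x, class) treats them as lists of columns; it assumes each column has a single class.</div>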
<div><br></div><div> </div><blockquote style='border-left:2px solid #CCCCCC;padding:0 1em' class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">It does sound like rbindlist should be issuing a warning or <br>
being more helpful at least, anyway. <br>Hm. It seems I put it in but commented it out : <br>if (TYPEOF(thiscol) != TYPEOF(target)) { <br> thiscol = PROTECT(coerceVector(thiscol, TYPEOF(target))); <br> coerced = TRUE; <br>
 // TO DO: options(datatable.pedantic=TRUE) to issue this warning : <br> // warning("Column %d of item %d is type '%s', inconsistent with column %d of item %d's type ('%s')",j+1,i+1,type2char(TYPEOF(thiscol)),j+1,first+1,type2char(TYPEOF(target))); <br>
} <br>Likely that coerce is creating the NA. Types are taken from the first <br>item of L. If a column there is 'numeric' and in a later item of L it's <br>character, that'll give rise to an NA. <br>Thinking about it, it can probably coerce the target to cope with the <br>
later item ... </blockquote><div><br></div><div>> dtlist <- lapply(dtlist, function(x) x[, land_area := as.numeric(land_area)][, water_area := as.numeric(water_area)])</div><div>> dt <- rbindlist(dtlist)</div>
<div><div>> dt[, lapply(.SD, function(x) sum(is.na(x))), .SDcols=c("land_area", "water_area")]</div><div> land_area water_area</div><div>1: 0 0</div></div>
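<div><br></div><div>A note on why converting to numeric helps: the quoted C snippet coerces each later column to the type of the first item's column, and in R a numeric value outside the integer range becomes NA when coerced to integer. That is probably what happened to the largest areas here. A small base R illustration with made-up values (big, L and L_num are illustrative names, and base rbind stands in for rbindlist):</div>

```r
# A value larger than .Machine$integer.max (2147483647); made up for illustration
big <- 3e9

# Coercing it to integer yields NA (R also emits a warning, suppressed here)
as_int <- suppressWarnings(as.integer(big))
print(is.na(as_int))

# Harmonising the column to numeric in every item before binding,
# mirroring the as.numeric() fix above, keeps the value intact
L <- list(data.frame(land_area = 1L), data.frame(land_area = big))
L_num <- lapply(L, function(d) {
  d$land_area <- as.numeric(d$land_area)
  d
})
bound <- do.call("rbind", L_num)
print(sum(is.na(bound$land_area)))
```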
<div><br></div><div>And it's fixed: with both columns numeric in every item, the bound table has no NAs.</div><div><div><br></div><div><br></div><div><br></div></div><div>Thanks,</div><div>Patrick</div>
<br><br><div class="gmail_quote">On Fri, Jan 4, 2013 at 4:52 AM, Matthew Dowle [via R] <span dir="ltr"><<a href="/user/SendEmail.jtp?type=node&node=4654696&i=0" target="_top" rel="nofollow" link="external">[hidden email]</a>></span> wrote:<br>
<blockquote style='border-left:2px solid #CCCCCC;padding:0 1em' class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">
<br><br><br>On 03.01.2013 20:30, patricknic wrote:
</div></div><div><div><div class="h5"><div class='shrinkable-quote'><br>> Apologies, I forgot to switch the directories in the code. Corrected
<br>> on
<br>> nabble and below.
<br>>
<br>>
<br>>
<br>>
<br>> # Directories
<br>> tempwd <- tempdir()
<br>> setwd(tempwd)
<br>>
<br>> # Packages
<br>> library(dataframe)
<br>> library(data.table)
<br>> library(foreign)
<br>>
<br>> # Get blocks and coordinates
<br>> state.fips <- as.character(c(paste0(0, c(1:2, 4:6, 8:9)), 10:13,
<br>> 15:42,
<br>> 44:51, 53:56))
<br>> tmpf <- tempfile(fileext=".zip")
<br>> dtlist <- lapply(state.fips, function(fips) {
<br>> cat("State", fips, ":\t")
<br>> nm <- paste0("tl_2011_", fips, "_tabblock")
<br>> dbfname <- paste0(nm, ".dbf")
<br>> if (!file.exists(file.path(tempwd, dbfname))) {
<br>> cat("Downloading...\t")
<br>> url <-
<br>> paste0("<a href="http://www2.census.gov/geo/tiger/TIGER2011/TABBLOCK/" rel="nofollow" link="external" target="_blank">http://www2.census.gov/geo/tiger/TIGER2011/TABBLOCK/</a>",
<br>> nm, ".zip")
<br>> download.file(url, destfile=tmpf, quiet=FALSE)
<br>> unzip(tmpf, exdir=tempwd)
<br>> }
<br>> del <- dir(tempwd, pattern=nm)
<br>> invisible(lapply(del[grep("dbf", del, invert=TRUE)], file.remove))
<br>> cat("Reading...\t")
<br>> df <- read.dbf(dbfname, as.is=TRUE)
<br>> dt <- as.data.table(df)
<br>> cat("Done\n")
<br>> dt[, list(blockfips = GEOID, land_area = ALAND, water_area =
<br>> AWATER, long
<br>> = as.numeric(INTPTLON),
<br>> lat = as.numeric(INTPTLAT))]
<br>> })
<br>> b <- rbindlist(dtlist)
<br>>
<br>> ### No NA problem:
<br>> dtlist2 <- lapply(dtlist, as.data.frame)
<br>> b2 <- do.call("rbind", dtlist2)
<br>>
<br>>
<br>>
<br>> --
<br>> View this message in context:
<br>>
<br>> <a href="http://r.789695.n4.nabble.com/NAs-introduced-by-coercion-in-rbindlist-tp4654576p4654577.html" rel="nofollow" link="external" target="_blank">http://r.789695.n4.nabble.com/NAs-introduced-by-coercion-in-rbindlist-tp4654576p4654577.html</a></div>
> Sent from the datatable-help mailing list archive at Nabble.com.
<br>> _______________________________________________
<br>> datatable-help mailing list
<br></div></div>> <a href="http://user/SendEmail.jtp?type=node&node=4654623&i=0" rel="nofollow" link="external" target="_blank">[hidden email]</a>
<br>>
<br>> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" rel="nofollow" link="external" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></div>
</blockquote></div><br>
<br/><hr align="left" width="300" />
View this message in context: <a href="http://r.789695.n4.nabble.com/NAs-introduced-by-coercion-in-rbindlist-tp4654576p4654696.html">Re: NAs introduced by coercion in rbindlist()</a><br/>
Sent from the <a href="http://r.789695.n4.nabble.com/datatable-help-f2315188.html">datatable-help mailing list archive</a> at Nabble.com.<br/>