[datatable-help] NAs introduced by coercion in rbindlist()

patricknic patricknic at gmail.com
Thu Jan 3 21:25:51 CET 2013


Hello,

I ran into a problem with the rbindlist() function. I'm reading 51 tables
into R, totaling 11,083,767 rows. data.table has always been extremely
useful, and remains useful in this situation due to the large table sizes.

There are no NA values in the data, but when I try to bind the tables
together I get the warning

In rbindlist(dtlist) : NAs introduced by coercion

and NA values appear in the combined table, mostly in consecutive rows.
The problem does not occur if I convert the data.tables to data.frames and
use do.call("rbind", ...). I would prefer to use rbindlist(), both to avoid
switching back and forth and because the binding step is many times faster.
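
One possibility worth ruling out (an assumption, not a confirmed diagnosis) is that the same column has different types in different tables. Each .dbf file is read separately, so nothing guarantees that, say, land_area is stored the same way in all 51 tables. A minimal base-R sketch that tabulates each column's class across a list of tables and reports the columns that disagree. `column_classes`, `mismatched_columns` and the toy list are hypothetical stand-ins for dtlist:

```r
# Hypothetical helper: for each column name, count how often each class
# occurs across a list of tables (data.frames or data.tables).
column_classes <- function(tables) {
  cols <- unique(unlist(lapply(tables, names)))
  classes_of <- function(col) vapply(tables, function(t)
    if (col %in% names(t)) class(t[[col]])[1] else "<missing>", character(1))
  lapply(setNames(cols, cols), function(col) table(classes_of(col)))
}

# Hypothetical helper: names of columns whose class is not the same in every table
mismatched_columns <- function(tables) {
  names(Filter(function(tab) length(tab) > 1, column_classes(tables)))
}

# Toy stand-in for dtlist: land_area is integer in one table, double in the other
toy <- list(
  data.frame(blockfips = "010010201001000", land_area = 5L),
  data.frame(blockfips = "020130001001000", land_area = 3.5e9)
)
column_classes(toy)$land_area
mismatched_columns(toy)   # "land_area"
```

On the real data the call would be mismatched_columns(dtlist); an empty result means every column has a consistent class across the 51 tables.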

My session info is posted below, as is the exact code I am running
(directories changed to protect the innocent). As a heads up, running it
downloads a lot of data.

So, the real question: am I running into a bug, a compatibility issue, or
one of the random unreproducible errors that make R so much fun?



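# Directories
tempwd <- tempdir()
setwd(tempwd)
# The extract folder (child) and the cleanup folder (datawd) were separate
# paths originally; here both point at tempwd so the script runs as posted
# and read.dbf() finds the extracted file in the working directory.
child <- tempwd
datawd <- tempwd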

# Packages
library(dataframe)
library(data.table)
library(foreign)

# Get blocks and coordinates
state.fips <- as.character(c(paste0(0, c(1:2, 4:6, 8:9)), 10:13, 15:42,
44:51, 53:56))
tmpf <- tempfile(fileext=".zip")
dtlist <- lapply(state.fips, function(fips) {
  cat("State", fips, ":\t")
  nm <- paste0("tl_2011_", fips, "_tabblock")
  dbfname <- paste0(nm, ".dbf")
  if (!file.exists(file.path(child, dbfname))) {
    cat("Downloading...\t")
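    url <- paste0("http://www2.census.gov/geo/tiger/TIGER2011/TABBLOCK/",
                  nm, ".zip")
    # Download to the temp file created above; mode = "wb" keeps the zip
    # intact on Windows
    download.file(url, destfile=tmpf, mode="wb", quiet=FALSE)
    unzip(tmpf, exdir=child)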
  }
  # Delete the extracted files for this state, keeping only the .dbf
  del <- dir(datawd, pattern=nm)
  invisible(lapply(del[grep("dbf", del, invert=TRUE)], file.remove))
  cat("Reading...\t")
  df <- read.dbf(dbfname, as.is=TRUE)
  dt <- as.data.table(df)
  cat("Done\n")
  dt[, list(blockfips = GEOID, land_area = ALAND, water_area = AWATER,
            long = as.numeric(INTPTLON),
            lat = as.numeric(INTPTLAT))]
})
b <- rbindlist(dtlist)

### No NA problem:
dtlist2 <- lapply(dtlist, as.data.frame)
b2 <- do.call("rbind", dtlist2)
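
When the warning does appear, it may help to report which of the 51 tables the NA rows come from. A minimal sketch, assuming the rows of the bound result are stacked table by table in list order (as both rbindlist() and do.call("rbind", ...) do); `na_source_tables` is a hypothetical helper and `col` is whichever column shows the NAs:

```r
# Hypothetical helper: count, for each table in the list, how many of its
# rows have an NA in `col` in the bound result.
# Assumes rows were stacked in list order, table by table.
na_source_tables <- function(bound, tables, col) {
  sizes <- vapply(tables, nrow, integer(1))
  # Index of the source table for every row of `bound`
  src <- rep(seq_along(tables), times = sizes)
  stopifnot(length(src) == nrow(bound))
  tabulate(src[is.na(bound[[col]])], nbins = length(tables))
}

# Toy example: two tables, with the last two rows of the result set to NA
a <- data.frame(long = c(-86.5, -86.6))
z <- data.frame(long = c(-150.1, -150.2, -150.3))
bound <- rbind(a, z)
bound$long[4:5] <- NA
na_source_tables(bound, list(a, z), "long")   # 0 2
```

On the real data this would be na_source_tables(b, dtlist, "long"), and state.fips gives the state for each position in the result.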






> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] foreign_0.8-49   Metadata_1.0     ncdf_1.6.6       raster_2.0-05   
 [5] rgdal_0.7-18     dummies_1.5.6    RCurl_1.91-1.1   bitops_1.0-4.1  
 [9] sp_0.9-99        reshape_0.8.4    plyr_1.8         data.table_1.8.6
[13] dataframe_2.5   

loaded via a namespace (and not attached):
[1] grid_2.15.0    lattice_0.20-6 R.oo_1.9.3     R.utils_1.12.1 tools_2.15.0  



--
View this message in context: http://r.789695.n4.nabble.com/NAs-introduced-by-coercion-in-rbindlist-tp4654576.html
Sent from the datatable-help mailing list archive at Nabble.com.

