[datatable-help] NAs introduced by coercion in rbindlist()

Matthew Dowle mdowle at mdowle.plus.com
Fri Jan 4 10:51:53 CET 2013


Many thanks. I'll take a look. If you can find a way to narrow
down the problem then it might be quicker to resolve. Does it
happen with just the first 2 items passed to rbindlist? The first
10? Which item causes the NA? If each item is chopped to its
first 2 rows, does it still happen?
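One way to do that narrowing-down is a binary search over prefixes of the list. Below is a minimal R sketch. It assumes that once some prefix of the list triggers the NA, every longer prefix does too. `first_bad_prefix` and the `bad` predicate are hypothetical names, not part of data.table.

```r
# Hypothetical helper (not part of data.table): binary-search for the smallest
# k such that bad(L[seq_len(k)]) is TRUE. Assumes monotonicity: if a prefix
# is bad, every longer prefix is bad too. Returns NA if the whole list is fine.
first_bad_prefix <- function(L, bad) {
  if (length(L) == 0L || !bad(L)) return(NA_integer_)
  lo <- 1L
  hi <- length(L)
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (bad(L[seq_len(mid)])) hi <- mid else lo <- mid + 1L
  }
  lo
}

# With data.table loaded, one possible predicate is
#   bad <- function(x) any(is.na(rbindlist(x)))
# and chopping every item to its first 2 rows is
#   L2 <- lapply(L, head, 2)

# Self-contained demo: the third element is the first "bad" one.
demo <- list(1, 2, "a", 4)
has_character <- function(x) any(vapply(x, is.character, logical(1)))
first_bad_prefix(demo, has_character)
```

The search needs only about log2(length(L)) calls to `bad`. That matters here because each call rebinds a prefix of roughly 50 state tables.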

Also, if the list of data.table/data.frame passed to rbindlist
is called L, and rbindlist(L) returns a column containing NAs, does
lapply(L, sapply, class) reveal any type differences?
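`lapply(L, sapply, class)` prints every item's classes, which can be a lot to scan across some 50 states. Below is a small R sketch that reports only the columns whose class differs from the first item's. `type_mismatches` is a hypothetical helper. It assumes every item has the same columns in the same order, because it matches columns by position.

```r
# Hypothetical helper: list every column whose class differs from the class
# of the same column in the first item. Assumes all items have the same
# columns in the same order (columns are matched by position here).
type_mismatches <- function(L) {
  first_class <- function(col) class(col)[1]
  ref <- vapply(L[[1]], first_class, character(1))
  rows <- list()
  for (i in seq_along(L)[-1]) {
    cls <- vapply(L[[i]], first_class, character(1))
    for (j in which(cls != ref)) {
      rows[[length(rows) + 1]] <- data.frame(
        item = i, column = names(ref)[j],
        first_item = ref[[j]], this_item = cls[[j]],
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, rows)  # NULL when there are no mismatches
}

L <- list(
  data.frame(id = 1:2, area = c(1.5, 2)),
  data.frame(id = c("a", "b"), area = c(3, 4), stringsAsFactors = FALSE)
)
lapply(L, sapply, class)   # the quick check from the message
type_mismatches(L)         # only the mismatching column: item 2, "id"
```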

It does sound like rbindlist should be issuing a warning, or at
least being more helpful, anyway.

Hm. It seems I put that warning in but commented it out:

if (TYPEOF(thiscol) != TYPEOF(target)) {
     // TO DO: options(datatable.pedantic=TRUE) to issue this warning
     // (before the coercion, so the first '%s' is thiscol's original type):
     // warning("Column %d of item %d is type '%s', inconsistent with column %d of item %d's type ('%s')",
     //         j+1, i+1, type2char(TYPEOF(thiscol)), j+1, first+1, type2char(TYPEOF(target)));
     thiscol = PROTECT(coerceVector(thiscol, TYPEOF(target)));
     coerced = TRUE;
}

That coercion is likely what is creating the NA. Column types are
taken from the first item of L. If a column is 'numeric' there but
'character' in a later item of L, coercing that later column to
numeric gives NA for any value that doesn't parse as a number.
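The effect can be reproduced at the R level with `as.numeric` and `as.integer`. This sketch assumes they behave like the C-level `coerceVector` call above for these types. The column values are made up for illustration.

```r
# Column type fixed by the first item: double.
# The same column arrives as character in a later item (made-up values).
later_col <- c("12.5", "N/A", "7")

# Coercing to the first item's type turns unparseable strings into NA,
# with R's "NAs introduced by coercion" warning.
msg <- tryCatch(as.numeric(later_col), warning = function(w) conditionMessage(w))
coerced <- suppressWarnings(as.numeric(later_col))
coerced  # c(12.5, NA, 7)

# The same can happen if the first item's column is integer and a later
# item holds doubles beyond .Machine$integer.max.
big <- suppressWarnings(as.integer(c(5e8, 3e9)))
big      # c(500000000L, NA)
```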

Thinking about it, rbindlist could probably coerce the target
instead, promoting its type to cope with the later item ...
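Below is a rough R sketch of what promoting the target could mean. The sketch does not coerce later items to the first item's type. Instead it picks the "highest" type seen across all pieces of a column and coerces every piece to that type. This is a hypothetical illustration, not data.table's implementation. `type_rank`, `common_type` and `bind_column` are made-up names, and only logical < integer < double < character is handled.

```r
# Hypothetical sketch, not data.table's implementation: choose a common
# type for a column across all items, then coerce every piece to it.
# Only logical < integer < double < character are handled; factors
# (typeof "integer") and other types are out of scope.
type_rank <- c(logical = 1L, integer = 2L, double = 3L, character = 4L)

common_type <- function(cols) {
  types <- vapply(cols, typeof, character(1))
  names(type_rank)[max(type_rank[types])]
}

bind_column <- function(cols) {
  target <- common_type(cols)
  unlist(lapply(cols, as.vector, mode = target), use.names = FALSE)
}

bind_column(list(1:2, 3.5))                # promoted to double
bind_column(list(c(1.5, 2), c("a", "b")))  # promoted to character
```

Promoting to character avoids the NA, but the combined column then holds the numeric values as strings. Whether that is preferable to a warning is a design choice.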


On 03.01.2013 20:30, patricknic wrote:
> Apologies, I forgot to switch the directories in the code. Corrected
> on Nabble and below.
>
> # Directories
> tempwd <- tempdir()
> setwd(tempwd)
>
> # Packages
> library(dataframe)
> library(data.table)
> library(foreign)
>
> # Get blocks and coordinates
> state.fips <- as.character(c(paste0(0, c(1:2, 4:6, 8:9)), 10:13,
>                              15:42, 44:51, 53:56))
> tmpf <- tempfile(fileext=".zip")
> dtlist <- lapply(state.fips, function(fips) {
>   cat("State", fips, ":\t")
>   nm <- paste0("tl_2011_", fips, "_tabblock")
>   dbfname <- paste0(nm, ".dbf")
>   if (!file.exists(file.path(tempwd, dbfname))) {
>     cat("Downloading...\t")
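>     url <- paste0("http://www2.census.gov/geo/tiger/TIGER2011/TABBLOCK/",
>                   nm, ".zip")
>     # tmpf is the temp zip path created above; binary mode keeps the zip intact on Windows
>     download.file(url, destfile=tmpf, mode="wb", quiet=FALSE)
>     unzip(tmpf, exdir=tempwd)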
>   }
>   del <- dir(tempwd, pattern=nm)
>   invisible(lapply(del[grep("dbf", del, invert=TRUE)], file.remove))
>   cat("Reading...\t")
>   df <- read.dbf(dbfname, as.is=TRUE)
>   dt <- as.data.table(df)
>   cat("Done\n")
>   dt[, list(blockfips = GEOID, land_area = ALAND, water_area = AWATER,
>             long = as.numeric(INTPTLON),
>             lat = as.numeric(INTPTLAT))]
> })
> b <- rbindlist(dtlist)
>
> ### No NA problem:
> dtlist2 <- lapply(dtlist, as.data.frame)
> b2 <- do.call("rbind", dtlist2)
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/NAs-introduced-by-coercion-in-rbindlist-tp4654576p4654577.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

