[datatable-help] NAs introduced by coercion in rbindlist()
Matthew Dowle
mdowle at mdowle.plus.com
Fri Jan 4 10:51:53 CET 2013
Many thanks. I'll take a look. If you can narrow the problem
down, it will probably be quicker to resolve. Does it happen
with just the first 2 items passed to rbindlist, or the first
10? Which item causes the NA? If each item is chopped to its
first 2 rows, does it still happen?
Also if the list of data.table/data.frame passed to rbindlist
is called L, and rbindlist(L) returns an NA column, does
lapply(L, sapply, class) reveal any type differences?
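
As a rough sketch of that check, assuming a small made-up list L (the column names and values below are hypothetical, plain data.frames are used so it runs without data.table, and each column is assumed to have a single class):

```r
# Hypothetical list of tables; the second item stores 'long' as character.
L <- list(
  data.frame(blockfips = "010010201001000", long = -86.48,
             stringsAsFactors = FALSE),
  data.frame(blockfips = "020130001001000", long = "-161.9",
             stringsAsFactors = FALSE)
)

# One row per item of L, one column per column name, each cell a class name.
classes <- t(sapply(L, sapply, class))
print(classes)

# Names of the columns whose class is not the same in every item.
differs <- apply(classes, 2, function(x) length(unique(x)) > 1)
mismatched <- names(which(differs))
print(mismatched)
```

Any column listed in `mismatched` is a candidate for the one that ends up with NAs after binding.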
It does sound like rbindlist should be issuing a warning, or at
least being more helpful, anyway.
Hm. It seems I put it in but commented it out :
if (TYPEOF(thiscol) != TYPEOF(target)) {
    thiscol = PROTECT(coerceVector(thiscol, TYPEOF(target)));
    coerced = TRUE;
    // TO DO: options(datatable.pedantic=TRUE) to issue this warning :
    // warning("Column %d of item %d is type '%s', inconsistent with column %d of item %d's type ('%s')",
    //         j+1, i+1, type2char(TYPEOF(thiscol)), j+1, first+1, type2char(TYPEOF(target)));
}
That coerce is likely what creates the NA. Types are taken from the
first item of L. If a column is 'numeric' there but character in a
later item of L, coercing the later values to numeric gives NA
wherever the text doesn't parse as a number.
Thinking about it, it can probably coerce the target to cope with the
later item ...
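
Until then, a caller-side workaround can be sketched as follows. This is not what rbindlist does internally; `harmonise_as_character` is a hypothetical helper, the data is made up, and base R's rbind stands in for rbindlist so the example runs without data.table:

```r
# Coercing text that is not a number to numeric yields NA
# (base R also warns "NAs introduced by coercion").
x <- suppressWarnings(as.numeric(c("12.5", "n/a")))
print(x)

# Hypothetical helper: convert column `col` to character in every item,
# so that no value has to be coerced to a narrower type when binding.
harmonise_as_character <- function(L, col) {
  lapply(L, function(d) {
    d[[col]] <- as.character(d[[col]])
    d
  })
}

# Made-up data: 'value' is numeric in the first item, character in the second.
L <- list(
  data.frame(id = 1:2, value = c(1.5, 2.5)),
  data.frame(id = 3:4, value = c("3.5", "n/a"), stringsAsFactors = FALSE)
)

L2 <- harmonise_as_character(L, "value")
combined <- do.call(rbind, L2)
print(combined)
```

Converting to character keeps every original value; whether to convert back to numeric afterwards depends on what the non-numeric entries mean.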
On 03.01.2013 20:30, patricknic wrote:
> Apologies, I forgot to switch the directories in the code. Corrected on
> nabble and below.
>
>
>
>
> # Directories
> tempwd <- tempdir()
> setwd(tempwd)
>
> # Packages
> library(dataframe)
> library(data.table)
> library(foreign)
>
> # Get blocks and coordinates
> state.fips <- as.character(c(paste0(0, c(1:2, 4:6, 8:9)), 10:13, 15:42,
>                              44:51, 53:56))
> tmpf <- tempfile(fileext=".zip")
> dtlist <- lapply(state.fips, function(fips) {
>   cat("State", fips, ":\t")
>   nm <- paste0("tl_2011_", fips, "_tabblock")
>   dbfname <- paste0(nm, ".dbf")
>   if (!file.exists(file.path(tempwd, dbfname))) {
>     cat("Downloading...\t")
>     url <- paste0("http://www2.census.gov/geo/tiger/TIGER2011/TABBLOCK/",
>                   nm, ".zip")
>     download.file(url, destfile=tmpf, quiet=FALSE)
>     unzip(tmpf, exdir=tempwd)
>   }
>   del <- dir(tempwd, pattern=nm)
>   invisible(lapply(del[grep("dbf", del, invert=TRUE)], file.remove))
>   cat("Reading...\t")
>   df <- read.dbf(dbfname, as.is=TRUE)
>   dt <- as.data.table(df)
>   cat("Done\n")
>   dt[, list(blockfips = GEOID, land_area = ALAND, water_area = AWATER,
>             long = as.numeric(INTPTLON),
>             lat = as.numeric(INTPTLAT))]
> })
> b <- rbindlist(dtlist)
>
> ### No NA problem:
> dtlist2 <- lapply(dtlist, as.data.frame)
> b2 <- do.call("rbind", dtlist2)