[datatable-help] subset between data.table list and single data.table object

Thu Aug 8 00:44:25 CEST 2013

Hi Steve and Matthew, thank you both for your suggestions. This is the code
that I have now:

freadDataRatingDepotFiles <- function(file)
{
  RDdatatmp <- fread(file, autostart=40)
  RDdatatmp[, site := file]
}

big <- lapply(sitefiles,freadDataRatingDepotFiles)
big <- rbindlist(big) 
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  setnames(big[[u]], c("y", "shift", "x", "stor", "site_no")))
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  big[[u]][, y := as.numeric(y)])
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  big[[u]][, x := as.numeric(x)])
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  big[[u]][, shift := as.numeric(shift)])
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  big[[u]][, stor := NULL])
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  na.omit(big[[u]]))
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  big[[u]][, y := y + shift])
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  big[[u]][, shift := NULL])
big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
  setkey(big[[u]], site_no))
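
For comparison, here is a minimal sketch of how the per-element steps above might collapse into a few operations on one combined table. It assumes that big is the single data.table returned by rbindlist() rather than a list, and the toy rows below (including the non-numeric "bad" entry) are invented for illustration only.

```r
library(data.table)

# Toy stand-in for the result of rbindlist(); the values are made up.
big <- data.table(
  V1   = c("14.80", "14.81", "bad"),
  V2   = c("0.00", "0.01", "0.00"),
  V3   = c("7900", "7920", "7930"),
  V4   = c("*", "", ""),
  site = "/tried/02437100.exsa.rdb"
)

setnames(big, c("y", "shift", "x", "stor", "site_no"))

# Convert the measurement columns in one pass; non-numeric text becomes NA.
num_cols <- c("y", "shift", "x")
big[, (num_cols) := lapply(.SD, function(v) suppressWarnings(as.numeric(v))),
    .SDcols = num_cols]

big[, stor := NULL]      # drop the unused column
big <- na.omit(big)      # remove rows where a conversion produced NA
big[, y := y + shift]    # apply the shift to y
big[, shift := NULL]     # shift is no longer needed
setkey(big, site_no)
```

With a single table there is no need to loop over seq_along(dailyvaluesneednew$site_no); each step touches every site at once.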

I am trying to subset big based on the mean and median values in aimjoin (as
described previously in this message thread).

This is the first row of aimjoin:
dput(aimjoin[1])
structure(list(site_no = "02437100", mean = 3882.65, p50 = 1830), .Names =
c("site_no", 
"mean", "p50"), sorted = "site_no", class = c("data.table", "data.frame"
), row.names = c(NA, -1L), .internal.selfref = <pointer: 0x1bb7d88>)

This is one element of big:
tempbigdata <- data.frame(c(14.80, 14.81, 14.82), c(7900, 7920, 7930),
c("/tried/02437100.exsa.rdb", "/tried/02437100.exsa.rdb",
"/tried/02437100.exsa.rdb"), stringsAsFactors = FALSE)
names(tempbigdata) <- c("y", "x", "site_no")
tempbigdat <- gsub("/tried/", "", tempbigdata)
tempbigdat <- gsub(".exsa.rdb", "", tempbigdat)
# I tried to remove all characters in the column site_no except for the
# actual site number, but I ended up with a character vector instead of a
# data.table
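
What I am after is roughly the following, applied to the site_no column only instead of the whole table. This is a sketch under the assumption that every path ends in the literal suffix ".exsa.rdb"; it uses the same three example rows as above, built as a data.table.

```r
library(data.table)

tempbigdata <- data.table(
  y       = c(14.80, 14.81, 14.82),
  x       = c(7900, 7920, 7930),
  site_no = "/tried/02437100.exsa.rdb"
)

# basename() drops the directory part; sub() then removes the literal
# ".exsa.rdb" suffix (the dots are escaped so they match only a period).
tempbigdata[, site_no := sub("\\.exsa\\.rdb$", "", basename(site_no))]
```

Because only the column is passed to sub(), the object stays a data.table and the other columns are left untouched.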

This is a revised version of the code that I had written previously to
perform the subsetting (prior to using data.table):
mp <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) {
  ifelse(aimjoin[1]$mean[u] < min(big[[u]]$x),
         subset(getratings[[u]],
                aimjoin[1]$mean[u] > min(big[[u]]$x) & aimjoin[1]$mean[u],
                aimjoin[u]$mean[u] > min(big[[u]]$x)),
         aimjoin[1]$mean[u])
})

I have tried to join aimjoin and big, but I received the error message
below:

aimjoin[J(big$site_no)]
Error in `[.data.table`(aimjoin, J(big$site_no)) : 
  x.'site_no' is a character column being joined to i.'V1' which is type
'NULL'. Character columns must join to factor or character columns.

I also tried to merge aimjoin and big, but the result was not what I wanted.
I would like the mean and p50 values for each site number to be joined to
the matching site number in big. I figure that would make the subsetting
easier to perform.
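
To make the goal concrete, here is a small sketch of the join I have in mind. It assumes big is one data.table with a clean site_no column; if big is still a list of data.tables, big$site_no is NULL, which is presumably what the error above is complaining about. The example rows are the ones shown earlier in this message.

```r
library(data.table)

aimjoin <- data.table(site_no = "02437100", mean = 3882.65, p50 = 1830)
big <- data.table(
  y       = c(14.80, 14.81, 14.82),
  x       = c(7900, 7920, 7930),
  site_no = "02437100"
)

setkey(aimjoin, site_no)
setkey(big, site_no)

# For every row of big, look up the matching row of aimjoin by site_no,
# so each rating row carries its site's mean and p50.
joined <- aimjoin[big]
```

The result has one row per row of big, with the columns site_no, mean, p50, y and x.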

I want to subset big based on whether the mean or median in aimjoin is less
than the minimum value of x in big for the same site. The mean or median
values in aimjoin that are smaller than the minimum x in big will have to be
grouped together for a future step, and those that are equal to or larger
than the minimum x in big will be grouped together separately for a future
step.
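
A rough sketch of the grouping described above, again assuming big is a single data.table. The second site ("00000001") and all of its values are hypothetical, added only so that both groups are non-empty; the column names min_x, mean_below_min and p50_below_min are mine.

```r
library(data.table)

aimjoin <- data.table(
  site_no = c("02437100", "00000001"),   # second site is hypothetical
  mean    = c(3882.65, 500),
  p50     = c(1830, 450)
)
big <- data.table(
  site_no = rep(c("02437100", "00000001"), each = 3),
  x       = c(7900, 7920, 7930, 100, 200, 300)
)

# Minimum x per site, then attach it to the per-site statistics.
min_x <- big[, list(min_x = min(x)), by = site_no]
flags <- merge(aimjoin, min_x, by = "site_no")

# TRUE where the statistic falls below the smallest x for that site.
flags[, mean_below_min := mean < min_x]
flags[, p50_below_min  := p50 < min_x]

# Two groups of sites for the later steps (shown here for the mean).
below       <- flags[mean_below_min == TRUE, site_no]
at_or_above <- flags[mean_below_min == FALSE, site_no]
```

The same split can be made on p50_below_min, and either vector of site numbers can then be used to subset big.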

Can you provide me with advice on how to proceed with the subsetting?

Thank you.

Irucka

--
View this message in context: http://r.789695.n4.nabble.com/subset-between-data-table-list-and-single-data-table-object-tp4673202p4673308.html