[datatable-help] subset between data.table list and single data.table object

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 8 06:16:37 CEST 2013


Hm.  Have you worked through the examples of data.table?  Type 
example(data.table) and try to thoroughly understand each and every 
example.  Just forget your immediate problem for the moment, then come 
back to it once you've looked at the examples.

Further comments inline ...


On 07/08/13 23:44, iembry wrote:
> Hi Steve and Matthew, thank you both for your suggestions. This is the code
> that I have now:
>
> freadDataRatingDepotFiles <- function (file)
> {
> RDdatatmp <- fread(file, autostart=40)
> RDdatatmp[, site:= file]
> }
>
> big <- lapply(sitefiles,freadDataRatingDepotFiles)
> big <- rbindlist(big)
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
> setnames(big[[u]], c("y", "shift", "x", "stor", "site_no")))
That lapply over big[[u]] doesn't make much sense. After rbindlist, big is 
one big table with one set of column names, and big[[u]] extracts column u 
of that table, not the data for site u.  Why loop setnames?
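To illustrate the point about setnames, here is a minimal sketch. The table below is a made-up stand-in for the rbindlist() result (the values and the V1..V4 names are assumptions, not your data); only the five new names come from your code.

```r
library(data.table)

# Hypothetical stand-in for the rbindlist() result: one table, generic names
big <- data.table(V1 = c("14.80", "14.81"), V2 = c("0.10", "0.10"),
                  V3 = c("7900", "7920"), V4 = c("A", "A"),
                  site = "02437100.exsa.rdb")

# A single call renames all columns of the one table, by reference
setnames(big, c("y", "shift", "x", "stor", "site_no"))
names(big)
```

No loop and no reassignment are needed, because setnames modifies big in place.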
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,
> y:=as.numeric(y)])
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,
> x:=as.numeric(x)])
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,
> shift:=as.numeric(shift)])
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,
> stor:=NULL])
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
> na.omit(big[[u]]))
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
> big[[u]][,y:=y+shift])
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
> big[[u]][,shift:=NULL])
> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
> setkey(big[[u]], site_no))
Again, none of these lapply calls make much sense now that big is one big table.
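As a rough sketch of the same steps applied once to the whole table, assuming big already has the columns y, shift, x, stor and site_no (the rows below are invented for illustration):

```r
library(data.table)

# Hypothetical input; the real columns would come from fread as character
big <- data.table(y = c("14.80", "14.81", NA),
                  shift = c("0.10", "0.10", "0.10"),
                  x = c("7900", "7920", "7930"),
                  stor = c("A", "A", "A"),
                  site_no = "02437100")

# Convert the three measurement columns to numeric in one pass
num_cols <- c("y", "shift", "x")
big[, (num_cols) := lapply(.SD, as.numeric), .SDcols = num_cols]

big[, stor := NULL]      # drop the unwanted column
big <- na.omit(big)      # drop rows with missing values
big[, y := y + shift]    # apply the shift to y
big[, shift := NULL]     # shift is no longer needed
setkey(big, site_no)     # key the single table by site
```

Each := updates big by reference, so the only reassignment needed is the na.omit step.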
>
> I am trying to subset big based on the mean and median values in aimjoin (as
> described previously in this message thread).

But that part of the message thread is no longer here.  So I'd have to 
go and hunt for it.

>
> This is the first row of aimjoin:
> dput(aimjoin[1])
> structure(list(site_no = "02437100", mean = 3882.65, p50 = 1830), .Names =
> c("site_no",
> "mean", "p50"), sorted = "site_no", class = c("data.table", "data.frame"
> ), row.names = c(NA, -1L), .internal.selfref = <pointer: 0x1bb7d88>)
>
> This is one element of big:
> tempbigdata <- data.frame(c(14.80, 14.81, 14.82), c(7900, 7920, 7930),
> c("/tried/02437100.exsa.rdb", "/tried/02437100.exsa.rdb",
> "/tried/02437100.exsa.rdb"), stringsAsFactors = FALSE)
> names(tempbigdata) <- c("y", "x", "site_no")
> tempbigdat <- gsub("/tried/", "", tempbigdata)
> tempbigdat <- gsub(".exsa.rdb", "", tempbigdat)

Please paste the data itself, laid out just as you see it at the 
prompt.  I find it difficult to parse dput output in emails, and it takes 
longer to paste it into an R session before I can see it. I often read and 
reply from a mobile phone, as do others I guess. Questions like this are 
better presented on Stack Overflow.

> # I tried to remove all
> characters in the column site_no except for the actual site number, but I
> ended up with a character vector instead of a data.table
>
> This is a revised version of the code that I had written previously to
> perform the subsetting (prior to using data.table):
> mp <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
> {ifelse(aimjoin[1]$mean[u] < min(big[[u]]$x), subset(getratings[[u]],
> aimjoin[1]$mean[u] > min(big[[u]]$x) & aimjoin[1]$mean[u],
> aimjoin[u]$mean[u] > min(big[[u]]$x)), aimjoin[1]$mean[u])})
Again, maybe by big[[u]] you mean big[u] if big is keyed, but I didn't 
see a setkey above.  Seems like you maybe want [,...,by=site].
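A small sketch of both ideas, using invented site labels and values (nothing here is taken from your files): grouping with by, and selecting one site's rows from a keyed table.

```r
library(data.table)

# Hypothetical data: two sites, two rows each
big <- data.table(site_no = c("A", "A", "B", "B"),
                  x = c(7900, 7920, 120, 150))

# One row per site holding the minimum x, with no explicit loop
min_by_site <- big[, list(min_x = min(x)), by = site_no]

# With a key set, J() in i selects the rows for one site
setkey(big, site_no)
site_a <- big[J("A")]
```

Here min_by_site has one row per site, and site_a holds only the rows for site "A".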
>
>
> I have tried to join aimjoin and big, but I received the error message
> below:
>
> aimjoin[J(big$site_no)]
> Error in `[.data.table`(aimjoin, J(big$site_no)) :
>    x.'site_no' is a character column being joined to i.'V1' which is type
> 'NULL'. Character columns must join to factor or character columns.
I guess that 'site_no' isn't a column of big ...  typo of 'site_no'?   
anyList$notthere is NULL in R and only NULL itself is type NULL, hence 
the guess.
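A quick way to check that guess, sketched on a made-up table whose site column is called site rather than site_no (an assumption for illustration):

```r
library(data.table)

# Hypothetical table where the site column was never renamed
big <- data.table(site = c("02437100", "02437100"), x = c(7900, 7920))

missing_col <- big$site_no              # no such column, so this is NULL
has_col <- "site_no" %in% names(big)    # FALSE for this table
is.null(missing_col)
```

Looking at names(big) before the join shows straight away whether the column exists under the name you expect.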
>
>
> I also tried to merge aimjoin and big, but it was not what I wanted. I would
> like for the mean and p50 values -- for each site number -- to be joined to
> the site number in big. I figure that would make it easier to perform the
> subsetting.
Please see examples of good questions on Stack Overflow.  There you see 
people put examples of their input and what their desired output is for 
that input data.  I really can't see what you're trying to do.
>
> I want to subset big based on whether or not the mean or median in aimjoin
> is less than the minimum value of x in big. Those mean or median values in
> aimjoin that are smaller than x in big will have to be grouped together for
> a future step & those mean or median values in aimjoin that are equal to or
> larger than the x in big will be grouped together for a future step.
>
> Can you provide me with advice on how to proceed with the subsetting?
Try to construct a really good toy example that demonstrates what you 
want.  Show input and desired output.  In this case 2 groups of 5 rows 
each should be enough to demonstrate.
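For illustration only, a toy example of that shape might look like the sketch below. The site labels and numbers are invented, and the comparison at the end is just one possible reading of what you describe, not a confirmed answer to your question.

```r
library(data.table)

# Hypothetical toy input: 2 sites, 5 rows each
big <- data.table(site_no = rep(c("A", "B"), each = 5),
                  x = c(10, 20, 30, 40, 50, 100, 200, 300, 400, 500),
                  key = "site_no")

# Hypothetical summary table: one row per site
aimjoin <- data.table(site_no = c("A", "B"),
                      mean = c(5, 250),
                      p50 = c(4, 300),
                      key = "site_no")

# Attach each site's mean and p50 to every row of big
joined <- aimjoin[big]

# Per site: is the mean below the smallest x?
flags <- joined[, list(mean_below_min = mean[1] < min(x)), by = site_no]
```

With input this small you can write the desired output by hand next to it, which is what makes the question answerable.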

>
> Thank you.
>
> Irucka
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/subset-between-data-table-list-and-single-data-table-object-tp4673202p4673308.html


