[datatable-help] subset between data.table list and single data.table object

Irucka Embry iruckaE at mail2world.com
Thu Aug 8 18:12:25 CEST 2013


Hi Matthew, thank you for your advice.

I went through the data.table examples; thank you for the suggestion. I
also got rid of the lapply statements, apart from the one that reads the
files:

big <- lapply(sitefiles, freadDataRatingDepotFiles)
big <- rbindlist(big)
setnames(big, c("y", "shift", "x", "stor", "site_no"))
# := and setkey modify big by reference, so no reassignment is needed
big[, y := as.numeric(y)]
big[, x := as.numeric(x)]
big[, shift := as.numeric(shift)]
big[, stor := NULL]
big <- na.omit(big)
big[, y := y + shift]
big[, shift := NULL]
setkey(big, site_no)

I had been using dput because people on the main R help list suggested
it instead of unformatted tables, since the help list is text-based
e-mail. Following your suggestions, I have laid out the input,
intermediate, and output tables below.

Thank you.

Irucka



INPUT
big
            y       x                  site_no
     1: 14.80    7900 /tried/02437100.exsa.rdb
     2: 14.81    7920 /tried/02437100.exsa.rdb
     3: 14.82    7930 /tried/02437100.exsa.rdb
     4: 14.83    7950 /tried/02437100.exsa.rdb
     5: 14.84    7970 /tried/02437100.exsa.rdb
    ---
112249: 57.86 2400000 /tried/07289000.exsa.rdb
112250: 57.87 2410000 /tried/07289000.exsa.rdb
112251: 57.88 2410000 /tried/07289000.exsa.rdb
112252: 57.89 2420000 /tried/07289000.exsa.rdb
112253: 57.90 2430000 /tried/07289000.exsa.rdb


aimjoin
     site_no      mean      p50
 1: 02437100   3882.65   1830.0
 2: 02446500    819.82    382.0
 3: 02467000  23742.37  10400.0
 4: 03217500    224.72     50.0
 5: 03219500    496.79    140.0
---
54: 06889000   5632.70   2620.0
55: 06891000   7018.45   3300.0
56: 06893000  52604.19  43200.0
57: 06934500  81758.03  61200.0
58: 07010000 186504.25 147000.0
59: 07289000 755685.30 687000.0
     site_no      mean      p50



INTERMEDIATE
bigintermediate
            y       x  site_no      mean      p50
     1: 14.80    7900 02437100   3882.65   1830.0
     2: 14.81    7920 02437100   3882.65   1830.0
     3: 14.82    7930 02437100   3882.65   1830.0
     4: 14.83    7950 02437100   3882.65   1830.0
     5: 14.84    7970 02437100   3882.65   1830.0
    ---
112249: 57.86 2400000 07289000 755685.30 687000.0
112250: 57.87 2410000 07289000 755685.30 687000.0
112251: 57.88 2410000 07289000 755685.30 687000.0
112252: 57.89 2420000 07289000 755685.30 687000.0
112253: 57.90 2430000 07289000 755685.30 687000.0
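
One possible way to get from big and aimjoin to bigintermediate is sketched here. It assumes that site_no in big always has the form /tried/<site number>.exsa.rdb, as in the rows printed above. The toy tables are cut down from that output, and the regular expression and the use of merge are illustrative choices rather than anything taken from this thread.

```r
library(data.table)

# Toy versions of the two inputs, using values from the tables above
big <- data.table(
  y = c(14.80, 14.81, 57.89, 57.90),
  x = c(7900, 7920, 2420000, 2430000),
  site_no = c("/tried/02437100.exsa.rdb", "/tried/02437100.exsa.rdb",
              "/tried/07289000.exsa.rdb", "/tried/07289000.exsa.rdb")
)
aimjoin <- data.table(
  site_no = c("02437100", "07289000"),
  mean = c(3882.65, 755685.30),
  p50 = c(1830, 687000)
)

# Drop the directory and the extension so site_no matches aimjoin
# (assumption: every path looks like "/tried/<site>.exsa.rdb")
big[, site_no := sub("\\.exsa\\.rdb$", "", basename(site_no))]

# Join on site_no: every row of big picks up its site's mean and p50
bigintermediate <- merge(big, aimjoin, by = "site_no")
setcolorder(bigintermediate, c("y", "x", "site_no", "mean", "p50"))
print(bigintermediate)
```

Because the cleanup uses :=, the result stays a data.table instead of collapsing to a character vector, which is what calling gsub on the whole table would produce.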



OUTPUT
bigintermean [where mean of site_no > min(x)]
            y       x  site_no      mean
    ---
    ...
112249: 57.86 2400000 07289000 755685.30
112250: 57.87 2410000 07289000 755685.30
112251: 57.88 2410000 07289000 755685.30
112252: 57.89 2420000 07289000 755685.30
112253: 57.90 2430000 07289000 755685.30

total of 109,452 rows



bigintermedian [where p50 of site_no > min(x)]
            y       x  site_no      p50
    ---
    ...
112249: 57.86 2400000 07289000 687000.0
112250: 57.87 2410000 07289000 687000.0
112251: 57.88 2410000 07289000 687000.0
112252: 57.89 2420000 07289000 687000.0
112253: 57.90 2430000 07289000 687000.0

total of 109,452 rows




bigextramean [where mean of site_no < min(x)]
       y    x  site_no    mean
1: 14.80 7900 02437100 3882.65
2: 14.81 7920 02437100 3882.65
3: 14.82 7930 02437100 3882.65
4: 14.83 7950 02437100 3882.65
5: 14.84 7970 02437100 3882.65

total of 2671 rows


bigextramedian [where p50 of site_no < min(x)]
       y    x  site_no    p50
1: 14.80 7900 02437100 1830.0
2: 14.81 7920 02437100 1830.0
3: 14.82 7930 02437100 1830.0
4: 14.83 7950 02437100 1830.0
5: 14.84 7970 02437100 1830.0

total of 2671 rows



bigextrameanmax [where mean of site_no > max(x)]
       y    x  site_no    mean
1: 14.80 7900 02437100 3882.65
2: 14.81 7920 02437100 3882.65
3: 14.82 7930 02437100 3882.65
4: 14.83 7950 02437100 3882.65
5: 14.84 7970 02437100 3882.65

total of 2671 rows


bigextramedianmax [where p50 of site_no > max(x)]
       y    x  site_no    p50
1: 14.80 7900 02437100 1830.0
2: 14.81 7920 02437100 1830.0
3: 14.82 7930 02437100 1830.0
4: 14.83 7950 02437100 1830.0
5: 14.84 7970 02437100 1830.0

total of 2671 rows
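
The subsets listed above could perhaps be built by comparing each site's mean (or p50) with that site's own min(x) and max(x), computed with by = site_no. The sketch below follows that reading of the bracketed conditions. The three toy sites "A", "B" and "C", their values, and the helper columns xmin and xmax are all made up for illustration.

```r
library(data.table)

# Hypothetical intermediate table with three sites:
#   A: mean and p50 below min(x)
#   B: mean and p50 between min(x) and max(x)
#   C: mean and p50 above max(x)
bigintermediate <- data.table(
  y = c(1.0, 1.1, 2.0, 2.1, 3.0, 3.1),
  x = c(100, 200, 10, 30, 1, 2),
  site_no = rep(c("A", "B", "C"), each = 2),
  mean = rep(c(50, 25, 5), each = 2),
  p50 = rep(c(40, 15, 4), each = 2)
)

# Per-site minimum and maximum of x, added by reference as helper columns
bigintermediate[, `:=`(xmin = min(x), xmax = max(x)), by = site_no]

# Rows of sites whose mean lies above, or below, that site's min(x)
bigintermean <- bigintermediate[mean > xmin, list(y, x, site_no, mean)]
bigextramean <- bigintermediate[mean < xmin, list(y, x, site_no, mean)]

# Rows of sites whose mean exceeds every x value of the site
bigextrameanmax <- bigintermediate[mean > xmax, list(y, x, site_no, mean)]

# The p50 versions follow the same pattern
bigintermedian <- bigintermediate[p50 > xmin, list(y, x, site_no, p50)]
bigextramedian <- bigintermediate[p50 < xmin, list(y, x, site_no, p50)]
bigextramedianmax <- bigintermediate[p50 > xmax, list(y, x, site_no, p50)]
```

With strict comparisons, a site whose mean equals min(x) exactly lands in neither group; using >= in the first filter would keep the "equal to or larger" grouping described earlier in the thread.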






<-----Original Message-----> 
>From: Matthew Dowle [mdowle at mdowle.plus.com]
>Sent: 8/7/2013 11:16:37 PM
>To: iruckaE at mail2world.com
>Cc: datatable-help at lists.r-forge.r-project.org
>Subject: Re: [datatable-help] subset between data.table list and single data.table object
>
>Hm. Have you worked through the examples of data.table? Type 
>example(data.table) and try to thoroughly understand each and every 
>example. Just forget your immediate problem for the moment, then come 
>back to it once you've looked at the examples.
>
>Further comments inline ...
>
>
>On 07/08/13 23:44, iembry wrote:
>> Hi Steve and Matthew, thank you both for your suggestions. This is the
>> code that I have now:
>>
>> freadDataRatingDepotFiles <- function (file)
>> {
>> RDdatatmp <- fread(file, autostart=40)
>> RDdatatmp[, site:= file]
>> }
>>
>> big <- lapply(sitefiles,freadDataRatingDepotFiles)
>> big <- rbindlist(big)
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> setnames(big[[u]], c("y", "shift", "x", "stor", "site_no")))
>That lapply and big[[u]] doesn't make much sense. big is one big table,
>with one set of column names. Why loop setnames?
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> big[[u]][, y:=as.numeric(y)])
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> big[[u]][, x:=as.numeric(x)])
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> big[[u]][, shift:=as.numeric(shift)])
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> big[[u]][, stor:=NULL])
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> na.omit(big[[u]]))
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> big[[u]][,y:=y+shift])
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> big[[u]][,shift:=NULL])
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> setkey(big[[u]], site_no))
>Again, all these lapply don't make much sense now big is one big table.
>>
>> I am trying to subset big based on the mean and median values in aimjoin
>> (as described previously in this message thread).
>
>But that part of the message thread is no longer here. So I'd have to 
>go and hunt for it.
>
>>
>> This is the first row of aimjoin:
>> dput(aimjoin[1])
>> structure(list(site_no = "02437100", mean = 3882.65, p50 = 1830),
>> .Names = c("site_no", "mean", "p50"), sorted = "site_no",
>> class = c("data.table", "data.frame"), row.names = c(NA, -1L),
>> .internal.selfref = <pointer: 0x1bb7d88>)
>>
>> This is one element of big:
>> tempbigdata <- data.frame(c(14.80, 14.81, 14.82), c(7900, 7920, 7930),
>> c("/tried/02437100.exsa.rdb", "/tried/02437100.exsa.rdb",
>> "/tried/02437100.exsa.rdb"), stringsAsFactors = FALSE)
>> names(tempbigdata) <- c("y", "x", "site_no")
>> tempbigdat <- gsub("/tried/", "", tempbigdata)
>> tempbigdat <- gsub(".exsa.rdb", "", tempbigdat)
>
>Please paste the data itself laid out just like you see it at the
>prompt. I find it difficult to parse dput output in emails, and it takes
>longer to paste it into an R session before I can see it. I often read
>and reply from a mobile phone, as do others I guess. Questions like this
>are better presented on Stack Overflow.
>
>> # I tried to remove all characters in the column site_no except for the
>> actual site number, but I ended up with a character vector instead of a
>> data.table
>>
>> This is a revised version of the code that I had written previously to
>> perform the subsetting (prior to using data.table):
>> mp <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)
>> {ifelse(aimjoin[1]$mean[u] < min(big[[u]]$x), subset(getratings[[u]],
>> aimjoin[1]$mean[u] > min(big[[u]]$x) & aimjoin[1]$mean[u],
>> aimjoin[u]$mean[u] > min(big[[u]]$x)), aimjoin[1]$mean[u])})
>Again, maybe by big[[u]] you mean big[u] if big is keyed, but I didn't 
>see a setkey above. Seems like you maybe want [,...,by=site].
>>
>>
>> I have tried to join aimjoin and big, but I received the error message
>> below:
>>
>> aimjoin[J(big$site_no)]
>> Error in `[.data.table`(aimjoin, J(big$site_no)) :
>> x.'site_no' is a character column being joined to i.'V1' which is type
>> 'NULL'. Character columns must join to factor or character columns.
>I guess that 'site_no' isn't a column of big ... typo of 'site_no'? 
>anyList$notthere is NULL in R and only NULL itself is type NULL, hence 
>the guess.
>>
>>
>> I also tried to merge aimjoin and big, but it was not what I wanted. I
>> would like for the mean and p50 values -- for each site number -- to be
>> joined to the site number in big. I figure that would make it easier to
>> perform the subsetting.
>Please see examples of good questions on Stack Overflow. There you see 
>people put examples of their input and what their desired output is for
>that input data. I really can't see what you're trying to do.
>>
>> I want to subset big based on whether or not the mean or median in
>> aimjoin is less than the minimum value of x in big. Those mean or median
>> values in aimjoin that are smaller than x in big will have to be grouped
>> together for a future step & those mean or median values in aimjoin that
>> are equal to or larger than the x in big will be grouped together for a
>> future step.
>>
>> Can you provide me with advice on how to proceed with the subsetting?
>Try to construct a really good toy example that demonstrates what you 
>want. Show input and desired output. In this case 2 groups of 5 rows 
>each should be enough to demonstrate.
>
>>
>> Thank you.
>>
>> Irucka
>>
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/subset-between-data-table-list-and-single-data-table-object-tp4673202p4673308.html
>> Sent from the datatable-help mailing list archive at Nabble.com.

