<HTML>
<BODY>
Hi Matthew, thank you for your advice.<br>
<br>
I went over the examples in data.table; thank you for the suggestion. I also removed the lapply loops around the individual steps:<br>
<br>
# Read every rating file and stack the pieces into one data.table<br>
big <- rbindlist(lapply(sitefiles, freadDataRatingDepotFiles))<br>
setnames(big, c("y", "shift", "x", "stor", "site_no"))<br>
# := changes big by reference, so these calls need no reassignment<br>
big[, y := as.numeric(y)]<br>
big[, x := as.numeric(x)]<br>
big[, shift := as.numeric(shift)]<br>
big[, stor := NULL]<br>
big <- na.omit(big)  # na.omit returns a new table, so reassign here<br>
big[, y := y + shift]<br>
big[, shift := NULL]<br>
setkey(big, site_no)<br>
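The intermediate table below shows site_no as the bare site number, so the path in the file name has to be stripped before joining on aimjoin. A minimal sketch, assuming every file name ends in .exsa.rdb as in the printed input (the three toy rows are hypothetical): running the string replacement on the column inside := keeps big a data.table, whereas gsub on the whole data.frame returns a character vector.<br>

```r
library(data.table)

# Hypothetical toy rows in the same shape as big after the cleanup code
big <- data.table(
  y = c(14.80, 14.81, 57.86),
  x = c(7900, 7920, 2400000),
  site_no = c("/tried/02437100.exsa.rdb",
              "/tried/02437100.exsa.rdb",
              "/tried/07289000.exsa.rdb")
)

# Replace the column by reference: basename() drops the directory and
# sub() drops the ".exsa.rdb" suffix (dots escaped so they match literally)
big[, site_no := sub("\\.exsa\\.rdb$", "", basename(site_no))]
setkey(big, site_no)
print(big)
```
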
<br>
I had used dput because people on the main R-help list suggested posting dput output rather than unformatted tables, since e-mail and the help list are text-based. Following your suggestion, I have included the input, intermediate, and output tables below.<br>
<br>
Thank you.<br>
<br>
Irucka<br>
<br>
<br>
<br>
INPUT<br>
big<br>
y x site_no<br>
1: 14.80 7900 /tried/02437100.exsa.rdb<br>
2: 14.81 7920 /tried/02437100.exsa.rdb<br>
3: 14.82 7930 /tried/02437100.exsa.rdb<br>
4: 14.83 7950 /tried/02437100.exsa.rdb<br>
5: 14.84 7970 /tried/02437100.exsa.rdb<br>
--- <br>
112249: 57.86 2400000 /tried/07289000.exsa.rdb<br>
112250: 57.87 2410000 /tried/07289000.exsa.rdb<br>
112251: 57.88 2410000 /tried/07289000.exsa.rdb<br>
112252: 57.89 2420000 /tried/07289000.exsa.rdb<br>
112253: 57.90 2430000 /tried/07289000.exsa.rdb<br>
<br>
<br>
aimjoin<br>
site_no mean p50<br>
1: 02437100 3882.65 1830.0<br>
2: 02446500 819.82 382.0<br>
3: 02467000 23742.37 10400.0<br>
4: 03217500 224.72 50.0<br>
5: 03219500 496.79 140.0<br>
--- <br>
54: 06889000 5632.70 2620.0<br>
55: 06891000 7018.45 3300.0<br>
56: 06893000 52604.19 43200.0<br>
57: 06934500 81758.03 61200.0<br>
58: 07010000 186504.25 147000.0<br>
59: 07289000 755685.30 687000.0<br>
site_no mean p50<br>
<br>
<br>
<br>
INTERMEDIATE<br>
bigintermediate<br>
y x site_no mean p50<br>
1: 14.80 7900 02437100 3882.65 1830.0<br>
2: 14.81 7920 02437100 3882.65 1830.0<br>
3: 14.82 7930 02437100 3882.65 1830.0<br>
4: 14.83 7950 02437100 3882.65 1830.0<br>
5: 14.84 7970 02437100 3882.65 1830.0<br>
--- <br>
112249: 57.86 2400000 07289000 755685.30 687000.0<br>
112250: 57.87 2410000 07289000 755685.30 687000.0<br>
112251: 57.88 2410000 07289000 755685.30 687000.0<br>
112252: 57.89 2420000 07289000 755685.30 687000.0<br>
112253: 57.90 2430000 07289000 755685.30 687000.0<br>
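A join on site_no attaches each site's mean and p50 to every row of big, which is the shape of bigintermediate above. A minimal sketch on a hypothetical two-site toy example (sites "A" and "B" and all values are made up); the on argument names the join column explicitly, and a site in big that is missing from aimjoin would get NA for mean and p50:<br>

```r
library(data.table)

# Hypothetical toy data: two sites ("A", "B") with five rows each
big <- data.table(
  y = c(1.1, 1.2, 1.3, 1.4, 1.5, 2.1, 2.2, 2.3, 2.4, 2.5),
  x = c(100, 200, 300, 400, 500, 10, 20, 30, 40, 50),
  site_no = rep(c("A", "B"), each = 5L)
)
aimjoin <- data.table(site_no = c("A", "B"),
                      mean = c(50, 30),
                      p50 = c(40, 80))

# For every row of big, look up its site in aimjoin; the result has one
# row per row of big, with that site's mean and p50 attached
bigintermediate <- aimjoin[big, on = "site_no"]

# Reorder the columns to match the bigintermediate table above
setcolorder(bigintermediate, c("y", "x", "site_no", "mean", "p50"))
print(bigintermediate)
```

merge(big, aimjoin, by = "site_no") is an alternative; by default it keeps only sites present in both tables and puts site_no first.<br>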
<br>
<br>
<br>
OUTPUT<br>
bigintermean [all rows of each site_no whose mean > min(x) for that site]<br>
y x site_no mean <br>
--- <br>
<br>
... <br>
112249: 57.86 2400000 07289000 755685.30<br>
112250: 57.87 2410000 07289000 755685.30<br>
112251: 57.88 2410000 07289000 755685.30<br>
112252: 57.89 2420000 07289000 755685.30<br>
112253: 57.90 2430000 07289000 755685.30<br>
<br>
total of 109,452 rows<br>
<br>
<br>
<br>
bigintermedian [all rows of each site_no whose p50 > min(x) for that site]<br>
y x site_no p50<br>
--- <br>
<br>
... <br>
112249: 57.86 2400000 07289000 687000.0<br>
112250: 57.87 2410000 07289000 687000.0<br>
112251: 57.88 2410000 07289000 687000.0<br>
112252: 57.89 2420000 07289000 687000.0<br>
112253: 57.90 2430000 07289000 687000.0<br>
<br>
total of 109,452 rows<br>
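Following the [,...,by=site] hint in the quoted reply, one way to build bigintermean and bigintermedian is a grouped filter: within each site_no group, return the group's rows (.SD) only when the site's statistic exceeds that site's min(x), and return NULL otherwise so the group is dropped. A minimal sketch on hypothetical toy data (site "A" has mean 50 and min(x) 100; site "B" has mean 30 and min(x) 10):<br>

```r
library(data.table)

# Hypothetical toy version of bigintermediate (two sites, five rows each)
bigintermediate <- data.table(
  y = c(1.1, 1.2, 1.3, 1.4, 1.5, 2.1, 2.2, 2.3, 2.4, 2.5),
  x = c(100, 200, 300, 400, 500, 10, 20, 30, 40, 50),
  site_no = rep(c("A", "B"), each = 5L),
  mean = rep(c(50, 30), each = 5L),
  p50 = rep(c(40, 80), each = 5L)
)

# mean and p50 are constant within a site, so their first value is enough;
# a group whose condition is FALSE returns NULL and is left out
bigintermean <- bigintermediate[, if (mean[1L] > min(x)) .SD, by = site_no]
bigintermedian <- bigintermediate[, if (p50[1L] > min(x)) .SD, by = site_no]

# Keep only the statistic each output table reports
bigintermean[, p50 := NULL]
bigintermedian[, mean := NULL]
```
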
<br>
<br>
<br>
<br>
bigextramean [all rows of each site_no whose mean < min(x) for that site]<br>
y x site_no mean <br>
1: 14.80 7900 02437100 3882.65 <br>
2: 14.81 7920 02437100 3882.65 <br>
3: 14.82 7930 02437100 3882.65 <br>
4: 14.83 7950 02437100 3882.65 <br>
5: 14.84 7970 02437100 3882.65<br>
<br>
total of 2671 rows<br>
<br>
<br>
bigextramedian [all rows of each site_no whose p50 < min(x) for that site]<br>
y x site_no p50<br>
1: 14.80 7900 02437100 1830.0<br>
2: 14.81 7920 02437100 1830.0<br>
3: 14.82 7930 02437100 1830.0<br>
4: 14.83 7950 02437100 1830.0<br>
5: 14.84 7970 02437100 1830.0<br>
<br>
total of 2671 rows<br>
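The < and > conditions above are strict, so a site whose mean equals its min(x) would appear in neither bigextramean nor bigintermean, while the earlier description in this thread groups "equal to or larger" values together. Computing a per-site flag once and taking its complement puts every row in exactly one group. A minimal sketch on hypothetical toy data, where site "C" has a mean equal to its min(x) and below_min is a hypothetical helper column:<br>

```r
library(data.table)

# Hypothetical toy version of bigintermediate; site "C" has mean == min(x)
bigintermediate <- data.table(
  y = c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3),
  x = c(100, 200, 300, 10, 20, 30, 60, 70, 80),
  site_no = rep(c("A", "B", "C"), each = 3L),
  mean = rep(c(50, 25, 60), each = 3L),
  p50 = rep(c(40, 80, 60), each = 3L)
)

# Per-site flag, added by reference: TRUE on every row of a site whose
# mean is strictly below that site's min(x)
bigintermediate[, below_min := mean < min(x), by = site_no]

# Complementary subsets: the second one is "mean >= min(x)", so ties
# land there instead of disappearing
bigextramean <- bigintermediate[below_min == TRUE]
bigintermean <- bigintermediate[below_min == FALSE]
```
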
<br>
<br>
<br>
bigextrameanmax [all rows of each site_no whose mean > max(x) for that site]<br>
y x site_no mean <br>
1: 14.80 7900 02437100 3882.65 <br>
2: 14.81 7920 02437100 3882.65 <br>
3: 14.82 7930 02437100 3882.65 <br>
4: 14.83 7950 02437100 3882.65 <br>
5: 14.84 7970 02437100 3882.65<br>
<br>
total of 2671 rows<br>
<br>
<br>
bigextramedianmax [all rows of each site_no whose p50 > max(x) for that site]<br>
y x site_no p50<br>
1: 14.80 7900 02437100 1830.0<br>
2: 14.81 7920 02437100 1830.0<br>
3: 14.82 7930 02437100 1830.0<br>
4: 14.83 7950 02437100 1830.0<br>
5: 14.84 7970 02437100 1830.0<br>
<br>
total of 2671 rows<br>
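The max(x) variants need a second per-site statistic. Both min(x) and max(x) can be added in one grouped := call and then compared row by row. A minimal sketch on hypothetical toy data (minx and maxx are hypothetical helper columns):<br>

```r
library(data.table)

# Hypothetical toy version of bigintermediate (two sites, five rows each)
bigintermediate <- data.table(
  y = c(1.1, 1.2, 1.3, 1.4, 1.5, 2.1, 2.2, 2.3, 2.4, 2.5),
  x = c(100, 200, 300, 400, 500, 10, 20, 30, 40, 50),
  site_no = rep(c("A", "B"), each = 5L),
  mean = rep(c(50, 30), each = 5L),
  p50 = rep(c(40, 80), each = 5L)
)

# Add each site's min(x) and max(x) to every row of that site
bigintermediate[, c("minx", "maxx") := list(min(x), max(x)), by = site_no]

# Rows of sites whose statistic exceeds the largest x for that site
bigextrameanmax <- bigintermediate[mean > maxx, list(y, x, site_no, mean)]
bigextramedianmax <- bigintermediate[p50 > maxx, list(y, x, site_no, p50)]
```
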
<br>
<br>
<br>
<br>
<br>
<br>
<-----Original Message-----> <br>
>From: Matthew Dowle [mdowle@mdowle.plus.com]<br>
>Sent: 8/7/2013 11:16:37 PM<br>
>To: iruckaE@mail2world.com<br>
>Cc: datatable-help@lists.r-forge.r-project.org<br>
>Subject: Re: [datatable-help] subset between data.table list and single data.table object<br>
><br>
>Hm. Have you worked through the examples of data.table? Type <br>
>example(data.table) and try to thoroughly understand each and every <br>
>example. Just forget your immediate problem for the moment, then come <br>
>back to it once you've looked at the examples.<br>
><br>
>Further comments inline ...<br>
><br>
><br>
>On 07/08/13 23:44, iembry wrote:<br>
>> Hi Steve and Matthew, thank you both for your suggestions. This is the code<br>
>> that I have now:<br>
>><br>
>> freadDataRatingDepotFiles <- function (file)<br>
>> {<br>
>> RDdatatmp <- fread(file, autostart=40)<br>
>> RDdatatmp[, site:= file]<br>
>> }<br>
>><br>
>> big <- lapply(sitefiles,freadDataRatingDepotFiles)<br>
>> big <- rbindlist(big)<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>
>> setnames(big[[u]], c("y", "shift", "x", "stor", "site_no")))<br>
>That lapply and big[[u]] doesn't make much sense. big is one big table, <br>
>with one set of column names. Why loop setnames?<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,<br>
>> y:=as.numeric(y)])<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,<br>
>> x:=as.numeric(x)])<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,<br>
>> shift:=as.numeric(shift)])<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,<br>
>> stor:=NULL])<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>
>> na.omit(big[[u]]))<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>
>> big[[u]][,y:=y+shift])<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>
>> big[[u]][,shift:=NULL])<br>
>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>
>> setkey(big[[u]], site_no))<br>
>Again, all these lapply don't make much sense now big is one big table.<br>
>><br>
>> I am trying to subset big based on the mean and median values in aimjoin (as<br>
>> described previously in this message thread).<br>
><br>
>But that part of the message thread is no longer here. So I'd have to <br>
>go and hunt for it.<br>
><br>
>><br>
>> This is the first row of aimjoin:<br>
>> dput(aimjoin[1])<br>
>> structure(list(site_no = "02437100", mean = 3882.65, p50 = 1830), .Names =<br>
>> c("site_no",<br>
>> "mean", "p50"), sorted = "site_no", class = c("data.table", "data.frame"<br>
>> ), row.names = c(NA, -1L), .internal.selfref = &lt;pointer: 0x1bb7d88&gt;)<br>
>><br>
>> This is one element of big:<br>
>> tempbigdata <- data.frame(c(14.80, 14.81, 14.82), c(7900, 7920, 7930),<br>
>> c("/tried/02437100.exsa.rdb", "/tried/02437100.exsa.rdb",<br>
>> "/tried/02437100.exsa.rdb"), stringsAsFactors = FALSE)<br>
>> names(tempbigdata) <- c("y", "x", "site_no")<br>
>> tempbigdat <- gsub("/tried/", "", tempbigdata)<br>
>> tempbigdat <- gsub(".exsa.rdb", "", tempbigdat)<br>
><br>
>Please paste the data itself laid out just like you see it at the <br>
>prompt. I find it difficult to parse dput output in emails. And longer <br>
>to paste it into an R session before I see. I often read and reply from <br>
>a mobile phone, as do others I guess. Questions like this are better <br>
>presented on stack overflow.<br>
><br>
>> # I tried to remove all<br>
>> characters in the column site_no except for the actual site number, but I<br>
>> ended up with a character vector instead of a data.table<br>
>><br>
>> This is a revised version of the code that I had written previously to<br>
>> perform the subsetting (prior to using data.table):<br>
>> mp <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>
>> {ifelse(aimjoin[1]$mean[u] < min(big[[u]]$x), subset(getratings[[u]],<br>
>> aimjoin[1]$mean[u] > min(big[[u]]$x) & aimjoin[1]$mean[u],<br>
>> aimjoin[u]$mean[u] > min(big[[u]]$x)), aimjoin[1]$mean[u])})<br>
>Again, maybe by big[[u]] you mean big[u] if big is keyed, but I didn't <br>
>see a setkey above. Seems like you maybe want [,...,by=site].<br>
>><br>
>><br>
>> I have tried to join aimjoin and big, but I received the error message<br>
>> below:<br>
>><br>
>> aimjoin[J(big$site_no)]<br>
>> Error in `[.data.table`(aimjoin, J(big$site_no)) :<br>
>> x.'site_no' is a character column being joined to i.'V1' which is type<br>
>> 'NULL'. Character columns must join to factor or character columns.<br>
>I guess that 'site_no' isn't a column of big ... typo of 'site_no'? <br>
>anyList$notthere is NULL in R and only NULL itself is type NULL, hence <br>
>the guess.<br>
>><br>
>><br>
>> I also tried to merge aimjoin and big, but it was not what I wanted. I would<br>
>> like for the mean and p50 values -- for each site number -- to be joined to<br>
>> the site number in big. I figure that would make it easier to perform the<br>
>> subsetting.<br>
>Please see examples of good questions on Stack Overflow. There you see <br>
>people put examples of their input and what their desired output is for <br>
>that input data. I really can't see what you're trying to do.<br>
>><br>
>> I want to subset big based on whether or not the mean or median in aimjoin<br>
>> is less than the minimum value of x in big. Those mean or median values in<br>
>> aimjoin that are smaller than x in big will have to be grouped together for<br>
>> a future step & those mean or median values in aimjoin that are equal to or<br>
>> larger than the x in big will be grouped together for a future step.<br>
>><br>
>> Can you provide me with advice on how to proceed with the subsetting?<br>
>Try to construct a really good toy example that demonstrates what you <br>
>want. Show input and desired output. In this case 2 groups of 5 rows <br>
>each should be enough to demonstrate.<br>
><br>
>><br>
>> Thank you.<br>
>><br>
>> Irucka<br>
>><br>
>><br>
>><br>
>> --<br>
>> View this message in context: http://r.789695.n4.nabble.com/subset-between-data-table-list-and-single-data-table-object-tp4673202p4673308.html<br>
>> Sent from the datatable-help mailing list archive at Nabble.com.<br>
>><br>
><br>
>.<br>
>
</BODY></HTML>