<HTML>

<BODY>

Hi Matthew, thank you for your advice.<br>


<br>


I went through the data.table examples; thank you for the suggestion. I also got rid of the redundant lapply statements.<br>


<br>


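# Read every file, then stack the results into one data.table<br>
big <- lapply(sitefiles, freadDataRatingDepotFiles)<br>
big <- rbindlist(big)<br>
setnames(big, c("y", "shift", "x", "stor", "site_no"))<br>
# := modifies big by reference, so no reassignment is needed<br>
big[, y := as.numeric(y)]<br>
big[, x := as.numeric(x)]<br>
big[, shift := as.numeric(shift)]<br>
big[, stor := NULL]<br>
# na.omit returns a new table, so this result is assigned back<br>
big <- na.omit(big)<br>
big[, y := y + shift]<br>
big[, shift := NULL]<br>
# setkey also works by reference<br>
setkey(big, site_no)<br>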


<br>


I used dput because people on the main R help list had suggested it over unformatted tables, since the list is delivered as text-based e-mail. Following your suggestions, I have included the input tables, the intermediate table, and the output tables below.<br>


<br>


Thank you.<br>


<br>


Irucka<br>


<br>


<br>


<br>


INPUT<br>


big<br>


            y       x                  site_no<br>
     1: 14.80    7900 /tried/02437100.exsa.rdb<br>
     2: 14.81    7920 /tried/02437100.exsa.rdb<br>
     3: 14.82    7930 /tried/02437100.exsa.rdb<br>
     4: 14.83    7950 /tried/02437100.exsa.rdb<br>
     5: 14.84    7970 /tried/02437100.exsa.rdb<br>
    ---<br>
112249: 57.86 2400000 /tried/07289000.exsa.rdb<br>
112250: 57.87 2410000 /tried/07289000.exsa.rdb<br>
112251: 57.88 2410000 /tried/07289000.exsa.rdb<br>
112252: 57.89 2420000 /tried/07289000.exsa.rdb<br>
112253: 57.90 2430000 /tried/07289000.exsa.rdb<br>


<br>


<br>


aimjoin<br>


     site_no      mean      p50<br>
 1: 02437100   3882.65   1830.0<br>
 2: 02446500    819.82    382.0<br>
 3: 02467000  23742.37  10400.0<br>
 4: 03217500    224.72     50.0<br>
 5: 03219500    496.79    140.0<br>
---<br>
54: 06889000   5632.70   2620.0<br>
55: 06891000   7018.45   3300.0<br>
56: 06893000  52604.19  43200.0<br>
57: 06934500  81758.03  61200.0<br>
58: 07010000 186504.25 147000.0<br>
59: 07289000 755685.30 687000.0<br>
     site_no      mean      p50<br>


<br>


<br>


<br>


INTERMEDIATE<br>


bigintermediate<br>


            y       x  site_no      mean      p50<br>
     1: 14.80    7900 02437100   3882.65   1830.0<br>
     2: 14.81    7920 02437100   3882.65   1830.0<br>
     3: 14.82    7930 02437100   3882.65   1830.0<br>
     4: 14.83    7950 02437100   3882.65   1830.0<br>
     5: 14.84    7970 02437100   3882.65   1830.0<br>
    ---<br>
112249: 57.86 2400000 07289000 755685.30 687000.0<br>
112250: 57.87 2410000 07289000 755685.30 687000.0<br>
112251: 57.88 2410000 07289000 755685.30 687000.0<br>
112252: 57.89 2420000 07289000 755685.30 687000.0<br>
112253: 57.90 2430000 07289000 755685.30 687000.0<br>
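<br>
A minimal sketch of how an intermediate table like this one could be produced from big and aimjoin. It assumes the site number is the file name with the .exsa.rdb suffix removed, as the INPUT and aimjoin tables suggest. The toy rows are copied from the tables in this message; using basename, sub, and merge here is an assumption, not necessarily the intended approach:<br>
<pre>
```r
library(data.table)

# Toy stand-ins for big and aimjoin (two sites, rows copied from this message)
big <- data.table(
  y = c(14.80, 14.81, 57.89, 57.90),
  x = c(7900, 7920, 2420000, 2430000),
  site_no = c("/tried/02437100.exsa.rdb", "/tried/02437100.exsa.rdb",
              "/tried/07289000.exsa.rdb", "/tried/07289000.exsa.rdb")
)
aimjoin <- data.table(
  site_no = c("02437100", "07289000"),
  mean = c(3882.65, 755685.30),
  p50 = c(1830, 687000)
)

# Reduce the file path to the bare site number so both tables share a key
big[, site_no := sub("\\.exsa\\.rdb$", "", basename(site_no))]

# Attach mean and p50 to every row of big by site_no
bigintermediate <- merge(big, aimjoin, by = "site_no")
setcolorder(bigintermediate, c("y", "x", "site_no", "mean", "p50"))
```
</pre>
merge returns the result sorted by site_no. A keyed join such as aimjoin[big], with both tables keyed on site_no, may be an alternative.<br>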


<br>


<br>


<br>


OUTPUT<br>


bigintermean [where mean of site_no > min(x)]<br>


            y                           x                     site_no           mean     <br>


    ---   <br>


    <br>


 ...   <br>


112249: 57.86           2400000                 07289000        755685.30<br>


112250: 57.87           2410000                 07289000        755685.30<br>


112251: 57.88           2410000                 07289000        755685.30<br>


112252: 57.89           2420000                 07289000        755685.30<br>


112253: 57.90           2430000                 07289000        755685.30<br>


<br>


total of 109,452 rows<br>


<br>


<br>


<br>


bigintermedian [where p50 of site_no > min(x)]<br>


            y                           x                      site_no          p50<br>


    ---   <br>


    <br>


 ...   <br>


112249: 57.86           2400000                 07289000        687000.0<br>


112250: 57.87           2410000                 07289000        687000.0<br>


112251: 57.88           2410000                 07289000        687000.0<br>


112252: 57.89           2420000                 07289000        687000.0<br>


112253: 57.90           2430000         07289000        687000.0<br>


<br>


total of 109,452 rows<br>


<br>


<br>


<br>


<br>


bigextramean [where mean of site_no < min(x)]<br>


            y           x               site_no         mean    <br>


     1: 14.80           7900    02437100        3882.65 <br>


     2: 14.81           7920    02437100        3882.65  <br>


     3: 14.82           7930    02437100        3882.65  <br>


     4: 14.83           7950    02437100        3882.65 <br>


     5: 14.84           7970    02437100        3882.65<br>


<br>


total of 2671 rows<br>


<br>


<br>


bigextramedian [where p50 of site_no < min(x)]<br>


            y           x               site_no         p50<br>


     1: 14.80           7900    02437100        1830.0<br>


     2: 14.81           7920    02437100        1830.0<br>


     3: 14.82           7930    02437100        1830.0<br>


     4: 14.83           7950    02437100        1830.0<br>


     5: 14.84           7970    02437100        1830.0<br>


<br>


total of 2671 rows<br>


<br>


<br>


<br>


bigextrameanmax [where mean of site_no > max(x)]<br>


            y           x               site_no         mean    <br>


     1: 14.80           7900    02437100        3882.65 <br>


     2: 14.81           7920    02437100        3882.65  <br>


     3: 14.82           7930    02437100        3882.65  <br>


     4: 14.83           7950    02437100        3882.65 <br>


     5: 14.84           7970    02437100        3882.65<br>


<br>


total of 2671 rows<br>


<br>


<br>


bigextramedianmax [where p50 of site_no > max(x)]<br>


            y           x               site_no         p50<br>


     1: 14.80           7900    02437100        1830.0<br>


     2: 14.81           7920    02437100        1830.0<br>


     3: 14.82           7930    02437100        1830.0<br>


     4: 14.83           7950    02437100        1830.0<br>


     5: 14.84           7970    02437100        1830.0<br>


<br>


total of 2671 rows<br>
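<br>
A minimal sketch of the subsetting described by the bracketed conditions above, assuming each comparison is made against the minimum or maximum of x within the same site_no. The table toy, its sites "A", "B", and "C", and the helper columns min_x and max_x are hypothetical and used only for illustration:<br>
<pre>
```r
library(data.table)

# Hypothetical toy data, three sites of three rows each:
# "A": mean and p50 below the smallest x
# "B": mean and p50 inside the range of x
# "C": mean and p50 above the largest x
toy <- data.table(
  y = c(1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 3.0, 3.1, 3.2),
  x = c(100, 110, 120, 10, 50, 90, 5, 6, 7),
  site_no = rep(c("A", "B", "C"), each = 3),
  mean = rep(c(40, 60, 20), each = 3),
  p50 = rep(c(30, 45, 15), each = 3)
)

# Per-site minimum and maximum of x, added by reference
toy[, `:=`(min_x = min(x), max_x = max(x)), by = site_no]

# Rows of sites whose mean (or p50) exceeds the smallest x of that site
bigintermean <- toy[mean > min_x, list(y, x, site_no, mean)]
bigintermedian <- toy[p50 > min_x, list(y, x, site_no, p50)]

# Rows of sites whose mean (or p50) falls below the smallest x of that site
bigextramean <- toy[mean < min_x, list(y, x, site_no, mean)]
bigextramedian <- toy[p50 < min_x, list(y, x, site_no, p50)]

# Rows of sites whose mean (or p50) exceeds the largest x of that site
bigextrameanmax <- toy[mean > max_x, list(y, x, site_no, mean)]
bigextramedianmax <- toy[p50 > max_x, list(y, x, site_no, p50)]
```
</pre>
With this toy input, site "A" lands in the bigextra tables, sites "B" and "C" in the biginter tables, and site "C" alone in the max tables.<br>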


<br>


<br>


<br>


<br>


<br>


<br>


<-----Original Message-----> <br>


>From: Matthew Dowle [mdowle@mdowle.plus.com]<br>


>Sent: 8/7/2013 11:16:37 PM<br>


>To: iruckaE@mail2world.com<br>


>Cc: datatable-help@lists.r-forge.r-project.org<br>


>Subject: Re: [datatable-help] subset between data.table list and single data.table object<br>


><br>


>Hm.  Have you worked through the examples of data.table?  Type <br>


>example(data.table) and try to thoroughly understand each and every <br>


>example.  Just forget your immediate problem for the moment, then come <br>


>back to it once you've looked at the examples.<br>


><br>


>Further comments inline ...<br>


><br>


><br>


>On 07/08/13 23:44, iembry wrote:<br>


>> Hi Steve and Matthew, thank you both for your suggestions. This is the code<br>


>> that I have now:<br>


>><br>


>> freadDataRatingDepotFiles <- function (file)<br>


>> {<br>


>> RDdatatmp <- fread(file, autostart=40)<br>


>> RDdatatmp[, site:= file]<br>


>> }<br>


>><br>


>> big <- lapply(sitefiles,freadDataRatingDepotFiles)<br>


>> big <- rbindlist(big)<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>


>> setnames(big[[u]], c("y", "shift", "x", "stor", "site_no")))<br>


>That lapply and big[[u]] doesn't make much sense. big is one big table, <br>


>with one set of column names.  Why loop setnames?<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,<br>


>> y:=as.numeric(y)])<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,<br>


>> x:=as.numeric(x)])<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,<br>


>> shift:=as.numeric(shift)])<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u) big[[u]][,<br>


>> stor:=NULL])<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>


>> na.omit(big[[u]]))<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>


>> big[[u]][,y:=y+shift])<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>


>> big[[u]][,shift:=NULL])<br>


>> big <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>


>> setkey(big[[u]], site_no))<br>


>Again, all these lapply don't make much sense now big is one big table.<br>


>><br>


>> I am trying to subset big based on the mean and median values in aimjoin (as<br>


>> described previously in this message thread).<br>


><br>


>But that part of the message thread is no longer here.  So I'd have to <br>


>go and hunt for it.<br>


><br>


>><br>


>> This is the first row of aimjoin:<br>


>> dput(aimjoin[1])<br>


>> structure(list(site_no = "02437100", mean = 3882.65, p50 = 1830), .Names =<br>


>> c("site_no",<br>


>> "mean", "p50"), sorted = "site_no", class = c("data.table", "data.frame"<br>


>> ), row.names = c(NA, -1L), .internal.selfref = <pointer: 0x1bb7d88>)<br>


>><br>


>> This is one element of big:<br>


>> tempbigdata <- data.frame(c(14.80, 14.81, 14.82), c(7900, 7920, 7930),<br>


>> c("/tried/02437100.exsa.rdb", "/tried/02437100.exsa.rdb",<br>


>> "/tried/02437100.exsa.rdb"), stringsAsFactors = FALSE)<br>


>> names(tempbigdata) <- c("y", "x", "site_no")<br>


>> tempbigdat <- gsub("/tried/", "", tempbigdata)<br>


>> tempbigdat <- gsub(".exsa.rdb", "", tempbigdat)<br>


><br>


>Please paste the data itself laid out just like you see it at the <br>


>prompt.  I find it difficult to parse dput output in emails.  And longer <br>


>to paste it into an R session before I see. I often read and reply from <br>


>a mobile phone, as do others I guess. Questions like this are better <br>


>presented on stack overflow.<br>


><br>


>> # I tried to remove all<br>


>> characters in the column site_no except for the actual site number, but I<br>


>> ended up with a character vector instead of a data.table<br>


>><br>


>> This is a revised version of the code that I had written previously to<br>


>> perform the subsetting (prior to using data.table):<br>


>> mp <- lapply(seq_along(dailyvaluesneednew$site_no), function(u)<br>


>> {ifelse(aimjoin[1]$mean[u] < min(big[[u]]$x), subset(getratings[[u]],<br>


>> aimjoin[1]$mean[u] > min(big[[u]]$x) & aimjoin[1]$mean[u],<br>


>> aimjoin[u]$mean[u] > min(big[[u]]$x)), aimjoin[1]$mean[u])})<br>


>Again, maybe by big[[u]] you mean big[u] if big is keyed, but I didn't <br>


>see a setkey above.  Seems like you maybe want [,...,by=site].<br>


>><br>


>><br>


>> I have tried to join aimjoin and big, but I received the error message<br>


>> below:<br>


>><br>


>> aimjoin[J(big$site_no)]<br>


>> Error in `[.data.table`(aimjoin, J(big$site_no)) :<br>


>>    x.'site_no' is a character column being joined to i.'V1' which is type<br>


>> 'NULL'. Character columns must join to factor or character columns.<br>


>I guess that 'site_no' isn't a column of big ...  typo of 'site_no'?   <br>


>anyList$notthere is NULL in R and only NULL itself is type NULL, hence <br>


>the guess.<br>


>><br>


>><br>


>> I also tried to merge aimjoin and big, but it was not what I wanted. I would<br>


>> like for the mean and p50 values -- for each site number -- to be joined to<br>


>> the site number in big. I figure that would make it easier to perform the<br>


>> subsetting.<br>


>Please see examples of good questions on Stack Overflow.  There you see <br>


>people put examples of their input and what their desired output is for <br>


>that input data.  I really can't see what you're trying to do.<br>


>><br>


>> I want to subset big based on whether or not the mean or median in aimjoin<br>


>> is less than the minimum value of x in big. Those mean or median values in<br>


>> aimjoin that are smaller than x in big will have to be grouped together for<br>


>> a future step & those mean or median values in aimjoin that are equal to or<br>


>> larger than the x in big will be grouped together for a future step.<br>


>><br>


>> Can you provide me with advice on how to proceed with the subsetting?<br>


>Try to construct a really good toy example that demonstrates what you <br>


>want.  Show input and desired output.  In this case 2 groups of 5 rows <br>


>each should be enough to demonstrate.<br>


><br>


>><br>


>> Thank you.<br>


>><br>


>> Irucka<br>


>><br>


>><br>


>><br>


>> --<br>


>> View this message in context: http://r.789695.n4.nabble.com/subset-between-data-table-list-and-single-data-table-object-tp4673202p4673308.html<br>


>> Sent from the datatable-help mailing list archive at Nabble.com.<br>


>> _______________________________________________<br>


>> datatable-help mailing list<br>


>> datatable-help@lists.r-forge.r-project.org<br>


>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help<br>


>><br>


><br>


>.<br>


>

</BODY></HTML>

