[datatable-help] subset between data.table list and single data.table object

Matthew Dowle mdowle at mdowle.plus.com
Wed Aug 7 10:49:41 CEST 2013


Hi,
Yes, this is much clearer now, thanks.  In this case, add a line at the
end of the freadDataRatingDepotFiles function that adds its argument
(say 'funArg') as a column before returning the result; i.e.,
ret[,site:=funArg].  You will then likely want to key big by site.
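
A minimal sketch of that idea, in case it helps. Note that readSite is
only a hypothetical stand-in for freadDataRatingDepotFiles (whose body
isn't shown in this thread), and the two small files are made up so the
example runs on its own:

```r
library(data.table)

# Hypothetical stand-in for freadDataRatingDepotFiles (assumption: the real
# function reads one file with fread and returns a data.table).
readSite <- function(funArg) {
  ret <- fread(funArg)
  ret[, site := funArg]   # tag every row with the argument it came from
  ret
}

# Two small made-up files, only so that the sketch is self-contained
tmp <- tempfile(c("siteA_", "siteB_"), fileext = ".csv")
fwrite(data.table(y = c(1.5, 2.5), x = c(10, 20)), tmp[1])
fwrite(data.table(y = c(3.5, 4.5, 5.5), x = c(30, 40, 50)), tmp[2])

big <- rbindlist(lapply(tmp, readSite))
setkey(big, site)

big[.(tmp[2])]           # all rows that came from the second file
big[, .N, by = site]     # number of rows per file
```

With site as the key, the rows for any one file can be pulled out by
value rather than by remembering row ranges.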
Since := adds site by reference, without copying the file's data that
has just been read, reading and stacking multiple files should be quite
fast using fread and data.table together when each file needs a bit of
tweaking before stacking.
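
One way to see the "by reference" part for yourself is to compare the
memory address of an existing column before and after :=. This is just
an illustrative sketch on a tiny made-up table, using address() from
data.table; the site value is one of the station numbers from aimall:

```r
library(data.table)

dt <- data.table(y = c(14.8, 14.9), x = c(7900, 7920))

before <- address(dt$x)      # where the existing x column lives in memory
dt[, site := "02437100"]     # add a column by reference
after <- address(dt$x)

identical(before, after)     # should be TRUE if x was not copied
```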
Matthew

On 07/08/13 07:24, iembry wrote:
> Hi Matthew, partly based on your suggestion, I have the following R code:
>
> big = rbindlist(lapply(sitefiles,freadDataRatingDepotFiles))
> big <- setnames(big,c("y", "shift", "x", "stor"))
> big <- big[, y:=as.numeric(y)]
> big <- big[, x:=as.numeric(x)]
> big <- big[, shift:=as.numeric(shift)]
> big <- big[, c("stor"):=NULL]
> big <- na.omit(big)
> big <- big[,y:=y+shift]
> big <- big[,shift:=NULL]
>
> Thus, instead of a list of 59 data.table objects, I have one data.table of
> over 100,000 rows.
>
> How do I know which row range belongs to a certain data.table object (59 of
> them) for the other calculations?
>
> As before I want to subset big (or the list of 59 data.tables) based on
> their connection to aimall (see below). aimall contains each of the 59
> station numbers & the order of aimall matches the order of the 59
> data.tables.
>
> Does this help clarify what I had previously asked?
>
> Thank you.
>
> Irucka
>
>
>
> str(big)
> Classes ‘data.table’ and 'data.frame':	112253 obs. of  2 variables:
>   $ y: num  14.8 14.8 14.8 14.8 14.8 ...
>   $ x: num  7900 7920 7930 7950 7970 7980 8000 8010 8030 8050 ...
>   - attr(*, ".internal.selfref")=<externalptr>
>
> dput(aimall)
> structure(list(site_no = c("02437100", "02446500", "02467000",
> "03217500", "03219500", "03227500", "03230700", "03231500", "03439000",
> "03441000", "03455000", "03479000", "04185000", "04186500", "04189000",
> "04191500", "04192500", "04193500", "06191500", "06214500", "06218500",
> "06225500", "06228000", "06235500", "06276500", "06279500", "06287000",
> "06289000", "06311000", "06313500", "06317000", "06320000", "06320500",
> "06323000", "06324000", "06324500", "06326500", "06329500", "06342500",
> "06426500", "06428500", "06436000", "06437000", "06438000", "06818000",
> "06821500", "06856600", "06860000", "06864000", "06864500", "06865500",
> "06877600", "06887500", "06889000", "06891000", "06893000", "06934500",
> "07010000", "07289000"), mean = c(3882.65, 819.82, 23742.37,
> 224.72, 496.79, 1491.39, 3170.14, 3682.46, 237.02, 127.9, 2955.14,
> 176.1, 345.72, 296.23, 275.35, 1870.93, 4544.74, 5157.63, 3106.7,
> 6940.54, 167.04, 1172.53, 771.23, 559.23, 407.46, 2144.53, 3384.37,
> 148.67, 14.99, 195.91, 267.9, 47.49, 63.49, 96.74, 184.16, 446.52,
> 565.5, 12419.4, 22372.86, 23.34, 92.56, 100.45, 296.65, 391.31,
> 43534.12, 16.65, 915.93, 20.16, 197.09, 227.78, 274.43, 1517.04,
> 5042.7, 5632.7, 7018.45, 52604.19, 81758.03, 186504.25, 755685.3
> ), p50 = c(1830, 382, 10400, 50, 140, 500, 1520, 1600, 188, 99,
> 2260, 115, 130, 75, 62, 460, 1470, 1700, 1390, 3670, 80, 559,
> 380, 257, 223, 1550, 2730, 82, 3.8, 120, 130, 23, 46, 34, 86,
> 216, 231, 7900, 20400, 2.9, 36, 7.5, 120, 114, 38200, 6.3, 430,
> 1, 37, 58, 73, 541, 2320, 2620, 3300, 43200, 61200, 147000, 687000
> )), .Names = c("site_no", "mean", "p50"), row.names = c(4463L,
> 4495L, 4586L, 5353L, 5357L, 5378L, 5393L, 5397L, 6165L, 6169L,
> 6203L, 6253L, 7304L, 7308L, 7317L, 7326L, 7328L, 7330L, 9633L,
> 9698L, 9710L, 9725L, 9733L, 9756L, 9832L, 9840L, 9877L, 9889L,
> 9988L, 9997L, 10010L, 10019L, 10022L, 10029L, 10031L, 10032L,
> 10041L, 10052L, 10118L, 10284L, 10288L, 10306L, 10317L, 10322L,
> 11165L, 11185L, 11261L, 11268L, 11281L, 11283L, 11284L, 11325L,
> 11363L, 11370L, 11401L, 11421L, 11606L, 11626L, 12714L), class =
> "data.frame")
>
>
>


