[datatable-help] updating subsets

Sebastian Fischmeister sfischme at uwaterloo.ca
Tue Jul 21 21:38:50 CEST 2015


As a follow-up: this is now working. Apparently it was mostly a type error
in R and not a problem in data.table:


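## This works. The important things are to (1) create the columns in advance,
## not on the subsets, and (2) use the correct data type (here integer, -1L,
## which matches the type of .N).

library(data.table)

q <- list(data.table(runif(9)), data.table(runif(9)))
## create the integer column up front, on each full table
lapply(q, function(xx) xx[, freq := -1L])
## split each table into chunks, then update freq within each chunk
lapply(q, function(xx) {
  qq <- split(xx, 1:nrow(xx) %/% 5)
  lapply(qq, function(chunk) chunk[, freq := .N, by = "V1"])
})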
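
The nested lapply above returns a list of chunks for each table. A minimal
sketch of how the chunks could be glued back together, assuming the goal is
one updated table per list element: the rbindlist() step, the set.seed() call
and the names `res`, `chunks` and `chunk` are illustrative additions, not part
of the original message. The chunks produced by split() are copies, so the
updated values live in the returned result rather than in q.

```r
library(data.table)

set.seed(1)  # only to make the sketch reproducible
q <- list(data.table(runif(9)), data.table(runif(9)))

# Create the integer column up front; .N is an integer, so the types match.
invisible(lapply(q, function(xx) xx[, freq := -1L]))

res <- lapply(q, function(xx) {
  # Each chunk is a copy of the corresponding rows of xx.
  chunks <- split(xx, seq_len(nrow(xx)) %/% 5)
  # Update freq inside each chunk; := returns the modified chunk.
  chunks <- lapply(chunks, function(chunk) chunk[, freq := .N, by = "V1"])
  # Recombine the chunks into a single data.table per list element.
  rbindlist(chunks)
})
```

Here `res[[1]]` and `res[[2]]` each hold all nine rows again, with freq
filled in per chunk.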



Frank Erickson <fperickson at wisc.edu> writes:

> Ah ok. mclapply is a little beyond my depth, but I think you can always put
> the variable you are splitting by (1:nrow(xx) %/% 5 here) directly into the
> by argument of xx[i,j,by], so...
>
> q <- list(data.table(runif(9)),data.table(runif(9)))
> lapply( q, function(xx) xx[, freq:=.N, by=.(V1,1:nrow(xx) %/% 5)] )
>
> works for me, on data.table 1.9.4.
>
> On Mon, Jul 20, 2015 at 11:55 AM, Sebastian Fischmeister <
> sfischme at uwaterloo.ca> wrote:
>
>>
>>
>> > I think you should use a single data.table; it's much more
>> > straightforward in that case:
>> >
>> > qq <- data.table(runif(18),id=rep(1:2,each=9))
>> > qq[,freq:=.N,by=.(id,seq(nrow(qq))%/%5)]
>>
>> Thanks for the idea, unfortunately I need lists. The example is just a
>> minimal example to show the problem. The actual code will use large
>> data.tables in lists and I want to eventually use mclapply to
>> parallelize the computation.
>>
>>   Sebastian
>>
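
Since the thread mentions moving to mclapply eventually, here is a rough
sketch of how that might look when combined with the grouping expression in
`by` suggested above. It is an assumption about the intended use, not code
from the thread: the names `res`, `cores` and `chunk` are made up for
illustration. mclapply() forks worker processes, so changes made by reference
inside a worker are not visible in the parent; the function therefore returns
the updated table explicitly.

```r
library(data.table)
library(parallel)

set.seed(42)  # only to make the sketch reproducible
q <- list(data.table(runif(9)), data.table(runif(9)))

# mclapply() relies on forking, which is not available on Windows,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

res <- mclapply(q, function(xx) {
  # Group by value and by chunk index in one step, without split().
  xx[, freq := .N, by = .(V1, chunk = seq_len(nrow(xx)) %/% 5)]
  xx  # hand the updated table back to the parent process
}, mc.cores = cores)
```

The result `res` is a plain list of data.tables, one per element of q, so
it can be used wherever the lapply version was used.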