[datatable-help] by row

Arunkumar Srinivasan aragorn168b at gmail.com
Sun Jun 29 23:39:01 CEST 2014


Hi,

You write: There was some discussion of an .EACHI facility for data.table. Not sure what happened about that but I have an example that might be useful: http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571

by=.EACHI was implemented to replace the implicit “by-without-by” behaviour during joins. It was implemented quite some time ago: see the first FR listed in the README, after which Matt also posted to the mailing list asking for feedback.
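
For anyone who has not tried it yet, here is a minimal sketch of the difference, assuming a data.table version that already has by=.EACHI. The table DT and its columns id and v are made up for illustration and are not from the thread:

```r
library(data.table)

# Hypothetical toy data, keyed on id so that i is treated as a join.
DT <- data.table(id = c("a", "a", "b", "b", "b"), v = 1:5, key = "id")

# Plain join: j is evaluated once over all matched rows (one total).
total <- DT[.(c("a", "b")), sum(v)]

# by = .EACHI: j is evaluated once for each row of i (one sum per id).
per_id <- DT[.(c("a", "b")), sum(v), by = .EACHI]
```

Here total is a single number, while per_id has one row per id in i, which is what the old implicit by-without-by used to return.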

You write: which shows the code where DT has columns v1, v2 and v3: DT[, split(v2, v1), by = names(DT)]

A small comment on this solution per se: it calls split for each row! I’d approach this a little differently:

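## 1.9.3
rbindlist(
  setDT(dd)[, {
    ans = list(v2)               # wrap this group's v2 values in a list
    setattr(ans, 'names', v1)    # name that element after the group (v1)
    list(list(ans))              # return it as one list-column cell
  }, by = list(v1 = as.character(v1))]$V1,
  fill = TRUE)                   # columns missing from a group become NA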

#     a  b
# 1:  1 NA
# 2:  2 NA
# 3:  6 NA
# 4: NA  3
# 5: NA  4
# 6: NA  5

We can then add this back to dd by reference. Personally, I’ve never had to call split on a data.table.
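
A rough sketch of the "add back by reference" step, under two assumptions that are not in the thread: dd looks like the sample below, and its rows are already ordered by v1, so that the rbindlist() result lines up with dd row for row. The table wide is simply the output above typed in by hand:

```r
library(data.table)

# Assumed sample data, chosen to match the output shown above.
dd <- data.table(v1 = c("a", "a", "a", "b", "b", "b"),
                 v2 = c(1, 2, 6, 3, 4, 5))

# The rbindlist() result from above, entered by hand for this sketch.
wide <- data.table(a = c(1, 2, 6, NA, NA, NA),
                   b = c(NA, NA, NA, 3, 4, 5))

# Add the new columns to dd by reference (no copy of dd is made).
# This is positional, so it is only valid when dd is ordered by v1.
dd[, (names(wide)) := wide]
```

If dd is not ordered by v1, the rows of wide would first have to be put back in dd's original row order.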

You write: It works well if the rows of DT are unique but if they are not then one must do something ugly like appending a uniquifying column of 1:nrow(DT), say, and then including that in by and then finally removing it again at the end.

This suggests two features:

1. The ability to tell it to do the by by row
2. The ability to selectively omit by variables from the output

Not sure I follow this entirely, but by= does accept expressions. So, you could do:

dd[, split(v2,v1), by=1:nrow(dd)]
#    nrow  a  b
# 1:    1  1 NA
# 2:    2  2 NA
# 3:    3  6 NA
# 4:    4 NA  3
# 5:    5 NA  4
# 6:    6 NA  5
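
If the row-number column is not wanted in the result, it can be dropped by reference afterwards. A small sketch, assuming the sample dd below (not given in the thread) and using a simple j expression in place of split; the column names rn and doubled are made up:

```r
library(data.table)

# Assumed sample data (not given in the thread).
dd <- data.table(v1 = c("a", "a", "a", "b", "b", "b"),
                 v2 = c(1, 2, 6, 3, 4, 5))

# Group by row number, so j is evaluated once per row of dd.
res <- dd[, list(doubled = v2 * 2), by = list(rn = 1:nrow(dd))]

# Drop the helper grouping column from the result by reference.
res[, rn := NULL]
```

This covers both points with existing syntax: the grouping is by row, and the by variable does not appear in the final output.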

You write: (By the way, is there an intention to move to the issue system on github for things like this?)

All the issues from R-Forge, including feature requests, have already been moved to GitHub, and users have since filed new FRs/bugs there. So yes, you can file FRs directly, although in this case I think the feature already exists (IIUC)?



Arun

From: Gabor Grothendieck ggrothendieck at gmail.com
Reply: Gabor Grothendieck ggrothendieck at gmail.com
Date: June 29, 2014 at 10:59:22 PM
To: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:  [datatable-help] by row  

There was some discussion of an .EACHI facility for data.table. Not  
sure what happened about that but I have an example that might be  
useful:  

http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571  

which shows the code where DT has columns v1, v2 and v3:  

DT[, split(v2, v1), by = names(DT)]  

It works well if the rows of DT are unique but if they are not then  
one must do something ugly like appending a uniquifying column of  
1:nrow(DT), say, and then including that in by and then finally  
removing it again at the end.  

This suggests two features:  

1. The ability to tell it to do the by by row  
2. The ability to selectively omit by variables from the output  

For example, if one could use a pseudo column .I and if -.I meant do  
not include it in the output then one could write:  

DT[, split(v2, v1), by = c(names(DT), -.I)]  

Other syntaxes may be thought of too and the main suggestion here is  
the possible need for these features rather than the specific syntax.  

(By the way, is there an intention to move to the issue system on  
github for things like this?)  

--  
Statistics & Software Consulting  
GKX Group, GKX Associates Inc.  
tel: 1-877-GKX-GROUP  
email: ggrothendieck at gmail.com  