[datatable-help] by row
Arunkumar Srinivasan
aragorn168b at gmail.com
Sun Jun 29 23:39:01 CEST 2014
Hi,
You write: There was some discussion of an .EACHI facility for data.table. Not sure what happened about that but I have an example that might be useful: http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571
by=.EACHI was implemented to replace the implicit “by-without-by” behaviour during joins with an explicit one. It was implemented quite some time ago: see the first FR listed in the README, after which Matt also posted on the mailing list asking for feedback.
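For anyone who has not tried it yet, here is a minimal sketch of the difference, assuming a data.table version that has by=.EACHI. The table DT, its columns id and v, and the output column total are made up for illustration only:

```r
library(data.table)

# Hypothetical data, keyed on id so that i can be joined against it
DT <- data.table(id = c("a", "a", "b", "c"), v = 1:4, key = "id")

# Without by: j is evaluated once over all rows matched by the join
total_all <- DT[.(c("a", "b")), sum(v)]

# With by = .EACHI: j is evaluated once for each row of i
res <- DT[.(c("a", "b")), .(total = sum(v)), by = .EACHI]
res
```

The first call returns a single total over the matched rows; the second returns one row per value supplied in i.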
You write: which shows the code where DT has columns v1, v2 and v3: DT[, split(v2, v1), by = names(DT)]
A small comment on this solution per se: it calls split for each row! I’d approach this a little differently:
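## 1.9.3
rbindlist(
  setDT(dd)[, {
    ans = list(v2)             # this group's v2 values, wrapped in a list
    setattr(ans, 'names', v1)  # name that element after the group, by reference
    list(list(ans))            # one list-column cell per group
  }, by = list(v1 = as.character(v1))]$V1,
  fill = TRUE)                 # pad the columns a group lacks with NA
#     a  b
# 1:  1 NA
# 2:  2 NA
# 3:  6 NA
# 4: NA  3
# 5: NA  4
# 6: NA  5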
We can then add this back to dd by reference. Personally I’ve never had to call split on a data.table.
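A rough sketch of what adding the result back by reference could look like. The contents of dd are an assumption reconstructed from the output above, and the wide table is built here with a plain lapply rather than the grouped call, purely to keep the example short. Note that the rows of the wide result come out in group order, so this only lines up with dd when dd is already ordered by v1:

```r
library(data.table)

# Assumed data: already ordered by v1, matching the output shown above
dd <- data.table(v1 = c("a", "a", "a", "b", "b", "b"),
                 v2 = c(1, 2, 6, 3, 4, 5))

# One single-column named list per group, stacked with NA padding
wide <- rbindlist(
  lapply(unique(dd$v1), function(g) setNames(list(dd[v1 == g, v2]), g)),
  fill = TRUE
)

# Add the new columns to dd by reference; no copy of dd is made
cols <- names(wide)
dd[, (cols) := wide]
dd
```

If dd is not sorted by v1, the rows of wide would need to be reordered before the assignment.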
You write: It works well if the rows of DT are unique but if they are not then one must do something ugly like appending a uniquifying column of 1:nrow(DT), say, and then including that in by and then finally removing it again at the end.
This suggests two features:
1. The ability to tell it to do the by by row
2. The ability to selectively omit by variables from the output
Not sure I follow this entirely, but by= does accept expressions. So, you could do:
dd[, split(v2, v1), by = 1:nrow(dd)]
#    nrow  a  b
# 1:    1  1 NA
# 2:    2  2 NA
# 3:    3  6 NA
# 4:    4 NA  3
# 5:    5 NA  4
# 6:    6 NA  5
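To illustrate both points together (grouping by row even when rows are duplicated, then leaving the grouping column out of the result), here is a small sketch. The data and the column name row are assumptions for illustration, and j builds the per-level columns explicitly instead of calling split, so that every group returns the same set of full-length columns:

```r
library(data.table)

# Assumed data with a duplicated row; v1 is a factor so its levels are known
dd <- data.table(v1 = factor(c("a", "a", "b")), v2 = c(1, 1, 3))
lev <- levels(dd$v1)

# Group by an explicitly named row index, so duplicate rows stay separate
res <- dd[, lapply(setNames(lev, lev),
                   function(l) if (v1 == l) v2 else NA_real_),
          by = .(row = seq_len(nrow(dd)))]

# Drop the helper grouping column by reference
res[, row := NULL]
res
```

Naming the by expression explicitly avoids relying on the auto-generated column name when removing it afterwards.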
You write: (By the way, is there an intention to move to the issue system on github for things like this?)
All the issues from R-Forge, including feature requests, have already been moved to GitHub, and users have been filing new FRs/bugs there since. So, yes, you can file FRs directly, although in this case, I think the feature already exists (IIUC)?
Arun
From: Gabor Grothendieck ggrothendieck at gmail.com
Reply: Gabor Grothendieck ggrothendieck at gmail.com
Date: June 29, 2014 at 10:59:22 PM
To: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject: [datatable-help] by row
There was some discussion of an .EACHI facility for data.table. Not
sure what happened about that but I have an example that might be
useful:
http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571
which shows the code where DT has columns v1, v2 and v3:
DT[, split(v2, v1), by = names(DT)]
It works well if the rows of DT are unique but if they are not then
one must do something ugly like appending a uniquifying column of
1:nrow(DT), say, and then including that in by and then finally
removing it again at the end.
This suggests two features:
1. The ability to tell it to do the by by row
2. The ability to selectively omit by variables from the output
For example, if one could use a pseudo column .I and if -.I meant do
not include it in the output then one could write:
DT[, split(v2, v1), by = c(names(DT), -.I)]
Other syntaxes may be thought of too and the main suggestion here is
the possible need for these features rather than the specific syntax.
(By the way, is there an intention to move to the issue system on
github for things like this?)
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com