[datatable-help] Sum of sets of columns in data.table

Bacou, Melanie mel at mbacou.com
Thu Dec 10 03:38:28 CET 2015


I come across this problem on a regular basis as well, and always end up 
fiddling for a while.
Because the LHS of `:=` is also dynamic, I'm not sure there's any more 
elegant approach.
One alternative might be to create several temporary data.tables holding 
the rowSums and then cbind()?

for (i in age_brackets) {
  tmp <- dt[, rowSums(.SD, na.rm=T), by=.(origin, race, sex,year, 
total_pop), .SDcols=i]
  dt <- cbind(dt, tmp)
}

--Mel.

On 12/9/2015 7:19 AM, Santosh Srinivas wrote:
> Hello All,
>
> I am sure there is a much more efficient way to do this. Please advise 
> any suggestions.
> For now, I have boot fixed this the crude way :-(
>
> age_brackets <- c("pop_0:pop_3","pop_4:pop_6","pop_7:pop_9")
>
> for (i in age_brackets) {
> cmdText <- paste('dt[, paste("",i,sep=""):= rowSums(.SD, na.rm=TRUE), 
> by=list(origin, race, sex,year, total_pop), .SDcols=',i,']', sep="")
> print(cmdText)
> eval(parse(text=cmdText))
> }
>
>
> On Tue, Dec 8, 2015 at 11:13 PM, Santosh Srinivas 
> <santosh.srinivas at gmail.com <mailto:santosh.srinivas at gmail.com>> wrote:
>
>     Hello All,
>
>     I have a dataset as below with a reproducible example after that.
>     My actual data has about 100 columns.
>
>     I want columns that represent the rowSums for sets .. eg. pop_0_3,
>     pop_4_6, pop_7_9  .. this is sum of population in age group of 0-3
>     for example.
>
>     How can I do that using indexes of the columns?
>
>     ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>         origin race sex year total_pop   pop_0   pop_1   pop_2   pop_3
>       pop_4   pop_5   pop_6   pop_7 pop_8   pop_9
>      1:      0    0   0 2014 318748017 3971847 3957864 3972081 4003272
>     4001929 4002977 4132455 4152653 4118628 4105776
>      2:      0    0   0 2015 321368864 4000831 3988161 3974109 3986357
>     4015656 4013264 4013790 4142998 4163270 4129322
>      3:      0    0   0 2016 323995528 4029356 4017346 4004585 3988434
>     3998839 4026967 4024121 4024481 4153686 4174008
>      4:      0    0   0 2017 326625791 4057231 4046063 4033932 4019069
>     4000955 4010232 4037777 4034839 4035311 4164487
>      5:      0    0   0 2018 329256465 4083375 4074132 4062816 4048550
>     4031712 4012371 4021117 4048454 4045696 4046249
>      6:      0    0   0 2019 331883986 4107606 4100469 4091055 4077589
>     4061316 4043229 4023269 4031853 4059256 4056646
>      7:      0    0   0 2020 334503458 4128810 4124893 4117546 4105953
>     4090466 4072931 4054223 4034013 4042721 4070166
>      8:      0    0   0 2021 337108968 4145903 4146269 4142090 4132527
>     4118898 4102128 4083950 4065004 4044832 4053623
>      9:      0    0   0 2022 339698079 4159190 4163587 4163657 4157230
>     4145600 4130675 4113256 4094835 4075940 4055771
>     10:      0    0   0 2023 342267302 4169856 4177093 4181156 4178958
>     4170441 4157505 4141921 4124243 4105873 4086972
>
>
>     ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>     #
>     https://www.census.gov/population/projections/files/downloadables/NP2014_D1.csv
>
>     require("data.table")
>
>     dt <- structure(list(origin = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>     0L), race = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), sex = c(0L,
>     0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = 2014:2023, total_pop =
>     c(318748017L,
>     321368864L, 323995528L, 326625791L, 329256465L, 331883986L,
>     334503458L,
>     337108968L, 339698079L, 342267302L), pop_0 = c(3971847L, 4000831L,
>     4029356L, 4057231L, 4083375L, 4107606L, 4128810L, 4145903L, 4159190L,
>     4169856L), pop_1 = c(3957864L, 3988161L, 4017346L, 4046063L,
>     4074132L, 4100469L, 4124893L, 4146269L, 4163587L, 4177093L),
>         pop_2 = c(3972081L, 3974109L, 4004585L, 4033932L, 4062816L,
>         4091055L, 4117546L, 4142090L, 4163657L, 4181156L), pop_3 =
>     c(4003272L,
>         3986357L, 3988434L, 4019069L, 4048550L, 4077589L, 4105953L,
>         4132527L, 4157230L, 4178958L), pop_4 = c(4001929L, 4015656L,
>         3998839L, 4000955L, 4031712L, 4061316L, 4090466L, 4118898L,
>         4145600L, 4170441L), pop_5 = c(4002977L, 4013264L, 4026967L,
>         4010232L, 4012371L, 4043229L, 4072931L, 4102128L, 4130675L,
>         4157505L), pop_6 = c(4132455L, 4013790L, 4024121L, 4037777L,
>         4021117L, 4023269L, 4054223L, 4083950L, 4113256L, 4141921L
>         ), pop_7 = c(4152653L, 4142998L, 4024481L, 4034839L, 4048454L,
>         4031853L, 4034013L, 4065004L, 4094835L, 4124243L), pop_8 =
>     c(4118628L,
>         4163270L, 4153686L, 4035311L, 4045696L, 4059256L, 4042721L,
>         4044832L, 4075940L, 4105873L), pop_9 = c(4105776L, 4129322L,
>         4174008L, 4164487L, 4046249L, 4056646L, 4070166L, 4053623L,
>         4055771L, 4086972L)), .Names = c("origin", "race", "sex",
>     "year", "total_pop", "pop_0", "pop_1", "pop_2", "pop_3", "pop_4",
>     "pop_5", "pop_6", "pop_7", "pop_8", "pop_9"), class = c("data.table",
>     "data.frame"), row.names = c(NA, -10L))
>
>
>     ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>     Thank you.
>     Santosh
>
>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20151209/1699e5c9/attachment-0001.html>


More information about the datatable-help mailing list