From santosh.srinivas at gmail.com  Tue Dec  8 18:43:35 2015
From: santosh.srinivas at gmail.com (Santosh Srinivas)
Date: Tue, 8 Dec 2015 23:13:35 +0530
Subject: [datatable-help] Sum of sets of columns in data.table
Message-ID: <CALtLFXS6Q4pg4hVPVo0YjkfUFP-WuWreEMAwa8Wc3tKm+gf+Dw@mail.gmail.com>

Hello All,

I have a dataset as below with a reproducible example after that. My actual
data has about 100 columns.

I want new columns holding the row sums over sets of columns, e.g. pop_0_3,
pop_4_6, pop_7_9, where pop_0_3 is the total population aged 0-3.

How can I do that using indexes of the columns?

---------------------------------------------------------------------------------------------------------------------------------------------------------

    origin race sex year total_pop   pop_0   pop_1   pop_2   pop_3   pop_4   pop_5   pop_6   pop_7   pop_8   pop_9
 1:      0    0   0 2014 318748017 3971847 3957864 3972081 4003272 4001929 4002977 4132455 4152653 4118628 4105776
 2:      0    0   0 2015 321368864 4000831 3988161 3974109 3986357 4015656 4013264 4013790 4142998 4163270 4129322
 3:      0    0   0 2016 323995528 4029356 4017346 4004585 3988434 3998839 4026967 4024121 4024481 4153686 4174008
 4:      0    0   0 2017 326625791 4057231 4046063 4033932 4019069 4000955 4010232 4037777 4034839 4035311 4164487
 5:      0    0   0 2018 329256465 4083375 4074132 4062816 4048550 4031712 4012371 4021117 4048454 4045696 4046249
 6:      0    0   0 2019 331883986 4107606 4100469 4091055 4077589 4061316 4043229 4023269 4031853 4059256 4056646
 7:      0    0   0 2020 334503458 4128810 4124893 4117546 4105953 4090466 4072931 4054223 4034013 4042721 4070166
 8:      0    0   0 2021 337108968 4145903 4146269 4142090 4132527 4118898 4102128 4083950 4065004 4044832 4053623
 9:      0    0   0 2022 339698079 4159190 4163587 4163657 4157230 4145600 4130675 4113256 4094835 4075940 4055771
10:      0    0   0 2023 342267302 4169856 4177093 4181156 4178958 4170441 4157505 4141921 4124243 4105873 4086972


---------------------------------------------------------------------------------------------------------------------------------------------------------


# https://www.census.gov/population/projections/files/downloadables/NP2014_D1.csv

require("data.table")

dt <- structure(list(origin = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), race = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), sex = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = 2014:2023, total_pop =
c(318748017L,
321368864L, 323995528L, 326625791L, 329256465L, 331883986L, 334503458L,
337108968L, 339698079L, 342267302L), pop_0 = c(3971847L, 4000831L,
4029356L, 4057231L, 4083375L, 4107606L, 4128810L, 4145903L, 4159190L,
4169856L), pop_1 = c(3957864L, 3988161L, 4017346L, 4046063L,
4074132L, 4100469L, 4124893L, 4146269L, 4163587L, 4177093L),
    pop_2 = c(3972081L, 3974109L, 4004585L, 4033932L, 4062816L,
    4091055L, 4117546L, 4142090L, 4163657L, 4181156L), pop_3 = c(4003272L,
    3986357L, 3988434L, 4019069L, 4048550L, 4077589L, 4105953L,
    4132527L, 4157230L, 4178958L), pop_4 = c(4001929L, 4015656L,
    3998839L, 4000955L, 4031712L, 4061316L, 4090466L, 4118898L,
    4145600L, 4170441L), pop_5 = c(4002977L, 4013264L, 4026967L,
    4010232L, 4012371L, 4043229L, 4072931L, 4102128L, 4130675L,
    4157505L), pop_6 = c(4132455L, 4013790L, 4024121L, 4037777L,
    4021117L, 4023269L, 4054223L, 4083950L, 4113256L, 4141921L
    ), pop_7 = c(4152653L, 4142998L, 4024481L, 4034839L, 4048454L,
    4031853L, 4034013L, 4065004L, 4094835L, 4124243L), pop_8 = c(4118628L,
    4163270L, 4153686L, 4035311L, 4045696L, 4059256L, 4042721L,
    4044832L, 4075940L, 4105873L), pop_9 = c(4105776L, 4129322L,
    4174008L, 4164487L, 4046249L, 4056646L, 4070166L, 4053623L,
    4055771L, 4086972L)), .Names = c("origin", "race", "sex",
"year", "total_pop", "pop_0", "pop_1", "pop_2", "pop_3", "pop_4",
"pop_5", "pop_6", "pop_7", "pop_8", "pop_9"), class = c("data.table",
"data.frame"), row.names = c(NA, -10L))


---------------------------------------------------------------------------------------------------------------------------------------------------------

Thank you.
Santosh

From santosh.srinivas at gmail.com  Wed Dec  9 13:19:01 2015
From: santosh.srinivas at gmail.com (Santosh Srinivas)
Date: Wed, 9 Dec 2015 17:49:01 +0530
Subject: [datatable-help] Sum of sets of columns in data.table
In-Reply-To: <CALtLFXS6Q4pg4hVPVo0YjkfUFP-WuWreEMAwa8Wc3tKm+gf+Dw@mail.gmail.com>
References: <CALtLFXS6Q4pg4hVPVo0YjkfUFP-WuWreEMAwa8Wc3tKm+gf+Dw@mail.gmail.com>
Message-ID: <CALtLFXRv0ZEPcZLiOkQySwU59RAbumgA3CQPzn7GK2YSUhPbTg@mail.gmail.com>

Hello All,

I am sure there is a much more efficient way to do this. Any suggestions
are welcome.
For now, I have fixed this the crude way :-(

age_brackets <- c("pop_0:pop_3","pop_4:pop_6","pop_7:pop_9")

for (i in age_brackets) {
  # Build the data.table call as text so that .SDcols receives the bare range, e.g. pop_0:pop_3
  cmdText <- paste('dt[, paste("",i,sep=""):= rowSums(.SD, na.rm=TRUE), by=list(origin, race, sex,year, total_pop), .SDcols=',i,']', sep="")
  print(cmdText)
  eval(parse(text=cmdText))
}



From suttoncarl at ymail.com  Wed Dec  9 19:02:54 2015
From: suttoncarl at ymail.com (carlsutton)
Date: Wed, 9 Dec 2015 10:02:54 -0800 (PST)
Subject: [datatable-help] subset data table in i with multiple criteria for
 multiple variables
Message-ID: <1449684174047-4715347.post@n4.nabble.com>

Is there a way to subset a data table using "i" with multiple criteria on
multiple variables (columns)?  I have some test code shown below on what I
have tried.  And yes, I have read the documentation and taken the DataCamp
class (multiple viewings, I'm a slow learner) and have not seen anything
relevant.  I also checked for questions on this topic in this forum and did
not find an answer to my query.
Attempting to upload R file
dataTableExamples.R
<http://r.789695.n4.nabble.com/file/n4715347/dataTableExamples.R>  
Probably should have stayed in bed today the way things are going.

A cut and paste from RStudio
#  Data Table exercises
require(data.table)
a <- seq(2L,40L, by = 4L)
b <- seq(15L,105L,by = 10L)
c <- 1:10L
d <- rep(c(100L,150L),5L)
e <- 101:110L
dt <- data.table(a,b,c,d,e)
dt  
dta <- subset(dt, a < 35)
dtb <- subset(dta, b > 35)
dtb
dtb[, lapply(.SD,median), by = d]
#  Now attempt to subset the rows in i
vec <- c(a<35, b>35)
dtvec <- dt[vec, lapply(.SD, median, na.rm = TRUE), by = d]
dtvec

And console output
> #  Now attempt to subset the rows in i
> vec <- c(a<35, b>35)
> dtvec <- dt[vec, lapply(.SD, median, na.rm = TRUE), by = d]
Error in `[.data.table`(dt, vec, lapply(.SD, median, na.rm = TRUE), by = d)
: 
  Column 1 of result for group 2 is type 'double' but expecting type
'integer'. Column types must be consistent for each group.
> dtvec
     a  b  c   d   e
 1:  2 15  1 100 101
 2:  6 25  2 150 102
 3: 10 35  3 100 103
 4: 14 45  4 150 104
 5: 18 55  5 100 105
 6: 22 65  6 150 106
 7: 26 75  7 100 107
 8: 30 85  8 150 108
 9: 34 95  9 100 109
10: NA NA NA  NA  NA
11: NA NA NA  NA  NA
12: NA NA NA  NA  NA
13: NA NA NA  NA  NA
14: NA NA NA  NA  NA
15: NA NA NA  NA  NA
16: NA NA NA  NA  NA

The error message has me confused. 
 /Column 1 of result for group 2 is type/
What group 2?  I have only grouped on column "d".  Result 1 is ????  No idea
what "result 1" is referring to, is it the subset in "i", the median of col
a??  No clue.

I have only created integer variables for the data table, so why the
rejection " Column 1 of result for group 2 is type 'double' but expecting
type 'integer'. Column types must be consistent for each group."  What
double?  I have not created any double numbers.

Carl Sutton


-----
Carl Sutton
--
View this message in context: http://r.789695.n4.nabble.com/subset-data-table-in-i-with-multiple-criteria-for-multiple-variables-tp4715347.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mel at mbacou.com  Thu Dec 10 03:38:28 2015
From: mel at mbacou.com (Bacou, Melanie)
Date: Wed, 9 Dec 2015 21:38:28 -0500
Subject: [datatable-help] Sum of sets of columns in data.table
In-Reply-To: <CALtLFXRv0ZEPcZLiOkQySwU59RAbumgA3CQPzn7GK2YSUhPbTg@mail.gmail.com>
References: <CALtLFXS6Q4pg4hVPVo0YjkfUFP-WuWreEMAwa8Wc3tKm+gf+Dw@mail.gmail.com>
 <CALtLFXRv0ZEPcZLiOkQySwU59RAbumgA3CQPzn7GK2YSUhPbTg@mail.gmail.com>
Message-ID: <5668E5A4.2090909@mbacou.com>

I come across this problem on a regular basis as well, and always end up 
fiddling for a while.
Because the LHS of `:=` is also dynamic, I'm not sure there's any more 
elegant approach.
One alternative might be to create several temporary data.tables holding 
the rowSums and then cbind()?

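for (i in age_brackets) {
  # i is a string such as "pop_0:pop_3"; .SDcols needs the actual column names
  ends <- strsplit(i, ":", fixed=TRUE)[[1]]
  cols <- names(dt)[match(ends[1], names(dt)):match(ends[2], names(dt))]
  tmp <- dt[, .(rowSums(.SD, na.rm=TRUE)), .SDcols=cols]
  setnames(tmp, i)
  dt <- cbind(dt, tmp)
}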
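
A dynamic LHS does not have to go through eval(parse()): wrapping a character variable in parentheses on the left of `:=` makes data.table use its value as the new column name. A minimal sketch, assuming a cut-down two-row stand-in for the posted table and a hypothetical `brackets` list that maps each new column name to its source columns:

```r
library(data.table)

# Two-row stand-in for the posted table (values copied from its first two rows)
dt <- data.table(
  year  = 2014:2015,
  pop_0 = c(3971847L, 4000831L), pop_1 = c(3957864L, 3988161L),
  pop_2 = c(3972081L, 3974109L), pop_3 = c(4003272L, 3986357L),
  pop_4 = c(4001929L, 4015656L), pop_5 = c(4002977L, 4013264L),
  pop_6 = c(4132455L, 4013790L), pop_7 = c(4152653L, 4142998L),
  pop_8 = c(4118628L, 4163270L), pop_9 = c(4105776L, 4129322L)
)

# Hypothetical bracket definition: new column name -> source columns
brackets <- list(
  pop_0_3 = paste0("pop_", 0:3),
  pop_4_6 = paste0("pop_", 4:6),
  pop_7_9 = paste0("pop_", 7:9)
)

for (new_col in names(brackets)) {
  # (new_col) tells := to use the value of new_col as the column name;
  # .SDcols restricts .SD to the columns of this bracket
  dt[, (new_col) := rowSums(.SD, na.rm = TRUE), .SDcols = brackets[[new_col]]]
}

print(dt[, .(year, pop_0_3, pop_4_6, pop_7_9)])
```

No `by=` is needed here because rowSums() already works row by row. For a table with about 100 columns, the `brackets` list could be built from column positions instead of names.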

--Mel.


From mel at mbacou.com  Thu Dec 10 03:42:26 2015
From: mel at mbacou.com (Bacou, Melanie)
Date: Wed, 9 Dec 2015 21:42:26 -0500
Subject: [datatable-help] subset data table in i with multiple criteria
 for multiple variables
In-Reply-To: <1449684174047-4715347.post@n4.nabble.com>
References: <1449684174047-4715347.post@n4.nabble.com>
Message-ID: <5668E692.30201@mbacou.com>

Carl,
Are you just looking for the following syntax?

dt[a<35 & b>35, lapply(.SD, median, na.rm = TRUE), by = d]

You can include as many conditions as necessary in `i`. You can also 
chain data.tables:

dt[a<35][b>35][, lapply(.SD, median, na.rm = TRUE), by = d]
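
For what it's worth, the original attempt appears to fail for two separate reasons, which the sketch below illustrates on the same data. c(a<35, b>35) concatenates the two logical vectors into one of length 20 instead of combining them row by row, and median() returns an integer for an odd number of integer values but a double for an even number, so groups of different sizes can produce different column types. The as.numeric() wrapper is an addition made here, not part of the suggestion above; it simply forces one type for every group.

```r
library(data.table)

a <- seq(2L, 40L, by = 4L)
b <- seq(15L, 105L, by = 10L)
d <- rep(c(100L, 150L), 5L)
dt <- data.table(a, b, c = 1:10, d, e = 101:110)

# c() concatenates: 20 logical values for a 10-row table
vec <- c(a < 35, b > 35)
print(length(vec))

# & combines the two conditions element by element: one value per row
keep <- a < 35 & b > 35
print(length(keep))

# median() of integers: odd count keeps the integer type,
# even count averages the two middle values and returns a double
print(is.integer(median(c(1L, 2L, 3L))))
print(is.double(median(c(1L, 2L, 3L, 4L))))

# Forcing a numeric result gives every group the same column type
res <- dt[a < 35 & b > 35,
          lapply(.SD, function(x) as.numeric(median(x, na.rm = TRUE))),
          by = d]
print(res)
```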

--Mel.



From suttoncarl at ymail.com  Thu Dec 10 05:14:30 2015
From: suttoncarl at ymail.com (carlsutton)
Date: Wed, 9 Dec 2015 20:14:30 -0800 (PST)
Subject: [datatable-help] subset data table in i with multiple criteria
 for multiple variables
In-Reply-To: <5668E692.30201@mbacou.com>
References: <1449684174047-4715347.post@n4.nabble.com>
 <5668E692.30201@mbacou.com>
Message-ID: <1264388024.90808.1449721883029.JavaMail.yahoo@mail.yahoo.com>

Gosh, that is simple, elegant and wonderful. Feel kinda sorta silly for not thinking of it. Thank you for enlightening me.

My Dad once said if there was a hard way to do a simple task, he was confident I would find it. That malady has stuck with me oh so many years. But in 10 years as an aerospace engineer and 35 as a CPA, there was nothing simple, and an unknown unknown could be devastating. BTW, worked as a programmer (Fortran in the 60's) to pay for college, and did some programming at Lockheed for the flutter group.

Arun covered chaining in the Data Camp course and I use it frequently. On my personal project I am mired down in data exploration and investigating variable distributions, means, medians, et al. Some behave as expected, others have me scratching my head and muttering. Somewhere, somehow it all will make sense, but the big picture is eluding me.

Carl Sutton CPA
 


From aragorn168b at gmail.com  Thu Dec 10 14:28:36 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 10 Dec 2015 14:28:36 +0100
Subject: [datatable-help] Sum of sets of columns in data.table
In-Reply-To: <CALtLFXRv0ZEPcZLiOkQySwU59RAbumgA3CQPzn7GK2YSUhPbTg@mail.gmail.com>
References: <CALtLFXS6Q4pg4hVPVo0YjkfUFP-WuWreEMAwa8Wc3tKm+gf+Dw@mail.gmail.com>
 <CALtLFXRv0ZEPcZLiOkQySwU59RAbumgA3CQPzn7GK2YSUhPbTg@mail.gmail.com>
Message-ID: <etPan.56697e22.38d7ae17.36d@dragonfly>

data.table is a column-based data structure, so row-wise operations are not going to be as clean or efficient. rowSums() converts its input to a matrix first, which is inefficient, but we plan to take care of that in future releases.

I'd do something like:

# The 10 pop_* columns are split into brackets of 4, 3 and 3 columns (ages 0-3, 4-6, 7-9)
cols = split(grep("^pop", names(dt), value=TRUE), rep(1:3, times=c(4, 3, 3)))
ans = lapply(cols, function(col) dt[, rowSums(.SD, na.rm=TRUE), by=.(origin, race, sex, year, total_pop), .SDcols = col])

You can then bind these together however you wish. I don't think there's a cleaner way to do this, unless you reshape your data into long form. See the reshaping vignette in the Getting started wiki - https://github.com/Rdatatable/data.table/wiki/Getting-started - in case you're interested.
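
The long-form route mentioned above could look roughly like this. It is only a sketch on a two-row stand-in for the posted table; the helper columns `age_col`, `age` and `bracket` and the cut points are assumptions chosen to reproduce the 0-3, 4-6 and 7-9 brackets.

```r
library(data.table)

# Two-row stand-in for the posted table (values copied from its first two rows)
dt <- data.table(
  year  = 2014:2015,
  pop_0 = c(3971847L, 4000831L), pop_1 = c(3957864L, 3988161L),
  pop_2 = c(3972081L, 3974109L), pop_3 = c(4003272L, 3986357L),
  pop_4 = c(4001929L, 4015656L), pop_5 = c(4002977L, 4013264L),
  pop_6 = c(4132455L, 4013790L), pop_7 = c(4152653L, 4142998L),
  pop_8 = c(4118628L, 4163270L), pop_9 = c(4105776L, 4129322L)
)

# Wide -> long: one row per (year, age column)
long <- melt(dt, id.vars = "year", variable.name = "age_col", value.name = "pop")

# Recover the numeric age from names such as "pop_7"
long[, age := as.integer(sub("^pop_", "", as.character(age_col)))]

# Assign each age to a bracket: (-1,3], (3,6], (6,9]
long[, bracket := cut(age, breaks = c(-1, 3, 6, 9),
                      labels = c("pop_0_3", "pop_4_6", "pop_7_9"))]

# Sum within bracket, then go back to one column per bracket
sums <- long[, .(pop = sum(pop, na.rm = TRUE)), by = .(year, bracket)]
wide <- dcast(sums, year ~ bracket, value.var = "pop")
print(wide)
```

With the full table, the other identifier columns (origin, race, sex, total_pop) would go into `id.vars` and into the `by` and dcast formula alongside `year`.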


--
Arun

On 9 December 2015 at 13:19:11, Santosh Srinivas (santosh.srinivas at gmail.com) wrote:

Hello All,

I am sure there is a much more efficient way to do this. Please advise any suggestions.
For now, I have boot fixed this the crude way :-(

age_brackets <- c("pop_0:pop_3","pop_4:pop_6","pop_7:pop_9")

for (i in age_brackets) {
cmdText <- paste('dt[, paste("",i,sep=""):= rowSums(.SD, na.rm=TRUE), by=list(origin, race, sex,year, total_pop), .SDcols=',i,']', sep="")
print(cmdText)
eval(parse(text=cmdText))
}


On Tue, Dec 8, 2015 at 11:13 PM, Santosh Srinivas <santosh.srinivas at gmail.com> wrote:
Hello All,

I have a dataset as below with a reproducible example after that. My actual data has about 100 columns.

I want columns that represent the rowSums for sets .. eg. pop_0_3, pop_4_6, pop_7_9 ?.. this is sum of population in age group of 0-3 for example.

How can I do that using indexes of the columns?

---------------------------------------------------------------------------------------------------------------------------------------------------------

? ? origin race sex year total_pop ? pop_0 ? pop_1 ? pop_2 ? pop_3 ? pop_4 ? pop_5 ? pop_6 ? pop_7 ? pop_8 ? pop_9
?1: ? ? ?0 ? ?0 ? 0 2014 318748017 3971847 3957864 3972081 4003272 4001929 4002977 4132455 4152653 4118628 4105776
?2: ? ? ?0 ? ?0 ? 0 2015 321368864 4000831 3988161 3974109 3986357 4015656 4013264 4013790 4142998 4163270 4129322
?3: ? ? ?0 ? ?0 ? 0 2016 323995528 4029356 4017346 4004585 3988434 3998839 4026967 4024121 4024481 4153686 4174008
?4: ? ? ?0 ? ?0 ? 0 2017 326625791 4057231 4046063 4033932 4019069 4000955 4010232 4037777 4034839 4035311 4164487
?5: ? ? ?0 ? ?0 ? 0 2018 329256465 4083375 4074132 4062816 4048550 4031712 4012371 4021117 4048454 4045696 4046249
?6: ? ? ?0 ? ?0 ? 0 2019 331883986 4107606 4100469 4091055 4077589 4061316 4043229 4023269 4031853 4059256 4056646
?7: ? ? ?0 ? ?0 ? 0 2020 334503458 4128810 4124893 4117546 4105953 4090466 4072931 4054223 4034013 4042721 4070166
?8: ? ? ?0 ? ?0 ? 0 2021 337108968 4145903 4146269 4142090 4132527 4118898 4102128 4083950 4065004 4044832 4053623
?9: ? ? ?0 ? ?0 ? 0 2022 339698079 4159190 4163587 4163657 4157230 4145600 4130675 4113256 4094835 4075940 4055771
10: ? ? ?0 ? ?0 ? 0 2023 342267302 4169856 4177093 4181156 4178958 4170441 4157505 4141921 4124243 4105873 4086972


---------------------------------------------------------------------------------------------------------------------------------------------------------


# https://www.census.gov/population/projections/files/downloadables/NP2014_D1.csv

require("data.table")

dt <- structure(list(origin = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), race = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), sex = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = 2014:2023, total_pop = c(318748017L,
321368864L, 323995528L, 326625791L, 329256465L, 331883986L, 334503458L,
337108968L, 339698079L, 342267302L), pop_0 = c(3971847L, 4000831L,
4029356L, 4057231L, 4083375L, 4107606L, 4128810L, 4145903L, 4159190L,
4169856L), pop_1 = c(3957864L, 3988161L, 4017346L, 4046063L,
4074132L, 4100469L, 4124893L, 4146269L, 4163587L, 4177093L),
    pop_2 = c(3972081L, 3974109L, 4004585L, 4033932L, 4062816L,
    4091055L, 4117546L, 4142090L, 4163657L, 4181156L), pop_3 = c(4003272L,
    3986357L, 3988434L, 4019069L, 4048550L, 4077589L, 4105953L,
    4132527L, 4157230L, 4178958L), pop_4 = c(4001929L, 4015656L,
    3998839L, 4000955L, 4031712L, 4061316L, 4090466L, 4118898L,
    4145600L, 4170441L), pop_5 = c(4002977L, 4013264L, 4026967L,
    4010232L, 4012371L, 4043229L, 4072931L, 4102128L, 4130675L,
    4157505L), pop_6 = c(4132455L, 4013790L, 4024121L, 4037777L,
    4021117L, 4023269L, 4054223L, 4083950L, 4113256L, 4141921L
    ), pop_7 = c(4152653L, 4142998L, 4024481L, 4034839L, 4048454L,
    4031853L, 4034013L, 4065004L, 4094835L, 4124243L), pop_8 = c(4118628L,
    4163270L, 4153686L, 4035311L, 4045696L, 4059256L, 4042721L,
    4044832L, 4075940L, 4105873L), pop_9 = c(4105776L, 4129322L,
    4174008L, 4164487L, 4046249L, 4056646L, 4070166L, 4053623L,
    4055771L, 4086972L)), .Names = c("origin", "race", "sex",
"year", "total_pop", "pop_0", "pop_1", "pop_2", "pop_3", "pop_4",
"pop_5", "pop_6", "pop_7", "pop_8", "pop_9"), class = c("data.table",
"data.frame"), row.names = c(NA, -10L))


---------------------------------------------------------------------------------------------------------------------------------------------------------

Thank you.
Santosh
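
A minimal sketch of one way the grouped row sums asked about above could be computed by column position. It is not taken from the thread: the toy table, the `groups` list and the helper variable `first` are assumptions made for illustration, and the age columns are assumed to sit next to each other in age order.

```r
library(data.table)

# Toy stand-in for the census table: one row per year, one column per age.
ages <- setNames(lapply(0:9, function(a) c(a, 10 * a)), paste0("pop_", 0:9))
dt <- as.data.table(c(list(year = 2014:2015), ages))

# Position of the first age column; the other ages are offsets from it.
first <- match("pop_0", names(dt))

# Hypothetical grouping: new column name -> ages that go into it.
groups <- list(pop_0_3 = 0:3, pop_4_6 = 4:6, pop_7_9 = 7:9)

for (nm in names(groups)) {
  idx <- first + groups[[nm]]                  # integer column positions
  dt[, (nm) := rowSums(.SD), .SDcols = idx]    # add the row sum by reference
}
```

With about 100 columns only the `groups` list should need to change. Passing names such as `paste0("pop_", 0:3)` to `.SDcols` would work as well and does not depend on the column order.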

_______________________________________________  
datatable-help mailing list  
datatable-help at lists.r-forge.r-project.org  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20151210/3101c0ed/attachment.html>

From clark9876 at airquality.dk  Mon Dec 21 11:57:29 2015
From: clark9876 at airquality.dk (Douglas Clark)
Date: Mon, 21 Dec 2015 02:57:29 -0800 (PST)
Subject: [datatable-help] example(data.table) confusing use of "v"
Message-ID: <1450695449634-4715786.post@n4.nabble.com>

In the *Example* section of data.table-package documentation, `v` is used as
a column name, and later a variable `v`<-"X" is defined and used in
setkeyv(DT,v) to set the key to column `x`. Then `v` is again used as column
name. This is confusing to me. Is the double use of `v` as both column name
and variable intentional? See the extract of the Examples section below:


--
View this message in context: http://r.789695.n4.nabble.com/example-data-table-confusing-use-of-v-tp4715786.html
Sent from the datatable-help mailing list archive at Nabble.com.

From statquant at outlook.com  Mon Dec 21 16:50:34 2015
From: statquant at outlook.com (statquant3)
Date: Mon, 21 Dec 2015 07:50:34 -0800 (PST)
Subject: [datatable-help] join vignette for data.table
Message-ID: <1450713034814-4715792.post@n4.nabble.com>

Hello, is there a join vignette for data.table?
I realize that I am now using dplyr more and more just for semi_join and
anti_join.
Is there any plan to write those sugar functions?
Happy to do it here and get feedback.

C.


--
View this message in context: http://r.789695.n4.nabble.com/join-vignette-for-data-table-tp4715792.html
Sent from the datatable-help mailing list archive at Nabble.com.

From aragorn168b at gmail.com  Tue Dec 22 12:02:24 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 22 Dec 2015 12:02:24 +0100
Subject: [datatable-help] join vignette for data.table
In-Reply-To: <1450713034814-4715792.post@n4.nabble.com>
References: <1450713034814-4715792.post@n4.nabble.com>
Message-ID: <etPan.56792dde.28578f93.5208@dragonfly>

In the works. I've not yet managed to finish.

require(data.table)
A = data.table(x=1:2, y=3:4)
B = data.table(x=2:3, y=3:4)

anti-join:

A[!B, on="x"] # equivalently A[!B, .SD, on="x"]

semi-join:

A[B, .SD, nomatch=0L, on="x"]
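
A small sketch of what the two idioms return, under one added assumption: `B` is given a repeated value of `x` here (it has none in the example above) to show a semi-join variant that keeps each matching row of `A` only once. The names `anti`, `rows` and `semi` are illustrative.

```r
library(data.table)

A <- data.table(x = 1:2, y = 3:4)
B <- data.table(x = c(2L, 2L, 3L), y = c(3L, 3L, 4L))  # x = 2 appears twice

# Anti-join: rows of A whose x has no match in B.
anti <- A[!B, on = "x"]

# Semi-join: ask for the matching row numbers of A (which = TRUE),
# drop non-matches (nomatch = 0L), then de-duplicate before subsetting.
rows <- A[B, on = "x", which = TRUE, nomatch = 0L]
semi <- A[sort(unique(rows))]
```

The row-number form is one option when `B` may contain duplicates, since a plain join returns one row per match.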

-- 
Arun

On 21 December 2015 at 17:09:22, statquant3 (statquant at outlook.com) wrote:

Hello, is there a join vignette for data.table ?  
I realize that I am now more and more using dplyr just for semi_join and  
anti_join.  
Is there any plan to write those sugar function ?  
Happy to do it here and get feedback.  

C.  



From statquant at outlook.com  Tue Dec 22 14:42:37 2015
From: statquant at outlook.com (stat quant)
Date: Tue, 22 Dec 2015 14:42:37 +0100
Subject: [datatable-help] join vignette for data.table
In-Reply-To: <etPan.56792dde.28578f93.5208@dragonfly>
References: <1450713034814-4715792.post@n4.nabble.com>
 <etPan.56792dde.28578f93.5208@dragonfly>
Message-ID: <CAJJHHA-KoBEHPP_r8dOxCE=intvbxcfnef06pCpug=L7vmEbmw@mail.gmail.com>

I'd be happy to help.
Happy for me to have a look at it?

cheers

On Tuesday, 22 December 2015, Arunkumar Srinivasan <aragorn168b at gmail.com>
wrote:

> In the works. I've not yet managed to finish.
>
> require(data.table)
> A = data.table(x=1:2, y=3:4)
> B = data.table(x=2:3, y=3:4)
>
> anti-join:
>
> A[!B, on="x"] # equivalently A[!B, .SD, on="x"]
>
> semi-join:
>
> A[B, .SD, nomatch=0L, on="x"]
>
> --
> Arun
>
> On 21 December 2015 at 17:09:22, statquant3 (statquant at outlook.com) wrote:
>
> Hello, is there a join vignette for data.table ?
> I realize that I am now more and more using dplyr just for semi_join and
> anti_join.
> Is there any plan to write those sugar function ?
> Happy to do it here and get feedback.
>
> C.

From aragorn168b at gmail.com  Tue Dec 22 14:50:37 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 22 Dec 2015 14:50:37 +0100
Subject: [datatable-help] join vignette for data.table
In-Reply-To: <CAJJHHA-KoBEHPP_r8dOxCE=intvbxcfnef06pCpug=L7vmEbmw@mail.gmail.com>
References: <1450713034814-4715792.post@n4.nabble.com>
 <etPan.56792dde.28578f93.5208@dragonfly>
 <CAJJHHA-KoBEHPP_r8dOxCE=intvbxcfnef06pCpug=L7vmEbmw@mail.gmail.com>
Message-ID: <etPan.5679554b.101dacbb.5208@dragonfly>

Sure. It's on the list of tasks for this release as well, so I'll have to work on it. Will try to put it up somewhere.

But the most useful thing you / anyone else could do is to provide / point to a data set (preferably in connection with flights, maybe weather?) that I can use as a 2nd table to explain joins. I'm not quite satisfied with the small artificial dataset that I'm using at the moment.

Also, the kind of questions that could be asked from that dataset using those two (or more?) tables which requires joins (+ some other tasks) would be great! That's the hardest part for me at the moment, not the explanations :-).

-- 
Arun

On 22 December 2015 at 14:42:38, stat quant (statquant at outlook.com) wrote:

I'd be happy to help.
Happy for me to have a look at it?

cheers


From aragorn168b at gmail.com  Wed Dec 23 16:34:59 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 23 Dec 2015 16:34:59 +0100
Subject: [datatable-help] example(data.table) confusing use of "v"
In-Reply-To: <1450695449634-4715786.post@n4.nabble.com>
References: <1450695449634-4715786.post@n4.nabble.com>
Message-ID: <etPan.567abf41.c4b5d03.6882@dragonfly>

The "v" in `v <- "x"` bears no relation to column name "v". I understand how/why this can be confusing. Perhaps we should rename that to `keycol <- "x"` or something like that. Please file an issue, and even better as a PR.
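
A minimal sketch of the distinction, using the `keycol` name suggested above. The toy table is an assumption and is not the one from the package examples.

```r
library(data.table)

DT <- data.table(x = c("b", "a", "c"), v = 1:3)  # `v` here is a column

keycol <- "x"          # an ordinary R variable holding a column name
setkeyv(DT, keycol)    # setkeyv() takes the name as a character vector

key(DT)                # the key is now column x
total <- DT[, sum(v)]  # inside [ ], `v` still refers to the column
```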

-- 
Arun

On 21 December 2015 at 12:16:16, Douglas Clark (clark9876 at airquality.dk) wrote:

In the *Example* section of data.table-package documentation, `v` is used as  
a column name, and later a variable `v`<-"X" is defined and used in  
setkeyv(DT,v) to set the key to column `x`. Then `v` is again used as column  
name. This is confusing to me. Is the double use of `v` as both column name  
and variable intentional? See the extract of the Examples section below:  



From suttoncarl at ymail.com  Sat Dec 26 23:33:48 2015
From: suttoncarl at ymail.com (carlsutton)
Date: Sat, 26 Dec 2015 14:33:48 -0800 (PST)
Subject: [datatable-help] creating levels for a variable
Message-ID: <1451169228188-4715934.post@n4.nabble.com>

Please forgive me for asking such a basic question.  I have been fumbling
around trying to learn R for a couple years via Data Camp, Coursera, and
Lynda.com but keep amazing myself at how inadequate I am. 

I have searched this site and Stack Overflow and have not found an answer,
probably because the question is so basic.

I have some rather large data tables.  In performing my exploration work it
would be helpful to combine the values of one variable into separate ranges,
then explore the relationship of those ranges to other variables.

I have created a very simple example of what I am attempting to do, and cut
and pasted the resulting error messages.

The goal is to populate the variable levels with first, second or third,
depending on the value of variable "b".

#  create a range variable
set.seed(1)
library(data.table)

dt <-  data.table(a=1:10, b = sample(c(200:50000),10), c = 21:30, level =
1:10)
dt

dt1 <- dt[, .(':=' (if (dt$b < 12000) level = "first",
        if (dt$b >= 12000 & dt$b < 34000) level = "second",
        if (dt$b > 34000) level = "third"))]
#  gives error: Error in `:=`(if (dt$b < 12000) level = "first", if (dt$b >= 12000 & dt$b <
#  : Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j,
#  once only and in particular ways. See help(":=").
dt1

dt1 <- dt[, lapply(if (dt$b < 12000) level = "first",
                    if (dt$b >= 12000 & dt$b < 34000) level = "second",
                    if (dt$b > 34000) level = "third")]
#  gives error: Error in match.fun(FUN) :
#  'if (dt$b >= 12000 & dt$b < 34000) level = "second"' is not a function, character or symbol
#  In addition: Warning message: In if (dt$b >= 12000 & dt$b < 34000) level = "second" :
#  the condition has length > 1 and only the first element will be used
dt1

any help is appreciated.

Carl Sutton


-----
Carl Sutton
--
View this message in context: http://r.789695.n4.nabble.com/creating-levels-for-a-variable-tp4715934.html
Sent from the datatable-help mailing list archive at Nabble.com.

From malnamalja at gmx.de  Mon Dec 28 12:27:38 2015
From: malnamalja at gmx.de (el_alisio)
Date: Mon, 28 Dec 2015 03:27:38 -0800 (PST)
Subject: [datatable-help] creating levels for a variable
In-Reply-To: <1451169228188-4715934.post@n4.nabble.com>
References: <1451169228188-4715934.post@n4.nabble.com>
Message-ID: <1451302058726-4715955.post@n4.nabble.com>

Hi Carl,

you may want to check out the cut-function:

library("data.table")
dt <-  data.table(a = 1:10, b = sample(c(200:50000), 10), c = 21:30)
dt[, level := cut(b, breaks = c(0, 12000, 34000, max(b)),
                  labels = c("first", "second", "third"))]
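
A hedged variant of the same idea, sketched with fixed toy values instead of `sample()`. By default `cut()` builds intervals that are closed on the right, so 12000 itself would be labelled "first". Setting `right = FALSE` matches the `b < 12000` / `b >= 12000` conditions in the question, and open-ended breaks avoid depending on `max(b)`. The question leaves `b == 34000` unassigned, so placing it in "third" is an assumption.

```r
library(data.table)

# Toy values chosen to sit on and around the boundaries.
dt <- data.table(a = 1:5, b = c(200, 11999, 12000, 33999, 50000))

# Left-closed intervals: [-Inf, 12000), [12000, 34000), [34000, Inf)
dt[, level := cut(b,
                  breaks = c(-Inf, 12000, 34000, Inf),
                  labels = c("first", "second", "third"),
                  right  = FALSE)]
```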

Cheers,

Jannes


--
View this message in context: http://r.789695.n4.nabble.com/creating-levels-for-a-variable-tp4715934p4715955.html
Sent from the datatable-help mailing list archive at Nabble.com.

From malnamalja at gmx.de  Mon Dec 28 12:38:12 2015
From: malnamalja at gmx.de (el_alisio)
Date: Mon, 28 Dec 2015 03:38:12 -0800 (PST)
Subject: [datatable-help] append two factor vectors
In-Reply-To: <1450285986317-4715638.post@n4.nabble.com>
References: <1450285986317-4715638.post@n4.nabble.com>
Message-ID: <1451302692686-4715956.post@n4.nabble.com>

Hi,

how about:


q1 <- c("a1", "a2" ,"a3")
q2 <- c("b1", "b2", "b3") 
as.factor(c(q1, q2))

In case q1 and q2 are already factors, first convert them into characters:

q1 <- as.factor(q1)
q2 <- as.factor(q2)
as.factor(c(as.character(q1), as.character(q2)))
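
A short sketch that adds the `unique()` step the original question asks for. One value is shared between the two factors here, which is an assumption made so the de-duplication is visible.

```r
q1 <- factor(c("a1", "a2", "a3"))
q2 <- factor(c("b1", "b2", "a3"))   # "a3" occurs in both

# Combine via character so the labels, not the integer codes, are joined.
qjoin <- factor(c(as.character(q1), as.character(q2)))

quniq <- unique(qjoin)              # still a factor, duplicates removed
```

From R 4.1.0 onwards `c()` applied to factors returns a factor with the combined levels, so the detour through `as.character()` matters mainly on older versions of R.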

Cheers,

Jannes


agent dunham wrote
> Dear community, 
> 
> i have two vector factors that i need to join into a single vector, and
> afterwards, a unique over this final vector. 
> Can anybody help?
> 
> Example: 
> 
> q1 <- c("a1","a2","a3")
> q2 <- c("b1","b2","b3")
> 
> q1 <- as.factor(q1)
> q2 <- as.factor(q2)
> 
> I have this q's. Then, I'd need:
> 
> qjoin <- c(q1,q2)
> 
> BUT, then I see: 1 2 3 1 2 3
> rather than "a1" "a2" "a3" "b1" "b2" "b3"


--
View this message in context: http://r.789695.n4.nabble.com/append-two-factor-vectors-tp4715638p4715956.html
Sent from the datatable-help mailing list archive at Nabble.com.

From fjbuch at gmail.com  Tue Dec 29 00:41:51 2015
From: fjbuch at gmail.com (Farrel Buchinsky)
Date: Mon, 28 Dec 2015 23:41:51 +0000
Subject: [datatable-help] IDateTime and missing values
Message-ID: <CACAnkwBdocKyQuiLOVFiLkv9gSWwFACEd5XOvyQg2mU36mKFjg@mail.gmail.com>

I have a data.table that contains real data. That means that some values
are missing. I am trying to convert one column into data.table's idate and
itime. Alas I am running into problems.

But look at this

> IDateTime(as.POSIXlt(c("2015-05-01 13:46:23", "2015-05-03 16:40:00")))
        idate    itime
1: 2015-05-01 13:46:23
2: 2015-05-03 16:40:00

> IDateTime(as.POSIXlt(c("2015-05-01 13:46:23", "2015-05-03 16:40:00", NA)))
Error in if (any(neg)) res[neg] = paste("-", res[neg], sep = "") :
  missing value where TRUE/FALSE needed

A missing value screws everything up. Any insights or suggestions for
me? At this point I will have to systematically select the rows that meet
!is.na(datevariable) and perform the IDateTime function on those.
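
A minimal sketch of the row-selection workaround described above. The column name `ts` and the use of POSIXct in UTC are assumptions made for illustration.

```r
library(data.table)

# Toy data: the third timestamp is missing.
dt <- data.table(ts = as.POSIXct(c("2015-05-01 13:46:23",
                                   "2015-05-03 16:40:00", NA), tz = "UTC"))

# Convert only the non-missing rows; the new columns stay NA elsewhere.
dt[!is.na(ts), c("idate", "itime") := IDateTime(ts)]
```

Restricting `i` to the non-missing rows means `IDateTime()` never sees an NA, and the skipped rows are filled with NA in both new columns.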


-- 
Farrel Buchinsky
(412) 567-7870 (gets me everywhere)

From fjbuch at gmail.com  Tue Dec 29 01:27:17 2015
From: fjbuch at gmail.com (Farrel Buchinsky)
Date: Tue, 29 Dec 2015 00:27:17 +0000
Subject: [datatable-help] IDateTime and missing values
In-Reply-To: <CACAnkwBdocKyQuiLOVFiLkv9gSWwFACEd5XOvyQg2mU36mKFjg@mail.gmail.com>
References: <CACAnkwBdocKyQuiLOVFiLkv9gSWwFACEd5XOvyQg2mU36mKFjg@mail.gmail.com>
Message-ID: <CACAnkwAdwYQ5LVPA9zkNSn-HXeHduU9BvCGC4JqHiS86NKcs0w@mail.gmail.com>

By the way, as.IDate can handle the NA but IDateTime cannot.

On Mon, Dec 28, 2015 at 6:41 PM Farrel Buchinsky <fjbuch at gmail.com> wrote:

> I have a data.table that contains real data. That means that some values
> are missing. I am trying to convert one column into data.table's idate and
> itime. Alas I am running into problems.
>
> But look at this
>
> > IDateTime(as.POSIXlt(c("2015-05-01 13:46:23", "2015-05-03 16:40:00")))
>         idate    itime
> 1: 2015-05-01 13:46:23
> 2: 2015-05-03 16:40:00
>
> > IDateTime(as.POSIXlt(c("2015-05-01 13:46:23", "2015-05-03 16:40:00", NA)))
> Error in if (any(neg)) res[neg] = paste("-", res[neg], sep = "") :
>   missing value where TRUE/FALSE needed
>
> A missing variable screws everything up. Any insights or suggestions for
> me? At this point I will have to systematically select rows that meet !
> is.na(datevariable) to perform the IDateTime function on.
>
>
> --
> Farrel Buchinsky
> (412) 567-7870 (gets me everywhere)
>
-- 
Farrel Buchinsky
(412) 567-7870 (gets me everywhere)

From rv15i at yahoo.se  Thu Dec 31 17:20:03 2015
From: rv15i at yahoo.se (ravi)
Date: Thu, 31 Dec 2015 16:20:03 +0000 (UTC)
Subject: [datatable-help] help in joining two data tables
References: <1649899131.8398424.1451578803532.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <1649899131.8398424.1451578803532.JavaMail.yahoo@mail.yahoo.com>

Hi,

I have some trouble in understanding the data.table procedure for joining two tables. Let me start by taking up two example data tables:
library(data.table)
############ the first data.table example
mt<-data.table(mtcars)
## some modifications to the data.table
s1<-1:32;s1[seq(2,32,by=2)]<-NA
mt[,"cntrl":=s1];mt[,"cylO":=cyl];mt[,"cyl":=cyl*2]
setkey(mt,gear,carb,cylO,cntrl)
mt
## More modifications
mt[gear == 3 & carb ==3 & cylO == 8 & mpg == 16.4,cntrl:=14]
str(mt)
mt
############## the second data.table example
nt<-data.table(gear= c(3,3,3),carb=c(1,3,3),cylO=c(4,8,8),price=c(11,44,55),cntrl=c(21,13,14))
setkey(nt,gear,carb,cylO,cntrl)
############# merging as a data frame
rdJoin<-merge.data.frame(mt,nt,by.x=c("gear","carb","cylO","cntrl"),by.y=c("gear","carb","cylO","cntrl"),all.x=TRUE)
str(rdJoin)
rdJoin
############## questions
# What is the data.table command to get rdJoin?
# How is it possible to specify the key variables for the join -see below
# For example, c("gear","carb"), c("gear","carb","cylO"), etc.
# Also, where the variables have different names in the two tables
# For example, if the cntrl variable in the first DT is "cntrl1" and "cntrl2" in the second
Let me elaborate on the questions shown above. First, I would like to start with some general questions:

1. In the documentation for data.table (which includes the vignettes available so far), it is mentioned that it is sufficient if one of the two data tables being considered has keys. This is a bit confusing. The straightforward situation is when both tables have keys. When would it be an advantage to have keys for just one of them? It would be nice if this could be explained in the to-be-released vignette on joins.

2. The merge command in base R is very clear and easy to understand. It would be nice if the data.table procedure were transparent in the same way. To start with, I would like to know how I can do the following things with data.table:

   (i) The data.table equivalent of the base R command
       merge.data.frame(mt,nt,by.x=c("gear","carb","cylO","cntrl"),by.y=c("gear","carb","cylO","cntrl"),all.x=TRUE)

   (ii) How is it possible to choose the number of key variables from a list:
        c("gear","carb")
        c("gear","carb","cylO")
        c("gear","carb","cylO","cntrl")
        It is very clear in the merge command how this is done. How can that be done with data.table? The on argument can be used for one of the tables. How can it be specified for the other? That is, without having to use the setkey command each time a change is needed.

   (iii) How can this be done if the key variables in the two lists have different names? That is, if the cntrl variable is "cntrl1" in the first DT and "cntrl2" in the second, for example.

I have found the data.table package to be very useful. It would be nice if I could understand its use better.

Thanks for any help that I can get.

Ravi
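
A hedged sketch of how questions (i) to (iii) might be approached. The two toy tables `X` and `Y` and their columns are assumptions standing in for `mt` and `nt`. The control column is deliberately named `cntrl1` in one table and `cntrl2` in the other.

```r
library(data.table)

X <- data.table(gear = c(3, 3, 4), carb = c(1, 3, 1),
                cntrl1 = c(21, 13, 5), mpg = c(21.4, 16.4, 22.8))
Y <- data.table(gear = c(3, 3), carb = c(1, 3),
                cntrl2 = c(21, 13), price = c(11, 44))

# (i) Left join with merge(); the data.table method accepts by.x / by.y.
m1 <- merge(X, Y,
            by.x = c("gear", "carb", "cntrl1"),
            by.y = c("gear", "carb", "cntrl2"),
            all.x = TRUE)

# (ii) Join on a chosen subset of columns via `on`; no setkey() is needed.
#      Y[X, ...] keeps every row of X, i.e. a left join with X on the left.
j2 <- Y[X, on = c("gear", "carb")]

# (iii) Differently named columns: names refer to Y (the table before the
#       bracket) and values to X (the table inside it).
j3 <- Y[X, on = c(gear = "gear", carb = "carb", cntrl2 = "cntrl1")]
```

In this form a single `on` argument describes the columns of both tables, so changing the set of join columns is a matter of editing that one vector.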



From rv15i at yahoo.se  Thu Dec 31 17:58:13 2015
From: rv15i at yahoo.se (ravi)
Date: Thu, 31 Dec 2015 16:58:13 +0000 (UTC)
Subject: [datatable-help] help in joining two data tables
In-Reply-To: <1649899131.8398424.1451578803532.JavaMail.yahoo@mail.yahoo.com>
References: <1649899131.8398424.1451578803532.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <86618974.8340374.1451581093979.JavaMail.yahoo@mail.yahoo.com>

Hi,

I just want to follow up, after going through the documentation once again.
############################
rtJoin<-nt[mt]  # outer left join
rtJoin
identical(rtJoin,rdJoin) # False
coli<-names(rdJoin)
setcolorder(rtJoin,coli)
identical(rtJoin,rdJoin) # False again, the row order appears to be different
##########################
I think that the equivalent data.table command for a left outer join is:
nt[mt]
But identical() returned FALSE. This stayed so even after the column order was set to be the same in the two cases. Now, the row order is different.

So, my next question is: how does one compare two data tables to check that the results are the same? I have now landed in a very different question from my original one.

Thanks for any help.

Ravi
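
One possible way to compare two results regardless of row order, column order and keys, sketched with toy tables. The helper name `canon` is hypothetical, and the approach assumes both inputs have the same column names and types.

```r
library(data.table)

# Bring a table into a canonical form: unkeyed, columns sorted by name,
# rows sorted by all columns. Works on a copy, so the input is untouched.
canon <- function(d) {
  d <- copy(as.data.table(d))
  setkey(d, NULL)
  setcolorder(d, sort(names(d)))
  setorderv(d, names(d))
  d
}

a <- data.table(id = c(2L, 1L, 3L), val = c("b", "a", "c"))
b <- data.table(val = c("a", "b", "c"), id = 1:3)
setkey(b, id)

strict <- identical(a, b)                           # FALSE: layout differs
same   <- isTRUE(all.equal(canon(a), canon(b)))     # compares the content
```

`identical()` also compares class, key and other attributes, which is one reason a keyed data.table and the data.frame returned by `merge.data.frame()` will not match even when the values agree. Recent data.table versions also document `ignore.row.order` and `ignore.col.order` arguments for `all.equal()`, which may make the helper unnecessary.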

