From santosh.srinivas at gmail.com Tue Dec 8 18:43:35 2015
From: santosh.srinivas at gmail.com (Santosh Srinivas)
Date: Tue, 8 Dec 2015 23:13:35 +0530
Subject: [datatable-help] Sum of sets of columns in data.table
Message-ID:

Hello All,

I have a dataset as below with a reproducible example after that. My actual data has about 100 columns.

I want columns that represent the rowSums for sets, e.g. pop_0_3, pop_4_6, pop_7_9; pop_0_3, for example, is the sum of the population in the age group 0-3.

How can I do that using indexes of the columns?

------------------------------------------------------------------------------------------------------------------
    origin race sex year total_pop   pop_0   pop_1   pop_2   pop_3   pop_4   pop_5   pop_6   pop_7   pop_8   pop_9
 1:      0    0   0 2014 318748017 3971847 3957864 3972081 4003272 4001929 4002977 4132455 4152653 4118628 4105776
 2:      0    0   0 2015 321368864 4000831 3988161 3974109 3986357 4015656 4013264 4013790 4142998 4163270 4129322
 3:      0    0   0 2016 323995528 4029356 4017346 4004585 3988434 3998839 4026967 4024121 4024481 4153686 4174008
 4:      0    0   0 2017 326625791 4057231 4046063 4033932 4019069 4000955 4010232 4037777 4034839 4035311 4164487
 5:      0    0   0 2018 329256465 4083375 4074132 4062816 4048550 4031712 4012371 4021117 4048454 4045696 4046249
 6:      0    0   0 2019 331883986 4107606 4100469 4091055 4077589 4061316 4043229 4023269 4031853 4059256 4056646
 7:      0    0   0 2020 334503458 4128810 4124893 4117546 4105953 4090466 4072931 4054223 4034013 4042721 4070166
 8:      0    0   0 2021 337108968 4145903 4146269 4142090 4132527 4118898 4102128 4083950 4065004 4044832 4053623
 9:      0    0   0 2022 339698079 4159190 4163587 4163657 4157230 4145600 4130675 4113256 4094835 4075940 4055771
10:      0    0   0 2023 342267302 4169856 4177093 4181156 4178958 4170441 4157505 4141921 4124243 4105873 4086972
------------------------------------------------------------------------------------------------------------------

# https://www.census.gov/population/projections/files/downloadables/NP2014_D1.csv

require("data.table")

dt <- structure(list(origin = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), race = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), sex = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = 2014:2023, total_pop = c(318748017L,
321368864L, 323995528L, 326625791L, 329256465L, 331883986L, 334503458L,
337108968L, 339698079L, 342267302L), pop_0 = c(3971847L, 4000831L,
4029356L, 4057231L, 4083375L, 4107606L, 4128810L, 4145903L, 4159190L,
4169856L), pop_1 = c(3957864L, 3988161L, 4017346L, 4046063L,
4074132L, 4100469L, 4124893L, 4146269L, 4163587L, 4177093L),
    pop_2 = c(3972081L, 3974109L, 4004585L, 4033932L, 4062816L,
    4091055L, 4117546L, 4142090L, 4163657L, 4181156L), pop_3 = c(4003272L,
    3986357L, 3988434L, 4019069L, 4048550L, 4077589L, 4105953L,
    4132527L, 4157230L, 4178958L), pop_4 = c(4001929L, 4015656L,
    3998839L, 4000955L, 4031712L, 4061316L, 4090466L, 4118898L,
    4145600L, 4170441L), pop_5 = c(4002977L, 4013264L, 4026967L,
    4010232L, 4012371L, 4043229L, 4072931L, 4102128L, 4130675L,
    4157505L), pop_6 = c(4132455L, 4013790L, 4024121L, 4037777L,
    4021117L, 4023269L, 4054223L, 4083950L, 4113256L, 4141921L
    ), pop_7 = c(4152653L, 4142998L, 4024481L, 4034839L, 4048454L,
    4031853L, 4034013L, 4065004L, 4094835L, 4124243L), pop_8 = c(4118628L,
    4163270L, 4153686L, 4035311L, 4045696L, 4059256L, 4042721L,
    4044832L, 4075940L, 4105873L), pop_9 = c(4105776L, 4129322L,
    4174008L, 4164487L, 4046249L, 4056646L, 4070166L, 4053623L,
    4055771L, 4086972L)), .Names = c("origin", "race", "sex",
"year", "total_pop", "pop_0", "pop_1", "pop_2", "pop_3", "pop_4",
"pop_5", "pop_6", "pop_7", "pop_8", "pop_9"), class = c("data.table",
"data.frame"), row.names = c(NA, -10L))

------------------------------------------------------------------------------------------------------------------

Thank you.
Santosh

From santosh.srinivas at gmail.com Wed Dec 9 13:19:01 2015
From: santosh.srinivas at gmail.com (Santosh Srinivas)
Date: Wed, 9 Dec 2015 17:49:01 +0530
Subject: [datatable-help] Sum of sets of columns in data.table
In-Reply-To:
References:
Message-ID:

Hello All,

I am sure there is a much more efficient way to do this. Any suggestions are welcome.
For now, I have fixed this the crude way :-(

age_brackets <- c("pop_0:pop_3","pop_4:pop_6","pop_7:pop_9")

for (i in age_brackets) {
  cmdText <- paste('dt[, paste("",i,sep=""):= rowSums(.SD, na.rm=TRUE), by=list(origin, race, sex,year, total_pop), .SDcols=',i,']', sep="")
  print(cmdText)
  eval(parse(text=cmdText))
}

From suttoncarl at ymail.com Wed Dec 9 19:02:54 2015
From: suttoncarl at ymail.com (carlsutton)
Date: Wed, 9 Dec 2015 10:02:54 -0800 (PST)
Subject: [datatable-help] subset data table in i with multiple criteria for multiple variables
Message-ID: <1449684174047-4715347.post@n4.nabble.com>

Is there a way to subset a data table using "i" with multiple criteria using multiple variables (columns)? I have some test code shown below on what I have tried. And yes, I have read the documentation, taken the Data Camp class (multiple viewings, I'm a slow learner) and have not seen anything relevant. I also checked for questions on this topic in this forum and did not find an answer to my query.
Attempting to upload R file
dataTableExamples.R

Probably should have stayed in bed today the way things are going.

A cut and paste from RStudio
# Data Table exercises
require(data.table)
a <- seq(2L,40L, by = 4L)
b <- seq(15L,105L,by = 10L)
c <- 1:10L
d <- rep(c(100L,150L),5L)
e <- 101:110L
dt <- data.table(a,b,c,d,e)
dt
dta <- subset(dt, a < 35)
dtb <- subset(dta, b > 35)
dtb
dtb[, lapply(.SD,median), by = d]
# Now attempt to subset the rows in i
vec <- c(a<35, b>35)
dtvec <- dt[vec, lapply(.SD, median, na.rm = TRUE), by = d]
dtvec

And console output
> # Now attempt to subset the rows in i
> vec <- c(a<35, b>35)
> dtvec <- dt[vec, lapply(.SD, median, na.rm = TRUE), by = d]
Error in `[.data.table`(dt, vec, lapply(.SD, median, na.rm = TRUE), by = d) :
  Column 1 of result for group 2 is type 'double' but expecting type 'integer'. Column types must be consistent for each group.
> dtvec
     a  b  c   d   e
 1:  2 15  1 100 101
 2:  6 25  2 150 102
 3: 10 35  3 100 103
 4: 14 45  4 150 104
 5: 18 55  5 100 105
 6: 22 65  6 150 106
 7: 26 75  7 100 107
 8: 30 85  8 150 108
 9: 34 95  9 100 109
10: NA NA NA  NA  NA
11: NA NA NA  NA  NA
12: NA NA NA  NA  NA
13: NA NA NA  NA  NA
14: NA NA NA  NA  NA
15: NA NA NA  NA  NA
16: NA NA NA  NA  NA

The error message has me confused.
  /Column 1 of result for group 2 is type/
What group 2? I have only grouped on column "d". Result 1 is ???? No idea what "result 1" is referring to: is it the subset in "i", or the median of col a?? No clue.

I have only created integer variables for the data table, so why the rejection "Column 1 of result for group 2 is type 'double' but expecting type 'integer'. Column types must be consistent for each group." What double? I have not created any double numbers.
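
A note on the error above, offered as a sketch rather than a definitive diagnosis. c(a<35, b>35) concatenates the two tests into one logical vector of length 20 instead of combining them row by row, which would explain the 16 rows (seven of them NA) in dtvec. Among the rows it selects, group d = 100 has five values and group d = 150 has four, and base R's median() returns an integer for an odd count but a double for an even count, hence the type complaint for group 2. The reduced table (columns a, b and d only), the name keep and the as.numeric() wrapper are assumptions made for illustration.

```r
library(data.table)

a <- seq(2L, 40L, by = 4L)
b <- seq(15L, 105L, by = 10L)
d <- rep(c(100L, 150L), 5L)
dt <- data.table(a, b, d)

# c() concatenates the two tests instead of combining them element-wise.
vec <- c(a < 35, b > 35)
length(vec)   # 20, although dt has only 10 rows

# median() of an even number of integers averages the two middle values
# and so returns a double; for an odd number it returns an integer.
typeof(median(c(2L, 6L, 10L)))   # "integer"
typeof(median(c(2L, 6L)))        # "double"

# Combining the conditions with & gives one logical value per row.
keep <- a < 35 & b > 35

# as.numeric() keeps the result type identical for every group.
res <- dt[keep,
          lapply(.SD, function(x) as.numeric(median(x, na.rm = TRUE))),
          by = d]
res
```

Converting every median to numeric keeps the column type the same across groups whatever the group sizes turn out to be.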

Carl Sutton

-----
Carl Sutton
--
View this message in context: http://r.789695.n4.nabble.com/subset-data-table-in-i-with-multiple-criteria-for-multiple-variables-tp4715347.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mel at mbacou.com Thu Dec 10 03:38:28 2015
From: mel at mbacou.com (Bacou, Melanie)
Date: Wed, 9 Dec 2015 21:38:28 -0500
Subject: [datatable-help] Sum of sets of columns in data.table
In-Reply-To:
References:
Message-ID: <5668E5A4.2090909@mbacou.com>

I come across this problem on a regular basis as well, and always end up fiddling for a while. Because the LHS of `:=` is also dynamic, I'm not sure there's any more elegant approach. One alternative might be to create several temporary data.tables holding the rowSums and then cbind()?

for (i in age_brackets) {
  tmp <- dt[, rowSums(.SD, na.rm=T), by=.(origin, race, sex,year, total_pop), .SDcols=i]
  dt <- cbind(dt, tmp)
}

--Mel.

From mel at mbacou.com Thu Dec 10 03:42:26 2015
From: mel at mbacou.com (Bacou, Melanie)
Date: Wed, 9 Dec 2015 21:42:26 -0500
Subject: [datatable-help] subset data table in i with multiple criteria for multiple variables
In-Reply-To: <1449684174047-4715347.post@n4.nabble.com>
References: <1449684174047-4715347.post@n4.nabble.com>
Message-ID: <5668E692.30201@mbacou.com>

Carl,

Are you just looking for the following syntax?

dt[a<35 & b>35, lapply(.SD, median, na.rm = T), by = d]

You can include as many conditions as necessary in `i`. You can also chain data.tables:

dt[a<35][b>35][, lapply(.SD, median, na.rm = T), by = d]

--Mel.

From suttoncarl at ymail.com Thu Dec 10 05:14:30 2015
From: suttoncarl at ymail.com (carlsutton)
Date: Wed, 9 Dec 2015 20:14:30 -0800 (PST)
Subject: [datatable-help] subset data table in i with multiple criteria for multiple variables
In-Reply-To: <5668E692.30201@mbacou.com>
References: <1449684174047-4715347.post@n4.nabble.com> <5668E692.30201@mbacou.com>
Message-ID: <1264388024.90808.1449721883029.JavaMail.yahoo@mail.yahoo.com>

Gosh that is simple, elegant and wonderful. Feel kinda sorta silly for not thinking of it. Thank you for enlightening me.

My Dad once said if there was a hard way to do a simple task, he was confident I would find it. That malady has stuck with me oh so many years. But in 10 years as an aerospace engineer and 35 as a CPA, there was nothing simple, and an unknown unknown could be devastating. BTW, worked as a programmer (Fortran in the 60's) to pay for college, and did some programming at Lockheed for the flutter group.

Arun covered chaining in the Data Camp course and I use it frequently. On my personal project I am mired down in data exploration and investigating variable distributions, means, medians, et al. Some behave as expected, others have me scratching my head and muttering. Somewhere somehow it all will make sense, but the big picture is eluding me.

Carl Sutton CPA

-----
Carl Sutton
--
View this message in context: http://r.789695.n4.nabble.com/subset-data-table-in-i-with-multiple-criteria-for-multiple-variables-tp4715347p4715361.html
Sent from the datatable-help mailing list archive at Nabble.com.

From aragorn168b at gmail.com Thu Dec 10 14:28:36 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 10 Dec 2015 14:28:36 +0100
Subject: [datatable-help] Sum of sets of columns in data.table
In-Reply-To:
References:
Message-ID:

data.table is a column-based data structure. Row-wise operations are not going to be as clean or efficient. rowSums() converts the input to a matrix first, which is inefficient. But we plan to take care of that in future releases.

I'd do something like:

# group the ten pop_* columns as ages 0-3, 4-6 and 7-9
cols = split(grep("^pop", names(dt), value=TRUE), rep(1:3, times=c(4, 3, 3)))
ans = lapply(cols, function(col) dt[, rowSums(.SD, na.rm=TRUE), by=.(origin, race, sex, year, total_pop), .SDcols = col])

You can then bind these together however you wish.
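
As a minimal sketch of the same idea that adds the bracket sums to the table in place, assuming a small made-up stand-in for the census data: the list `brackets`, the loop variable `nm` and the toy values are hypothetical and only there for illustration. Wrapping the name in parentheses, `(nm) :=`, tells data.table to use the value of `nm` as the new column name, and `.SDcols` restricts `.SD` to the columns of one bracket (it also accepts integer column positions, if indexes are preferred over names).

```r
library(data.table)

# Small stand-in for the census table: two rows, ages 0 to 9.
dt <- data.table(
  year  = 2014:2015,
  pop_0 = c(10L, 11L), pop_1 = c(20L, 21L), pop_2 = c(30L, 31L),
  pop_3 = c(40L, 41L), pop_4 = c(50L, 51L), pop_5 = c(60L, 61L),
  pop_6 = c(70L, 71L), pop_7 = c(80L, 81L), pop_8 = c(90L, 91L),
  pop_9 = c(100L, 101L)
)

# Hypothetical bracket definition: new column name -> ages it covers.
brackets <- list(pop_0_3 = 0:3, pop_4_6 = 4:6, pop_7_9 = 7:9)

for (nm in names(brackets)) {
  cols <- paste0("pop_", brackets[[nm]])
  # (nm) makes := use the value of nm as the column name.
  dt[, (nm) := rowSums(.SD, na.rm = TRUE), .SDcols = cols]
}

dt[, .(year, pop_0_3, pop_4_6, pop_7_9)]
```

No grouping is needed here because rowSums() already works row by row; rowSums() returns doubles, so the new columns are numeric rather than integer.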

I don't think there's a cleaner way to do this, unless you reshape your data into long form. See the reshaping vignette in the Getting started wiki - https://github.com/Rdatatable/data.table/wiki/Getting-started in case you're interested.

--
Arun

From clark9876 at airquality.dk Mon Dec 21 11:57:29 2015
From: clark9876 at airquality.dk (Douglas Clark)
Date: Mon, 21 Dec 2015 02:57:29 -0800 (PST)
Subject: [datatable-help] example(data.table) confusing use of "v"
Message-ID: <1450695449634-4715786.post@n4.nabble.com>

In the *Example* section of data.table-package documentation, `v` is used as a column name, and later a variable `v`<-"X" is defined and used in setkeyv(DT,v) to set the key to column `x`. Then `v` is again used as column name. This is confusing to me. Is the double use of `v` as both column name and variable intentional?

See the extract of the Examples section below:

--
View this message in context: http://r.789695.n4.nabble.com/example-data-table-confusing-use-of-v-tp4715786.html
Sent from the datatable-help mailing list archive at Nabble.com.
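
The extract itself is not part of the message, so the following is only a small made-up illustration of the two meanings of `v`, not the documentation's actual example; the table contents are hypothetical. It assumes nothing beyond setkeyv() being an ordinary function call (its second argument is evaluated as a normal R variable) and column names being looked up first inside DT[...].

```r
library(data.table)

DT <- data.table(x = c("b", "a", "c"), v = 1:3)

# A variable in the calling environment that happens to be named v.
v <- "x"

# setkeyv() is an ordinary function call, so v here is the variable ("x").
setkeyv(DT, v)
key(DT)        # "x"

# Inside DT[...], column names are looked up first, so v is the column.
DT[, sum(v)]   # 6
```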

From statquant at outlook.com Mon Dec 21 16:50:34 2015
From: statquant at outlook.com (statquant3)
Date: Mon, 21 Dec 2015 07:50:34 -0800 (PST)
Subject: [datatable-help] join vignette for data.table
Message-ID: <1450713034814-4715792.post@n4.nabble.com>

Hello, is there a join vignette for data.table?
I realize that I am now using dplyr more and more just for semi_join and anti_join.
Is there any plan to write those sugar functions?
Happy to do it here and get feedback.

C.

--
View this message in context: http://r.789695.n4.nabble.com/join-vignette-for-data-table-tp4715792.html
Sent from the datatable-help mailing list archive at Nabble.com.

From aragorn168b at gmail.com Tue Dec 22 12:02:24 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 22 Dec 2015 12:02:24 +0100
Subject: [datatable-help] join vignette for data.table
In-Reply-To: <1450713034814-4715792.post@n4.nabble.com>
References: <1450713034814-4715792.post@n4.nabble.com>
Message-ID:

In the works. I've not yet managed to finish.

require(data.table)
A = data.table(x=1:2, y=3:4)
B = data.table(x=2:3, y=3:4)

anti-join:

A[!B, on="x"] # equivalently A[!B, .SD, on="x"]

semi-join:

A[B, .SD, nomatch=0L, on="x"]

--
Arun
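
A small sketch of the anti-join and semi-join idea on made-up tables, assuming a single join column `x`. The duplicate key in `B` and the `%in%` formulation of the semi-join are additions for illustration, not something taken from the thread; the point is that a plain inner join repeats rows of `A` when `B` has repeated keys, whereas a semi-join should return each matching row of `A` once.

```r
library(data.table)

A <- data.table(x = 1:2, y = 3:4)
B <- data.table(x = c(2L, 2L, 3L), z = c("p", "q", "r"))

# Anti-join: rows of A that have no match in B.
anti <- A[!B, on = "x"]

# Semi-join on one column: rows of A with at least one match in B,
# each returned once even though x = 2 appears twice in B.
semi <- A[x %in% B$x]

# For comparison, an inner join returns one row per matching pair.
inner <- A[B, on = "x", nomatch = 0L]

anti
semi
inner
```

With several join columns `%in%` no longer applies directly; joining on the unique key combinations of `B` would be one option.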
From statquant at outlook.com Tue Dec 22 14:42:37 2015
From: statquant at outlook.com (stat quant)
Date: Tue, 22 Dec 2015 14:42:37 +0100
Subject: [datatable-help] join vignette for data.table
In-Reply-To:
References: <1450713034814-4715792.post@n4.nabble.com>
Message-ID:

I'd be happy to help. Happy for me to have a look at it?

cheers

From aragorn168b at gmail.com Tue Dec 22 14:50:37 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 22 Dec 2015 14:50:37 +0100
Subject: [datatable-help] join vignette for data.table
In-Reply-To:
References: <1450713034814-4715792.post@n4.nabble.com>
Message-ID:

Sure. It's on the list of tasks for this release as well, so I'll have to work on it. Will try to put it up somewhere.
But the most useful thing you / anyone else could do is to provide / point to a data set (preferably in connection with flights, maybe weather?) that I can use as a 2nd table to explain joins. I'm not quite satisfied with the small artificial dataset that I'm using at the moment. Also, the kind of questions that could be asked from that dataset using those two (or more?) tables which require joins (+ some other tasks) would be great! That's the hardest part for me at the moment, not the explanations :-).

--
Arun
From aragorn168b at gmail.com Wed Dec 23 16:34:59 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 23 Dec 2015 16:34:59 +0100
Subject: [datatable-help] example(data.table) confusing use of "v"
In-Reply-To: <1450695449634-4715786.post@n4.nabble.com>
References: <1450695449634-4715786.post@n4.nabble.com>
Message-ID:

The "v" in `v <- "x"` bears no relation to the column name "v". I understand how/why this can be confusing. Perhaps we should rename that to `keycol <- "x"` or something like that. Please file an issue or, even better, a PR.

--
Arun

From suttoncarl at ymail.com Sat Dec 26 23:33:48 2015
From: suttoncarl at ymail.com (carlsutton)
Date: Sat, 26 Dec 2015 14:33:48 -0800 (PST)
Subject: [datatable-help] creating levels for a variable
Message-ID: <1451169228188-4715934.post@n4.nabble.com>

Please forgive me for asking such a basic question. I have been fumbling around trying to learn R for a couple years via Data Camp, Coursera, and Lynda.com but keep amazing myself at how inadequate I am.
I have searched this site and Stack Overflow and have not found an answer, probably because the question is so basic. I have some rather large data tables. In performing my exploration work it would be helpful to combine the values of one variable into separate ranges, then explore the relationship of those ranges to other variables. I have created a very simple example of what I am attempting to do, and cut and pasted the resulting error messages. The goal is to populate the variable level with "first", "second" or "third", depending on the value of variable "b".

# create a range variable
set.seed(1)
library(data.table)
dt <- data.table(a=1:10, b = sample(c(200:50000),10), c = 21:30, level = 1:10)
dt
dt1 <- dt[, .(':=' (if (dt$b < 12000) level = "first",
                    if (dt$b >= 12000 & dt$b < 34000) level = "second",
                    if (dt$b > 34000) level = "third"))]
# gives error Error in `:=`(if (dt$b < 12000) level = "first", if (dt$b >= 12000 & dt$b <
# : Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j,
# once only and in particular ways. See help(":=").
dt1
dt1 <- dt[, lapply(if (dt$b < 12000) level = "first",
                   if (dt$b >= 12000 & dt$b < 34000) level = "second",
                   if (dt$b > 34000) level = "third")]
# gives error Error in match.fun(FUN) :
# 'if (dt$b >= 12000 & dt$b < 34000) level = "second"' is not a function, character or symbol
# In addition: Warning message: In if (dt$b >= 12000 & dt$b < 34000) level = "second" :
# the condition has length > 1 and only the first element will be used
dt1

Any help is appreciated.
Carl Sutton

-----
Carl Sutton
--
View this message in context: http://r.789695.n4.nabble.com/creating-levels-for-a-variable-tp4715934.html
Sent from the datatable-help mailing list archive at Nabble.com.
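
Both attempts above fail for the same underlying reason: `if` tests a single TRUE/FALSE, so it cannot label a whole column at once. A minimal sketch of one vectorised alternative is to select rows in `i` and assign by reference in `j`. The treatment of the boundary value 34000 is an assumption made here, since the conditions in the question leave it uncovered.

```r
library(data.table)

set.seed(1)
dt <- data.table(a = 1:10, b = sample(200:50000, 10), c = 21:30)

# Start with an empty character column, then fill it range by range.
# Each line selects rows in i and assigns by reference in j.
dt[, level := NA_character_]
dt[b < 12000,              level := "first"]
dt[b >= 12000 & b < 34000, level := "second"]
dt[b >= 34000,             level := "third"]  # assumption: 34000 itself counts as "third"

dt
```

Because `:=` modifies `dt` in place, there is no need to capture the result in a second object such as `dt1`.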
From malnamalja at gmx.de Mon Dec 28 12:27:38 2015
From: malnamalja at gmx.de (el_alisio)
Date: Mon, 28 Dec 2015 03:27:38 -0800 (PST)
Subject: [datatable-help] creating levels for a variable
In-Reply-To: <1451169228188-4715934.post@n4.nabble.com>
References: <1451169228188-4715934.post@n4.nabble.com>
Message-ID: <1451302058726-4715955.post@n4.nabble.com>

Hi Carl,

you may want to check out the cut function:

library("data.table")
dt <- data.table(a = 1:10, b = sample(c(200:50000), 10), c = 21:30)
dt[, level := cut(b, breaks = c(0, 12000, 34000, max(b)),
                  labels = c("first", "second", "third"))]

Cheers,
Jannes

--
View this message in context: http://r.789695.n4.nabble.com/creating-levels-for-a-variable-tp4715934p4715955.html
Sent from the datatable-help mailing list archive at Nabble.com.

From malnamalja at gmx.de Mon Dec 28 12:38:12 2015
From: malnamalja at gmx.de (el_alisio)
Date: Mon, 28 Dec 2015 03:38:12 -0800 (PST)
Subject: [datatable-help] append two factor vectors
In-Reply-To: <1450285986317-4715638.post@n4.nabble.com>
References: <1450285986317-4715638.post@n4.nabble.com>
Message-ID: <1451302692686-4715956.post@n4.nabble.com>

Hi,

how about:

q1 <- c("a1", "a2" ,"a3")
q2 <- c("b1", "b2", "b3")
as.factor(c(q1, q2))

In case q1 and q2 are already factors, first convert them into characters:

q1 <- as.factor(q1)
q2 <- as.factor(q2)
as.factor(c(as.character(q1), as.character(q2)))

Cheers,
Jannes

agent dunham wrote
> Dear community,
>
> i have two vector factors that i need to join into a single vector, and
> afterwards, a unique over this final vector.
> Can anybody help?
>
> Example:
>
> q1 <- c("a1","a2","a3")
> q2 <- c("b1","b2","b3")
>
> q1 <- as.factor(q1)
> q2 <- as.factor(q2)
>
> I have this q's.
> Then, I'd need:
>
> qjoin <- c(q1,q2)
>
> BUT, then I see: 1 2 3 1 2 3
> rather than "a1" "a2" "a3" "b1" "b2" "b3"

--
View this message in context: http://r.789695.n4.nabble.com/append-two-factor-vectors-tp4715638p4715956.html
Sent from the datatable-help mailing list archive at Nabble.com.

From fjbuch at gmail.com Tue Dec 29 00:41:51 2015
From: fjbuch at gmail.com (Farrel Buchinsky)
Date: Mon, 28 Dec 2015 23:41:51 +0000
Subject: [datatable-help] IDateTime and missing values
Message-ID:

I have a data.table that contains real data. That means that some values are missing. I am trying to convert one column into data.table's idate and itime. Alas I am running into problems.

But look at this

> IDateTime(as.POSIXlt(c("2015-05-01 13:46:23", "2015-05-03 16:40:00")))
        idate    itime
1: 2015-05-01 13:46:23
2: 2015-05-03 16:40:00

> IDateTime(as.POSIXlt(c("2015-05-01 13:46:23", "2015-05-03 16:40:00", NA)))
Error in if (any(neg)) res[neg] = paste("-", res[neg], sep = "") :
  missing value where TRUE/FALSE needed

A missing value screws everything up. Any insights or suggestions for me? At this point I will have to systematically select rows that meet !is.na(datevariable) to perform the IDateTime function on.

--
Farrel Buchinsky
(412) 567-7870 (gets me everywhere)

From fjbuch at gmail.com Tue Dec 29 01:27:17 2015
From: fjbuch at gmail.com (Farrel Buchinsky)
Date: Tue, 29 Dec 2015 00:27:17 +0000
Subject: [datatable-help] IDateTime and missing values
In-Reply-To:
References:
Message-ID:

By the way, as.IDate can handle the NA but IDateTime cannot.

--
Farrel Buchinsky
(412) 567-7870 (gets me everywhere)

From rv15i at yahoo.se Thu Dec 31 17:20:03 2015
From: rv15i at yahoo.se (ravi)
Date: Thu, 31 Dec 2015 16:20:03 +0000 (UTC)
Subject: [datatable-help] help in joining two data tables
References: <1649899131.8398424.1451578803532.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <1649899131.8398424.1451578803532.JavaMail.yahoo@mail.yahoo.com>

Hi,
I have some trouble in understanding the data.table procedure for joining two tables. Let me start by taking up two example data tables:

library(data.table)
############ the first data.table example
mt<-data.table(mtcars)
## some modifications to the data.table
s1<-1:32;s1[seq(2,32,by=2)]<-NA
mt[,"cntrl":=s1];mt[,"cylO":=cyl];mt[,"cyl":=cyl*2]
setkey(mt,gear,carb,cylO,cntrl)
mt
##
## More modifications
mt[gear == 3 & carb ==3 & cylO == 8 & mpg == 16.4,cntrl:=14]
str(mt)
mt
############## the second data.table example
nt<-data.table(gear= c(3,3,3),carb=c(1,3,3),cylO=c(4,8,8),price=c(11,44,55),cntrl=c(21,13,14))
setkey(nt,gear,carb,cylO,cntrl)
############# merging as a data frame
rdJoin<-merge.data.frame(mt,nt,by.x=c("gear","carb","cylO","cntrl"),by.y=c("gear","carb","cylO","cntrl"),all.x=TRUE)
str(rdJoin)
rdJoin
############## questions
# What is the data.table command to get rdJoin?
# How is it possible to specify the key variables for the join -see below
# For example, c("gear","carb")   c("gear","carb","cylO")  etc.
# Also, where the variables have different names in the two tables
# For example, if the cntrl variable in the first DT is "cntrl1" and "cntrl2" in the second

Let me elaborate on the questions shown above. First, I would like to start with some general questions:

1. In the documentation for data.table (which includes the vignettes available so far), it is mentioned that it is sufficient if one of the two data tables being considered has keys. This is a bit confusing. The straightforward situation is if both the tables have keys. When would it be of advantage to have keys for just one of them? It would be nice if this can be explained in the to-be-released vignette on joins.

2. The merge command in base R is very clear and easy to understand. It would be nice if the data table procedure is transparent in the same way. To start with, I would like to know how I can do the following things with data table:

   (i) the data.table equivalent of the base R command
       merge.data.frame(mt,nt,by.x=c("gear","carb","cylO","cntrl"),by.y=c("gear","carb","cylO","cntrl"),all.x=TRUE)
   (ii) How it is possible to choose the number of key variables from a list:
       c("gear","carb")    c("gear","carb","cylO")    c("gear","carb","cylO","cntrl")
       It is very clear in the merge command how this is done. How to do that with data.table?
       The on argument can be used for one of the tables. How can it be specified for the other? That is, without having to use the setkey command each time a change is needed.
   (iii) How can this be done if the key variables in the two lists have different names? That is, if the cntrl variable in the first DT is "cntrl1" and "cntrl2" in the second, for example.

I have found the data.table package to be very useful. It would be nice if I can understand its use better. Thanks for any help that I can get.
Ravi

From rv15i at yahoo.se Thu Dec 31 17:58:13 2015
From: rv15i at yahoo.se (ravi)
Date: Thu, 31 Dec 2015 16:58:13 +0000 (UTC)
Subject: [datatable-help] help in joining two data tables
In-Reply-To: <1649899131.8398424.1451578803532.JavaMail.yahoo@mail.yahoo.com>
References: <1649899131.8398424.1451578803532.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <86618974.8340374.1451581093979.JavaMail.yahoo@mail.yahoo.com>

Hi,
I just want to follow up, after going through the documentation once again.

############################
rtJoin<-nt[mt]  # outer left join
rtJoin
identical(rtJoin,rdJoin) # False
coli<-names(rdJoin)
setcolorder(rtJoin,coli)
identical(rtJoin,rdJoin) # False again, the row order appears to be different
##########################

I think that the equivalent data.table command for left outer join is:
nt[mt]
But the identical command was false. This stayed so even after the column order was set to be the same in the two cases. Now, the row order is different.
So, my next question is: how does one compare two data tables to check that the results are the same?
I have now landed in a very different question from my original one.
Thanks for any help.
Ravi
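
As a rough sketch of how the questions in this thread might be approached: the tables `cars` and `prices` below are small made-up stand-ins, not the mtcars-based tables from the messages, and the comparison step is just one way of checking two results. It shows choosing join columns per call with `on=`, joining on differently named columns, and comparing a join against merge(). Note that identical() also compares class and attributes, so a data.table and a data.frame would be expected to differ even when their contents agree.

```r
library(data.table)

# Hypothetical stand-in tables (not the ones from the messages)
cars   <- data.table(gear = c(3, 3, 4), carb = c(1, 3, 1),
                     cntrl1 = c(21, 14, 5), mpg = c(21.4, 16.4, 22.8))
prices <- data.table(gear = c(3, 3), carb = c(1, 3),
                     cntrl2 = c(21, 14), price = c(11, 55))

# Left join keeping every row of cars. The join columns are chosen per call
# with on=, so setkey() is not needed on either table.
j2 <- prices[cars, on = c("gear", "carb")]

# Differently named columns: each name belongs to the table in front of the
# bracket (prices), each value to the table inside it (cars).
j3 <- prices[cars, on = c(gear = "gear", carb = "carb", cntrl2 = "cntrl1")]

# merge() on data.tables accepts by.x / by.y much like base R
m3 <- merge(cars, prices,
            by.x = c("gear", "carb", "cntrl1"),
            by.y = c("gear", "carb", "cntrl2"), all.x = TRUE)

# Bring both results to the same column names, column order and row order,
# then compare contents with all.equal() rather than identical()
setnames(j3, "cntrl2", "cntrl1")
setcolorder(j3, names(m3))
setkeyv(j3, c("gear", "carb", "cntrl1"))
same <- isTRUE(all.equal(j3, m3, check.attributes = FALSE))
```

Changing the set of join columns is then only a matter of passing a different vector to `on=`, for example c("gear", "carb") in one call and all three columns in the next.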