From suttoncarl at ymail.com Thu Dec 1 18:21:46 2016
From: suttoncarl at ymail.com (carlsutton)
Date: Thu, 1 Dec 2016 09:21:46 -0800 (PST)
Subject: [datatable-help] filing a column with a number from another column by group
Message-ID: <1480612906842-4726956.post@n4.nabble.com>

As a result of some computations using data.table, wherein the computations
require using "i" to select certain rows, the results of the computations
reside only in those rows. Further computations require that the results be
in each row by group. A much simplified example follows.

help_request_120116.R

Not being familiar with the nabble upload feature, I have also done a cut
and paste.

# adding data to data.table column
library(data.table)
dt1 <- data.table(group = rep(1:5, each = 3), b = rep(1:3, times = 5),
                  var  = c(NA,NA,NA,20,NA,NA,21,NA,NA,NA,NA,NA,22,NA,NA),
                  var1 = c(NA,NA,NA,20,NA,NA,20,NA,NA,NA,NA,NA,20,NA,NA))
dt1
# Result Wanted
#     group b var var1
#  1:     1 1  NA   20
#  2:     1 2  NA   20
#  3:     1 3  NA   20
#  4:     2 1  20   20
#  5:     2 2  NA   20
#  6:     2 3  NA   20
#  7:     3 1  21   20
#  8:     3 2  NA   20
#  9:     3 3  NA   20
# 10:     4 1  NA   20
# 11:     4 2  NA   20
# 12:     4 3  NA   20
# 13:     5 1  22   20
# 14:     5 2  NA   20
# 15:     5 3  NA   20

I have searched the web, books, vignette, FAQs and fumbled around trying
this and that for the last week or so with no success. The solution is
probably very simple and I hate to waste your time with something this
basic, but I just cannot get my head wrapped around how to do this.

BTW, love the package, use setkey and by all the time.

Carl Sutton

-----
Carl Sutton
--
View this message in context: http://r.789695.n4.nabble.com/filing-a-column-with-a-number-from-another-column-by-group-tp4726956.html
Sent from the datatable-help mailing list archive at Nabble.com.
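
A minimal sketch of one way to approach the question above, assuming the
goal is to copy a group's single non-NA value into every row of that group.
The column name var_filled is hypothetical and does not appear in the
original post. The posted "Result Wanted" table shows var1 equal to 20 in
every row, including groups that have no value of their own, so the second
statement fills across the whole table instead. Which of the two matches
the real data is an assumption.

```r
library(data.table)

dt1 <- data.table(group = rep(1:5, each = 3), b = rep(1:3, times = 5),
                  var  = c(NA,NA,NA,20,NA,NA,21,NA,NA,NA,NA,NA,22,NA,NA),
                  var1 = c(NA,NA,NA,20,NA,NA,20,NA,NA,NA,NA,NA,20,NA,NA))

# Per-group fill: take the first non-NA value of var within each group and
# recycle it over the group's rows. Groups with no non-NA value stay NA.
# var_filled is a hypothetical column name used for illustration.
dt1[, var_filled := var[!is.na(var)][1L], by = group]

# Whole-table fill: take the first non-NA value of var1 and recycle it over
# every row, which reproduces the var1 column of the "Result Wanted" table.
dt1[, var1 := var1[!is.na(var1)][1L]]

dt1
```

Both statements use := in j, so the table is updated by reference and no
copy of dt1 is made.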
From kbsudhir at gmail.com Tue Dec 6 19:34:13 2016
From: kbsudhir at gmail.com (kbsudhir)
Date: Tue, 6 Dec 2016 10:34:13 -0800 (PST)
Subject: [datatable-help] Sequential numbering for each month on a period of time
Message-ID: <1481049253450-4727126.post@n4.nabble.com>

Hi All,

I am a beginner in R.

I have a period of dates in my data set starting 10th April'2006 till 31st
Aug'2016. It's a period of 125 months. I want to sequentially identify each
month with a number starting at "1" in a new corresponding column, e.g.
"Month_Identifier". Ex
Apr'2006 - 1
May'2006 - 2 ........ Aug'2016 - 125

Also, the datatype of the column where the date is available is a factor.

Requesting guidance on how to achieve this.

Regards
Sudhir

--
View this message in context: http://r.789695.n4.nabble.com/Sequential-numbering-for-each-month-on-a-period-of-time-tp4727126.html
Sent from the datatable-help mailing list archive at Nabble.com.

From jholtman at gmail.com Thu Dec 8 01:14:20 2016
From: jholtman at gmail.com (jim holtman)
Date: Wed, 7 Dec 2016 19:14:20 -0500
Subject: [datatable-help] Sequential numbering for each month on a period of time
In-Reply-To: <1481049253450-4727126.post@n4.nabble.com>
References: <1481049253450-4727126.post@n4.nabble.com>
Message-ID:

Please at least supply a sample of the data. Is there more than one row per
month? Are there months missing and, if so, how should they be handled? You
left a lot unspecified. Guidance can be provided with data.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, Dec 6, 2016 at 1:34 PM, kbsudhir wrote:

> Hi All,
>
> I am a beginner in R.
>
> I have a period of dates in my data set starting 10th April'2006 till 31st
> Aug'2016. Its a period of 125 months. I want to sequentially identify each
> month with a number starting "1" in a new corresponding column Ex
> "Month_Identifier". Ex
> Apr'2006 - 1
> May'2006 - 2 ........
> Aug'2016 - 125
>
> Also, the datatype of the column where date is available is a factor.
>
> Requesting guidance on how to achieve this.
>
> Regards
> Sudhir
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/
> Sequential-numbering-for-each-month-on-a-period-of-time-tp4727126.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/
> listinfo/datatable-help
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mel at mbacou.com Thu Dec 8 02:46:37 2016
From: mel at mbacou.com (Bacou, Melanie)
Date: Wed, 7 Dec 2016 20:46:37 -0500
Subject: [datatable-help] Sequential numbering for each month on a period of time
In-Reply-To:
References: <1481049253450-4727126.post@n4.nabble.com>
Message-ID: <265580fa-bd2d-b7c4-0ab7-0df082d53715@mbacou.com>

Sudhir,

Read the documentation for the R function `as.Date()`, e.g. you can
transform your factor column to a date column using
`as.Date(my_date, format = "%d%b'%Y")`, then order/sort on that column.

I guess what you're looking for is:

# parse "Apr'2006"-style labels as the first day of each month
dt[, my_date := as.Date(paste(1, my_date), format = "%d %b'%Y")]
# seq() needs Date objects, not strings; Apr 2006 to Aug 2016 gives 125 months
month_seq <- seq(as.Date("2006-04-01"), as.Date("2016-08-01"), by = "month")
dt_month <- data.table(month_int = seq_along(month_seq), month_seq = month_seq)
setkey(dt, my_date)
setkey(dt_month, month_seq)
# keyed join: copy the month number onto the matching rows of dt
dt[dt_month, my_date_int := month_int]

--Mel.

On 12/7/2016 7:14 PM, jim holtman wrote:
> Please at least supply a sample of the data. Is there more than one
> row per month, are there months missing, if so how to handle. You
> left a lot unspecified. Guidance can be provided with data.
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Tue, Dec 6, 2016 at 1:34 PM, kbsudhir wrote:
>
> Hi All,
>
> I am a beginner in R.
>
> I have a period of dates in my data set starting 10th April'2006 till 31st
> Aug'2016. Its a period of 125 months. I want to sequentially identify each
> month with a number starting "1" in a new corresponding column Ex
> "Month_Identifier". Ex
> Apr'2006 - 1
> May'2006 - 2 ........ Aug'2016 - 125
>
> Also, the datatype of the column where date is available is a factor.
>
> Requesting guidance on how to achieve this.
>
> Regards
> Sudhir
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Sequential-numbering-for-each-month-on-a-period-of-time-tp4727126.html
>
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
>
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

From zeeshanuaf at gmail.com Sun Dec 11 18:01:46 2016
From: zeeshanuaf at gmail.com (Zeeshan Mustafa)
Date: Sun, 11 Dec 2016 09:01:46 -0800 (PST)
Subject: [datatable-help] Multiple tabulation with length, percentages and statistical significance
Message-ID: <1481475706841-4727278.post@n4.nabble.com>

I have the following data frame with details

df=LTACData

I want to create a cross tabulation of four technologies (tech1, tech2,
tech3, tech4) against each sub-category (c1, c2, c3) of milk utilization
(Milkutil), by district (distr1, distr2, ..., distr16). I need to report
N = total number of observations, percentages and statistical significance.
I have tried the following syntax. Please suggest a modification.
Table1 <- with(LTACData,
               summarize(cbind(tech1, tech2, tech3, tech4),
                         by = llist(District, Milkutil),
                         FUN = colSums))

--
View this message in context: http://r.789695.n4.nabble.com/Multiple-tabulation-with-length-percentages-and-statistical-significance-tp4727278.html
Sent from the datatable-help mailing list archive at Nabble.com.

From hrayrwannis at gmail.com Tue Dec 13 11:41:33 2016
From: hrayrwannis at gmail.com (edhunter)
Date: Tue, 13 Dec 2016 02:41:33 -0800 (PST)
Subject: [datatable-help] svyby with multiple factor and/or looping difficulty
Message-ID: <1481625693300-4727340.post@n4.nabble.com>

Hello,

I am trying to generate weighted tables with svyby for multiple
covariates/factors, but not nested:

df <- data.frame(attend = sample(c(0,100), 500, prob = c(.35,.65), replace = TRUE),
                 sex    = sample(c("m","f"), 500, prob = c(.45,.55), replace = TRUE),
                 wgt    = sample(c(.88,.99,1.32,1.45,1.76), 500, replace = TRUE),
                 agecat = sample(c(1,2,3,4), 500, replace = TRUE))
adesg <- svydesign(id = ~1, weights = ~wgt, data = df)

1. svyby(~attend, ~sex, design = adesg, svymean, vartype = "ci")
2. svyby(~attend, ~agecat, adesg, svymean)

I want to generate the above two (and many other covariates) in one go in a
single table. However, the code at 3. nests the variables rather than
providing results per category value as above. I've also tried to loop the
above code, but that opens up a host of other issues that have so far led
to dead ends.

3. svyby(~attend, ~agecat+sex, adesg, svymean)

Any thoughts?

thanks,

--
View this message in context: http://r.789695.n4.nabble.com/svyby-with-multiple-factor-and-or-looping-difficulty-tp4727340.html
Sent from the datatable-help mailing list archive at Nabble.com.
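
A rough sketch of the looping approach mentioned in the svyby post above,
assuming the survey package is installed and that a stacked table with one
row per category value is acceptable. The helper one_table(), the vector
by_vars, the output column names (variable, level, mean, se) and the
set.seed() call are all hypothetical additions for illustration. The sketch
also assumes that svyby() with a single outcome and svymean returns columns
named after the by-variable, the outcome, and "se".

```r
library(survey)

set.seed(1)  # assumption: added only so the toy data is reproducible
df <- data.frame(
  attend = sample(c(0, 100), 500, prob = c(.35, .65), replace = TRUE),
  sex    = sample(c("m", "f"), 500, prob = c(.45, .55), replace = TRUE),
  wgt    = sample(c(.88, .99, 1.32, 1.45, 1.76), 500, replace = TRUE),
  agecat = sample(c(1, 2, 3, 4), 500, replace = TRUE)
)
adesg <- svydesign(id = ~1, weights = ~wgt, data = df)

# Covariates to tabulate one at a time (not nested).
by_vars <- c("sex", "agecat")

# Hypothetical helper: run svyby() for one covariate and reshape the result
# into a common layout so the per-covariate tables can be stacked.
one_table <- function(v) {
  res <- as.data.frame(svyby(~attend, as.formula(paste0("~", v)), adesg, svymean))
  data.frame(variable = v,
             level    = as.character(res[[v]]),
             mean     = res[["attend"]],
             se       = res[["se"]],
             stringsAsFactors = FALSE)
}

# One row per category value of each covariate.
stacked <- do.call(rbind, lapply(by_vars, one_table))
stacked
```

Converting the level to character is what lets "m"/"f" and 1 to 4 share a
column; more covariates can be added to by_vars without changing the helper.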
From suttoncarl at ymail.com Mon Dec 19 04:16:21 2016
From: suttoncarl at ymail.com (Carl Sutton)
Date: Mon, 19 Dec 2016 03:16:21 +0000 (UTC)
Subject: [datatable-help] split data table column aka tidyr separate function
References: <595767984.1338462.1482117381125.ref@mail.yahoo.com>
Message-ID: <595767984.1338462.1482117381125@mail.yahoo.com>

Hi

I have searched the last couple of days for a way to do this but have not
found a solution. With real data, I have used tidyr to do the task but:

1. It has used all available memory (12gb on older desktop)
2. Future tables will be even larger so would need to be split
3. It is slow, perhaps due to lack of free memory.

The data is provided in a format such that a variable "name" (and there are
several like this) actually contains the variable name and indices, i.e.
var_09 is the ninth level of that variable. The data analysis needs that
level as a separate variable. Code and toy data set are below.

# column split test
library(data.table)
library(tidyr)
# data table for melt and columns split
dt1 <- data.table(a_1 = 1:10, b_2 = 20:29, folks = c("art","brian","ed",
                  "rich","dennis","frank", "derrick","paul","fred","numnuts"),
                  a_2 = 2:11, b_1 = 21:30)
melt(dt1, id = "folks")  # so far so good
dt1[,c("a") := tstrsplit(c(a_1),"_",fixed = TRUE)][,c("a") := tstrsplit(c(a_2),
                         "_",fixed = TRUE)][]
# That is not producing what I want
# tidyr gives what I want
df <- data.frame(a_1 = 1:10, b_2 = 20:29, folks = c("art","brian","ed",
                 "rich","dennis","frank", "derrick","paul","fred","numnuts"),
                 a_2 = 2:11, b_1 = 21:30)
df %>% gather(value, nums, -folks) %>%
        separate(value, c("varTYpe","varIndex"))

Carl Sutton
From suttoncarl at ymail.com Thu Dec 22 01:58:54 2016
From: suttoncarl at ymail.com (carlsutton)
Date: Wed, 21 Dec 2016 16:58:54 -0800 (PST)
Subject: [datatable-help] command := works with toy data but causes error with real data
Message-ID: <1482368334629-4727560.post@n4.nabble.com>

Hi

Thanks to the help from this list I was able to get the toy data to work
and subsequently got cast to work. Anticipating wonderful things with real
data, I copied the command from the toy data to melt the real data, and R
complained.

What I would like to know:
1. What is wrong with the code and
2. What causes this error message?

I have read the help page, searched the web, and am just as clueless now as
when I started trying to find out what stupid thing I did to cause this
error. Here is the code and error message.

class(with(data_1,"var1","var2","var3","var_4"))
class(data_1)
system.time(races_melt <-melt(data_1, id = c("var1","var2",
       "var3"), measure = "var_4")[,c("varType",
       "Seqnc") := tstrsplit(variable,"_")][variable := NULL])

Error message

> class(with(data_1,"var1","var2","var3","var_4"))
[1] "character"
> class(data_1)
[1] "data.table" "data.frame"
> system.time(races_melt <-melt(data_1, id = c("var1","var2",
+ "var3"), measure = "var_4")[,c("varType",
+ "Seqnc") := tstrsplit(variable,"_")][variable :=
+ NULL])
Error in `:=`(variable, NULL) :
  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are
  defined for use in j, once only and in particular ways. See help(":=").
Timing stopped at: 0.06 0 0.06

This error message occurs "not infrequently" in my work; sometimes it's my
typos and sometimes I just don't have a clue.
The data is always a data.table.

Below is the toy data that worked (BIG THANK YOU TO R Flores for that)

library(data.table)
library(tidyr)
# data table for melt and columns split
dt1 <- data.table(a_1 = 1:10, b_2 = 20:29, folks = c("art","brian","ed",
                  "rich","dennis","frank", "derrick","paul","fred","numnuts"),
                  a_2 = 2:11, b_1 = 21:30)
#melt(dt1, id = "folks")  # so far so good
melted <- melt(dt1, id = "folks")[,c("varType","varIndex") :=
               tstrsplit(variable,"_")][,variable:=NULL]
# melted has 40 observations from stacking a and b variables
# which have lengths of 20 each
str(melted)

-----
Carl Sutton
--
View this message in context: http://r.789695.n4.nabble.com/command-works-with-toy-data-but-causes-error-with-real-data-tp4727560.html
Sent from the datatable-help mailing list archive at Nabble.com.

From borja1055 at gmail.com Fri Dec 30 16:41:31 2016
From: borja1055 at gmail.com (borja1055)
Date: Fri, 30 Dec 2016 07:41:31 -0800 (PST)
Subject: [datatable-help] Converting Dates for Time Series Analysis
Message-ID: <1483112491750-4727729.post@n4.nabble.com>

My data is currently formatted like this


When running the time series plot function, the graphs appear like this


How can I format it so the plot looks similar to this



Thanks

--
View this message in context: http://r.789695.n4.nabble.com/Converting-Dates-for-Time-Series-Analysis-tp4727729.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mel at mbacou.com Fri Dec 30 22:27:22 2016
From: mel at mbacou.com (Bacou, Melanie)
Date: Fri, 30 Dec 2016 16:27:22 -0500
Subject: [datatable-help] Converting Dates for Time Series Analysis
In-Reply-To: <1483112491750-4727729.post@n4.nabble.com>
References: <1483112491750-4727729.post@n4.nabble.com>
Message-ID: <487d51cf-3bcf-ec56-e7c7-02e683615b3c@mbacou.com>

Not a data.table question, but you can use, e.g.
dt[, Row.Labels := as.Date(Row.Labels, format = "%m/%d/%Y")]

On 12/30/2016 10:41 AM, borja1055 wrote:
> my data is currently formatted like this
>
>
> when running the time series plot function, the graphs appear like this
>
>
> How can i format it so the plot looks similar to this
>
>
>
> Thanks
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Converting-Dates-for-Time-Series-Analysis-tp4727729.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
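
A minimal sketch of the conversion suggested in the reply above, assuming
Row.Labels holds month/day/year strings. The screenshots from the original
post did not survive in the archive, so the sample values and the `value`
column below are hypothetical.

```r
library(data.table)

# Hypothetical sample data: dates stored as "m/d/Y" text, deliberately
# out of chronological order.
dt <- data.table(Row.Labels = c("2/15/2016", "12/1/2016", "1/15/2016"),
                 value      = c(12, 9, 10))

# Replace the text column with a Date column, by reference.
dt[, Row.Labels := as.Date(Row.Labels, format = "%m/%d/%Y")]

# Sort chronologically; text dates would otherwise sort alphabetically.
setorder(dt, Row.Labels)

dt
```

With a real Date column, plotting functions can place the points on a time
axis in calendar order, which is usually what a time series plot needs.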