From my.r.help at gmail.com Sun Jan 3 06:33:52 2016
From: my.r.help at gmail.com (Michael Smith)
Date: Sun, 3 Jan 2016 13:33:52 +0800
Subject: [datatable-help] Bug in as.data.table.xts?
Message-ID: <5688B2C0.6090605@gmail.com>

Potential bug in as.data.table.xts when converting a single-row xts
object. Using the dev version from GitHub.

library(xts)
library(data.table)
x <- as.xts(8, order.by = Sys.Date())

## Single-row conversion doesn't work.
as.data.table(x)
Error in as.data.frame.matrix(coredata(x), row.names, optional, ...) :
  row names must be 'character' or 'integer', not 'logical'

## Two rows work fine.
as.data.table(rbind(x, x))
        index V1
1: 2016-01-03  8
2: 2016-01-03  8

From j.gorecki at wit.edu.pl Sun Jan 3 15:11:17 2016
From: j.gorecki at wit.edu.pl (Jan Gorecki)
Date: Sun, 3 Jan 2016 14:11:17 +0000 (UTC)
Subject: [datatable-help] Bug in as.data.table.xts?
References: <5688B2C0.6090605@gmail.com>
Message-ID:

Michael Smith <my.r.help at gmail.com> writes:

>
> Potential bug in as.data.table.xts when converting a single-row xts
> object. Using the dev version from GitHub.
>
> library(xts)
> library(data.table)
> x <- as.xts(8, order.by = Sys.Date())
>
> ## Single-row conversion doesn't work.
> as.data.table(x)

Thanks for bug report. Just pushed fix. You can find it here:
https://github.com/Rdatatable/data.table/pull/1485
You can test it by installing
"jangorecki/data.table@as.dt.xts.1rowfix"

From my.r.help at gmail.com Mon Jan 4 00:53:10 2016
From: my.r.help at gmail.com (Michael Smith)
Date: Mon, 4 Jan 2016 07:53:10 +0800
Subject: [datatable-help] Bug in as.data.table.xts?
In-Reply-To:
References: <5688B2C0.6090605@gmail.com>
Message-ID: <5689B466.3060807@gmail.com>

On 01/03/2016 10:11 PM, Jan Gorecki wrote:
> Michael Smith <my.r.help at gmail.com> writes:
>
>>
>> Potential bug in as.data.table.xts when converting a single-row xts
>> object. Using the dev version from GitHub.
>>
>> library(xts)
>> library(data.table)
>> x <- as.xts(8, order.by = Sys.Date())
>>
>> ## Single-row conversion doesn't work.
>> as.data.table(x)
>
> Thanks for bug report. Just pushed fix. You can find it here:
> https://github.com/Rdatatable/data.table/pull/1485
> You can test it by installing
> "jangorecki/data.table@as.dt.xts.1rowfix"

Many thanks for the quick fix. Appreciate it a lot.

I've noticed two lines above your fix in R/xts.R there's another case of

as.data.frame(x, row.names=FALSE)

Should it also be changed to row.names=NULL?

M

From mattjdowle at gmail.com Mon Jan 4 18:35:29 2016
From: mattjdowle at gmail.com (Matt Dowle)
Date: Mon, 4 Jan 2016 09:35:29 -0800
Subject: [datatable-help] Fwd: Tuesday: Join 79 R hackers at "data table discussion"
In-Reply-To: <603739052.1451885818604.JavaMail.root@jobs2.meetup.com>
References: <603739052.1451885818604.JavaMail.root@jobs2.meetup.com>
Message-ID:

If you're in the area, meetup in Singapore tomorrow discussing data.table ...

As a follow-up to our most recent session on dplyr, we shall have a session
discussing the advantages/disadvantages of data table. We have two speakers
lined up, but if you out there have been using data table and found it to be
useful, please do come forward and share your views, if only for 10 minutes
or so.

The speakers so far are:
- Gaurav, who will discuss some of the differences in syntax between dplyr
  and data table.
- Nicholas Ng, who will discuss some of the speed-ups that data table
  provides.

---------- Forwarded message ----------
From: R User Group - Singapore (RUGS)
Date: Sun, Jan 3, 2016 at 9:36 PM
Subject: Tuesday: Join 79 R hackers at "data table discussion"
To: mattjdowle at gmail.com

data table discussion
R User Group - Singapore (RUGS)
Tuesday, January 5, 2016 7:00 PM
Microsoft Auditorium (Singapore)
One Marina Boulevard, Level 21 Auditorium, Singapore 018989 Singapore

From shariful at excite.com Tue Jan 5 13:06:18 2016
From: shariful at excite.com (hello_R)
Date: Tue, 5 Jan 2016 04:06:18 -0800 (PST)
Subject: [datatable-help] Simpe script does not finish
Message-ID: <1451995578376-4716161.post@n4.nabble.com>

Hi all,
I have a dataframe (512x512). I just want to create a new dataframe from it.
I used the following code. It does not stop running! Could you please help.
Thanks

# Read the text file (512x512)
df<-read.table("out_1_1.txt")

d=NULL
for (i in 0:511){
  for (j in 0:511){
    Myrow<-c(i,j,df[i+1,j+1])
    d=rbind(d,Myrow)
  }
}

--
View this message in context: http://r.789695.n4.nabble.com/Simpe-script-does-not-finish-tp4716161.html
Sent from the datatable-help mailing list archive at Nabble.com.
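
[Editor's sketch, not part of the original thread] The loop in the post above is slow mainly because rbind() copies the whole of `d` on every one of the 512 * 512 iterations. The following is a minimal loop-free sketch that builds the same three-column result (0-based row index, 0-based column index, value) in the same row-by-row order. The random matrix `m` is only a stand-in for the poster's file, which is not available here; with data read by read.table(), `as.matrix(df)` would presumably take its place.

```r
# Stand-in for the 512x512 input; the original post reads it from "out_1_1.txt".
n <- 512
m <- matrix(runif(n * n), n)

# Enumerate (i, j) in the same order as the nested loop:
# i varies slowest, j varies fastest, both 0-based.
d <- cbind(
  i     = rep(0:(n - 1), each = n),
  j     = rep(0:(n - 1), times = n),
  value = as.vector(t(m))   # t() so that the values are read row by row
)

head(d, 3)
```

Row `i * n + j + 1` of `d` holds the value `m[i + 1, j + 1]`, which is the same layout the loop was trying to produce.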

From jholtman at gmail.com Wed Jan 6 14:21:24 2016
From: jholtman at gmail.com (jim holtman)
Date: Wed, 6 Jan 2016 08:21:24 -0500
Subject: [datatable-help] Simpe script does not finish
In-Reply-To: <1451995578376-4716161.post@n4.nabble.com>
References: <1451995578376-4716161.post@n4.nabble.com>
Message-ID:

It is bad practice to keep growing objects like 'd' in the loop. Here is a
modified version that runs in under 1 second.

>
> # Read the text file (512x512)
> df<-matrix(runif(512 * 512), 512)
> my.stats('start')
start (1) - Rgui : 08:19:36 <282.7 415537.0> 415537.0 : 108.9MB
> d=matrix(NA,512 * 512, 3) # create output matrix
> for (i in 0:511){
+   for (j in 0:511){
+     d[i * 512 + j + 1,] <- c(i,j,df[i+1,j+1])
+     # Myrow<-c(i,j,df[i+1,j+1])
+     # d=rbind(d,Myrow)
+   }
+ }
> my.stats('done')
done (1) - Rgui : 08:19:37 <283.9 415538.1> 415538.1 : 164.9MB
> head(d, 10)
      [,1] [,2]       [,3]
 [1,]    0    0 0.72414660
 [2,]    0    1 0.26716396
 [3,]    0    2 0.11328281
 [4,]    0    3 0.05107802
 [5,]    0    4 0.44158025
 [6,]    0    5 0.09608051
 [7,]    0    6 0.18925725
 [8,]    0    7 0.64978998
 [9,]    0    8 0.51816944
[10,]    0    9 0.83320742

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, Jan 5, 2016 at 7:06 AM, hello_R wrote:

> Hi all,
> I have a dataframe (512x512). I just want to create a new dataframe from
> it.
> I used the following code. It does not stop running! Could you please help.
> Thanks
>
> # Read the text file (512x512)
> df<-read.table("out_1_1.txt")
>
> d=NULL
> for (i in 0:511){
>   for (j in 0:511){
>     Myrow<-c(i,j,df[i+1,j+1])
>     d=rbind(d,Myrow)
>   }
> }
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Simpe-script-does-not-finish-tp4716161.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>

From setareh227 at gmail.com Thu Jan 7 17:03:45 2016
From: setareh227 at gmail.com (Maryia)
Date: Thu, 7 Jan 2016 08:03:45 -0800 (PST)
Subject: [datatable-help] Please help me with data frame and melt function in R
Message-ID: <1452182625222-4716238.post@n4.nabble.com>

Hi all,

I've just come to the amazing R software, so please be patient if my
question is basic for you. I have 2 text file (say 1.txt and 2.txt), each
file containing 2 columns and thousands row as like below,

case size
case1 120
case2 120
case3 121
case4 122

(The number of row in two files are different). I would like to combine the
related data frames, so I wrote the following code in R (3.2.3 version),

df1 = data.frame("1.txt",header=T)
df2 = data.frame("2.txt",header=T)
df = data.frame(df1$size,df2$size)
library(reshape)
melted <- melt(df)

But the melt command gives an error that "Using as id variables". Actually,
the melt function didn't work here.
Could you please help me out what is wrong here and how to solve it?

Thanks in advance

--
View this message in context: http://r.789695.n4.nabble.com/Please-help-me-with-data-frame-and-melt-function-in-R-tp4716238.html
Sent from the datatable-help mailing list archive at Nabble.com.

From statquant at outlook.com Thu Jan 7 21:58:03 2016
From: statquant at outlook.com (statquant3)
Date: Thu, 7 Jan 2016 12:58:03 -0800 (PST)
Subject: [datatable-help] How can I reshape a list of list of data.tables from wide to long
Message-ID: <1452200283579-4716248.post@n4.nabble.com>

Following this post on SO:
http://stackoverflow.com/questions/34643746/how-can-i-reshape-a-list-of-list-from-wide-to-long
I was wondering if there could be a data.table way to deal with list of list
of data.tables.

require(data.table)
l <- list(a1 = list(b=data.table(rnorm(3)), c=data.table(rnorm(3)),
                    d=data.table(rnorm(3))),
          a2 = list(b=data.table(rnorm(3)), c=data.table(rnorm(3)),
                    d=data.table(rnorm(3))))

The idea is to go from a N-named list of P-named list of data.table to a
P-named list of N-named list of data.table (a pure transpose)

+a1---b            +b ---a1
   ---c               ---a2
   ---d            +c---a1
+a2---b     to        ---a2
   ---c            +d---a1
   ---d               ---a2

Is there a data.table idiomatic way ?
Can we make sure the data.tables are not copied ?

--
View this message in context: http://r.789695.n4.nabble.com/How-can-I-reshape-a-list-of-list-of-data-tables-from-wide-to-long-tp4716248.html
Sent from the datatable-help mailing list archive at Nabble.com.

From my.r.help at gmail.com Fri Jan 8 03:02:34 2016
From: my.r.help at gmail.com (Michael Smith)
Date: Fri, 8 Jan 2016 10:02:34 +0800
Subject: [datatable-help] Please help me with data frame and melt function in R
In-Reply-To: <1452182625222-4716238.post@n4.nabble.com>
References: <1452182625222-4716238.post@n4.nabble.com>
Message-ID: <568F18BA.7000007@gmail.com>

First off, if you have questions about "base" R you might wanna try the
R-help mailing list instead of the data.table list.

Second, I guess you're looking for merge. See `?merge`.

Third, if you want to do it the data.table way, you can just do
something like `df2[df2]`.

M

On 01/08/2016 12:03 AM, Maryia wrote:
> Hi all,
>
> I've just come to the amazing R software, so please be patient if my
> question is basic for you. I have 2 text file (say 1.txt and 2.txt), each
> file containing 2 columns and thousands row as like below,
>
> case size
> case1 120
> case2 120
> case3 121
> case4 122
>
> (The number of row in two files are different). I would like to combine the
> related data frames, so I wrote the following code in R (3.2.3 version),
>
> df1 = data.frame("1.txt",header=T)
> df2 = data.frame("2.txt",header=T)
> df = data.frame(df1$size,df2$size)
> library(reshape)
> melted <- melt(df)
>
> But the melt command gives an error that "Using as id variables". Actually,
> the melt function didn't work here.
> Could you please help me out what is wrong here and how to solve it?
>
> Thanks in advance
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Please-help-me-with-data-frame-and-melt-function-in-R-tp4716238.html
> Sent from the datatable-help mailing list archive at Nabble.com.
>

From my.r.help at gmail.com Fri Jan 8 03:05:15 2016
From: my.r.help at gmail.com (Michael Smith)
Date: Fri, 8 Jan 2016 10:05:15 +0800
Subject: [datatable-help] How can I reshape a list of list of data.tables from wide to long
In-Reply-To: <1452200283579-4716248.post@n4.nabble.com>
References: <1452200283579-4716248.post@n4.nabble.com>
Message-ID: <568F195B.6040105@gmail.com>

Not sure about a data.table way, but maybe(?)
rlist helps: http://renkun.me/rlist/

M

On 01/08/2016 04:58 AM, statquant3 wrote:
> Following this post on SO:
> http://stackoverflow.com/questions/34643746/how-can-i-reshape-a-list-of-list-from-wide-to-long
> I was wondering if there could be a data.table way to deal with list of list
> of data.tables.
>
> require(data.table)
> l <- list(a1 = list(b=data.table(rnorm(3)), c=data.table(rnorm(3)),
>                     d=data.table(rnorm(3))),
>           a2 = list(b=data.table(rnorm(3)), c=data.table(rnorm(3)),
>                     d=data.table(rnorm(3))))
>
> The idea is to go from a N-named list of P-named list of data.table to a
> P-named list of N-named list of data.table (a pure transpose)
>
> +a1---b            +b ---a1
>    ---c               ---a2
>    ---d            +c---a1
> +a2---b     to        ---a2
>    ---c            +d---a1
>    ---d               ---a2
>
> Is there a data.table idiomatic way ?
> Can we make sure the data.tables are not copied ?
>
> --
> View this message in context: http://r.789695.n4.nabble.com/How-can-I-reshape-a-list-of-list-of-data-tables-from-wide-to-long-tp4716248.html
> Sent from the datatable-help mailing list archive at Nabble.com.
>

From statquant at outlook.com Fri Jan 8 09:48:01 2016
From: statquant at outlook.com (statquant3)
Date: Fri, 8 Jan 2016 00:48:01 -0800 (PST)
Subject: [datatable-help] How can I reshape a list of list of data.tables from wide to long
In-Reply-To: <568F195B.6040105@gmail.com>
References: <1452200283579-4716248.post@n4.nabble.com> <568F195B.6040105@gmail.com>
Message-ID: <1452242881946-4716259.post@n4.nabble.com>

I did not find out but thanks for the reference

--
View this message in context: http://r.789695.n4.nabble.com/How-can-I-reshape-a-list-of-list-of-data-tables-from-wide-to-long-tp4716248p4716259.html
Sent from the datatable-help mailing list archive at Nabble.com.
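
[Editor's sketch, not part of the original thread] The "pure transpose" asked for in this thread can be sketched in plain R with two nested lapply() calls. This is not a data.table idiom, and the helper name `transpose_list` is made up for illustration; it assumes every inner list carries the same names as the first one. Because R lists hold references to their elements, pulling a data.table out with `[[` and placing it in a new list should not duplicate the table; comparing data.table::address() on both sides is one way to check that.

```r
library(data.table)

l <- list(
  a1 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3))),
  a2 = list(b = data.table(rnorm(3)), c = data.table(rnorm(3)), d = data.table(rnorm(3)))
)

# Hypothetical helper: swap the two nesting levels of a list of lists.
# Assumes all inner lists have the same names as the first inner list.
transpose_list <- function(x) {
  inner <- names(x[[1L]])
  out <- lapply(inner, function(p) lapply(x, `[[`, p))
  setNames(out, inner)
}

tl <- transpose_list(l)
names(tl)     # outer level is now "b" "c" "d"
names(tl$b)   # inner level is now "a1" "a2"

# Compare memory addresses to see whether the tables were copied.
c(address(l$a1$b), address(tl$b$a1))
```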

From statquant at outlook.com Fri Jan 8 10:00:33 2016
From: statquant at outlook.com (statquant3)
Date: Fri, 8 Jan 2016 01:00:33 -0800 (PST)
Subject: [datatable-help] Exploring a data.table
Message-ID: <1452243633784-4716260.post@n4.nabble.com>

When looking at a wide table (many columns) it is useful to have a way of
summarizing the table.
str, summarize help but are not too good.
Do you guys have your own ?

Cheers

--
View this message in context: http://r.789695.n4.nabble.com/Exploring-a-data-table-tp4716260.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mrodriguez5 at imim.es Fri Jan 8 13:25:04 2016
From: mrodriguez5 at imim.es (MARIA RODRIGUEZ)
Date: Fri, 8 Jan 2016 04:25:04 -0800 (PST)
Subject: [datatable-help] DROP FACTOR LEVEL LINE GRAPH
Message-ID: <1452255904664-4716264.post@n4.nabble.com>

Hello everybody!

I would like to drop one factor level in my line graph. I have two
subgroups, one of them missing the 24/36 months value and the other one
missing the final value (60 months).

My syntax automatically removes all the rows containing NAs. That's OK for
me...BUT!, in one of the subgroups, the consequence is the total absence of
the line....!!

As you can see, it shows only the confidence interval for the corresponding
value at 60 months.

How could I avoid this :( ? I mean, I would like the line of this subgroup
to be present, showing only the value at 60 months....

Here is my dataset:

Thank you very much for your help!

Maria.

--
View this message in context: http://r.789695.n4.nabble.com/DROP-FACTOR-LEVEL-LINE-GRAPH-tp4716264.html
Sent from the datatable-help mailing list archive at Nabble.com.

From statquant at outlook.com Fri Jan 8 14:34:37 2016
From: statquant at outlook.com (statquant3)
Date: Fri, 8 Jan 2016 05:34:37 -0800 (PST)
Subject: [datatable-help] "ungrouping" a data.table
Message-ID: <1452260077373-4716265.post@n4.nabble.com>

I am trying to replicate the kdb ungroup function.
Say you have a table with several list columns, each list having the same
number of elements on the same row

R) t = Sys.time()
R) DT=data.table(a=c(1,2,3),b=c('q','w','e'),c=list(rep(t,2),rep(t+1,3),rep(t,0)),d=list(rep(1,2),rep(20,3),rep(1,0)))
R) DT
   a b                                                                               c        d
1: 1 q                           2016-01-08 13:45:04.16544,2016-01-08 13:45:04.16544      1,1
2: 2 w 2016-01-08 13:45:05.16544,2016-01-08 13:45:05.16544,2016-01-08 13:45:05.16544 20,20,20
3: 3 e

The idea is to unlist all list columns keeping the non-list unchanged
I have the following:

dtUngroup <- function(DT){
  colClasses <- lapply(DT,FUN=class)
  listCols <- which(colClasses=='list')
  if(length(listCols)>0){
    nonListCols <- setdiff(colnames(DT),listCols)
    DT[,nbListElem:=lapply(.SD,FUN=lengths),.SDcols=(listCols[1L])]
    DT1 <- DT[,lapply(.SD,FUN=rep,times=DT$nbListElem),.SDcols=(nonListCols)]
    DT1[,(listCols):=DT[,lapply(.SD,FUN=unlist),.SDcols=(listCols)]]
    DT1[,nbListElem:=NULL]
    return(DT1)
  }
  return(DT)
}

R) dtUngroup(DT)[]
   a b          c  d
1: 1 q 1452260946  1
2: 1 q 1452260946  1
3: 2 w 1452260947 20
4: 2 w 1452260947 20
5: 2 w 1452260947 20

But as you can see
1. it is verbose
2. empty lists are unsupported
3. POSIXct type is downcasted to numeric

Any idea how to fix those ?

--
View this message in context: http://r.789695.n4.nabble.com/ungrouping-a-data-table-tp4716265.html
Sent from the datatable-help mailing list archive at Nabble.com.
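
[Editor's sketch, not part of the original thread] One possible way to address the three points raised in the post above is sketched here. It rests on several assumptions: the helper name `ungroup_dt` is invented, all list columns are assumed to have matching element counts per row, rows whose list entries are empty are simply dropped, and the input is left unmodified. The POSIXct class survives because the pieces are combined with c() rather than unlist(), which strips attributes.

```r
library(data.table)

# Hypothetical helper: expand every list column, repeating the other columns.
# Assumes all list columns have the same number of elements on each row.
ungroup_dt <- function(DT) {
  is_list <- vapply(DT, is.list, logical(1L))
  if (!any(is_list)) return(copy(DT))
  list_cols <- names(DT)[is_list]
  flat_cols <- names(DT)[!is_list]
  # Number of elements per row, taken from the first list column.
  idx <- rep(seq_len(nrow(DT)), lengths(DT[[list_cols[1L]]]))
  out <- DT[idx, flat_cols, with = FALSE]
  for (col in list_cols) {
    # c() dispatches on the class of the elements, so POSIXct stays POSIXct.
    set(out, j = col, value = do.call("c", DT[[col]]))
  }
  setcolorder(out, names(DT))
  out
}

now <- Sys.time()
DT <- data.table(
  a = c(1, 2, 3),
  b = c("q", "w", "e"),
  c = list(rep(now, 2), rep(now + 1, 3), rep(now, 0)),
  d = list(rep(1, 2), rep(20, 3), rep(1, 0))
)
res <- ungroup_dt(DT)
res
class(res$c)
```

Row 3 of the example has empty list entries, so it contributes no rows to the result.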

From statquant at outlook.com Fri Jan 8 14:54:54 2016
From: statquant at outlook.com (statquant3)
Date: Fri, 8 Jan 2016 05:54:54 -0800 (PST)
Subject: [datatable-help] "ungrouping" a data.table
In-Reply-To: <1452260077373-4716265.post@n4.nabble.com>
References: <1452260077373-4716265.post@n4.nabble.com>
Message-ID: <1452261294245-4716266.post@n4.nabble.com>

Looks like hadley already had it... tidyr::unnest

--
View this message in context: http://r.789695.n4.nabble.com/ungrouping-a-data-table-tp4716265p4716266.html
Sent from the datatable-help mailing list archive at Nabble.com.

From statquant at outlook.com Fri Jan 8 15:03:27 2016
From: statquant at outlook.com (statquant3)
Date: Fri, 8 Jan 2016 06:03:27 -0800 (PST)
Subject: [datatable-help] "ungrouping" a data.table
In-Reply-To: <1452261294245-4716266.post@n4.nabble.com>
References: <1452260077373-4716265.post@n4.nabble.com> <1452261294245-4716266.post@n4.nabble.com>
Message-ID: <1452261807393-4716267.post@n4.nabble.com>

though it appears terribly inefficient

--
View this message in context: http://r.789695.n4.nabble.com/ungrouping-a-data-table-tp4716265p4716267.html
Sent from the datatable-help mailing list archive at Nabble.com.

From statquant at outlook.com Fri Jan 8 15:23:23 2016
From: statquant at outlook.com (statquant3)
Date: Fri, 8 Jan 2016 06:23:23 -0800 (PST)
Subject: [datatable-help] "ungrouping" a data.table
In-Reply-To: <1452261807393-4716267.post@n4.nabble.com>
References: <1452260077373-4716265.post@n4.nabble.com> <1452261294245-4716266.post@n4.nabble.com> <1452261807393-4716267.post@n4.nabble.com>
Message-ID: <1452263003130-4716269.post@n4.nabble.com>

dtUngroup <- function(DT){
  colClasses <- lapply(DT,FUN=class)
  listCols <- which(colClasses=='list')
  if(length(listCols)>0){
    nonListCols <- setdiff(colnames(DT),listCols)
    nbListElem <- unlist(DT[,lapply(.SD,FUN=lengths),.SDcols=(listCols[1L])])
    DT1 <- DT[,lapply(.SD,FUN=rep,times=(nbListElem)),.SDcols=(nonListCols)]
    DT1[,(listCols):=DT[,lapply(.SD,FUN=function(x) do.call('c',x)),.SDcols=(listCols)]]
    return(DT1)
  }
  return(DT)
}

This works... still 20x slower than the equivalent in kdb

--
View this message in context: http://r.789695.n4.nabble.com/ungrouping-a-data-table-tp4716265p4716269.html
Sent from the datatable-help mailing list archive at Nabble.com.

From snowwings.xbc at gmail.com Fri Jan 8 23:03:42 2016
From: snowwings.xbc at gmail.com (snowwings)
Date: Fri, 8 Jan 2016 14:03:42 -0800 (PST)
Subject: [datatable-help] Print percentage labels SEPARATELY based on groups [ggplot]
Message-ID: <1452290622690-4716290.post@n4.nabble.com>

I have a data set with two groups, and used ggplot to plot a histogram. I
tried to print the percentages within each group, not the percentage of the
total.

ggplot(data.frame(general), aes(x = general$Severity, fill = Organization)) +
  geom_bar(position = "dodge") +
  stat_bin(aes(y=..count.., label = paste(..count.., sprintf("%.02f %%", ..count../sum(..count..)*100))),
           geom="text", vjust=-.5, size=3, position = position_dodge(width = 0.8)) +
  scale_x_discrete(limits=levels(general$Severity)[1:8]) +
  theme(axis.text.x = element_text(angle = 15, hjust = 1)) +
  xlab("Severity") +
  scale_fill_manual(values=c("royalblue2", rgb(212, 179, 125, maxColorValue=255)))

However, I want to show the percentages in the table below:

And I don't want to show the "NA" in the plot.

It feels like I should change the label expression. What kind of code
should I use?

--
View this message in context: http://r.789695.n4.nabble.com/Print-percentage-labels-SEPARATELY-based-on-groups-ggplot-tp4716290.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mel at mbacou.com Tue Jan 12 04:22:00 2016
From: mel at mbacou.com (Bacou, Melanie)
Date: Mon, 11 Jan 2016 22:22:00 -0500
Subject: [datatable-help] "ungrouping" a data.table
In-Reply-To: <1452263003130-4716269.post@n4.nabble.com>
References: <1452260077373-4716265.post@n4.nabble.com> <1452261294245-4716266.post@n4.nabble.com> <1452261807393-4716267.post@n4.nabble.com> <1452263003130-4716269.post@n4.nabble.com>
Message-ID: <56947158.60303@mbacou.com>

Assuming all list columns are of the same lengths, then this might be a
little faster?

dtUngroup <- function(dt) {
  l <- names(dt)[dt[, lapply(.SD, class)]=="list"]
  if (length(l)>0) {
    nl <- setdiff(names(dt), l)
    t <- sapply(dt[[l[1]]], length)
    tmp <- dt[, .SD, .SDcols=nl]
    tmp <- tmp[, lapply(.SD, rep, times=t)]
    tmp <- cbind(tmp, dt[, lapply(.SD, unlist), .SDcols=l])
  } else tmp <- dt
  return(tmp)
}

# If you need to cast columns back to POSIXct, might be easier as follows:
res <- dtUngroup(DT)
res[, c := as.POSIXct(c, origin="1970-01-01")]

On 1/8/2016 9:23 AM, statquant3 wrote:
> dtUngroup <- function(DT){
>   colClasses <- lapply(DT,FUN=class)
>   listCols <- which(colClasses=='list')
>   if(length(listCols)>0){
>     nonListCols <- setdiff(colnames(DT),listCols)
>     nbListElem <- unlist(DT[,lapply(.SD,FUN=lengths),.SDcols=(listCols[1L])])
>     DT1 <- DT[,lapply(.SD,FUN=rep,times=(nbListElem)),.SDcols=(nonListCols)]
>     DT1[,(listCols):=DT[,lapply(.SD,FUN=function(x)
>       do.call('c',x)),.SDcols=(listCols)]]
>     return(DT1)
>   }
>   return(DT)
> }
>
> This works... still 20x slower than the equivalent in kdb
>
> --
> View this message in context: http://r.789695.n4.nabble.com/ungrouping-a-data-table-tp4716265p4716269.html
> Sent from the datatable-help mailing list archive at Nabble.com.

From tsuimark at hotmail.co.uk Wed Jan 13 15:34:35 2016
From: tsuimark at hotmail.co.uk (tsuimark)
Date: Wed, 13 Jan 2016 06:34:35 -0800 (PST)
Subject: [datatable-help] Summarising by recurrance
Message-ID: <1452695675189-4716393.post@n4.nabble.com>

I have a very large dataset showing logins to a website. I'm trying to
calculate the frequency of logins by username. what I hope to get is a table
like the below where the period is listed as row names and the frequency is
the column header.i.e.

[count of log in][  1][  2][  3][  4][  5][  6][  7][  8][  9]
[Monday         ][120][210][540][240][150][456][410][240][540]
[Tuesday        ][150][280][140][640][180][123][210][320][240]

The data is simply login id, date/time login. I've been able to add columns
appending the month, name of the day, and day number based on the date of
login. ideally I'd like to be able to get the same sort of summary as above
for each category (Month, day of month, day name).

My knowledge of R is rudimentary at best; I hope someone can help me.

Kind regards,

Mark.

--
View this message in context: http://r.789695.n4.nabble.com/Summarising-by-recurrance-tp4716393.html
Sent from the datatable-help mailing list archive at Nabble.com.

From chenhuashan at gmail.com Fri Jan 15 03:39:27 2016
From: chenhuashan at gmail.com (Huashan Chen)
Date: Thu, 14 Jan 2016 18:39:27 -0800 (PST)
Subject: [datatable-help] possible sort bug?
Message-ID: <1452825567014-4716440.post@n4.nabble.com>

dt = data.table(id=1:30, nn = paste0('A', 1:30))
smp = sample(30, size =10)

# data is sorted as expected
aa = dt$id %in% smp
dt[aa, ]

# However, this gives unsorted result
dt[id %in% smp, ]

Is this a bug or by design?

--
View this message in context: http://r.789695.n4.nabble.com/possible-sort-bug-tp4716440.html
Sent from the datatable-help mailing list archive at Nabble.com.

From my.r.help at gmail.com Fri Jan 15 05:51:07 2016
From: my.r.help at gmail.com (Michael Smith)
Date: Fri, 15 Jan 2016 12:51:07 +0800
Subject: [datatable-help] possible sort bug?
In-Reply-To: <1452825567014-4716440.post@n4.nabble.com>
References: <1452825567014-4716440.post@n4.nabble.com>
Message-ID: <56987ABB.4090101@gmail.com>

Try

dt[(id) %in% smp, ]

On 01/15/2016 10:39 AM, Huashan Chen wrote:
>
>
> dt = data.table(id=1:30, nn = paste0('A', 1:30))
> smp = sample(30, size =10)
>
> # data is sorted as expected
> aa = dt$id %in% smp
> dt[aa, ]
>
> # However, this gives unsorted result
> dt[id %in% smp, ]
>
> Is this a bug or by design?
>
> --
> View this message in context: http://r.789695.n4.nabble.com/possible-sort-bug-tp4716440.html
> Sent from the datatable-help mailing list archive at Nabble.com.
>

From chenhuashan at gmail.com Sun Jan 17 09:15:37 2016
From: chenhuashan at gmail.com (Huashan Chen)
Date: Sun, 17 Jan 2016 00:15:37 -0800 (PST)
Subject: [datatable-help] possible sort bug?
In-Reply-To: <56987ABB.4090101@gmail.com>
References: <1452825567014-4716440.post@n4.nabble.com> <56987ABB.4090101@gmail.com>
Message-ID: <1453018537567-4716507.post@n4.nabble.com>

Hi, Michael,

Thank you for the answer. But could you explain more on why the brackets
around `id`?

--
View this message in context: http://r.789695.n4.nabble.com/possible-sort-bug-tp4716440p4716507.html
Sent from the datatable-help mailing list archive at Nabble.com.

From my.r.help at gmail.com Mon Jan 18 06:49:20 2016
From: my.r.help at gmail.com (Michael Smith)
Date: Mon, 18 Jan 2016 13:49:20 +0800
Subject: [datatable-help] possible sort bug?
In-Reply-To: <1453018537567-4716507.post@n4.nabble.com>
References: <1452825567014-4716440.post@n4.nabble.com> <56987ABB.4090101@gmail.com> <1453018537567-4716507.post@n4.nabble.com>
Message-ID: <569C7CE0.6090804@gmail.com>

It has something to do with the evaluation. I think it tells data.table
to look "inside" the data.table. Sort of like the identity function.

On 01/17/2016 04:15 PM, Huashan Chen wrote:
> Hi, Michael,
>
> Thank you for the answer. But could you explain more on why the brackets
> around `id`?
>
> --
> View this message in context: http://r.789695.n4.nabble.com/possible-sort-bug-tp4716440p4716507.html
> Sent from the datatable-help mailing list archive at Nabble.com.
>

From wuletawu979 at yahoo.com Wed Jan 20 13:48:07 2016
From: wuletawu979 at yahoo.com (waw)
Date: Wed, 20 Jan 2016 04:48:07 -0800 (PST)
Subject: [datatable-help] How to aggregate one data.frame based on other time data frame time index?
Message-ID: <1453294087297-4716598.post@n4.nabble.com>

Dear list,

I have two data frames,

data.frame1<-data.frame(date=as.Date(c("2010-01-01", "2010-01-02", "2010-01-03",
                                       "2010-01-04", "2010-01-05", "2010-01-06",
                                       "2010-01-07")),
                        A=c(0.5,10,15,3,10,20,12),
                        B=c(1.5,1,1.5,3.2,10.5,9,12))
data.frame2<-data.frame(date=as.Date(c("2010-01-03", "2010-01-07")),
                        A=c(25.5,45),
                        B=c(4,34.7))

Now I would like to compare them. But before that, both need to have the
same number of rows. data.frame2 holds aggregated values of data.frame1.
However, the time interval is irregular. Could you help me with how to do
this?

thanks !

--
View this message in context: http://r.789695.n4.nabble.com/How-to-aggregate-one-data-frame-based-on-other-time-data-frame-time-index-tp4716598.html
Sent from the datatable-help mailing list archive at Nabble.com.

From matthew.forrest at senckenberg.de Wed Jan 20 19:18:04 2016
From: matthew.forrest at senckenberg.de (Matthew Forrest)
Date: Wed, 20 Jan 2016 19:18:04 +0100
Subject: [datatable-help] use of data.table in an S4 class
Message-ID: <569FCF5C.9050701@senckenberg.de>

Hi all,

I want to use data.table as either a data element (slot) of an S4 class
or as a super class (ie. the new class inherits from data.table using
"contains"). The reason for this is that I want to build an S4 class
with data.table as the main data component of the class, but with some
rather complex meta-data (specifically more S4 classes) associated.
I then want to operate on this data.table (mostly with ":=") inside some
functions.

The second option (using a superclass of data.table) looks perfect; if
it worked I would just be able to treat the new S4 class exactly as a
data.table. One could pass a data.table superclass object into a function
which could operate on the data.table superclass using ":=", and then
(by the reference goodness of data.table) the data.table superclass
would still be modified outside the function. But when data.table is
used as a super class, the normal operations just don't work. See my
issue flagged on github (with a simple code snippet to demonstrate) here:

https://github.com/Rdatatable/data.table/issues/1504

Maybe this can work, which would be fantastic, but let's see.

Then there is the idea of using data.table as a regular slot. Problem is
that accessing the data.table slot in the S4 object and modifying it
(either inside a function or using a class method) results in the type of
copying that data.table works so hard to avoid! Disaster! For an example,
run this code:

# simple test object
setClass("TestObj",
         slots = c(id = "character",
                   dt = "data.table"
                   )
         )

# define a method
setGeneric(name="testMethod",
           def=function(theObject,new.col.name, cols.to.add) {
             standardGeneric("testMethod")
           }
           )

setMethod(f="testMethod",
          signature="TestObj",
          definition=function(theObject,new.col.name, cols.to.add) {
            theObject@dt <- theObject@dt[,paste(new.col.name):= rowSums(.SD), .SDcols = cols.to.add]
            return(theObject)
          }
          )

# create a TestObj
lala <- new("TestObj", id = "test", dt = data.table(a=1:10, b=11:20))

# accessing the data.table slot results in a copy :-(
lala@dt <- lala@dt[, c1 := a + b]

# using a method also makes a copy :'-(
testMethod(lala, new.col.name = "c2", cols.to.add = c("a","b"))
lala <- testMethod(lala, new.col.name = "c2", cols.to.add = c("a","b"))

So you can see the problem.
I want to use a data.table as part of an S4 class, and process it in keeping with data.table principles, but I can't find a way. It is possible that I can just suck up the performance cost of the copy, but some of my data.tables are pretty large so that might not be viable.

Any help greatly appreciated!

Thanks,

Matt

--
Dr Matthew Forrest
Biodiversity and Climate Research Centre (BiK-F)
Visiting address: Georg-Voigt-Straße 14-16, room 3.04, D-60325 Frankfurt am Main
Postal address: Senckenberganlage 25, D-60325 Frankfurt am Main
Tel.: +49-69-7542-1867
Fax: +49-69-7542-7904
E-mail: matthew.forrest at senckenberg.de
Homepage: http://www.bik-f.de/root/index.php?page_id=709

Senckenberg Gesellschaft für Naturforschung
Rechtsfähiger Verein gemäß § 22 BGB
Senckenberganlage 25
60325 Frankfurt
Direktorium: Prof. Dr. Dr. h.c. Volker Mosbrugger, Prof. Dr. Andreas Mulch, Stephanie Schwedhelm, Prof. Dr. Katrin Böhning-Gaese, Prof. Dr. Uwe Fritz, PD Dr. Ingrid Kröncke
Präsidentin: Dr. h.c. Beate Heraeus
Aufsichtsbehörde: Magistrat der Stadt Frankfurt am Main (Ordnungsamt)

From jan.kacaba at gmail.com Thu Jan 21 14:37:23 2016
From: jan.kacaba at gmail.com (derek)
Date: Thu, 21 Jan 2016 05:37:23 -0800 (PST)
Subject: [datatable-help] replace a substring in a string
Message-ID: <1453383443260-4716659.post@n4.nabble.com>

Hello,

Can you please tell me how to replace a substring within another string at an exact position? For example, take the string:

"abc-abc-abc-bla-abc"

In this string I would like to replace the substring "abc":

a) at its second occurrence from the left of the string
b) at its first occurrence from the right of the string
c) at all occurrences
d) at its first occurrence after the 5th character

Thank you for your help in advance.

--
View this message in context: http://r.789695.n4.nabble.com/replace-a-substring-in-a-string-tp4716659.html
Sent from the datatable-help mailing list archive at Nabble.com.
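
A minimal base R sketch of the four replacements asked about in the message above. It is not taken from any reply in this thread. The helper name replace_nth and the replacement text "XYZ" are made up for illustration, and case d) is read here as "the first match that starts after position 5", which is an assumption about what was meant.

```r
# Hypothetical helper (not from the thread): replace the n-th literal match
# of `pattern` in a single string `s`, counting from the left or from the right.
replace_nth <- function(s, pattern, replacement, n = 1, from_right = FALSE) {
  m <- gregexpr(pattern, s, fixed = TRUE)   # positions of all literal matches
  hits <- regmatches(s, m)[[1]]             # the matched substrings themselves
  if (length(hits) == 0 || n < 1 || n > length(hits)) return(s)
  if (from_right) n <- length(hits) - n + 1 # translate "n-th from right" to an index
  hits[n] <- replacement
  regmatches(s, m) <- list(hits)            # write the matches back into the string
  s
}

s <- "abc-abc-abc-bla-abc"

# a) second occurrence from the left
res_a <- replace_nth(s, "abc", "XYZ", n = 2)                     # "abc-XYZ-abc-bla-abc"

# b) first occurrence from the right
res_b <- replace_nth(s, "abc", "XYZ", n = 1, from_right = TRUE)  # "abc-abc-abc-bla-XYZ"

# c) all occurrences
res_c <- gsub("abc", "XYZ", s, fixed = TRUE)                     # "XYZ-XYZ-XYZ-bla-XYZ"

# d) first occurrence starting after the 5th character (assumed interpretation):
#    keep characters 1-5 untouched and replace the first match in the remainder
res_d <- paste0(substr(s, 1, 5),
                sub("abc", "XYZ", substr(s, 6, nchar(s)), fixed = TRUE))
                                                                 # "abc-abc-XYZ-bla-abc"
```

fixed = TRUE treats the pattern as a literal string; for a regular expression pattern it would be dropped, and the pattern would then need appropriate escaping.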
From mrodriguez5 at imim.es Thu Jan 21 16:15:13 2016 From: mrodriguez5 at imim.es (MARIA RODRIGUEZ) Date: Thu, 21 Jan 2016 07:15:13 -0800 (PST) Subject: [datatable-help] GGPLOT. facet_wrap and geom_rect Message-ID: <1453389313232-4716664.post@n4.nabble.com> Dear all, Does anybody know how can I avoid colour transparency (alpha) changing with I mean, I would like to have the same colour and transparency in all the graphs. Thank you very much in advance! -- View this message in context: http://r.789695.n4.nabble.com/GGPLOT-facet-wrap-and-geom-rect-tp4716664.html Sent from the datatable-help mailing list archive at Nabble.com. From mrodriguez5 at imim.es Thu Jan 21 17:30:44 2016 From: mrodriguez5 at imim.es (MARIA RODRIGUEZ) Date: Thu, 21 Jan 2016 08:30:44 -0800 (PST) Subject: [datatable-help] GGPLOT: geom_rect and facet Message-ID: <1453393844698-4716668.post@n4.nabble.com> Dear all, Does anybody know how can I avoid colour transparency (alpha) changing with geom_rect function, I would like to have the same colour and transparency in all the graphs. -- View this message in context: http://r.789695.n4.nabble.com/GGPLOT-geom-rect-and-facet-tp4716668.html Sent from the datatable-help mailing list archive at Nabble.com. From fperickson at wisc.edu Thu Jan 21 17:57:16 2016 From: fperickson at wisc.edu (Frank Erickson) Date: Thu, 21 Jan 2016 11:57:16 -0500 Subject: [datatable-help] GGPLOT: geom_rect and facet In-Reply-To: <1453393844698-4716668.post@n4.nabble.com> References: <1453393844698-4716668.post@n4.nabble.com> Message-ID: Hi, You are posting to the wrong mailing list. This one is for the data.table package, not ggplot2. Try the mailing list mentioned here: http://ggplot2.org/ Best, Frank On Thu, Jan 21, 2016 at 11:30 AM, MARIA RODRIGUEZ wrote: > Dear all, > > Does anybody know how can I avoid colour transparency (alpha) changing > with > geom_rect function, I would like to have the same colour and transparency > in > all the graphs. 
> --
> View this message in context:
> http://r.789695.n4.nabble.com/GGPLOT-geom-rect-and-facet-tp4716668.html
> Sent from the datatable-help mailing list archive at Nabble.com.

From frederik at ofb.net Wed Jan 27 21:40:13 2016
From: frederik at ofb.net (frederik at ofb.net)
Date: Wed, 27 Jan 2016 12:40:13 -0800
Subject: [datatable-help] sorting on a floating point column
Message-ID: <20160127204013.GO3375@ofb.net>

This is following up on a thread from a couple years ago:

http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html

I ran into this problem myself; it took a bit of time to debug because it is so surprising.

In my case, I was using order() to sort a list of floats.

I expected the result to be monotonic but it wasn't!

Then I found out that the problem was due to 'order' being part of the data.table library. By using base::order, I was able to get correct behavior.

I don't understand why improperly ordering floating point data helps the data.table library accomplish anything, whether it is looking up keys or what.

Also, it must be much slower to compare floats with a tolerance than to just compare them. I seem to recall that floats were designed so that normal comparison is quite fast.

Please fix this bug, or at least document it more visibly.

Thank you,

Frederick Eaton

From aragorn168b at gmail.com Wed Jan 27 22:13:44 2016
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 27 Jan 2016 22:13:44 +0100
Subject: [datatable-help] sorting on a floating point column
In-Reply-To: <20160127204013.GO3375@ofb.net>
References: <20160127204013.GO3375@ofb.net>
Message-ID:

This is following up on a thread from a couple years ago:
http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html

Things have changed A LOT! I suggest you keep up-to-date by reading the README about bug fixes and features from the GitHub project page: https://github.com/Rdatatable/data.table

I ran into this problem myself, it took a bit of time to debug because it is so surprising.

What's surprising? Reproducible example please. data.table package version, R version as well please.
Without that my best guess is for you to look at `?setNumericRounding`.

--
Arun

On 27 January 2016 at 21:40:23, frederik at ofb.net (frederik at ofb.net) wrote:

This is following up on a thread from a couple years ago:

http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html

I ran into this problem myself, it took a bit of time to debug because it is so surprising.

In my case, I was using order() to sort a list of floats.

I expected the result to be monotonic but it wasn't!

Then I found out that the problem was due to 'order' being part of the data.table library. By using base::order, I was able to get correct behavior.

I don't understand why improperly ordering floating point data helps the data.table library accomplish anything, whether it is looking up keys or what.

Also, it must be much slower to compare floats with a tolerance, than to just compare them. I seem to recall that floats were designed so that normal comparison is quite fast.

Please fix this bug, or at least document it more visibly.

Thank you,

Frederick Eaton
From frederik at ofb.net Thu Jan 28 00:03:16 2016
From: frederik at ofb.net (frederik at ofb.net)
Date: Wed, 27 Jan 2016 15:03:16 -0800
Subject: [datatable-help] sorting on a floating point column
In-Reply-To:
References: <20160127204013.GO3375@ofb.net>
Message-ID: <20160127230316.GP3375@ofb.net>

data.table 1.9.6

What's surprising is that sorting a list of floats wouldn't do the obvious thing, and sort them exactly. Is it surprising that this would be surprising?

Why do you want a minimal test case, when setNumericRounding explains that the behavior I reported is intentional?

I now see that this is also documented in the data.table::order page. So I guess it is already "documented visibly".

And setNumericRounding explains that it is slightly faster to ignore the last two bytes, requiring fewer radix sort passes.

I wanted to share my experience that this behavior is confusing.

Thank you at least for pointing me to your documentation.

Frederick

On Wed, Jan 27, 2016 at 10:13:44PM +0100, Arunkumar Srinivasan wrote:
> This is following up on a thread from a couple years ago:
> http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html
> Things have changed A LOT! I suggest you keep up-to-date by reading the README about bug fixes and features from the github project page: https://github.com/Rdatatable/data.table
>
> I ran into this problem myself, it took a bit of time to debug because it is so surprising.
> What's surprising? Reproducible example please. data.table package version, R version as well please.
> Without that my best guess is for you to look at `?setNumericRounding`.
>
> --
> Arun
>
> On 27 January 2016 at 21:40:23, frederik at ofb.net (frederik at ofb.net) wrote:
>
> This is following up on a thread from a couple years ago:
>
> http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html
>
> I ran into this problem myself, it took a bit of time to debug because
> it is so surprising.
> > In my case, I was using order() to sort a list of floats. > > I expected the result to be monotonic but it wasn't! > > Then I found out that the problem was due to 'order' being part of the > data.table library. By using base::order, I was able to get correct > behavior. > > I don't understand why improperly ordering floating point data helps > the data.table library accomplish anything, whether it is looking up > keys or what. > > Also, it must be much slower to compare floats with a tolerance, than > to just compare them. I seem to recall that floats were designed so > that normal comparison is quite fast. > > Please fix this bug, or at least document it more visibly. > > Thank you, > > Frederick Eaton > _______________________________________________ > datatable-help mailing list > datatable-help at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help From jan.kacaba at gmail.com Thu Jan 28 20:23:25 2016 From: jan.kacaba at gmail.com (Jan Kacaba) Date: Thu, 28 Jan 2016 20:23:25 +0100 Subject: [datatable-help] testing R-help through email Message-ID: Hello, I've been subscriber to R-help for long time and I've been using nabble to post my messages to R-help. The option is now disabled. I don't quite understand why and in what ways is the mailing list better than classical forum. I hope this message reaches the target, otherwise I don't know how to post to R-help. Thank you for any message in advance. -------------- next part -------------- An HTML attachment was scrubbed... 
From statquant at outlook.com Fri Jan 29 15:01:43 2016
From: statquant at outlook.com (statquant3)
Date: Fri, 29 Jan 2016 06:01:43 -0800 (PST)
Subject: [datatable-help] ITime class being lost
Message-ID: <1454076103288-4716936.post@n4.nabble.com>

Using ITime, it looks like basic operations lose the class:

R) seq(from = as.ITime('08:00'), to = as.ITime('09:00'), by=as.ITime('00:05'))
[1] 28800 29100 29400 29700 30000 30300 30600 30900 31200 31500 31800 32100 32400

R) cut(as.ITime('08:23'),breaks=seq(from = as.ITime('08:00'), to = as.ITime('08:30'), by=as.ITime('00:05')))
[1] (3e+04,3.03e+04]
Levels: (2.88e+04,2.91e+04] (2.91e+04,2.94e+04] (2.94e+04,2.97e+04] (2.97e+04,3e+04] (3e+04,3.03e+04] (3.03e+04,3.06e+04]

Any chance this gets fixed? I know ITime is not used that much...

--
View this message in context: http://r.789695.n4.nabble.com/ITime-class-being-lost-tp4716936.html
Sent from the datatable-help mailing list archive at Nabble.com.

From lianoglou.steve at gene.com Sat Jan 30 15:02:46 2016
From: lianoglou.steve at gene.com (Steve Lianoglou)
Date: Sat, 30 Jan 2016 06:02:46 -0800
Subject: [datatable-help] use of data.table in an S4 class
In-Reply-To: <569FCF5C.9050701@senckenberg.de>
References: <569FCF5C.9050701@senckenberg.de>
Message-ID:

Hi,

> # create a TestObj
> lala <- new("TestObj", id = "test", dt = data.table(a=1:10, b=11:20))
>
> # accessing the data.table slot results in a copy :-(
> lala@dt <- lala@dt[, c1 := a + b]

You are reassigning to the object here. What if you were to do this instead:

R> lala@dt[, c1 := a + b]

Accessing S4 objects directly via their slots (i.e. using @) is discouraged, but I'm not sure how it would work if you did the same using a function. For example, you might create a function `dt` that returns the object in the @dt slot, then work on it directly:

R> dt(lala)[, c1 := a + b]

Perhaps you can play with those and let us know if either is satisfactory?

-steve

--
Steve Lianoglou
Computational Biologist
Genentech