From statquant at outlook.com Wed Oct 14 13:22:46 2015
From: statquant at outlook.com (statquant3)
Date: Wed, 14 Oct 2015 04:22:46 -0700 (PDT)
Subject: [datatable-help] How can I apply a function of 2 columns to multiple other columns with a by clause
Message-ID: <1444821766985-4713576.post@n4.nabble.com>

Hello,
I am looking to
* update several columns by
* applying a function f (to each of those columns) that would use those columns AND another one.

If there is no "by" I can make it work. How can I do the same with a "by"?

Below is an example (as I can't be very clear):

#data setup
library(data.table)
set.seed(1)
N <- 101
DT <- data.table(x1=rnorm(N),x2=rnorm(N),x3=rnorm(N),x4=rnorm(N),y=letters[sample(5,size=N,replace=TRUE)])

#function to be applied
f <- function(x,y){return( frank(x/y,na.last='keep') )}

#column names
xCols <- paste0('x',1:3)
rCols <- paste0('r',1:3)

#when there is no by it is easy
DT[,(rCols):=lapply(FUN=f,X=.SD,y=DT$x4),.SDcols=xCols]

#when there is a by it fails (of course DT$x4 is too big)
DT[,(rCols):=lapply(FUN=f,X=.SD,y=DT$x4),.SDcols=xCols,by=.(y)]

--
View this message in context: http://r.789695.n4.nabble.com/How-can-I-apply-a-function-of-2-columns-to-multiple-other-columns-with-a-by-clause-tp4713576.html
Sent from the datatable-help mailing list archive at Nabble.com.

From aragorn168b at gmail.com Wed Oct 14 13:45:18 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 14 Oct 2015 13:45:18 +0200
Subject: [datatable-help] How can I apply a function of 2 columns to multiple other columns with a by clause
In-Reply-To: <1444821766985-4713576.post@n4.nabble.com>
References: <1444821766985-4713576.post@n4.nabble.com>
Message-ID:

This will become more natural when https://github.com/Rdatatable/data.table/issues/495 is fixed. I'll work on this at some point for v1.9.8. I don't have a time frame yet.

First, for both cases, I'd suggest using `Map()`.
Second, as a temporary solution, I'd suggest using `mget()`.

#without a by
DT[, (rCols) := Map(f, mget(xCols), list(x4))]

#with a by: x4 is now evaluated within each group
DT[, (rCols) := Map(f, mget(xCols), list(x4)), by=y]

When #495 is fixed, `mget(xCols)` can be replaced with `.SD` along with `.SDcols = xCols`.

--
Arun

_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
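A minimal, self-contained sketch of the `Map()`/`mget()` suggestion above, run end to end on the data from the original question. The check at the end (`g`, `manual`) is not from the thread; it is only there to illustrate that `x4` is seen per group inside `j` when `by` is used.

```r
library(data.table)

set.seed(1)
N  <- 101
DT <- data.table(x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N), x4 = rnorm(N),
                 y  = letters[sample(5, size = N, replace = TRUE)])

# rank of x/y among whatever rows are passed in
f <- function(x, y) frank(x / y, na.last = "keep")

xCols <- paste0("x", 1:3)
rCols <- paste0("r", 1:3)

# mget(xCols) returns the three x columns for the current group;
# list(x4) has length 1 and is recycled by Map(), so f(x1, x4),
# f(x2, x4) and f(x3, x4) are each called once per group
DT[, (rCols) := Map(f, mget(xCols), list(x4)), by = y]

# illustrative check (not from the thread): recompute r1 by hand for one group
g      <- DT$y[1]
manual <- DT[y == g, frank(x1 / x4, na.last = "keep")]
```

If the grouped call behaves as described, `DT[y == g, r1]` should match `manual`, since both rank `x1 / x4` over the same rows in the same order.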
From statquant at outlook.com Wed Oct 14 14:08:53 2015
From: statquant at outlook.com (statquant3)
Date: Wed, 14 Oct 2015 05:08:53 -0700 (PDT)
Subject: [datatable-help] How can I apply a function of 2 columns to multiple other columns with a by clause
In-Reply-To:
References: <1444821766985-4713576.post@n4.nabble.com>
Message-ID: <1444824533851-4713579.post@n4.nabble.com>

Awesome.
For some reason I avoid using Map, Reduce and those kinds of functions in data.table. I recall that I was advised against it at some point. Should I reconsider?

--
View this message in context: http://r.789695.n4.nabble.com/How-can-I-apply-a-function-of-2-columns-to-multiple-other-columns-with-a-by-clause-tp4713576p4713579.html
Sent from the datatable-help mailing list archive at Nabble.com.

From t.jonesd289 at gmail.com Wed Oct 14 20:46:54 2015
From: t.jonesd289 at gmail.com (tjonesd289)
Date: Wed, 14 Oct 2015 11:46:54 -0700 (PDT)
Subject: [datatable-help] Filter rows using integer64 columns
Message-ID: <1444848414820-4713594.post@n4.nabble.com>

I have loaded data from a file. The resulting data.table looks like this:

> require(data.table)
> require(bit64)
> z = fread('mydata.csv')
> print(z)
                      a
1:  -688037432807398365
2:  8910419692287774511
3:  7392641969610778497
4: -7275864368241016399
5:  5280275646239497580

> class(z$a)
"integer64"

However, I cannot filter z on column a...

> z[a == -688037432807398365,]
Empty data.table (0 rows) of 1 cols: a

Similarly,

> z[a == as.integer64(-688037432807398365),]
Error in UseMethod("as.data.table") :
  no applicable method for 'as.data.table' applied to an object of class "integer64"

I also noticed that as.integer64 rounds the input (see the last 3 digits)....

> as.integer64(-688037432807398365)
integer64
[1] -688037432807398400

Any ideas how to filter rows using integer64 columns? I suppose I could convert to character first, but then what is the point of even having integer64?

data.table is version 1.9.4, bit64 is version 0.9-5

Thanks!
--
View this message in context: http://r.789695.n4.nabble.com/Filter-rows-using-integer64-columns-tp4713594.html
Sent from the datatable-help mailing list archive at Nabble.com.

From kevinushey at gmail.com Wed Oct 14 21:06:06 2015
From: kevinushey at gmail.com (Kevin Ushey)
Date: Wed, 14 Oct 2015 12:06:06 -0700
Subject: [datatable-help] Filter rows using integer64 columns
In-Reply-To: <1444848414820-4713594.post@n4.nabble.com>
References: <1444848414820-4713594.post@n4.nabble.com>
Message-ID:

R's syntax doesn't 'know' about 64-bit integers, so when you try to write

    a == -688037432807398365

you're really creating a big double and losing some precision in the resulting comparison.

I think you should construct your 64-bit integers from a string, e.g.

    as.integer64("-688037432807398365")

This will avoid the round-trip to floating point and avoid losses in precision. By the time the R parser is done with '-688037432807398365', the number has already lost precision.

Kevin

From niparisco at gmail.com Mon Oct 19 15:59:27 2015
From: niparisco at gmail.com (Nicolas Paris)
Date: Mon, 19 Oct 2015 15:59:27 +0200
Subject: [datatable-help] DT 1.9.5 - refers previous/next/whatever row
Message-ID:

Hello,

I wonder if there is a way in data.table (or more generally in R) to work on the previous row without loops.
E.g. something equivalent to:

dt <-data.table(col1=c(1,2,3))
for (i in 2:nrow(dt))
{
  dt[i,col3:=dt[i-1,list(col1)]>2]
}

Thanks a lot!

From jholtman at gmail.com Mon Oct 19 16:10:20 2015
From: jholtman at gmail.com (jim holtman)
Date: Mon, 19 Oct 2015 10:10:20 -0400
Subject: [datatable-help] DT 1.9.5 - refers previous/next/whatever row
In-Reply-To:
References:
Message-ID:

Does this do what you want?

> dt <-data.table(col1=c(1,2,3))
> dt
   col1
1:    1
2:    2
3:    3
> dt[2:nrow(dt), col3:=dt[1:(nrow(dt) - 1), list(col1)]>2]
> dt
   col1  col3
1:    1    NA
2:    2 FALSE
3:    3 FALSE

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

From fperickson at wisc.edu Mon Oct 19 16:12:08 2015
From: fperickson at wisc.edu (Frank Erickson)
Date: Mon, 19 Oct 2015 10:12:08 -0400
Subject: [datatable-help] DT 1.9.5 - refers previous/next/whatever row
In-Reply-To:
References:
Message-ID:

I think `shift` is the best option:

dt <-data.table(col1=c(1,2,3))
dt[, col3 := shift(col1, type="lag") > 2]
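As a small extension of the `shift` suggestion above, the sketch below applies the same lag within groups, so that the first row of each group has no "previous row". The grouping column `g`, the data values and the new column names (`prev`, `prev_gt2`) are hypothetical and not from the thread.

```r
library(data.table)

# hypothetical data: two groups, values in row order
dt <- data.table(g    = c("a", "a", "a", "b", "b"),
                 col1 = c(1, 3, 5, 4, 2))

# previous row's value within each group; the first row of a group gets NA
dt[, prev := shift(col1, n = 1L, type = "lag"), by = g]

# the comparison from the original loop, now vectorised;
# NA > 2 is NA, so group-leading rows stay NA
dt[, prev_gt2 := prev > 2]
```

Using `type = "lead"` instead of `"lag"` would look at the next row rather than the previous one.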
From niparisco at gmail.com Mon Oct 19 17:28:43 2015
From: niparisco at gmail.com (Nicolas Paris)
Date: Mon, 19 Oct 2015 17:28:43 +0200
Subject: [datatable-help] DT 1.9.5 - refers previous/next/whatever row
In-Reply-To:
References:
Message-ID:

Thanks all,

Actually my use case is a little different:

> dt <- data.table(col1=c("FOOBARBAZ","BARBAZ","BAZ"))
        col1
1: FOOBARBAZ
2:    BARBAZ
3:       BAZ

What I want to get is:

> data.table(col1=c("FOO","BAR","BAZ"))
   col1
1:  FOO
2:  BAR
3:  BAZ

(I remove the next row's value from the current row.)

This works:

dt[,col3:=mapply(function(x,y){gsub(x,"",y,fixed=TRUE)},
                 shift(col1,fill=" ",type="lead"),
                 col1)]

Do you have a better solution in mind? Will it be faster than a loop (even though it is less readable)?

Thanks again

From fperickson at wisc.edu Mon Oct 19 17:48:58 2015
From: fperickson at wisc.edu (Frank Erickson)
Date: Mon, 19 Oct 2015 11:48:58 -0400
Subject: [datatable-help] DT 1.9.5 - refers previous/next/whatever row
In-Reply-To:
References:
Message-ID:

Yeah, I would use mapply pretty much the same way:

dt[, mapply(sub, pattern = shift(col1, type="lead", fill=""), replacement = "", x = col1, USE.NAMES = FALSE)]

I expect it to be faster than a loop, since `dt[]` has some overhead that you would incur for each row of a loop. Once you get used to mapply/Map, it won't seem so hard to read, I think.
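The `mapply` approach above can be run end to end as in the following sketch, which stores the result in a new column (`col3`, the name used earlier in the thread). One assumption worth flagging: `sub` treats each pattern as a regular expression here, so values containing regex metacharacters would need escaping first.

```r
library(data.table)

dt <- data.table(col1 = c("FOOBARBAZ", "BARBAZ", "BAZ"))

# shift(..., type = "lead", fill = "") yields the next row's value,
# with "" for the last row; an empty pattern makes sub() a no-op there
dt[, col3 := mapply(sub,
                    pattern     = shift(col1, type = "lead", fill = ""),
                    replacement = "",
                    x           = col1,
                    USE.NAMES   = FALSE)]
```

Each row's `col3` is its `col1` with the first occurrence of the following row's `col1` removed, which is the stripping the question asks for.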
From t.jonesd289 at gmail.com Mon Oct 19 19:57:52 2015
From: t.jonesd289 at gmail.com (tjonesd289)
Date: Mon, 19 Oct 2015 10:57:52 -0700 (PDT)
Subject: [datatable-help] Filter rows using integer64 columns
In-Reply-To:
References: <1444848414820-4713594.post@n4.nabble.com>
Message-ID: <1445277472665-4713754.post@n4.nabble.com>

Thanks Kevin - that was exactly what I needed! There was also a confounding issue of me loading bit64 after reading the data (which left me with some NAs and incorrect values), but that was simple enough to figure out.

Thanks again!

--
View this message in context: http://r.789695.n4.nabble.com/Filter-rows-using-integer64-columns-tp4713594p4713754.html
Sent from the datatable-help mailing list archive at Nabble.com.

From dmtcimen at hotmail.com Wed Oct 21 19:29:52 2015
From: dmtcimen at hotmail.com (demet)
Date: Wed, 21 Oct 2015 10:29:52 -0700 (PDT)
Subject: [datatable-help] How to export coefficients of the quantile regression analysis for the panel data from R to a spreadsheet?
Message-ID: <1445448592547-4713821.post@n4.nabble.com>

Hi,

I am new to R and I guess my question is pretty easy to solve, but a lot of searching did not help me. I am running a quantile regression for panel data (RQPD). I am using coef(rqpd), so it only gives me the coefficients, which I want to export to a file. It would be great if you could help me. I have been searching the web for a few hours now without success. I tried library(broom) and that did not work either.

Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/How-to-export-coefficients-of-the-quantile-regression-analysis-for-the-panel-data-from-R-to-a-spread-tp4713821.html
Sent from the datatable-help mailing list archive at Nabble.com.
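On the integer64 filtering thread: the sketch below illustrates the advice to build the comparison value from a string, with bit64 loaded before the data is created. The table `z` is built inline from three of the values in the original post rather than read from `mydata.csv`, and the names `from_double`, `from_string` and `hit` are illustrative only.

```r
library(data.table)
library(bit64)   # load before creating/reading integer64 data

# stand-in for the fread() result in the thread, built from strings
z <- data.table(a = as.integer64(c("-688037432807398365",
                                   "8910419692287774511",
                                   "7392641969610778497")))

# a numeric literal this large is parsed as a double first, so it is
# already rounded before as.integer64() ever sees it
from_double <- as.integer64(-688037432807398365)

# a string is converted directly, with no trip through floating point
from_string <- as.integer64("-688037432807398365")

# filter using the exactly represented value
hit <- z[a == from_string]
```

Comparing `from_double` with `from_string` shows the two differ, which is why the filter in the original post returned no rows.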
From clark9876 at airquality.dk Thu Oct 29 00:05:41 2015
From: clark9876 at airquality.dk (Douglas Clark)
Date: Wed, 28 Oct 2015 16:05:41 -0700 (PDT)
Subject: [datatable-help] fread with multi-line character vector input
Message-ID: <1446073541400-4714096.post@n4.nabble.com>

Can fread read from a multi-line character vector, similar to the read.table text= argument? Feature request?

I am importing the output "print" file from a dispersion model (a Fortran program). The output file consists of 4000+ lines of text, tables of model parameters, and tables of model results, all of varying lengths.

I read the entire file into a character vector variable, such as mylines <- readLines(...), and then use regular expressions to "clean" the text and locate the starting and ending lines of the tables to be imported.

I tried to use fread(mylines[start:stop]), but fread doesn't accept a vector of character strings as input -- only a length 1 character vector (i.e. all in one string).

read.table does allow reading from a multi-line character vector variable, using the text= argument, i.e. read.table(text = mylines[start:stop],...).

fread will also read it if I collapse the multi-line character vector into a single string using paste0 or stri_flatten, as in

fread(paste0(mylines[start:stop], collapse = "\n"))

or

fread(stri_flatten(mylines[start:stop], collapse = "\n"))

But it would be nice if I could skip the collapse step.

Does fread have a way to directly read from a multi-line character vector without flattening it -- like the text= argument in read.table? Or is there an easier approach? If not, should this be a feature request?

--
View this message in context: http://r.789695.n4.nabble.com/fread-with-multi-line-character-vector-input-tp4714096.html
Sent from the datatable-help mailing list archive at Nabble.com.
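A small sketch of the collapse workaround described above, with made-up lines standing in for the model output (`mylines`, `first`, `last` and the table contents are all hypothetical). The `text=` call is guarded because it is an assumption about the installed data.table: newer releases of `fread` document a `text=` argument, but the version discussed in this message does not have it.

```r
library(data.table)

# hypothetical stand-in for readLines() output from the model's print file
mylines <- c("DISPERSION MODEL OUTPUT",
             "x y conc",
             "1 2 0.5",
             "3 4 1.5",
             "END OF TABLE")
first <- 2L   # first line of the table (header row)
last  <- 4L   # last line of the table

# workaround from the message: collapse the lines into one string;
# fread treats a single string containing "\n" as the data itself
tbl <- fread(paste(mylines[first:last], collapse = "\n"),
             sep = " ", header = TRUE)

# only attempted if this data.table version's fread() has a text= argument
if ("text" %in% names(formals(fread))) {
  tbl2 <- fread(text = mylines[first:last], sep = " ", header = TRUE)
}
```

Passing `sep` and `header` explicitly avoids relying on auto-detection for such a tiny table; for real model output the detected defaults may well be sufficient.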