From sv1 at atlantic.net Sun Feb 1 12:36:54 2015
From: sv1 at atlantic.net (Robert S)
Date: Sun, 1 Feb 2015 03:36:54 -0800 (PST)
Subject: [datatable-help] loading etf price data into matrix in R
Message-ID: <1422790614069-4702603.post@n4.nabble.com>

I am a new R programmer and am trying to create a matrix of ETF prices by
day. Here is the code I am using:

#Download Packages
library(tseries)
library(quantmod)
#Setup list of symbols to download prices
pr=c("vti","vv","vug","vtv","vo")
#get dates to download data
today=as.Date(Sys.Date())
yday=today-1
#set up matrix to hold data
pri=matrix(nrow=10,ncol=5)
for(i in 1:10){
  for(j in 1:5) {
    pri[i,j]=get.hist.quote(instrument=pr[j],start=yday,quote="AdjClose",provider="yahoo",compression="d")
  }
  yday=yday-1}
write(pri,"ETFData13115.ods",ncolumns=5,nrows=10,append=TRUE,sep="\t")

Here is the output and error message I'm getting:

Error in pri[i, j] = get.hist.quote(instrument = pr[j], start = yday, :
  number of items to replace is not a multiple of replacement length
> write(pri,"ETFData13115.ods",ncolumns=5,nrows=10,append=TRUE,sep="\t")
Error in write(pri, "ETFData13115.ods", ncolumns = 5, nrows = 10, append = TRUE, :
  unused argument (nrows = 10)
> print(pri)
       [,1] [,2]  [,3]  [,4]   [,5]
 [1,] 103.1 91.6 102.8 81.03 121.08
 [2,]    NA   NA    NA    NA     NA
 [3,]    NA   NA    NA    NA     NA
 [4,]    NA   NA    NA    NA     NA
 [5,]    NA   NA    NA    NA     NA
 [6,]    NA   NA    NA    NA     NA
 [7,]    NA   NA    NA    NA     NA
 [8,]    NA   NA    NA    NA     NA
 [9,]    NA   NA    NA    NA     NA
[10,]    NA   NA    NA    NA     NA

I looked at some of the elements of pri, thinking there might be a date
associated with them, and got this:

> pri[1,1]
[1] 103.1
> pri[1,2]
[1] 91.6
> pri[2,1]
[1] NA

So, that does not seem to be the problem. What am I doing wrong? Is there
an easier way to assemble my matrix?

Thanks for any help.

Robert S

--
View this message in context: http://r.789695.n4.nabble.com/loading-etf-price-data-into-matrix-in-R-tp4702603.html
Sent from the datatable-help mailing list archive at Nabble.com.
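
On the question above: the first error most likely arises because get.hist.quote() returns a whole series (every available day from `start` onward) rather than a single price, so the result does not fit into one cell pri[i,j]. The second error says that write() has no nrows argument. What follows is only a minimal sketch of one way to assemble the matrix: fetch one vector per symbol and bind the vectors as columns. fetch_prices() is a hypothetical stand-in that produces made-up numbers so that the sketch runs offline; the dates and the output file name are assumptions too.

```r
# Hypothetical stand-in for the download step: one made-up price per date.
# In real use this would wrap the poster's get.hist.quote() call for `sym`.
fetch_prices <- function(sym, dates) {
  set.seed(sum(utf8ToInt(sym)))              # reproducible fake numbers
  round(100 + cumsum(rnorm(length(dates))), 2)
}

symbols <- c("vti", "vv", "vug", "vtv", "vo")
dates   <- seq(as.Date("2015-01-20"), by = "day", length.out = 10)

# One call per symbol returns a whole vector; sapply() binds them as columns
pri <- sapply(symbols, fetch_prices, dates = dates)
rownames(pri) <- format(dates)

# write.table() keeps the row/column layout of the matrix
out <- file.path(tempdir(), "etf_prices.tsv")
write.table(pri, out, sep = "\t", quote = FALSE, col.names = NA)
```

The point of the design is that the loop over days disappears: each symbol is requested once for the whole date range, and the matrix is built from complete columns instead of being filled cell by cell.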
From fjbuch at gmail.com Tue Feb 10 01:08:35 2015
From: fjbuch at gmail.com (Farrel Buchinsky)
Date: Tue, 10 Feb 2015 00:08:35 +0000
Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column
Message-ID:

So let's say one has a data.table with the following columns

first.name, last.name, height, weight, shoe.size, eye.color, hair.length,
appendage.size, ear.length

If one wanted to just include weight through hair.length one would have to
do something such as this

dt[,list(weight, shoe.size, eye.color, hair.length)]

Is there a way to do something along the lines of

dt[,list(weight...hair.length)]

If so, can you direct me to the documentation? If not, can you build it? Is
it difficult? Some data.tables have many columns.

Thanking you in anticipation.

Farrel

From jeffzemla at gmail.com Tue Feb 10 03:06:51 2015
From: jeffzemla at gmail.com (Jeff Zemla)
Date: Mon, 9 Feb 2015 21:06:51 -0500
Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column
In-Reply-To:
References:
Message-ID:

I don't think you can reference them by name, but if they are in a list you
can reference them by index, e.g.

dt[,colnames(dt)[3:5],with=FALSE]

will get you height, weight and shoe.size

or alternatively

dt[,.SD,.SDcols=colnames(dt)[3:5]]

does the same thing

From aragorn168b at gmail.com Tue Feb 10 19:28:13 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 10 Feb 2015 19:28:13 +0100
Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column
In-Reply-To:
References:
Message-ID:

Farrel,

It could be useful. Please file an issue on the github project page. Thanks.

--
Arun

From caneff at gmail.com Tue Feb 10 19:31:42 2015
From: caneff at gmail.com (Chris Neff)
Date: Tue, 10 Feb 2015 18:31:42 +0000
Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column
Message-ID:

I don't like this idea.
It adds extra that it doesn't need to. Doing it with column numbers is more
straightforward, and if all you have is names you can get numbers by doing
match() or whatever and then getting the sequence with seq(). Having a
sequence of column names is odd.

From aragorn168b at gmail.com Tue Feb 10 19:33:59 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 10 Feb 2015 19:33:59 +0100
Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column
In-Reply-To:
References:
Message-ID:

Chris,

But what's the problem? You can simply not use it?
It's not that uncommon.
`base::subset()` does this.

--
Arun

From caneff at gmail.com Tue Feb 10 19:39:28 2015
From: caneff at gmail.com (Chris Neff)
Date: Tue, 10 Feb 2015 18:39:28 +0000
Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column
References:
Message-ID:

Wow, didn't realize that worked! So there is precedent then.
It just looks funny to me, but you are right it is easily avoided. I just
didn't want to see more divergence from subset and data.frame logic, but
since this already works with subset that's fine.

From aragorn168b at gmail.com Tue Feb 10 19:50:21 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 10 Feb 2015 19:50:21 +0100
Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column
In-Reply-To:
References:
Message-ID:

I had the same reaction when I found out 'subset' already did this :-).
I've the same impression that it's a bit odd, even though some people
prefer it.

--
Arun

From Halperin at outins.com Tue Feb 10 19:58:17 2015
From: Halperin at outins.com (Marc Halperin)
Date: Tue, 10 Feb 2015 18:58:17 +0000
Subject: [datatable-help] Best way to apply function to set of columns to create new columns where the function requires other columns from data.table
Message-ID: <56D6A84E99427746B72C13F0F475508B062265D0@mbx022-w1-ca-2.exch022.domain.local>

I want to add new columns to a data.table that are the weighted averages of
existing columns, using a weight variable. This is a general problem I run
into when using .SDcols but also needing another variable from the
data.table to be available within the function within lapply. Without
including that variable within .SDcols (in this case the weight variable), I
don't have access to it in the lapply function argument. Is it a bad idea to
subset .SD how I've done it?
library(data.table) library(Hmisc) dt <- data.table(a=runif(10), b= runif(10), weight=runif(10)) varnames <- c("a","b") dt[ , ( paste( "mean", varnames, sep = "_" ) ) := lapply( .SD[ , .SD, .SDcols = -"weight" ], wtd.mean, weight ), .SDcols = c("weight",varnames) ] Thanks -Marc From mel at mbacou.com Tue Feb 10 22:39:05 2015 From: mel at mbacou.com (Bacou, Melanie) Date: Tue, 10 Feb 2015 16:39:05 -0500 Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column In-Reply-To: References: Message-ID: <54DA7A79.1080504@mbacou.com> Everyone, The |varA...varZ| construct is borrowed from STATA syntax. Probably a reason why it got into subset() in the first place, though definitely not very R-like. In fact I?ve never come across this construct in R before and had no idea it was actually working either! I?m not sure |dt[, .SD, .SDcols=list(varA...varZ)]| is less typing, less prone to error, or more readable than |dt[, .SD, .SDcols=names(dt)[1:24]| and using indices is also more flexible (what about if we want more complex sequences). I can see one use case for this syntax though if |dt| might change over time but variables always come in known sequences. Not sure we should really encourage it ? but agreed with Arun, if it?s in base::subset() then no reason why not. ?Mel. On 2/10/2015 1:50 PM, Arunkumar Srinivasan wrote: I had the same reaction when I found out ?subset? already did this :-). I?ve the same impression that it?s a bit odd, even though some people prefer it.. Arun On 10 Feb 2015 at 19:39:29, Chris Neff (caneff at gmail.com) wrote: Wow, didn?t realize that worked! So there is precedent then. It just looks funny to me, but you are right it is easily avoided. I just didn?t want to see more divergence from subset and data.frame logic, but since this already works with subset that?s fine. On Tue Feb 10 2015 at 1:34:03 PM Arunkumar Srinivasan aragorn168b at gmail.com wrote: |Chris, But what?s the problem? 
You can simply not use it? It?s not that uncommon. `base::subset()` does this. -- Arun On 10 Feb 2015 at 19:31:43, Chris Neff (caneff at gmail.com) wrote: | |I don't like this idea. It adds extra that it doesn't need to. Doing it with column numbers is more straightforward, and if all you have is names you can get numbers by doing match() or whatever and then getting the sequence with seq(). Having a sequence of column names is odd. On Tue Feb 10 2015 at 1:28:25 PM Arunkumar Srinivasan wrote: Farrel, It could be useful. Please file an issue on the github project page. Thanks. -- Arun On 10 Feb 2015 at 01:08:46, Farrel Buchinsky (fjbuch at gmail.com) wrote: | | So lets say one has a data.table with the following columns first.name, last.name, height, weight, shoe.size, eye.color, hair.length, appendage.size, ear.length If one wanted to just include weight through hair.length one would have to go something such as this dt[,list(weight, shoe.size, eye.color, hair.length)] Is there a way to do something along the lines of dt[,list(weight...hair.length)] If so, can you direct me to the documentation? If not can you build it? Is it difficult? Some data.tables have many columns. Thanking you in anticipation. Farrel _______________________________________________ datatable-help mailing list datatable-help at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help _______________________________________________ datatable-help mailing list datatable-help at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help | ------------------------------------------------------------------------ datatable-help mailing list datatable-help at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help ? -- Melanie BACOU International Food Policy Research Institute Snr. 
Program Manager, HarvestChoice Work +1(202)862-5699 E-mail m.bacou at cgiar.org Visit www.harvestchoice.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From aragorn168b at gmail.com Tue Feb 10 22:45:56 2015 From: aragorn168b at gmail.com (Arunkumar Srinivasan) Date: Tue, 10 Feb 2015 22:45:56 +0100 Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column In-Reply-To: <54DA7A79.1080504@mbacou.com> References: <54DA7A79.1080504@mbacou.com> Message-ID: Mel, The usage would be something like: DT[, from:to, with=FALSE] # or DT[, .SD, .SDcols = from:to] where from and to are the start and end column names. I agree there?s no real advantage in terms of typing/prone to errors. There might be some merit in readability, as people normally remember column names and not numbers? And this allows you to refer to the names directly without having to type DT and then look up the column or use a match() to find out the column programatically or do: DT[, .SD, .SDcols = names(DT)[some_idx]] --? Arun On 10 Feb 2015 at 22:39:14, Bacou, Melanie (mel at mbacou.com) wrote: Everyone, The varA...varZ construct is borrowed from STATA syntax. Probably a reason why it got into subset() in the first place, though definitely not very R-like. In fact I?ve never come across this construct in R before and had no idea it was actually working either! I?m not sure dt[, .SD, .SDcols=list(varA...varZ)] is less typing, less prone to error, or more readable than dt[, .SD, .SDcols=names(dt)[1:24] and using indices is also more flexible (what about if we want more complex sequences). I can see one use case for this syntax though if dt might change over time but variables always come in known sequences. Not sure we should really encourage it ? but agreed with Arun, if it?s in base::subset() then no reason why not. ?Mel. On 2/10/2015 1:50 PM, Arunkumar Srinivasan wrote: I had the same reaction when I found out ?subset? 
already did this :-). I?ve the same impression that it?s a bit odd, even though some people prefer it.. Arun On 10 Feb 2015 at 19:39:29, Chris Neff (caneff at gmail.com) wrote: Wow, didn?t realize that worked! So there is precedent then. It just looks funny to me, but you are right it is easily avoided. I just didn?t want to see more divergence from subset and data.frame logic, but since this already works with subset that?s fine. On Tue Feb 10 2015 at 1:34:03 PM Arunkumar Srinivasan aragorn168b at gmail.com wrote: Chris, But what?s the problem? You can simply not use it? It?s not that uncommon. `base::subset()` does this. -- Arun On 10 Feb 2015 at 19:31:43, Chris Neff (caneff at gmail.com) wrote: I don't like this idea. It adds extra that it doesn't need to. Doing it with column numbers is more straightforward, and if all you have is names you can get numbers by doing match() or whatever and then getting the sequence with seq(). Having a sequence of column names is odd. On Tue Feb 10 2015 at 1:28:25 PM Arunkumar Srinivasan wrote: Farrel, It could be useful. Please file an issue on the github project page. Thanks. -- Arun On 10 Feb 2015 at 01:08:46, Farrel Buchinsky (fjbuch at gmail.com) wrote: So lets say one has a data.table with the following columns first.name, last.name, height, weight, shoe.size, eye.color, hair.length, appendage.size, ear.length If one wanted to just include weight through hair.length one would have to go something such as this dt[,list(weight, shoe.size, eye.color, hair.length)] Is there a way to do something along the lines of dt[,list(weight...hair.length)] If so, can you direct me to the documentation? If not can you build it? Is it difficult? Some data.tables have many columns. Thanking you in anticipation. 
Farrel _______________________________________________ datatable-help mailing list datatable-help at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help _______________________________________________ datatable-help mailing list datatable-help at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help datatable-help mailing list datatable-help at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help -- Melanie BACOU International Food Policy Research Institute Snr. Program Manager, HarvestChoice Work +1(202)862-5699 E-mail m.bacou at cgiar.org Visit www.harvestchoice.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From eduard.antonyan at gmail.com Tue Feb 10 22:59:41 2015 From: eduard.antonyan at gmail.com (Eduard Antonyan) Date: Tue, 10 Feb 2015 15:59:41 -0600 Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column In-Reply-To: References: <54DA7A79.1080504@mbacou.com> Message-ID: Not having to type `DT` twice would increase readability/reduce errors, especially that real-life data.tables have much longer names. There was a related FR to this which suggested incorporating regex and wildcard syntax - not sure what happened to it. On Tue, Feb 10, 2015 at 3:45 PM, Arunkumar Srinivasan wrote: > Mel, > > The usage would be something like: > > DT[, from:to, with=FALSE] > # or > DT[, .SD, .SDcols = from:to] > > where from and to are the start and end column names. I agree there?s no > real advantage in terms of typing/prone to errors. > > There might be some merit in readability, as people normally remember > column names and not numbers? 
And this allows you to refer to the names > directly without having to type DT and then look up the column or use a > match() to find out the column programatically or do: > > DT[, .SD, .SDcols = names(DT)[some_idx]] > > > > -- > Arun > > On 10 Feb 2015 at 22:39:14, Bacou, Melanie (mel at mbacou.com) wrote: > > Everyone, > > The varA...varZ construct is borrowed from STATA syntax. Probably a > reason why it got into subset() in the first place, though definitely not > very R-like. In fact I?ve never come across this construct in R before and > had no idea it was actually working either! > > I?m not sure dt[, .SD, .SDcols=list(varA...varZ)] is less typing, less > prone to error, or more readable than dt[, .SD, .SDcols=names(dt)[1:24] > and using indices is also more flexible (what about if we want more complex > sequences). I can see one use case for this syntax though if dt might > change over time but variables always come in known sequences. > > Not sure we should really encourage it ? but agreed with Arun, if it?s in > base::subset() then no reason why not. > > ?Mel. > > On 2/10/2015 1:50 PM, Arunkumar Srinivasan wrote: > > I had the same reaction when I found out ?subset? already did this :-). > I?ve the same impression that it?s a bit odd, even though some people > prefer it.. > > Arun > > On 10 Feb 2015 at 19:39:29, Chris Neff (caneff at gmail.com) wrote: > > Wow, didn?t realize that worked! So there is precedent then. It just looks > funny to me, but you are right it is easily avoided. I just didn?t want to > see more divergence from subset and data.frame logic, but since this > already works with subset that?s fine. > On Tue Feb 10 2015 at 1:34:03 PM Arunkumar Srinivasan > aragorn168b at gmail.com wrote: > > Chris, > But what?s the problem? You can simply not use it? > It?s not that uncommon. `base::subset()` does this. > -- > Arun > > On 10 Feb 2015 at 19:31:43, Chris Neff (caneff at gmail.com) wrote: > > I don't like this idea. 
It adds extra that it doesn't need to. Doing it with column numbers is more straightforward, and if all you have is names you can get numbers by doing match() or whatever and then getting the sequence with seq(). Having a sequence of column names is odd. > On Tue Feb 10 2015 at 1:28:25 PM Arunkumar Srinivasan wrote: > > Farrel, > It could be useful. Please file an issue on the github project page. Thanks. > -- > Arun > > On 10 Feb 2015 at 01:08:46, Farrel Buchinsky (fjbuch at gmail.com) wrote: > > So lets say one has a data.table with the following columns > first.name, last.name, height, weight, shoe.size, eye.color, hair.length, appendage.size, ear.length > If one wanted to just include weight through hair.length one would have to go something such as this > dt[,list(weight, shoe.size, eye.color, hair.length)] > Is there a way to do something along the lines of > dt[,list(weight...hair.length)] > If so, can you direct me to the documentation? If not can you build it? Is it difficult? Some data.tables have many columns. > Thanking you in anticipation. > Farrel > _______________________________________________ > datatable-help mailing list > datatable-help at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > _______________________________________________ > datatable-help mailing list > datatable-help at lists.r-forge.r-project.org > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > > ------------------------------ > > datatable-help mailing list > datatable-help at lists.r-forge.r-project.org > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > > ? > -- > Melanie BACOU > International Food Policy Research Institute > Snr. 
Program Manager, HarvestChoice
> Work +1(202)862-5699
> E-mail m.bacou at cgiar.org
> Visit www.harvestchoice.org
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

From mel at mbacou.com Wed Feb 11 00:00:18 2015
From: mel at mbacou.com (Bacou, Melanie)
Date: Tue, 10 Feb 2015 18:00:18 -0500
Subject: [datatable-help] can one name a collection of columns by specifying just the first and the last column
In-Reply-To:
References: <54DA7A79.1080504@mbacou.com>
Message-ID: <54DA8D82.10204@mbacou.com>

Arun,

I see, I hadn't checked the base::subset() documentation carefully, but I see it clearly now:

    subset(airquality, Temp > 80, select = c(Ozone, Temp))
    subset(airquality, Day == 1, select = -Temp)
    subset(airquality, select = Ozone:Wind)

`:` is less ambiguous than STATA's `...` for sure. Yes, would be nice to replicate in data.table.

--Mel.

On 2/10/2015 4:59 PM, Eduard Antonyan wrote:

Not having to type `DT` twice would increase readability/reduce errors, especially since real-life data.tables have much longer names. There was a related FR to this which suggested incorporating regex and wildcard syntax; not sure what happened to it.

On Tue, Feb 10, 2015 at 3:45 PM, Arunkumar Srinivasan (aragorn168b at gmail.com) wrote:

Mel,

The usage would be something like:

    DT[, from:to, with=FALSE]
    # or
    DT[, .SD, .SDcols = from:to]

where from and to are the start and end column names. I agree there's no real advantage in terms of typing/prone to errors. There might be some merit in readability, as people normally remember column names and not numbers? And this allows you to refer to the names directly without having to type DT and then look up the column, or use match() to find the column programmatically, or do:

    DT[, .SD, .SDcols = names(DT)[some_idx]]

-- Arun

On 10 Feb 2015 at 22:39:14, Bacou, Melanie (mel at mbacou.com) wrote:

Everyone,

The varA...varZ construct is borrowed from STATA syntax. Probably a reason why it got into subset() in the first place, though definitely not very R-like. In fact I've never come across this construct in R before and had no idea it was actually working either!

I'm not sure

    dt[, .SD, .SDcols=list(varA...varZ)]

is less typing, less prone to error, or more readable than

    dt[, .SD, .SDcols=names(dt)[1:24]]

and using indices is also more flexible (what if we want more complex sequences?). I can see one use case for this syntax though, if dt might change over time but variables always come in known sequences. Not sure we should really encourage it, but agreed with Arun: if it's in base::subset() then no reason why not.

--Mel.

On 2/10/2015 1:50 PM, Arunkumar Srinivasan wrote:

I had the same reaction when I found out 'subset' already did this :-). I've the same impression that it's a bit odd, even though some people prefer it.

Arun

On 10 Feb 2015 at 19:39:29, Chris Neff (caneff at gmail.com) wrote:

Wow, didn't realize that worked! So there is precedent then. It just looks funny to me, but you are right it is easily avoided. I just didn't want to see more divergence from subset and data.frame logic, but since this already works with subset that's fine.

On Tue Feb 10 2015 at 1:34:03 PM Arunkumar Srinivasan (aragorn168b at gmail.com) wrote:

Chris,

But what's the problem? You can simply not use it? It's not that uncommon. `base::subset()` does this.

-- Arun

On 10 Feb 2015 at 19:31:43, Chris Neff (caneff at gmail.com) wrote:

I don't like this idea. It adds extra that it doesn't need to. Doing it with column numbers is more straightforward, and if all you have is names you can get numbers by doing match() or whatever and then getting the sequence with seq(). Having a sequence of column names is odd.

On Tue Feb 10 2015 at 1:28:25 PM Arunkumar Srinivasan wrote:

Farrel,

It could be useful. Please file an issue on the github project page. Thanks.

-- Arun

On 10 Feb 2015 at 01:08:46, Farrel Buchinsky (fjbuch at gmail.com) wrote:

So lets say one has a data.table with the following columns

    first.name, last.name, height, weight, shoe.size, eye.color, hair.length, appendage.size, ear.length

If one wanted to just include weight through hair.length one would have to go something such as this

    dt[,list(weight, shoe.size, eye.color, hair.length)]

Is there a way to do something along the lines of

    dt[,list(weight...hair.length)]

If so, can you direct me to the documentation? If not can you build it? Is it difficult? Some data.tables have many columns.

Thanking you in anticipation.

Farrel

_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

--
Melanie BACOU
International Food Policy Research Institute
Snr. Program Manager, HarvestChoice
Work +1(202)862-5699
E-mail m.bacou at cgiar.org
Visit www.harvestchoice.org
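For readers following the thread, here is a small sketch of the selection styles discussed above. The table contents are made up; only the column names come from the original question. The name-range form in `.SDcols` is an assumption on my part: as far as I know it is accepted from data.table 1.9.6 onward and was not yet available when this thread was written.

```r
library(data.table)

# Toy table: column names from the original question, values invented
dt <- data.table(
  first.name = c("Ann", "Bob"), last.name = c("Lee", "Ray"),
  height = c(160, 180), weight = c(55, 80), shoe.size = c(37, 44),
  eye.color = c("brown", "blue"), hair.length = c(30, 5),
  appendage.size = c(1, 2), ear.length = c(6, 7)
)

# Base R: subset() accepts a from:to range of column names in `select`
sub_base <- subset(as.data.frame(dt), select = weight:hair.length)

# Index-based form: look up the positions with match(), then slice names(dt)
from <- match("weight", names(dt))
to   <- match("hair.length", names(dt))
sub_idx <- dt[, .SD, .SDcols = names(dt)[from:to]]

# Name-range form (assumes a data.table version that supports it)
sub_rng <- dt[, .SD, .SDcols = weight:hair.length]
```

The index-based form should work on any version; the name-range form is shorter to read but depends on the installed release.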
From ica at ign.ku.dk Wed Feb 11 12:42:03 2015
From: ica at ign.ku.dk (Ingeborg)
Date: Wed, 11 Feb 2015 03:42:03 -0800 (PST)
Subject: [datatable-help] export from r
Message-ID: <1423654923294-4703083.post@n4.nabble.com>

The code:

    mydata <- data.frame(c0_1,c0_2,dsoc)
    mydata
    write.table(mydata, "c:/mydata.txt", sep="\t")

gives the error message:

    Error in file(file, ifelse(append, "a", "w")) :
      cannot open the connection
    In addition: Warning message:
    In file(file, ifelse(append, "a", "w")) :
      cannot open file 'c:/mydata.txt': Permission denied

Any idea why this happens? I tried it on two different Windows PCs.

--
View this message in context: http://r.789695.n4.nabble.com/export-from-r-tp4703083.html
Sent from the datatable-help mailing list archive at Nabble.com.

From jholtman at gmail.com Wed Feb 11 13:04:46 2015
From: jholtman at gmail.com (Jim Holtman)
Date: Wed, 11 Feb 2015 07:04:46 -0500
Subject: [datatable-help] export from r
Message-ID: <7jsiw0p5wi7apkw1b93vl23w.1423656286337@email.android.com>

Try writing to a directory under the root. Windows blocks writing a file to "c:".
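A minimal sketch of that suggestion, assuming nothing about the poster's real data: the `mydata` values and the `export` folder under `tempdir()` are stand-ins chosen only so the example runs anywhere. Any directory the current user can write to would do.

```r
# Stand-in for the poster's data (the real c0_1, c0_2, dsoc were not shown)
mydata <- data.frame(c0_1 = 1:3, c0_2 = 4:6, dsoc = c(0.1, 0.2, 0.3))

# Pick a directory the user can write to instead of the drive root;
# tempdir() is only an example location
out_dir <- file.path(tempdir(), "export")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "mydata.txt")

# file.access(..., mode = 2) returns 0 when write permission is available
if (file.access(out_dir, mode = 2) == 0) {
  write.table(mydata, out_file, sep = "\t")
} else {
  message("No write permission for: ", out_dir)
}
```

Checking permissions first turns the "cannot open the connection" error into a readable message, though the R documentation notes that `file.access()` is not fully reliable on every Windows file system.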
From fperickson at wisc.edu Thu Feb 12 16:40:10 2015
From: fperickson at wisc.edu (Frank Erickson)
Date: Thu, 12 Feb 2015 10:40:10 -0500
Subject: [datatable-help] Best way to apply function to set of columns to create new columns where the function requires other columns from data.table
In-Reply-To: <56D6A84E99427746B72C13F0F475508B062265D0@mbx022-w1-ca-2.exch022.domain.local>
References: <56D6A84E99427746B72C13F0F475508B062265D0@mbx022-w1-ca-2.exch022.domain.local>
Message-ID:

Hi Marc,

I think the set function is a good fit:

    for (j0 in varnames) set(dt,j=paste0(j0,'_mean'),value=wtd.mean(dt[[j0]],dt[[3]]))

I guess this is significantly more efficient than nested ['s and .SD's if your data is large.

If your data.table is really big, though, maybe you want to assign the weighted means elsewhere...? They're just scalars, so you probably don't need them filling out a vector of the data table.

--Frank

On Tue, Feb 10, 2015 at 1:58 PM, Marc Halperin wrote:

> I want to add new columns to a data.table that is the weighted average of
> the columns and a weight variable.
> This is a general problem I run into
> when using .SDcols but also needing another variable from the data.table to
> be available within the function within lapply. Without including that
> variable within .SDcols (in this case the weight variable), I don't have
> access to it in the lapply function argument. Is it a bad idea to subset
> .SD how I've done it?
>
> library(data.table)
> library(Hmisc)
>
> dt <- data.table(a=runif(10), b= runif(10), weight=runif(10))
>
> varnames <- c("a","b")
>
> dt[ , ( paste( "mean", varnames, sep = "_" ) ) := lapply( .SD[ , .SD,
> .SDcols = -"weight" ], wtd.mean, weight ), .SDcols = c("weight",varnames) ]
>
> Thanks
>
> -Marc

From btupper at bigelow.org Thu Feb 12 22:27:56 2015
From: btupper at bigelow.org (Ben Tupper)
Date: Thu, 12 Feb 2015 16:27:56 -0500
Subject: [datatable-help] extracting columns dynamically
Message-ID: <895F3FCE-4994-4592-AD90-743FCEBE1470@bigelow.org>

Hello,

I would like to extract a column of a data.table, but I get unexpected (to me) results when I specify a column dynamically.

    DT <- data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
    thisone = "a"

    str(DT[,a])
    # int [1:6] 1 2 3 4 5 6

    str(DT[,"a", with = FALSE])
    # Classes 'data.table' and 'data.frame': 6 obs. of 1 variable:
    # $ a: int 1 2 3 4 5 6
    # - attr(*, ".internal.selfref")=<externalptr>

    str(DT[, thisone, with = FALSE])
    # Classes 'data.table' and 'data.frame': 6 obs. of 1 variable:
    # $ a: int 1 2 3 4 5 6
    # - attr(*, ".internal.selfref")=<externalptr>

I can't noodle out from the help why the latter two don't produce a vector as the first one does.
I'm looking at this online resource http://www.rdocumentation.org/packages/data.table/functions/data.table and it doesn't seem like the description of with points to having two different results.

"with    By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE, j is a vector of names or positions to select, similar to a data.frame. with=FALSE is often useful in data.table to select columns dynamically."

How should I extract a single column dynamically to retrieve a vector?

Cheers and thanks,
Ben

    > sessionInfo()
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-apple-darwin13.1.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] data.table_1.9.5 devtools_1.6.1

    loaded via a namespace (and not attached):
    [1] chron_2.3-45 evaluate_0.5.5 formatR_1.0 httr_0.5 knitr_1.7 RCurl_1.95-4.1 stringr_0.6.2
    [8] tools_3.1.0

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

From gerald.jean at dgag.ca Mon Feb 16 20:54:54 2015
From: gerald.jean at dgag.ca (Gerald Jean)
Date: Mon, 16 Feb 2015 19:54:54 +0000
Subject: [datatable-help] Subsetting within groups.
Message-ID: <7889EDA06EB6454D92349FFF17BF790F36456699@PWPRIMX72.mvt.desjardins.com>

Hello,

I am fairly new to data.table, it's fast and I love it!!! Here is what I am trying to do. Suppose I have a data.table DT, with columns a, b, c, v, t and g. I want to add a new column, x, say, where for each group defined by g, in vector notation:

    x = c(0, (v[-1] - v[-n]) / (t[-1] - t[-n]))

where n is the number of rows for the groups, I don't know n yet. Obviously

    DT[, x := c(0, (v[-1] - v[-n]) / (t[-1] - t[-n])), by = g]

won't work.
I have read the doc I found so far but couldn't find examples of subsetting the groups; maybe it could be done using .SD but I am not familiar enough with data.table yet to figure out how to do it.

By the way, my data.tables are large, 50000 to over 1000000 rows, and I have over 60000 of them to process and many more operations to perform. I just hope data.table will do the trick!!!

Thanks for your help,

Gérald

Gerald Jean, M.Sc. in Statistics
Senior Statistical Advisor
Corporate Actuarial, Modelling and Research
Property and Casualty Insurance
Mouvement Desjardins
Lévis (head office)
418 835-4900, ext. 5527639
1 877 835-4900, ext. 5527639
Fax: 418 835-6657

From aragorn168b at gmail.com Mon Feb 16 20:59:53 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Mon, 16 Feb 2015 20:59:53 +0100
Subject: [datatable-help] Subsetting within groups.
In-Reply-To: <7889EDA06EB6454D92349FFF17BF790F36456699@PWPRIMX72.mvt.desjardins.com>
References: <7889EDA06EB6454D92349FFF17BF790F36456699@PWPRIMX72.mvt.desjardins.com>
Message-ID:

Hi Gerald Jean,

Glad to hear it! The number of elements in each group is contained in a special variable `.N`. IIUC, all you've to do is replace 'n' by '.N' in your DT code.
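A minimal sketch of that substitution on made-up data. The values of `g`, `t` and `v` are invented for illustration, and the `x2` column using `diff()` is an extra cross-check that does not come from the thread.

```r
library(data.table)

# Invented example: two groups, with times t and values v
DT <- data.table(
  g = c(1, 1, 1, 2, 2),
  t = c(1, 2, 4, 1, 3),
  v = c(10, 12, 18, 5, 9)
)

# .N is the number of rows in the current group, so v[-.N] drops the
# last element of the group and v[-1] drops the first
DT[, x := c(0, (v[-1] - v[-.N]) / (t[-1] - t[-.N])), by = g]

# Same quantity written with diff(), added only as a cross-check
DT[, x2 := c(0, diff(v) / diff(t)), by = g]
```

Both forms also cope with single-row groups, where the differences are empty and only the leading 0 remains.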
BTW, we are also adding detailed (HTML) vignettes, which you can find here: https://github.com/Rdatatable/data.table/wiki/Getting-started

You can see here: https://github.com/Rdatatable/data.table/issues/944 to get an idea on the vignettes that are yet to be done.

Hope this is of some help.

-- Arun
From pauljohn32 at gmail.com Wed Feb 18 18:05:24 2015
From: pauljohn32 at gmail.com (Paul Johnson)
Date: Wed, 18 Feb 2015 11:05:24 -0600
Subject: [datatable-help] extracting columns dynamically
In-Reply-To: <895F3FCE-4994-4592-AD90-743FCEBE1470@bigelow.org>
References: <895F3FCE-4994-4592-AD90-743FCEBE1470@bigelow.org>
Message-ID:

I think this is an example of the "drop gotcha", masked by data.table. See http://pj.freefaculty.org/blog/?p=274

Basically, R defaults to "demoting" one-column matrices to vectors; we avoid that with the additional argument drop = FALSE. However, layering data.table on top of that confuses the situation somewhat.

--
Paul E. Johnson
Professor, Political Science        Assoc. Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://quant.ku.edu
From aragorn168b at gmail.com Wed Feb 18 18:11:40 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Wed, 18 Feb 2015 18:11:40 +0100
Subject: [datatable-help] extracting columns dynamically
In-Reply-To: <895F3FCE-4994-4592-AD90-743FCEBE1470@bigelow.org>
References: <895F3FCE-4994-4592-AD90-743FCEBE1470@bigelow.org>
Message-ID:

Please go through the FAQ. This is outlined in the data.table FAQ 2.17 - smaller syntax differences between data.frame and data.table.

DT[, col] returns a vector because there's no other way to return a vector and users wanted a way to return a vector.

DT[, 'col', with=FALSE] returns a data.table because data.table doesn't use the 'drop' argument. Plus you can always do DT[['col']] to subset a column from data.frame/data.tables.

'drop' has no other purpose, and its default value IMHO is a mistake. I'm not sure about plans to implement it in data.table (I'm not for it). Even if it were, you'd have to do: DT[, 'col', with=FALSE, drop=FALSE] which seems quite bad in comparison to DT[['col']].

-- Arun

From btupper at bigelow.org Thu Feb 19 15:49:04 2015
From: btupper at bigelow.org (Ben Tupper)
Date: Thu, 19 Feb 2015 09:49:04 -0500
Subject: [datatable-help] extracting columns dynamically
In-Reply-To:
References: <895F3FCE-4994-4592-AD90-743FCEBE1470@bigelow.org>
Message-ID: <14BB4706-64A9-4144-A760-069F3C293C8F@bigelow.org>

Hi,

On Feb 18, 2015, at 12:11 PM, Arunkumar Srinivasan wrote:

> Please go through the FAQ. This is outlined in the data.table FAQ 2.17 - smaller syntax differences between data.frame and data.table.
>
> DT[, col] returns a vector because there's no other way to return a vector and users wanted a way to return a vector.
> DT[, 'col', with=FALSE] returns a data.table because data.table doesn't use the 'drop' argument.
> Plus you can always do DT[['col']] to subset a column from data.frame/data.tables.
> 'drop' has no other purpose, and its default value IMHO is a mistake. I'm not sure about plans to implement it in data.table (I'm not for it). Even if it were, you'd have to do: DT[, 'col', with=FALSE, drop=FALSE] which seems quite bad in comparison to DT[['col']].

Thanks for this; the DT[['col']] suits my needs perfectly. I can see in the examples (for ?data.table) and in the FAQ the examples that show the column selection behavior using 'with'. This discussion makes me wonder if the documentation for 'with' might benefit from a small embellishment. Perhaps like this?

with: By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names or a numeric vector of column positions to select, and the value returned is always a data.table. with=FALSE is often useful in data.table to select columns dynamically.

I think the above faithfully describes the behavior I see, but I defer to you to know what is best.

Thanks again,
Ben

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org
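To make the distinction in this thread concrete, a short sketch using the table from the original question. The variable names `vec` and `tab` are only illustrative.

```r
library(data.table)

DT <- data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
thisone <- "a"

# [[ ]] with a character name returns the column itself, as a plain vector
vec <- DT[[thisone]]

# with = FALSE selects columns by name and always returns a data.table
tab <- DT[, thisone, with = FALSE]

is.vector(vec)       # checks that the [[ ]] result is a plain vector
is.data.table(tab)   # checks that the with = FALSE result is a data.table
```

So `DT[[thisone]]` is the form to use when a vector is wanted and the column name is held in a variable.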
From mickcooney at gmail.com Fri Feb 20 16:42:04 2015
From: mickcooney at gmail.com (Mick Cooney)
Date: Fri, 20 Feb 2015 15:42:04 +0000
Subject: [datatable-help] Memory usage of data.table chaining
Message-ID:

I gave a talk about data.table last night to Dublin R and got a very interesting question at the end of it that I hadn't thought of before.

I was showing how you can chain operations together in nice concise one-liners; the specific example I gave was:

    show.dt <- trade.dt[typeID %in% showID] [transactionType == side] [, list(transactionID, transactTime, transactionType, typeID, typeName, quantity, price)];
    print(tail(show.dt, n = count));

This code is written for the game Eve Online and is used to show the last n trades on one side of a trade that my character had done, and I used it as an example of operation chaining.

I was asked at the end of the talk if chaining the typeID and transactionType conditions was any different to using a logical AND, and my response was that I wasn't sure, but I figured it might be, as doing the logical AND would invoke a vector scan.

He then asked about memory use: in the above example, do all the subcopies of the tables get kept in memory during the invocation, in effect mushrooming the amount of memory required?

If that was the case, I could imagine that for large tables it might be worth going with the logical operation to prevent the multiple copies being made?

--
Mick Cooney
mickcooney at gmail.com

From aragorn168b at gmail.com Fri Feb 20 17:11:05 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Fri, 20 Feb 2015 17:11:05 +0100
Subject: [datatable-help] extracting columns dynamically
In-Reply-To: <14BB4706-64A9-4144-A760-069F3C293C8F@bigelow.org>
References: <895F3FCE-4994-4592-AD90-743FCEBE1470@bigelow.org> <14BB4706-64A9-4144-A760-069F3C293C8F@bigelow.org>
Message-ID:

Ben,

That's very clear. Would you mind making a Pull Request with this change?

-- Arun

From aragorn168b at gmail.com Fri Feb 20 23:36:30 2015
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Fri, 20 Feb 2015 23:36:30 +0100
Subject: [datatable-help] Memory usage of data.table chaining
In-Reply-To:
References:
Message-ID:

Hi Mick,

Hope it went great! Yes, this isn't particularly memory efficient, as you materialise the first subset only to subset again with your second condition. The query within `[...]` can be optimised much more easily than chained expressions.

What's the rationale here for doing it this way? To take advantage of automatic indexing? It'd be great to have auto indexing optimised for complex expressions like `typeID %in% showID & transactionType == side` but until then, setting key and subsetting would be the best way.

HTH,
Arun

On 20 Feb 2015 at 16:42:51, Mick Cooney (mickcooney at gmail.com) wrote:

> I gave a talk about data.table last night to Dublin R and got a very
> interesting question at the end of it that I hadn't thought of before.
>
> I was showing how you can chain operations together in nice concise
> one liners, the specific example I gave was:
>
> show.dt <- trade.dt[typeID %in% showID] [transactionType == side] [, list(transactionID, transactTime, transactionType, typeID, typeName, quantity, price)];
> print(tail(show.dt, n = count));
>
> This code is written for the game Eve Online and is used to show the
> last n number of trades on one side of a trade that my character had
> done, and I used it as an example of operation chaining.
> I was asked at the end of talk if the chaining of the typeID and the
> transactionType was any different to using a logical AND, and my
> response was that I wasn't sure, but I figured it might be, as doing
> the logical AND would invoke a vector scan.
>
> He then asked about memory use, so in the above example, do all the
> subcopies of the tables get kept in memory during the invocation, in
> effect mushrooming the amount of memory required?
>
> If that was the case, I could imagine that for large tables it might
> be worth going with the logical operation to prevent the multiple
> copies being made?
>
> --
> Mick Cooney
> mickcooney at gmail.com

From mickcooney at gmail.com Sat Feb 21 15:01:54 2015
From: mickcooney at gmail.com (Mick Cooney)
Date: Sat, 21 Feb 2015 14:01:54 +0000
Subject: [datatable-help] Memory usage of data.table chaining
In-Reply-To:
References:
Message-ID:

I generally don't think of using the key; is it worth setting temporary keys for stuff like that?

I would have thought that if you are doing a select on different columns (meaning the keys would need to be recreated), the speed-up from the key-based select would be negated by the cost of resetting keys?

It's definitely something I should probably consider doing more.

--
Mick Cooney
mickcooney at gmail.com

From carlosalberto.arnillas at gmail.com Sun Feb 22 00:41:20 2015
From: carlosalberto.arnillas at gmail.com (Carlos Alberto Arnillas)
Date: Sat, 21 Feb 2015 18:41:20 -0500
Subject: [datatable-help] bug in merge when a table is keyed?
Message-ID:

Hello

I am running the latest version of R and data.table; however, I found a problem that I think has been reported for previous versions and I assumed it was fixed.
Here is the data (as obtained from dput from a larger code) yy1 <- structure(list(Spp = c("vicr", "festuca"), rel_cover = c(0.0365853658536585, 0.0609756097560976)), row.names = c(NA, -2L), class = c("data.table", "data.frame"), .Names = c("Spp", "rel_cover")) yy2 <- structure(list(Spp = c("eugra", "vicr", "festuca"), rel_cover = c(0.048780487804878, 0.0609756097560976, 0.0975609756097561)), row.names = c(NA, -3L), class = c("data.table", "data.frame"), .Names = c("Spp", "rel_cover"), sorted = "Spp") > yy2 Spp rel_cover 1: eugra 0.04878049 2: vicr 0.06097561 3: festuca 0.09756098 For some reason, the yy2 dataset has a key assigned (Spp), but the rows are not actually sorted by it (in fact, I never sorted that dataset, or the one that I used to create it, by that variable). Then, if I try to merge both, I get a wrong result: > merge(yy1,yy2, by="Spp",all=T) Spp rel_cover.x rel_cover.y 1: eugra NA 0.04878049 2: festuca 0.06097561 NA 3: festuca NA 0.09756098 4: vicr 0.03658537 0.06097561 However, if I set the key for each table, I first get a warning, and then the right result > setkey(yy1, Spp) > setkey(yy2, Spp) Warning message: In setkeyv(x, cols, verbose = verbose, physical = physical) : Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed. > merge(yy1,yy2, by="Spp",all=T) Spp rel_cover.x rel_cover.y 1: eugra NA 0.04878049 2: festuca 0.06097561 0.09756098 3: vicr 0.03658537 0.06097561 As a temporary workaround, I am using merge.data.frame, but I would prefer to keep all my data in data.table. If it is not a bug, and I can do something to fix it, please let me know. Thanks in advance Carlos Alberto From aragorn168b at gmail.com Tue Feb 24 00:23:06 2015 From: aragorn168b at gmail.com (Arunkumar Srinivasan) Date: Tue, 24 Feb 2015 00:23:06 +0100 Subject: [datatable-help] bug in merge when a table is keyed?
In-Reply-To: References: Message-ID: Hi Carlos, It'd be helpful to have a minimal reproducible example (MRE) showing how you ended up with a data.table that has a key set when it's not really ordered properly. Also, could you please test on the devel version as well (I don't know which version you're running)? -- Arun From aragorn168b at gmail.com Tue Feb 24 00:24:23 2015 From: aragorn168b at gmail.com (Arunkumar Srinivasan) Date: Tue, 24 Feb 2015 00:24:23 +0100 Subject: [datatable-help] Memory usage of data.table chaining In-Reply-To: References: Message-ID: It depends. Keys are useful if you have to set them once and then use them for repeated subsets, or if you have really huge data, where keeping the data sorted in memory can improve speed tremendously due to cache efficiency. But auto indexing would be the way to go wherever applicable. We should be expanding it when we next find time. -- Arun
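The three approaches discussed in this thread (chained subsets, one combined logical condition, and a keyed subset) can be compared with a minimal sketch. The table below is a made-up stand-in for trade.dt; its size and values, and the showID and side values, are assumptions for illustration only and are not Mick's data.

```r
library(data.table)

# Hypothetical stand-in for the trade.dt table discussed in this thread.
set.seed(42)
n <- 1e4
trade.dt <- data.table(
  transactionID   = seq_len(n),
  typeID          = sample(1:50, n, replace = TRUE),
  transactionType = sample(c("buy", "sell"), n, replace = TRUE),
  price           = runif(n, 1, 100)
)
showID <- c(3L, 7L)   # assumed values
side   <- "sell"      # assumed value

# 1. Chained subsets: the first [ ] materialises an intermediate table,
#    which the second [ ] then subsets again.
chained <- trade.dt[typeID %in% showID][transactionType == side]

# 2. One combined logical condition: avoids the intermediate table.
combined <- trade.dt[typeID %in% showID & transactionType == side]

# 3. Keyed subset on a copy: sort once, then look rows up by key values.
keyed.dt <- copy(trade.dt)
setkey(keyed.dt, typeID, transactionType)
keyed <- keyed.dt[CJ(showID, side), nomatch = 0L]

# All three select the same transactions.
stopifnot(
  identical(chained$transactionID, combined$transactionID),
  identical(sort(keyed$transactionID), chained$transactionID)
)
```

The keyed variant only pays off when the sort done by setkey is reused across many subsets, which matches the "it depends" above; for a one-off query the combined condition is the simpler choice.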
From carlosalberto.arnillas at gmail.com Tue Feb 24 00:31:10 2015 From: carlosalberto.arnillas at gmail.com (Carlos Alberto Arnillas) Date: Mon, 23 Feb 2015 18:31:10 -0500 Subject: [datatable-help] bug in merge when a table is keyed? In-Reply-To: References: Message-ID: Hi. The version is 1.9.4. As for how I ended up with a table that is not properly sorted: that table is a small subset (in terms of rows and columns) of a larger table, and the key used for the larger one includes that variable as its third column. So, I guess that the new table inherited the key only for the columns that are in the subset, but the key was not rebuilt, so the table ended up unsorted... Carlos Alberto From aragorn168b at gmail.com Tue Feb 24 00:32:33 2015 From: aragorn168b at gmail.com (Arunkumar Srinivasan) Date: Tue, 24 Feb 2015 00:32:33 +0100 Subject: [datatable-help] bug in merge when a table is keyed? In-Reply-To: References: Message-ID: It shouldn't retain the key if the subset results in reordering. So if you could provide an example that reproduces the key being retained, that'd be great! Also, please test it on 1.9.5 (the current devel version) to check whether it has already been fixed. Thanks. -- Arun From statquant at outlook.com Thu Feb 26 18:02:20 2015 From: statquant at outlook.com (statquant3) Date: Thu, 26 Feb 2015 09:02:20 -0800 (PST) Subject: [datatable-help] using lapply to subset mant data.tables Message-ID: <1424970140288-4703898.post@n4.nabble.com> say I have 3 data.tables set.seed(1) DT1 = data.table(x=c(1L,1L,2L),y=rnorm(3)) DT2 = data.table(x=c(1L,2L,2L),y=rnorm(3)) DT3 = data.table(x=c(2L,2L,1L),y=rnorm(3)) DTList = list(DT1,DT2,DT3) I'd like to apply the i expression "x==1L" to all 3 DTs I tried several approaches with lapply like: lapply(DTList, FUN=subset, select=quote(x==1L)) lapply(DTList, FUN=subset, select=x==1L) None works, is there a smart DT way to do this (my DTs are in a list) Cheers -- View this message in context: http://r.789695.n4.nabble.com/using-lapply-to-subset-mant-data-tables-tp4703898.html Sent from the datatable-help mailing list archive at Nabble.com. From gsee000 at gmail.com Thu Feb 26 18:15:53 2015 From: gsee000 at gmail.com (G See) Date: Thu, 26 Feb 2015 11:15:53 -0600 Subject: [datatable-help] using lapply to subset mant data.tables In-Reply-To: <1424970140288-4703898.post@n4.nabble.com> References: <1424970140288-4703898.post@n4.nabble.com> Message-ID: You didn't provide the desired output, but I think you're looking for something like this?
lapply(DTList, "[", x==1L) lapply(DTList, subset, x==1L) HTH, Garrett From statquant at outlook.com Thu Feb 26 18:45:45 2015 From: statquant at outlook.com (statquant3) Date: Thu, 26 Feb 2015 09:45:45 -0800 (PST) Subject: [datatable-help] using lapply to subset mant data.tables In-Reply-To: References: <1424970140288-4703898.post@n4.nabble.com> Message-ID: <1424972745131-4703902.post@n4.nabble.com> Exactly what I wanted... I tried that actually, for some reason it did not work... Thanks -- View this message in context: http://r.789695.n4.nabble.com/using-lapply-to-subset-mant-data-tables-tp4703898p4703902.html Sent from the datatable-help mailing list archive at Nabble.com.
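For reference, the original attempts in this thread passed the condition as `select=`, which in `subset` chooses columns rather than rows. A minimal sketch of the row subset over the list, using the data from the question and an explicit anonymous function (a stylistic assumption here, chosen so that `x` is unambiguously resolved as a column inside `dt[...]`):

```r
library(data.table)

set.seed(1)
DT1 <- data.table(x = c(1L, 1L, 2L), y = rnorm(3))
DT2 <- data.table(x = c(1L, 2L, 2L), y = rnorm(3))
DT3 <- data.table(x = c(2L, 2L, 1L), y = rnorm(3))
DTList <- list(DT1, DT2, DT3)

# Apply the i expression x == 1L to every data.table in the list.
res <- lapply(DTList, function(dt) dt[x == 1L])

# Every returned table holds only rows with x == 1L.
stopifnot(all(vapply(res, function(dt) all(dt$x == 1L), logical(1))))

# Row counts per table: DT1 has two matching rows, DT2 and DT3 one each.
vapply(res, nrow, integer(1))
```

The result is a list of three data.tables, in the same order as DTList.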