From hideyoshi.maeda at gmail.com Fri Feb 1 23:40:01 2013
From: hideyoshi.maeda at gmail.com (Hideyoshi Maeda)
Date: Fri, 1 Feb 2013 22:40:01 +0000
Subject: [datatable-help] Reading character strings into fread
In-Reply-To:
References:
Message-ID: <5AB2C47B-12AA-4A03-AD2E-54CB8EAE2392@gmail.com>

I am having a bit of trouble reading character string tables into fread.

The character string was generated as part of the output from an API GET request from Dropbox (the API link to Dropbox was done via the httr package).

Basically I would like fread to do something similar to read.csv, but it seems not to work. Please see the example below, in which I download a csv file, with the content of the response as a character string.

I would rather not use read.csv to write a csv file first and then get fread to read it in, as that seems a roundabout way of doing it, rather than reading in the table directly from the character string.

Any help would be much appreciated.

Thanks

P.S. @MD: I thought I had responded to your previous email, but clearly I didn't, so I thought I might as well email the list as well.

> db.app <- oauth_app("db",key=getOption("DropboxKey"), secret=getOption("DropboxSecret"))
> db.sig <- sign_oauth1.0(db.app, token=getOption("DropboxOAuthKey"), token_secret=getOption("DropboxOAuthSecret"))
>
> response <- GET(url=paste0("https://api-content.dropbox.com/1/files/dropbox/",gsub("%2F","/",curlEscape("!! test folder/new file.csv"))),config=c(db.sig,add_headers(Accept="x-dropbox-metadata")))
> response
Response [https://api-content.dropbox.com/1/files/dropbox/%21%21%20test%20folder/new%20file.csv]
  Status: 200
  Content-type: text/csv; charset=ascii
"Date.and.Time","Open","High","Low","Close","Volume"
"2007/01/02 01:46:00",20083,20088,20071,20075,212
"2007/01/02 01:47:00",20075,20120,20075,20106,328
"2007/01/02 01:48:00",20105,20110,20094,20096,256
"2007/01/02 01:49:00",20096,20106,20085,20099,177
"2007/01/02 01:50:00",20098,20100,20081,20092,184
"2007/01/02 01:51:00",20091,20094,20087,20093,48
"2007/01/02 01:52:00",20093,20095,20085,20088,147
"2007/01/02 01:53:00",20088,20090,20086,20089,26
"2007/01/02 01:54:00",20089,20100,20089,20091,116 ...
> require(data.table)
Loading required package: data.table
data.table 1.8.7  For help type: help("data.table")
> x <- fread(content(response),sep=",",verbose=TRUE)
Input contains a \n (or is ""), taking this to be text input (not a filename)
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Looking for supplied sep ',' on line 30 (the last non blank line in the first 30) ... found
Found 6 columns
First row with 6 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 82
Subtracted 0 for last eol and any trailing empty lines, leaving 82 data rows
Type codes: 300000 (first 5 rows)
Type codes: 300000 (+middle 5 rows)
Error in fread(content(response), sep = ",", verbose = TRUE) :
  Expected sep (',') but '
> x <- read.csv(text=content(response),header=TRUE,stringsAsFactors=FALSE)
> head(x)
        Date.and.Time  Open  High   Low Close Volume
1 2007/01/02 01:46:00 20083 20088 20071 20075    212
2 2007/01/02 01:47:00 20075 20120 20075 20106    328
3 2007/01/02 01:48:00 20105 20110 20094 20096    256
4 2007/01/02 01:49:00 20096 20106 20085 20099    177
5 2007/01/02 01:50:00 20098 20100 20081 20092    184
6 2007/01/02 01:51:00 20091 20094 20087 20093     48
> str(content(response))
 chr "\"Date.and.Time\",\"Open\",\"High\",\"Low\",\"Close\",\"Volume\"\n\"2007/01/02 01:46:00\",20083,20088,20071,20075,212\n\"2007/0"| __truncated__
> str(x)
'data.frame': 100 obs. of 6 variables:
 $ Date.and.Time: chr "2007/01/02 01:46:00" "2007/01/02 01:47:00" "2007/01/02 01:48:00" "2007/01/02 01:49:00" ...
 $ Open         : int 20083 20075 20105 20096 20098 20091 20093 20088 20089 20090 ...
 $ High         : int 20088 20120 20110 20106 20100 20094 20095 20090 20100 20093 ...
 $ Low          : int 20071 20075 20094 20085 20081 20087 20085 20086 20089 20083 ...
 $ Close        : int 20075 20106 20096 20099 20092 20093 20088 20089 20091 20093 ...
 $ Volume       : int 212 328 256 177 184 48 147 26 116 47 ...

From mdowle at mdowle.plus.com Sun Feb 3 10:34:04 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Sun, 3 Feb 2013 09:34:04 -0000
Subject: [datatable-help] Reading character strings into fread
In-Reply-To: <5AB2C47B-12AA-4A03-AD2E-54CB8EAE2392@gmail.com>
References: <5AB2C47B-12AA-4A03-AD2E-54CB8EAE2392@gmail.com>
Message-ID: <62f467f7a4437271e695db3f271b8430.squirrel@webmail.plus.net>

Thanks, looks like a bug when detecting types at the end of the file.
The verbose output suggests the final line isn't \n terminated. There are many tests that should be ok without one, but perhaps there is something peculiar in combination with this format. Does adding a newline at the end fix it as a temp workaround?

From statquant at outlook.com Tue Feb 12 16:26:18 2013
From: statquant at outlook.com (stat quant)
Date: Tue, 12 Feb 2013 16:26:18 +0100
Subject: [datatable-help] r801 build failed on Rforge
Message-ID:

I guess you know... but the dev build of data.table failed on R-forge, is it expected?

From mdowle at mdowle.plus.com Tue Feb 12 16:33:00 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Tue, 12 Feb 2013 15:33:00 +0000
Subject: [datatable-help] r801 build failed on Rforge
In-Reply-To:
References:
Message-ID:

I only just realised today. Thanks - will fix.

On 12.02.2013 15:26, stat quant wrote:
> I guess you know...
> but the dev build of data.table failed on R-forge, is it expected ?

From Ken.Williams at windlogics.com Mon Feb 18 20:18:48 2013
From: Ken.Williams at windlogics.com (Ken Williams)
Date: Mon, 18 Feb 2013 19:18:48 +0000
Subject: [datatable-help] Time to delete J?
Message-ID:

I noticed this:

================================
> J
function (...)
{
    warning("The J alias is deprecated *outside* DT[...] because J() conflicts with the function J() in XLConnect and rJava. Please use data.table() instead, or define an alias yourself. J() will continue to work *inside* DT[...] as documented. This warning is issued from v1.8.3. J() will be unavailable for use outside DT[...] from v1.8.4. Only then will the conflict with rJava and XLConnect be resolved.")
    data.table(...)
}

> packageVersion('data.table')
[1] '1.8.6'
================================

Is it time to remove it?

--
Ken Williams, Senior Research Scientist
WindLogics
http://windlogics.com

From mdowle at mdowle.plus.com Tue Feb 19 10:06:54 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Tue, 19 Feb 2013 09:06:54 +0000
Subject: [datatable-help] Time to delete J?
In-Reply-To:
References:
Message-ID: <38e1738bdb3fe18137ed4c1dba986de1@imap.plus.net>

Hi Ken,

Indeed. It's removed and noted in 1.8.7, just not on CRAN yet:

  o  The J() alias is now removed *outside* DT[...], but will still work inside DT[...]; i.e., DT[J(...)] is fine. As warned in v1.8.2 (see below in this file) and deprecated with warning() in v1.8.6. This resolves the conflict with function J() in package XLConnect (#1747) and rJava (#2045). Please use data.table() directly instead of J(), outside DT[...].

Thanks,
Matthew
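
A minimal sketch of the distinction in the news item above, assuming only that data.table is installed. The table DT, its columns id and v, and the lookup value "b" are made up for illustration and do not come from the thread.

```r
library(data.table)

# Toy keyed table; DT, id and v are illustrative names only.
DT <- data.table(id = c("a", "b", "c"), v = 1:3, key = "id")

# Inside DT[...]: J() is still supported for keyed lookups.
inside <- DT[J("b")]

# Outside DT[...]: build the lookup table with data.table() instead of J(),
# then join it against the keyed table.
lookup <- data.table(id = "b")
outside <- DT[lookup]

print(inside)
print(outside)
```

Both calls should select the same single row. The second form never calls J() at top level, which is the usage that clashed with XLConnect and rJava.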

From statquant at outlook.com Thu Feb 21 11:22:42 2013
From: statquant at outlook.com (stat quant)
Date: Thu, 21 Feb 2013 11:22:42 +0100
Subject: [datatable-help] data.table seems to be buiding on Rforge for a few days
Message-ID:

Hello all,

For a few days now, the latest data.table 1.8.7 release has been shown as "Building" on R-Forge.
Is it expected?

Cheers
Colin

From mdowle at mdowle.plus.com Thu Feb 21 11:54:36 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Thu, 21 Feb 2013 10:54:36 +0000
Subject: [datatable-help] data.table seems to be buiding on Rforge for a few days
In-Reply-To:
References:
Message-ID: <9baf23aa169fbfa048419191c644440d@imap.plus.net>

Hi,

Yes it's expected, sadly: R-Forge's build batch has been very poor for many years. Had high hopes after the upgrade at Christmas (it seemed much better since then). R-Forge has always been very reliable for everything else (SVN, help and commit email list, and issue trackers).

This time is worse than normal though. I raised a support request a few days ago:

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2549&group_id=34&atid=194

Matthew

On 21.02.2013 10:22, stat quant wrote:
> Hello all, It's been a few days since data.table 1.8.7 lattest release is shown as "Building" on Rforge [1]
> Is it expected ?
> Cheers
> Colin

Links:
------
[1] https://r-forge.r-project.org/R/?group_id=240

From mdowle at mdowle.plus.com Thu Feb 21 14:45:58 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Thu, 21 Feb 2013 13:45:58 +0000
Subject: [datatable-help] data.table seems to be buiding on Rforge for a few days
In-Reply-To: <9baf23aa169fbfa048419191c644440d@imap.plus.net>
References: <9baf23aa169fbfa048419191c644440d@imap.plus.net>
Message-ID: <00efc204d6c39fc8f4cdb5b86ad42a2b@imap.plus.net>

Latest Windows .zip (commit 813) now on data.table homepage. Usually takes up to an hour for the www to update. Then a Ctrl+F5 to flush cache.

Or, just grab the .zip directly from the www directory:

https://r-forge.r-project.org/scm/viewvc.php/www/?root=datatable

From gleynes+r at gmail.com Thu Feb 21 16:49:50 2013
From: gleynes+r at gmail.com (Gene Leynes)
Date: Thu, 21 Feb 2013 09:49:50 -0600
Subject: [datatable-help] Update columns in data.table programmatically
Message-ID:

I want to update a group of columns programmatically. Based on a predetermined list I want to convert the classes of some columns.

This is a simple task with data.frame, but in data.table this requires a confusing combination of `substitute`, `as.symbol`, and `eval`.

Am I doing this right?

My example: https://gist.github.com/geneorama/4998308

I was about to post a question, but SO suggested this answer:
http://stackoverflow.com/questions/8374816/loop-through-columns-in-a-data-table-and-transform-those-columns

Thank you,
Gene

From mdowle at mdowle.plus.com Thu Feb 21 17:34:52 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Thu, 21 Feb 2013 16:34:52 +0000
Subject: [datatable-help] Update columns in data.table programmatically
In-Reply-To:
References:
Message-ID:

Hi,

for (.col in FactorColumns) dt[,.col:=as.factor(get(.col)),with=FALSE]
for (.col in NumericColumns) dt[,.col:=as.numeric(get(.col)),with=FALSE]

or,

for (.col in FactorColumns) dt[,c(.col):=as.factor(get(.col))]
for (.col in NumericColumns) dt[,c(.col):=as.numeric(get(.col))]

or,

for (.col in FactorColumns) set(dt,j=.col,value=as.factor(dt[[.col]]))
for (.col in NumericColumns) set(dt,j=.col,value=as.numeric(dt[[.col]]))

or (with no for loop),

dt[, c(FactorColumns):=lapply(.SD,as.factor), .SDcols=FactorColumns]
dt[, c(NumericColumns):=lapply(.SD,as.numeric), .SDcols=NumericColumns]

But the for loops are probably faster and easier to follow.

That S.O. is quite old and could do with updating. := and with=FALSE have improved since then.

Matthew

From mailinglist.honeypot at gmail.com Thu Feb 21 18:21:53 2013
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Thu, 21 Feb 2013 12:21:53 -0500
Subject: [datatable-help] data.table seems to be buiding on Rforge for a few days
In-Reply-To: <00efc204d6c39fc8f4cdb5b86ad42a2b@imap.plus.net>
References: <9baf23aa169fbfa048419191c644440d@imap.plus.net> <00efc204d6c39fc8f4cdb5b86ad42a2b@imap.plus.net>
Message-ID:

> Latest Windows .zip (commit 813) now on data.table homepage. Usually takes
> up to an hour for the www to update. Then a Ctrl+F5 to flush cache.
>
> Or, just grab the .zip directly from www directory :
> https://r-forge.r-project.org/scm/viewvc.php/www/?root=datatable

*Or* just update from SVN and compile from source -- from what I gather, it's rather easy(er) these days for "you Windows folk"s than it used to be, no? Or is it still a big enough PITA that it's not worth it?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From mdowle at mdowle.plus.com Thu Feb 21 18:53:02 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Thu, 21 Feb 2013 17:53:02 +0000
Subject: [datatable-help] data.table seems to be buiding on Rforge for a few days
In-Reply-To:
References: <9baf23aa169fbfa048419191c644440d@imap.plus.net> <00efc204d6c39fc8f4cdb5b86ad42a2b@imap.plus.net>
Message-ID:

On 21.02.2013 17:21, Steve Lianoglou wrote:
>> Latest Windows .zip (commit 813) now on data.table homepage.
>> Usually takes
>> up to an hour for the www to update. Then a Ctrl+F5 to flush cache.
>>
>> Or, just grab the .zip directly from www directory :
>> https://r-forge.r-project.org/scm/viewvc.php/www/?root=datatable
>
> *Or* just update from SVN and compile from source -- from what I
> gather, it's rather easy(er) these days for "you Windows folk"s than
> it used to be, no? Or is it still a sufficiently big enough PITA that
> it's not worth it?

I don't think I've ever built data.table on Windows, so don't know :)

All I do is upload the tar.gz from Linux to the winbuilder website and the .zip comes back (amazingly reliably) via an email link within 10-20 mins. It's winbuilder that makes it easy really. *Once you have the tar.gz*. So it's just the 'svn up' and 'R CMD build' steps that are needed. But, for a full build, it installs (compiles) in order to run vignettes, which is the bit that needs a compile environment I think. I think if you set R CMD build --no-vignettes it just packages it up, and I know winbuilder will create a .zip with notes about the missing vignettes. But not sure.

So, there's already a few 'don't knows'...

Regardless, if we break the very latest SVN version on R-Forge it might be nice to have a latest 'stable' (unstable) devel .zip of data.table on the homepage. At various working states of devel we could decide to build the .zip and upload it to the www directory, as a line in the sand. Every few weeks or so perhaps.

And, I trust winbuilder (Uwe Ligges) more than I trust myself. It has found quite a few issues/notes/warnings over the last year or so that I didn't find otherwise.

Matthew

From gleynes+r at gmail.com Thu Feb 21 19:11:49 2013
From: gleynes+r at gmail.com (Gene Leynes)
Date: Thu, 21 Feb 2013 12:11:49 -0600
Subject: [datatable-help] Update columns in data.table programmatically
In-Reply-To:
References:
Message-ID:

Matthew,

Thank you, this definitely does the trick.

I would suggest that `get` and `set` deserve more prominent coverage in the guides. I've been using data.table for some time and I didn't know about either. I've seen `get`, but I didn't realize that you could use it that way.

I tried for some time to use the .SD trick, but I couldn't get it to work. The .SD function is still a little mysterious to me, although I've used it a couple of times in different situations.

Thanks for these examples, they're quite elucidating.

From mdowle at mdowle.plus.com Thu Feb 21 23:06:02 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Thu, 21 Feb 2013 22:06:02 -0000
Subject: [datatable-help] Update columns in data.table programmatically
In-Reply-To:
References:
Message-ID: <96f74a8417289c40a9cb82b0d37a66f2.squirrel@webmail.plus.net>

Great. I agree. It's sometimes difficult/time consuming (for me) to translate rough ideas into precise changes that can be committed. You are probably in the best position to guide the documentation as you would like. Anyone is welcome to join the project and improve documentation or code. And there is the data.table wiki. Even more precise suggestions are very welcome.

Matthew

From statquant at outlook.com Mon Feb 25 19:40:35 2013
From: statquant at outlook.com (stat quant)
Date: Mon, 25 Feb 2013 19:40:35 +0100
Subject: [datatable-help] About adding fastmatch and fasttime to data.table
Message-ID:

Hello list,

Looking at fastmatch and fasttime, I realized that each of those packages consists solely of one C file.
We spoke about the possibility to add those to data.table, I tried to contact S.Urbanek without any success so I do not have feedback from his side. Using fastPOSIXct provide a huge gain when one have to load files with datetime, on my laptop using data.table:::fread, I realized that most of the time is spent casting datetimes to POSIXct (I have several columns). Looking at fasttime, you can see pretty good improvement (factor 15) R) ts <- as.character(.POSIXct(runif(1e6) * unclass(Sys.time()))) R) system.time(a <- as.POSIXct(ts, "GMT")) utilisateur syst?me ?coul? 6.49 0.04 6.57 R) system.time(b <- fastPOSIXct(ts, "GMT")) utilisateur syst?me ?coul? 0.40 0.00 0.41 When colClasses will be implemented in fread, can I suggest to allow using fasttime as an option ? Concerning fastmatch, the vignette already shows some nice benchmarks, I tend to do a lot of selects based on string columns, not sure if this is the case for most of us. My 0.002 cent Cheers -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From victor.kryukov at gmail.com Mon Feb 25 23:26:28 2013 From: victor.kryukov at gmail.com (Victor Kryukov) Date: Mon, 25 Feb 2013 14:26:28 -0800 Subject: [datatable-help] Potential bug with sorting/summarizing by POSIXct and logical column Message-ID: Hello, I've encounted what looks like a bug while sorting by POSIXct and logical column, which may or may not be related to the following bug: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975 Here are all the details: http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns Here is the test case: # First some data data <- data.table(structure(list( month = structure(c(1356998400, 1356998400, 1356998400, 1359676800, 1354320000, 1359676800, 1359676800, 1356998400, 1356998400, 1354320000, 1354320000, 1354320000, 1359676800, 1359676800, 1359676800, 1356998400, 1359676800, 1359676800, 1356998400, 1359676800, 1359676800, 1359676800, 1359676800, 1354320000, 1354320000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), portal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE ), satisfaction = c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 9L, 2L, 8L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L, 9L, 10L, 9L, 10L, 10L)), .Names = c("month", "portal", "satisfaction"), row.names = c(NA, -25L), class = "data.frame")) # Summarizing by month, portal with tapply works: > tapply(data$satisfaction, list(data$month, data$portal), mean) FALSE TRUE 2012-12-01 8.5 8.000000 2013-01-01 10.0 10.000000 2013-02-01 9.0 9.545455 # Summarizing with 'by' argument of data.table does not: > data[, mean(satisfaction), by = 'month,portal']> data[, mean(satisfaction), by = list(month, portal)] month portal V1 1: 2013-01-01 FALSE 10.000000 2: 2013-02-01 TRUE 9.000000 3: 2013-01-01 TRUE 10.000000 4: 2012-12-01 FALSE 8.500000 5: 2012-12-01 TRUE 7.333333 6: 2013-02-01 TRUE 
9.666667 7: 2013-02-01 FALSE 9.000000 8: 2012-12-01 TRUE 10.000000 # Summarizing only this year's data works: data[month >= ymd(20130101), mean(satisfaction), by = 'month,portal'] month portal V1 1: 2013-01-01 TRUE 10.000000 2: 2013-01-01 FALSE 10.000000 3: 2013-02-01 TRUE 9.545455 4: 2013-02-01 FALSE 9.000000 Yours Sincerely, Victor Kryukov -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdowle at mdowle.plus.com Tue Feb 26 01:39:09 2013 From: mdowle at mdowle.plus.com (Matthew Dowle) Date: Tue, 26 Feb 2013 00:39:09 +0000 Subject: [datatable-help] About adding fastmatch and fasttime to data.table In-Reply-To: References: Message-ID: Hi, This sounds like a geat idea. I don't know why Simon U didn't reply, or without success, so that may depend on the way you asked, whether he is on holiday at the moment, his reaction to the precise wording of the email you wrote, or some other factor. It is difficult to tell! But we don't need to wait for him or for for you: this is open source. You have got much further than I have so if you'd like to add this please go ahead and make progress. You're very welcome to join the project and commit directly. Or if you can't for some reason please file as a feature request so it doesn't get forgotten. Matthew On 25.02.2013 18:40, stat quant wrote: > Hello list, > > Looking at fastmatch and fasttime, I realized that those package consists solely in 1 C file (each). > We spoke about the possibility to add those to data.table, I tried to contact S.Urbanek without any success so I do not have feedback from his side. > Using fastPOSIXct provide a huge gain when one have to load files with datetime, on my laptop using data.table:::fread, I realized that most of the time is spent casting datetimes to POSIXct (I have several columns). > > Looking at fasttime, you can see pretty good improvement (factor 15) > > R) ts R) system.time(a utilisateur syst?me ?coul? 
>        6.49        0.04        6.57
> R) system.time(b <- fastPOSIXct(ts, "GMT"))
> utilisateur     système      écoulé
>        0.40        0.00        0.41
>
> When colClasses is implemented in fread, can I suggest allowing fasttime as an option?
> Concerning fastmatch, the vignette already shows some nice benchmarks. I tend to do a lot of selects based on string columns; not sure if this is the case for most of us.
>
> My 0.002 cent
> Cheers

From michael.nelson at sydney.edu.au Tue Feb 26 01:40:02 2013
From: michael.nelson at sydney.edu.au (Michael Nelson)
Date: Tue, 26 Feb 2013 00:40:02 +0000
Subject: [datatable-help] Potential bug with sorting/summarizing by POSIXct and logical column
In-Reply-To:
References:
Message-ID: <6FB5193A6CDCDF499486A833B7AFBDCD5827D4E4@EX-MBX-PRO-04.mcs.usyd.edu.au>

I can't replicate this problem using data.table 1.8.7 (installed about 3 weeks ago) on

R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

Michael

From alexander.chernyakov at gmail.com Tue Feb 26 01:46:46 2013
From: alexander.chernyakov at gmail.com (Alexander Chernyakov)
Date: Mon, 25 Feb 2013 19:46:46 -0500
Subject: [datatable-help] datatable-help Digest, Vol 36, Issue 8
In-Reply-To:
References:
Message-ID:

Regarding fasttime: my understanding is that it only works for dates after 1970.
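One way to act on the 1970 remark is to fall back to base R whenever a timestamp might predate the epoch. The sketch below is only an illustration: `parse_utc` is a hypothetical helper name, the year-1970 cutoff is an assumption taken from the comment above rather than from the fasttime documentation, and the input is assumed to be "YYYY-MM-DD HH:MM:SS" strings meant as UTC.

```r
# Hypothetical helper (not part of data.table or fasttime).
# Parses "YYYY-MM-DD HH:MM:SS" strings as UTC. fasttime is used only when it
# is installed and every year is >= 1970 (assumed cutoff); otherwise the
# slower base R parser handles the input.
parse_utc <- function(ts) {
  years <- suppressWarnings(as.integer(substr(ts, 1L, 4L)))
  safe_for_fast <- !anyNA(years) && all(years >= 1970L)
  if (safe_for_fast && requireNamespace("fasttime", quietly = TRUE)) {
    fasttime::fastPOSIXct(ts, tz = "UTC")
  } else {
    as.POSIXct(ts, tz = "UTC")
  }
}

# One timestamp just before the epoch forces the base R path for the batch.
x <- parse_utc(c("1969-12-31 23:59:59", "1970-01-01 00:00:01"))
print(as.numeric(x))
```

The check is deliberately coarse (whole batch, year prefix only); a per-element split would keep more of the speed gain at the cost of extra bookkeeping.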
From statquant at outlook.com Tue Feb 26 09:06:05 2013
From: statquant at outlook.com (statquant3)
Date: Tue, 26 Feb 2013 00:06:05 -0800 (PST)
Subject: [datatable-help] Potential bug with sorting/summarizing by POSIXct and logical column
In-Reply-To:
References:
Message-ID: <1361865965702-4659665.post@n4.nabble.com>

Victor,

Can you show us your sessionInfo()? It looks like a sorting bug on POSIXct that Matthew solved weeks ago, probably in data.table 1.8.7:

install.packages("data.table", repos="http://R-Forge.R-project.org")

--
View this message in context: http://r.789695.n4.nabble.com/Potential-bug-with-sorting-summarizing-by-POSIXct-and-logical-column-tp4659643p4659665.html
Sent from the datatable-help mailing list archive at Nabble.com.

From statquant at outlook.com Tue Feb 26 09:24:15 2013
From: statquant at outlook.com (statquant3)
Date: Tue, 26 Feb 2013 00:24:15 -0800 (PST)
Subject: [datatable-help] About adding fastmatch and fasttime to data.table
In-Reply-To:
References:
Message-ID: <1361867055229-4659666.post@n4.nabble.com>

Will file a request and will try to spend some time on the data.table source code, but knowing that I have no experience in R internals and little knowledge of C... might be for 2016 ;)

--
View this message in context: http://r.789695.n4.nabble.com/About-adding-fastmatch-and-fasttime-to-data-table-tp4659622p4659666.html
Sent from the datatable-help mailing list archive at Nabble.com.
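For anyone wanting to check their own installation against the grouping problem reported in this thread, a rough self-check could look like the following. It rebuilds Victor's 25 rows as a plain data.frame, takes the tapply result as the reference, and only runs the data.table comparison if the package is installed. The variable names (`dec`, `jan`, `feb`, `df`, `ref`, `n_groups`) are illustrative choices, not anything from the original post.

```r
# Victor's test data as a plain data.frame (values copied from his post).
dec <- 1354320000; jan <- 1356998400; feb <- 1359676800
df <- data.frame(
  month = as.POSIXct(
    c(jan, jan, jan, feb, dec, feb, feb, jan, jan, dec, dec, dec, feb,
      feb, feb, jan, feb, feb, jan, feb, feb, feb, feb, dec, dec),
    origin = "1970-01-01", tz = "UTC"),
  portal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
             TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
             TRUE, TRUE, TRUE, TRUE, TRUE),
  satisfaction = c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 9L, 2L, 8L,
                   10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L, 9L, 10L, 9L, 10L, 10L)
)

# Reference result from base R: one cell per (month, portal) combination.
ref <- tapply(df$satisfaction, list(df$month, df$portal), mean)
n_groups <- sum(!is.na(ref))

# Only run the data.table comparison when the package is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  got <- dt[, mean(satisfaction), by = list(month, portal)]
  cat("data.table", format(utils::packageVersion("data.table")), "\n")
  # A build showing the reported problem returns more rows than groups.
  cat("groups expected:", n_groups, " groups returned:", nrow(got), "\n")
}
```

If the two counts differ, posting the output together with sessionInfo() would give the list what it needs to narrow the problem down.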
From mdowle at mdowle.plus.com Tue Feb 26 11:41:27 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Tue, 26 Feb 2013 10:41:27 +0000
Subject: [datatable-help] About adding fastmatch and fasttime to data.table
In-Reply-To: <1361867055229-4659666.post@n4.nabble.com>
References: <1361867055229-4659666.post@n4.nabble.com>
Message-ID: <744c101503301acc3b3418cb7c164864@imap.plus.net>

Hah. I'm happy to explain the code and point you in the right direction. Feel free to ask here or on S.O. Also I'll try to add data.table to GitHub soon: I believe that might make it easier to ask questions tagged inline in the code.

On 26.02.2013 08:24, statquant3 wrote:
> Will file a request and will try to spend some time on the data.table
> source code, but knowing that I have no experience in R internals and
> little knowledge of C... might be for 2016 ;)

From mdowle at mdowle.plus.com Tue Feb 26 11:47:02 2013
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Tue, 26 Feb 2013 10:47:02 +0000
Subject: [datatable-help] datatable-help Digest, Vol 36, Issue 8
In-Reply-To:
References:
Message-ID: <3f39e40c41e041dc434c9c9348925026@imap.plus.net>

Thanks. Have added that (the potential 1970 issue) to statquant's FR to follow up:

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2582&group_id=240&atid=978

On 26.02.2013 00:46, Alexander Chernyakov wrote:
> Regarding fasttime: my understanding is that it only works after 1970.
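The timing comparison from the fasttime discussion in this thread can be rerun along these lines. This is a sketch under a few assumptions: the sample is 1e4 strings rather than the 1e6 used in the thread, the strings are rendered explicitly in GMT, and the fasttime part is skipped when the package is not installed. Timings will differ by machine, so none are shown here.

```r
set.seed(42)
n <- 1e4  # the thread used 1e6; a smaller n keeps this quick

# Random timestamps between 1970 and now, as "YYYY-MM-DD HH:MM:SS" in GMT.
ts <- format(.POSIXct(runif(n) * as.numeric(Sys.time()), tz = "GMT"),
             "%Y-%m-%d %H:%M:%S")

# Base R parser.
t_base <- system.time(a <- as.POSIXct(ts, tz = "GMT"))
cat("as.POSIXct elapsed:", t_base[["elapsed"]], "\n")

# fasttime parser, only if available.
if (requireNamespace("fasttime", quietly = TRUE)) {
  t_fast <- system.time(b <- fasttime::fastPOSIXct(ts, tz = "GMT"))
  cat("fastPOSIXct elapsed:", t_fast[["elapsed"]], "\n")
  # Both parsers should agree on the underlying instants.
  stopifnot(isTRUE(all.equal(as.numeric(a), as.numeric(b))))
}
```

Because every generated timestamp falls on or after 1970-01-01, this benchmark does not exercise the pre-1970 case raised above; that would need separate test data.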