From tahfounasousou at rocketmail.com Tue Dec 3 11:23:34 2013
From: tahfounasousou at rocketmail.com (krys22)
Date: Tue, 3 Dec 2013 02:23:34 -0800 (PST)
Subject: [datatable-help] missing rows ans cloumns in my matrix
Message-ID: <1386066214571-4681549.post@n4.nabble.com>

I have a problem with a big matrix. After the matrix is created, many
rows and columns are not visible because the matrix is so large. Could
you help me get my complete matrix? Is there a function or some other
solution for this problem?

For example, if the dimension of my matrix is 50, rows 12, 13, 14,
15...26 don't exist in my matrix.

--
View this message in context: http://r.789695.n4.nabble.com/missing-rows-ans-cloumns-in-my-matrix-tp4681549.html
Sent from the datatable-help mailing list archive at Nabble.com.

From alexandre.sieira at gmail.com Tue Dec 3 15:26:56 2013
From: alexandre.sieira at gmail.com (Alexandre Sieira)
Date: Tue, 3 Dec 2013 12:26:56 -0200
Subject: [datatable-help] rbindlist
Message-ID:

I have come across some behavior in rbindlist that looks unexpected to
me:

> rbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
   a b
1: 1 2
2: 4 3

So it appears to assume (without checking) that all objects have not
only the same column names but also the same column order. So a value
assigned to column "a" in the second object was used for column "b" in
the end result (and vice versa).

I know the documentation says rbindlist uses the column types from the
first entry of the list, but I didn't see any mention of column order or
names anywhere.

I suggest that column names are matched, even if they are not in the
same order. Perhaps a "use.names" parameter could be used to ask for
this behavior to avoid breaking backwards compatibility.

Or, at the very least, I suggest the documentation of rbindlist be
updated to explicitly mention that the columns will be considered by
position only, and that callers need to ensure the column orders of all
objects match exactly.
And that a warning be issued by rbindlist when the column names don't
match.

--
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I

From gsee000 at gmail.com Tue Dec 3 17:46:08 2013
From: gsee000 at gmail.com (G See)
Date: Tue, 3 Dec 2013 10:46:08 -0600
Subject: [datatable-help] rbindlist
In-Reply-To:
References:
Message-ID:

I agree. Here is a related thread:
http://thread.gmane.org/gmane.comp.lang.r.datatable/2231

Garrett

From alexandre.sieira at gmail.com Tue Dec 3 18:05:59 2013
From: alexandre.sieira at gmail.com (Alexandre Sieira)
Date: Tue, 3 Dec 2013 15:05:59 -0200
Subject: [datatable-help] rbindlist
In-Reply-To:
References:
Message-ID:

To whom it may concern, I wrote a (rather bulky) wrapper around
rbindlist that:

- checks that the classes of columns with the same name match;
- fills in any missing columns with NAs of the appropriate type;
- reorders columns for consistency;
- calls rbindlist on the results of this preprocessing.

The code is here: https://gist.github.com/asieira/7772953

The results would be as follows:

> smartrbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
   a b
1: 1 2
2: 3 4

> smartrbindlist(list(data.table(a=1, b=2), list(c=3), data.table(d="foo")))
    a  b  c   d
1:  1  2 NA  NA
2: NA NA  3  NA
3: NA NA NA foo

> smartrbindlist(list(data.table(a=1L, b=2), list(a=10)))
Error in smartrbindlist(list(data.table(a = 1L, b = 2), list(a = 10))) :
  smartrbindlist: column a has different classes in entry 2 [numeric] and
  its predecessors [integer]

Hope this helps anyone else out there.

--
Alexandre Sieira
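
The four preprocessing steps listed in the wrapper message above can be
sketched roughly as follows. This is a minimal illustration only, not the
code from the gist: the function name bind_by_name and its internals are
hypothetical, and it assumes every entry is a non-empty data.table or
named list with columns of equal length.

```r
library(data.table)

# Hypothetical helper (NOT the gist's smartrbindlist): align entries by
# column name, then let rbindlist bind them by position.
bind_by_name <- function(entries) {
  entries <- lapply(entries, as.list)  # treat data.tables and lists alike
  all_names <- unique(unlist(lapply(entries, names)))

  # Step 1: check classes, remembering a typed NA for every column seen.
  classes <- list()
  typed_na <- list()
  for (i in seq_along(entries)) {
    for (nm in names(entries[[i]])) {
      column <- entries[[i]][[nm]]
      if (is.null(classes[[nm]])) {
        classes[[nm]] <- class(column)
        typed_na[[nm]] <- column[NA_integer_]  # length-1 NA of the same type
      } else if (!identical(classes[[nm]], class(column))) {
        stop("column ", nm, " has a different class in entry ", i)
      }
    }
  }

  # Steps 2 and 3: fill missing columns with typed NAs, then reorder.
  aligned <- lapply(entries, function(entry) {
    n_rows <- length(entry[[1L]])
    for (nm in setdiff(all_names, names(entry))) {
      entry[[nm]] <- rep(typed_na[[nm]], n_rows)
    }
    entry[all_names]
  })

  # Step 4: all entries now share the same columns in the same order.
  rbindlist(aligned)
}

inputs <- list(data.table(a = 1, b = 2), data.table(b = 4, a = 3))
print(bind_by_name(inputs))

# For comparison: name matching and filling built into rbindlist itself.
print(rbindlist(inputs, use.names = TRUE, fill = TRUE))
```

Recent data.table releases accept use.names and fill directly on
rbindlist, as in the last call, which may make a wrapper like this
unnecessary. One difference: the sketch stops on a class mismatch, while
rbindlist coerces to a common type.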
From eduard.antonyan at gmail.com Tue Dec 3 18:22:28 2013
From: eduard.antonyan at gmail.com (Eduard Antonyan)
Date: Tue, 3 Dec 2013 11:22:28 -0600
Subject: [datatable-help] rbindlist
In-Reply-To:
References:
Message-ID:

I took a cursory look at your code - the new rbind does everything you
want (check the use.names and fill arguments), and you may want to take
a look at its code.

From eduard.antonyan at gmail.com Tue Dec 3 18:24:08 2013
From: eduard.antonyan at gmail.com (Eduard Antonyan)
Date: Tue, 3 Dec 2013 11:24:08 -0600
Subject: [datatable-help] rbindlist
In-Reply-To:
References:
Message-ID:

With a small difference from what you wrote, I guess - the classes are
now coerced to the most general one in rbindlist (and therefore in
rbind).

From alexandre.sieira at gmail.com Tue Dec 3 18:48:11 2013
From: alexandre.sieira at gmail.com (Alexandre Sieira)
Date: Tue, 3 Dec 2013 15:48:11 -0200
Subject: [datatable-help] rbindlist
In-Reply-To:
References:
Message-ID:

Thanks for pointing this out, Eduard. You are absolutely right.

I just looked at the SVN repository HEAD and saw that a new parameter
called "fill" was added to .rbind.data.table that would also accomplish
something else I added to my function. Very nice! Looking forward to the
new release. :)

--
Alexandre Sieira

From mdowle at mdowle.plus.com Tue Dec 10 18:05:22 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Tue, 10 Dec 2013 17:05:22 +0000
Subject: [datatable-help] Cologne on Friday
Message-ID: <52A749D2.7040406@mdowle.plus.com>

Hi,

If anyone is in or near Cologne on Friday, I'm presenting with Arun:

http://www.meetup.com/KoelnRUG/

Hope to meet a few of you there.

Regards,
Matt

From aragorn168b at gmail.com Tue Dec 10 22:44:53 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 10 Dec 2013 22:44:53 +0100
Subject: [datatable-help] Cologne on Friday
In-Reply-To: <52A749D2.7040406@mdowle.plus.com>
References: <52A749D2.7040406@mdowle.plus.com>
Message-ID: <97AAB816652F46A6978FC8508BEE2233@gmail.com>

Oh yes. Looking forward to it :)!

Arun

From aragorn168b at gmail.com Tue Dec 10 22:45:56 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 10 Dec 2013 22:45:56 +0100
Subject: [datatable-help] Cologne on Friday
In-Reply-To: <97AAB816652F46A6978FC8508BEE2233@gmail.com>
References: <52A749D2.7040406@mdowle.plus.com> <97AAB816652F46A6978FC8508BEE2233@gmail.com>
Message-ID: <41704C9EAF02401C99FDA608AA05A211@gmail.com>

Here's info on when/where/what etc:
http://www.meetup.com/KoelnRUG/events/146708302/

Arun
From chenhuashan at gmail.com Sat Dec 14 02:30:54 2013
From: chenhuashan at gmail.com (Huashan Chen)
Date: Fri, 13 Dec 2013 17:30:54 -0800 (PST)
Subject: [datatable-help] Fail to add new columns within a function
Message-ID: <1386984654635-4682173.post@n4.nabble.com>

I just found out that when the column quota is reached, adding new
columns within a function fails.

Below is the test code:

testF2=function(x){
  add_var<-function(varname){
    x[, `:=`(eval(substitute(varname)), 1), with=F]
  }
  sapply(paste0('a', 1:101), add_var)
}

dd=data.table(a=1:3)
truelength(dd)
testF2(dd)
dim(dd) # only 100 columns

dd[, new:=3]
dim(dd) # adding new column outside a function is OK.

--
View this message in context: http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173.html

From aragorn168b at gmail.com Sat Dec 14 14:10:01 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Sat, 14 Dec 2013 14:10:01 +0100
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <1386984654635-4682173.post@n4.nabble.com>
References: <1386984654635-4682173.post@n4.nabble.com>
Message-ID: <40AC36D389E643519828E005ABF732D0@gmail.com>

Hi Huashan,

Great reproducible example! Would you mind filing a bug report here
(https://r-forge.r-project.org/tracker/?func=browse&group_id=240&atid=975)?

Thank you,
Arun

From mdowle at mdowle.plus.com Sun Dec 15 01:28:04 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Sun, 15 Dec 2013 00:28:04 +0000
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <40AC36D389E643519828E005ABF732D0@gmail.com>
References: <1386984654635-4682173.post@n4.nabble.com> <40AC36D389E643519828E005ABF732D0@gmail.com>
Message-ID: <52ACF794.10408@mdowle.plus.com>

Hi,

This isn't really a bug. A documentation issue, or a default that is too
low, maybe.

When all spare slots are used up, there is no choice but to make a
shallow copy and create a new vector of column pointer slots. This is
the pointer (address in RAM) which any variable names (symbols) point
to. When this happens, data.table does a reasonable job of changing the
symbol in calling scope too, but within a function within a function
it's tricky. In your function, x is actually being updated by reference,
but once the shallow copy happens (when the spare slots are used up) the
update applies only in local scope. By default:

datatable.alloccol = quote(max(100L,ncol(DT)+64L))

Some people just change this to be a much larger number. That's the
easiest. Just over-allocate massively:

options(datatable.alloccol = 10000)

If you have under 50 tables, this won't matter a jot. If you have 1000's
of tables, then the spare space could become significant. Assuming
64bit, 10000 * 8 bytes / 1024 = 78KB.
Knowing this allows you to choose the appropriate amount of
over-allocation for your case.

50 tables * 78KB = 4MB = e.g. 0.01% of 32GB

Or, if you know you are about to add a lot of columns by reference via a
function, you can increase the over-allocation of one table using the
alloc.col function:

alloc.col(DT, 200)

In case the example was actually close to the real example, you can add
a lot of columns in one step, and the LHS of := can be an expression:

DT[, paste0('a', 1:101) := 1]  # add 101 columns named "a1", "a2" ... "a101", all set to 1

and set() may be an easier alternative to := in this case, now that it
can add columns as of v1.8.11.

If there is a real-world example where it really needs to be wrapped in
a function in a function, then we would need to see it (or an example
closer to reality) to convince (me at least) that we need to do better
here.

HTH, Matt

From chenhuashan at gmail.com Mon Dec 16 00:26:24 2013
From: chenhuashan at gmail.com (Huashan Chen)
Date: Sun, 15 Dec 2013 15:26:24 -0800 (PST)
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <52ACF794.10408@mdowle.plus.com>
References: <1386984654635-4682173.post@n4.nabble.com> <40AC36D389E643519828E005ABF732D0@gmail.com> <52ACF794.10408@mdowle.plus.com>
Message-ID: <1387149983998-4682250.post@n4.nabble.com>

Hi Matthew,

Thank you for the thoughtful reply. It's from a real example I am using,
where a master dataset is to be merged with other datasets by selected
rows and columns (which may or may not exist in the master dataset).
This function is called multiple times. As can be seen from the simple
example, the master dataset is passed in by reference to avoid
duplication.

As you also pointed out, calling alloc.col(DT, some large value) outside
the function can fix this problem. But I am wondering whether calling
alloc.col() within the function would be preferable?

--
View this message in context: http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173p4682250.html
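
For reference, the one-step alternative mentioned earlier in this thread
(giving := a vector of new column names instead of adding columns one at
a time inside nested functions) might look like the sketch below. The
names dd and new_cols are illustrative, and the sketch assumes a
data.table release in which the left-hand side of := may be a
parenthesised character vector.

```r
library(data.table)

dd <- data.table(a = 1:3)
new_cols <- paste0("a", 1:101)

# truelength() reports the allocated column slots; ncol() the slots in use.
spare_before <- truelength(dd) - ncol(dd)
print(spare_before)

# One step: the LHS of := names every new column at once, so any
# re-allocation happens in a single call made where dd is visible.
dd[, (new_cols) := 1]

print(dim(dd))
print(truelength(dd) - ncol(dd))  # spare slots left afterwards
```

Because the whole addition is one call on dd itself, there is no inner
function holding a stale local copy if the column slots have to grow.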
From mdowle at mdowle.plus.com Tue Dec 17 20:11:20 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Tue, 17 Dec 2013 19:11:20 +0000
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <1387149983998-4682250.post@n4.nabble.com>
References: <1386984654635-4682173.post@n4.nabble.com> <40AC36D389E643519828E005ABF732D0@gmail.com> <52ACF794.10408@mdowle.plus.com> <1387149983998-4682250.post@n4.nabble.com>
Message-ID: <52B0A1D8.7070407@mdowle.plus.com>

On 15/12/13 23:26, Huashan Chen wrote:
> Hi Matthew,
>
> Thank you for the thoughtful reply. It's from a real example I am using,
> where a master dataset is to be merged with other datasets by selected
> rows and columns (which may or may not exist in the master dataset).
> This function is called multiple times. As can be seen from the simple
> example, the master dataset is passed in by reference to avoid
> duplication.
>
> As you also pointed out, calling alloc.col(DT, some large value) outside
> the function can fix this problem. But I am wondering whether calling
> alloc.col() within the function would be preferable?

If the name of the master table doesn't change, then you don't need to
pass it in at all. Just use it directly inside the function and then
yes, the alloc.col will work. But if it's being merged with another
dataset, then that merge will create a new table, so I'm now confused
again, as the code didn't come through in this thread from Nabble. The
question seems like a good one and would be best on Stack Overflow,
where we can see the code, edit and comment etc.

http://stackoverflow.com/questions/tagged/data.table

Thanks, Matt
From mdowle at mdowle.plus.com Tue Dec 17 20:15:15 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Tue, 17 Dec 2013 19:15:15 +0000
Subject: [datatable-help] Latest presentation
Message-ID: <52B0A2C3.1050406@mdowle.plus.com>

The presentation Arun and I gave on Friday is now on the homepage:

http://datatable.r-forge.r-project.org/CologneR_2013.pdf

Matt

From chenhuashan at gmail.com Wed Dec 18 10:04:29 2013
From: chenhuashan at gmail.com (Huashan Chen)
Date: Wed, 18 Dec 2013 01:04:29 -0800 (PST)
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <52B0A1D8.7070407@mdowle.plus.com>
References: <1386984654635-4682173.post@n4.nabble.com> <40AC36D389E643519828E005ABF732D0@gmail.com> <52ACF794.10408@mdowle.plus.com> <1387149983998-4682250.post@n4.nabble.com> <52B0A1D8.7070407@mdowle.plus.com>
Message-ID: <1387357469738-4682393.post@n4.nabble.com>

OK, here is the complete code with some mock functions from my example.

# data: data.table object
# fn: a filename to read data from
merge_data<-function(fn, data){
  fs<-getSavedata(fn) # read as data.frame
  if (is.null(fs)) stop('Empty data file')

  # return a character vector of variable names which are to be merged;
  # some variables in fs will not be merged to DT
  newvars<-selectVars(names(fs))
  stopifnot(length(newvars) > 0)

  # determine which rows to use
  caseid<-someCustomFunc(fs)

  add_var<-function(varname){
    data[caseid, `:=`(eval(substitute(varname)), fs[, toupper(eval(substitute(varname)))]), with=F]
  }
  invisible(sapply(newvars, add_var))
}

# calling function
merge_data('some file', DT)
DT # display the updated results

In this case, I think a warning from merge_data() when the quota is
reached would be appreciated. Of course, I could have added a check
within the function to avoid unintended action.

if (truelength(data) <= ncol(data) + 64L)
  stop('increase column quota using alloc.col() before calling this function.')

--
View this message in context: http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173p4682393.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mdowle at mdowle.plus.com Wed Dec 18 10:58:46 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Wed, 18 Dec 2013 09:58:46 +0000
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <1387357469738-4682393.post@n4.nabble.com>
References: <1386984654635-4682173.post@n4.nabble.com>
 <40AC36D389E643519828E005ABF732D0@gmail.com>
 <52ACF794.10408@mdowle.plus.com>
 <1387149983998-4682250.post@n4.nabble.com>
 <52B0A1D8.7070407@mdowle.plus.com>
 <1387357469738-4682393.post@n4.nabble.com>
Message-ID: <52B171D6.1000209@mdowle.plus.com>

Why are you doing this iteratively? Can't you load all the files into a
list, rbindlist and then reshape?

On 18/12/13 09:04, Huashan Chen wrote:
> <...snip...>

From mdowle at mdowle.plus.com Wed Dec 18 11:01:04 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Wed, 18 Dec 2013 10:01:04 +0000
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <1387357469738-4682393.post@n4.nabble.com>
References: <1386984654635-4682173.post@n4.nabble.com>
 <40AC36D389E643519828E005ABF732D0@gmail.com>
 <52ACF794.10408@mdowle.plus.com>
 <1387149983998-4682250.post@n4.nabble.com>
 <52B0A1D8.7070407@mdowle.plus.com>
 <1387357469738-4682393.post@n4.nabble.com>
Message-ID: <52B17260.7080703@mdowle.plus.com>

Why are you doing this iteratively? Can't you load all the files into a
list, rbindlist and then reshape? What kind of data is this e.g. which
field?

On 18/12/13 09:04, Huashan Chen wrote:
> <...snip...>

From kevinushey at gmail.com Thu Dec 19 02:54:57 2013
From: kevinushey at gmail.com (Kevin Ushey)
Date: Wed, 18 Dec 2013 17:54:57 -0800
Subject: [datatable-help] 'by' on a numeric column produces inconsistent output
Message-ID:

I'm cross-posting this from the GitHub mirror:
https://github.com/arunsrinivasan/datatable/issues/2

For reference, I only see this with the latest RForge version of
data.table (1.8.11), not the CRAN version of data.table.

-----

library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
set.seed(32)
n <- 3
dt <- data.table(
  y=rnorm(n),
  by=round( rnorm(n), 1)
)

dt[,
  list(max=max(y, na.rm=TRUE)),
  by=list(by)
]

dt[,
  list(max=max(y, na.rm=TRUE)),
  by=list(by)
]

produces the output

> dt[,
+   list(max=max(y, na.rm=TRUE)),
+   by=list(by)
+ ]
    by         max
1: 0.4  0.01464054
2: 0.4  0.87328871
3: 0.7 -1.02794620
>
> dt[,
+   list(max=max(y, na.rm=TRUE)),
+   by=list(by)
+ ]
    by        max
1: 0.4  0.8732887
2: 0.7 -1.0279462

For some reason, the first return is wrong, while the second (and all
subsequent) output is correct. Any idea what's going on?

> sessionInfo()
R Under development (unstable) (2013-12-12 r64453)
Platform: x86_64-apple-darwin13.0.0 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.8.11 knitr_1.5 devtools_1.4.1.99 BiocInstaller_1.13.3

loaded via a namespace (and not attached):
 [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.1 formatR_0.10 httr_0.2 memoise_0.1
 [7] parallel_3.1.0 plyr_1.8 RCurl_1.95-4.1 reshape2_1.2.2 stringr_0.6.2 tools_3.1.0
[13] whisker_0.3-2

---

Kevin

From michael.nelson at sydney.edu.au Thu Dec 19 03:50:06 2013
From: michael.nelson at sydney.edu.au (Michael Nelson)
Date: Thu, 19 Dec 2013 02:50:06 +0000
Subject: [datatable-help] 'by' on a numeric column produces inconsistent output
In-Reply-To:
References:
Message-ID: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>

Using
data.table 1.8.11 (Fresh install from r-forge today)
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

I get

    by         max
1: 0.7  0.01464054
2: 0.4  0.87328871
3: 0.4 -1.02794620

On both runs.

________________________________________
From: datatable-help-bounces at lists.r-forge.r-project.org [datatable-help-bounces at lists.r-forge.r-project.org] on behalf of Kevin Ushey [kevinushey at gmail.com]
Sent: Thursday, 19 December 2013 12:54 PM
To: datatable-help at lists.r-forge.r-project.org
Subject: [datatable-help] 'by' on a numeric column produces inconsistent output

<...snip...>

From aragorn168b at gmail.com Thu Dec 19 08:22:59 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 08:22:59 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput
In-Reply-To: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
Message-ID: <6ED93E37928C4E849109DB7A5615EA61@gmail.com>

Not sure how to debug without being able to reproduce. Tried on Mac OS X
10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
machine. It consistently gives me this:

> dt[,
+   list(max=max(y, na.rm=TRUE)),
+   by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871
>
> dt[,
+   list(max=max(y, na.rm=TRUE)),
+   by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871

Can either of you provide me with the output of these steps in cases where
there's an error? I've commented the output I get for each step.

byval <- list(by=dt$by)
o__ <- data.table:::fastorder(byval)                    # 2,3,1
f__ = data.table:::uniqlist(byval, order=o__)           # 1,3
len__ = data.table:::uniqlengths(f__, nrow(dt))         # 2,1
firstofeachgroup = o__[f__]                             # 2,1
origorder = data.table:::iradixorder(firstofeachgroup)  # 2,1
f__ = f__[origorder]                                    # 3,1
len__ = len__[origorder]                                # 2,1

Arun

On Thursday, December 19, 2013 at 3:50 AM, Michael Nelson wrote:
> <...snip...>

-------------- next part --------------
An HTML attachment was scrubbed...
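
The quantities in the debugging steps above can be mimicked in base R, which may help when comparing outputs across machines. This is only an analogue under stated assumptions: it uses `order()` and `duplicated()` rather than the internal `fastorder`, `uniqlist`, `uniqlengths` and `iradixorder`, the variable names are made up, and the `by` values 0.7, 0.4, 0.4 are taken from the outputs printed in this thread.

```r
# Grouping column as printed in the thread (by = 0.7, 0.4, 0.4).
by <- c(0.7, 0.4, 0.4)

o     <- order(by)                     # ordering permutation: 2 3 1
srt   <- by[o]                         # sorted values: 0.4 0.4 0.7
f     <- which(!duplicated(srt))       # start of each group in sorted order: 1 3
len   <- diff(c(f, length(by) + 1L))   # group sizes in sorted order: 2 1
first <- o[f]                          # original row of each group's first member: 2 1
orig  <- order(first)                  # groups ranked by first appearance: 2 1

f[orig]                                # group starts in order of appearance: 3 1
len[orig]                              # group sizes in order of appearance: 1 2
```

The last line gives sizes 1 and 2, matching one row with by = 0.7 followed by two rows with by = 0.4. None of these calls modifies `by`, so `by` is unchanged after `order()` runs.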

From aragorn168b at gmail.com Thu Dec 19 08:34:27 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 08:34:27 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent output
In-Reply-To:
References:
Message-ID:

Kevin, your output looks sorted by the "by" column, which shouldn't happen
either. So, I would consider even the second output wrong, unless you're
setting key on "by".

Arun

On Thursday, December 19, 2013 at 2:54 AM, Kevin Ushey wrote:
> <...snip...>

-------------- next part --------------
An HTML attachment was scrubbed...

From kevinushey at gmail.com Thu Dec 19 08:37:21 2013
From: kevinushey at gmail.com (Kevin Ushey)
Date: Wed, 18 Dec 2013 23:37:21 -0800
Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput
In-Reply-To: <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
Message-ID:

Hi Arun,

Here's the output on my machine -- other information missing from
before; it's with OSX Mavericks, with R and data.table compiled with
Apple clang.

---

> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> set.seed(32)
> n <- 3
> dt <- data.table(
+   y=rnorm(n),
+   by=round( rnorm(n), 1)
+ )
> ## run one
> byval <- list(by=dt$by)
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 2 3 1
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
[1] 1 2 3
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
[1] 1 1 1
> (firstofeachgroup = o__[f__]) # 2,1
[1] 2 3 1
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
[1] 3 1 2
> (f__ = f__[origorder]) # 3,1
[1] 3 1 2
> (len__ = len__[origorder]) # 2,1
[1] 1 1 1

## run two
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 1 2 3
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
[1] 1 3
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
[1] 2 1
> (firstofeachgroup = o__[f__]) # 2,1
[1] 1 3
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
[1] 1 2
> (f__ = f__[origorder]) # 3,1
[1] 1 3
> (len__ = len__[origorder]) # 2,1
[1] 2 1

On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan wrote:
> <...snip...>

From aragorn168b at gmail.com Thu Dec 19 08:44:16 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 08:44:16 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput
In-Reply-To:
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
Message-ID:

Aha, the issue seems to be with 'uniqlist', not sure why it gives

> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3

1,2,3 for you and 1,3 consistently for me. I'll revert this back to
`duplist` for now. Not sure how to solve this though. I've tried it so far
on 3 machines:

1) OS X 10.8.5 + libvm (gcc)
2) OS X Mavericks + Clang
3) Debian Wheezy + gcc

All of them give consistent output. Man this is such a drag.

Arun

On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
> <...snip...>

-------------- next part --------------
An HTML attachment was scrubbed...

From szehnder at uni-bonn.de Thu Dec 19 08:49:38 2013
From: szehnder at uni-bonn.de (Simon Zehnder)
Date: Thu, 19 Dec 2013 08:49:38 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput
In-Reply-To:
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
Message-ID:

Arun,

if you could send me the reproducible code in copyable form I can as well
try it on Mac OS X Mavericks with gcc 4.8.

Best

Simon

On 19 Dec 2013, at 08:44, Arunkumar Srinivasan wrote:

> Aha, the issue seems to be with 'uniqlist', not sure why it gives
>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. Not sure how to solve this though. I've tried it so far on 3 machines:
>
> 1) OS X 10.8.5 + libvm (gcc)
> 2) OS X Mavericks + Clang
> 3) Debian Wheezy + gcc
>
> All of them give consistent output. Man this is such a drag.
>
> Arun
>
> On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
>
>> Hi Arun,
>>
>> Here's the output on my machine -- other information missing from
>> before; it's with OSX Mavericks, with R and data.table compiled with
>> Apple clang.
>>
>> <...snip...>

From kevinushey at gmail.com Thu Dec 19 08:55:18 2013
From: kevinushey at gmail.com (Kevin Ushey)
Date: Wed, 18 Dec 2013 23:55:18 -0800
Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput
In-Reply-To:
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
Message-ID:

Hmm, I am seeing that after the data.table:::fastorder call, the dt
itself is modified. Notice that 'by' is rearranged without modifying
'y'.

> dt
             y  by
1:  0.01464054 0.7
2:  0.87328871 0.4
3: -1.02794620 0.4
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 2 3 1
> dt
             y  by
1:  0.01464054 0.4
2:  0.87328871 0.4
3: -1.02794620 0.7

On Wed, Dec 18, 2013 at 11:44 PM, Arunkumar Srinivasan wrote:
> Aha, the issue seems to be with 'uniqlist', not sure why it gives
>
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>
> 1,2,3 for you and 1,3 consistently for me. I'll revert this back to
> `duplist` for now. Not sure how to solve this though. I've tried it so far
> on 3 machines:
>
> 1) OS X 10.8.5 + libvm (gcc)
> 2) OS X Mavericks + Clang
> 3) Debian Wheezy + gcc
>
> All of them give consistent output. Man this is such a drag.
>
> Arun
>
> <...snip...>

From aragorn168b at gmail.com Thu Dec 19 09:02:02 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 09:02:02 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput
In-Reply-To:
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
Message-ID:

Ah, that explains it as well. So a copy is not being sent to fastorder, but
that only happens the first time? I'll write again if there are more
questions. Thanks again Kevin.

Arun

On Thursday, December 19, 2013 at 8:55 AM, Kevin Ushey wrote:

> Hmm, I am seeing that after the data.table:::fastorder call, the dt
> itself is modified. Notice that 'by' is rearranged without modifying
> 'y'.
>
> > dt
>              y  by
> 1:  0.01464054 0.7
> 2:  0.87328871 0.4
> 3: -1.02794620 0.4
> > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> [1] 2 3 1
> > dt
>              y  by
> 1:  0.01464054 0.4
> 2:  0.87328871 0.4
> 3: -1.02794620 0.7
>
> On Wed, Dec 18, 2013 at 11:44 PM, Arunkumar Srinivasan
> wrote:
> > Aha, the issue seems to be with 'uniqlist', not sure why it gives
> >
> > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> >
> > 1,2,3 for you and 1,3 consistently for me. I'll revert this back to
> > `duplist` for now. Not sure how to solve this though. I've tried it so far
> > on 3 machines:
> >
> > 1) OS X 10.8.5 + libvm (gcc)
> > 2) OS X Mavericks + Clang
> > 3) Debian Wheezy + gcc
> >
> > All of them give consistent output. Man this is such a drag.
> >
> > Arun
> >
> > <...snip...>

-------------- next part --------------
An HTML attachment was scrubbed...

From aragorn168b at gmail.com Thu Dec 19 09:05:33 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 09:05:33 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput
In-Reply-To:
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
Message-ID: <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com>

Simon, sure.

set.seed(32)
n <- 3
dt <- data.table(
  y=rnorm(n),
  by=round( rnorm(n), 1)
)

dt[,
  list(max=max(y, na.rm=TRUE)),
  by=list(by)
]

dt[,
  list(max=max(y, na.rm=TRUE)),
  by=list(by)
]

Arun

On Thursday, December 19, 2013 at 8:49 AM, Simon Zehnder wrote:

> Arun,
>
> if you could send me the reproducible code in copyable form I can as well try it on Mac OS X Mavericks with gcc 4.8.
>
> Best
>
> Simon
>
> On 19 Dec 2013, at 08:44, Arunkumar Srinivasan wrote:
>
> > Aha, the issue seems to be with 'uniqlist', not sure why it gives
> > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> >
> > 1,2,3 for you and 1,3 consistently for me.
Not sure how to debug without being able to reproduce. Tried on Mac OS X > > > > 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a windows > > > > machine. I consistently gives me this: > > > > > > > > > dt[, > > > > + list(max=max(y, na.rm=TRUE)), > > > > + by=list(by) > > > > + ] > > > > by max > > > > 1: 0.7 0.01464054 > > > > 2: 0.4 0.87328871 > > > > > > > > > > dt[, > > > > + list(max=max(y, na.rm=TRUE)), > > > > + by=list(by) > > > > + ] > > > > by max > > > > 1: 0.7 0.01464054 > > > > 2: 0.4 0.87328871 > > > > > > > > Can either of you provide me with the output of these steps in cases where > > > > there's an error? I've commented the output I get for each step. > > > > > > > > byval <- list(by=dt$by) > > > > o__ <- data.table:::fastorder(byval) # 2,3,1 > > > > f__ = data.table:::uniqlist(byval, order=o__) # 1,3 > > > > len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1 > > > > firstofeachgroup = o__[f__] # 2,1 > > > > origorder = data.table:::iradixorder(firstofeachgroup) # 2,1 > > > > f__ = f__[origorder] # 3,1 > > > > len__ = len__[origorder] # 2,1 > > > > > > > > > > > > Arun > > > > > > > > <...snip...> > > > > _______________________________________________ > > datatable-help mailing list > > datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org) > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From szehnder at uni-bonn.de Thu Dec 19 09:26:12 2013 From: szehnder at uni-bonn.de (Simon Zehnder) Date: Thu, 19 Dec 2013 09:26:12 +0100 Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput In-Reply-To: <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com> References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05> <6ED93E37928C4E849109DB7A5615EA61@gmail.com> <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com> Message-ID: <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de> Hi Arun, here the results on Mac OS X Mavericks with gcc 4.8.2 data.table 1.8.10: > set.seed(32) > n <- 3 > dt <- data.table( + y=rnorm(n), + by=round( rnorm(n), 1) + ) > > dt[, + list(max=max(y, na.rm=TRUE)), + by=list(by) + ] by max 1: 0.7 0.01464054 2: 0.4 0.87328871 > > dt[, + list(max=max(y, na.rm=TRUE)), + by=list(by) + ] by max 1: 0.7 0.01464054 2: 0.4 0.87328871 data.table 1.8.11: > set.seed(32) > n <- 3 > dt <- data.table( + y=rnorm(n), + by=round( rnorm(n), 1) + ) > > dt[, + list(max=max(y, na.rm=TRUE)), + by=list(by) + ] by max 1: 0.7 0.01464054 2: 0.4 0.87328871 > > dt[, + list(max=max(y, na.rm=TRUE)), + by=list(by) + ] by max 1: 0.7 0.01464054 2: 0.4 0.87328871 Best Simon On 19 Dec 2013, at 09:05, Arunkumar Srinivasan wrote: > Simon, sure. > > set.seed(32) > n <- 3 > dt <- data.table( > y=rnorm(n), > by=round( rnorm(n), 1) > ) > > dt[, > list(max=max(y, na.rm=TRUE)), > by=list(by) > ] > > dt[, > list(max=max(y, na.rm=TRUE)), > by=list(by) > ] > > > > Arun > > On Thursday, December 19, 2013 at 8:49 AM, Simon Zehnder wrote: > >> Arun, >> >> if you could send me the reproducible code in copyable form I can as well try it on Mac OS X Mavericks with gcc 4.8. >> >> Best >> >> Simon >> >> On 19 Dec 2013, at 08:44, Arunkumar Srinivasan wrote: >> >>> Aha, the issue seems to be with 'uniqlist', not sure why it gives >>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3 >>> 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. 
I consistently gives me this: >>>>> >>>>>> dt[, >>>>> + list(max=max(y, na.rm=TRUE)), >>>>> + by=list(by) >>>>> + ] >>>>> by max >>>>> 1: 0.7 0.01464054 >>>>> 2: 0.4 0.87328871 >>>>>> >>>>>> dt[, >>>>> + list(max=max(y, na.rm=TRUE)), >>>>> + by=list(by) >>>>> + ] >>>>> by max >>>>> 1: 0.7 0.01464054 >>>>> 2: 0.4 0.87328871 >>>>> >>>>> Can either of you provide me with the output of these steps in cases where >>>>> there's an error? I've commented the output I get for each step. >>>>> >>>>> byval <- list(by=dt$by) >>>>> o__ <- data.table:::fastorder(byval) # 2,3,1 >>>>> f__ = data.table:::uniqlist(byval, order=o__) # 1,3 >>>>> len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1 >>>>> firstofeachgroup = o__[f__] # 2,1 >>>>> origorder = data.table:::iradixorder(firstofeachgroup) # 2,1 >>>>> f__ = f__[origorder] # 3,1 >>>>> len__ = len__[origorder] # 2,1 >>>>> >>>>> >>>>> Arun >>>>> >>>>> <...snip...> >>> >>> _______________________________________________ >>> datatable-help mailing list >>> datatable-help at lists.r-forge.r-project.org >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > From aragorn168b at gmail.com Thu Dec 19 09:36:13 2013 From: aragorn168b at gmail.com (Arunkumar Srinivasan) Date: Thu, 19 Dec 2013 09:36:13 +0100 Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput In-Reply-To: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05> References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05> Message-ID: @mnel, I'm not sure I understand your output. Yours is different from the correct output, but it is also different from Kevin's. Basically, dt[, max(y), by=by] has no effect on yours and just returns back dt? 
Arun On Thursday, December 19, 2013 at 3:50 AM, Michael Nelson wrote: > Using > data.table 1.8.11 (Fresh install from r-forge today) > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > I get > > by max > 1: 0.7 0.01464054 > 2: 0.4 0.87328871 > 3: 0.4 -1.02794620 > > On both runs. > > > > > ________________________________________ > From: datatable-help-bounces at lists.r-forge.r-project.org (mailto:datatable-help-bounces at lists.r-forge.r-project.org) [datatable-help-bounces at lists.r-forge.r-project.org (mailto:datatable-help-bounces at lists.r-forge.r-project.org)] on behalf of Kevin Ushey [kevinushey at gmail.com (mailto:kevinushey at gmail.com)] > Sent: Thursday, 19 December 2013 12:54 PM > To: datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org) > Subject: [datatable-help] 'by' on a numeric column produces inconsistent output > > I'm cross-posting this from the GitHub mirror: > https://github.com/arunsrinivasan/datatable/issues/2 > > For reference, I only see this with the latest RForge version of > data.table (1.8.11), not the CRAN version of data.table. > > ----- > > library(data.table, lib="/Users/kevinushey/Library/R/3.1/library") > set.seed(32) > n <- 3 > dt <- data.table( > y=rnorm(n), > by=round( rnorm(n), 1) > ) > > dt[, > list(max=max(y, na.rm=TRUE)), > by=list(by) > ] > > dt[, > list(max=max(y, na.rm=TRUE)), > by=list(by) > ] > > produces the output > > > dt[, > + list(max=max(y, na.rm=TRUE)), > + by=list(by) > + ] > by max > 1: 0.4 0.01464054 > 2: 0.4 0.87328871 > 3: 0.7 -1.02794620 > > > > dt[, > + list(max=max(y, na.rm=TRUE)), > + by=list(by) > + ] > by max > 1: 0.4 0.8732887 > 2: 0.7 -1.0279462 > > For some reason, the first return is wrong, while the second (and all > subsequent) output is correct. Any idea what's going on? 
>
> > sessionInfo()
> R Under development (unstable) (2013-12-12 r64453)
> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>
> locale:
> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.8.11 knitr_1.5 devtools_1.4.1.99
> BiocInstaller_1.13.3
>
> loaded via a namespace (and not attached):
> [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.1 formatR_0.10
> httr_0.2 memoise_0.1
> [7] parallel_3.1.0 plyr_1.8 RCurl_1.95-4.1 reshape2_1.2.2
> stringr_0.6.2 tools_3.1.0
> [13] whisker_0.3-2
>
> ---
>
> Kevin
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

From aragorn168b at gmail.com  Thu Dec 19 09:39:19 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 09:39:19 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent output
In-Reply-To:
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
Message-ID: <6C86E3EEE8E041999B7F02D6BE27215F@gmail.com>

As I was just writing to Kevin: I think (if @mnel could verify that his output is correct) the reason is that Kevin is using R-devel. If you run the code shown below, `xx` should *not* have the same address as `dt$by` (and for me it does not).
But for Kevin, they seem to point to the same location, and I can't figure out why they would (or should), given how R has worked so far.

byval <- list(by=dt$by)
address(dt$by)                # [1] "0x7fa848ad8608"
address(byval)                # [1] "0x7fa84a93fa68"
xx = byval[[1L]]
address(xx)                   # [1] "0x7fa848e3fc48"
address(list(xx))             # [1] "0x7fa84aa1ba78"
data.table:::dradixorder(xx)  # [1] 2 3 1
byval
# $by
# [1] 0.7 0.4 0.4

Arun

On Thursday, December 19, 2013 at 9:36 AM, Arunkumar Srinivasan wrote:

> @mnel, I'm not sure I understand your output. Yours is different from the correct output, but it is also different from Kevin's. Basically, dt[, max(y), by=by] has no effect on yours and just returns back dt?
>
> Arun
>
>
> On Thursday, December 19, 2013 at 3:50 AM, Michael Nelson wrote:
>
> > Using
> > data.table 1.8.11 (Fresh install from r-forge today)
> > R version 3.0.2 (2013-09-25)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> > I get
> >
> >     by         max
> > 1: 0.7  0.01464054
> > 2: 0.4  0.87328871
> > 3: 0.4 -1.02794620
> >
> > On both runs.
> >
> > ________________________________________
> > From: datatable-help-bounces at lists.r-forge.r-project.org [datatable-help-bounces at lists.r-forge.r-project.org] on behalf of Kevin Ushey [kevinushey at gmail.com]
> > Sent: Thursday, 19 December 2013 12:54 PM
> > To: datatable-help at lists.r-forge.r-project.org
> > Subject: [datatable-help] 'by' on a numeric column produces inconsistent output
> >
> > I'm cross-posting this from the GitHub mirror:
> > https://github.com/arunsrinivasan/datatable/issues/2
> >
> > For reference, I only see this with the latest RForge version of
> > data.table (1.8.11), not the CRAN version of data.table.
> > > > ----- > > > > library(data.table, lib="/Users/kevinushey/Library/R/3.1/library") > > set.seed(32) > > n <- 3 > > dt <- data.table( > > y=rnorm(n), > > by=round( rnorm(n), 1) > > ) > > > > dt[, > > list(max=max(y, na.rm=TRUE)), > > by=list(by) > > ] > > > > dt[, > > list(max=max(y, na.rm=TRUE)), > > by=list(by) > > ] > > > > produces the output > > > > > dt[, > > + list(max=max(y, na.rm=TRUE)), > > + by=list(by) > > + ] > > by max > > 1: 0.4 0.01464054 > > 2: 0.4 0.87328871 > > 3: 0.7 -1.02794620 > > > > > > dt[, > > + list(max=max(y, na.rm=TRUE)), > > + by=list(by) > > + ] > > by max > > 1: 0.4 0.8732887 > > 2: 0.7 -1.0279462 > > > > For some reason, the first return is wrong, while the second (and all > > subsequent) output is correct. Any idea what's going on? > > > > > sessionInfo() > > R Under development (unstable) (2013-12-12 r64453) > > Platform: x86_64-apple-darwin13.0.0 (64-bit) > > > > locale: > > [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] data.table_1.8.11 knitr_1.5 devtools_1.4.1.99 > > BiocInstaller_1.13.3 > > > > loaded via a namespace (and not attached): > > [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.1 formatR_0.10 > > httr_0.2 memoise_0.1 > > [7] parallel_3.1.0 plyr_1.8 RCurl_1.95-4.1 reshape2_1.2.2 > > stringr_0.6.2 tools_3.1.0 > > [13] whisker_0.3-2 > > > > --- > > > > Kevin > > _______________________________________________ > > datatable-help mailing list > > datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org) > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > > _______________________________________________ > > datatable-help mailing list > > datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org) > > 
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aragorn168b at gmail.com Thu Dec 19 09:43:17 2013 From: aragorn168b at gmail.com (Arunkumar Srinivasan) Date: Thu, 19 Dec 2013 09:43:17 +0100 Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput In-Reply-To: <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de> References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05> <6ED93E37928C4E849109DB7A5615EA61@gmail.com> <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com> <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de> Message-ID: Simon, Thanks. One more towards my way :). I think we've nailed down the problem to R-devel version. I'll write again once I discuss it over with Kevin. Arun On Thursday, December 19, 2013 at 9:26 AM, Simon Zehnder wrote: > Hi Arun, > > here the results on Mac OS X Mavericks with gcc 4.8.2 > > data.table 1.8.10: > > > set.seed(32) > > n <- 3 > > dt <- data.table( > > > > + y=rnorm(n), > + by=round( rnorm(n), 1) > + ) > > > > dt[, > + list(max=max(y, na.rm=TRUE)), > + by=list(by) > + ] > by max > 1: 0.7 0.01464054 > 2: 0.4 0.87328871 > > > > dt[, > + list(max=max(y, na.rm=TRUE)), > + by=list(by) > + ] > by max > 1: 0.7 0.01464054 > 2: 0.4 0.87328871 > > data.table 1.8.11: > > > set.seed(32) > > n <- 3 > > dt <- data.table( > > > > + y=rnorm(n), > + by=round( rnorm(n), 1) > + ) > > > > dt[, > + list(max=max(y, na.rm=TRUE)), > + by=list(by) > + ] > by max > 1: 0.7 0.01464054 > 2: 0.4 0.87328871 > > > > dt[, > + list(max=max(y, na.rm=TRUE)), > + by=list(by) > + ] > by max > 1: 0.7 0.01464054 > 2: 0.4 0.87328871 > > Best > > Simon > > > On 19 Dec 2013, at 09:05, Arunkumar Srinivasan wrote: > > > Simon, sure. 
I consistently gives me this: > > > > > > > > > > > > > dt[, > > > > > > + list(max=max(y, na.rm=TRUE)), > > > > > > + by=list(by) > > > > > > + ] > > > > > > by max > > > > > > 1: 0.7 0.01464054 > > > > > > 2: 0.4 0.87328871 > > > > > > > > > > > > > > dt[, > > > > > > + list(max=max(y, na.rm=TRUE)), > > > > > > + by=list(by) > > > > > > + ] > > > > > > by max > > > > > > 1: 0.7 0.01464054 > > > > > > 2: 0.4 0.87328871 > > > > > > > > > > > > Can either of you provide me with the output of these steps in cases where > > > > > > there's an error? I've commented the output I get for each step. > > > > > > > > > > > > byval <- list(by=dt$by) > > > > > > o__ <- data.table:::fastorder(byval) # 2,3,1 > > > > > > f__ = data.table:::uniqlist(byval, order=o__) # 1,3 > > > > > > len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1 > > > > > > firstofeachgroup = o__[f__] # 2,1 > > > > > > origorder = data.table:::iradixorder(firstofeachgroup) # 2,1 > > > > > > f__ = f__[origorder] # 3,1 > > > > > > len__ = len__[origorder] # 2,1 > > > > > > > > > > > > > > > > > > Arun > > > > > > > > > > > > <...snip...> > > > > > > > > _______________________________________________ > > > > datatable-help mailing list > > > > datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org) > > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
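
The grouping steps quoted above can be sketched in base R, without calling data.table internals. This is only an illustration under stated assumptions, not the package's implementation: `order()` stands in for `fastorder`, the run-boundary test for `uniqlist`, and `diff()` for `uniqlengths`. The values of `y` and `by` are the ones printed in the thread for set.seed(32) and n = 3.

```r
# Values printed in the thread for set.seed(32), n = 3.
y  <- c(0.01464054, 0.87328871, -1.02794620)
by <- c(0.7, 0.4, 0.4)

# Ordering permutation of the grouping column (stand-in for fastorder).
o <- order(by)                                # 2 3 1

# Start of each run of equal values in sorted order (stand-in for uniqlist).
s <- by[o]
f <- which(c(TRUE, s[-1L] != s[-length(s)]))  # 1 3

# Length of each run (stand-in for uniqlengths).
len <- diff(c(f, length(by) + 1L))            # 2 1

# Original row of the first member of each group, then the permutation
# that puts the groups back in order of first appearance.
first     <- o[f]                             # 2 1
origorder <- order(first)                     # 2 1
f   <- f[origorder]                           # 3 1
len <- len[origorder]                         # 1 2

# Rows of group i are o[f[i]], ..., o[f[i] + len[i] - 1].
rows <- lapply(seq_along(f), function(i) o[seq(f[i], length.out = len[i])])
res  <- data.frame(by  = vapply(rows, function(r) by[r[1L]], numeric(1)),
                   max = vapply(rows, function(r) max(y[r]), numeric(1)))
print(res)
```

With these inputs the sketch yields the groups 0.7 and 0.4 with maxima 0.01464054 and 0.87328871, which is the output the thread treats as correct. Every step depends on `by` staying in its original row order while `o` is in use, so an ordering routine that also rearranges `by` in place would invalidate the later steps.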
From aragorn168b at gmail.com  Thu Dec 19 12:56:01 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 12:56:01 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent output
In-Reply-To:
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05> <6ED93E37928C4E849109DB7A5615EA61@gmail.com> <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com> <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de>
Message-ID: <83EFE01CB21543C2AFA86A4F5FAD03FC@gmail.com>

Just tested this on the devel version (today's). And yes, this issue happens. But I'm not sure if this is an issue with 'data.table' per se:

On a clean session, if you do this:

require(data.table)
set.seed(32)
n <- 3
dt <- data.table(y=rnorm(n), by=round( rnorm(n), 1))

ll <- list(dt$by)
yy <- ll[[1L]]
address(dt$by)    # [1] "0x7fad3c524a40"
address(ll[[1L]]) # [1] "0x7fad3c524a40"
address(yy)       # [1] "0x7fad3c524a40"

You see that all three point to the same address. That's why the result is wrong: internally, "yy" is changed by reference during "fastorder", which changes dt$by as well. "yy" is *not* supposed to point to dt$by; a copy should have been made.

After doing it the first time, the pointing changes back to how it is in R-stable. Not sure if this is desirable. Probably should report on R-devel.

On R-3.0.2, the same commands as above on a clean session:

require(data.table)
set.seed(32)
n <- 3
dt <- data.table(y=rnorm(n), by=round( rnorm(n), 1))

ll <- list(dt$by)
yy <- ll[[1L]]
address(dt$by)    # [1] "0x7fc35b640408"
address(ll[[1L]]) # [1] "0x7fc35a0ec838"
address(yy)       # [1] "0x7fc35a0ec838"

Arun

On Thursday, December 19, 2013 at 9:43 AM, Arunkumar Srinivasan wrote:

> Simon,
>
> Thanks. One more towards my way :). I think we've nailed down the problem to R-devel version. I'll write again once I discuss it over with Kevin.
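
The aliasing described in the 12:56 message (two names for one block of memory, one of which is then modified by reference) can be reproduced on purpose with exported data.table functions. This is only a sketch of the general hazard, not of the internal code path and not of the fix in commit 1054, which the thread does not show. The names `alias` and `safe` are made up for the illustration, and `set()` stands in for the internal by-reference reordering.

```r
library(data.table)

# Same values as printed in the thread.
dt <- data.table(y  = c(0.01464054, 0.87328871, -1.02794620),
                 by = c(0.7, 0.4, 0.4))

alias <- dt        # plain assignment: a second name for the same table
safe  <- copy(dt)  # explicit copy: separate memory

# Overwrite the 'by' column of 'alias' by reference with its sorted values.
set(alias, j = "by", value = sort(alias[["by"]]))

dt$by      # 0.4 0.4 0.7 : 'dt' changed too, although only 'alias' was touched
dt$y       # unchanged, so 'by' no longer lines up with 'y'
safe$by    # 0.7 0.4 0.4 : the copy is unaffected

identical(address(dt), address(alias))  # TRUE
identical(address(dt), address(safe))   # FALSE
```

The state of `dt` after the `set()` call matches what Kevin printed after his first `fastorder` call: 'by' rearranged while 'y' stayed put. Taking a `copy()` before any by-reference operation is one way to avoid depending on whether R itself has duplicated the vector.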
> > > > > > >
> > > > > > > byval <- list(by=dt$by)
> > > > > > > o__ <- data.table:::fastorder(byval) # 2,3,1
> > > > > > > f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> > > > > > > len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> > > > > > > firstofeachgroup = o__[f__] # 2,1
> > > > > > > origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> > > > > > > f__ = f__[origorder] # 3,1
> > > > > > > len__ = len__[origorder] # 2,1
> > > > > > >
> > > > > > >
> > > > > > > Arun
> > > > > > >
> > > > > > > <...snip...>
> > > > >
> > > > > _______________________________________________
> > > > > datatable-help mailing list
> > > > > datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org)
> > > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aragorn168b at gmail.com Thu Dec 19 14:47:39 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 14:47:39 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent utput
In-Reply-To: <83EFE01CB21543C2AFA86A4F5FAD03FC@gmail.com>
References: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05> <6ED93E37928C4E849109DB7A5615EA61@gmail.com> <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com> <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de> <83EFE01CB21543C2AFA86A4F5FAD03FC@gmail.com>
Message-ID:

The issue has been fixed in commit 1054 now. Once the r-forge build kicks in, be sure to update, especially if you're working with the R-devel version.

Arun

On Thursday, December 19, 2013 at 12:56 PM, Arunkumar Srinivasan wrote:

> Just tested this on the devel version (today's). And yes, this issue happens. But I'm not sure if this is an issue with 'data.table' per se:
>
> On a clean session, if you do this:
>
> require(data.table)
> set.seed(32)
> n <- 3
> dt <- data.table(y=rnorm(n), by=round( rnorm(n), 1))
>
> ll <- list(dt$by)
> yy <- ll[[1L]]
> address(dt$by) # [1] "0x7fad3c524a40"
> address(ll[[1L]]) # [1] "0x7fad3c524a40"
> address(yy) # [1] "0x7fad3c524a40"
>
>
> You see that all three are pointing to the same address. And that's why the result is wrong, because internally "yy" will be changed by reference during "fastorder". And "yy" is *not* supposed to point to the same memory as dt$by; a copy should have been made.
>
> After doing it the first time, the pointing changes back to how it is in R-stable. Not sure if this is desirable. Probably should report on R-devel.
>
> On R-3.0.2, the same commands as above on a clean session:
>
> require(data.table)
> set.seed(32)
> n <- 3
> dt <- data.table(y=rnorm(n), by=round( rnorm(n), 1))
>
> ll <- list(dt$by)
> yy <- ll[[1L]]
> address(dt$by) # [1] "0x7fc35b640408"
> address(ll[[1L]]) # [1] "0x7fc35a0ec838"
> address(yy) # [1] "0x7fc35a0ec838"
>
>
> Arun
>
> On Thursday, December 19, 2013 at 9:43 AM, Arunkumar Srinivasan wrote:
>
> > Simon,
> >
> > Thanks. One more towards my way :). I think we've nailed down the problem to R-devel version. I'll write again once I discuss it over with Kevin.
> > > > Arun > > > > > > On Thursday, December 19, 2013 at 9:26 AM, Simon Zehnder wrote: > > > > > Hi Arun, > > > > > > here the results on Mac OS X Mavericks with gcc 4.8.2 > > > > > > data.table 1.8.10: > > > > > > > set.seed(32) > > > > n <- 3 > > > > dt <- data.table( > > > > > > > > > > + y=rnorm(n), > > > + by=round( rnorm(n), 1) > > > + ) > > > > > > > > dt[, > > > + list(max=max(y, na.rm=TRUE)), > > > + by=list(by) > > > + ] > > > by max > > > 1: 0.7 0.01464054 > > > 2: 0.4 0.87328871 > > > > > > > > dt[, > > > + list(max=max(y, na.rm=TRUE)), > > > + by=list(by) > > > + ] > > > by max > > > 1: 0.7 0.01464054 > > > 2: 0.4 0.87328871 > > > > > > data.table 1.8.11: > > > > > > > set.seed(32) > > > > n <- 3 > > > > dt <- data.table( > > > > > > > > > > + y=rnorm(n), > > > + by=round( rnorm(n), 1) > > > + ) > > > > > > > > dt[, > > > + list(max=max(y, na.rm=TRUE)), > > > + by=list(by) > > > + ] > > > by max > > > 1: 0.7 0.01464054 > > > 2: 0.4 0.87328871 > > > > > > > > dt[, > > > + list(max=max(y, na.rm=TRUE)), > > > + by=list(by) > > > + ] > > > by max > > > 1: 0.7 0.01464054 > > > 2: 0.4 0.87328871 > > > > > > Best > > > > > > Simon > > > > > > > > > On 19 Dec 2013, at 09:05, Arunkumar Srinivasan wrote: > > > > > > > Simon, sure. > > > > > > > > set.seed(32) > > > > n <- 3 > > > > dt <- data.table( > > > > y=rnorm(n), > > > > by=round( rnorm(n), 1) > > > > ) > > > > > > > > dt[, > > > > list(max=max(y, na.rm=TRUE)), > > > > by=list(by) > > > > ] > > > > > > > > dt[, > > > > list(max=max(y, na.rm=TRUE)), > > > > by=list(by) > > > > ] > > > > > > > > > > > > > > > > Arun > > > > > > > > On Thursday, December 19, 2013 at 8:49 AM, Simon Zehnder wrote: > > > > > > > > > Arun, > > > > > > > > > > if you could send me the reproducible code in copyable form I can as well try it on Mac OS X Mavericks with gcc 4.8. 
> > > > >
> > > > > Best
> > > > >
> > > > > Simon
> > > > >
> > > > > On 19 Dec 2013, at 08:44, Arunkumar Srinivasan wrote:
> > > > >
> > > > > > Aha, the issue seems to be with 'uniqlist', not sure why it gives
> > > > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > > >
> > > > > > 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. Not sure how to solve this though. I've tried it so far on 3 machines:
> > > > > >
> > > > > > 1) OS X 10.8.5 + libvm (gcc)
> > > > > > 2) OS X Mavericks + Clang
> > > > > > 3) Debian Wheezy + gcc
> > > > > >
> > > > > > All of them give consistent output. Man this is such a drag.
> > > > > >
> > > > > > Arun
> > > > > >
> > > > > > On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
> > > > > >
> > > > > > > Hi Arun,
> > > > > > >
> > > > > > > Here's the output on my machine -- other information missing from
> > > > > > > before; it's with OSX Mavericks, with R and data.table compiled with
> > > > > > > Apple clang.
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > > library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> > > > > > > > set.seed(32)
> > > > > > > > n <- 3
> > > > > > > > dt <- data.table(
> > > > > > > + y=rnorm(n),
> > > > > > > + by=round( rnorm(n), 1)
> > > > > > > + )
> > > > > > > ## run one
> > > > > > > > byval <- list(by=dt$by)
> > > > > > > > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> > > > > > > [1] 2 3 1
> > > > > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > > > > [1] 1 2 3
> > > > > > > > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> > > > > > > [1] 1 1 1
> > > > > > > > (firstofeachgroup = o__[f__]) # 2,1
> > > > > > > [1] 2 3 1
> > > > > > > > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> > > > > > > [1] 3 1 2
> > > > > > > > (f__ = f__[origorder]) # 3,1
> > > > > > > [1] 3 1 2
> > > > > > > > (len__ = len__[origorder]) # 2,1
> > > > > > > [1] 1 1 1
> > > > > > >
> > > > > > > ## run two
> > > > > > > > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> > > > > > > [1] 1 2 3
> > > > > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > > > > [1] 1 3
> > > > > > > > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> > > > > > > [1] 2 1
> > > > > > > > (firstofeachgroup = o__[f__]) # 2,1
> > > > > > > [1] 1 3
> > > > > > > > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> > > > > > > [1] 1 2
> > > > > > > > (f__ = f__[origorder]) # 3,1
> > > > > > > [1] 1 3
> > > > > > > > (len__ = len__[origorder]) # 2,1
> > > > > > > [1] 2 1
> > > > > > >
> > > > > > > On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
> > > > > > > wrote:
> > > > > > > > Not sure how to debug without being able to reproduce. Tried on Mac OS X
> > > > > > > > 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a windows
> > > > > > > > machine. It consistently gives me this:
> > > > > > > >
> > > > > > > > > dt[,
> > > > > > > > + list(max=max(y, na.rm=TRUE)),
> > > > > > > > + by=list(by)
> > > > > > > > + ]
> > > > > > > >     by        max
> > > > > > > > 1: 0.7 0.01464054
> > > > > > > > 2: 0.4 0.87328871
> > > > > > > >
> > > > > > > > > dt[,
> > > > > > > > + list(max=max(y, na.rm=TRUE)),
> > > > > > > > + by=list(by)
> > > > > > > > + ]
> > > > > > > >     by        max
> > > > > > > > 1: 0.7 0.01464054
> > > > > > > > 2: 0.4 0.87328871
> > > > > > > >
> > > > > > > > Can either of you provide me with the output of these steps in cases where
> > > > > > > > there's an error? I've commented the output I get for each step.
> > > > > > > >
> > > > > > > > byval <- list(by=dt$by)
> > > > > > > > o__ <- data.table:::fastorder(byval) # 2,3,1
> > > > > > > > f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> > > > > > > > len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> > > > > > > > firstofeachgroup = o__[f__] # 2,1
> > > > > > > > origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> > > > > > > > f__ = f__[origorder] # 3,1
> > > > > > > > len__ = len__[origorder] # 2,1
> > > > > > > >
> > > > > > > >
> > > > > > > > Arun
> > > > > > > >
> > > > > > > > <...snip...>
> > > > > >
> > > > > > _______________________________________________
> > > > > > datatable-help mailing list
> > > > > > datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org)
> > > > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > > > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
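
[Editorial note] The thread above attributes the wrong grouping to the 'by' vector being changed by reference during "fastorder" when no copy was made. The following is a minimal base-R sketch of how that could produce the 1,2,3 / 1,1,1 output Kevin reported in "run one". It does not call any data.table internals. The vector `by`, the helper `group_starts`, and the "clobbered" copy are all hypothetical and only simulate an in-place sort; the values are chosen to match the order 2,3,1 quoted in the thread, not taken from the actual rnorm() draw.

```r
# Hypothetical grouping column (illustrative values, not the real rnorm() draw).
by <- c(0.7, 0.4, 0.4)

# Ordering permutation computed from the original, unsorted vector.
o <- order(by)

# Hypothetical helper: positions, in sorted order, at which a new group starts.
group_starts <- function(x, o) {
  xs <- x[o]
  which(c(TRUE, xs[-1L] != xs[-length(xs)]))
}

# Expected case: the ordering step left the input vector untouched.
f_ok   <- group_starts(by, o)
len_ok <- diff(c(f_ok, length(by) + 1L))

# Simulated failure: the shared vector has already been sorted "in place",
# yet the permutation computed for the unsorted data is applied to it again.
by_clobbered <- by[o]
f_bad   <- group_starts(by_clobbered, o)
len_bad <- diff(c(f_bad, length(by) + 1L))

print(list(o = o, f_ok = f_ok, len_ok = len_ok, f_bad = f_bad, len_bad = len_bad))
```

Under these assumptions the untouched vector yields group starts 1,3 with lengths 2,1, while the clobbered vector yields starts 1,2,3 with lengths 1,1,1, the same pattern as the inconsistent run. This is only one reading that fits the reported output, not a statement about the actual data.table source.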