From tahfounasousou at rocketmail.com  Tue Dec  3 11:23:34 2013
From: tahfounasousou at rocketmail.com (krys22)
Date: Tue, 3 Dec 2013 02:23:34 -0800 (PST)
Subject: [datatable-help] missing rows ans cloumns in my matrix
Message-ID: <1386066214571-4681549.post@n4.nabble.com>

I have a problem with a big matrix: after the matrix is created, many rows
and columns are not shown because of the matrix's large dimensions.
Could you help me get my complete matrix? Is there a function or any other
solution for this problem?
For example, if the dimension of my matrix is 50, rows 12, 13, 14, 15...26
do not appear in my matrix.


--
View this message in context: http://r.789695.n4.nabble.com/missing-rows-ans-cloumns-in-my-matrix-tp4681549.html
Sent from the datatable-help mailing list archive at Nabble.com.

From alexandre.sieira at gmail.com  Tue Dec  3 15:26:56 2013
From: alexandre.sieira at gmail.com (Alexandre Sieira)
Date: Tue, 3 Dec 2013 12:26:56 -0200
Subject: [datatable-help] rbindlist
Message-ID: <etPan.529dea30.1f16e9e8.ac@MacBook-Pro-de-Alexandre-Sieira.local>

I have come across some behavior in rbindlist that looks unexpected to me:

> rbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
   a b
1: 1 2
2: 4 3

So it appears to assume (without checking) that all objects have not only the same column names but also the same column order. So a value assigned to column "a" in the second object was used for column "b" in the end result (and vice versa).

I know the documentation says rbindlist uses the column types from the first entry of the list, but I didn't see any mention of column order or names anywhere.

I suggest that column names be matched, even if they are not in the same order. Perhaps a "use.names" parameter could be used to ask for this behavior, to avoid breaking backwards compatibility.

Or, at the very least, I suggest the documentation of rbindlist be updated to explicitly mention that the columns will be considered by position only, and that callers need to ensure the column orders of all objects match exactly. And that a warning be issued by rbindlist when the column names don't match.

--
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20131203/9df93d2d/attachment.html>

From gsee000 at gmail.com  Tue Dec  3 17:46:08 2013
From: gsee000 at gmail.com (G See)
Date: Tue, 3 Dec 2013 10:46:08 -0600
Subject: [datatable-help] rbindlist
In-Reply-To: <etPan.529dea30.1f16e9e8.ac@MacBook-Pro-de-Alexandre-Sieira.local>
References: <etPan.529dea30.1f16e9e8.ac@MacBook-Pro-de-Alexandre-Sieira.local>
Message-ID: <CA+xi=qbBE1-+6Gh3KxJikCjcXWj7yrRUVoTHJn9uJNHwmvYGVQ@mail.gmail.com>

I agree.  Here is a related thread:
http://thread.gmane.org/gmane.comp.lang.r.datatable/2231

Garrett



From alexandre.sieira at gmail.com  Tue Dec  3 18:05:59 2013
From: alexandre.sieira at gmail.com (Alexandre Sieira)
Date: Tue, 3 Dec 2013 15:05:59 -0200
Subject: [datatable-help] rbindlist
In-Reply-To: <CA+xi=qbBE1-+6Gh3KxJikCjcXWj7yrRUVoTHJn9uJNHwmvYGVQ@mail.gmail.com>
References: <etPan.529dea30.1f16e9e8.ac@MacBook-Pro-de-Alexandre-Sieira.local>
 <CA+xi=qbBE1-+6Gh3KxJikCjcXWj7yrRUVoTHJn9uJNHwmvYGVQ@mail.gmail.com>
Message-ID: <etPan.529e0f77.ded7263.ac@MacBook-Pro-de-Alexandre-Sieira.local>

To whom it may concern, I wrote a (rather bulky) wrapper around rbindlist that:

- checks that the classes of columns with the same name match;
- fills in any missing columns with NAs of the appropriate type;
- reorders columns for consistency;
- calls rbindlist on the results of this preprocessing.

The code is here: https://gist.github.com/asieira/7772953

The results would be as follows:

> smartrbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
   a b
1: 1 2
2: 3 4

> smartrbindlist(list(data.table(a=1, b=2), list(c=3), data.table(d="foo")))
    a  b  c   d
1:  1  2 NA  NA
2: NA NA  3  NA
3: NA NA NA foo

> smartrbindlist(list(data.table(a=1L, b=2), list(a=10)))
Error in smartrbindlist(list(data.table(a = 1L, b = 2), list(a = 10)))
  smartrbindlist: column a has different classes in entry 2 [numeric] and its predecessors [integer]

Hope this helps anyone else out there.
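
The four steps listed above could be sketched roughly as follows. This is not the code from the gist: the name bind_by_name and its error message are hypothetical, and it only assumes that set() can add new columns (v1.8.11 or later).

```r
library(data.table)

# Hypothetical helper (not the gist's smartrbindlist): bind tables by column name.
bind_by_name <- function(tables) {
  tables <- lapply(tables, as.data.table)
  typed_na <- list()  # one NA of the right type per column name
  for (i in seq_along(tables)) {
    for (nm in names(tables[[i]])) {
      col <- tables[[i]][[nm]]
      if (is.null(typed_na[[nm]])) {
        # First time this column is seen: remember a typed NA for it.
        typed_na[[nm]] <- col[NA_integer_]
      } else if (!identical(class(col), class(typed_na[[nm]]))) {
        stop("column ", nm, " has a different class in entry ", i,
             " than in its predecessors")
      }
    }
  }
  all_cols <- names(typed_na)
  aligned <- lapply(tables, function(x) {
    x <- copy(x)  # leave the caller's tables untouched
    for (nm in setdiff(all_cols, names(x))) {
      set(x, j = nm, value = typed_na[[nm]])  # fill missing column with NA
    }
    setcolorder(x, all_cols)  # same column order in every entry
    x
  })
  rbindlist(aligned)
}

res <- bind_by_name(list(data.table(a = 1, b = 2), data.table(b = 4, a = 3)))
```

Because every entry is reordered before binding, the positional matching in rbindlist no longer matters.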

--
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I


From eduard.antonyan at gmail.com  Tue Dec  3 18:22:28 2013
From: eduard.antonyan at gmail.com (Eduard Antonyan)
Date: Tue, 3 Dec 2013 11:22:28 -0600
Subject: [datatable-help] rbindlist
In-Reply-To: <etPan.529e0f77.ded7263.ac@MacBook-Pro-de-Alexandre-Sieira.local>
References: <etPan.529dea30.1f16e9e8.ac@MacBook-Pro-de-Alexandre-Sieira.local>
 <CA+xi=qbBE1-+6Gh3KxJikCjcXWj7yrRUVoTHJn9uJNHwmvYGVQ@mail.gmail.com>
 <etPan.529e0f77.ded7263.ac@MacBook-Pro-de-Alexandre-Sieira.local>
Message-ID: <CAHZcBOrSyUYPdZUjA23cOCXOcbd7Svb_VWWCD2-0PALwSVPO7Q@mail.gmail.com>

I took a cursory look at your code - the new rbind does everything you want
(check use.names and the fill arguments), and you may want to take a look
at its code.
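
A minimal sketch of the two arguments mentioned above, assuming a data.table version in which rbind accepts use.names and fill (later releases of rbindlist accept them as well):

```r
library(data.table)

# Match columns by name rather than by position.
by_name <- rbind(data.table(a = 1, b = 2), data.table(b = 4, a = 3),
                 use.names = TRUE)

# Additionally fill columns that are missing from some inputs with NA.
filled <- rbind(data.table(a = 1, b = 2), data.table(c = 3), fill = TRUE)
```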



From eduard.antonyan at gmail.com  Tue Dec  3 18:24:08 2013
From: eduard.antonyan at gmail.com (Eduard Antonyan)
Date: Tue, 3 Dec 2013 11:24:08 -0600
Subject: [datatable-help] rbindlist
In-Reply-To: <CAHZcBOrSyUYPdZUjA23cOCXOcbd7Svb_VWWCD2-0PALwSVPO7Q@mail.gmail.com>
References: <etPan.529dea30.1f16e9e8.ac@MacBook-Pro-de-Alexandre-Sieira.local>
 <CA+xi=qbBE1-+6Gh3KxJikCjcXWj7yrRUVoTHJn9uJNHwmvYGVQ@mail.gmail.com>
 <etPan.529e0f77.ded7263.ac@MacBook-Pro-de-Alexandre-Sieira.local>
 <CAHZcBOrSyUYPdZUjA23cOCXOcbd7Svb_VWWCD2-0PALwSVPO7Q@mail.gmail.com>
Message-ID: <CAHZcBOqXOdf6UZ9v3xRQqLFg2ir=AHfkb3=Fo4H+i1Y5tKoTLA@mail.gmail.com>

With a small difference from what you wrote I guess - the classes are
coerced to the most general one now in rbindlist (and therefore in rbind).
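
To illustrate the coercion, a small sketch, assuming a data.table version that behaves as described here: an integer column bound with a double column should come out as double rather than raise an error.

```r
library(data.table)

# One integer column and one double column with the same name.
mixed <- rbindlist(list(data.table(a = 1L), data.table(a = 2.5)))

# Inspect the resulting column type.
class(mixed$a)
```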



From alexandre.sieira at gmail.com  Tue Dec  3 18:48:11 2013
From: alexandre.sieira at gmail.com (Alexandre Sieira)
Date: Tue, 3 Dec 2013 15:48:11 -0200
Subject: [datatable-help] rbindlist
In-Reply-To: <CAHZcBOrSyUYPdZUjA23cOCXOcbd7Svb_VWWCD2-0PALwSVPO7Q@mail.gmail.com>
References: <etPan.529dea30.1f16e9e8.ac@MacBook-Pro-de-Alexandre-Sieira.local>
 <CA+xi=qbBE1-+6Gh3KxJikCjcXWj7yrRUVoTHJn9uJNHwmvYGVQ@mail.gmail.com>
 <etPan.529e0f77.ded7263.ac@MacBook-Pro-de-Alexandre-Sieira.local>
 <CAHZcBOrSyUYPdZUjA23cOCXOcbd7Svb_VWWCD2-0PALwSVPO7Q@mail.gmail.com>
Message-ID: <etPan.529e195b.1befd79f.ac@MacBook-Pro-de-Alexandre-Sieira.local>

Thanks for pointing this out, Eduard.

You are absolutely right. I just looked at the SVN repository HEAD and saw a new parameter called "fill" was added to .rbind.data.table that would also accomplish something else I added to my function. Very nice! Looking forward to the new release. :)

--
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I

_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


From mdowle at mdowle.plus.com  Tue Dec 10 18:05:22 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Tue, 10 Dec 2013 17:05:22 +0000
Subject: [datatable-help] Cologne on Friday
Message-ID: <52A749D2.7040406@mdowle.plus.com>


Hi,

If anyone is in or near Cologne on Friday, I'm presenting with Arun :

http://www.meetup.com/KoelnRUG/

Hope to meet a few of you there.

Regards,
Matt



From aragorn168b at gmail.com  Tue Dec 10 22:44:53 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 10 Dec 2013 22:44:53 +0100
Subject: [datatable-help] Cologne on Friday
In-Reply-To: <52A749D2.7040406@mdowle.plus.com>
References: <52A749D2.7040406@mdowle.plus.com>
Message-ID: <97AAB816652F46A6978FC8508BEE2233@gmail.com>

Oh yes. Looking forward to it :)! 

Arun



From aragorn168b at gmail.com  Tue Dec 10 22:45:56 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Tue, 10 Dec 2013 22:45:56 +0100
Subject: [datatable-help] Cologne on Friday
In-Reply-To: <97AAB816652F46A6978FC8508BEE2233@gmail.com>
References: <52A749D2.7040406@mdowle.plus.com>
 <97AAB816652F46A6978FC8508BEE2233@gmail.com>
Message-ID: <41704C9EAF02401C99FDA608AA05A211@gmail.com>

Here's info on when/where/what etc:
http://www.meetup.com/KoelnRUG/events/146708302/ 

Arun



From chenhuashan at gmail.com  Sat Dec 14 02:30:54 2013
From: chenhuashan at gmail.com (Huashan Chen)
Date: Fri, 13 Dec 2013 17:30:54 -0800 (PST)
Subject: [datatable-help] Fail to add new columns within a function
Message-ID: <1386984654635-4682173.post@n4.nabble.com>

I just found out that once the column quota is reached, adding new columns
within a function fails.

Below is the test code:

testF2=function(x){  
  add_var<-function(varname){
    x[, `:=`(eval(substitute(varname)), 1), with=F]  
  }
  sapply(paste0('a', 1:101), add_var)
}

dd=data.table(a=1:3)
truelength(dd)
testF2(dd)
dim(dd)   # only 100 columns

dd[, new:=3]
dim(dd)  # adding new column outside a function is OK.


--
View this message in context: http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173.html
Sent from the datatable-help mailing list archive at Nabble.com.

From aragorn168b at gmail.com  Sat Dec 14 14:10:01 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Sat, 14 Dec 2013 14:10:01 +0100
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <1386984654635-4682173.post@n4.nabble.com>
References: <1386984654635-4682173.post@n4.nabble.com>
Message-ID: <40AC36D389E643519828E005ABF732D0@gmail.com>

Hi Huashan,
Great reproducible example! Would you mind filing a bug report here (https://r-forge.r-project.org/tracker/?func=browse&group_id=240&atid=975)?
Thank you,
Arun



From mdowle at mdowle.plus.com  Sun Dec 15 01:28:04 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Sun, 15 Dec 2013 00:28:04 +0000
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <40AC36D389E643519828E005ABF732D0@gmail.com>
References: <1386984654635-4682173.post@n4.nabble.com>
 <40AC36D389E643519828E005ABF732D0@gmail.com>
Message-ID: <52ACF794.10408@mdowle.plus.com>


Hi,

This isn't really a bug.   Maybe a documentation issue, or a default that is too low.

When all spare slots are used up, there is no choice but to make a
shallow copy and create a new vector of column pointer slots. This is
the pointer (address in RAM) which any variable names (symbols) point
to.   When this happens, data.table does a reasonable job of changing
the symbol in calling scope too,  but within a function within a
function it's tricky.  In your function,  x is actually being updated by
reference, but once the spare slots are used up the shallow copy
happens in local scope.

By default :

datatable.alloccol = quote(max(100L,ncol(DT)+64L))

Some people just change this to be a much larger number.  That's the 
easiest.  Just over-allocate massively :

options(datatable.alloccol = 10000)
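
A small sketch of how the option can be observed through truelength(). The value 2000 is only an illustrative choice, and the exact slot count reported may differ between data.table versions.

```r
library(data.table)

# Raise the over-allocation for tables created from now on (illustrative value).
old <- options(datatable.alloccol = 2000L)

DT <- data.table(a = 1:3)
slots <- truelength(DT)  # number of column pointer slots allocated for DT

options(old)  # restore the previous setting
```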

If you have under 50 tables,  this won't matter a jot.   If you have 
1000's of tables, then the spare space could become significant.

Assuming 64bit,  10000 * 8 bytes / 1024 = 78KB.   Knowing this allows 
you to choose the appropriate amount of over-allocation for your 
case.    50 tables * 78KB = 4MB = e.g. 0.01% of 32GB
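
The estimate can be checked with a few lines of arithmetic. The variable names are only illustrative; the sketch takes 8-byte pointers and divides by 1024 to convert bytes to KB.

```r
slots <- 10000           # over-allocated column pointer slots per table
bytes_per_pointer <- 8   # 64-bit pointers

kb_per_table <- slots * bytes_per_pointer / 1024   # KB of spare slots per table
mb_for_50 <- 50 * kb_per_table / 1024              # MB for 50 such tables
pct_of_32gb <- mb_for_50 / (32 * 1024) * 100       # share of 32GB, in percent
```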

Or,  if you know you are about to add a lot of columns by reference via 
a function,  you can increase the over-allocation of one table using the 
alloc.col function :

alloc.col(DT, 200)
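
A minimal sketch of checking the effect with truelength(). The value 2000 is illustrative, and how the count is interpreted (total slots or spare slots) may depend on the data.table version; newer releases also offer the same function under the name setalloccol.

```r
library(data.table)

DT <- data.table(a = 1:3)
before <- truelength(DT)

# Ask for more column pointer slots for this one table only.
invisible(alloc.col(DT, 2000L))

after <- truelength(DT)
```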

In case the example was actually close to the real example,  you can add 
a lot of columns in one step and the LHS of := can be an expression :

DT[, paste0('a', 1:101) := 1]   # add 101 columns named "a1", "a2" ... "a101",
                                # all set to 1
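
A self-contained version of this one-step approach, using a variable wrapped in parentheses on the LHS (new_cols is just an illustrative name):

```r
library(data.table)

DT <- data.table(a = 1:3)
new_cols <- paste0("a", 1:101)

# Parentheses make := treat new_cols as a vector of column names.
DT[, (new_cols) := 1]
```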

and set() may be an easier alternative to := in this case,  now that it 
can add columns as of v1.8.11
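
A sketch of the set() alternative inside a function. The helper name add_constant_cols is hypothetical, and the sketch assumes the table has enough spare column slots for the new columns.

```r
library(data.table)

# Hypothetical helper: add each named column by reference with set().
add_constant_cols <- function(x, cols, value = 1) {
  for (nm in cols) set(x, j = nm, value = value)
  invisible(x)
}

dd <- data.table(a = 1:3)
add_constant_cols(dd, paste0("a", 1:101))
```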

If there is a real world example where it really needs to be wrapped in 
a function in a function, then I would need to see it (or an example 
closer to reality) to be convinced that we need to do better here.

HTH,
Matt



From chenhuashan at gmail.com  Mon Dec 16 00:26:24 2013
From: chenhuashan at gmail.com (Huashan Chen)
Date: Sun, 15 Dec 2013 15:26:24 -0800 (PST)
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <52ACF794.10408@mdowle.plus.com>
References: <1386984654635-4682173.post@n4.nabble.com>
 <40AC36D389E643519828E005ABF732D0@gmail.com> <52ACF794.10408@mdowle.plus.com>
Message-ID: <1387149983998-4682250.post@n4.nabble.com>

Hi Matthew,

Thank you for the thoughtful reply. It's from a real example I am using,
where a master dataset is to be merged with other datasets by selected rows
and columns (which may or may not exist in the master dataset). This function
is called multiple times. As you can see from the simple example, the master
dataset is passed in by reference to avoid duplication.

As you also pointed out, calling alloc.col(DT, some large value) outside the
function fixes this problem. But I am wondering whether calling alloc.col()
within the function would be preferable?
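
A minimal sketch of the "allocate outside the function" approach, assuming a data.table release in which alloc.col(), truelength() and the `(name) :=` form are available. The helper add_cols() and the column names are made up for illustration, and the extra 64 slots are an arbitrary safety margin:

```r
library(data.table)

# Hypothetical helper: adds constant columns to DT by reference.
add_cols <- function(DT, new_names) {
  for (nm in new_names) DT[, (nm) := 1L]
  invisible(DT)
}

DT <- data.table(a = 1:3)
new_names <- paste0("a", 1:101)

# Reserve spare column slots in the caller's frame *before* the call,
# so the function never has to re-allocate the table internally.
alloc.col(DT, length(new_names) + 64L)

add_cols(DT, new_names)
dim(DT)  # all 101 new columns should now be visible to the caller
```

Doing the allocation in the frame that owns the table keeps the caller's name bound to the over-allocated object, which is the point of the workaround described above.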


--
View this message in context: http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173p4682250.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mdowle at mdowle.plus.com  Tue Dec 17 20:11:20 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Tue, 17 Dec 2013 19:11:20 +0000
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <1387149983998-4682250.post@n4.nabble.com>
References: <1386984654635-4682173.post@n4.nabble.com>
 <40AC36D389E643519828E005ABF732D0@gmail.com> <52ACF794.10408@mdowle.plus.com>
 <1387149983998-4682250.post@n4.nabble.com>
Message-ID: <52B0A1D8.7070407@mdowle.plus.com>

On 15/12/13 23:26, Huashan Chen wrote:
> Hi Matthew,
>
> Thank you for the thoughtful reply. It's from a real example I am using,
> where a master dataset is to be merged with other datasets by selected rows
> and columns (which may or may not exist in the master dataset). This function
> is called multiple times. As you can see from the simple example, the master
> dataset is passed in by reference to avoid duplication.
>
> As you also pointed out, calling alloc.col(DT, some large value) outside the
> function fixes this problem. But I am wondering whether calling alloc.col()
> within the function would be preferable?
If the name of the master table doesn't change,  then you don't need to 
pass it in at all.   Just use it directly inside the function and then 
yes the alloc.col will work.   But if it's being merged with another 
dataset, then that merge will create a new table so I'm now confused 
again as the code didn't come through in this thread from nabble.

The question seems like a good one and would be best on Stack Overflow 
where we can see the code, edit and comment etc.
http://stackoverflow.com/questions/tagged/data.table 
<http://stackoverflow.com/questions/tagged/data.table?sort=active&pagesize=50>

Thanks, Matt


From mdowle at mdowle.plus.com  Tue Dec 17 20:15:15 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Tue, 17 Dec 2013 19:15:15 +0000
Subject: [datatable-help] Latest presentation
Message-ID: <52B0A2C3.1050406@mdowle.plus.com>


The presentation Arun and I gave on Friday is now on the homepage :

http://datatable.r-forge.r-project.org/CologneR_2013.pdf

Matt


From chenhuashan at gmail.com  Wed Dec 18 10:04:29 2013
From: chenhuashan at gmail.com (Huashan Chen)
Date: Wed, 18 Dec 2013 01:04:29 -0800 (PST)
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <52B0A1D8.7070407@mdowle.plus.com>
References: <1386984654635-4682173.post@n4.nabble.com>
 <40AC36D389E643519828E005ABF732D0@gmail.com> <52ACF794.10408@mdowle.plus.com>
 <1387149983998-4682250.post@n4.nabble.com> <52B0A1D8.7070407@mdowle.plus.com>
Message-ID: <1387357469738-4682393.post@n4.nabble.com>

OK, here is the complete code with some mock functions from my example.

# data: data.table object
# fn: a filename to read data from
merge_data<-function(fn, data){
  fs<-getSavedata(fn) # read as data.frame
  if (is.null(fs)) stop('Empty data file')

  # return a character vector of variable names which are to be merged;
  # some variables in fs will not be merged to DT
  newvars<-selectVars(names(fs))
  stopifnot(length(newvars) > 0)

  # determine which rows to use
  caseid<-someCustomFunc(fs)

  add_var<-function(varname){
    data[caseid, `:=`(eval(substitute(varname)),
                      fs[, toupper(eval(substitute(varname)))]), with=F]
  }
  invisible(sapply(newvars, add_var))
}

# calling function
merge_data('some file', DT)
DT # display the updated results


In this case, I think a warning from merge_data() when the quota is reached
would be appreciated. Of course, I could have added a check within the
function to avoid unintended behavior:

    if (truelength(data) <= ncol(data) + 64L)
        stop('increase column quota using alloc.col() before calling this function.')
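
The guard above could also be wrapped in a small reusable function. This is only a sketch: check_spare_cols() is a hypothetical name, not part of data.table, and it assumes that truelength(DT) minus ncol(DT) is the number of spare column slots:

```r
library(data.table)

# Hypothetical helper: stop early unless DT has at least `needed`
# spare column slots (allocated slots minus columns in use).
check_spare_cols <- function(DT, needed) {
  spare <- truelength(DT) - ncol(DT)
  if (spare < needed) {
    stop("only ", spare, " spare column slots; call alloc.col() on the ",
         "table before calling this function")
  }
  invisible(TRUE)
}

DT <- data.table(a = 1:3)
alloc.col(DT, 2000L)                 # reserve 2000 spare slots up front

ok <- check_spare_cols(DT, 101L)     # enough room: passes silently

# Asking for far more slots than were reserved triggers the error.
res <- tryCatch(check_spare_cols(DT, 1e6L), error = function(e) "stopped")
```

Comparing against the number of columns about to be added, rather than a fixed 64, makes the check fail only when the table really is too small.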


--
View this message in context: http://r.789695.n4.nabble.com/Fail-to-add-new-columns-within-a-function-tp4682173p4682393.html
Sent from the datatable-help mailing list archive at Nabble.com.

From mdowle at mdowle.plus.com  Wed Dec 18 10:58:46 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Wed, 18 Dec 2013 09:58:46 +0000
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <1387357469738-4682393.post@n4.nabble.com>
References: <1386984654635-4682173.post@n4.nabble.com>
 <40AC36D389E643519828E005ABF732D0@gmail.com> <52ACF794.10408@mdowle.plus.com>
 <1387149983998-4682250.post@n4.nabble.com> <52B0A1D8.7070407@mdowle.plus.com>
 <1387357469738-4682393.post@n4.nabble.com>
Message-ID: <52B171D6.1000209@mdowle.plus.com>


Why are you doing this iteratively?  Can't you load all the files into a 
list,  rbindlist and then reshape?



From mdowle at mdowle.plus.com  Wed Dec 18 11:01:04 2013
From: mdowle at mdowle.plus.com (Matt Dowle)
Date: Wed, 18 Dec 2013 10:01:04 +0000
Subject: [datatable-help] Fail to add new columns within a function
In-Reply-To: <1387357469738-4682393.post@n4.nabble.com>
References: <1386984654635-4682173.post@n4.nabble.com>
 <40AC36D389E643519828E005ABF732D0@gmail.com> <52ACF794.10408@mdowle.plus.com>
 <1387149983998-4682250.post@n4.nabble.com> <52B0A1D8.7070407@mdowle.plus.com>
 <1387357469738-4682393.post@n4.nabble.com>
Message-ID: <52B17260.7080703@mdowle.plus.com>


Why are you doing this iteratively?  Can't you load all the files into a 
list,  rbindlist and then reshape?

What kind of data is this e.g. which field?
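
A rough sketch of the "load into a list, then rbindlist" idea, using two made-up data.frames in place of the real files. It assumes a data.table release that supports the use.names, fill and idcol arguments of rbindlist(), which as far as I know arrived after the 1.8.11 build discussed in this thread:

```r
library(data.table)

# Stand-ins for the files; in the thread these would come from getSavedata().
pieces <- list(
  file1 = data.frame(id = 1:2, X1 = c(10, 20)),
  file2 = data.frame(id = 2:3, X2 = c("a", "b"))
)

# Stack everything in one call. Columns are matched by name, columns
# missing from a piece are filled with NA, and "source" records which
# list element each row came from.
stacked <- rbindlist(pieces, use.names = TRUE, fill = TRUE, idcol = "source")
stacked
```

All columns are created in a single step here, so the column quota never comes into play; the reshape step would depend on the layout wanted and is not shown.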




From kevinushey at gmail.com  Thu Dec 19 02:54:57 2013
From: kevinushey at gmail.com (Kevin Ushey)
Date: Wed, 18 Dec 2013 17:54:57 -0800
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
	output
Message-ID: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>

I'm cross-posting this from the GitHub mirror:
https://github.com/arunsrinivasan/datatable/issues/2

For reference, I only see this with the latest RForge version of
data.table (1.8.11), not the CRAN version of data.table.

-----

library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
set.seed(32)
n <- 3
dt <- data.table(
  y=rnorm(n),
  by=round( rnorm(n), 1)
)

dt[,
  list(max=max(y, na.rm=TRUE)),
  by=list(by)
]

dt[,
  list(max=max(y, na.rm=TRUE)),
  by=list(by)
]

produces the output

> dt[,
+   list(max=max(y, na.rm=TRUE)),
+   by=list(by)
+ ]
    by         max
1: 0.4  0.01464054
2: 0.4  0.87328871
3: 0.7 -1.02794620
>
> dt[,
+   list(max=max(y, na.rm=TRUE)),
+   by=list(by)
+ ]
    by        max
1: 0.4  0.8732887
2: 0.7 -1.0279462

For some reason, the first return is wrong, while the second (and all
subsequent) output is correct. Any idea what's going on?
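
For anyone who wants to test their own build, the symptom can be turned into a self-checking script. This is a sketch only: the two stopifnot() conditions are my reading of what "correct" means here (identical results on repeated runs, and one output row per distinct value of by):

```r
library(data.table)
set.seed(32)
n <- 3
dt <- data.table(y = rnorm(n), by = round(rnorm(n), 1))

# Same grouped query as above, wrapped so it can be run repeatedly.
run <- function() dt[, list(max = max(y, na.rm = TRUE)), by = list(by)]

first  <- run()
second <- run()

# Repeated runs should agree with each other ...
stopifnot(isTRUE(all.equal(first, second)))
# ... and there should be exactly one row per distinct grouping value.
stopifnot(nrow(first) == length(unique(dt$by)))
```

On an affected build the first stopifnot() would be expected to fail, since the first and second calls return different numbers of rows.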

> sessionInfo()
R Under development (unstable) (2013-12-12 r64453)
Platform: x86_64-apple-darwin13.0.0 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.11    knitr_1.5            devtools_1.4.1.99
BiocInstaller_1.13.3

loaded via a namespace (and not attached):
 [1] compiler_3.1.0 digest_0.6.4   evaluate_0.5.1 formatR_0.10
httr_0.2       memoise_0.1
 [7] parallel_3.1.0 plyr_1.8       RCurl_1.95-4.1 reshape2_1.2.2
stringr_0.6.2  tools_3.1.0
[13] whisker_0.3-2

---

Kevin

From michael.nelson at sydney.edu.au  Thu Dec 19 03:50:06 2013
From: michael.nelson at sydney.edu.au (Michael Nelson)
Date: Thu, 19 Dec 2013 02:50:06 +0000
Subject: [datatable-help] 'by' on a numeric column produces
	inconsistent	output
In-Reply-To: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
Message-ID: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>

Using
data.table 1.8.11 (Fresh install from r-forge today)
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

I get

    by         max
1: 0.7  0.01464054
2: 0.4  0.87328871
3: 0.4 -1.02794620

On both runs.



From aragorn168b at gmail.com  Thu Dec 19 08:22:59 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 08:22:59 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
Message-ID: <6ED93E37928C4E849109DB7A5615EA61@gmail.com>

Not sure how to debug this without being able to reproduce it. I tried on Mac OS X 10.8.5 and Debian GNU/Linux 7 (wheezy); I don't have access to a Windows machine. It consistently gives me this:

> dt[,
+    list(max=max(y, na.rm=TRUE)),
+    by=list(by)
+    ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871
> 
> dt[,
+    list(max=max(y, na.rm=TRUE)),
+    by=list(by)
+    ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871


Can either of you provide me with the output of these steps in cases where there's an error? I've commented the output I get for each step. 

byval <- list(by=dt$by) 
o__ <- data.table:::fastorder(byval) # 2,3,1
f__ = data.table:::uniqlist(byval, order=o__) # 1,3
len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
firstofeachgroup = o__[f__] # 2,1
origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
f__ = f__[origorder] # 3,1
len__ = len__[origorder] # 2,1
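
The internal functions above are specific to that development build, so here is a base-R sketch of what I understand each step to compute, using the three by values from the example. It is an illustration only, not data.table's implementation. Note that in this sketch the last step gives 1,2, since the 0.7 group has a single row:

```r
by <- c(0.7, 0.4, 0.4)          # the 'by' column from the example

o <- order(by)                  # ordering permutation: 2 3 1
sorted <- by[o]

# Position (within the sorted vector) where each group starts: 1 3
starts <- which(c(TRUE, sorted[-1] != sorted[-length(sorted)]))

# Number of rows in each group: 2 1
lens <- diff(c(starts, length(by) + 1L))

# Original row number of the first row of each group: 2 1
first_of_each_group <- o[starts]

# Permutation that puts groups back in order of first appearance: 2 1
orig_order <- order(first_of_each_group)

starts_out <- starts[orig_order]   # 3 1
lens_out   <- lens[orig_order]     # 1 2
```

With two distinct values, the group-start vector must have length two; a result of 1,2,3 at that step means every row was treated as its own group.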


Arun



From aragorn168b at gmail.com  Thu Dec 19 08:34:27 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 08:34:27 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
Message-ID: <F05E29560A3344BB8F8E220C786D61BB@gmail.com>

Kevin, your output looks sorted by the "by" column, which shouldn't happen either. So I would consider even the second output wrong, unless you're setting a key on "by".

Arun



From kevinushey at gmail.com  Thu Dec 19 08:37:21 2013
From: kevinushey at gmail.com (Kevin Ushey)
Date: Wed, 18 Dec 2013 23:37:21 -0800
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
	output
In-Reply-To: <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
Message-ID: <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>

Hi Arun,

Here's the output on my machine -- other information missing from
before; it's with OSX Mavericks, with R and data.table compiled with
Apple clang.

---

> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> set.seed(32)
> n <- 3
> dt <- data.table(
+   y=rnorm(n),
+   by=round( rnorm(n), 1)
+ )
>
## run one
> byval <- list(by=dt$by)
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 2 3 1
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
[1] 1 2 3
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
[1] 1 1 1
> (firstofeachgroup = o__[f__]) # 2,1
[1] 2 3 1
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
[1] 3 1 2
> (f__ = f__[origorder]) # 3,1
[1] 3 1 2
> (len__ = len__[origorder]) # 2,1
[1] 1 1 1

## run two
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 1 2 3
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
[1] 1 3
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
[1] 2 1
> (firstofeachgroup = o__[f__]) # 2,1
[1] 1 3
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
[1] 1 2
> (f__ = f__[origorder]) # 3,1
[1] 1 3
> (len__ = len__[origorder]) # 2,1
[1] 2 1


From aragorn168b at gmail.com  Thu Dec 19 08:44:16 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 08:44:16 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
Message-ID: <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>

Aha, the issue seems to be with 'uniqlist'. I'm not sure why
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
gives 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. Not sure how to solve this though. I've tried it so far on 3 machines:


1) OS X 10.8.5 + llvm (gcc)
2) OS X Mavericks + Clang 
3) Debian Wheezy + gcc

All of them give consistent output. Man this is such a drag.

Arun



From szehnder at uni-bonn.de  Thu Dec 19 08:49:38 2013
From: szehnder at uni-bonn.de (Simon Zehnder)
Date: Thu, 19 Dec 2013 08:49:38 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
	output
In-Reply-To: <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
 <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
Message-ID: <BC84B9CB-2809-4344-B7C3-29A891CE11BA@uni-bonn.de>

Arun,

if you could send me the reproducible code in copyable form I can as well try it on Mac OS X Mavericks with gcc 4.8. 

Best

Simon



From kevinushey at gmail.com  Thu Dec 19 08:55:18 2013
From: kevinushey at gmail.com (Kevin Ushey)
Date: Wed, 18 Dec 2013 23:55:18 -0800
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
 <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
Message-ID: <CAJXgQP1quJM1PoDOTLt86vAtd5DgQn1WNSc6f+=ktzhrOXx1Qg@mail.gmail.com>

Hmm, I am seeing that after the data.table:::fastorder call, the dt
itself is modified. Notice that 'by' is rearranged without modifying
'y'.

> dt
             y  by
1:  0.01464054 0.7
2:  0.87328871 0.4
3: -1.02794620 0.4
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
[1] 2 3 1
> dt
             y  by
1:  0.01464054 0.4
2:  0.87328871 0.4
3: -1.02794620 0.7
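
One possible reading of the run-one output, sketched in base R: if the ordering step sorts the vector in place while `o__` still refers to the old positions, applying `o__` to the already-sorted values yields three runs instead of two. `run_starts` is a hypothetical stand-in for the run detection (not data.table's `uniqlist`), and the in-place sort is only simulated here with `sort()`.

```r
# Hypothetical helper: positions in the ordered vector where a new run of
# equal values begins. A stand-in for run detection, not data.table code.
run_starts <- function(x, ord) {
  sorted <- x[ord]
  which(c(TRUE, sorted[-1L] != sorted[-length(sorted)]))
}

by_orig <- c(0.7, 0.4, 0.4)   # 'by' column before the ordering call
ord     <- order(by_orig)     # 2 3 1, as in the thread

# Expected case: the input is untouched, so two groups are found.
run_starts(by_orig, ord)      # 1 3

# Assumed side effect: the vector is now sorted, but `ord` still holds
# the positions computed for the unsorted vector.
by_after <- sort(by_orig)     # 0.4 0.4 0.7, as dt shows after the call
run_starts(by_after, ord)     # 1 2 3
```

The second result matches the `[1] 1 2 3` reported for run one, and a second ordering pass over the already-sorted vector would give `1 2 3` and then `1 3`, as in run two.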

On Wed, Dec 18, 2013 at 11:44 PM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Aha, the issue seems to be with 'uniqlist', not sure why it gives
>
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>
> 1,2,3 for you and 1,3 consistently for me. I'll revert this back to
> `duplist` for now. Not sure how to solve this though. I've tried it so far
> on 3 machines:
>
> 1) OS X 10.8.5 + libvm (gcc)
> 2) OS X Mavericks + Clang
> 3) Debian Wheezy + gcc
>
> All of them give consistent output. Man this is such a drag.
>
> Arun
>
> On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
>
> Hi Arun,
>
> Here's the output on my machine -- other information missing from
> before; it's with OSX Mavericks, with R and data.table compiled with
> Apple clang.
>
> ---
>
> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> set.seed(32)
> n <- 3
> dt <- data.table(
>
> + y=rnorm(n),
> + by=round( rnorm(n), 1)
> + )
>
> ## run one
>
> byval <- list(by=dt$by)
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>
> [1] 2 3 1
>
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>
> [1] 1 2 3
>
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>
> [1] 1 1 1
>
> (firstofeachgroup = o__[f__]) # 2,1
>
> [1] 2 3 1
>
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>
> [1] 3 1 2
>
> (f__ = f__[origorder]) # 3,1
>
> [1] 3 1 2
>
> (len__ = len__[origorder]) # 2,1
>
> [1] 1 1 1
>
> ## run two
>
> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>
> [1] 1 2 3
>
> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>
> [1] 1 3
>
> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>
> [1] 2 1
>
> (firstofeachgroup = o__[f__]) # 2,1
>
> [1] 1 3
>
> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>
> [1] 1 2
>
> (f__ = f__[origorder]) # 3,1
>
> [1] 1 3
>
> (len__ = len__[origorder]) # 2,1
>
> [1] 2 1
>
> On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
>
> Not sure how to debug without being able to reproduce. Tried on Mac OS X
> 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
> machine. It consistently gives me this:
>
> dt[,
>
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
> by max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
>
>
> dt[,
>
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
> by max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
>
> Can either of you provide me with the output of these steps in cases where
> there's an error? I've commented the output I get for each step.
>
> byval <- list(by=dt$by)
> o__ <- data.table:::fastorder(byval) # 2,3,1
> f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> firstofeachgroup = o__[f__] # 2,1
> origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> f__ = f__[origorder] # 3,1
> len__ = len__[origorder] # 2,1
>
>
> Arun
>
> <...snip...>
>
>

From aragorn168b at gmail.com  Thu Dec 19 09:02:02 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 09:02:02 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <CAJXgQP1quJM1PoDOTLt86vAtd5DgQn1WNSc6f+=ktzhrOXx1Qg@mail.gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
 <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
 <CAJXgQP1quJM1PoDOTLt86vAtd5DgQn1WNSc6f+=ktzhrOXx1Qg@mail.gmail.com>
Message-ID: <BFE3E850909F4B16AFBB9A4F3B458CBC@gmail.com>

Ah, that explains it as well. So a copy is not being sent to fastorder, but that only happens the first time? I'll write again if there are more questions.   
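
A small base-R check for this kind of side effect, as a sketch: take an independent deep copy of the input, call the function under suspicion, and compare afterwards. `changes_input` is a hypothetical helper name; base `order()` is used only as a harmless example of a function to test.

```r
# Hypothetical helper: TRUE if calling f(x) changes the contents of x.
changes_input <- function(f, x) {
  # serialize/unserialize gives a copy that shares no memory with x
  before <- unserialize(serialize(x, NULL))
  f(x)
  !identical(before, x)
}

by_col <- c(0.7, 0.4, 0.4)
changes_input(order, by_col)   # FALSE: base order() leaves its argument alone
```

Any ordering function that is suspected of sorting by reference can be passed as `f` in the same way.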

Thanks again Kevin.


Arun


On Thursday, December 19, 2013 at 8:55 AM, Kevin Ushey wrote:

> Hmm, I am seeing that after the data.table:::fastorder call, the dt
> itself is modified. Notice that 'by' is rearranged without modifying
> 'y'.
>  
> > dt
> y by
> 1: 0.01464054 0.7
> 2: 0.87328871 0.4
> 3: -1.02794620 0.4
> > (o__ <- data.table:::fastorder(byval)) # 2,3,1
>  
> [1] 2 3 1
> > dt
>  
> y by
> 1: 0.01464054 0.4
> 2: 0.87328871 0.4
> 3: -1.02794620 0.7
>  
> On Wed, Dec 18, 2013 at 11:44 PM, Arunkumar Srinivasan
> <aragorn168b at gmail.com (mailto:aragorn168b at gmail.com)> wrote:
> > Aha, the issue seems to be with 'uniqlist', not sure why it gives
> >  
> > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> >  
> > 1,2,3 for you and 1,3 consistently for me. I'll revert this back to
> > `duplist` for now. Not sure how to solve this though. I've tried it so far
> > on 3 machines:
> >  
> > 1) OS X 10.8.5 + libvm (gcc)
> > 2) OS X Mavericks + Clang
> > 3) Debian Wheezy + gcc
> >  
> > All of them give consistent output. Man this is such a drag.
> >  
> > Arun
> >  
> > On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
> >  
> > Hi Arun,
> >  
> > Here's the output on my machine -- other information missing from
> > before; it's with OSX Mavericks, with R and data.table compiled with
> > Apple clang.
> >  
> > ---
> >  
> > library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> > set.seed(32)
> > n <- 3
> > dt <- data.table(
> >  
> > + y=rnorm(n),
> > + by=round( rnorm(n), 1)
> > + )
> >  
> > ## run one
> >  
> > byval <- list(by=dt$by)
> > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> >  
> > [1] 2 3 1
> >  
> > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> >  
> > [1] 1 2 3
> >  
> > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> >  
> > [1] 1 1 1
> >  
> > (firstofeachgroup = o__[f__]) # 2,1
> >  
> > [1] 2 3 1
> >  
> > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> >  
> > [1] 3 1 2
> >  
> > (f__ = f__[origorder]) # 3,1
> >  
> > [1] 3 1 2
> >  
> > (len__ = len__[origorder]) # 2,1
> >  
> > [1] 1 1 1
> >  
> > ## run two
> >  
> > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> >  
> > [1] 1 2 3
> >  
> > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> >  
> > [1] 1 3
> >  
> > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> >  
> > [1] 2 1
> >  
> > (firstofeachgroup = o__[f__]) # 2,1
> >  
> > [1] 1 3
> >  
> > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> >  
> > [1] 1 2
> >  
> > (f__ = f__[origorder]) # 3,1
> >  
> > [1] 1 3
> >  
> > (len__ = len__[origorder]) # 2,1
> >  
> > [1] 2 1
> >  
> > On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
> > <aragorn168b at gmail.com (mailto:aragorn168b at gmail.com)> wrote:
> >  
> > Not sure how to debug without being able to reproduce. Tried on Mac OS X
> > 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
> > machine. It consistently gives me this:
> >  
> > dt[,
> >  
> > + list(max=max(y, na.rm=TRUE)),
> > + by=list(by)
> > + ]
> > by max
> > 1: 0.7 0.01464054
> > 2: 0.4 0.87328871
> >  
> >  
> > dt[,
> >  
> > + list(max=max(y, na.rm=TRUE)),
> > + by=list(by)
> > + ]
> > by max
> > 1: 0.7 0.01464054
> > 2: 0.4 0.87328871
> >  
> > Can either of you provide me with the output of these steps in cases where
> > there's an error? I've commented the output I get for each step.
> >  
> > byval <- list(by=dt$by)
> > o__ <- data.table:::fastorder(byval) # 2,3,1
> > f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> > len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> > firstofeachgroup = o__[f__] # 2,1
> > origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> > f__ = f__[origorder] # 3,1
> > len__ = len__[origorder] # 2,1
> >  
> >  
> > Arun
> >  
> > <...snip...>  


From aragorn168b at gmail.com  Thu Dec 19 09:05:33 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 09:05:33 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <BC84B9CB-2809-4344-B7C3-29A891CE11BA@uni-bonn.de>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
 <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
 <BC84B9CB-2809-4344-B7C3-29A891CE11BA@uni-bonn.de>
Message-ID: <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com>

Simon, sure. 

set.seed(32)
n <- 3
dt <- data.table(
y=rnorm(n),
by=round( rnorm(n), 1)
)

dt[,
list(max=max(y, na.rm=TRUE)),
by=list(by)
]

dt[,
list(max=max(y, na.rm=TRUE)),
by=list(by)
]


Arun


On Thursday, December 19, 2013 at 8:49 AM, Simon Zehnder wrote:

> Arun,
> 
> if you could send me the reproducible code in copyable form I can as well try it on Mac OS X Mavericks with gcc 4.8. 
> 
> Best
> 
> Simon
> 
> On 19 Dec 2013, at 08:44, Arunkumar Srinivasan <aragorn168b at gmail.com (mailto:aragorn168b at gmail.com)> wrote:
> 
> > Aha, the issue seems to be with 'uniqlist', not sure why it gives 
> > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > 
> > 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. Not sure how to solve this though. I've tried it so far on 3 machines:
> > 
> > 1) OS X 10.8.5 + libvm (gcc)
> > 2) OS X Mavericks + Clang 
> > 3) Debian Wheezy + gcc
> > 
> > All of them give consistent output. Man this is such a drag.
> > 
> > Arun
> > 
> > On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
> > 
> > > Hi Arun,
> > > 
> > > Here's the output on my machine -- other information missing from
> > > before; it's with OSX Mavericks, with R and data.table compiled with
> > > Apple clang.
> > > 
> > > ---
> > > 
> > > > library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> > > > set.seed(32)
> > > > n <- 3
> > > > dt <- data.table(
> > > > 
> > > 
> > > + y=rnorm(n),
> > > + by=round( rnorm(n), 1)
> > > + )
> > > ## run one
> > > > byval <- list(by=dt$by)
> > > > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> > > > 
> > > 
> > > [1] 2 3 1
> > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > 
> > > [1] 1 2 3
> > > > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> > > 
> > > [1] 1 1 1
> > > > (firstofeachgroup = o__[f__]) # 2,1
> > > 
> > > [1] 2 3 1
> > > > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> > > 
> > > [1] 3 1 2
> > > > (f__ = f__[origorder]) # 3,1
> > > 
> > > [1] 3 1 2
> > > > (len__ = len__[origorder]) # 2,1
> > > 
> > > [1] 1 1 1
> > > 
> > > ## run two
> > > > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> > > 
> > > [1] 1 2 3
> > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > 
> > > [1] 1 3
> > > > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> > > 
> > > [1] 2 1
> > > > (firstofeachgroup = o__[f__]) # 2,1
> > > 
> > > [1] 1 3
> > > > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> > > 
> > > [1] 1 2
> > > > (f__ = f__[origorder]) # 3,1
> > > 
> > > [1] 1 3
> > > > (len__ = len__[origorder]) # 2,1
> > > 
> > > [1] 2 1
> > > 
> > > On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
> > > <aragorn168b at gmail.com (mailto:aragorn168b at gmail.com)> wrote:
> > > > Not sure how to debug without being able to reproduce. Tried on Mac OS X
> > > > 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
> > > > machine. It consistently gives me this:
> > > > 
> > > > > dt[,
> > > > + list(max=max(y, na.rm=TRUE)),
> > > > + by=list(by)
> > > > + ]
> > > > by max
> > > > 1: 0.7 0.01464054
> > > > 2: 0.4 0.87328871
> > > > > 
> > > > > dt[,
> > > > + list(max=max(y, na.rm=TRUE)),
> > > > + by=list(by)
> > > > + ]
> > > > by max
> > > > 1: 0.7 0.01464054
> > > > 2: 0.4 0.87328871
> > > > 
> > > > Can either of you provide me with the output of these steps in cases where
> > > > there's an error? I've commented the output I get for each step.
> > > > 
> > > > byval <- list(by=dt$by)
> > > > o__ <- data.table:::fastorder(byval) # 2,3,1
> > > > f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> > > > len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> > > > firstofeachgroup = o__[f__] # 2,1
> > > > origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> > > > f__ = f__[origorder] # 3,1
> > > > len__ = len__[origorder] # 2,1
> > > > 
> > > > 
> > > > Arun
> > > > 
> > > > <...snip...>
> > 
> > 
> 
> 
> 



From szehnder at uni-bonn.de  Thu Dec 19 09:26:12 2013
From: szehnder at uni-bonn.de (Simon Zehnder)
Date: Thu, 19 Dec 2013 09:26:12 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
 <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
 <BC84B9CB-2809-4344-B7C3-29A891CE11BA@uni-bonn.de>
 <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com>
Message-ID: <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de>

Hi Arun,

here are the results on Mac OS X Mavericks with gcc 4.8.2:

data.table 1.8.10:

> set.seed(32)
> n <- 3
> dt <- data.table(
+ y=rnorm(n),
+ by=round( rnorm(n), 1)
+ )
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871

data.table 1.8.11:

> set.seed(32)
> n <- 3
> dt <- data.table(
+ y=rnorm(n),
+ by=round( rnorm(n), 1)
+ )
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.7 0.01464054
2: 0.4 0.87328871
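
As an independent cross-check of the result above, the same grouping can be reproduced without data.table. This is only a sketch: the values are copied from the `dt` printed earlier in the thread, and groups are kept in order of first appearance to match the output shown here.

```r
# Values copied from the dt printed earlier in the thread.
y  <- c(0.01464054, 0.87328871, -1.02794620)
by <- c(0.7, 0.4, 0.4)

# Keep groups in order of first appearance (0.7, then 0.4).
groups <- factor(by, levels = unique(by))

expected <- data.frame(
  by  = unique(by),
  max = as.vector(tapply(y, groups, max, na.rm = TRUE))
)
print(expected)
```

This gives two rows, 0.7 with 0.01464054 and 0.4 with 0.87328871, which agrees with both data.table versions tested above.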

Best

Simon


On 19 Dec 2013, at 09:05, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:

> Simon, sure.
> 
> set.seed(32)
> n <- 3
> dt <- data.table(
> y=rnorm(n),
> by=round( rnorm(n), 1)
> )
> 
> dt[,
> list(max=max(y, na.rm=TRUE)),
> by=list(by)
> ]
> 
> dt[,
> list(max=max(y, na.rm=TRUE)),
> by=list(by)
> ]
> 
> 
> 
> Arun
> 
> On Thursday, December 19, 2013 at 8:49 AM, Simon Zehnder wrote:
> 
>> Arun,
>> 
>> if you could send me the reproducible code in copyable form I can as well try it on Mac OS X Mavericks with gcc 4.8.
>> 
>> Best
>> 
>> Simon
>> 
>> On 19 Dec 2013, at 08:44, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
>> 
>>> Aha, the issue seems to be with 'uniqlist', not sure why it gives
>>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>>> 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. Not sure how to solve this though. I've tried it so far on 3 machines:
>>> 
>>> 1) OS X 10.8.5 + libvm (gcc)
>>> 2) OS X Mavericks + Clang
>>> 3) Debian Wheezy + gcc
>>> 
>>> All of them give consistent output. Man this is such a drag.
>>> 
>>> Arun
>>> 
>>> On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
>>> 
>>>> Hi Arun,
>>>> 
>>>> Here's the output on my machine -- other information missing from
>>>> before; it's with OSX Mavericks, with R and data.table compiled with
>>>> Apple clang.
>>>> 
>>>> ---
>>>> 
>>>>> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
>>>>> set.seed(32)
>>>>> n <- 3
>>>>> dt <- data.table(
>>>> + y=rnorm(n),
>>>> + by=round( rnorm(n), 1)
>>>> + )
>>>> ## run one
>>>>> byval <- list(by=dt$by)
>>>>> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>>>> [1] 2 3 1
>>>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>>>> [1] 1 2 3
>>>>> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>>>> [1] 1 1 1
>>>>> (firstofeachgroup = o__[f__]) # 2,1
>>>> [1] 2 3 1
>>>>> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>>>> [1] 3 1 2
>>>>> (f__ = f__[origorder]) # 3,1
>>>> [1] 3 1 2
>>>>> (len__ = len__[origorder]) # 2,1
>>>> [1] 1 1 1
>>>> 
>>>> ## run two
>>>>> (o__ <- data.table:::fastorder(byval)) # 2,3,1
>>>> [1] 1 2 3
>>>>> (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
>>>> [1] 1 3
>>>>> (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
>>>> [1] 2 1
>>>>> (firstofeachgroup = o__[f__]) # 2,1
>>>> [1] 1 3
>>>>> (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
>>>> [1] 1 2
>>>>> (f__ = f__[origorder]) # 3,1
>>>> [1] 1 3
>>>>> (len__ = len__[origorder]) # 2,1
>>>> [1] 2 1
>>>> 
>>>> On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
>>>> <aragorn168b at gmail.com> wrote:
>>>>> Not sure how to debug without being able to reproduce. Tried on Mac OS X
>>>>> 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
>>>>> machine. It consistently gives me this:
>>>>> 
>>>>>> dt[,
>>>>> + list(max=max(y, na.rm=TRUE)),
>>>>> + by=list(by)
>>>>> + ]
>>>>> by max
>>>>> 1: 0.7 0.01464054
>>>>> 2: 0.4 0.87328871
>>>>>> 
>>>>>> dt[,
>>>>> + list(max=max(y, na.rm=TRUE)),
>>>>> + by=list(by)
>>>>> + ]
>>>>> by max
>>>>> 1: 0.7 0.01464054
>>>>> 2: 0.4 0.87328871
>>>>> 
>>>>> Can either of you provide me with the output of these steps in cases where
>>>>> there's an error? I've commented the output I get for each step.
>>>>> 
>>>>> byval <- list(by=dt$by)
>>>>> o__ <- data.table:::fastorder(byval) # 2,3,1
>>>>> f__ = data.table:::uniqlist(byval, order=o__) # 1,3
>>>>> len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
>>>>> firstofeachgroup = o__[f__] # 2,1
>>>>> origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
>>>>> f__ = f__[origorder] # 3,1
>>>>> len__ = len__[origorder] # 2,1
>>>>> 
>>>>> 
>>>>> Arun
>>>>> 
>>>>> <...snip...>
>>> 
> 


From aragorn168b at gmail.com  Thu Dec 19 09:36:13 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 09:36:13 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
Message-ID: <C8ABBFBCFEA0443281B88D6F88D21B1C@gmail.com>

@mnel, I'm not sure I understand your output. It differs from the correct output, but it also differs from Kevin's. Basically, dt[, max(y), by=by] does no grouping for you and just returns dt unchanged?

Arun


On Thursday, December 19, 2013 at 3:50 AM, Michael Nelson wrote:

> Using
> data.table 1.8.11 (Fresh install from r-forge today)
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 
> I get
> 
> by max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> 3: 0.4 -1.02794620
> 
> On both runs.
> 
> 
> 
> 
> ________________________________________
> From: datatable-help-bounces at lists.r-forge.r-project.org (mailto:datatable-help-bounces at lists.r-forge.r-project.org) [datatable-help-bounces at lists.r-forge.r-project.org (mailto:datatable-help-bounces at lists.r-forge.r-project.org)] on behalf of Kevin Ushey [kevinushey at gmail.com (mailto:kevinushey at gmail.com)]
> Sent: Thursday, 19 December 2013 12:54 PM
> To: datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org)
> Subject: [datatable-help] 'by' on a numeric column produces inconsistent output
> 
> I'm cross-posting this from the GitHub mirror:
> https://github.com/arunsrinivasan/datatable/issues/2
> 
> For reference, I only see this with the latest RForge version of
> data.table (1.8.11), not the CRAN version of data.table.
> 
> -----
> 
> library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> set.seed(32)
> n <- 3
> dt <- data.table(
> y=rnorm(n),
> by=round( rnorm(n), 1)
> )
> 
> dt[,
> list(max=max(y, na.rm=TRUE)),
> by=list(by)
> ]
> 
> dt[,
> list(max=max(y, na.rm=TRUE)),
> by=list(by)
> ]
> 
> produces the output
> 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
> by max
> 1: 0.4 0.01464054
> 2: 0.4 0.87328871
> 3: 0.7 -1.02794620
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
> by max
> 1: 0.4 0.8732887
> 2: 0.7 -1.0279462
> 
> For some reason, the first return is wrong, while the second (and all
> subsequent) output is correct. Any idea what's going on?
> 
> > sessionInfo()
> R Under development (unstable) (2013-12-12 r64453)
> Platform: x86_64-apple-darwin13.0.0 (64-bit)
> 
> locale:
> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
> 
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> 
> other attached packages:
> [1] data.table_1.8.11 knitr_1.5 devtools_1.4.1.99
> BiocInstaller_1.13.3
> 
> loaded via a namespace (and not attached):
> [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.1 formatR_0.10
> httr_0.2 memoise_0.1
> [7] parallel_3.1.0 plyr_1.8 RCurl_1.95-4.1 reshape2_1.2.2
> stringr_0.6.2 tools_3.1.0
> [13] whisker_0.3-2
> 
> ---
> 
> Kevin
> 
> 



From aragorn168b at gmail.com  Thu Dec 19 09:39:19 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 09:39:19 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <C8ABBFBCFEA0443281B88D6F88D21B1C@gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <C8ABBFBCFEA0443281B88D6F88D21B1C@gmail.com>
Message-ID: <6C86E3EEE8E041999B7F02D6BE27215F@gmail.com>

As I was just writing to Kevin, I think (if @mnel could verify that his output is correct) the reason is that Kevin is using R-devel...

If you run the code below, `xx` should *not* have the same address as `dt$by` (and for me it does not). For Kevin, however, the two seem to point to the same location, and I can't figure out why they would, given how R has worked so far.

byval <- list(by=dt$by)
address(dt$by)
# [1] "0x7fa848ad8608"

address(byval)
# [1] "0x7fa84a93fa68"

xx = byval[[1L]]
address(xx)
# [1] "0x7fa848e3fc48"

address(list(xx))
# [1] "0x7fa84aa1ba78"

data.table:::dradixorder(xx)
# [1] 2 3 1

byval
# $by
# [1] 0.7 0.4 0.4
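
The address comparison above can be condensed into a single check, and a defensive copy can be taken before any code that might sort by reference. This is a sketch that assumes a data.table installation exporting `address()` and `copy()`; the table values are illustrative, and whether `shared` comes out TRUE or FALSE depends on the R version, which is the point in question.

```r
library(data.table)

dt <- data.table(y = c(0.1, 0.2, 0.3), by = c(0.7, 0.4, 0.4))
byval <- list(by = dt$by)

# TRUE would mean the list element and the column are the same object in
# memory, so sorting byval[[1L]] in place would also reorder dt$by.
shared <- address(byval[[1L]]) == address(dt$by)

# Defensive variant: copy() forces an independent vector.
byval_safe  <- list(by = copy(dt$by))
shared_safe <- address(byval_safe[[1L]]) == address(dt$by)
print(c(shared = shared, shared_safe = shared_safe))
```

With the explicit `copy()`, `shared_safe` is FALSE regardless of how `list()` treats its arguments.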


Arun


On Thursday, December 19, 2013 at 9:36 AM, Arunkumar Srinivasan wrote:

> @mnel, I'm not sure I understand your output. Yours is different from the correct output, but it is also different from Kevin's. Basically, dt[, max(y), by=by] has no effect on yours and just returns back dt? 
> 
> Arun
> 
> 
> On Thursday, December 19, 2013 at 3:50 AM, Michael Nelson wrote:
> 
> > Using
> > data.table 1.8.11 (Fresh install from r-forge today)
> > R version 3.0.2 (2013-09-25)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > 
> > I get
> > 
> > by max
> > 1: 0.7 0.01464054
> > 2: 0.4 0.87328871
> > 3: 0.4 -1.02794620
> > 
> > On both runs.
> > 
> > 
> > 
> > 
> > ________________________________________
> > From: datatable-help-bounces at lists.r-forge.r-project.org (mailto:datatable-help-bounces at lists.r-forge.r-project.org) [datatable-help-bounces at lists.r-forge.r-project.org (mailto:datatable-help-bounces at lists.r-forge.r-project.org)] on behalf of Kevin Ushey [kevinushey at gmail.com (mailto:kevinushey at gmail.com)]
> > Sent: Thursday, 19 December 2013 12:54 PM
> > To: datatable-help at lists.r-forge.r-project.org (mailto:datatable-help at lists.r-forge.r-project.org)
> > Subject: [datatable-help] 'by' on a numeric column produces inconsistent output
> > 
> > I'm cross-posting this from the GitHub mirror:
> > https://github.com/arunsrinivasan/datatable/issues/2
> > 
> > For reference, I only see this with the latest RForge version of
> > data.table (1.8.11), not the CRAN version of data.table.
> > 
> > -----
> > 
> > library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> > set.seed(32)
> > n <- 3
> > dt <- data.table(
> > y=rnorm(n),
> > by=round( rnorm(n), 1)
> > )
> > 
> > dt[,
> > list(max=max(y, na.rm=TRUE)),
> > by=list(by)
> > ]
> > 
> > dt[,
> > list(max=max(y, na.rm=TRUE)),
> > by=list(by)
> > ]
> > 
> > produces the output
> > 
> > > dt[,
> > + list(max=max(y, na.rm=TRUE)),
> > + by=list(by)
> > + ]
> > by max
> > 1: 0.4 0.01464054
> > 2: 0.4 0.87328871
> > 3: 0.7 -1.02794620
> > > 
> > > dt[,
> > + list(max=max(y, na.rm=TRUE)),
> > + by=list(by)
> > + ]
> > by max
> > 1: 0.4 0.8732887
> > 2: 0.7 -1.0279462
> > 
> > For some reason, the first return is wrong, while the second (and all
> > subsequent) output is correct. Any idea what's going on?
> > 
> > > sessionInfo()
> > R Under development (unstable) (2013-12-12 r64453)
> > Platform: x86_64-apple-darwin13.0.0 (64-bit)
> > 
> > locale:
> > [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
> > 
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> > 
> > other attached packages:
> > [1] data.table_1.8.11 knitr_1.5 devtools_1.4.1.99
> > BiocInstaller_1.13.3
> > 
> > loaded via a namespace (and not attached):
> > [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.1 formatR_0.10
> > httr_0.2 memoise_0.1
> > [7] parallel_3.1.0 plyr_1.8 RCurl_1.95-4.1 reshape2_1.2.2
> > stringr_0.6.2 tools_3.1.0
> > [13] whisker_0.3-2
> > 
> > ---
> > 
> > Kevin
> > 
> > 
> > 
> 
> 


From aragorn168b at gmail.com  Thu Dec 19 09:43:17 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 09:43:17 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
 <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
 <BC84B9CB-2809-4344-B7C3-29A891CE11BA@uni-bonn.de>
 <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com>
 <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de>
Message-ID: <EEC4D0A02B8145EF862B5C9887B6DCE3@gmail.com>

Simon, 

Thanks. One more result that matches mine :). I think we've narrowed the problem down to the R-devel version. I'll write again once I've discussed it with Kevin.

Arun


On Thursday, December 19, 2013 at 9:26 AM, Simon Zehnder wrote:

> Hi Arun,
> 
> here the results on Mac OS X Mavericks with gcc 4.8.2
> 
> data.table 1.8.10:
> 
> > set.seed(32)
> > n <- 3
> > dt <- data.table(
> > 
> 
> + y=rnorm(n),
> + by=round( rnorm(n), 1)
> + )
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
> by max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
> by max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> 
> data.table 1.8.11:
> 
> > set.seed(32)
> > n <- 3
> > dt <- data.table(
> > 
> 
> + y=rnorm(n),
> + by=round( rnorm(n), 1)
> + )
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
> by max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> > 
> > dt[,
> + list(max=max(y, na.rm=TRUE)),
> + by=list(by)
> + ]
> by max
> 1: 0.7 0.01464054
> 2: 0.4 0.87328871
> 
> Best
> 
> Simon
> 
> 
> On 19 Dec 2013, at 09:05, Arunkumar Srinivasan <aragorn168b at gmail.com (mailto:aragorn168b at gmail.com)> wrote:
> 
> > Simon, sure.
> > 
> > set.seed(32)
> > n <- 3
> > dt <- data.table(
> > y=rnorm(n),
> > by=round( rnorm(n), 1)
> > )
> > 
> > dt[,
> > list(max=max(y, na.rm=TRUE)),
> > by=list(by)
> > ]
> > 
> > dt[,
> > list(max=max(y, na.rm=TRUE)),
> > by=list(by)
> > ]
> > 
> > 
> > 
> > Arun
> > 
> > On Thursday, December 19, 2013 at 8:49 AM, Simon Zehnder wrote:
> > 
> > > Arun,
> > > 
> > > if you could send me the reproducible code in copyable form I can as well try it on Mac OS X Mavericks with gcc 4.8.
> > > 
> > > Best
> > > 
> > > Simon
> > > 
> > > On 19 Dec 2013, at 08:44, Arunkumar Srinivasan <aragorn168b at gmail.com (mailto:aragorn168b at gmail.com)> wrote:
> > > 
> > > > Aha, the issue seems to be with 'uniqlist', not sure why it gives
> > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > 
> > > > 1,2,3 for you and 1,3 consistently for me. I'll revert this back to `duplist` for now. Not sure how to solve this though. I've tried it so far on 3 machines:
> > > > 
> > > > 1) OS X 10.8.5 + libvm (gcc)
> > > > 2) OS X Mavericks + Clang
> > > > 3) Debian Wheezy + gcc
> > > > 
> > > > All of them give consistent output. Man this is such a drag.
> > > > 
> > > > Arun
> > > > 
> > > > On Thursday, December 19, 2013 at 8:37 AM, Kevin Ushey wrote:
> > > > 
> > > > > Hi Arun,
> > > > > 
> > > > > Here's the output on my machine -- other information missing from
> > > > > before; it's with OSX Mavericks, with R and data.table compiled with
> > > > > Apple clang.
> > > > > 
> > > > > ---
> > > > > 
> > > > > > library(data.table, lib="/Users/kevinushey/Library/R/3.1/library")
> > > > > > set.seed(32)
> > > > > > n <- 3
> > > > > > dt <- data.table(
> > > > > + y=rnorm(n),
> > > > > + by=round( rnorm(n), 1)
> > > > > + )
> > > > > ## run one
> > > > > > byval <- list(by=dt$by)
> > > > > > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> > > > > [1] 2 3 1
> > > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > > [1] 1 2 3
> > > > > > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> > > > > [1] 1 1 1
> > > > > > (firstofeachgroup = o__[f__]) # 2,1
> > > > > [1] 2 3 1
> > > > > > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> > > > > [1] 3 1 2
> > > > > > (f__ = f__[origorder]) # 3,1
> > > > > [1] 3 1 2
> > > > > > (len__ = len__[origorder]) # 2,1
> > > > > [1] 1 1 1
> > > > > 
> > > > > ## run two
> > > > > > (o__ <- data.table:::fastorder(byval)) # 2,3,1
> > > > > [1] 1 2 3
> > > > > > (f__ = data.table:::uniqlist(byval, order=o__)) # 1,3
> > > > > [1] 1 3
> > > > > > (len__ = data.table:::uniqlengths(f__, nrow(dt))) # 2,1
> > > > > [1] 2 1
> > > > > > (firstofeachgroup = o__[f__]) # 2,1
> > > > > [1] 1 3
> > > > > > (origorder = data.table:::iradixorder(firstofeachgroup)) # 2,1
> > > > > [1] 1 2
> > > > > > (f__ = f__[origorder]) # 3,1
> > > > > [1] 1 3
> > > > > > (len__ = len__[origorder]) # 2,1
> > > > > [1] 2 1
> > > > > 
> > > > > On Wed, Dec 18, 2013 at 11:22 PM, Arunkumar Srinivasan
> > > > > <aragorn168b at gmail.com> wrote:
> > > > > > Not sure how to debug without being able to reproduce. Tried on Mac OS X
> > > > > > 10.8.5 and Debian GNU/Linux 7 (wheezy). I don't have access to a Windows
> > > > > > machine. It consistently gives me this:
> > > > > > 
> > > > > > > dt[,
> > > > > > + list(max=max(y, na.rm=TRUE)),
> > > > > > + by=list(by)
> > > > > > + ]
> > > > > > by max
> > > > > > 1: 0.7 0.01464054
> > > > > > 2: 0.4 0.87328871
> > > > > > > 
> > > > > > > dt[,
> > > > > > + list(max=max(y, na.rm=TRUE)),
> > > > > > + by=list(by)
> > > > > > + ]
> > > > > > by max
> > > > > > 1: 0.7 0.01464054
> > > > > > 2: 0.4 0.87328871
> > > > > > 
> > > > > > Can either of you provide me with the output of these steps in cases where
> > > > > > there's an error? I've commented the output I get for each step.
> > > > > > 
> > > > > > byval <- list(by=dt$by)
> > > > > > o__ <- data.table:::fastorder(byval) # 2,3,1
> > > > > > f__ = data.table:::uniqlist(byval, order=o__) # 1,3
> > > > > > len__ = data.table:::uniqlengths(f__, nrow(dt)) # 2,1
> > > > > > firstofeachgroup = o__[f__] # 2,1
> > > > > > origorder = data.table:::iradixorder(firstofeachgroup) # 2,1
> > > > > > f__ = f__[origorder] # 3,1
> > > > > > len__ = len__[origorder] # 2,1
> > > > > > 
> > > > > > 
> > > > > > Arun
> > > > > > 
> > > > > > <...snip...>
> > > > 
> > > > _______________________________________________
> > > > datatable-help mailing list
> > > > datatable-help at lists.r-forge.r-project.org
> > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 
> 

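The debugging steps quoted above (order the grouping values, find where each group starts, count group sizes, then restore first-appearance order) can be followed in plain base R. This is a rough sketch of the idea only, not data.table's internal code: the input vector and every variable name are made up for illustration, and exact `!=` comparison stands in for however the real routines compare doubles.

```r
# Hypothetical grouping column: 0.7 in row 1, 0.4 in rows 2 and 3.
by <- c(0.7, 0.4, 0.4)

# Compute an ordering instead of reordering `by` itself.
o <- order(by)                                   # 2 3 1
sorted <- by[o]

# Positions in the sorted vector where a new group starts.
starts <- which(c(TRUE, sorted[-1L] != sorted[-length(sorted)]))  # 1 3

# Group sizes: distance between consecutive group starts.
lens <- diff(c(starts, length(by) + 1L))         # 2 1

# Original row number of the first member of each group.
first_rows <- o[starts]                          # 2 1

# Put the groups back in order of first appearance in the data.
orig <- order(first_rows)                        # 2 1
starts <- starts[orig]                           # 3 1
lens <- lens[orig]                               # 1 2
```

Because `by` is never modified here, running the same steps a second time gives the same ordering, which is the property that failed in "run two" of the quoted output.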


From aragorn168b at gmail.com  Thu Dec 19 12:56:01 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 12:56:01 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <EEC4D0A02B8145EF862B5C9887B6DCE3@gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
 <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
 <BC84B9CB-2809-4344-B7C3-29A891CE11BA@uni-bonn.de>
 <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com>
 <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de>
 <EEC4D0A02B8145EF862B5C9887B6DCE3@gmail.com>
Message-ID: <83EFE01CB21543C2AFA86A4F5FAD03FC@gmail.com>

Just tested this on today's R-devel. And yes, the issue happens there. But I'm not sure it's an issue with 'data.table' per se:

On a clean session, if you do this:

require(data.table)
set.seed(32)
n <- 3
dt <- data.table(y=rnorm(n), by=round( rnorm(n), 1))

ll <- list(dt$by)
yy <- ll[[1L]]
address(dt$by) # [1] "0x7fad3c524a40"
address(ll[[1L]]) # [1] "0x7fad3c524a40"
address(yy) # [1] "0x7fad3c524a40"


You see that all three are pointing to the same address. That's why the result is wrong: internally, "yy" gets changed by reference during "fastorder", and since it shares memory with dt$by, the column itself is changed as well. "yy" is *not* supposed to point to the column; a copy should have been made.

After doing it the first time, the pointing changes back to how it is in R-stable. Not sure if this is desirable. Probably should report on R-devel.

On R-3.0.2, the same commands as above on a clean session:

require(data.table)
set.seed(32)
n <- 3
dt <- data.table(y=rnorm(n), by=round( rnorm(n), 1))

ll <- list(dt$by)
yy <- ll[[1L]]
address(dt$by) # [1] "0x7fc35b640408"
address(ll[[1L]]) # [1] "0x7fc35a0ec838"
address(yy) # [1] "0x7fc35a0ec838"
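
For anyone following along, the aliasing shown above can be checked and worked around with a defensive copy. This is only a minimal sketch, not the change that went into data.table: `copy()` and `address()` are exported data.table functions, while the names `shared` and `byval` and the idea of copying before any by-reference step are assumptions made for illustration. Whether `shared` comes out TRUE depends on the R version, as the two address listings show.

```r
library(data.table)
set.seed(32)
n <- 3
dt <- data.table(y = rnorm(n), by = round(rnorm(n), 1))

# Does wrapping the column in a list share memory with the column?
ll <- list(dt$by)
shared <- identical(address(dt$by), address(ll[[1L]]))
print(shared)  # version dependent, so no value is assumed here

# Defensive deep copy (illustrative): anything that reorders `byval$by`
# in place can no longer touch the column stored in `dt`.
byval <- list(by = copy(dt$by))
stopifnot(address(byval$by) != address(dt$by))
stopifnot(identical(byval$by, dt$by))
```

The copy costs one extra allocation per grouping column, which is the price of keeping by-reference sorting away from the user's data.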


Arun




From aragorn168b at gmail.com  Thu Dec 19 14:47:39 2013
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Thu, 19 Dec 2013 14:47:39 +0100
Subject: [datatable-help] 'by' on a numeric column produces inconsistent
 output
In-Reply-To: <83EFE01CB21543C2AFA86A4F5FAD03FC@gmail.com>
References: <CAJXgQP2nAGMrz1Amzw5mH1ekDUmKpffGPrZ=5LLES4CWWBXBLQ@mail.gmail.com>
 <6FB5193A6CDCDF499486A833B7AFBDCDB20BB8D7@ex-mbx-pro-05>
 <6ED93E37928C4E849109DB7A5615EA61@gmail.com>
 <CAJXgQP2qWYj0pfSTi_0vDoL+W3VWqq2PA4BdF_daoNF0_nKQGw@mail.gmail.com>
 <A02AE6C8D853482FB0921EC1E64BF92B@gmail.com>
 <BC84B9CB-2809-4344-B7C3-29A891CE11BA@uni-bonn.de>
 <38C67C72F30C4C738B3A33DAF700ED3B@gmail.com>
 <7A37F022-444B-4708-8947-59437F4AA090@uni-bonn.de>
 <EEC4D0A02B8145EF862B5C9887B6DCE3@gmail.com>
 <83EFE01CB21543C2AFA86A4F5FAD03FC@gmail.com>
Message-ID: <FBA51CDE1751485D86C5ED97D21467F9@gmail.com>

The issue has been fixed in commit 1054 now. Once the R-Forge build kicks in, be sure to update, especially if you're working with the R-devel version.

Arun


