[datatable-help] changing data.table by-without-by syntax to require a "by"

Matthew Dowle mdowle at mdowle.plus.com
Thu Apr 25 17:32:14 CEST 2013


 

I'd appreciate some input from others on whether they agree or not. If
you have a view, perhaps let me know off list, or on list, whichever you
prefer.

Thanks, 

Matthew 

On 25.04.2013 13:45, Eduard Antonyan wrote:

> Well, so can .I or .N or .GRP or .BY, yet those are used as special
> names, which is exactly why I suggested .J.
>
> The problem with using 'missingness' is that it already means something
> very different when i is not a join/cross: it means *don't* do a by. That
> introduces the whole case analysis one has to go through in their head
> every time, as in the OP (which of course becomes automatic after a
> while, but it's a cost nonetheless, and a particularly high one for new
> people). So I see absence of 'by' as an already taken and used signal,
> and thus something else has to be used for the new signal of cross apply
> (it doesn't have to be the specific option I mentioned above). This is
> exactly why I find optionally turning off this behavior unsatisfactory,
> and I don't see that as a solution to this at all.
>
> I think in the x+y context the appropriate analog is: what if that added
> x and y normally, but when x and y were data.frames it did element by
> element multiplication instead? Yes, that's possible to do, and possible
> to document, but it's not a good idea, because it takes the place of
> adding them element by element. The recycling behavior doesn't do that:
> what it does is say that it doesn't really make sense to add them as is,
> but we can do that after recycling, so let's recycle. It doesn't take the
> place of another existing way of adding vectors.
> 
> On Apr 25, 2013, at 4:28 AM, Matthew Dowle <mdowle at mdowle.plus.com [15]> wrote:
> 
>> I see what you're getting at. But .J may be a column name, which is the
>> current meaning of by = single symbol. And why .J? If not .J, or any
>> single symbol, what else instead? A character value such as by="irows"
>> is taken to mean the "irows" column currently (for consistency with
>> by="colA,colB,colC"). But some signal needs to be passed to by=, then
>> (you're suggesting), to trigger the cross apply by each i row.
>> Currently, that signal is missingness (which I like, rely on, and use
>> with join inherited scope).
>>
>> As I wrote in the S.O. thread, I'm happy to make it optional (i.e. an
>> option to turn off by-without-by), since there is no downside. But
>> you've continued to argue for a change to the default, iiuc.
>> 
>> Maybe it helps to consider:
>>
>> x+y
>>
>> Fundamentally in R this depends on what x and y are. Most of us probably
>> assume (as a first thought) that x and y are vectors and know that this
>> will apply "+" elementwise, recycling y if necessary. In R we like and
>> write code like this all the time. I think of X[Y, j] in the same way: j
>> is the operation (like +) which is applied for each row of Y. If you need
>> j for the entire set that Y joins to, then like a FAQ says, make j
>> missing too and it's X[Y][,j]. But providing a way to make X[Y,j] do the
>> same as X[Y][,j] would be nice and is on the list: drop=TRUE would do
>> that (as someone mentioned on the S.O. thread). So maybe the new option
>> would be datatable.drop (but with default FALSE not TRUE). If you wanted
>> to turn off by-without-by you might set options(datatable.drop=TRUE).
>> Then you can use data.table how you prefer (explicit by) and I can use
>> it how I prefer.
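
For readers less familiar with R, the elementwise "+" with recycling mentioned above can be seen in a minimal base R sketch. The vectors and their values are made up for illustration and do not come from the thread.

```r
# Illustrative vectors (assumed values, not from the thread).
x <- 1:6
y <- c(10, 20)

# "+" is applied elementwise; y is shorter than x, so it is recycled
# (reused from its start) until it covers the length of x.
z <- x + y
print(z)
```

The analogy drawn in the message is that j in X[Y, j] plays the role of "+", applied once per row of Y.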
>> 
>> I'm happy to add the argument to [.data.table, and make its default
>> changeable via a global option in the usual way.
>>
>> Matthew
>>
>> On 25.04.2013 05:16, Eduard Antonyan wrote:
>> 
>>> That's really interesting, I can't currently think of another way of
>>> doing that, as after X[Y] is done the necessary information is lost.
>>>
>>> To retain that functionality and achieve better readability, as in the
>>> OP, I think something along the lines of X[Y, head(.SD, i.top), by=.J]
>>> would be a good replacement for the current syntax.
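
One way to see why the information is lost is a base R sketch using the X and Y from the example quoted further down in this thread, where Y has two rows with a = 1. This is only an emulation with data.frames, not data.table's implementation, and the names joined and after_join are made up for illustration. Once the join result is grouped by a, the two i rows with a = 1 can no longer be told apart, so only the first top value is used for that group.

```r
# Same data as the quoted example, built as plain data.frames.
X <- data.frame(a = rep(1:3, 5), b = 1:15)
X <- X[order(X$a), ]                      # mimic key = "a"
Y <- data.frame(a = c(1, 2, 1), top = c(3, 4, 2))

# Emulate the plain join X[Y]: for each row of Y, X's matching rows
# with that row's top value attached.
joined <- do.call(rbind, lapply(seq_len(nrow(Y)), function(k) {
  cbind(X[X$a == Y$a[k], ], top = Y$top[k])
}))

# Group afterwards by a, as in X[Y][, head(.SD, top[1]), by = a].
# Both i rows with a = 1 fall into one group here.
after_join <- do.call(rbind, lapply(split(joined, joined$a), function(g) {
  head(g, g$top[1])
}))
rownames(after_join) <- NULL
print(after_join)
```

This grouped version returns 7 rows, whereas the quoted X[Y, head(.SD, i.top)] output has 9, one block per row of Y.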
>>> 
>>> On Apr 24, 2013, at 6:01 PM, Eduard Antonyan <eduard.antonyan at gmail.com [14]> wrote:
>>>
>>>> that's an interesting example - I didn't realize current behavior
>>>> would do that, I'm not at a PC anymore but I'll definitely think about
>>>> it and report back, as it's not immediately obvious to me
>>>>
>>>> On Wed, Apr 24, 2013 at 5:50 PM, Matthew Dowle <mdowle at mdowle.plus.com [13]> wrote:
>>>>
>>>>> i. prefix is just a robust way to reference join inherited columns:
>>>>> the 'top' column in the i table. Like table aliases in SQL.
>>>>> 
>>>>> What about this? :
>>>>>
>>>>> 1> X = data.table(a=1:3,b=1:15, key="a")
>>>>> 1> X
>>>>>     a  b
>>>>>  1: 1  1
>>>>>  2: 1  4
>>>>>  3: 1  7
>>>>>  4: 1 10
>>>>>  5: 1 13
>>>>>  6: 2  2
>>>>>  7: 2  5
>>>>>  8: 2  8
>>>>>  9: 2 11
>>>>> 10: 2 14
>>>>> 11: 3  3
>>>>> 12: 3  6
>>>>> 13: 3  9
>>>>> 14: 3 12
>>>>> 15: 3 15
>>>>>
>>>>> 1> Y = data.table(a=c(1,2,1), top=c(3,4,2))
>>>>>
>>>>> 1> Y
>>>>>    a top
>>>>> 1: 1   3
>>>>> 2: 2   4
>>>>> 3: 1   2
>>>>> 1> X[Y, head(.SD,i.top)]
>>>>>    a  b
>>>>> 1: 1  1
>>>>> 2: 1  4
>>>>> 3: 1  7
>>>>> 4: 2  2
>>>>> 5: 2  5
>>>>> 6: 2  8
>>>>> 7: 2 11
>>>>> 8: 1  1
>>>>> 9: 1  4
>>>>> 1>
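
The result above can be reproduced in base R with an explicit loop over the rows of Y, which may make the by-without-by semantics easier to see. This is a sketch under the assumption that each row of Y forms its own group; it uses plain data.frames and is not how data.table implements the join. The name per_i_row is made up for illustration.

```r
# Same data as in the transcript, built as plain data.frames.
X <- data.frame(a = rep(1:3, 5), b = 1:15)
X <- X[order(X$a), ]                      # mimic key = "a"
Y <- data.frame(a = c(1, 2, 1), top = c(3, 4, 2))

# One group per row of Y: take the first `top` rows of X whose a
# matches that row's a, then stack the pieces in Y's row order.
per_i_row <- do.call(rbind, lapply(seq_len(nrow(Y)), function(k) {
  head(X[X$a == Y$a[k], ], Y$top[k])
}))
rownames(per_i_row) <- NULL
print(per_i_row)
```

The third row of Y (a = 1, top = 2) contributes its own two rows at the end, which is the part a later grouping by a cannot recover.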
>>>>> 
>>>>> On 24.04.2013 23:43, Eduard Antonyan wrote:
>>>>>
>>>>>> I assumed they meant create a table :)
>>>>>> that looks cool, what's i.top ? I can get a result very similar to
>>>>>> yours by writing:
>>>>>> X[Y][, head(.SD, top[1]), by = a]
>>>>>> and I probably would want the following to produce your result (this
>>>>>> might depend a little on what exactly i.top is):
>>>>>> X[Y, head(.SD, i.top), by = a]
>>>>>>
>>>>>> On Wed, Apr 24, 2013 at 5:28 PM, Matthew Dowle <mdowle at mdowle.plus.com [12]> wrote:
>>>>>> 
>>>>>>> That sentence on that linked webpage seems incorrect English, since
>>>>>>> table is a noun not a verb. Should "table" be "join" perhaps?
>>>>>>>
>>>>>>> Anyway, by-without-by is often used with join inherited scope (JIS).
>>>>>>> For example, translating their example:
>>>>>>> 
>>>>>>> 1> X = data.table(a=1:3,b=1:15, key="a")
>>>>>>> 1> X
>>>>>>>     a  b
>>>>>>>  1: 1  1
>>>>>>>  2: 1  4
>>>>>>>  3: 1  7
>>>>>>>  4: 1 10
>>>>>>>  5: 1 13
>>>>>>>  6: 2  2
>>>>>>>  7: 2  5
>>>>>>>  8: 2  8
>>>>>>>  9: 2 11
>>>>>>> 10: 2 14
>>>>>>> 11: 3  3
>>>>>>> 12: 3  6
>>>>>>> 13: 3  9
>>>>>>> 14: 3 12
>>>>>>> 15: 3 15
>>>>>>> 1> Y = data.table(a=c(1,2), top=c(3,4))
>>>>>>> 1> Y
>>>>>>>    a top
>>>>>>> 1: 1   3
>>>>>>> 2: 2   4
>>>>>>> 1> X[Y, head(.SD,i.top)]
>>>>>>>    a  b
>>>>>>> 1: 1  1
>>>>>>> 2: 1  4
>>>>>>> 3: 1  7
>>>>>>> 4: 2  2
>>>>>>> 5: 2  5
>>>>>>> 6: 2  8
>>>>>>> 7: 2 11
>>>>>>> 1>
>>>>>>> 
>>>>>>> If there was no by-without-by (analogous to CROSS BY), then how would
>>>>>>> that be done?
>>>>>>>
>>>>>>> On 24.04.2013 22:22, Eduard Antonyan wrote:
>>>>>>> 
>>>>>>>> By that you mean current behavior? You'd get current behavior by
>>>>>>>> explicitly specifying the appropriate "by" (i.e. "by" equal to the
>>>>>>>> key).
>>>>>>>> Btw, I'm trying to understand SQL CROSS APPLY vs JOIN using
>>>>>>>> http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/ [9],
>>>>>>>> and I can't figure out how by-without-by (or with by-with-by for that
>>>>>>>> matter:) ) helps with e.g. the first example there:
>>>>>>>> "We table table1 and table2. table1 has a column called rowcount.
>>>>>>>>
>>>>>>>> For each row from table1 we need to select first rowcount rows from
>>>>>>>> table2, ordered by table2.id [10]"
>>>>>>>>
>>>>>>>> On Wed, Apr 24, 2013 at 4:01 PM, Matthew Dowle <mdowle at mdowle.plus.com [11]> wrote:
>>>>>>>> 
>>>>>>>>> But then what would be analogous to CROSS APPLY in SQL?
>>>>>>>>>
>>>>>>>>> > I'd agree with Eduard, although it's probably too late to change
>>>>>>>>> > behavior now. Maybe for data.table.2? Eduard's proposal seems more
>>>>>>>>> > closely aligned with SQL behavior as well (SELECT/JOIN, then GROUP,
>>>>>>>>> > but only if requested).
>>>>>>>>> >
>>>>>>>>> > S.
>>>>>>>>> >
>>>>>>>>> >> Date: Mon, 22 Apr 2013 08:17:59 -0700
>>>>>>>>> >> From: eduard.antonyan at gmail.com [1]
>>>>>>>>> >> To: datatable-help at lists.r-forge.r-project.org [2]
>>>>>>>>> >> Subject: Re: [datatable-help] changing data.table by-without-by
>>>>>>>>> >> syntax to require a "by"
>>>>>>>>> >>
>>>>>>>>> >> I think you're missing the point Michael. Just because it's
>>>>>>>>> >> possible to do it the way it's done now, doesn't mean that's the
>>>>>>>>> >> best way, as I've tried to argue in the OP. I don't think you've
>>>>>>>>> >> addressed the issue of unnecessary complexity pointed out in OP.
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> View this message in context:
>>>>>>>>> >> http://r.789695.n4.nabble.com/changing-data-table-by-without-by-syntax-to-require-a-by-tp4664770p4664990.html [3]
>>>>>>>>> >> Sent from the datatable-help mailing list archive at Nabble.com [4].
>>>>>>>>> >> _______________________________________________
>>>>>>>>> >> datatable-help mailing list
>>>>>>>>> >> datatable-help at lists.r-forge.r-project.org [5]
>>>>>>>>> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [6]

 

Links:
------
[1] mailto:eduard.antonyan at gmail.com
[2] mailto:datatable-help at lists.r-forge.r-project.org
[3] http://r.789695.n4.nabble.com/changing-data-table-by-without-by-syntax-to-require-a-by-tp4664770p4664990.html
[4] http://Nabble.com
[5] mailto:datatable-help at lists.r-forge.r-project.org
[6] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[7] mailto:datatable-help at lists.r-forge.r-project.org
[8] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[9] http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/
[10] http://table2.id
[11] mailto:mdowle at mdowle.plus.com
[12] mailto:mdowle at mdowle.plus.com
[13] mailto:mdowle at mdowle.plus.com
[14] mailto:eduard.antonyan at gmail.com
[15] mailto:mdowle at mdowle.plus.com