[datatable-help] changing data.table by-without-by syntax to require a "by"

Matthew Dowle mdowle at mdowle.plus.com
Thu Apr 25 11:28:43 CEST 2013


 

I see what you're getting at. But .J may be a column name, and a column
name is the current meaning of by = single symbol. And why .J? If not .J
or some other single symbol, then what instead? A character value such as
by="irows" is currently taken to mean the "irows" column (for consistency
with by="colA,colB,colC"). But then (as you're suggesting) some signal
needs to be passed to by= to trigger the cross apply by each i row.
Currently, that signal is missingness (which I like, rely on, and use
with join inherited scope).

As I wrote in the S.O. thread, I'm happy to
make it optional (i.e. an option to turn off by-without-by), since there
is no downside. But you've continued to argue for a change to the
default, iiuc.

Maybe it helps to consider :

 x+y

Fundamentally in R this depends on what x and y are. Most of us probably
assume (as a first thought) that x and y are vectors, and know that this
applies "+" elementwise, recycling y if necessary. In R we like and write
code like this all the time. I think of X[Y, j] in the same way: j is the
operation (like +) which is applied for each row of Y. If you need j for
the entire set that Y joins to, then as a FAQ says, leave j out of the
join and apply it afterwards: X[Y][,j]. But providing a way to make
X[Y,j] do the same as X[Y][,j] would be nice and is on the list:
drop=TRUE would do that (as someone mentioned on the S.O. thread). So
maybe the new option would be datatable.drop (but with default FALSE not
TRUE). If you wanted to turn off by-without-by you might set
options(datatable.drop=TRUE). Then you can use data.table how you prefer
(explicit by) and I can use it how I prefer.
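
To make the x+y analogy concrete, here is a minimal base R sketch. The
vectors are made up for illustration and nothing in it is specific to
data.table; it only shows the elementwise application with recycling
described above.

```r
# Made-up vectors for illustration; base R only.
x <- 1:6
y <- c(10, 20)

# "+" is applied elementwise; the shorter vector y is recycled along x.
z <- x + y
print(z)
```

Here y is reused three times to match the length of x, in the same way
that j is thought of above as being applied once for each row of Y.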

I'm happy to add the argument to [.data.table, and make its
default changeable via a global option in the usual way. 

Matthew 

On 25.04.2013 05:16, Eduard Antonyan wrote:

> That's really interesting, I can't currently think of another way of
> doing that as after X[Y] is done the necessary information is lost.
> To retain that functionality and achieve better readability, as in OP,
> I think smth along the lines of X[Y, head(.SD, i.top), by=.J] would be
> a good replacement for current syntax.
>
> On Apr 24, 2013, at 6:01 PM, Eduard Antonyan
> <eduard.antonyan at gmail.com [14]> wrote:
>
>> that's an interesting example - I didn't realize current behavior
>> would do that, I'm not at a PC anymore but I'll definitely think about
>> it and report back, as it's not immediately obvious to me
>>
>> On Wed, Apr 24, 2013 at 5:50 PM, Matthew Dowle
>> <mdowle at mdowle.plus.com [13]> wrote:
>> 
>>> i. prefix is just a robust way to reference join inherited columns:
>>> the 'top' column in the i table. Like table aliases in SQL.
>>>
>>> What about this? :
>>>
>>> 1> X = data.table(a=1:3,b=1:15, key="a")
>>> 1> X
>>>     a  b
>>>  1: 1  1
>>>  2: 1  4
>>>  3: 1  7
>>>  4: 1 10
>>>  5: 1 13
>>>  6: 2  2
>>>  7: 2  5
>>>  8: 2  8
>>>  9: 2 11
>>> 10: 2 14
>>> 11: 3  3
>>> 12: 3  6
>>> 13: 3  9
>>> 14: 3 12
>>> 15: 3 15
>>>
>>> 1> Y = data.table(a=c(1,2,1), top=c(3,4,2))
>>>
>>> 1> Y
>>>    a top
>>> 1: 1   3
>>> 2: 2   4
>>> 3: 1   2
>>> 1> X[Y, head(.SD,i.top)]
>>>    a  b
>>> 1: 1  1
>>> 2: 1  4
>>> 3: 1  7
>>> 4: 2  2
>>> 5: 2  5
>>> 6: 2  8
>>> 7: 2 11
>>> 8: 1  1
>>> 9: 1  4
>>> 1>
>>> 
>>> On 24.04.2013 23:43, Eduard Antonyan wrote:
>>>
>>>> I assumed they meant create a table :)
>>>> that looks cool, what's i.top ? I can get a result very similar to
>>>> yours by writing:
>>>> X[Y][, head(.SD, top[1]), by = a]
>>>> and I probably would want the following to produce your result (this
>>>> might depend a little on what exactly i.top is):
>>>> X[Y, head(.SD, i.top), by = a]
>>>>
>>>> On Wed, Apr 24, 2013 at 5:28 PM, Matthew Dowle
>>>> <mdowle at mdowle.plus.com [12]> wrote:
>>>> 
>>>>> That sentence on that linked webpage seems incorrect English, since
>>>>> table is a noun not a verb. Should "table" be "join" perhaps?
>>>>>
>>>>> Anyway, by-without-by is often used with join inherited scope (JIS).
>>>>> For example, translating their example :
>>>>>
>>>>> 1> X = data.table(a=1:3,b=1:15, key="a")
>>>>> 1> X
>>>>>     a  b
>>>>>  1: 1  1
>>>>>  2: 1  4
>>>>>  3: 1  7
>>>>>  4: 1 10
>>>>>  5: 1 13
>>>>>  6: 2  2
>>>>>  7: 2  5
>>>>>  8: 2  8
>>>>>  9: 2 11
>>>>> 10: 2 14
>>>>> 11: 3  3
>>>>> 12: 3  6
>>>>> 13: 3  9
>>>>> 14: 3 12
>>>>> 15: 3 15
>>>>> 1> Y = data.table(a=c(1,2), top=c(3,4))
>>>>> 1> Y
>>>>>    a top
>>>>> 1: 1   3
>>>>> 2: 2   4
>>>>> 1> X[Y, head(.SD,i.top)]
>>>>>    a  b
>>>>> 1: 1  1
>>>>> 2: 1  4
>>>>> 3: 1  7
>>>>> 4: 2  2
>>>>> 5: 2  5
>>>>> 6: 2  8
>>>>> 7: 2 11
>>>>> 1>
>>>>>
>>>>> If there was no by-without-by (analogous to CROSS BY), then how
>>>>> would that be done?
>>>>> 
>>>>> On 24.04.2013 22:22, Eduard Antonyan wrote:
>>>>>
>>>>>> By that you mean current behavior? You'd get current behavior by
>>>>>> explicitly specifying the appropriate "by" (i.e. "by" equal to the
>>>>>> key).
>>>>>> Btw, I'm trying to understand SQL CROSS APPLY vs JOIN using
>>>>>> http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/ [9],
>>>>>> and I can't figure out how by-without-by (or with by-with-by for
>>>>>> that matter:) ) helps with e.g. the first example there:
>>>>>> "We table table1 and table2. table1 has a column called rowcount.
>>>>>>
>>>>>> For each row from table1 we need to select first rowcount rows from
>>>>>> table2, ordered by table2.id [10]"
>>>>>>
>>>>>> On Wed, Apr 24, 2013 at 4:01 PM, Matthew Dowle
>>>>>> <mdowle at mdowle.plus.com [11]> wrote:
>>>>>>

>>>>>>> But then what would be analogous to CROSS APPLY in SQL?
>>>>>>>
>>>>>>> > I'd agree with Eduard, although it's probably too late to change
>>>>>>> > behavior now. Maybe for data.table.2? Eduard's proposal seems more
>>>>>>> > closely aligned with SQL behavior as well (SELECT/JOIN, then GROUP,
>>>>>>> > but only if requested).
>>>>>>> >
>>>>>>> > S.
>>>>>>> >
>>>>>>> >> Date: Mon, 22 Apr 2013 08:17:59 -0700
>>>>>>> >> From: eduard.antonyan at gmail.com [1]
>>>>>>> >> To: datatable-help at lists.r-forge.r-project.org [2]
>>>>>>> >> Subject: Re: [datatable-help] changing data.table by-without-by
>>>>>>> >> syntax to require a "by"
>>>>>>> >>
>>>>>>> >> I think you're missing the point Michael. Just because it's
>>>>>>> >> possible to do it the way it's done now, doesn't mean that's the
>>>>>>> >> best way, as I've tried to argue in the OP. I don't think you've
>>>>>>> >> addressed the issue of unnecessary complexity pointed out in OP.
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> View this message in context:
>>>>>>> >> http://r.789695.n4.nabble.com/changing-data-table-by-without-by-syntax-to-require-a-by-tp4664770p4664990.html [3]
>>>>>>> >> Sent from the datatable-help mailing list archive at Nabble.com [4].
>>>>>>> >> _______________________________________________
>>>>>>> >> datatable-help mailing list
>>>>>>> >> datatable-help at lists.r-forge.r-project.org [5]
>>>>>>> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [6]
>>>>>>> > _______________________________________________
>>>>>>> > datatable-help mailing list
>>>>>>> > datatable-help at lists.r-forge.r-project.org [7]
>>>>>>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help [8]

 

Links:
------
[1] mailto:eduard.antonyan at gmail.com
[2] mailto:datatable-help at lists.r-forge.r-project.org
[3] http://r.789695.n4.nabble.com/changing-data-table-by-without-by-syntax-to-require-a-by-tp4664770p4664990.html
[4] http://Nabble.com
[5] mailto:datatable-help at lists.r-forge.r-project.org
[6] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[7] mailto:datatable-help at lists.r-forge.r-project.org
[8] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
[9] http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/
[10] http://table2.id
[11] mailto:mdowle at mdowle.plus.com
[12] mailto:mdowle at mdowle.plus.com
[13] mailto:mdowle at mdowle.plus.com
[14] mailto:eduard.antonyan at gmail.com