[datatable-help] changing data.table by-without-by syntax to require a "by"

Eduard Antonyan eduard.antonyan at gmail.com
Thu Apr 25 06:16:24 CEST 2013


That's really interesting. I can't currently think of another way of doing
that, because once X[Y] has been evaluated the necessary information is lost.

To retain that functionality and achieve better readability, as in the OP, I
think something along the lines of X[Y, head(.SD, i.top), by=.J] would be a
good replacement for the current syntax.
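
For readers following along, the difference discussed here can be sketched in
base R. This is only an emulation under stated assumptions: it uses plain data
frames rather than data.table, and the names per_row and after_join are made
up for illustration. The first result mimics evaluating j once per row of Y
(the output quoted for X[Y, head(.SD, i.top)] in this thread); the second
mimics joining first and grouping afterwards, as in
X[Y][, head(.SD, top[1]), by = a].

```r
# Base-R emulation only; no data.table calls are used here.
X <- data.frame(a = rep(1:3, times = 5), b = 1:15)
X <- X[order(X$a), ]                       # mimic key = "a"
Y <- data.frame(a = c(1, 2, 1), top = c(3, 4, 2))

# Per-row evaluation: one group for every row of Y, so each row's own
# 'top' value is still available when head() is applied.
per_row <- do.call(rbind, lapply(seq_len(nrow(Y)), function(i) {
  head(X[X$a == Y$a[i], ], Y$top[i])
}))

# Join first (rows in Y order), then group by 'a' afterwards.
joined <- do.call(rbind, lapply(seq_len(nrow(Y)), function(i) {
  cbind(X[X$a == Y$a[i], ], top = Y$top[i])
}))
after_join <- do.call(rbind, lapply(split(joined, joined$a), function(g) {
  head(g[, c("a", "b")], g$top[1])          # only the first 'top' per group is used
}))

nrow(per_row)     # rows kept when j runs once per row of Y
nrow(after_join)  # rows kept when grouping happens after the join
```

With Y containing a = 1 twice, the first form keeps 9 rows and the second
only 7: grouping by a after the join merges the two a = 1 rows of Y, so the
second 'top' value can no longer be applied separately.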


On Apr 24, 2013, at 6:01 PM, Eduard Antonyan <eduard.antonyan at gmail.com>
wrote:

That's an interesting example. I didn't realize the current behavior would do
that. I'm not at a PC anymore, but I'll definitely think about it and report
back, as it's not immediately obvious to me.


On Wed, Apr 24, 2013 at 5:50 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> The i. prefix is just a robust way to reference join inherited columns:
> here, the 'top' column in the i table. It works like table aliases in SQL.
>
> What about this? :
>
> 1> X = data.table(a=1:3,b=1:15, key="a")
> 1> X
>     a  b
>  1: 1  1
>  2: 1  4
>  3: 1  7
>  4: 1 10
>  5: 1 13
>  6: 2  2
>  7: 2  5
>  8: 2  8
>  9: 2 11
> 10: 2 14
> 11: 3  3
> 12: 3  6
> 13: 3  9
> 14: 3 12
> 15: 3 15
> 1> Y = data.table(a=c(1,2,1), top=c(3,4,2))
> 1> Y
>    a top
> 1: 1   3
> 2: 2   4
> 3: 1   2
> 1> X[Y, head(.SD,i.top)]
>    a  b
> 1: 1  1
> 2: 1  4
> 3: 1  7
> 4: 2  2
> 5: 2  5
> 6: 2  8
> 7: 2 11
> 8: 1  1
> 9: 1  4
> 1>
>
>
>
> On 24.04.2013 23:43, Eduard Antonyan wrote:
>
> I assumed they meant create a table :)
> That looks cool. What's i.top? I can get a result very similar to yours
> by writing:
> X[Y][, head(.SD, top[1]), by = a]
> and I probably would want the following to produce your result (this might
> depend a little on what exactly i.top is):
> X[Y, head(.SD, i.top), by = a]
>
>
> On Wed, Apr 24, 2013 at 5:28 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
>>
>>
>> That sentence on that linked webpage seems incorrect English, since table
>> is a noun not a verb.  Should "table" be "join" perhaps?
>>
>> Anyway, by-without-by is often used with join inherited scope (JIS).  For
>> example, translating their example :
>>
>> 1> X = data.table(a=1:3,b=1:15, key="a")
>> 1> X
>>     a  b
>>  1: 1  1
>>  2: 1  4
>>  3: 1  7
>>  4: 1 10
>>  5: 1 13
>>  6: 2  2
>>  7: 2  5
>>  8: 2  8
>>  9: 2 11
>> 10: 2 14
>> 11: 3  3
>> 12: 3  6
>> 13: 3  9
>> 14: 3 12
>> 15: 3 15
>> 1> Y = data.table(a=c(1,2), top=c(3,4))
>> 1> Y
>>    a top
>> 1: 1   3
>> 2: 2   4
>> 1> X[Y, head(.SD,i.top)]
>>    a  b
>> 1: 1  1
>> 2: 1  4
>> 3: 1  7
>> 4: 2  2
>> 5: 2  5
>> 6: 2  8
>> 7: 2 11
>> 1>
>>
>>
>>
>> If there were no by-without-by (analogous to CROSS APPLY), then how would
>> that be done?
>>
>>
>>
>> On 24.04.2013 22:22, Eduard Antonyan wrote:
>>
>> By that you mean current behavior? You'd get current behavior by
>> explicitly specifying the appropriate "by" (i.e. "by" equal to the key).
>> Btw, I'm trying to understand SQL CROSS APPLY vs JOIN using
>> http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/, and I
>> can't figure out how by-without-by (or by-with-by, for that matter :))
>> helps with e.g. the first example there:
>> "We table table1 and table2. table1 has a column called rowcount.
>>
>> For each row from table1 we need to select first rowcount rows from
>> table2, ordered by table2.id"
>>
>>
>>
>>
>> On Wed, Apr 24, 2013 at 4:01 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>
>>> But then what would be analogous to CROSS APPLY in SQL?
>>>
>>> > I'd agree with Eduard, although it's probably too late to change behavior
>>> > now.  Maybe for data.table.2?  Eduard's proposal seems more closely
>>> > aligned with SQL behavior as well (SELECT/JOIN, then GROUP, but only if
>>> > requested).
>>> >
>>> > S.
>>> >
>>> >> Date: Mon, 22 Apr 2013 08:17:59 -0700
>>> >> From: eduard.antonyan at gmail.com
>>> >> To: datatable-help at lists.r-forge.r-project.org
>>> >> Subject: Re: [datatable-help] changing data.table by-without-by syntax to require a "by"
>>> >>
>>> >> I think you're missing the point, Michael. Just because it's possible to
>>> >> do it the way it's done now doesn't mean that's the best way, as I've
>>> >> tried to argue in the OP. I don't think you've addressed the issue of
>>> >> unnecessary complexity pointed out in the OP.
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> View this message in context:
>>> >>
>>> http://r.789695.n4.nabble.com/changing-data-table-by-without-by-syntax-to-require-a-by-tp4664770p4664990.html
>>> >> Sent from the datatable-help mailing list archive at Nabble.com.
>>> >> _______________________________________________
>>> >> datatable-help mailing list
>>> >> datatable-help at lists.r-forge.r-project.org
>>> >>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>> >
>>>
>>>
>>>
>>
>>
>
>
>