[datatable-help] merging syntax

Eduard Antonyan eduard.antonyan at gmail.com
Thu Jun 20 22:24:02 CEST 2013


I think the following will achieve the same as your merge's:

setkey(g_ctpty, ctpty_head)
setkey(links, ctpty_head)
setkey(g_iss, iss_head)
g_iss[data.table(g_ctpty[links], key = "iss_head")]

And in general, merge(X, Y, all.x = TRUE) is (more or less) equivalent to
Y[X]


On Thu, Jun 20, 2013 at 11:43 AM, Ivan Alves <papucho at me.com> wrote:

> Many thanks to both Eduard and Frank on the issue of the needed key.  One
> aspect of the merging that is not clear is how to do 'inner' vs. 'outer'
> 'joins'  (like in SQL).  Whereas it works with merge (using the all.x=TRUE
> option), how is it done with data.table?  In the improved example below
>
> g_ctpty = data.table(ctpty_head=c("a","b","c","d"), ctpty_cty=c("US","DE",
> "JP","CN"))
> g_iss = data.table(iss_head=c("a","b","c","d"), iss_cty=c("US","DE","JP",
> "CN"))
> links = data.table(ctpty_head=c("a","b","c"), iss_head=c("b","b","a"))
> setkey(g_ctpty,ctpty_head)
> setkey(g_iss,iss_head)
> merge(
>   merge(links, g_ctpty, all.x = TRUE, by = "ctpty_head"),
>   g_iss,
>   all.x = TRUE, by = "iss_head"
> )
> g_iss[g_ctpty[links]] # error
> links[g_ctpty][g_iss] # still error
> setkey(links,ctpty_head,iss_head) # keys are needed
> links[g_ctpty][g_iss] # how to get inner join?
>
> How do it not include the last line in the link? (Again, it works with
> merge).  Many thanks.
>
> Ivan
>
>
> On 18 Jun 2013, at 17:34, Eduard Antonyan <eduard.antonyan at gmail.com>
> wrote:
>
> Frank has already answered why you're getting the results you are.
>
> Re "how do I tell DT to do *separate matchings at each join*?":
>
> Currently you have to use 'merge' or 'data.table', see this -
> https://r-forge.r-project.org/tracker/?func=detail&atid=978&aid=4675&group_id=240 and
> the SO link inside.
>
>
> On Tue, Jun 18, 2013 at 10:08 AM, Ivan Alves <papucho at me.com> wrote:
>
>> Hi Frank,
>>
>> Many thanks for the thoughts.
>>
>> It is something that has to do with the keys, that is for sure (see
>> below). A double DT join does not understand that it has to do one join by
>> one variable and the other by another variable: the output simply has lines
>> where the two keys are the same (ctpty_head==iss_head),which is of course
>> not optimal.
>>
>> how do I tell DT to do *separate matchings at each join*?  Setting setkey
>> (links,ctpty_head,iss_head) before the join does not work either.
>>
>> > key(g_ctpty[links])
>> NULL
>> > key(g_ctpty)
>> [1] "ctpty_head"
>> > key(g_iss)
>> [1] "iss_head"
>> > key(links)
>> NULL
>>
>> Hi Arunkumar,
>>
>> An example would look like follows:
>>
>> g_ctpty = data.table(ctpty_head=c("a","b","c"),
>> ctpty_cty=c("US","DE","JP"))
>> g_iss = data.table(iss_head=c("a","b","c"), iss_cty=c("US","DE","JP"))
>> links = data.table(ctpty_head=c("a","b","c"), iss_head=c("b","b","a"))
>>
>> merge(
>> +     merge(links, g_ctpty, all.x = TRUE, by = "ctpty_head"),
>> +     g_iss,
>> +     all.x = TRUE, by = "iss_head"
>> +   )
>>    iss_head ctpty_head ctpty_cty iss_cty
>> 1:        a          c        JP      US
>> 2:        b          a        US      DE
>> 3:        b          b        DE      DE
>> > g_iss[g_ctpty[links]]
>>    iss_head iss_cty ctpty_cty iss_head.1
>> 1:        a      US        US          b
>> 2:        b      DE        DE          b
>> 3:        c      JP        JP          a
>> > setkey(links,ctpty_head,iss_head)
>> > g_iss[g_ctpty[links]]
>>    iss_head iss_cty ctpty_cty iss_head.1
>> 1:        a      US        US          b
>> 2:        b      DE        DE          b
>> 3:        c      JP        JP          a
>>
>> On 17 Jun 2013, at 14:36, Frank Erickson <FErickson at psu.edu> wrote:
>>
>> I think that key(g_ctpty[links]) == key(g_ctpty) == "ctpty_head", which
>> is used when you do your second merge with [, instead of "iss_head". You
>> can check this by running key(g_ctpty[links]) . --Frank
>>
>>
>> On Mon, Jun 17, 2013 at 1:57 AM, Arunkumar Srinivasan <
>> aragorn168b at gmail.com> wrote:
>>
>>>  Since you have the data as well, why not provide it (a small part at
>>> least with which your issue is reproducible)? Isn't it much easier than to
>>> ask everyone who's willing to help to create a data and test your code?
>>>
>>> Arun
>>>
>>> On Monday, June 17, 2013 at 8:54 AM, Ivan Alves wrote:
>>>
>>> Dear all,
>>>
>>> I am not sure I understand the syntax for merging data.tables. I have
>>> keyed the two 'satelite' tables from which I want to match information to
>>> the main table 'links'
>>>
>>> g_ctpty <- gultimate[,list(ctpty_head,ctpty_cty)]
>>> setkey(g_ctpty,ctpty_head)
>>> g_iss <- gultimate[,list(iss_head,iss_cty)]
>>> setkey(g_iss,iss_head)
>>>
>>> Why are the two below not equivalent?
>>>
>>> This works:
>>>
>>> data = merge(
>>> merge(links, g_ctpty, all.x = TRUE, by = "ctpty_head"),
>>>  g_iss,
>>> all.x = TRUE, by = "iss_head"
>>> ),
>>>
>>> And this does not:
>>>
>>> data = g_iss[g_ctpty[links]],
>>>
>>> Any guidance would be appreciated.
>>> Kind regards,
>>> Ivan
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130620/32abc5e1/attachment-0001.html>


More information about the datatable-help mailing list