[datatable-help] merging syntax

Ivan Alves papucho at me.com
Tue Jun 18 17:08:55 CEST 2013


Hi Frank,

Many thanks for the thoughts.

It is something that has to do with the keys, that is for sure (see below). A double DT join does not understand that it has to do one join by one variable and the other by another variable: the output simply has lines where the two keys are the same (ctpty_head==iss_head),which is of course not optimal.

how do I tell DT to do separate matchings at each join?  Setting setkey(links,ctpty_head,iss_head) before the join does not work either.

> key(g_ctpty[links])
NULL
> key(g_ctpty)
[1] "ctpty_head"
> key(g_iss)
[1] "iss_head"
> key(links)
NULL

Hi Arunkumar,

An example would look like follows:

g_ctpty = data.table(ctpty_head=c("a","b","c"), ctpty_cty=c("US","DE","JP"))
g_iss = data.table(iss_head=c("a","b","c"), iss_cty=c("US","DE","JP"))
links = data.table(ctpty_head=c("a","b","c"), iss_head=c("b","b","a"))

merge(
+     merge(links, g_ctpty, all.x = TRUE, by = "ctpty_head"),
+     g_iss,
+     all.x = TRUE, by = "iss_head"
+   )
   iss_head ctpty_head ctpty_cty iss_cty
1:        a          c        JP      US
2:        b          a        US      DE
3:        b          b        DE      DE
> g_iss[g_ctpty[links]]
   iss_head iss_cty ctpty_cty iss_head.1
1:        a      US        US          b
2:        b      DE        DE          b
3:        c      JP        JP          a
> setkey(links,ctpty_head,iss_head)
> g_iss[g_ctpty[links]]
   iss_head iss_cty ctpty_cty iss_head.1
1:        a      US        US          b
2:        b      DE        DE          b
3:        c      JP        JP          a

On 17 Jun 2013, at 14:36, Frank Erickson <FErickson at psu.edu> wrote:

> I think that key(g_ctpty[links]) == key(g_ctpty) == "ctpty_head", which is used when you do your second merge with [, instead of "iss_head". You can check this by running key(g_ctpty[links]) . --Frank
> 
> 
> On Mon, Jun 17, 2013 at 1:57 AM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> Since you have the data as well, why not provide it (a small part at least with which your issue is reproducible)? Isn't it much easier than to ask everyone who's willing to help to create a data and test your code?
> 
> Arun
> 
> On Monday, June 17, 2013 at 8:54 AM, Ivan Alves wrote:
> 
>> Dear all,
>> 
>> I am not sure I understand the syntax for merging data.tables. I have keyed the two 'satelite' tables from which I want to match information to the main table 'links'
>> 
>> g_ctpty <- gultimate[,list(ctpty_head,ctpty_cty)]
>> setkey(g_ctpty,ctpty_head)
>> g_iss <- gultimate[,list(iss_head,iss_cty)]
>> setkey(g_iss,iss_head)
>> 
>> Why are the two below not equivalent?
>> 
>> This works:
>> 
>> data = merge(
>> merge(links, g_ctpty, all.x = TRUE, by = "ctpty_head"),
>> g_iss,
>> all.x = TRUE, by = "iss_head"
>> ),
>> 
>> And this does not:
>> 
>> data = g_iss[g_ctpty[links]],
>> 
>> Any guidance would be appreciated.
>> Kind regards,
>> Ivan
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 
> 
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20130618/03795717/attachment.html>


More information about the datatable-help mailing list