<html><head></head><body><div style="color:#000; background-color:#fff; font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, Sans-Serif;font-size:13px"><div id="yui_3_16_0_1_1451573513750_7042">Hi,</div><div id="yui_3_16_0_1_1451573513750_7084">I have some trouble in understanding the data.table procedure for joining two tables. Let me start by taking up two example data tables :</div><div id="yui_3_16_0_1_1451573513750_7183"><br></div><div id="yui_3_16_0_1_1451573513750_7168" dir="ltr">library(data.table)<br class="" id="yui_3_16_0_1_1451573513750_7121">############ the first data.table example<br class="" id="yui_3_16_0_1_1451573513750_7123">mt<-data.table(mtcars)<br class="" id="yui_3_16_0_1_1451573513750_7125">## some modifications to the data.table<br class="" id="yui_3_16_0_1_1451573513750_7127">s1<-1:32;s1[seq(2,32,by=2)]<-NA<br class="" id="yui_3_16_0_1_1451573513750_7129">mt[,"cntrl":=s1];mt[,"cylO":=cyl];mt[,"cyl":=cyl*2]<br class="" id="yui_3_16_0_1_1451573513750_7131">setkey(mt,gear,carb,cylO,cntrl)<br class="" id="yui_3_16_0_1_1451573513750_7133">mt<br class="" id="yui_3_16_0_1_1451573513750_7135">## More modifications<br class="" id="yui_3_16_0_1_1451573513750_7137">mt[gear == 3 & carb ==3 & cylO == 8 & mpg == 16.4,cntrl:=14]<br class="" id="yui_3_16_0_1_1451573513750_7139">str(mt)<br class="" id="yui_3_16_0_1_1451573513750_7141">mt<br class="" id="yui_3_16_0_1_1451573513750_7143">############## the second data.table example<br class="" id="yui_3_16_0_1_1451573513750_7145">nt<-data.table(gear= c(3,3,3),carb=c(1,3,3),cylO=c(4,8,8),price=c(11,44,55),cntrl=c(21,13,14))<br class="" id="yui_3_16_0_1_1451573513750_7147">setkey(nt,gear,carb,cylO,cntrl)<br class="" id="yui_3_16_0_1_1451573513750_7149">############# merging as a data frame<br class="" id="yui_3_16_0_1_1451573513750_7151">rdJoin<-merge.data.frame(mt,nt,by.x=c("gear","carb","cylO","cntrl"),by.y=c("gear","carb","cylO","cntrl"),all.x=TRUE)<br class="" id="yui_3_16_0_1_1451573513750_7153">str(rdJoin)<br class="" id="yui_3_16_0_1_1451573513750_7155">rdJoin<br class="" id="yui_3_16_0_1_1451573513750_7157">############## questions<br class="" id="yui_3_16_0_1_1451573513750_7159"># What is the data.table command to get rdJoin?<br class="" id="yui_3_16_0_1_1451573513750_7161"># How is it possible to specify the key variables for the join -see below<br class="" id="yui_3_16_0_1_1451573513750_7163"># For example, c("gear","carb") c("gear","carb","cylO") etc.<br class="" id="yui_3_16_0_1_1451573513750_7165"># Also, where the variables have different names in the two tables<br class="" id="yui_3_16_0_1_1451573513750_7167"># For example, if the cntrl variable in the first DT is "cntrl1" and "cntrl2" in the second</div><div id="yui_3_16_0_1_1451573513750_7192" dir="ltr"><br></div><div id="yui_3_16_0_1_1451573513750_7193" dir="ltr">Let me elaborate on te questions shown above. First, I would like to start with some general questions :</div><div id="yui_3_16_0_1_1451573513750_7201" dir="ltr">1. In the documentation for data.table (which includes the vignettes available so far), it is mentioned that it is sufficient if one of the two data tables being considered has keys. This is a bit confusing. The straightforward situation is if both the tables have keys. When would it be of advantage to have keys for just one of them? It would be nice if this can be explained in the to-be-released vignette on joins.</div><div id="yui_3_16_0_1_1451573513750_7276" dir="ltr">2. The merge command in base R is very clear and easy to understand. It would be nice if the data table procedure is transparent in the same way. To start with, I would like to know how I can do the following things with data table :</div><div id="yui_3_16_0_1_1451573513750_7344" dir="ltr"> (i) the data.table equivalent of the base R command</div><div id="yui_3_16_0_1_1451573513750_7343" dir="ltr"> merge.data.frame(mt,nt,by.x=c("gear","carb","cylO","cntrl"),by.y=c("gear","carb","cylO","cntrl"),all.x=TRUE)</div><div id="yui_3_16_0_1_1451573513750_7354" dir="ltr"> (ii) How it is possible to choose the number of key variables from a list :</div><div id="yui_3_16_0_1_1451573513750_7371" dir="ltr"> c("gear","carb") c("gear","carb","cylO") c("gear","carb","cylO","cntrl")</div><div id="yui_3_16_0_1_1451573513750_7475" dir="ltr"> It is very clear in the merge command how this is done. How to do that with data.table?<br></div><div id="yui_3_16_0_1_1451573513750_7398" dir="ltr"> The on argument can be used for one of the tables. How can it be specified for the other? That is, without having to use the setkey command each time a change is needed.</div><div id="yui_3_16_0_1_1451573513750_7415" dir="ltr"> (iii) How can this be done if the key variables in the two lists have different names? That is, if the cntrl variable in the first DT is "cntrl1" and "cntrl2" in the second, for example.</div><div id="yui_3_16_0_1_1451573513750_7485" dir="ltr"><br></div><div id="yui_3_16_0_1_1451573513750_7486" dir="ltr">I have found the data.table package to be very useful. It would be nice if I can understand its use better.</div><div id="yui_3_16_0_1_1451573513750_7504" dir="ltr"><br></div><div dir="ltr">Thanks for any help that I can get.</div><div dir="ltr">Ravi<br></div><div id="yui_3_16_0_1_1451573513750_7487" dir="ltr"><br></div><div id="yui_3_16_0_1_1451573513750_7488" dir="ltr"><br></div><div id="yui_3_16_0_1_1451573513750_7251" dir="ltr"><br></div><div id="yui_3_16_0_1_1451573513750_7252" dir="ltr"><br></div></div></body></html>