[datatable-help] cartesian product

Sasha Goodman sashag at stanford.edu
Thu Jan 26 03:12:37 CET 2012


This is a feature request. I'm writing a record linkage routine and would
find a Cartesian product function useful. In base R, merge returns the
Cartesian product when by.x and by.y are both NULL:

> x = data.frame(name = c("JOE","ANN","HARRY"), age = c(20,20,30))
> y = data.frame(name = c("JOE","ANN","MIKE","LARRY"), age = c(20,20,30,30))
> merge(x, y, by.x=NULL, by.y=NULL)
   name.x age.x name.y age.y
1     JOE    20    JOE    20
2     ANN    20    JOE    20
3   HARRY    30    JOE    20
4     JOE    20    ANN    20
5     ANN    20    ANN    20
6   HARRY    30    ANN    20
7     JOE    20   MIKE    30
8     ANN    20   MIKE    30
9   HARRY    30   MIKE    30
10    JOE    20  LARRY    30
11    ANN    20  LARRY    30
12  HARRY    30  LARRY    30

However, in data.table this does not work:

> x = data.table(name = c("JOE","ANN","HARRY"), age = c(20,20,30))
> y = data.table(name = c("JOE","ANN","MIKE","LARRY"), age = c(20,20,30,30))
> merge(x, y, by.x=NULL, by.y=NULL)
Error in merge.data.table(x, y, by.x = NULL, by.y = NULL) :
  Can not match keys in x and y to automatically determine appropriate `by` parameter. Please set `by` value explicitly.

By the way, I've worked around this with a hack that uses expand.grid and
several left joins:

pairs = data.table(expand.grid(1:nrow(x),1:nrow(y)))

....
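
For illustration, here is a minimal sketch of how row-index pairs from expand.grid could be turned into a full Cartesian product. It is not a reconstruction of the elided code above: it uses direct row indexing instead of left joins, and the helper name cartesian and its suffixes argument are hypothetical, not part of data.table.

```r
library(data.table)

x <- data.table(name = c("JOE", "ANN", "HARRY"), age = c(20, 20, 30))
y <- data.table(name = c("JOE", "ANN", "MIKE", "LARRY"), age = c(20, 20, 30, 30))

# Hypothetical helper (not a data.table function): pair every row of `a`
# with every row of `b`, suffixing column names the way base merge() does.
cartesian <- function(a, b, suffixes = c(".x", ".y")) {
  # All (row of a, row of b) index pairs; the first index varies fastest,
  # which matches the row order of the base R merge() output.
  idx <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))

  # Subsetting by row index returns new tables, so renaming them by
  # reference below does not touch the caller's `a` and `b`.
  left  <- a[idx$i]
  right <- b[idx$j]
  setnames(left,  paste0(names(a), suffixes[1]))
  setnames(right, paste0(names(b), suffixes[2]))

  cbind(left, right)
}

pairs <- cartesian(x, y)
print(pairs)
```

The result has nrow(x) * nrow(y) rows, so this is only practical when both tables are small enough for the full product to fit in memory.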