[datatable-help] Column names after self join

Andreas Borg andreas.borg at unimedizin-mainz.de
Wed Mar 30 13:55:49 CEST 2011


Dear list members,

I have started incorporating data.table into the RecordLinkage package
to improve speed. Right now I am trying to use a self join on a
data.table to find all record pairs in a dataset that have equal
values in a specified column. An example table:

> dt <- data.table(id=1:4, x1=c("a","a","b","c"), x2=c(1,2,3,3), key="x1")
> dt
     id x1 x2
[1,]  1  a  1
[2,]  2  a  2
[3,]  3  b  3
[4,]  4  c  3

I do a self join to find all pairs of rows with the same value for x1:

> dt[dt]
     x1 id x2 id.1 x2.1
[1,]  a  1  1    1    1
[2,]  a  2  2    1    1
[3,]  a  1  1    2    2
[4,]  a  2  2    2    2
[5,]  b  3  3    3    3
[6,]  c  4  3    4    3


Here is the problem: I want to select the columns "id" and "id.1" and
keep only the rows with id < id.1 (so that each pair appears only once
and no row is matched to itself). Naturally, this would be:

dt[dt][id < id.1]

but I get an error, because "id.1" is really "id" internally:

> summary.default(dt[dt])
   Length Class  Mode
x1 6      factor numeric
id 6      -none- numeric
x2 6      -none- numeric
id 6      -none- numeric
x2 6      -none- numeric

and the other components are ambiguous as well, so there seems to be no
way to distinguish between the two "id" columns. I would propose changing
this behaviour to that of merge, which yields unambiguous column names:

> summary.default(merge(dt, dt, by="x1"))
     Length Class  Mode
id   6      -none- numeric
x1   6      factor numeric
x2   6      -none- numeric
id.1 6      -none- numeric
x2.1 6      -none- numeric
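
For illustration, the filtering step described above can be sketched on a
merge result. This is only a minimal sketch under stated assumptions: it
uses a plain data.frame and base merge() with explicit suffixes, so that
the column names do not depend on the data.table version, and the names
df, m and pairs are made up for this example.

```r
# Base-R sketch; a plain data.frame stands in for the data.table above.
df <- data.frame(id = 1:4,
                 x1 = c("a", "a", "b", "c"),
                 x2 = c(1, 2, 3, 3))

# Self merge on x1. The explicit suffixes keep "id" and "x2" for the
# left copy and name the right copy's columns "id.1" and "x2.1".
m <- merge(df, df, by = "x1", suffixes = c("", ".1"))

# Keep each unordered pair once and drop rows matched to themselves.
pairs <- m[m$id < m$id.1, c("id", "id.1")]
print(pairs)
```

With the example data only the two rows sharing x1 == "a" form a pair,
so the result should contain the single pair id = 1, id.1 = 2. Going
through merge this way presumably gives up the speed of a keyed join,
which is the reason for wanting distinct names directly from dt[dt].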

Or is there another way to deal with this?

Anyway, thanks to the developers for creating this useful package!

Best regards,

Andreas



-- 
Andreas Borg
Medizinische Informatik

UNIVERSITÄTSMEDIZIN
der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de

Phone +49 (0) 6131 175062
Email: borg at imbei.uni-mainz.de




