[datatable-help] misc join questions
Johann Hibschman
jhibschman+r at gmail.com
Thu May 26 16:03:10 CEST 2011
I've run into a few questions about joins.
1. In x[i, ], how does data.table decide if "i" is meant to be an
expression, or an integer/logical/data.table?
In practice, it seems to work fine, but I worry about accidental
masking, for example if I do:
filter <- x$date > 10
x[filter,]
What happens if x has a column named "filter"? Which takes precedence?
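One way to probe this is a toy table whose "filter" column deliberately disagrees with an outside vector of the same name. This is only a sketch with made-up data. The behaviour noted in the comments is what I understand recent data.table versions to do (a bare symbol in i is looked up in the calling scope, while an expression sees columns first), and it may differ in other versions.

```r
library(data.table)

# Hypothetical data: the "filter" column is the opposite of the
# external logical vector that shares its name.
x <- data.table(date = c(5L, 15L, 20L),
                filter = c(TRUE, FALSE, FALSE))
filter <- x$date > 10              # FALSE TRUE TRUE

# i is a single bare symbol: assumed to resolve to the variable
# in the calling scope, not to the column.
by_symbol <- x[filter, ]

# i is an expression: column names are assumed to be found first.
by_expr <- x[filter == TRUE, ]

# Spelling out the condition sidesteps the name clash entirely.
by_vector <- x[x$date > 10, ]

print(by_symbol)
print(by_expr)
print(by_vector)
```

If the two lookups really do differ like this, writing the condition inline (the last form) looks like the least surprising choice.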
2. Is there an easy way to join two tables, yet be protected from
unexpected keys?
For example, I often make a date-value lookup table, like
y <- data.table(date=blah, val1=blah, val2=blah, key="date")
Then I want to merge in the values with a new table, x, like:
(data.frame syntax) merge(x, y, by="date")
If x has no key, and I know the first column is date, or x has a key
and the first column in the key is date, I can do
y[x]
However, I worry that I will set a different key on x, while doing some
operation elsewhere, in which case y[x] will give nonsense. To be
extra-safe, I can do something like
cbind(x, y[J(x$date),][, -1, with=FALSE])
where the "[, -1, with=FALSE]" is to remove the date column from the
join result, so I don't end up with two date columns in my result. I
find this very ugly, but I can't find a better way. What would you
recommend?
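For comparison, here is a sketch of two ways to name the join column explicitly, so that the key on x no longer matters. It assumes a data.table version that has the on= argument and a merge method accepting by=, which, as far as I know, arrived after this message was written. The tables and values are invented for illustration.

```r
library(data.table)

# Hypothetical lookup table keyed by date.
y <- data.table(date = 1:3,
                val1 = c(10, 20, 30),
                val2 = c("a", "b", "c"),
                key  = "date")

# Hypothetical table keyed on something other than date.
x <- data.table(id = c("p", "q", "r"), date = c(3L, 1L, 2L))
setkey(x, id)

# Join on the named column, whatever key x happens to have.
# One row per row of x, with a single date column.
res_on <- y[x, on = "date"]

# The merge() spelling of the same join; the result is sorted by date.
res_merge <- merge(x, y, by = "date")

print(res_on)
print(res_merge)
```

Both forms return a single date column, so the cbind and the "[, -1, with=FALSE]" step are not needed.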
3. Does setting a new key on a table create a copy?
If I do,
f <- function (x) {
  y <- create.lookup.table()
  setkey(x, date)
  y[x]
}
will I create a copy of x by setting the key? In general, what
operations create copies? Is there anything that operates on
references that I have to look out for?
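A small experiment can show whether the caller's table is touched. The sketch below uses invented data and two hypothetical helpers, key_in_place() and key_on_copy(). It assumes, as I understand recent data.table versions to behave, that setkey() works by reference and that copy() makes an independent table.

```r
library(data.table)

# Hypothetical helper: sets the key directly on its argument.
key_in_place <- function(x) {
  setkey(x, date)
  invisible(NULL)
}

# Hypothetical helper: takes an explicit copy before setting the key.
key_on_copy <- function(x) {
  x <- copy(x)
  setkey(x, date)
  x
}

x1 <- data.table(date = c(3L, 1L, 2L), v = c("c", "a", "b"))
key_in_place(x1)
print(key(x1))    # inspect whether the caller's table is now keyed
print(x1)         # and whether its rows were reordered

x2 <- data.table(date = c(3L, 1L, 2L), v = c("c", "a", "b"))
out <- key_on_copy(x2)
print(key(x2))    # inspect whether the original was left alone
print(key(out))
```

If setkey() does act by reference, a function like f would silently re-key and reorder the table passed to it, and taking a copy() first would be the defensive choice, at the cost of the extra memory.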
Thanks,
Johann