[datatable-help] misc join questions

Johann Hibschman jhibschman+r at gmail.com
Thu May 26 16:03:10 CEST 2011


I've run into a few questions about joins.

1. In x[i, ], how does data.table decide whether "i" is meant to be an
   expression, or an integer/logical/data.table?

In practice it seems to work fine, but I worry about accidental
masking, for example if I do:

  filter <- x$date > 10
  x[filter,]

What happens if x has a column named "filter"?  Which takes precedence?
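
A small experiment can probe this.  The sketch below uses made-up data
(the three-row table and its values are assumptions for illustration
only).  It reflects what I understand of current data.table releases,
which may differ from the version in use when this was posted: a lone
symbol in i is looked up in the calling scope, while names inside a
larger expression are resolved against the columns of x first.

```r
library(data.table)

# Made-up table that deliberately has a column called "filter".
x <- data.table(date   = c(5L, 15L, 25L),
                filter = c(TRUE, FALSE, FALSE))

# Variable in the calling scope with the same name as the column.
filter <- x$date > 10

# i is a single symbol: as far as I know, current releases evaluate it
# in the calling scope, so the variable is used, not the column.
by_symbol <- x[filter, ]

# i is an expression: names are looked up among x's columns first,
# so this uses the column "filter".
by_column <- x[filter == TRUE, ]

# Unambiguous alternative: write the condition directly in i.
by_expr <- x[date > 10, ]
```

Writing the condition inline, or giving the helper variable a name that
cannot collide with a column, avoids having to rely on the lookup rule.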


2. Is there an easy way to join two tables, yet be protected from
   unexpected keys?

For example, I often make a date-value lookup table, like

  y <- data.table(date=blah, val1=blah, val2=blah, key="date")

Then I want to merge in the values with a new table, x, like:

  merge(x, y, by="date")   # data.frame syntax

If x has no key, and I know the first column is date, or x has a key
and the first column in the key is date, I can do

  y[x]

However, I worry that I will set a different key on x while doing some
operation elsewhere, in which case y[x] will join on the wrong column
and give nonsense.  To be extra-safe, I can do something like

  cbind(x, y[J(x$date),][, -1, with=FALSE])

where the "[, -1, with=FALSE]" is to remove the date column from the
join result, so I don't end up with two date columns in my result.  I
find this very ugly, but I can't find a better way.  What would you
recommend?
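
One possible alternative, sketched under assumptions: newer data.table
releases (1.9.6 onward, as far as I know) accept an on= argument that
names the join column explicitly, so the result does not depend on
whatever key x currently carries.  merge() on two data.tables is another
option.  The tables and values below are invented for illustration.

```r
library(data.table)

# Invented lookup table keyed on date.
y <- data.table(date = 1:3,
                val1 = c(10, 20, 30),
                val2 = c("a", "b", "c"),
                key  = "date")

# Invented table whose key is deliberately NOT date.
x <- data.table(id = c("p", "q", "r"), date = c(3L, 1L, 2L))
setkey(x, id)

# Join on the named column; x's key is ignored.  One row per row of x,
# in x's order, with a single date column.
joined <- y[x, on = "date"]

# merge() with an explicit by= also ignores x's key; the result is
# sorted by date.
merged <- merge(x, y, by = "date")
```

Both forms state the join column at the call site, which is the
protection asked for here; which one reads better is a matter of taste.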


3. Does setting a new key on a table create a copy?

If I do,

  f <- function (x) {
    y <- create.lookup.table()
    setkey(x, date)
    y[x]
  }

will I create a copy of x by setting the key, or will the caller's x
be re-keyed as a side effect?  In general, which operations create
copies?  Is there anything that operates by reference that I have to
look out for?
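
A quick check of the by-reference behaviour, as a sketch: in current
data.table releases, as far as I know, setkey() reorders the table in
place without copying, so a call inside a function also re-keys the
caller's table, while copy() gives an independent table.  The function
names rekey_in_place and rekey_on_copy and the data are hypothetical.

```r
library(data.table)

x <- data.table(date = c(3L, 1L, 2L), v = c("c", "a", "b"))

rekey_in_place <- function(dt) {
  setkey(dt, date)     # modifies the caller's table by reference
  invisible(NULL)
}

rekey_on_copy <- function(dt) {
  dt <- copy(dt)       # explicit copy; the caller's table is untouched
  setkey(dt, v)
  dt
}

rekey_in_place(x)
key_after_in_place  <- key(x)    # the caller's x is now keyed on date
date_after_in_place <- x$date    # and its rows have been reordered

z <- rekey_on_copy(x)
key_after_copy <- key(x)         # x keeps its key; only z is keyed on v
```

If a function must not disturb its argument, taking a copy() first is
the defensive choice, at the cost of the extra memory.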


Thanks,
Johann


