[datatable-help] misc join questions

Matthew Dowle mdowle at mdowle.plus.com
Thu May 26 17:33:41 CEST 2011


1. See FAQ 2.12

2. In practice I haven't experienced this problem, but I see the concern. An 
option, "checkjoinnames" (or a better name), could be added which would issue 
a warning if the columns used in the join had different names in the two 
tables. Perhaps it would take the values 0 (don't check), 1 (warning) or 2 
(error). The corresponding argument to [.data.table would default to 
getOption("datatable.checkjoinnames"), permitting a global setting or a 
per-query setting as desired.  Would that work?
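
As a rough sketch of what such a check might look like in user code today: 
the helper check_join_names() below is hypothetical, and the option 
"datatable.checkjoinnames" is only the proposal above, not something 
data.table provides. It assumes, as described in the question, that y[x] 
matches y's key against x's key columns if x is keyed and against x's 
leading columns otherwise.

```r
library(data.table)

# Hypothetical helper; neither this function nor the option
# "datatable.checkjoinnames" exists in data.table.
# level: 0 = don't check, 1 = warning, 2 = error.
check_join_names <- function(y, x,
                             level = getOption("datatable.checkjoinnames", 1L)) {
  ky <- key(y)
  if (is.null(ky)) stop("y must have a key")
  # Assumed join columns on the x side: x's key if it has one,
  # otherwise its leading columns.
  kx <- if (is.null(key(x))) names(x) else key(x)
  kx <- head(kx, length(ky))
  ky <- head(ky, length(kx))
  ok <- identical(kx, ky)
  if (!ok && level > 0L) {
    msg <- sprintf("join column names differ: y has [%s], x has [%s]",
                   paste(ky, collapse = ", "), paste(kx, collapse = ", "))
    if (level >= 2L) stop(msg) else warning(msg)
  }
  invisible(ok)
}

y <- data.table(date = 1:3, val1 = c(10, 20, 30), key = "date")
x <- data.table(date = c(2L, 3L), id = c("a", "b"))

check_join_names(y, x)   # names agree, so nothing is reported
setkey(x, id)            # x is now keyed by a different column

# With level = 1 the mismatch ("date" vs "id") is reported as a warning.
tryCatch(check_join_names(y, x, level = 1),
         warning = function(w) message("warning: ", conditionMessage(w)))
```

Calling the helper just before y[x] gives the per-query behaviour; setting 
options(datatable.checkjoinnames = 2) would make every call an error on a 
mismatch.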

3 i) Yes, setkey() currently copies the table, and FR#1006 is to improve 
that.  Nudges like this help to encourage (thanks) :
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1006&group_id=240&atid=978
3 ii) Watch out for the 5 general examples on the wiki and fully understand 
them.  Most of those differences are due to copying, one way or another :
http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table
You might also find that as.data.table() is more efficient than 
data.table(); the former just changes the class. Also, if you know (for 
sure) the table is already sorted, you can set the "sorted" attribute 
directly, which avoids the overhead of the (current) copy in setkey().
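
A minimal sketch of those two suggestions follows. The data are made up, and 
setattr() is assumed to be available, which holds for recent data.table 
releases but may not for the version discussed in this thread. The 
is.unsorted() guard is an addition for safety, not something data.table 
requires.

```r
library(data.table)

# Made-up example data, already ordered by date.
df <- data.frame(date = 1:5, val = c(2.5, 3.1, 4.7, 5.0, 6.2))

# Convert the existing data.frame instead of rebuilding it with data.table().
x <- as.data.table(df)

# Mark the table as keyed by "date" without calling setkey().
# Only safe if the rows really are in key order: a wrong flag would make
# keyed lookups return wrong rows, hence the cheap guard first.
stopifnot(!is.unsorted(x$date))
setattr(x, "sorted", "date")

key(x)     # "date"
x[J(3L)]   # keyed lookup on date, no setkey() call needed
```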


"Johann Hibschman" <jhibschman+r at gmail.com> wrote in message 
news:u1o62oxd0f5.fsf at ld-chrate28.citadelgroup.com...
> I've run into a few questions about joins.
>
> 1. In x[i, ], how does data.table decide if "i" is meant to be an
>   expression, or an integer/logical/data.table?
>
> In practice, it seems to work fine, but I worry about accidental
> maskings, like if I do:
>
>  filter <- x$date > 10
>  x[filter,]
>
> What happens if x has a column named "filter"?  Which takes precedence?
>
>
> 2. Is there an easy way to join two tables, yet be protected from
>   unexpected keys?
>
> For example, I often make a date-value lookup table, like
>
>  y <- data.table(date=blah, val1=blah, val2=blah, key="date")
>
> Then I want to merge in the values with a new table, x, like:
>
>  (data.frame syntax) merge(x, y, by="date")
>
> If x has no key, and I know the first column is date, or x has a key
> and the first column in the key is date, I can do
>
>  y[x]
>
> However, I worry that I will set a different key on x, while doing some
> operation elsewhere, in which case y[x] will give nonsense.  To be
> extra-safe, I can do something like
>
>  cbind(x, y[J(x$date),][, -1, with=FALSE])
>
> where the "[, -1, with=FALSE]" is to remove the date column from the
> join result, so I don't end up with two date columns in my result.  I
> find this very ugly, but I can't find a better way.  What would you
> recommend?
>
>
> 3. Does setting a new key on a table create a copy?
>
> If I do,
>
>  f <- function (x) {
>    y <- create.lookup.table()
>    setkey(x, date)
>    y[x]
>  }
>
> will I create a copy of x by setting the key?  In general, what
> operations create copies?  Is there anything that operates on
> references that I have to look out for?
>
>
> Thanks,
> Johann 




