[datatable-help] possible FR: in x[y], switch to nomatch=0 instead of failing with "Error in vecseq..."

Michael Nelson michael.nelson at sydney.edu.au
Mon Oct 14 06:42:00 CEST 2013


The default argument to nomatch is `getOption("datatable.nomatch")`. The default value of that option is `NA`.

If you want to change this option, simply set `options(datatable.nomatch = 0)`, then the default will be as you want.

I think the current datatable.nomatch = NA is reasonable, as you are often interested in non-matches as well as matches.

I'd expect x[y, nomatch=NA] to give an error in your case; you can then follow the advice of the error message and run

x[y, nomatch=NA, allow.cartesian = TRUE]
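
As a quick check of what the two settings return for the x and y from the original post, here is a minimal sketch. It assumes the data.table package is installed; the names with_nonmatches and matches_only are only illustrative, and newer releases may apply a different threshold for this error, so a plain x[y] may not fail there.

```r
library(data.table)

# Tables from the original post
x <- data.table(a = rep(2:3, 2), key = "a")
y <- data.table(a = 1:4, b = 4:1, key = "a")

# Keep the rows of y with no match in x (a = 1 and a = 4),
# and explicitly allow the larger result
with_nonmatches <- x[y, nomatch = NA, allow.cartesian = TRUE]

# Drop the rows of y with no match in x
matches_only <- x[y, nomatch = 0]

nrow(with_nonmatches)  # 6
nrow(matches_only)     # 4
```

The six rows are the same six reported in the error message; with nomatch=0 the two unmatched keys drop out, which leaves four.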





________________________________
From: datatable-help-bounces at lists.r-forge.r-project.org [datatable-help-bounces at lists.r-forge.r-project.org] on behalf of Frank Erickson [FErickson at psu.edu]
Sent: Monday, 14 October 2013 1:03 PM
To: data.table source forge
Subject: [datatable-help] possible FR: in x[y], switch to nomatch=0 instead of failing with "Error in vecseq..."

I don't know if this error shows up in other cases, but I always see it when I'm about to do

x[y,b:=b]

but first want to check how

x[y]

looks before creating or overwriting x$b. Here's an example:

x <- data.table(a=rep(2:3,2),key='a')
y <- data.table(a=1:4,b=4:1,key='a')

x[y]           # error
x[y,nomatch=0] # ok
x[y,b:=b]      # ok

I'd prefer to see the first attempt mapped to the second (with a suitable message), instead of erroring out. What do you all think? Is that reasonable/worthwhile?

Best,

Frank

P.S. One other point, regarding the message itself (reproduced down below): I don't understand why repeated values in i are mentioned.

-- For x[y] in my example, the problem seems to be coming from x having repeated rows, not i (y in this case);
-- whereas y[x] works just fine (despite the repeated/duplicated values in i...which is x here).

Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x),  :
  Join results in 6 rows; more than 4 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
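
The row counts in that message can be reproduced by hand. The following is a rough base-R sketch of the arithmetic only, not of what data.table does internally; join_rows and keep_nonmatches are hypothetical names introduced for illustration.

```r
# Key columns from the example (x is sorted by its key)
x_key <- c(2L, 2L, 3L, 3L)
y_key <- 1:4

# Rows produced by a join: each key in i contributes one row per match in x.
# With nomatch = NA, an unmatched key still contributes one row (filled with NA).
join_rows <- function(x_key, i_key, keep_nonmatches = TRUE) {
  n_match <- vapply(i_key, function(k) sum(x_key == k), integer(1))
  if (keep_nonmatches) sum(pmax(n_match, 1L)) else sum(n_match)
}

join_rows(x_key, y_key)                           # x[y]: 1 + 2 + 2 + 1 = 6
join_rows(x_key, y_key, keep_nonmatches = FALSE)  # x[y, nomatch=0]: 2 + 2 = 4
join_rows(y_key, x_key)                           # y[x]: 1 + 1 + 1 + 1 = 4
```

Under this count, x[y] yields 6 rows, which exceeds max(nrow(x),nrow(i)) = 4, while x[y,nomatch=0] and y[x] both yield exactly 4 and stay within the limit.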