[datatable-help] possible FR: in x[y], switch to nomatch=0 instead of failing with "Error in vecseq..."
Frank Erickson
FErickson at psu.edu
Mon Oct 14 07:02:12 CEST 2013
Thanks for pointing that out. I didn't know about (or think to search
for) that global option. I think I'll leave it as NA since, as you say,
it's reasonably useful.
I forgot that, after seeing the error, people may want to switch to
allow.cartesian = TRUE (although I never find myself wanting to use it).
So, a modified (very minor) FR: have the error message also suggest
switching to nomatch=0, since that is what I personally switch to after
I see the error (though I don't know how common that choice is...).
I still don't understand why the message mentions "duplicate key values
in i", as the problem seems to be with duplicated values in x (at least
in my original example, quoted below).
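A rough way to see where the numbers in the error message come from, using only base R (no data.table needed). The helper join_rows() is hypothetical, written here for illustration and not part of data.table. It assumes the row-count rule quoted in the error ("more than max(nrow(x),nrow(i))") and that, under nomatch=NA, each unmatched row of i contributes one NA-filled row, while under nomatch=0 it contributes none. The keys are the ones from the example quoted below: x has a = 2,3,2,3 and y has a = 1:4.

```r
# join_rows() is a hypothetical helper for illustration; it is NOT a data.table function.
# It counts how many rows the equi-join x[i] would return, given the key columns.
join_rows <- function(x_key, i_key, nomatch_na = TRUE) {
  # For each key value in i, count the rows of x that match it
  hits <- vapply(i_key, function(v) sum(x_key == v), integer(1))
  # Assumption: with nomatch=NA an unmatched row of i still yields one (NA) row;
  # with nomatch=0 it yields none
  if (nomatch_na) hits[hits == 0L] <- 1L
  sum(hits)
}

x_a <- rep(2:3, 2)  # key of x: 2 3 2 3 (the duplicates live in x)
y_a <- 1:4          # key of y: unique values

limit <- max(length(x_a), length(y_a))  # 4, the bound quoted in the error message

join_rows(x_a, y_a)                     # x[y]:            6 rows, more than 4
join_rows(x_a, y_a, nomatch_na = FALSE) # x[y, nomatch=0]: 4 rows, not more than 4
join_rows(y_a, x_a)                     # y[x]:            4 rows, not more than 4
```

In this sketch the two NA rows (for a = 1 and a = 4) are what lift x[y] from 4 rows to 6, which would be consistent with nomatch=0 and y[x] both staying under the limit.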
--Frank
On Mon, Oct 14, 2013 at 12:42 AM, Michael Nelson <michael.nelson at sydney.edu.au> wrote:
>
> The default argument to nomatch is `getOption("datatable.nomatch")`, and
> the default value of that option is `NA`.
>
> If you want to change this option, simply set
> `options(datatable.nomatch = 0)`; the default will then be as you want.
>
> I think the current datatable.nomatch = NA is reasonable, as you are
> often interested in non-matches as well as matches.
>
> x[y, nomatch=NA] gives an error in your case. If you want the non-matches
> anyway, follow the advice of the error message and run
>
> x[y, nomatch=NA, allow.cartesian = TRUE]
>
> ------------------------------
> *From:* datatable-help-bounces at lists.r-forge.r-project.org
> [datatable-help-bounces at lists.r-forge.r-project.org] on behalf of
> Frank Erickson [FErickson at psu.edu]
> *Sent:* Monday, 14 October 2013 1:03 PM
> *To:* data.table source forge
> *Subject:* [datatable-help] possible FR: in x[y], switch to nomatch=0
> instead of failing with "Error in vecseq..."
>
> I don't know if this error shows up in other cases, but I always see it
> when I'm about to do
>
> x[y,b:=b]
>
> but first want to check how
>
> x[y]
>
> looks before creating or overwriting x$b. Here's an example:
>
> x <- data.table(a=rep(2:3,2),key='a')
> y <- data.table(a=1:4,b=4:1,key='a')
>
> x[y] # error
> x[y,nomatch=0] # ok
> x[y,b:=b] # ok
>
> I'd prefer to see the first attempt mapped to the second (with a
> suitable message), instead of erroring out. What do you all think? Is that
> reasonable/worthwhile?
>
> Best,
>
> Frank
>
> P.S. One other point, regarding the message itself (reproduced down
> below): I don't understand why repeated values in i are mentioned.
>
> -- For x[y] in my example, the problem seems to be coming from x having
> repeated rows, not i (y in this case);
> -- whereas y[x] works just fine (despite the repeated/duplicated values in
> i...which is x here).
>
> Error in vecseq(f__, len__, if (allow.cartesian) NULL else
> as.integer(max(nrow(x), :
> Join results in 6 rows; more than 4 = max(nrow(x),nrow(i)). Check for
> duplicate key values in i, each of which join to the same group in x over
> and over again. If that's ok, try including `j` and dropping `by`
> (by-without-by) so that j runs for each group to avoid the large
> allocation. If you are sure you wish to proceed, rerun with
> allow.cartesian=TRUE. Otherwise, please search for this error message in
> the FAQ, Wiki, Stack Overflow and datatable-help for advice.
>
>