[datatable-help] Error in coercing matrices within j expressions

Tue Sep 17 22:22:03 CEST 2013

Hi,

I guess you could collect the pairs in a list and then combine them once at the end with rbindlist():

library(data.table)
indi <- list()
k <- 1
indi[[k]] <- list(i=2L,j=6L); k <- k+1
indi[[k]] <- list(4L,5L); k <- k+1
rbindlist(indi)
#    i j
# 1: 2 6
# 2: 4 5

For some reason, I couldn't get rbindlist to work unless the first item in
indi had explicit names ("i" and "j"), but names aren't needed for later
items.
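
Relatedly, the "Logical error" in your matrix version is most likely a column-type problem, not a problem with matrices as such. matrix(nrow = 0, ncol = 2) has no data argument, so it is filled with NA, which is logical. A group that finds no pairs therefore returns logical columns while other groups return integer ones. A minimal base-R sketch of the difference (the names bad, empty_pairs and grown are just for illustration):

```r
# matrix() defaults to data = NA, which is logical, so this "empty" result is logical
bad <- matrix(nrow = 0, ncol = 2, dimnames = list(NULL, c("i", "j")))
typeof(bad)   # "logical"

# Supplying a zero-length integer vector makes the empty matrix integer-typed
empty_pairs <- matrix(integer(0), nrow = 0, ncol = 2,
                      dimnames = list(NULL, c("i", "j")))
typeof(empty_pairs)   # "integer"

# Appending integer rows keeps it integer, and the column names survive
grown <- rbind(empty_pairs, c(2L, 6L), c(4L, 5L))
typeof(grown)   # "integer"
```

Converting an integer-typed matrix like this with as.data.table() (or simply returning list(i = integer(0), j = integer(0)) for a group with no pairs) should give every group integer i and j columns.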

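This should be faster than growing the table with rbind() on every
iteration, since each rbind() call copies all the rows accumulated so far,
but there may be a faster way still. If your criteria for selecting (i,j)
can be written down as a rule, there's likely a much faster, vectorized
alternative to looping like this.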
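
For example, suppose the criterion were "the two values differ by at most tol". That is a hypothetical stand-in, since the real criterion inside f() isn't given. The pairs for one group could then be found without an explicit loop. This sketch builds an n-by-n comparison matrix with outer() and keeps the upper triangle via which(arr.ind = TRUE). Be aware that it needs O(n^2) memory, roughly 10^8 cells for a 10K-row group, which may be too much; in that case the group would have to be processed in blocks. The function name interesting_pairs and its arguments are made up for illustration.

```r
# Hypothetical criterion (not from the thread): pairs whose values differ by at most `tol`.
# x   : numeric values for one group
# idx : integer row identifiers for the same rows (e.g. original row numbers)
interesting_pairs <- function(x, idx, tol) {
  if (length(x) < 2L) {
    # No pairs possible: still return typed, zero-length integer columns
    return(list(i = integer(0), j = integer(0)))
  }
  close <- abs(outer(x, x, "-")) <= tol            # n-by-n logical matrix of all comparisons
  close[lower.tri(close, diag = TRUE)] <- FALSE    # keep each unordered pair once (row < col)
  w <- which(close, arr.ind = TRUE)                # two-column integer matrix of (row, col)
  list(i = idx[w[, 1L]], j = idx[w[, 2L]])         # map positions back to row identifiers
}
```

Inside a grouped call it might be used as DT[, interesting_pairs(value, .I, tol = 0.5), by = group], where DT, value and group are hypothetical names and .I supplies the row numbers of each group's rows. Because the function always returns integer i and j, including for groups with no pairs, every group yields the same column types.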

Best,

--Frank

On Tue, Sep 17, 2013 at 2:13 PM, Nathaniel Graham <npgraham1 at gmail.com> wrote:

> I'm currently using a (moderately) complex function, call
> it f(), as a j expression to analyze my data.  The data itself
> is about 1.2M rows, which I analyze by group.
> A group may have as few as one row or as many as 10K.
> The output from the function is a two-column data.table
> where the rows are interesting (for my work) pairs of
> observations--I have no idea how many pairs will be
> interesting until the function runs, but in principle it could
> be every unique combination (so as many as 50M rows
> of output for one call to f()).  It is common, and not an
> error, for groups to have no meaningful pairs to return.
>
> I've been using the following line to create the output for
> f():
>
> indices <- data.table(i = integer(), j = integer())
>
> I then append to 'indices' any useful pairs using:
>
> indices <- rbind(indices, list(idx[i], idx[j]))
>
> This works, but is very, very slow, in part because I'm
> using rbind().  I want to switch to using base R matrices instead,
> because rbind() should be much faster for them.  Using
> the following line to create the matrix:
>
> indices <- matrix(nrow = 0, ncol = 2, dimnames = list(c(NULL),c("i","j")))
>
> results in the following error:
>
> Logical error. Type of column should have been checked by now
>
> Note that the values returned are always integers.  Results are
> coerced via:
>
> data.table(indices)
>
> before returning from f().  If I don't explicitly coerce, I get the
> following error:
>
> j doesn't evaluate to the same number of columns for each group
>
> If someone could tell me what I'm doing wrong, or some other
> equivalent way to noticeably speed up the whole process, I'd
> be very grateful.
>
>
> -------
> Nathaniel Graham
> npgraham1 at gmail.com
> npgraham1 at uky.edu
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>