[datatable-help] Error in coercing matrices within j expressions

Nathaniel Graham npgraham1 at gmail.com
Wed Sep 18 00:14:36 CEST 2013


It hadn't occurred to me to use CJ(), so I'll tinker with that this evening
and see if there are any gains to be made there.  In theory it's highly
parallelizable, and one of the posts Matthew points to in his comments (in
the post you reference) shows a way that it can be done (using the old
multicore library, so I'm not exactly sure how it maps to the parallel
library).  In my case, though, the whole process appears to be memory bound
rather than CPU bound.  Since my machine is fairly optimal (i7-4770 with
4x8GB DDR3-1600), I just don't think it's going to get dramatically faster.
That doesn't mean I won't try...

-------
Nathaniel Graham
npgraham1 at gmail.com
npgraham1 at uky.edu


On Tue, Sep 17, 2013 at 5:52 PM, Frank Erickson <FErickson at psu.edu> wrote:

> Maybe not ultrafast, but with nice syntax:
>
> CJ(i=iset,j=jset)[criterion(i,j)]
>
> I guess it should be parallelizable, but that wouldn't be with data.table,
> if I understand this correctly:
> http://stackoverflow.com/questions/14759905/data-table-and-parallel-computing
>
>
> On Tue, Sep 17, 2013 at 5:42 PM, Nathaniel Graham <npgraham1 at gmail.com> wrote:
>
>> Oops; I meant to reply to all, and then forgot after I discarded and
>> rewrote my message a few times.  I suspect (although I'm not absolutely
>> certain) that if NULL or similar did the same thing as returning a 0-row
>> data.table with the appropriate number of columns, some operations could
>> be sped up a bit.  In those cases, the data.table code wouldn't need to
>> check the number and type of the columns returned.
>>
>> I suspect that unless someone knows a secret, ultrafast way to iterate
>> through a list of all combinations of a set of items and return the
>> subset of those that match some criteria, I'm as close to optimal as I'm
>> likely to get right now.
>>
>>
>> -------
>> Nathaniel Graham
>> npgraham1 at gmail.com
>> npgraham1 at uky.edu
>>
>>
>> On Tue, Sep 17, 2013 at 5:22 PM, Frank Erickson <FErickson at psu.edu> wrote:
>>
>>> Well, rbindlist(list()) says "Null data.table" (though it doesn't pass
>>> the is.null() test). Maybe someone else has an idea how to deal with the
>>> no-results case. By the way, it's best to use "reply to all" to make sure
>>> you reply to the mailing list, too; they should be able to see your message
>>> quoted below, though.
>>>
>>> --Frank
>>>
>>>
>>> On Tue, Sep 17, 2013 at 5:03 PM, Nathaniel Graham <npgraham1 at gmail.com> wrote:
>>>
>>>> Frank,
>>>>
>>>> Thanks.  This seems to have done the trick, so long as I'm careful to
>>>> check for zero-length lists and return
>>>> data.table(i = integer(), j = integer()) in those cases.  Essentially,
>>>> I have to test every combination of i and j to see if it's
>>>> "interesting" or not, and some groups have a lot of rows.  At the
>>>> moment I'm attacking some other low-hanging fruit, like speeding up the
>>>> comparisons I have to do.
>>>>
>>>> As a side note, it would be kind of nice if there were a simple way to
>>>> clue data.table in to the fact that there are no rows to return, like
>>>> returning NULL or NA or similar.
>>>>
>>>> -------
>>>> Nathaniel Graham
>>>> npgraham1 at gmail.com
>>>> npgraham1 at uky.edu
>>>>
>>>
>