[datatable-help] Memory usage of data.table chaining

Mick Cooney mickcooney at gmail.com
Fri Feb 20 16:42:04 CET 2015


I gave a talk about data.table to Dublin R last night and, at the end,
got a very interesting question that I hadn't considered before.

I was showing how you can chain operations together in concise
one-liners; the specific example I gave was:

# Keep each `[` at the end of a line so R parses the chain as one expression.
show.dt <- trade.dt[typeID %in% showID][
                    transactionType == side][
                    , list(transactionID, transactTime, transactionType,
                           typeID, typeName, quantity, price)]

print(tail(show.dt, n = count))

This code was written for the game Eve Online. It shows the last n
(here, count) trades that my character made on one side of the market
(transactionType == side), and I used it as an example of operation
chaining.

At the end of the talk I was asked whether chaining the typeID and
transactionType filters was any different from combining them with a
logical AND. I replied that I wasn't sure, but I suspected it might be,
since the logical AND would invoke a vector scan.

He then asked about memory use: in the example above, are all the
intermediate copies of the table kept in memory during the call, in
effect mushrooming the amount of memory required?

If so, would it be worth using the logical AND instead for large
tables, to avoid making multiple copies?
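To make both questions concrete, here is a minimal sketch on a made-up six-row table. Only the column names come from the example above; the values, `showID`, `side` and the `step1`/`step2` names are hypothetical. It writes the same query three ways: as the original chain; as that chain unrolled into named intermediates (R parses `DT[i][j]` as nested calls to `[`, so the inner subset is built as a complete data.table before the outer `[` runs); and as a single subset with the two conditions joined by `&`. It only checks that the three results agree and does not settle which form is faster or lighter on memory.

```r
library(data.table)

# Hypothetical toy data: only the column names come from the original example.
trade.dt <- data.table(
  transactionID   = 1:6,
  transactTime    = as.POSIXct("2015-02-01", tz = "UTC") + 3600 * (1:6),
  transactionType = c("buy", "sell", "buy", "buy", "sell", "buy"),
  typeID          = c(10L, 10L, 20L, 30L, 20L, 10L),
  typeName        = c("A", "A", "B", "C", "B", "A"),
  quantity        = c(5L, 3L, 7L, 1L, 2L, 4L),
  price           = c(1.5, 1.6, 2.0, 9.9, 2.1, 1.4)
)
showID <- c(10L, 20L)  # item types to show (hypothetical values)
side   <- "buy"        # trade side to show (hypothetical value)

# 1. Chained form, as in the talk.
chained <- trade.dt[typeID %in% showID][
  transactionType == side][
  , list(transactionID, transactTime, transactionType,
         typeID, typeName, quantity, price)]

# 2. The same chain unrolled: each step returns a new data.table holding
#    every column of the surviving rows, and the next step subsets that.
step1    <- trade.dt[typeID %in% showID]     # all columns, filtered by type
step2    <- step1[transactionType == side]   # all columns, filtered by side
unrolled <- step2[, list(transactionID, transactTime, transactionType,
                         typeID, typeName, quantity, price)]

# 3. One subset: both conditions combined with a logical AND in i, and the
#    columns selected in j of the same call.
combined <- trade.dt[typeID %in% showID & transactionType == side,
                     list(transactionID, transactTime, transactionType,
                          typeID, typeName, quantity, price)]

# All three forms select rows 1, 3 and 6 of the toy table.
stopifnot(identical(as.data.frame(chained), as.data.frame(unrolled)),
          identical(as.data.frame(chained), as.data.frame(combined)))
print(chained)
```

The unrolled version shows what the memory question is about: `step1` and `step2` each hold every column of their rows, whereas the `&` version evaluates its conditions as logical vectors over the full table before taking a single subset.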


-- 
Mick Cooney
mickcooney at gmail.com

