[datatable-help] Memory usage of data.table chaining

Arunkumar Srinivasan aragorn168b at gmail.com
Fri Feb 20 23:36:30 CET 2015


Hi Mick,

Hope it went great!

Yes, this isn’t particularly memory efficient, as you materialise the first subset only to subset it again with your second condition. A query within a single `[…]` can be optimised much more easily than a chain of expressions.
What’s the rationale here for doing it this way? To take advantage of automatic indexing? It’d be great to have auto indexing optimised for complex expressions like `typeID %in% showID & transactionType == side`, but until then, setting a key and subsetting on it would be the best way.
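
A minimal sketch of the keyed approach, assuming a toy `trade.dt` with made-up values (the real table and the contents of `showID` and `side` aren't shown in the thread):

```r
library(data.table)

# Toy stand-in for trade.dt; every value here is invented for illustration.
trade.dt <- data.table(
  transactionID   = 1:8,
  typeID          = c(34L, 35L, 34L, 36L, 35L, 34L, 36L, 35L),
  transactionType = c("buy", "sell", "sell", "buy", "buy", "buy", "sell", "sell"),
  price           = c(5.1, 6.2, 5.3, 9.9, 6.0, 5.2, 9.7, 6.4)
)
showID <- c(34L, 35L)   # assumed: item types of interest
side   <- "buy"         # assumed: one side of the market

# Sort by the two filter columns once; later subsets can use binary search.
setkey(trade.dt, typeID, transactionType)

# CJ() builds every (typeID, transactionType) combination to look up;
# nomatch = 0L drops combinations that have no matching rows.
keyed <- trade.dt[CJ(showID, side), nomatch = 0L]
print(keyed)
```

Note that `setkey()` reorders the rows of `trade.dt` by reference, so if you rely on `tail()` to pick the most recent trades you would probably want to order the result by `transactTime` first.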

HTH,
Arun

On 20 Feb 2015 at 16:42:51, Mick Cooney (mickcooney at gmail.com) wrote:

I gave a talk about data.table last night to Dublin R and got a very  
interesting question at the end of it that I hadn't thought of before.  

I was showing how you can chain operations together in nice, concise
one-liners. The specific example I gave was:

show.dt <- trade.dt[typeID %in% showID
    ][transactionType == side
    ][, list(transactionID, transactTime, transactionType,
             typeID, typeName, quantity, price)]

print(tail(show.dt, n = count))

This code is written for the game Eve Online and shows the last n
trades that my character made on one side of the market. I used it
as an example of operation chaining.

I was asked at the end of the talk whether chaining the typeID and
transactionType conditions was any different from using a logical AND.
My response was that I wasn't sure, but I figured it might be, as
doing the logical AND would invoke a vector scan.

He then asked about memory use, so in the above example, do all the  
subcopies of the tables get kept in memory during the invocation, in  
effect mushrooming the amount of memory required?  

If that was the case, I could imagine that for large tables it might  
be worth going with the logical operation to prevent the multiple  
copies being made?  
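
For comparison, a small sketch of the two forms side by side, using a toy `trade.dt` with invented values (an assumption, since the real data isn't shown). The chained form builds an intermediate table after the first `[...]`, while the logical AND form evaluates both conditions against the original table and subsets once:

```r
library(data.table)

# Toy stand-in for trade.dt; values are invented for illustration.
trade.dt <- data.table(
  transactionID   = 1:6,
  typeID          = c(34L, 35L, 34L, 36L, 35L, 34L),
  transactionType = c("buy", "sell", "sell", "buy", "buy", "buy")
)
showID <- c(34L, 35L)   # assumed values
side   <- "buy"         # assumed value

# Chained: the first [...] materialises a subset, the second filters it again.
chained <- trade.dt[typeID %in% showID][transactionType == side]

# Single expression: one subset, no intermediate table.
combined <- trade.dt[typeID %in% showID & transactionType == side]

# Both forms select the same rows in the same order.
stopifnot(identical(chained$transactionID, combined$transactionID))
```

The intermediate table in the chained form only lives until the next `[...]` has finished, so the extra memory is temporary, but for a large table it can still be a noticeable peak.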


--  
Mick Cooney  
mickcooney at gmail.com  