[datatable-help] Performance observation

Matthew Dowle mdowle at mdowle.plus.com
Tue May 28 20:11:02 CEST 2013


 

Hi,

Yes, this is expected, because `[.data.table` is a function call with
associated overhead. You don't want to loop calls to it. Consider all
the arguments to `[.data.table` and all the checks that must be done
for existence and type of arguments on each call. The idea is to give
`[.data.table` meaty calls which it can chew on. It doesn't like tiny
tasks one at a time.
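To make the "meaty call" point concrete, here is a minimal sketch that
fetches the same rows once with a loop of single-row calls and once with
a single call. The table size and the `idx` vector are illustrative
assumptions, kept small so the looped version finishes quickly; timings
will vary by machine and data.table version.

```r
library(data.table)

set.seed(1)
dt  <- data.table(a = rnorm(1e5))
idx <- 1:1000  # rows to fetch (illustrative choice, not from the original post)

# Looped: one `[.data.table` call per row, each paying the argument checks.
looped <- numeric(length(idx))
t_loop <- system.time(
  for (k in seq_along(idx)) looped[k] <- dt[idx[k], a]
)

# One "meaty" call: the same rows fetched in a single `[.data.table` call.
t_once <- system.time(
  once <- dt[idx, a]
)

# Both approaches return the same values; only the cost differs.
stopifnot(identical(looped, once))
print(c(looped = t_loop[["elapsed"]], once = t_once[["elapsed"]]))
```

The single call does all its argument checking once, so the per-row cost
is what is left after that fixed overhead is spread over every row.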

`[[`, on the other hand, is an R primitive. It's part of the language.
You can only do very limited things with `[[`, but in this case
(looking up a single column by name or position, in a loop) it's the
best tool for the job. I use `[[` on data.table quite a lot.
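As a small sketch of that pattern (the three-row table and the running
sum are made up for illustration), `[[` returns the column as a plain
vector, which can then be indexed cheaply inside a loop:

```r
library(data.table)

dt <- data.table(a = c(10.5, 20.5, 30.5))

# `[[` looks up a single column, by name or by position.
stopifnot(identical(dt[["a"]], dt[[1L]]))

# Pull the column out once, then index the plain vector in the loop.
col   <- dt[["a"]]
total <- 0
for (k in seq_along(col)) total <- total + col[k]
print(total)
```

Hoisting `dt[["a"]]` out of the loop is optional; calling it on every
iteration, as in `dt[["a"]][i]`, is still far cheaper than `dt[i, a]`.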

This is also the very reason for set()'s existence: ?set says it's a
'loopable :=' because of the `[.data.table` overhead.
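For illustration, a minimal sketch of set() used in a loop, assuming a
small made-up table. It updates one cell per iteration by reference,
where the `:=` form would go through `[.data.table` each time:

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3, 4, 5))

# The looped `:=` equivalent would pay the `[.data.table` overhead each time:
#   for (k in 1:5) dt[k, a := a * 10]
# set() makes the same by-reference assignment without that call.
for (k in seq_len(nrow(dt))) {
  set(dt, i = k, j = "a", value = dt[["a"]][k] * 10)
}
print(dt[["a"]])
```

When the whole update can be expressed as one vectorised `:=` call, that
is still preferable; set() is for cases where a loop is unavoidable.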

There's a feature request to detect when `[.data.table` is being
looped, though:

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2028&group_id=240&atid=978

That would make data.table more helpful: it would at least tell you,
rather than leaving you to stumble across it.

Hope that helps, 

Matthew


On 28.05.2013 18:37, Alexandre Sieira wrote: 

> I was working on some code today and encountered this scenario here
> where the performance behavior of data.table surprised me a little.
> Is this expected?
>
> > dt = data.table(a=rnorm(1000000))
>
> > system.time( for(i in 1:100000) j = dt[i, a] )
>   usuário  sistema decorrido
>    78.064    0.426    78.034
>
> > system.time( for(i in 1:100000) j = dt[i, "a", with=F] )
>   usuário  sistema decorrido
>    27.814    0.154    27.810
>
> > system.time( for(i in 1:100000) j = dt[["a"]][i] )
>   usuário  sistema decorrido
>     1.227    0.006     1.225
>
> (sorry about the output in Portuguese)
>
> Not knowing anything about how data.table is implemented internally,
> I would have assumed the three syntaxes for accessing the data.table
> should have similar or at the most a small difference in performance.
>
> --
> Alexandre Sieira
> CISA, CISSP, ISO 27001 Lead Auditor
>
> "The truth is rarely pure and never simple."
> Oscar Wilde, The Importance of Being Earnest, 1895, Act I

 