[datatable-help] Performance observation
Matthew Dowle
mdowle at mdowle.plus.com
Tue May 28 20:11:02 CEST 2013
Hi,
Yes, this is expected, because `[.data.table` is a function call with
associated overhead. You don't want to loop calls to it. Consider all
the arguments to `[.data.table` and all the checks that must be done for
the existence and type of arguments on each call. The idea is to give
`[.data.table` meaty calls which it can chew on. It doesn't like tiny
tasks one at a time.
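As a rough sketch of the difference between tiny calls and one "meaty" call (the sizes and the variable names `n`, `looped` and `one_call` are illustrative assumptions, much smaller than in the quoted timings):

```r
library(data.table)

set.seed(1)
dt <- data.table(a = rnorm(1e5))   # illustrative size only
n  <- 1000L

# Many tiny calls: every iteration pays the `[.data.table` call overhead.
looped <- numeric(n)
for (i in seq_len(n)) looped[i] <- dt[i, a]

# One larger call: the same rows fetched in a single `[.data.table` call.
one_call <- dt[seq_len(n), a]

# Both approaches return the same values.
stopifnot(identical(looped, one_call))
```

Wrapping either form in system.time() should show the gap on your own machine; the exact numbers will vary.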

`[[`, on the other hand, is an R primitive. It's part of the language.
You can do only very limited things with `[[`, but in this case (looking
up a single column by name or position in a loop) it's the best tool
for the job. I use `[[` on data.table quite a lot.
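A minimal sketch of the `[[` pattern, by name and by position (the data and the names `by_name` and `by_position` are made up for illustration):

```r
library(data.table)

set.seed(1)
dt <- data.table(a = rnorm(1e5))   # illustrative size only
n  <- 1000L

by_name     <- numeric(n)
by_position <- numeric(n)
for (i in seq_len(n)) {
  by_name[i]     <- dt[["a"]][i]   # look the column up by name, then index it
  by_position[i] <- dt[[1L]][i]    # same column, looked up by position
}

# Both lookups read the same column.
stopifnot(identical(by_name, by_position))
```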

This is also the very reason for set()'s existence: ?set says it's a
'loopable :=' because of the `[.data.table` overhead.
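For example, a loop that assigns one cell at a time could be sketched with set() like this (the columns `a` and `b` and the doubling are assumptions chosen only to have something to assign):

```r
library(data.table)

set.seed(1)
dt <- data.table(a = rnorm(10), b = 0)

# set() assigns by reference, like `:=`, but without going through
# `[.data.table` on every iteration.
for (i in seq_len(nrow(dt))) {
  set(dt, i = i, j = "b", value = dt[["a"]][i] * 2)
}

# The `:=` form of the loop body would be: dt[i, b := a * 2]
stopifnot(isTRUE(all.equal(dt[["b"]], dt[["a"]] * 2)))
```

For a whole-column update like this one, a single call such as dt[, b := a * 2] would be the "meaty" alternative; set() is aimed at cases where the loop itself cannot be avoided.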

There's a feature request to detect when `[.data.table` is being looped,
though:
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2028&group_id=240&atid=978
That would make data.table more helpful: it would at least tell you,
rather than leaving you to stumble across it.

Hope that helps,
Matthew
On 28.05.2013 18:37, Alexandre Sieira wrote:
> I was working on some code today and encountered this scenario, where
> the performance behavior of data.table surprised me a little. Is this
> expected?
>
>> dt = data.table(a=rnorm(1000000))
>
>> system.time( for(i in 1:100000) j = dt[i, a] )
>   usuário   sistema decorrido
>    78.064     0.426    78.034
>
>> system.time( for(i in 1:100000) j = dt[i, "a", with=F] )
>   usuário   sistema decorrido
>    27.814     0.154    27.810
>
>> system.time( for(i in 1:100000) j = dt[["a"]][i] )
>   usuário   sistema decorrido
>     1.227     0.006     1.225
> (sorry about the output in Portuguese; the columns are user, system
> and elapsed)
>
> Not knowing anything about how data.table is implemented internally, I
> would have assumed the three syntaxes for accessing the data.table
> should have similar performance, or at most a small difference.
>
> --
> Alexandre Sieira
> CISA, CISSP, ISO 27001 Lead Auditor
>
> "The truth is rarely pure and never simple."
> Oscar Wilde, The Importance of Being Earnest, 1895, Act I