[datatable-help] What is the fastest way to determine that data.table is empty

Matthew Dowle mdowle at mdowle.plus.com
Mon Jul 30 22:49:51 CEST 2012


I can't beat that. nrow isn't primitive (unlike length and $), so just
searching for and finding nrow in base is going to take time in a loop.
Having said that, even dim(DT) is relatively slow and dim is primitive, so
that can't be the whole reason.

> DT=data.table(a=1:3,b=4:6)
> system.time(for (i in 1:10000) nrow(DT))
   user  system elapsed
  1.128   0.004   1.138
> system.time(for (i in 1:10000) NROW(DT))
   user  system elapsed
  1.112   0.008   1.123
> system.time(for (i in 1:10000) dim(DT))
   user  system elapsed
  1.056   0.000   1.056
> system.time(for (i in 1:10000) length(DT[[1L]]))
   user  system elapsed
  0.568   0.000   0.572
> system.time(for (i in 1:10000) length(DT$a))  # yours is indeed fastest
   user  system elapsed
  0.168   0.000   0.169

However, are you 100% sure that Rprof() is telling you that's really the
bottleneck? Perhaps it is the branch itself, or whatever is done inside the
branch. And I assume you know to vectorize. I usually roll my eyes when
people find significant differences between insignificant times (like
these) unless they really do need to loop for some reason.
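
One way to check this, as a rough sketch only: profile a loop with Rprof()
and read the summary from summaryRprof(). The function process() and its
loop body are made up for illustration; substitute your real code.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)

# Hypothetical stand-in for the real application loop.
process <- function(DT, n = 1000000L) {
  total <- 0L
  for (i in seq_len(n)) {
    if (nrow(DT) != 0) total <- total + 1L
  }
  total
}

prof_file <- tempfile()
Rprof(prof_file)                 # start the sampling profiler
res <- process(DT)
Rprof(NULL)                      # stop profiling
out <- summaryRprof(prof_file)
head(out$by.self)                # time attributed to each function itself
```

If nrow (or whatever it calls) is not near the top of by.self, the check is
probably not the real bottleneck.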

My only thought for going faster than length(DT$a) is a direct .External
call to a C function that returns LENGTH(VECTOR_ELT(DT,0)).  As soon as
that's wrapped up in a function call, though, I'd guess it'll go slower
than the primitive calls in length(DT$a).
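
To get a feel for that wrapping overhead, here is a minimal sketch that
times the inline primitives against the same work inside an R-level
function. It does not use .External at all; n_rows_fast() is a made-up
helper that only illustrates the cost of an extra function call.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)

# Hypothetical wrapper: same primitives, plus one closure call per use.
n_rows_fast <- function(x) length(x[[1L]])

n <- 100000L
t_inline  <- system.time(for (i in seq_len(n)) length(DT$a))
t_wrapped <- system.time(for (i in seq_len(n)) n_rows_fast(DT))

print(t_inline)
print(t_wrapped)
```

Timings will vary by machine and R version, so treat any difference as
indicative only.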

You can also drop the != 0 part.

    if ( length(...) != 0 )

is the same as :

    if ( length(...) )
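
A quick check of that equivalence, as a minimal sketch: if() treats a
length-one number as TRUE when it is nonzero and FALSE when it is zero, so
both forms take the same branch. The helper names below are made up for
illustration.

```r
library(data.table)

has_rows_explicit <- function(DT) if (length(DT$a) != 0) TRUE else FALSE
has_rows_implicit <- function(DT) if (length(DT$a)) TRUE else FALSE

DT    <- data.table(a = 1:3, b = 4:6)
empty <- DT[0L]   # zero-row data.table with the same columns

# Both forms agree on a non-empty and on an empty table.
identical(has_rows_explicit(DT),    has_rows_implicit(DT))
identical(has_rows_explicit(empty), has_rows_implicit(empty))
```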

Matthew


> Hi,
>
> I have found that if(nrow(d.table) != 0) significantly reduces the
> performance of my application.
>
> Could you please advise on faster ways to determine that a data.table
> doesn't contain any rows? The fastest I have found is
> length(d.table$column) != 0.
>
>
> Andrii Riabushenko
> BARCLAYS CAPITAL
> 30 Fizkultury street
> Kiev 03150, Ukraine
> Global Dial: 8593 4077
> External Dial: +380 4459 34077
> Andrii.Riabushenko at barclays.com<mailto:Andrii.Riabushenko at barclays.com>
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help



