[datatable-help] Question on data.table::chmatch and fastmatch::fmatch

stat quant statquant at outlook.com
Tue Jan 29 18:46:50 CET 2013


Hello all,
I have a lot of character columns in my data.table (usually with only a few
distinct values, e.g. 5e6 values drawn from {"A","B","C","D"}).
On pages 7-8 of the package vignette, M. Dowle mentions that:

   - the package fastmatch is a faster alternative for string lookups:
   fastmatch::fmatch builds a hash map, which speeds things up
   considerably
   - ...but he points out that the first pass is less efficient (compared to
   data.table::chmatch)
   - and he finishes by saying that he suggested to Simon Urbanek (the
   fastmatch package maintainer) that he adopt chmatch for the first call.

I have a few questions regarding data.table/fastmatch:

   - if I use something like DT[ fmatch(X,"A"),...], should I expect
   lightning-quick subsequent selects? I mean, would DT[
   fmatch(X,c("B","D")),...] be much quicker (the select part of it)?
   - Are M.D. or Simon Urbanek planning to use each other's code to improve
   both packages?
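
To make the first question concrete, here is a minimal sketch of the kind of
select I have in mind. The table DT, its columns X and V, and the size n are
made up for illustration. Since fmatch(x, table) returns positions in table
(NA where there is no match), the sketch wraps it in !is.na() to get a logical
row filter rather than passing the positions to DT[...] directly, and it shows
the equivalent chmatch-based select with %chin% for comparison.

```r
library(data.table)
library(fastmatch)

set.seed(1)
n <- 1e5  # hypothetical size; the real columns hold about 5e6 values

# Made-up table: a character column with only four distinct values
DT <- data.table(
  X = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  V = runif(n)
)

# fmatch() gives, for each element of X, its position in the lookup
# vector (NA if absent), so convert that to TRUE/FALSE before subsetting
r1 <- DT[!is.na(fmatch(X, "A"))]
r2 <- DT[!is.na(fmatch(X, c("B", "D")))]

# The same selects using data.table's chmatch-based operator
c1 <- DT[X %chin% "A"]
c2 <- DT[X %chin% c("B", "D")]
```

Wrapping either select in system.time() on the first and on later calls would
show whether the subsequent ones really get quicker.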


Thanks for reading
Regards