<span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13.333333969116211px;background-color:rgb(255,255,255)">Hello all,</span><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13.333333969116211px;background-color:rgb(255,255,255)">
I have a lot of character columns in my data.table (usually with only a few distinct values, e.g. 5e6 values drawn from {"A","B","C","D"}).</div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13.333333969116211px;background-color:rgb(255,255,255)">
On pages 7-8 of the package vignette, M. Dowle mentions that:</div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13.333333969116211px;background-color:rgb(255,255,255)"><ul><li style="margin-left:15px">
the fastmatch package is a faster alternative for string lookups: fastmatch::fmatch builds a hash map, which speeds things up considerably</li><li style="margin-left:15px">...but points out that the first pass is less efficient (compared to data.table::chmatch)</li>
<li style="margin-left:15px">and finishes by saying that he has suggested to Simon Urbanek (the fastmatch package maintainer) adopting chmatch for the first call.</li></ul><div>I have a few questions regarding data.table/fastmatch:</div>
<div><ul><li style="margin-left:15px">If I use something like DT[ fmatch(X,"A"),...], should I expect lightning-quick subsequent selects? That is, would a later DT[ fmatch(X,c("B","D")),...] be much quicker (the select part of it)?</li>
<li style="margin-left:15px">Are M. Dowle or Simon Urbanek planning to use each other's code to enhance both packages?</li></ul><div><br></div><div>Thanks for reading</div><div>Regards</div></div></div>