[Rcpp-devel] Regular Expressions

Gabor Grothendieck ggrothendieck at gmail.com
Tue Mar 5 02:18:52 CET 2013


On Fri, Mar 1, 2013 at 8:56 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> Gabor,
>
> Here is a quick variant of one of the Boost regexp examples, particularly
> http://www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp
>
> // cf www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp
>
> #include <Rcpp.h>
>
> #include <string>
> #include <boost/regex.hpp>
>
> bool validate_card_format(const std::string& s) {
>    static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
>    return boost::regex_match(s, e);
> }
>
> const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
> const std::string machine_format("\\1\\2\\3\\4");
> const std::string human_format("\\1-\\2-\\3-\\4");
>
> std::string machine_readable_card_number(const std::string& s) {
>    return boost::regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
> }
>
> std::string human_readable_card_number(const std::string& s) {
>    return boost::regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
> }
>
> // [[Rcpp::export]]
> Rcpp::DataFrame regexDemo(std::vector<std::string> s) {
>     int n = s.size();
>
>     std::vector<bool> valid(n);
>     std::vector<std::string> machine(n);
>     std::vector<std::string> human(n);
>
>     for (int i=0; i<n; i++) {
>         valid[i]  = validate_card_format(s[i]);
>         machine[i] = machine_readable_card_number(s[i]);
>         human[i] = human_readable_card_number(s[i]);
>     }
>     return Rcpp::DataFrame::create(Rcpp::Named("input") = s,
>                                    Rcpp::Named("valid") = valid,
>                                    Rcpp::Named("machine") = machine,
>                                    Rcpp::Named("human") = human);
> }
>
> which we can test with the same input as the example has:
>
>
> R> Rcpp::sourceCpp('/tmp/boostreex.cpp')
> R> s <- c("0000111122223333", "0000 1111 2222 3333", "0000-1111-2222-3333", "000-1111-2222-3333")
> R> regexDemo(s)
>                 input valid          machine               human
> 1    0000111122223333 FALSE 0000111122223333 0000-1111-2222-3333
> 2 0000 1111 2222 3333  TRUE 0000111122223333 0000-1111-2222-3333
> 3 0000-1111-2222-3333  TRUE 0000111122223333 0000-1111-2222-3333
> 4  000-1111-2222-3333 FALSE  000111122223333  000-1111-2222-3333
> R>
>
> On Linux, you generally don't have to do anything to get Boost headers as
> they end up in /usr/include (or /usr/local/include) so for me, this just
> builds.  For R on Windows, you are quite likely to get by with the
> CRAN-provided boost tarball and an additional -I$(BOOSTLIB) etc.
>

I had no luck with sourceCpp or inline on Windows, but I did manage to
use R CMD SHLIB to build a DLL which loads into 32-bit R and seems to
run OK there (it does not currently load in 64-bit R; I haven't yet
figured out why).  To get it to work I replaced the first and last
statements in regexDemo with these two, respectively, so that all
inputs and outputs are SEXPs:

    // replaces the original signature line
    extern "C" SEXP regexDemo(SEXP ss) {
        std::vector<std::string> s = Rcpp::as<std::vector<std::string> >(ss);

        // replaces the original return statement
        return Rcpp::wrap(Rcpp::DataFrame::create(Rcpp::Named("input") = s,
                                                  Rcpp::Named("valid") = valid,
                                                  Rcpp::Named("machine") = machine,
                                                  Rcpp::Named("human") = human));

This isn't quite as nice as what you had but at least it runs.
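
As a possible fallback when the Boost headers are hard to set up, the
same three helpers can be sketched with std::regex from the C++11
standard library.  This is only a sketch under assumptions, not what
was built above: the *_std names are made up for illustration, ^ and $
stand in for \A and \z (which the default ECMAScript grammar of
std::regex does not support), $1-style format strings replace the
sed-style \1, and a compiler with a working <regex> is assumed.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical std::regex counterparts of the Boost helpers quoted above.

// True only for four groups of four digits separated by '-' or ' '.
bool validate_card_format_std(const std::string& s) {
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);
}

// Same pattern as the Boost version, anchored with ^ and $ instead of \A and \z.
static const std::regex card_re(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

// Strip the separators; input that does not match is returned unchanged.
std::string machine_readable_card_number_std(const std::string& s) {
    return std::regex_replace(s, card_re, "$1$2$3$4");
}

// Normalise the separators to '-'; input that does not match is returned unchanged.
std::string human_readable_card_number_std(const std::string& s) {
    return std::regex_replace(s, card_re, "$1-$2-$3-$4");
}
```

These helpers have no Rcpp or Boost dependency, so in principle they
could be called from the loop in regexDemo in place of the Boost ones.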


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
