[datatable-help] [R] [R-pkgs] new package 'bit64' - 1000x faster than 'int64' sponsored by Google

Stavros Macrakis macrakis at alum.mit.edu
Wed Feb 22 21:11:37 CET 2012


Jens,

Congratulations on the release of the bit64 package!  It sounds like it has
many important technical advantages over the int64 package.  You also say:

> Package 'bit64' has the following advantages over package 'int64' (which
> was sponsored by Google):
> ...
> - pure GPL, no copyrights from transnational commercial company


I am not sure I understand your point here.  *Every* piece of GPL software
has copyright notices on it, based on who wrote it (in this case, Romain
Francois) or who paid for writing it (apparently Google Inc. funded
development and received a work for hire copyright).  The critical issue is
not who holds the copyright, but what license the software is provided
under.  The int64 package is just as much a "pure" GPL package as the bit64
package.

The advantage to Google of having a copyright on the software is that
Google can perhaps use it internally without being bound by the terms of
the GPL (depending on their agreement with Francois), or license it to
others under other licenses.  But of course you, as the author of bit64,
can do exactly the same thing.

As for "I happily donate the code and drop this package." -- by licensing
your code under GPL, you have already given R Core the right to incorporate
your package into R (which is itself licensed under GPL).

               -s

> Package 'bit64' has the following advantages over package 'int64' (which
> was sponsored by Google):
> - true atomic vectors usable with length, dim, names etc.
> - only S3, not S4 class system used to dispatch methods
> - less RAM consumption by factor 7 (under 64 bit OS)
> - faster operations by factor 4 to 2000 (under 64 bit OS)
> - no slow-down of R's garbage collection (as caused by the pure existence
> of 'int64' objects)
> - pure GPL, no copyrights from transnational commercial company
>
> While the advantage of the atomic S3 design over the complicated S4 object
> design is obvious, it is less obvious that an external package is the best
> way to enrich R with 64bit integers. An external package will not give us
> literals such as 1LL or directly allow us to address larger vectors than
> possible with base R. But it allows us to properly address larger vectors
> in other packages such as 'ff' or 'bigmemory' and it allows us to properly
> work with large surrogate keys from external databases. An external package
> realizing just one data type also makes a perfect test bed to play with
> innovative performance enhancements. Performance tuned sorting and hashing
> are planned for the next release, which will give us fast versions of sort,
> order, merge, duplicated, unique, and table - for 64bit integers.
>
> For those who still hope that R's 'integer' will be 64bit some day, here
> is my key learning: migrating R's 'integer' from 32 to 64 bit would be RAM
> expensive. It would most likely require to also migrate R's 'double' from
> 64 to 128 bit - in order to again have a data type to which we can lossless
> coerce. The assumption that 'integer' is a proper subset of 'double' is
> scattered over R's semantics. We all expect that binary and n-ary functions
> such as '+' and 'c' do return 'double' and do not destroy information. With
> solely extending 64bit integers but not 128bit doubles, we have semantic
> changes potentially disappointing such expectations: integer64+double
> returns integer64 and does kill decimals. I did my best to make operations
> involving integer64 consistent and numerically stable - please consult the
> documentation at ?bit64 for details.
>
> Since this package is 'at risk' to create a lot of dependencies from other
> packages, I'd appreciate serious  beta-testing and also code-review,
> ideally from the R-Core team. Please check the 'Limitations' sections at
> the help page and the numerics involving "long double" in C. If the
> conclusion is that this should be better done in Base R - I happily donate
> the code and drop this package. If we have to go with an external package
> for 64bit integers, it would be great if this work could convince the Rcpp
> team including Romain about the advantages of this approach. Shouldn't we
> join forces here?
>
> Best regards
>
> Jens Oehlschlägel
> Munich, 21.2.2012
>
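
A note on the lossless-coercion point in the quoted message: a 64-bit IEEE 754
double has a 53-bit significand, so not every integer above 2^53 can be
represented exactly. The sketch below is Python rather than R, and it is not
bit64 code; the helper name survives_double is made up for illustration. It
only shows a round trip through a double failing just above that limit, which
is why a 64-bit integer type cannot be treated as a subset of 'double'.

```python
# Every integer with magnitude up to 2**53 is exactly representable
# as a 64-bit IEEE 754 double; beyond that, gaps appear.
DOUBLE_EXACT_LIMIT = 2**53


def survives_double(n: int) -> bool:
    """Return True if n is unchanged by a round trip through a 64-bit float.

    Hypothetical helper for illustration only.
    """
    return int(float(n)) == n


if __name__ == "__main__":
    candidates = {
        "largest 32-bit integer": 2**31 - 1,
        "2**53": DOUBLE_EXACT_LIMIT,
        "2**53 + 1": DOUBLE_EXACT_LIMIT + 1,
        "largest 64-bit integer": 2**63 - 1,
    }
    for label, value in candidates.items():
        print(label, value, "exact" if survives_double(value) else "NOT exact")
```

Every 32-bit integer passes this check, which is the property the quoted
message says R's semantics rely on; values such as 2^53 + 1 do not.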