[Rcpp-devel] ColDatum constructors memory safety

Alistair Gee alistair.gee at gmail.com
Fri Mar 19 05:04:15 CET 2010


On Thu, Mar 18, 2010 at 8:03 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> Note to self, no more coding when tired...
>
> On 18 March 2010 at 21:49, Dirk Eddelbuettel wrote:
> | So I became a little more ambitious and made the runit file a fuller test of
> | ColDatum and RcppFrame (both of the old API):
> |
> | test.ColDatum.vector <- function() {
> |     src <- 'std::vector<ColDatum> colDatumVector(3);
> |             colDatumVector[0].setDoubleValue(1.23);
> |             colDatumVector[1].setIntValue(42);
> |             colDatumVector[2].setLogicalValue(0);
> |             std::vector<std::string> names;
> |             names.push_back("A");
> |             names.push_back("B");
> |             names.push_back("C");
> |             RcppFrame fr(names);
> |             fr.addRow(colDatumVector);
> |             RcppResultSet rs;
> |             rs.add("data.frame", fr);
> |             return R_NilValue;';
> |     funx <- cfunction(signature(), src, Rcpp=TRUE)
> |     checkEquals(funx(), NULL, msg = "RcppColDatum.vector")
> | }
>
> That had an extra return R_NilValue; in there we didn't want. I fixed that and renamed it:
>
> R> library(RUnit)
> R> library(inline)
> R> test.RcppFrame <- function() {
> +     src <- 'std::vector<ColDatum> colDatumVector(3);
> +             colDatumVector[0].setDoubleValue(1.23);
> +             colDatumVector[1].setIntValue(42);
> +             colDatumVector[2].setLogicalValue(0);
> +             std::vector<std::string> names;
> +             names.push_back("A");
> +             names.push_back("B");
> +             names.push_back("C");
> +             RcppFrame fr(names);
> +             fr.addRow(colDatumVector);
> +             RcppResultSet rs;
> +             rs.add("data.frame", fr);
> +      return rs.getReturnList();';
> +     funx <- cfunction(signature(), src, Rcpp=TRUE)
> +     dframe <- data.frame(funx()[[1]]) ## needs a data.frame() call on first list elem
> +     checkEquals(dframe, data.frame(A=1.23, B=42, C=FALSE), msg = "RcppColDatum.vector")
> + }
> R> test.RcppFrame()
> Loading required package: Rcpp
> [1] TRUE
> R>
>
> Now -- this works fine __with or without your patch__ so at this point, I am
> still not sure what error your patch is fixing.  Can you help me out?
>
> Many thanks,  Dirk
>
> --
>  Registration is open for the 2nd International conference R / Finance 2010
>  See http://www.RinFinance.com for details, and see you in Chicago in April!
>

I'm not sure that you can verify the fix with an empirical test, short
of using something like valgrind b/c the bug would occur only due to
chance:

Here is a simpler scenario:

  ColDatum c1;       // via the default constructor
  ColDatum c2(c1);  // via the copy constructor

When c1 is constructed via the default constructor (and without my
patch), only the level field is initialized. The other fields, such as
type, numLevels, and levelNames are not initialized, so they can be
garbage values. So, if by chance, c1.type == COLTYPE_FACTOR, then when
c2 is constructed by the copy constructor, the code to copy the
levelNames is executed, which proceeds to copy the data
c1.levelNames[0] .. c1.levelNames[c1.numLevels - 1]. But c1.levelNames
wasn't initialized, so it could refer to random memory. Moreover,
c1.numLevels wasn't initialized either, so it could be anything as
well. Thus, the code could be accessing invalid memory.

Here is the code to the copy constructor:

ColDatum::ColDatum(const ColDatum& datum) {
    // Need deep copy so contruction/destruction is synchronized.
    s = datum.s;
    x = datum.x;
    i = datum.i;
    type = datum.type;
    level = datum.level;
    numLevels = datum.numLevels;
    d = datum.d;
    if (type == COLTYPE_FACTOR) {
	levelNames = new std::string[numLevels];
	for (int j = 0; j < numLevels; j++)
	    levelNames[j] = datum.levelNames[j];
    }
}

Note also the destructor. When c1's destructor is invoked, and c1.type
== COLTYPE_FACTOR, the code will attempt to delete [] levelNames but
that field was never initialized, so the code would attempt to
incorrectly deallocate memory:

ColDatum::~ColDatum() {
    if (type == COLTYPE_FACTOR) {
	// For this to work we need a deep copy when type == COLTYPE_FACTOR.
	// See the copy constructor below. It is wasteful to have
	// evey factor cell own a separate copy of levelNames, but we leave
	// the task of factoring it out (using reference counts) for
	// later.
	delete [] levelNames;
    }
}

The patch fixes both problems by initializing c1.type to
COLTYPE_UNKNOWN (which I added to the enumeration).


More information about the Rcpp-devel mailing list