[datatable-help] add a column specifying new column name via a variable

Matthew Dowle mdowle at mdowle.plus.com
Thu Jul 26 22:20:57 CEST 2012


There was a clue in the error message:

>> Error in `[.data.table`(dt, , `:=`(quote(new_col_name), NA)) :
>>   LHS of := must be a single column name when with=TRUE. When with=FALSE
>> the LHS may be a vector of column names or positions.

Trying with=FALSE:

    DT = data.table(a=1:3,b=4:6)
    newcolname = "FOO"
    DT[,newcolname:=NA,with=FALSE]
       a b FOO
    1: 1 4  NA
    2: 2 5  NA
    3: 3 6  NA

But I'm thinking that wrapping the LHS with eval() or c() [the things
George tried] should have worked too, and would be more natural given
that's what we do elsewhere. I seem to remember either a TO DO in the
source or a feature request to improve that. Will take another look.
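
For reference, a minimal sketch of the "wrapped LHS" idea. To my knowledge,
later data.table releases accept a parenthesised LHS, which is evaluated as
a variable holding the column name. This is an assumption about versions
newer than the one discussed in this thread, so check the NEWS file of your
installed version before relying on it:

```r
library(data.table)

DT = data.table(a=1:3, b=4:6)
newcolname = "FOO"

# Assumption: a data.table release newer than this thread, where
# wrapping the LHS in parentheses makes := use the value of the
# variable as the column name rather than the symbol itself.
DT[, (newcolname) := NA]

names(DT)
```

If that form isn't available, the with=FALSE call shown above is the one
confirmed in this thread.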

Trying set() gives an error, as Chris said:

    newcolname = "BAR"
    set(DT, j=newcolname, value=NA)
    Error in set(DT, j = newcolname, value = NA) :
      'BAR' is not a column name. Cannot add columns with set(), use :=
    instead to add columns by reference.

This is already FR#2077 "Improve set()'s messages to explain why it
doesn't add new columns":
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2077&group_id=240&atid=978

The thinking there was that set()'s purpose is very fast updating of
single cells by reference, suitable inside a loop in the rare situations a
loop is needed. Checking whether the column exists, and adding it if not,
would add a branch that takes time on every call. That was back when
character column names weren't acceptable to set() either, though; that
restriction was for speed, to save looking up the same column name over
and over. Integer i (already warns if not) and integer j (should warn if
not) should be much faster because they avoid the small allocations needed
to coerce.
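
To illustrate the loop use case described above, here is a minimal sketch
of set() with integer i and integer j. The table and the value written are
made up for the example; it only updates an existing column and does not
attempt to add one:

```r
library(data.table)

DT = data.table(a=1:3, b=4:6)

# Update column 2 ("b") one cell at a time, by reference.
# i and j are both integers, and value is an integer to match the
# type of column b, so no coercion is needed on any iteration.
for (i in seq_len(nrow(DT))) {
  set(DT, i=i, j=2L, value=0L)
}

DT
```

Passing j=2L rather than j="b" is the point of the sketch: the column is
addressed directly, with no name lookup repeated on every iteration.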

So the short answer is that set() could add new columns, but we need to
make sure that isn't at the expense of speed in the integer cases, and that
there are suitable warnings to encourage use of integer j, in set() only,
for speed inside loops. I'm probably over-egging the problem and it would
be simple to achieve, but those were the worries anyway.


> Hmmm I would have liked to see
>
> set(dt, j=new_col_name, value=NA)
>
> work but that doesn't. Any reason why necessarily?
>
> On Wed, Jul 25, 2012 at 2:03 PM, Kaupas, George
> <George.Kaupas at spansion.com> wrote:
>> I'm trying to add empty columns to data.tables using a variable
>> containing the name of the desired column, but I'm unable to figure out
>> how to dereference the variable value to satisfy the := operator.
>>
>> Here's a simple example:
>>
>> require(data.table);
>> dt <- data.table(read.table(text="N1 N2\nA B\nC D\n", header=TRUE));
>> new_col_name = "N3";
>> dt[, new_col_name := NA];
>>
>> That creates a column literally named "new_col_name", rather than "N3"
>> as desired.
>>
>> I can work around it this way:
>>
>> dt[, workaround := NA];
>> setnames(dt, "workaround", new_col_name);
>>
>> I have tried wrapping the new_col_name variable in all sorts of
>> functions such as eval(), c(), list(), quote(), etc; all of these
>> generate an error such as:
>>
>> Error in `[.data.table`(dt, , `:=`(quote(new_col_name), NA)) :
>>   LHS of := must be a single column name when with=TRUE. When with=FALSE
>> the LHS may be a vector of column names or positions.
>>
>> Surely I am overlooking something trivial; please advise.
>>
>> Thanks
>> George
>> http://stackoverflow.com/users/1313052/gkaupas
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help