[datatable-help] Error that I don't think should be an error

Matthew Dowle mdowle at mdowle.plus.com
Thu Oct 27 19:29:40 CEST 2011


> On 27 October 2011 12:59, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>> But the user intended to delete a column that isn't there. Why should it do
>> nothing, silently?  A warning at least.  If I typed "y" by mistake and
>> meant another column that was there, I'd want to know about it straight
>> away. Otherwise it could cause problems later on, possibly silently.
>> That's the most common situation we want to catch, I think.
>
> When developing code I've never once had this problem.

Maybe you did, but didn't know ;)

> When exactly
> does not deleting a column you should have deleted run into errors?

apply(DT,1,sum)
mean(lapply(.SD,sum))

things like that, silently.
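
To make the row-sum case concrete, here is a minimal sketch. The column names, the data and the mistyped name "tpm" are made up for illustration, and the `(wanted) := NULL` form is the current data.table spelling rather than the one available when this was written. The guarded delete stands in for a silent no-op:

```r
library(data.table)

# Illustrative data: "tmp" is a helper column that should not be summed.
DT <- data.table(a = 1:3, b = 4:6, tmp = c(100, 100, 100))

# The intent was to drop "tmp", but the name is mistyped as "tpm".
wanted <- "tpm"

# Mimic a silent no-op: delete only if present, say nothing otherwise.
if (wanted %in% names(DT)) DT[, (wanted) := NULL]

# "tmp" is still there, so every row sum is inflated by 100.
row_sums <- apply(DT, 1, sum)
```

Nothing complains, and the inflated sums flow into whatever comes next.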

>
>> The use-case "if
>> column is there, delete it, else don't" is much rarer I would have
>> thought. The rarer cases are the ones the user can code explicitly. No?
>
>
> Well the times I run into this are when I want to take a data table,
> calculate something using a by aggregation, and remerge the aggregated
> variables into the original data table.
> If the original data.table may already have the aggregated variable
> name as a column, I want to delete it before merging in order to not
> conflict with names. So I do something like DT[, w:=NULL] just to
> double check.
>
> This is a slightly roundabout way of saying that once := with by gets
> implemented, I would probably never run into this.
> But I've never had a mistype

Ok. I should have added computing on the language and calculated column
names. Logic that works out which column to delete but gets it wrong. In
data.table that would be  DT[,columniexpecttoexist:=NULL,with=FALSE]
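
For the "if the column is there, delete it, else don't" use-case, the explicit version might look like the sketch below. `drop_if_present` is a hypothetical helper, not something in data.table, the data are made up, and the `(present) := NULL` form is the current spelling of a delete by computed name:

```r
library(data.table)

DT <- data.table(g = c("a", "a", "b"), v = c(1, 2, 3), w = c(9, 9, 9))

# Hypothetical helper: drop the named columns only when they exist.
drop_if_present <- function(dt, cols) {
  present <- intersect(cols, names(dt))
  if (length(present) > 0L) dt[, (present) := NULL]  # deletes by reference
  invisible(dt)
}

drop_if_present(DT, "w")          # "w" exists, so it is removed
drop_if_present(DT, "not_there")  # absent, so nothing happens and no error

# Aggregate by group, then merge the result back without a name clash.
agg <- DT[, list(w = sum(v)), by = g]
DT2 <- merge(DT, agg, by = "g")
```

The caller who wants the lenient behaviour asks for it by name, and a plain delete can stay strict.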

> so I don't know the rarity of the two cases.  I've usually
> thought, though, that without seriously compelling reasons data.table
> should behave like data.frame where at all possible,

Yes, but I think being a little bit better is sometimes reason enough to be
different (e.g. the addition of warning messages), unless it really is
confusing (and users say so and ask for it to be changed). Full
compatibility with other packages is maintained by the datatable-aware
concept that switches to base for them.  The differences are usually in
one direction: being more strict, safer and less silent. It requires more
effort up front but results in more robust code, I hope.

In this case, it should be a warning not error, I'm thinking.
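
A rough sketch of that middle ground, written in base R on a data.frame so it makes no claim about data.table internals. `delete_col_warn` is a hypothetical name and the warning text is invented:

```r
df <- data.frame(x = 1:10)
df$y <- NULL   # data.frame: "y" is absent, so this is a silent no-op

# Hypothetical helper: delete a column, but warn when it is not there.
delete_col_warn <- function(d, col) {
  if (!(col %in% names(d))) {
    warning("column '", col, "' not present; nothing deleted")
    return(d)
  }
  d[[col]] <- NULL
  d
}

# Capture the warning text to show that the caller is told.
msg <- tryCatch(
  { delete_col_warn(df, "y"); "no warning" },
  warning = function(w) conditionMessage(w)
)

out  <- suppressWarnings(delete_col_warn(df, "y"))  # data come back unchanged
out2 <- delete_col_warn(df, "x")                    # an existing column is dropped
```

The code carries on as with the no-op, but a mistyped or miscalculated column name no longer goes unnoticed.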

>  and I don't know
> if it is a compelling enough reason to deviate.
>
>>
>>> Pretty simple. I think assigning NULL to a column that doesn't exist
>>> should just work as a no-op instead of failing with an error. No-op is
>>> consistent with data.frame:
>>>
>>>> dt=data.table(x=1:10)
>>>> dt$y=NULL
>>> Error in `[<-.data.table`(x, j = name, value = value) :
>>>   RHS is NULL, meaning delete column(s). But, at least one column is
>>> not present.
>>>> dt[,y:=NULL]
>>> Error in `[.data.table`(dt, , `:=`(y, NULL)) :
>>>   RHS is NULL, meaning delete column(s). But, at least one column is
>>> not present.
>>>> df=data.frame(x=1:10)
>>>> df$y=NULL
>>>
>>> If agreed I will file a bug report.
>>>
>>> -Chris