[datatable-help] Bug report #5100 reg.

Frank Erickson FErickson at psu.edu
Thu Nov 14 19:45:52 CET 2013


For what it's worth, I use the with=FALSE version frequently without
knowing how many columns I have selected, so I like the implicit wrapping
of the columns in a list() (or implicit drop=FALSE). An example (almost)
from something I did yesterday:

mycols <- grep("^Vbar",names(DT),value=TRUE)
DT1 <- DT[,mycols,with=FALSE]
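
A rough sketch of why the implicit drop=FALSE helps here. The table and its
column names are made up for illustration, and it assumes the data.table
package is installed. Whether the pattern matches one column or several, the
result is a data.table, so the code that follows does not need to branch on
the number of matches:

```r
library(data.table)

# Hypothetical table; the column names are invented for this example.
DT <- data.table(id = 1:3,
                 Vbar1 = c(0.1, 0.2, 0.3),
                 Vbar2 = c(1, 2, 3),
                 w = 7:9)

# Case 1: the pattern matches several columns.
many <- grep("^Vbar", names(DT), value = TRUE)
DT_many <- DT[, many, with = FALSE]

# Case 2: the pattern happens to match exactly one column.
one <- grep("^Vbar1$", names(DT), value = TRUE)
DT_one <- DT[, one, with = FALSE]

# Both results are data.tables, even the single-column one.
is.data.table(DT_many)
is.data.table(DT_one)
ncol(DT_one)
```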

-- Frank


On Thu, Nov 14, 2013 at 11:59 AM, Eduard Antonyan
<eduard.antonyan at gmail.com> wrote:

> Perhaps a simple sentence along the lines of "the drop argument is absent
> and should be considered as FALSE when comparing with data.frame in
> with=FALSE mode" would suffice. The fact that the i-expression is a full-on
> data.table i-expression in with=FALSE mode will probably also cause
> inconsistencies.
>
>
> On Thu, Nov 14, 2013 at 10:47 AM, Arunkumar Srinivasan <
> aragorn168b at gmail.com> wrote:
>
>>  I'll try to make a list of places where data.table != data.frame
>> operation.
>>
>> Arun
>>
>> On Thursday, November 14, 2013 at 5:46 PM, Arunkumar Srinivasan wrote:
>>
>>  Glad that we agree on improving the documentation. However, I don't
>> find it a sound argument that we deviate from data.frame because the design
>> is bad, *when we inherit from data.frame*. The choice is already made! Too
>> many such trivial inconsistencies pile up pretty quickly and could
>> potentially result in a steep learning curve, as there is a different set
>> of rules to be memorised.
>>
>> If we say we are "inheriting from data.frame", *but* this, this,
>> this... and many other things are different, then those differences, if
>> they can't be avoided, should be *very clearly* documented (at the
>> beginning, maybe as a cheat sheet) so that people aren't confused.
>>
>>
>> Arun
>>
>> On Thursday, November 14, 2013 at 5:39 PM, Eduard Antonyan wrote:
>>
>> I agree that it's inconsistent with data.frame, and imo that's a good
>> thing. We don't replicate the drop argument, so it wouldn't be possible to
>> return a data.table when with=FALSE. Either way, drop=TRUE by default is
>> a bad design choice in data.frame and matrix (one that is unlikely to
>> change given R-core's attitude towards that type of thing).
>>
>> I'm always pro more and better documentation :)
>>
>>
>> On Thu, Nov 14, 2013 at 10:33 AM, Arunkumar Srinivasan <
>> aragorn168b at gmail.com> wrote:
>>
>>  Eddi, at the least, I think the documentation needs to be clearer on the
>> use of "with=FALSE". It does feel inconsistent with the fact that "j" with
>> a single column should return a vector. In a data.frame, where "j" takes
>> column names, a single column name returns a vector unless drop = FALSE.
>> That is, DF[, "y"] will return a vector while DF[, c("x", "y")] will
>> return a data.frame. So, it is inconsistent with data.frame here, I think.
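>>
>> A minimal sketch of the data.frame behaviour described above, using only
>> base R. The names DF, v, d1 and d2 are just for illustration:
>>
>> ```r
>> DF <- data.frame(x = 1:5, y = 6:10)
>>
>> v  <- DF[, "y"]                # one column name: dropped to a vector
>> d2 <- DF[, c("x", "y")]        # two column names: stays a data.frame
>> d1 <- DF[, "y", drop = FALSE]  # one column name, kept as a data.frame
>>
>> is.vector(v)
>> is.data.frame(d2)
>> is.data.frame(d1)
>> ```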
>>
>>
>> Arun
>>
>> On Thursday, November 14, 2013 at 5:25 PM, Eduard Antonyan wrote:
>>
>> DT[, y] returning a vector is I think the only correct behavior, given
>> the understanding of j-expression as something evaluated in the DT
>> environment. If they want a data.table they should simply use DT[, list(y)]
>> or DT[, data.table(y)].
>>
>> I haven't thought about DT[, "y", with = FALSE] before as I pretty much
>> never use that form, but I see an argument for it staying as is: "y" and
>> c("y") are the same, and we all presumably agree that DT[, c("y", "z"),
>> with = FALSE] should return a data.table. If DT[, c("y"), with = FALSE]
>> returned a different type, that would mean inconsistent return types,
>> which makes life much harder for users (as evidenced by the periodic
>> drop=FALSE questions that come up on SO).
>>
>> Going back to DT[, y], note that y and list(y) actually produce
>> *different* results (in e.g. base_env), so there is no type consistency
>> issue there between DT[, y] and DT[, list(y, z)].
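>>
>> A small sketch of that point, assuming the data.table package is
>> installed. The table is the one from the bug report gist, and the
>> variable names are illustrative only:
>>
>> ```r
>> library(data.table)
>>
>> # Outside any data.table, y and list(y) are already different objects.
>> y <- 6:10
>> plain_is_list   <- is.list(y)        # an integer vector is not a list
>> wrapped_is_list <- is.list(list(y))  # wrapping it in list() makes one
>>
>> # Inside j the same distinction carries over.
>> DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
>> as_vector <- DT[, y]           # j evaluates to a vector
>> as_table  <- DT[, list(y)]     # j evaluates to a list, so a data.table
>> two_cols  <- DT[, list(y, z)]  # same rule with two columns
>> ```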
>>
>>
>> On Thu, Nov 14, 2013 at 6:09 AM, Arunkumar Srinivasan <
>> aragorn168b at gmail.com> wrote:
>>
>>  Hi everybody,
>>
>> It'd be nice if you could weigh in on the bug report filed by Bill here:
>>
>> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5100&group_id=240&atid=975
>>
>> The gist of it is:
>>
>> require(data.table)
>> DT <- data.table(x=1:5, y=6:10, z=11:15)
>> DT[, y] # returns a vector
>> DT[, "y", with=FALSE] # returns a data.table
>>
>> The question from the bug report basically is: "why is it that in the
>> first case, 'j' has only one column and we get a vector, but in the second
>> case, we get a data.table?"
>>
>> My question is: Is this behaviour okay or do you prefer that the first
>> one returns a data.table as well or the second one (with "with=FALSE")
>> returns a vector?
>>
>> Thank you,
>> Arun
>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help