[datatable-help] Discrepancy between as.data.frame & as.data.table when handling nested lists

Ricardo Saporta saporta at scarletmail.rutgers.edu
Thu Aug 8 22:26:20 CEST 2013


Matthew,

<<Rick - are you asking for use cases of list columns full stop or use
cases of converting nested lists to data.table containing list columns ?>>

The latter.  Specifically, in the last example you give
   > L = list(1,2,3,list("a",4L,3:10))    # the one nested list here creates one list column
   > as.data.table(L)
     V1 V2 V3           V4
  1:  1  2  3            a
  2:  1  2  3            4
  3:  1  2  3 3,4,5,6,7,8,

In the example above, we are essentially creating a tuple between each
element in the sublist (what becomes V4) and all other elements in L.

I am curious as to what this might reflect in real life?  I often encounter
the opposite need, for example, the sublist being a group of sub-properties.

My intuition is that the sublist is clearly grouped and so should most
probably be treated as such.  If anything, I would think of expanding the
sublist horizontally into additional columns.
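A minimal base-R sketch of that horizontal expansion, assuming every element of the list (and of each sublist) is named. flatten_sublists() is a hypothetical helper written for illustration; it is not part of data.table. It reuses X from Example 1 further down the thread and produces the same column names that as.data.frame(X) shows there.

```r
# Example 1 from this thread, written without structure()
X <- list(start = "start_node", type = "is_similar", end = "end_node",
          data = list(editDist = 1, second = "HelloWorld"))

# Hypothetical helper (not a data.table function).
# Non-list elements are kept whole; each sublist is spliced in as
# separate top-level elements named "<parent>.<child>".
# Assumes all elements and sub-elements are named.
flatten_sublists <- function(l) {
  out <- list()
  for (nm in names(l)) {
    el <- l[[nm]]
    if (is.list(el)) {
      names(el) <- paste(nm, names(el), sep = ".")
      out <- c(out, el)
    } else {
      out[[nm]] <- el
    }
  }
  out
}

flat <- flatten_sublists(X)
print(names(flat))

# With the sublist flattened there is no nested list left, so
# as.data.table() has nothing to turn into a list column.
if (requireNamespace("data.table", quietly = TRUE)) {
  print(data.table::as.data.table(flat))
}
```

Because the helper only splices one level and leaves atomic vectors intact, it should also work for inputs like Y in Example 2, where each element has length 2.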

And if I understand correctly, when the number of elements in the sublist
is the same as the length of all the other elements of the main list, then
the sublist's groups are preserved, correct?
eg:

    L2 = list( 1:2 , 11:12, 31:32, list(c("a", "b", "c"), c("q","r","s")) )
    as.data.table(L2)
#    V1 V2 V3    V4
# 1:  1 11 31 a,b,c
# 2:  2 12 32 q,r,s


My curiosity about use cases is mostly for learning's sake.


Cheers,
Rick



On Thu, Aug 8, 2013 at 12:38 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Agreed, intentional.
>
> > L = list(1,2,3)
> > as.data.table(L)
>    V1 V2 V3             #  3 columns, not one list column
> 1:  1  2  3
>
> > L = list(1,2,3,list("a",4L,3:10))    # the one nested list here creates one list column
> > as.data.table(L)
>    V1 V2 V3           V4
> 1:  1  2  3            a
> 2:  1  2  3            4
> 3:  1  2  3 3,4,5,6,7,8,
>
> Rick - are you asking for use cases of list columns full stop or use cases
> of converting nested lists to data.table containing list columns ?
>
>
>
> On 08/08/13 04:30, Ricardo Saporta wrote:
>
> Hey Frank,
>
>  Thanks for pointing out that SO link, I had missed it.
>
>  All,
>
>  I'm curious as to which use cases this functionality is meant for.
>
>  thanks,
> Rick
>
>
>
> On Wed, Aug 7, 2013 at 8:14 PM, Frank Erickson <FErickson at psu.edu> wrote:
>
>> Hi Rick,
>>
>>  I guess it's intentional: Matthew saw this SO question (since he edited
>> one of the answers):
>> http://stackoverflow.com/questions/9547518/creating-a-data-frame-where-a-column-is-a-list
>>
>>  Some musings: Of course, to reproduce as.data.frame-like behavior, you
>> can un-nest the list, so both functions treat it the same way.
>>
>>  Z <- unlist(Y,recursive=FALSE)
>>
>>  identical(as.data.table(Z),as.data.table(as.data.frame(Z))) # TRUE
>>  # or, equivalently (?)
>>  identical(do.call(data.table,Z),data.table(do.call(data.frame,Z))) # TRUE
>>
>>
>>  On the other hand, going back the other direction (getting
>> data.table-like behavior when data.frame's is the default) is more awkward,
>> as seen in that SO question (where they mention protecting each sublist
>> with the I() function). Besides, I'm with @flodel, who asked the SO
>> question, in expecting data.table's behavior: one top-level item in the
>> list mapping to one column in the result...
>>
>>  --Frank
>>
>>  On Wed, Aug 7, 2013 at 4:56 PM, Ricardo Saporta <
>> saporta at scarletmail.rutgers.edu> wrote:
>>
>>>   Hi all,
>>>
>>>  Note the following discrepancy in structure between as.data.frame &
>>> as.data.table when called on a nested list.
>>> as.data.frame converts the sublist into individual columns whereas
>>> as.data.table stacks them into a single column and creates additional rows.
>>>
>>>  Is this intentional?
>>> -Rick
>>>
>>>
>>>  as.data.frame(X)
>>> #        start       type      end data.editDist data.second
>>> # 1 start_node is_similar end_node             1  HelloWorld
>>>
>>>  as.data.table(X)
>>> #         start       type      end       data
>>> # 1: start_node is_similar end_node          1
>>> # 2: start_node is_similar end_node HelloWorld
>>>
>>>
>>>
>>>
>>>  ### Copy+Paste'able Below ###
>>>
>>>  # Example 1:
>>> X <-  structure(list(start = "start_node", type = "is_similar", end =
>>> "end_node",
>>>     data = structure(list(editDist = 1, second = "HelloWorld"), .Names =
>>> c("editDist",
>>>     "second"))), .Names = c("start", "type", "end", "data"))
>>>
>>>  as.data.frame(X)
>>>  as.data.table(X)
>>>
>>>  as.data.table(as.data.frame(X))
>>>
>>>
>>>  # Example 2, with more elements:
>>> Y <- structure(list(start = c("start_node", "start_node"), type =
>>> c("is_similar", "is_similar"), end = c("end_node", "end_node"), data =
>>> structure(list(editDist = c(1, 1), second = c("HelloWorld", "HelloWorld")),
>>> .Names = c("editDist", "second"))), .Names = c("start", "type", "end",
>>> "data"))
>>>
>>>  as.data.frame(Y)
>>> as.data.table(Y)
>>>
>>>
>>>  _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>>
>
>