[datatable-help] data.table segfaulting, need help verifying the reason

Matthew Dowle mdowle at mdowle.plus.com
Tue Sep 10 20:02:33 CEST 2013


Nothing springs to mind.  Are you on the latest version, v1.8.10 from 
CRAN, or v1.8.11 from R-Forge?

On this bit :
 > So somewhere these key columns think they are different lengths than
 > they really are, and when I try to access it I go into memory I
 > shouldn't so I segfault.  How can I verify this? Is there something
 > about the DT I can check to see what DT thinks these columns are?

.Internal(inspect(DT)) reveals the internal structure including length 
and truelength on the column pointer vector as well as each column.
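As a minimal sketch of that check (the table below is made up for 
illustration, it is not the one from this thread), something like the 
following prints the internal structure and then pulls out the same 
numbers with length() and data.table's truelength():

```r
library(data.table)

# Hypothetical keyed table standing in for the real DT from this thread.
DT <- data.table(x = 1:3, y = c("a", "b", "c"), key = "x")

# Dump the internal structure to the console, including the length and
# truelength of the column pointer vector and of each column.
.Internal(inspect(DT))

# The same information, programmatically.
col_lengths <- vapply(DT, length, integer(1))  # length of each column
n_cols      <- length(DT)                      # columns currently in use
n_slots     <- truelength(DT)                  # column slots allocated

print(col_lengths)
print(c(length = n_cols, truelength = n_slots))
```

In a healthy table I would expect every column length to equal nrow(DT) 
and truelength to be no smaller than length.  A key column of `a` that 
reports a different length would support the theory.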

But that's a really odd way of using data.table.  Iterating by row is 
going to kill performance;  data.table works best column-wise.

If it really has to be by row, then DT[, fun(.SD,...), by=1:nrow(DT)] 
should be better than apply().
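A rough sketch of the two styles, assuming a toy table and a made-up 
per-row function (row_fun is hypothetical, it just stands in for fun):

```r
library(data.table)

# Toy data, purely for illustration.
DT <- data.table(a = 1:3, b = c(10, 20, 30))

# Hypothetical per-row function; receives a one-row .SD.
row_fun <- function(sd) sd$a + sd$b

# Row-wise: one group per row, so row_fun is called nrow(DT) times.
by_row <- DT[, .(total = row_fun(.SD)), by = 1:nrow(DT)]

# Column-wise: the same result from a single vectorised expression.
by_col <- DT[, a + b]

print(by_row)
print(by_col)
```

Both give the same totals here; the column-wise form avoids the 
per-row call overhead, which is why it is the one to prefer when the 
computation allows it.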

Matthew

On 10/09/13 18:47, Chris Neff wrote:
> Narrowing it down further,
>
> a$x
>
> segfaults and
>
> a[,x]
>
> segfaults but
>
> a[,"x", with=FALSE]
>
> doesn't.
>
>
> On Tue, Sep 10, 2013 at 1:32 PM, Chris Neff <caneff at gmail.com> wrote:
>
>     I'm pretty sure it is some issue of a column that thinks it is
>     bigger than it actually is.  I have tried, so far in vain, to make
>     a reproducible example that I can share.  I have one, but can't
>     share it.
>
>     What happens is this:
>
>     A data.frame is made:
>
>     > d = data.frame(...)
>
>     Then I call apply over every row, calling a different function
>     that takes in a DT as well:
>
>     l = apply(d, 1, function(x) func(x[1], x[2], DT))
>
>     This returns a data.frame.  If I rbindlist this:
>
>     a = rbindlist(l)
>
>     I can print a just fine, and it will show me all data like normal.
>     But if I try to just do
>
>     a$x
>
>     where x is one of the columns that was a key in DT, then it segfaults.
>     If I ask for a column that was made by "func" and wasn't a column
>     in DT, it works fine.  If I ask for only the first 10 rows and
>     then ask for x:
>
>     a[1:10]$x
>
>     it works fine.
>
>     So somewhere these key columns think they are different lengths
>     than they really are, and when I try to access it I go into memory
>     I shouldn't so I segfault.  How can I verify this? Is there
>     something about the DT I can check to see what DT thinks these
>     columns are?
>
>
>     Also, if instead of apply when making the list, I do
>
>     l = lapply(1:nrow(d), function(i) func(d[i,1],d[i,2],DT))
>
>     and rbindlist that, it works fine too.