[datatable-help] Ran into infinite loop when setting key/sorting columns with NA

Matthew Dowle mdowle at mdowle.plus.com
Thu Jul 29 02:44:57 CEST 2010


It's not usual to have NAs in key columns, but they shouldn't cause
problems either. Do you actually have data in non-key columns on rows
where the key columns are NA? That seems odd but possible, I
suppose. If all non-key columns are NA for those rows, we normally
remove the rows.

We'll need the exact error messages, code and data to investigate
further; at least I will, unless anyone else has seen this before. If you
can't reproduce it then that's ok, just post what you can.

There is an infinite loop problem when a data.table is created in v1.4
and saved, then used with v1.5. Make sure the class of the data.table is
c("data.table","data.frame"); if it's just "data.table" in v1.5 then a
loop can sometimes occur. But that isn't related to NA in the key, afaik.
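
A quick way to check the class is sketched below. This is only an
illustration: has_v15_class is a hypothetical helper name, not part of
data.table, and the two stand-in objects are built with base R so the
snippet runs without the package. It only reports whether the class
vector matches; whether reassigning the class is enough to repair an
object saved under v1.4 is not something this sketch establishes.

```r
# Hypothetical helper (not part of data.table): TRUE if x carries the
# class vector that the v1.5 note above asks for.
has_v15_class <- function(x) {
  identical(class(x), c("data.table", "data.frame"))
}

# Stand-in objects built with base R only (assumption: these merely mimic
# the class attribute, they are not real data.tables).
old_style <- structure(list(a = 1:3), class = "data.table")
new_style <- structure(list(a = 1:3), class = c("data.table", "data.frame"))

old_flag <- has_v15_class(old_style)  # class is just "data.table"
new_flag <- has_v15_class(new_style)  # class matches the expected pair
```

On a real object, calling has_v15_class(DT) right after loading the
saved image (DT being whatever your table is called) would show whether
it is in the state described above.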

Thanks.

On Wed, 2010-07-28 at 17:58 -0500, Branson Owen wrote:
> ** Should I avoid using NA in data.table forever? Any comments are
> highly appreciated. **
> 
> I am sorry that I can't present code that reproduces the bug. I was
> working with a lot of mid-size data sets, each 100K rows and 10+
> columns, with 5 columns set as the key.
> 
> When I use a for loop to do the calculation and then set the key, I
> hit the following bug only about 5% of the time. I don't really know why.
> 
> I was using many NAs in two of the key factor columns. When I
> transformed the data.table and tried to set the same key again, it
> reacted as follows:
> 
> Version 1.4 in 64-bit R on Windows: it (seems to?) run into an
> infinite loop. I can't break it manually. It keeps consuming CPU, as
> observed in Task Manager.
> Version 1.5 in 32-bit R on Windows: it shortly throws an error message
> saying (I'm not sure of the exact message) "sorting ran into infinite
> loop/iteration?"
> 
> However, the 32-bit session was using the image file saved by 64-bit R.
> Therefore, I am not sure whether the above message is valid for this
> bug.
> 
> It looks like the bug has been noticed but not solved yet? I
> also encountered many other problems, but the silent infinite freezing
> always comes from setkey/key().
> 
> It shouldn't run into an infinite loop, because when I replace every NA
> with a blank string/factor value "", setting the key and sorting work
> again.
> 
> Currently, I have reset all my data to avoid NA when using data.table
> (painful). That's why I can't reproduce the bug. I tried to fake the
> data, but that didn't work.
> 
> I haven't seen this issue discussed, so I am reporting it to everyone.
> 
> Should I avoid using NA in data.table forever? Any comments are highly
> appreciated.
> 
> Best regards,
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help