[datatable-help] Ran into infinite loop when setting key/sorting columns with NA

Branson Owen branson.owen at gmail.com
Thu Jul 29 00:58:18 CEST 2010


** Should I avoid using NA in data.table forever? Any comments are
highly appreciated. **

I am sorry that I can't provide code that reproduces the bug. I was
working with many mid-size data sets, each with 100K rows and 10+
columns, with 5 columns set as the key.

When I use a for loop to do the calculation and then set the key, I
hit the following bug only about 5% of the time. I don't know why.

I was using many NAs in two of the key factor columns. When I
transformed the data.table and tried to set the same key again, it
reacted as follows:

Version 1.4 in 64-bit R on Windows: it (seemingly) ran into an
infinite loop. I couldn't break it manually, and it kept consuming
CPU, as observed in Task Manager.
Version 1.5 in 32-bit R on Windows: it shortly threw an error message
saying something like (I am not sure of the exact wording) "sorting
ran into infinite loop/iteration".

However, the 32-bit session was using the image file saved by 64-bit
R, so I am not sure whether the above message is valid for this bug.

It looks like the bug has been noticed but not solved yet? I also
encountered many other problems, but the silent, endless freezing
always came from setkey/key().

It shouldn't run into an infinite loop, because when I replace all
NAs with a blank string/factor value "", setting the key and sorting
work again.
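
For anyone who wants to try the same workaround, here is a minimal
sketch in base R. The helper name na_to_blank is my own invention
for illustration and is not part of data.table; it only shows one way
to turn the NAs in a factor into an ordinary blank level "".

```r
# Hypothetical helper (not a data.table function): replace every NA in a
# factor with the blank string "" so the column has no missing values.
na_to_blank <- function(x) {
  x <- as.character(x)   # work on the labels rather than the integer codes
  x[is.na(x)] <- ""      # substitute the blank string for each NA
  factor(x)              # rebuild the factor; "" is now a regular level
}

f <- factor(c("b", NA, "a", NA))
g <- na_to_blank(f)
stopifnot(!any(is.na(g)), "" %in% levels(g))
```

Each key factor column would be passed through such a helper before
the key is set again. This only sidesteps the problem; it does not
explain why the sort fails with NA.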

Currently, I have reset all my data to avoid NA when using data.table
(painful). That's why I can't reproduce the bug. I tried to reproduce
it with fake data, but that didn't work.

I didn't see this issue discussed, so I am reporting it to everyone.

Should I avoid using NA in data.table forever? Any comments are
highly appreciated.

Best regards,

