[datatable-help] Ran into infinite loop when setting key/sorting columns with NA

Short, Tom TShort at epri.com
Thu Jul 29 03:47:19 CEST 2010


I haven't run into NA problems, but I did run into the infinite loop
from a data.table created in v1.4 being used in v1.5.
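
For anyone checking for the v1.4/v1.5 case, here is a rough sketch of the
class check Matthew describes in the quoted message. DT is a hypothetical
stand-in built in base R to mimic an object restored from a v1.4 save.
Whether resetting the class alone is enough for every such object is an
assumption on my part; recreating the table in v1.5 may be the safer route.

```r
# Hypothetical stand-in for an object restored from a v1.4 save:
# a list of columns carrying only the old single-element class.
DT <- structure(list(id = c("a", NA, "b"), x = 1:3),
                class = "data.table")

# Per the quoted message, v1.5 expects both classes, in this order.
expected <- c("data.table", "data.frame")

needs_fix <- !identical(class(DT), expected)
if (needs_fix) {
  # Assumption: resetting the class attribute is sufficient here.
  class(DT) <- expected
}

class(DT)
```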

- Tom

 

> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org 
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> On Behalf Of Matthew Dowle
> Sent: Wednesday, July 28, 2010 20:45
> To: Branson Owen
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Ran into infinite loop when 
> setting key/sorting columns with NA
> 
> It's not usual to have NAs in key columns, but they shouldn't
> cause problems either. Do you actually have data in non-key
> columns on rows where there are NAs in the key columns? That
> seems odd, but possible I suppose. If all non-key columns are
> NA for those rows, we normally remove the rows.
> 
> We'll need exact error messages, code and data to
> investigate further; at least I will, unless anyone else has
> seen this before. If you can't reproduce it, that's OK; just
> post what you can.
> 
> There is an infinite loop problem when a data.table is
> created in v1.4 and saved, then used with v1.5. Make sure the
> class of the data.table is c("data.table","data.frame"); if
> it's just "data.table" in v1.5 then a loop can sometimes
> occur. But that isn't related to NA in the key, afaik.
> 
> Thanks.
> 
> On Wed, 2010-07-28 at 17:58 -0500, Branson Owen wrote:
> > ** Should I avoid using NA in data.table forever? Any comments are
> > highly appreciated. **
> > 
> > I am sorry that I can't present code that reproduces the bug. I was
> > working with a lot of mid-size data sets, each with 100K rows and 10+
> > columns, with 5 columns set as the key.
> > 
> > When I use a for loop to do the calculation and then set the key,
> > there is only about a 5% chance that I get the following bug. I
> > don't really know why.
> > 
> > I was using many NAs in two of the key factor columns. When I
> > transformed the data.table and tried to set the same key again, it
> > reacted as follows:
> > 
> > Version 1.4 in 64-bit R on Windows: (seems to?) run into an infinite
> > loop. I can't break it manually. It keeps consuming CPU, as observed
> > in Task Manager.
> > Version 1.5 in 32-bit R on Windows: throws an error message shortly
> > afterwards saying (not sure of the exact message) "sorting ran into
> > infinite loop/iteration?"
> > 
> > However, the 32-bit session was using the image file saved by 64-bit R.
> > Therefore, I am not sure whether the above message is valid for this
> > bug.
> > 
> > It looks like the bug has been noticed but not solved yet? I
> > also encountered many other problems, but the silent infinite freezing
> > always comes from setkey/key().
> > 
> > It shouldn't run into an infinite loop, because when I assign all NAs
> > to a blank string/factor value "", setting the key and sorting work
> > again.
> > 
> > Currently, I reset all my data to avoid NA when using data.table
> > (painful). That's why I can't reproduce the bug. I tried to fake the
> > data but that didn't work.
> > 
> > I didn't see this issue discussed, so I am reporting it to everyone.
> > 
> > Should I avoid using NA in data.table forever? Any comments are highly
> > appreciated.
> > 
> > Best regards,
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 
> 

