[datatable-help] Ran into infinite loop when setting key/sorting columns with NA
Short, Tom
TShort at epri.com
Thu Jul 29 03:47:19 CEST 2010
I haven't run into NA problems, but I did run into the infinite loop
with a v1.4 data table used in v1.5.
- Tom
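
A minimal sketch of the two checks mentioned in the quoted thread below,
in base R only so it runs without data.table installed. The helper name
has_v15_class and the toy objects are hypothetical, made up for
illustration; the first object only mimics a class attribute saved as
just "data.table" and is not a real v1.4 data.table.

```r
# Hypothetical helper (not part of data.table): TRUE when the class
# vector is the two-element form the quoted message says v1.5 expects.
has_v15_class <- function(x) {
  identical(class(x), c("data.table", "data.frame"))
}

# Stand-in for an object restored from an old image: a plain data.frame
# whose class attribute was stored as just "data.table".
old <- data.frame(id = c(2L, 1L), v = c("b", "a"))
class(old) <- "data.table"

has_v15_class(old)   # FALSE

# Restore the two-element class vector.
class(old) <- c("data.table", "data.frame")
has_v15_class(old)   # TRUE

# Workaround described in the original report: replace NA in a factor
# column with the blank string "" before using it as a key column.
key_col <- factor(c("x", NA, "y"))
tmp <- as.character(key_col)
tmp[is.na(tmp)] <- ""
key_col <- factor(tmp)

anyNA(key_col)       # FALSE
```

With data.table loaded, the same class check could be run on the real
object right after load() and before calling setkey().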
> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> On Behalf Of Matthew Dowle
> Sent: Wednesday, July 28, 2010 20:45
> To: Branson Owen
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Ran into infinite loop when
> setting key/sorting columns with NA
>
> It's not usual to have NAs in key columns but they shouldn't
> cause problems either. Do you actually have data in non-key
> columns on rows where there are NA in the key columns? That
> seems odd but possible I suppose. If all non-key columns are
> NA for those rows, we normally remove the rows.
>
> We'll need exact error messages and code and data to
> investigate further, at least I will unless anyone else has
> seen this before. If you can't reproduce then that's ok, just
> post what you can.
>
> There is an infinite loop problem when a data.table is
> created in v1.4 and saved, then used with v1.5. Make sure the
> class of the data.table is c("data.table","data.frame"); if
> it's just "data.table" in v1.5 then a loop can sometimes
> occur. But that isn't related to NA in the key afaik.
>
> Thanks.
>
> On Wed, 2010-07-28 at 17:58 -0500, Branson Owen wrote:
> > ** Should I avoid using NA in data.table forever? Any comments are
> > highly appreciated. **
> >
> > I am sorry that I can't present code that reproduces the bug. I was
> > working with a lot of mid-size data, each 100K rows, 10+ columns
> > with 5 columns set as key.
> >
> > When I use a for loop to do the calculation and then set the key,
> > there is only about a 5% chance that I get the following bug. I
> > don't really know why.
> >
> > I was using many NAs in two of the key factor columns. When I
> > transformed the data.table and tried to set the same key again, it
> > reacted as follows:
> >
> > Version 1.4 in 64-bit R on Windows: (seems to?) run into an infinite
> > loop. Can't break it manually. Keeps consuming CPU as observed from
> > Task Manager.
> > Version 1.5 in 32-bit R on Windows: throws an error message shortly,
> > saying (not sure of the exact message) "sorting ran into infinite
> > loop/iteration?"
> >
> > However, the 32-bit R is using the image file saved by 64-bit R.
> > Therefore, I am not sure whether the above message is valid for this
> > bug.
> >
> > It looks like the bug has been noticed but not solved yet? I
> > also encountered many other problems, but the silent infinite
> > freezing always comes from setkey/key().
> >
> > It shouldn't run into an infinite loop, because when I assign all NA
> > to a blank string/factor value "", setting the key and sorting work
> > again.
> >
> > Currently, I reset all my data to avoid NA when using data.table
> > (painful). That's why I can't reproduce the bug. I tried to fake the
> > data but it didn't work.
> >
> > I didn't see this issue being discussed, so I am reporting it to
> > everyone.
> >
> > Should I avoid using NA in data.table forever? Any comments are
> > highly appreciated.
> >
> > Best regards,
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> >
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>