[datatable-help] Error in row filtering

Harish harishv_99 at yahoo.com
Tue Oct 14 14:28:08 CEST 2014


My sent-mail seems to show only a truncated version of my original request.  So let me summarize whatever got truncated.

My suspicion is that there is an issue with an optimization that is used when the filter is a simple integer comparison, and that this optimization is turned off when the filtering logic is more complex.
It would be great if someone could help me understand the root cause so I can check where else this could be happening in my code.  My fear is that I do not know which of the other numbers I am getting might be incorrect.
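
One way to look for other affected results is to recompute each filtered aggregate with a plain base-R vector scan and compare. The following is only a sketch on a made-up table: the column names come from the post, while the data and the variable names `via_dt` and `via_base` are assumptions for illustration.

```r
library(data.table)

# Toy stand-in for the real table; the values are invented for illustration.
set.seed(1)
dt <- data.table(
  pred_bad_t_f = sample(0:1, 1000, replace = TRUE),  # integer flag, 0 or 1
  weight       = runif(1000)                         # numeric weights
)

# Sum computed through data.table's i expression (the form that may be optimized).
via_dt <- dt[pred_bad_t_f == 1L, sum(weight)]

# Same sum computed with a base-R logical subset, bypassing `[.data.table`.
via_base <- sum(dt$weight[dt$pred_bad_t_f == 1L])

# The two should agree; a mismatch on the real data would point at the subset step.
all.equal(via_dt, via_base)
```

Running the same comparison against the real table, for each filter used in the analysis, would show which results depend on the suspected code path.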
Thanks a lot for your help.
Regards,
Harish
 

     On Tuesday, October 14, 2014 5:13 AM, Harish <harishv_99 at yahoo.com> wrote:
   

 I have a very strange row-filtering issue in front of me that I can only reproduce on a very large data set.  Let me start off by giving you the end symptoms and then I will talk through some hacks that avoid the bug.

I have two fields of interest -- pred_bad_t_f and weight.
- pred_bad_t_f is of class "integer" with two unique values, 0 and 1
- weight is of class "numeric"
> dt[pred_bad_t_f == 1, sum(weight)]
[1] 6580818130
> dt[pred_bad_t_f == 1L, sum(weight)]
[1] 5414941720
The two calls should return the same value, yet the second differs.  I believe the first value is correct because slight changes to the filtering logic reproduce that value consistently.  Below are some examples:

> dt[1:nrow(dt)][pred_bad_t_f == 1L, sum(weight)]
[1] 6580818130
> dt[TRUE & pred_bad_t_f == 1L, sum(weight)]
[1] 6580818130
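
If the discrepancy is related to automatic indexing on `==` subsets, which is an assumption and not something the output above confirms, the same filter can be compared with that feature switched off. The sketch below uses a made-up table and the `datatable.auto.index` option found in data.table releases that support automatic indexing; whether the option applies to the version in use here is also an assumption.

```r
library(data.table)

# Made-up table: the integer flag alternates 0, 1 and the weights are 1..1000.
dt <- data.table(
  pred_bad_t_f = rep(0:1, times = 500),
  weight       = as.numeric(1:1000)
)

# Turn automatic index creation off, run the filter, then restore the old setting.
old <- options(datatable.auto.index = FALSE)
no_index <- dt[pred_bad_t_f == 1L, sum(weight)]
options(old)

# Run the same filter with default settings; verbose = TRUE reports how the
# subset was carried out.
with_index <- dt[pred_bad_t_f == 1L, sum(weight), verbose = TRUE]

# Both paths should give the same sum.
identical(no_index, with_index)
```

On the real table, a difference between the two results would support the suspicion that the optimized path is involved.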


   