[datatable-help] Bug when Merging with nomatch=0 and roll=T?

Michael Smith my.r.help at gmail.com
Fri Jun 20 13:30:05 CEST 2014


OK, no problem, here's the code. If there are any problems pasting it
into R, let me know (I used parts of dput output, so the email line
wrapping may have messed it up). If you want, I can also file a bug
report on GitHub; just let me know.

library(data.table)

CS <-
  data.table(
    structure(list(LPERMCO = c(7L, 33L),
                   datadate = structure(c(15912, 15912), class = "Date"),
                   me = c(626550.35284, 7766.385)),
              .Names = c("LPERMCO", "datadate", "me"),
              class = "data.frame", row.names = c(NA, -2L)),
    key = "LPERMCO,datadate")
SP <-
  data.table(
    structure(list(PERMCO = c(7L, 7L, 33L, 33L, 33L, 33L),
                   date = structure(c(15884, 15917, 15884, 15884, 15917, 15917),
                                    class = "Date"),
                   RET = c(-0.118303, 0.141225, -0.03137, -0.02533, 0.045967,
                           0.043694)),
              .Names = c("PERMCO", "date", "RET"),
              class = "data.frame", row.names = c(NA, -6L)),
    key = "PERMCO,date")
# Column lengths of the join result; all four should be equal
sapply(CS[SP, nomatch = 0, roll = T], length)


The relevant output is below; it is the same in 1.9.2 and in dev 1.9.3.
In the sapply result, the "me" column should have length 5, but it has
length 3:

> CS
   LPERMCO   datadate         me
1:       7 2013-07-26 626550.353
2:      33 2013-07-26   7766.385
> SP
   PERMCO       date       RET
1:      7 2013-06-28 -0.118303
2:      7 2013-07-31  0.141225
3:     33 2013-06-28 -0.031370
4:     33 2013-06-28 -0.025330
5:     33 2013-07-31  0.045967
6:     33 2013-07-31  0.043694
> CS[SP, nomatch = 0, roll = T]
   LPERMCO   datadate         me       RET
1:       7 2013-07-31 626550.353  0.141225
2:      33 2013-06-28   7766.385 -0.031370
3:      33 2013-06-28   7766.385 -0.025330
4:      33 2013-07-31 626550.353  0.045967
5:      33 2013-07-31   7766.385  0.043694
Warning message:
In cbind(LPERMCO = c(" 7", "33", "33", "33", "33"), datadate =
c("2013-07-31",  :
  number of rows of result is not a multiple of vector length (arg 3)
> sapply(CS[SP, nomatch = 0, roll = T], length)
 LPERMCO datadate       me      RET
       5        5        3        5
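
In case it helps with triage, here is a minimal sketch of how one could
test for this condition without relying on print(). It uses base R only.
`check_equal_lengths` is a hypothetical helper name I made up, not
something from data.table, and the `bad` object is hand-built to mimic
the malformed result above rather than produced by the join itself:

```r
# Hypothetical helper (not part of data.table): returns the length of
# each column and whether all columns have the same length.
check_equal_lengths <- function(x) {
  # unclass() drops the data.frame/data.table class so we look at the
  # underlying list of column vectors directly.
  len <- vapply(unclass(x), length, integer(1))
  list(lengths = len, ok = length(unique(len)) <= 1L)
}

# A well-formed data.frame: every column has length 3.
good <- data.frame(a = 1:3, b = c("x", "y", "z"))
check_equal_lengths(good)$ok   # TRUE

# A deliberately malformed object (assumption: built by hand to imitate
# the join result, with one column shorter than the others).
bad <- structure(list(LPERMCO = rep(33L, 5), me = c(1, 2, 3)),
                 class = "data.frame", row.names = c(NA, -5L))
check_equal_lengths(bad)$ok    # FALSE
```

Applied to the join above, check_equal_lengths(CS[SP, nomatch = 0,
roll = T])$ok would presumably come back FALSE on the affected versions.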


Thanks,
M





On 06/20/2014 05:17 PM, Arunkumar Srinivasan wrote:
>> For a given data.table, is there any condition …  Ergo, it's a bug,
>> right? 
> 
> Yes.
> 
>> I'll be glad 
>> to try to boil this down to something that's reproducible. 
> 
> That'd be great.
> 
> 
> Arun
> 
> From: Michael Smith my.r.help at gmail.com
> Reply: Michael Smith my.r.help at gmail.com
> Date: June 20, 2014 at 5:37:24 AM
> To: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Bug when Merging with nomatch=0 and roll=T?
> 
>> So let me rephrase my question (haven't received an answer so far):
>>
>> For a given data.table, is there any condition under which the lengths
>> of the vectors in each column may differ? Based on my understanding,
>> each data.table is also a data.frame, and with a data frame this should
>> not be possible. For example, it's not possible to have a data.frame
>> where the first column is a vector of length eight, and the second
>> column is a vector of length nine. Ergo, it's a bug, right?
>>
>> If my understanding is correct, please do let me know and I'll be glad
>> to try to boil this down to something that's reproducible.
>>
>> Thanks,
>> M
>>
>> On 06/19/2014 11:59 AM, Michael Smith wrote:
>> > By the way, I know it's not reproducible with the code below. Before
>> > going into further detail, I first wanted to ask whether this looks like
>> > a bug, or whether I've overlooked something obvious and this is expected
>> > behavior.
>> >  
>> > Thanks,
>> > M
>> >  
>> > On 06/19/2014 11:51 AM, Michael Smith wrote:
>> >> I got the following result on my keyed data tables `CS` and `SP`, which
>> >> seems like a bug (in 1.9.2 and 1.9.3 dev version) to me, since all
>> >> columns should have the _same_ length:
>> >>
>> >>> ## Works as expected:
>> >>> all((l <- sapply(CS[SP, roll = TRUE], length)) == l[1])
>> >> [1] TRUE
>> >>> ## Works as expected:
>> >>> all((l <- sapply(CS[SP, nomatch = 0], length)) == l[1])
>> >> [1] TRUE
>> >>> ## Here's the potential _bug_, when combining both:
>> >>> all((l <- sapply(CS[SP, nomatch = 0, roll = TRUE], length)) == l[1])
>> >> [1] FALSE
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> M
>> >>

