[datatable-help] Unexpected behavior with mult="all"

Matthew Dowle mdowle at mdowle.plus.com
Sat Jul 31 16:25:51 CEST 2010


This is how I think about it currently :

[1] With the syntax "x[y,d]" and mult at its default value ('first' in
this case), the result is a vector exactly as long as the number of rows
in y, so data.table does the least work it can and returns just the
vector, without adding in the data already in y. Changing mult to "all",
however, means you'll usually get a varying number of items back for
each row in y, so data.table includes the y columns as a convenience; if
it didn't, the result would be difficult to use (you wouldn't know which
items correspond to which row of y). data.table tries to do the minimum,
most efficient thing. If you want to be less efficient (e.g. adding
columns you already know), then it's for the user to add them back. This
is sort of a principle.

[2] nomatch is NA by default, so this is the same as [1]. Is that by any
chance a typo, and you meant nomatch=0 ?  If so then you might have a
point and perhaps something needs changing there.
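
To make point [1] concrete, here is a rough base R sketch using the x and y from the question as plain data.frames. It only mimics the idea (one value per row of y for a first match, a varying count for all matches); it is not how data.table implements the join, and the helper names key_x, key_y, first_only and hits_per_row are made up for illustration.

```r
# Base R only: plain data.frames stand in for the data.tables in the question.
x <- data.frame(a = c("a", "b", "d", "e"), b = c("A", "A", "B", "B"),
                d = c(1, 2, 3, 4))
y <- data.frame(g = c("a", "b", "c", "d"), h = c("A", "A", "A", "A"))

# Hypothetical helpers: collapse the two key columns into one string per row.
key_x <- paste(x$a, x$b)
key_y <- paste(y$g, y$h)

# First match only: exactly one value (or NA) per row of y,
# so a bare vector is unambiguous.
first_only <- x$d[match(key_y, key_x)]

# All matches: the number of hits per row of y can differ from row to row,
# so the values alone would not say which row of y they belong to.
hits_per_row <- vapply(key_y, function(k) sum(key_x == k), integer(1),
                       USE.NAMES = FALSE)
```

Here first_only lines up one-to-one with the rows of y, while hits_per_row shows why an "all matches" result needs the y columns alongside it.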

The other way I think about mult='all' is as grouping. The documentation
sometimes mentions 'by without by', or I might be recalling emails or
posts. Remember that mult is automatically set to 'all' when you match
on only some of the columns of x's key. When mult='all' I think to
myself "for each row of y, fetch me all the rows from x that match and
eval j for that group, then move on to the next row in y".  It's kind of
like a data-specific 'by'. Once you realise mult='all' is like a 'by',
remember that 'by' automatically adds the 'by' columns to the result.
Hence mult='all' behaves more like a 'by' in that it returns a
data.table rather than a vector.

Example :

  X = data.table(x=1:3, y=1:4, z=rnorm(12), key="x,y")
  Y = data.table(x=1:3) 
  X[Y,sum(z)]   # same as X[,sum(z),by=x]
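
The "for each row of Y" mental model behind that equivalence can be sketched in base R alone. This is only an illustration with plain data.frames, not data.table's implementation; set.seed, the V1 column name and the by_row_of_Y / by_x names are assumptions made for the sketch.

```r
# Base R only: mimics the mental model, independent of any data.table version.
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
X <- data.frame(x = rep(1:3, length.out = 12),
                y = rep(1:4, length.out = 12),
                z = rnorm(12))
Y <- data.frame(x = 1:3)

# For each row of Y: fetch all matching rows of X, then evaluate j
# (here sum(z)) on that group and keep the Y value next to the result.
by_row_of_Y <- do.call(rbind, lapply(seq_len(nrow(Y)), function(i) {
  grp <- X[X$x == Y$x[i], , drop = FALSE]
  data.frame(x = Y$x[i], V1 = sum(grp$z))
}))

# An ordinary group-by on x gives the same totals.
by_x <- aggregate(z ~ x, data = X, FUN = sum)
```

Both results have one row per value of x, which is the sense in which the join behaves like a 'by'.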

Then going further :

  X[Y[<having>],sum(z)]   # faster than X[,sum(z),by=x][<having>]

Let's say <having> selects the groups where x>2 (just one group in this
example) :

  X[Y[x>2],sum(z)]   # same as, but faster than, X[,sum(z),by=x][x>2]

which is the same as 

  X[J(3),sum(z)]

if we knew we wanted group '3' in advance for example.

These constructs (e.g. 'by without by') generalise to list() of
expressions and function calls of column variables in the usual way.

Sometimes you do want mult='all' but want to run the j expression on the
result as a whole, not by row of Y.  In that case, assuming Y has fewer
columns than key(X), which implies mult='all' (as in this example) :

	X[Y,length(z)]	  # j eval'd by row of Y, result 3 rows
	X[Y][,length(z)]  # length 1 vector value 12
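
The same contrast can be reproduced in base R as a sketch of the idea, again with plain data.frames rather than data.table; per_row, joined and whole are names invented for the illustration.

```r
# Base R only: contrasts "j per row of Y" with "j on the whole joined result".
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
X <- data.frame(x = rep(1:3, length.out = 12),
                y = rep(1:4, length.out = 12),
                z = rnorm(12))
Y <- data.frame(x = 1:3)

# j evaluated once per row of Y: one length per group, three results.
per_row <- vapply(Y$x, function(k) length(X$z[X$x == k]), integer(1))

# j evaluated once on the whole joined result: a single length.
joined <- merge(Y, X, by = "x")
whole <- length(joined$z)
```

per_row has three elements (one per row of Y), while whole is a single number covering all 12 joined rows.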

HTH?
Matthew


On Fri, 2010-07-30 at 19:28 -0700, Harish wrote:
> I am getting some unexpected behavior with mult="all".
> 
> 1) Getting a data table when I expect a vector
> 2) Not getting NA's when I expect them (because of nomatch=NA)
> 
> ==========
> 
> Common code for examples below
> 
> x <- data.table(a=c("a","b","d","e"),b=c("A","A","B","B"),d=c(1,2,3,4), key="a,b")
> y <- data.table(g=c("a","b","c","d"),h=c("A","A","A","A"))
> 
> ==========
> 
> Issue #1: Getting a data table when I expect a vector
> 
> I am not following the logic of when a data.table is returned and when a vector is returned.  Initially, I thought that if j had only one item without a list(), a vector is returned, but I am seeing some contrary behavior.
> 
> x[y,d]  # Returns a vector as expected
> x[y,d,mult="all"]  # Returns a data.table.  Why?
> 
> Would someone help me understand why I should not expect a vector in the last query?
> 
> ==========
> Issue #2: Not getting NA's when I expect them (because of nomatch=NA)
> 
> x[y,d,nomatch=NA]  # Expected: returns a vector with NAs in them
> x[y,d,nomatch=NA,mult="all"]  # Unexpected: NAs not appearing
> 
> Am I missing something?
> 
> Harish