[datatable-help] J() casts to int?

Matthew Dowle mdowle at mdowle.plus.com
Wed Oct 5 00:54:08 CEST 2011


Thanks for illustrating so clearly. J() has always cast double columns
to int (as far as I remember anyway) for convenience when looking up
data from the prompt, say, to save having to remember the L suffix on
typed-in values inside J().  This case, where J() is deliberately
created with more columns than x's key, using join inherited scope, I
didn't anticipate.  Or rather, I was planning to achieve that output via
x. and i. prefixes in j (discussed in a previous thread, I think, but it
seems there is no FR number). The way you've done it is kind of a manual
way of achieving 'i.', where 'i.' corresponds to your 'prev.'. I'm
thinking an automatic i. would still be nice for convenience, but I have
to admit I thought it wasn't possible at all (at least, not as elegantly
in one query). Presumably this is most useful with roll=TRUE : age as
well as delta.
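
For reference, the cast discussed above behaves like base R's
as.integer(), which truncates toward zero rather than rounding. A
minimal illustration in base R only (J() itself is not called here),
using three of the values from the example quoted further down:

```r
# as.integer() drops the fractional part (truncation toward zero),
# which is why 3.710278 shows up as 3 and -9.004814 as -9.
vals <- c(3.710278, 4.571288, -9.004814)
as.integer(vals)   # 3 4 -9
```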

Back to J() ... inside J() it doesn't know it's being called as the i
argument, so it doesn't know the length of x's key. Otherwise, a simple
fix would be for J() to coerce only the double columns involved in the
join to x's key. It should be possible to use parent.frame() inside J()
to work out where it's being called from and the length of x's key.
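
A rough sketch of the "coerce only the join columns" idea, in base R.
This is not data.table code: coerce_key_cols() and its n_key argument
are hypothetical names, and the key length is simply passed in rather
than discovered via parent.frame().

```r
# Hypothetical helper (not part of data.table): coerce only the leading
# n_key columns, i.e. those that would be joined to x's key, from
# double to integer. Any remaining columns are left untouched.
coerce_key_cols <- function(cols, n_key) {
  for (k in seq_len(min(n_key, length(cols)))) {
    if (is.double(cols[[k]])) cols[[k]] <- as.integer(cols[[k]])
  }
  cols
}

# One join column (date) plus one carried-along double column.
i_cols <- list(date = c(3, 4, 5),
               prev.value = c(3.710278, 4.571288, 2.009627))
out <- coerce_key_cols(i_cols, n_key = 1L)
str(out)
```

With n_key = 1 only date is converted; prev.value keeps its fractional
values.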

Or, perhaps all data.table joins should allow double columns in i to be
joined to int, with an inefficiency warning if, say, i has more than
1000 rows, an error/warning if fractional data would be truncated, and
silent coercion otherwise.  Then the coercion to int in J() could be
removed and it's more consistent.
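
The policy proposed above could look something like this in base R. It
is only a sketch under assumptions: coerce_join_col() and warn_rows are
hypothetical names, nothing here is data.table's actual behaviour, and
values outside the integer range are not handled.

```r
# Hypothetical helper (not part of data.table): coerce a double join
# column to integer only when no information would be lost.
coerce_join_col <- function(col, warn_rows = 1000L) {
  if (!is.double(col)) return(col)              # already int (or other type)
  if (any(col != trunc(col), na.rm = TRUE))
    stop("fractional values would be truncated by coercion to integer")
  if (length(col) > warn_rows)
    warning("coercing a large double column to integer is inefficient")
  as.integer(col)                               # small, whole-valued: silent
}

small <- coerce_join_col(c(3, 4, 5))   # whole-number doubles become integer
same  <- coerce_join_col(1:3)          # integer input passes through unchanged
```

Calling coerce_join_col(c(1.5, 2)) stops with an error instead of
silently returning 1 and 2.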

Thoughts anyone?

In the meantime I can't think of any other way than using data.table()
instead of J(), which looks to work and give the right result.

Matthew


On Tue, 2011-10-04 at 10:03 -0500, Johann Hibschman wrote:
> I just noticed that J casts all its arguments to int.  Has this always
> been the case?  I can't find it documented anywhere.
> 
> I came across this while trying to do a self join, like this:
> 
>   > tmp <- data.table(date=1:5, value=10*rnorm(5), key="date")
>   > tmp
>        date     value
>   [1,]    1  3.710278
>   [2,]    2  4.571288
>   [3,]    3  2.009627
>   [4,]    4  8.237882
>   [5,]    5 -9.004814
>   > with(tmp, J(date, value))
>        date value
>   [1,]    1     3
>   [2,]    2     4
>   [3,]    3     2
>   [4,]    4     8
>   [5,]    5    -9
>   > tmp[J(date + 2, prev.date=date, prev.value=value),
>         list(prev.date, value, prev.value, delta=value-prev.value)]
>        date prev.date     value prev.value       delta
>   [1,]    3         1  2.009627          3  -0.9903734
>   [2,]    4         2  8.237882          4   4.2378817
>   [3,]    5         3 -9.004814          2 -11.0048141
>   [4,]    6         4        NA          8          NA
>   [5,]    7         5        NA         -9          NA
>   > tmp[data.table(date + 2L, prev.date=date, prev.value=value),
>         list(prev.date, value, prev.value, delta=value-prev.value)]
>        date prev.date     value prev.value      delta
>   [1,]    3         1  2.009627   3.710278  -1.700652
>   [2,]    4         2  8.237882   4.571288   3.666594
>   [3,]    5         3 -9.004814   2.009627 -11.014441
>   [4,]    6         4        NA   8.237882         NA
>   [5,]    7         5        NA  -9.004814         NA
> 
> Is this intended?  Using J let me be sloppy and do "+2" while data.table
> made me use "+2L", but then it clobbered the non-int values.
> 
> Is there a better way?
> 
> Thanks,
> Johann
> 



