[Rcpp-devel] translate an R vectorized loop with three logical conditions to C++

Fri May 11 17:14:39 CEST 2012

On Fri, May 11, 2012 at 8:06 AM, Nelson Villoria <nvillori at purdue.edu> wrote:
> I am new to this list, so I hope this is the right place to ask this
> question. I am trying to translate the R vectorized loop below to C++ in
> order to speed up my calculations:
>
> Let:
>> n1
>  [1] 1 1 2 2 2 3 3 4 4 4 5 5 5 5 6 6 6 7 7 8 8 8 9 9
>> n2
>  [1] 2 4 1 3 5 2 6 1 5 7 2 4 6 8 3 5 9 4 8 5 7 9 6 8
>> w1w1
>  [1] 0.2500000 0.2500000 0.1111111 0.1111111 0.1111111 0.2500000 0.2500000
>  [8] 0.1111111 0.1111111 0.1111111 0.0625000 0.0625000 0.0625000 0.0625000
> [15] 0.1111111 0.1111111 0.1111111 0.2500000 0.2500000 0.1111111 0.1111111
> [22] 0.1111111 0.2500000 0.2500000
>
> My vectorized loop is:
>
>    tWSWS.k <- lapply(c(1:length(n1)), function(.n1){
>      lapply(c(1:length(n2)), function(.n2){
>        if(.n1!=.n2){
>         w1w1[n1==.n1 & n2==.n2]
>       }})})
>
> result=sum(unlist(tWSWS.k))

> Could you help me with this translation or at least point me out to some
> reference/example?

I would look at the calculation first.  You are assigning .n1 and .n2
to 1:24 in the two loops for a total of 24^2 evaluations of the inner
expression.  But the expression w1w1[n1 == .n1 & n2 == .n2] will be a
zero-length vector unless .n1 is a value in n1 and .n2 is a value in
n2.  So you can change to

lapply(unique(n1), function(.n1) lapply(unique(n2), function(.n2)
  if(.n1 != .n2) w1w1[n1 == .n1 & n2 == .n2]))

and evaluate the inner expression a total of 9^2 times.
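Since the original question asked for C++, here is a minimal sketch of what a literal translation of this reduced loop might look like. It is plain standard C++ rather than Rcpp code. It assumes n1, n2 and w1w1 have already been copied out of R into std::vector<int>, std::vector<int> and std::vector<double> (treating n1 and n2 as integer codes); in an Rcpp function that glue would come from the R arguments and is left out here. The names sorted_unique and nested_sum are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorted distinct values, like sort(unique(x)) in R.
// (The order of the values does not change the final sum.)
std::vector<int> sorted_unique(std::vector<int> x) {
    std::sort(x.begin(), x.end());
    x.erase(std::unique(x.begin(), x.end()), x.end());
    return x;
}

// Literal translation of the reduced R loop
//   lapply(unique(n1), function(.n1) lapply(unique(n2), function(.n2)
//     if(.n1 != .n2) w1w1[n1 == .n1 & n2 == .n2]))
// followed by sum(unlist(...)). For every off-diagonal pair (a, b) it still
// scans all N rows, so the cost is O(U1 * U2 * N), just like the R version.
double nested_sum(const std::vector<int>& n1, const std::vector<int>& n2,
                  const std::vector<double>& w) {
    assert(n1.size() == n2.size() && n2.size() == w.size());
    const std::vector<int> u1 = sorted_unique(n1);
    const std::vector<int> u2 = sorted_unique(n2);
    double total = 0.0;
    for (int a : u1) {
        for (int b : u2) {
            if (a == b) continue;  // the if(.n1 != .n2) test
            for (std::size_t i = 0; i < w.size(); ++i) {
                // n1 == .n1 & n2 == .n2 selects the matching weights
                if (n1[i] == a && n2[i] == b) total += w[i];
            }
        }
    }
    return total;
}
```

With the 24 rows above (using the printed, rounded weights), both unique-value vectors have 9 elements and nested_sum() returns approximately 3.5833332. A faithful translation like this is faster than R only by a constant factor; it does not remove the wasted work.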

Now, the expression n1 == .n1 & n2 == .n2 can only be true for the
(n1, n2) combinations that actually occur in the data.  In other words,
if you set

(dat <- data.frame(
    n1 = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7,
           8, 8, 8, 9, 9),
    n2 = c(2, 4, 1, 3, 5, 2, 6, 1, 5, 7, 2, 4, 6, 8, 3, 5, 9, 4, 8,
           5, 7, 9, 6, 8),
    w1 = c(0.25, 0.25, 0.1111111, 0.1111111, 0.1111111, 0.25, 0.25,
           0.1111111, 0.1111111, 0.1111111, 0.0625, 0.0625, 0.0625,
           0.0625, 0.1111111, 0.1111111, 0.1111111, 0.25, 0.25,
           0.1111111, 0.1111111, 0.1111111, 0.25, 0.25)
    ))
   n1 n2        w1
1   1  2 0.2500000
2   1  4 0.2500000
3   2  1 0.1111111
4   2  3 0.1111111
5   2  5 0.1111111
6   3  2 0.2500000
7   3  6 0.2500000
8   4  1 0.1111111
9   4  5 0.1111111
10  4  7 0.1111111
11  5  2 0.0625000
12  5  4 0.0625000
13  5  6 0.0625000
14  5  8 0.0625000
15  6  3 0.1111111
16  6  5 0.1111111
17  6  9 0.1111111
18  7  4 0.2500000
19  7  8 0.2500000
20  8  5 0.1111111
21  8  7 0.1111111
22  8  9 0.1111111
23  9  6 0.2500000
24  9  8 0.2500000

it is exactly those combinations for which w1w1[n1 == .n1 & n2 == .n2]
can be other than a zero-length vector.  Your condition (.n1 != .n2)
can therefore be applied to these combinations before you start the loops:

dat <- subset(dat, n1 != n2)

which, in this case, doesn't eliminate any rows.
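In C++ the same pre-filtering step could be done once with the standard erase/remove_if idiom. This is a minimal sketch assuming a hypothetical Row struct that holds one row of dat; make_rows and drop_diagonal are illustrative names, not part of Rcpp or any other library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct holding one row of `dat`.
struct Row {
    int n1;
    int n2;
    double w;
};

// Zip the three parallel vectors into rows (illustrative helper).
std::vector<Row> make_rows(const std::vector<int>& n1,
                           const std::vector<int>& n2,
                           const std::vector<double>& w) {
    assert(n1.size() == n2.size() && n2.size() == w.size());
    std::vector<Row> rows;
    rows.reserve(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        rows.push_back(Row{n1[i], n2[i], w[i]});
    }
    return rows;
}

// Counterpart of subset(dat, n1 != n2): remove the diagonal rows once, up
// front, so later loops no longer need the .n1 != .n2 test.
std::vector<Row> drop_diagonal(std::vector<Row> rows) {
    rows.erase(std::remove_if(rows.begin(), rows.end(),
                              [](const Row& r) { return r.n1 == r.n2; }),
               rows.end());
    return rows;
}
```

As with subset(), applying drop_diagonal() to the 24 example rows keeps all of them; it only matters when some row has n1 == n2.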

Because each combination of n1 and n2 occurs only once, and none of
them has n1 == n2, your loop is an expensive way of calculating sum(w1w1).
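Seen this way, no loop over combinations is needed at all. A minimal C++ sketch of the single-pass alternative (hypothetical name offdiag_sum, plain std::vector inputs assumed rather than Rcpp types) touches each row once and skips only diagonal rows, so it runs in O(N) instead of O(U1 x U2 x N):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-pass replacement for the nested loops: every row contributes its
// weight exactly once, unless it lies on the diagonal (n1 == n2).
// Equivalent to sum(w1w1[n1 != n2]) in R; runs in O(N).
double offdiag_sum(const std::vector<int>& n1, const std::vector<int>& n2,
                   const std::vector<double>& w) {
    assert(n1.size() == n2.size() && n2.size() == w.size());
    double total = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        if (n1[i] != n2[i]) total += w[i];
    }
    return total;
}
```

For the example data this gives the same value as sum(w1w1), because no row has n1 == n2.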

In general, you seem to want the unique combinations of n1 and n2 that
occur in the data, so I would run the loop over those combinations.
But I still don't understand the summing at the end.  It seems to me
that the result of the summation will always be the sum of w1w1 after
eliminating the diagonal cases (rows with n1 == n2).
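To make that last point concrete, here is a sketch of running the loop over only the combinations that occur in the data, using a std::map keyed on (n1, n2) pairs. The function names group_by_pair and sum_over_pairs are hypothetical, and plain std::vector inputs are assumed. Weights of a repeated combination are added together, which is what w1w1[n1 == .n1 & n2 == .n2] followed by sum() does in R, so the total again reduces to the sum of the off-diagonal weights.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Group the rows by their (n1, n2) combination, adding up the weights of any
// repeated combination. Only combinations that actually occur become keys.
std::map<std::pair<int, int>, double>
group_by_pair(const std::vector<int>& n1, const std::vector<int>& n2,
              const std::vector<double>& w) {
    assert(n1.size() == n2.size() && n2.size() == w.size());
    std::map<std::pair<int, int>, double> groups;
    for (std::size_t i = 0; i < w.size(); ++i) {
        // operator[] value-initializes a new entry to 0.0 before adding
        groups[std::make_pair(n1[i], n2[i])] += w[i];
    }
    return groups;
}

// Loop over the occurring combinations only, skipping the diagonal.
double sum_over_pairs(const std::map<std::pair<int, int>, double>& groups) {
    double total = 0.0;
    for (const auto& kv : groups) {
        if (kv.first.first != kv.first.second) total += kv.second;
    }
    return total;
}
```

For the example data the map ends up with 24 keys, one per row, confirming that every combination is unique, and sum_over_pairs() returns the same total as summing the off-diagonal weights directly.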