[datatable-help] Assignment by reference on a datatable subset

Matthew Dowle mdowle at mdowle.plus.com
Wed Feb 8 15:01:08 CET 2012


Setting mult="first" helps here. data.table doesn't know whether keys are
unique. When joining to all the columns of a key that you know is unique,
setting mult="first" is faster (mult="last" gives the same result). Also,
when mult="first" (or "last"), the join isn't treated as by-without-by, so
:= then works. For example,

> DT = data.table(a=1:2,b=1:4,key="a")
> DT
     a b
[1,] 1 1
[2,] 1 3
[3,] 2 2
[4,] 2 4
> DT[J(2),b:=5L]   # one group is ok
     a b
[1,] 1 1
[2,] 1 3
[3,] 2 5
[4,] 2 5
> DT[J(1:2),b:=6L]   # two or more isn't implemented when mult="all"
Error in `[.data.table`(DT, J(1:2), `:=`(b, 6L)) :
  combining bywithoutby with := in j is not yet implemented.
> DT[J(1:2),b:=6L,mult="first"]  # but, "first" works with :=
     a b
[1,] 1 6
[2,] 1 3
[3,] 2 6
[4,] 2 5
>
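
To make it clearer which rows mult selects, here is a minimal sketch of the
same join without :=. It assumes a reasonably recent data.table; the names
first_rows, last_rows and all_rows are only illustrative.

```r
library(data.table)

# Same data as above, with the recycling of 'a' written out explicitly.
DT <- data.table(a = rep(1:2, 2), b = 1:4, key = "a")

# mult = "first" keeps only the first matching row of each group in the key,
# mult = "last" keeps only the last one, and the default ("all") keeps every match.
first_rows <- DT[J(1:2), mult = "first"]
last_rows  <- DT[J(1:2), mult = "last"]
all_rows   <- DT[J(1:2)]

print(first_rows)
print(last_rows)
print(all_rows)
```

With mult="first" or "last" there is at most one matching row per row of i,
which is why those rows can be assigned to directly.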

Knowing that, the same approach applies to your data:

> X = unique(DT[!is.na(val),list(id1,as.integer(val))])
> X
     id1 V2
[1,]  n1  2
[2,]  n1  7
[3,]  n1 11
> DT[X,val:=id2,mult="first"]
      id1 id2 val
 [1,]  n1   1  NA
 [2,]  n1   2   2
 [3,]  n1   3   2
 [4,]  n1   4   2
 [5,]  n1   5  NA
 [6,]  n1   6  NA
 [7,]  n1   7   7
 [8,]  n1   8   7
 [9,]  n1   9  NA
[10,]  n1  10  NA
[11,]  n1  11  11
[12,]  n1  12  11
[13,]  n2   1  NA
[14,]  n2   2  NA
[15,]  n2   3  NA
[16,]  n2   4  NA
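
For reference, the two steps above can be put together as one self-contained
script. This is only a sketch: it rebuilds DT from the question below and
assumes a reasonably recent data.table, where the join matches the integer
column V2 of X against the numeric key column id2.

```r
library(data.table)

# The table from the question, keyed on (id1, id2); this key is unique.
DT <- data.table(
  id1 = c(rep("n1", 12), rep("n2", 4)),
  id2 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4),
  val = c(NA, NA, 2, 2, NA, NA, 7, 7, NA, NA, NA, 11, NA, NA, NA, NA),
  key = c("id1", "id2")
)

# Distinct (id1, val) pairs that are referenced somewhere in val.
# The second column is unnamed, so it gets the default name V2.
X <- unique(DT[!is.na(val), list(id1, as.integer(val))])

# Join X to the key of DT and, for each matched row, copy id2 into val
# by reference. The key is unique, so mult = "first" selects every match.
DT[X, val := id2, mult = "first"]

print(DT)
```

Since := updates DT in place, there is no need to rebuild the table or to
reset its key afterwards.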

(Thanks for the concise examples btw, helps a lot)


> Dear all,
>
> I have a new question about data completion within a datatable.
>
> Having the following datatable:
> DT <- data.table("id1" = c("n1", "n1", "n1", "n1", "n1", "n1", "n1", "n1",
> "n1", "n1", "n1", "n1", "n2", "n2", "n2", "n2")
> 	, 'id2'=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4)
> 	, val=c(NA, NA, 2, 2, NA, NA, 7, 7, NA, NA, NA, 11, NA, NA, NA, NA)
> 	, key = c("id1", "id2"))
>
> I get:
>       id1 id2 val
>  [1,]  n1   1  NA
>  [2,]  n1   2  NA
>  [3,]  n1   3   2
>  [4,]  n1   4   2
>  [5,]  n1   5  NA
>  [6,]  n1   6  NA
>  [7,]  n1   7   7
>  [8,]  n1   8   7
>  [9,]  n1   9  NA
> [10,]  n1  10  NA
> [11,]  n1  11  NA
> [12,]  n1  12  11
> [13,]  n2   1  NA
> [14,]  n2   2  NA
> [15,]  n2   3  NA
> [16,]  n2   4  NA
>
> The val column holds id2 values within each id1.
> For each id2 that is referenced by some val, I would like to fill in that
> row's val with its own id2 whenever it is still NA.
> In my example, the final datatable should look like this:
>       id1 id2 val
>  [1,]  n1   1  NA
>  [2,]  n1   2   2
>  [3,]  n1   3   2
>  [4,]  n1   4   2
>  [5,]  n1   5  NA
>  [6,]  n1   6  NA
>  [7,]  n1   7   7
>  [8,]  n1   8   7
>  [9,]  n1   9  NA
> [10,]  n1  10  NA
> [11,]  n1  11  11
> [12,]  n1  12  11
> [13,]  n2   1  NA
> [14,]  n2   2  NA
> [15,]  n2   3  NA
> [16,]  n2   4  NA
> As you can see, val on rows 2 and 11 has been filled in with the id2
> value.
>
> I tried like this:
> DT2 <- unique(DT[!is.na(val), c("id1", "val"), with = FALSE])
> DT2$id2 <- DT2$val
> setkeyv(DT2, c("id1", "id2"))
> DT[DT2, val:=val.1]
>
> But I get the following message: "combining bywithoutby with := in j is
> not yet implemented."
>
> Here is the solution I finally found:
> DT <- data.table("id1" = c("n1", "n1", "n1", "n1", "n1", "n1", "n1", "n1",
> "n1", "n1", "n1", "n1", "n2", "n2", "n2", "n2"), 'id2'=c(1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 11, 12, 1, 2, 3, 4), val=c(NA, NA, 2, 2, NA, NA, 7, 7, NA,
> NA, NA, 11, NA, NA, NA, NA), key = c("id1", "id2"))
> noms <- names(DT)
> cle <- key(DT)
> DT2 <- unique(DT[!is.na(val), c("id1", "val"), with = FALSE])
> DT2$id2 <- DT2$val
> setkeyv(DT2, c("id1", "id2"))
> X <- DT2[DT]
> X[is.na(val.1), val.1:=val]
> DT <- X[,list(id1, id2, val.1)]
> setnames(DT, 3, "val")
> setkeyv(DT, cle)
>
> Is there a faster way to complete my data?
>
> Thanks in advance for your help.
>
> Regards,
> Cedric