[datatable-help] Assignement by reference on a datatable subset
DUPREZ Cédric
Cedric.DUPREZ at ign.fr
Wed Feb 8 18:05:30 CET 2012
Dear Matthew,
Thank you for the help. I don't think I could have found this solution on my own.
Regards,
Cedric
-----Original Message-----
From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
Sent: Wednesday, February 8, 2012 15:01
To: DUPREZ Cédric
Cc: datatable-help at r-forge.wu-wien.ac.at
Subject: Re: [datatable-help] Assignement by reference on a datatable subset
Setting mult="first" helps here. data.table doesn't know whether a key is
unique. When you join to all the columns of a key that you know is unique,
setting mult="first" is faster (mult="last" gives the same result). Also, when
mult="first" (or "last"), the join isn't treated as by-without-by, so := then
works. For example,
> DT = data.table(a=1:2,b=1:4,key="a")
> DT
a b
[1,] 1 1
[2,] 1 3
[3,] 2 2
[4,] 2 4
> DT[J(2),b:=5L] # one group is ok
a b
[1,] 1 1
[2,] 1 3
[3,] 2 5
[4,] 2 5
> DT[J(1:2),b:=6L] # two or more isn't implemented when mult="all"
Error in `[.data.table`(DT, J(1:2), `:=`(b, 6L)) :
combining bywithoutby with := in j is not yet implemented.
> DT[J(1:2),b:=6L,mult="first"] # but, "first" works with :=
a b
[1,] 1 6
[2,] 1 3
[3,] 2 6
[4,] 2 5
>
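The console session above can also be run as a short script. This is only a sketch: it assumes a data.table version in which mult="first" can be combined with := (as the session above shows), and it starts from a fresh DT, so the second group's untouched row keeps its original value instead of the 5 assigned earlier.

```r
library(data.table)

# Keyed table: 'a' is recycled to 1,2,1,2 and the rows are sorted by the key.
DT <- data.table(a = 1:2, b = 1:4, key = "a")

# Join to both key values, but assign only to the first matching row of each.
DT[J(1:2), b := 6L, mult = "first"]

print(DT)
```

Only one row per key value is updated here because 'a' is not unique; with a unique key, "first", "last" and "all" would all select the same rows.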
Knowing that, for your data:
> X = unique(DT[!is.na(val),list(id1,as.integer(val))])
> X
id1 V2
[1,] n1 2
[2,] n1 7
[3,] n1 11
> DT[X,val:=id2,mult="first"]
id1 id2 val
[1,] n1 1 NA
[2,] n1 2 2
[3,] n1 3 2
[4,] n1 4 2
[5,] n1 5 NA
[6,] n1 6 NA
[7,] n1 7 7
[8,] n1 8 7
[9,] n1 9 NA
[10,] n1 10 NA
[11,] n1 11 11
[12,] n1 12 11
[13,] n2 1 NA
[14,] n2 2 NA
[15,] n2 3 NA
[16,] n2 4 NA
(Thanks for the concise examples btw, helps a lot)
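For convenience, here is the same idea as one self-contained script built on the data from the question. Treat it as a sketch rather than a tested recipe: it assumes a data.table version where a keyed join with mult="first" accepts := and where the integer column of X can be joined to the numeric id2 key column.

```r
library(data.table)

# Data from the original question, keyed by (id1, id2).
DT <- data.table(
  id1 = c(rep("n1", 12), rep("n2", 4)),
  id2 = c(1:12, 1:4) + 0,   # "+ 0" keeps id2 numeric, as in the question
  val = c(NA, NA, 2, 2, NA, NA, 7, 7, NA, NA, NA, 11, NA, NA, NA, NA),
  key = c("id1", "id2")
)

# Distinct (id1, val) pairs: each names an (id1, id2) row that must be filled.
X <- unique(DT[!is.na(val), list(id1, as.integer(val))])

# The key (id1, id2) is unique, so mult = "first" selects every matching row.
# Inside j, id2 refers to DT's own column for the matched rows.
DT[X, val := id2, mult = "first"]

print(DT)
```

Rows whose val was already set are simply rewritten with the same value, so only the missing entries (id2 = 2 and id2 = 11 for n1) actually change.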
> Dear all,
>
> I have a new question about data completion within a datatable.
>
> Having the following datatable:
> DT <- data.table("id1" = c("n1", "n1", "n1", "n1", "n1", "n1", "n1", "n1",
> "n1", "n1", "n1", "n1", "n2", "n2", "n2", "n2")
> , 'id2'=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4)
> , val=c(NA, NA, 2, 2, NA, NA, 7, 7, NA, NA, NA, 11, NA, NA, NA, NA)
> , key = c("id1", "id2"))
>
> I get:
> id1 id2 val
> [1,] n1 1 NA
> [2,] n1 2 NA
> [3,] n1 3 2
> [4,] n1 4 2
> [5,] n1 5 NA
> [6,] n1 6 NA
> [7,] n1 7 7
> [8,] n1 8 7
> [9,] n1 9 NA
> [10,] n1 10 NA
> [11,] n1 11 NA
> [12,] n1 12 11
> [13,] n2 1 NA
> [14,] n2 2 NA
> [15,] n2 3 NA
> [16,] n2 4 NA
>
> The val column contains id2 values, per id1.
> For each id2 that is referenced by a val value, I would like to fill in its
> own val, if it is missing, by copying its id2.
> In my example, the final datatable should look like this:
> id1 id2 val
> [1,] n1 1 NA
> [2,] n1 2 2
> [3,] n1 3 2
> [4,] n1 4 2
> [5,] n1 5 NA
> [6,] n1 6 NA
> [7,] n1 7 7
> [8,] n1 8 7
> [9,] n1 9 NA
> [10,] n1 10 NA
> [11,] n1 11 11
> [12,] n1 12 11
> [13,] n2 1 NA
> [14,] n2 2 NA
> [15,] n2 3 NA
> [16,] n2 4 NA
> As you can see, val on lines 2 and 11 has been completed with the id2
> value.
>
> I tried like this:
> DT2 <- unique(DT[!is.na(val), c("id1", "val"), with = F])
> DT2$id2 <- DT2$val
> setkeyv(DT2, c("id1", "id2"))
> DT[DT2, val:=val.1]
>
> But I get the following message: "combining bywithoutby with := in j is
> not yet implemented."
>
> Here is the solution I finally found:
> DT <- data.table("id1" = c("n1", "n1", "n1", "n1", "n1", "n1", "n1", "n1",
> "n1", "n1", "n1", "n1", "n2", "n2", "n2", "n2"), 'id2'=c(1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 11, 12, 1, 2, 3, 4), val=c(NA, NA, 2, 2, NA, NA, 7, 7, NA,
> NA, NA, 11, NA, NA, NA, NA), key = c("id1", "id2"))
> noms <- names(DT)
> cle <- key(DT)
> DT2 <- unique(DT[!is.na(val), c("id1", "val"), with = F])
> DT2$id2 <- DT2$val
> setkeyv(DT2, c("id1", "id2"))
> X <- DT2[DT]
> X[is.na(val.1), val.1:=val]
> DT <- X[,list(id1, id2, val.1)]
> setnames(DT, 3, "val")
> setkeyv(DT, cle)
>
> Is there a faster way to complete my data?
>
> Thanks in advance for your help.
>
> Regards,
> Cedric