[datatable-help] (no subject)

DUPREZ Cédric Cedric.DUPREZ at ign.fr
Tue Feb 7 16:37:42 CET 2012


Thank you for your fast answer.

In fact, I realize that my example was too simple, because the ids were plain sequences.
Let's imagine that my data.table looks like this:

DT <- data.table('id1'=c("n1","n1","n1","n1","n1","n2","n2","n2","n2","n3","n3","n3","n3","n3","n3")
	, 'id2'= c(1,2,3,4,5,1,2,3,4,2,3,4,5,6,8)
	, 'val'=c(1,2,NA,5,6,1,NA,NA,4,NA,2,4,6,7,8), key=c("id1", "id2"))

How can I fill in the NAs using the same rule?

Cedric
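
One way this could be approached, following the roll=TRUE suggestion in the reply quoted below, is to keep only the observed rows and join them back onto the full set of (id1, id2) pairs. This is only a sketch: it assumes a recent data.table release in which on= and .() are available, and the names obs and filled are made up for illustration.

```r
library(data.table)

# The example table from this message: character id1, irregular id2.
DT <- data.table(
  id1 = c("n1","n1","n1","n1","n1","n2","n2","n2","n2","n3","n3","n3","n3","n3","n3"),
  id2 = c(1,2,3,4,5,1,2,3,4,2,3,4,5,6,8),
  val = c(1,2,NA,5,6,1,NA,NA,4,NA,2,4,6,7,8),
  key = c("id1", "id2")
)

# Keep only the rows that carry an observation (hypothetical name: obs).
obs <- DT[!is.na(val)]

# Rolling join back onto every (id1, id2) pair of the original table.
# id1 must match exactly; id2 rolls to the previous observed row within
# the same id1, so a value is never carried over from another id1.
filled <- obs[DT[, .(id1, id2)], on = c("id1", "id2"), roll = TRUE]

print(filled)
```

Because id1 is matched exactly, the first row of n3 (id2 = 2) has nothing before it to roll from and stays NA, which matches the rule in the original question.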


-----Original Message-----
From: Matthew Dowle [mailto:mdowle at mdowle.plus.com] 
Sent: Tuesday, February 7, 2012 14:58
To: DUPREZ Cédric
Cc: datatable-help at r-forge.wu-wien.ac.at
Subject: Re: [datatable-help] (no subject)


In data.table this op is roll=TRUE; see example(data.table). Also known as
prevailing join, and last observation carried forward (locf).

The trick is to exclude the NA rows from the dataset in the first place.
There is no need to create the NAs and then fill in the NAs.  Just
roll=TRUE to the irregular data. This is faster, more memory efficient and
more flexible.

DT = DT[!is.na(val)]   # but better not to have NAs in first place

DT[CJ(1:3,1:4),roll=TRUE]   # CJ = Cross Join
      id1 id2 val
 [1,]   1   1   1
 [2,]   1   2   2
 [3,]   1   3   2
 [4,]   1   4   5
 [5,]   2   1   1
 [6,]   2   2   1
 [7,]   2   3   1
 [8,]   2   4   4
 [9,]   3   1  NA
[10,]   3   2   2
[11,]   3   3   4
[12,]   3   4   6
>

More typically it might be :

   DT[CJ(ids,dates), roll=TRUE]

and to not roll forward the last observation in each group :

   DT[CJ(ids,dates), rolltolast=TRUE]

and only to roll within each day (not to last of previous day) :

   DT[CJ(ids,dates,times), roll=TRUE]   # where key(DT) is (id,date,time)
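
The difference between rolling the last observation forward and stopping at it can be seen on a small made-up table. This is a sketch under assumptions: the data, and the names obs, grid, fwd and capped, are invented for illustration, and it uses the rollends= argument of current data.table releases. As far as I am aware, rolltolast=TRUE was later deprecated in favour of roll=TRUE with rollends=FALSE; please check ?data.table for the version in use.

```r
library(data.table)

# Irregular observations: id 1 is seen at t = 1 and t = 3, id 2 only at t = 2.
obs <- data.table(id  = c(1L, 1L, 2L),
                  t   = c(1L, 3L, 2L),
                  val = c(10, 30, 20),
                  key = c("id", "t"))

# Regular grid of every id with every t from 1 to 5 (CJ = Cross Join).
grid <- CJ(id = 1:2, t = 1:5)

# roll = TRUE: carry the last observation forward, including past the
# final observation of each id.
fwd <- obs[grid, on = c("id", "t"), roll = TRUE]

# rollends = FALSE: still fill gaps between observations, but do not
# carry the final observation of each id beyond its own t.
capped <- obs[grid, on = c("id", "t"), roll = TRUE, rollends = FALSE]

print(fwd)
print(capped)
```

In both results the row for id 2 at t = 1 is NA, since there is no earlier observation to roll from. The two results differ only after the last observation of each id.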

We haven't had many questions about roll=TRUE so I'm not sure if people
haven't discovered it, or whether it just works and people don't have
issues with it.  It is very well tested and several years old, so the
latter is possible.

Matthew


> Dear all,
>
> I am looking for the best way to complete missing values in a datatable,
> according to particular rules.
>
> Having the following datatable:
> DT <- data.table('id1'=c(1,1,1,1,2,2,2,2,3,3,3,3)
>       , 'id2'= c(1,2,3,4)
>       , 'val'=c(1,2,NA,5,1,NA,NA,4,NA,2,4,6)
>       , key=c("id1", "id2"))
>
> I get:
>       id1 id2 val
>  [1,]   1   1   1
>  [2,]   1   2   2
>  [3,]   1   3  NA
>  [4,]   1   4   5
>  [5,]   2   1   1
>  [6,]   2   2  NA
>  [7,]   2   3  NA
>  [8,]   2   4   4
>  [9,]   3   1  NA
> [10,]   3   2   2
> [11,]   3   3   4
> [12,]   3   4   6
>
> The rule to complete missing values is the following: use the immediately
> preceding non-missing value (val) from the same id1.
> In my example, the lines with missing values are:
>
> DT[is.na(val)]
>
>      id1 id2 val
> [1,]   1   3  NA
> [2,]   2   2  NA
> [3,]   2   3  NA
> [4,]   3   1  NA
>
> The final result for my datatable should be:
>
> DT
>       id1 id2 val
>  [1,]   1   1   1
>  [2,]   1   2   2
>  [3,]   1   3   2
>  [4,]   1   4   5
>  [5,]   2   1   1
>  [6,]   2   2   1
>  [7,]   2   3   1
>  [8,]   2   4   4
>  [9,]   3   1  NA
> [10,]   3   2   2
> [11,]   3   3   4
> [12,]   3   4   6
>
> What is the best and easiest way to complete missing values with such
> rules? I tried with joins and the := operator but often get error messages like
> "combining bywithoutby with := in j is not yet implemented."
>
> Thanks in advance for your help,
>
> Cedric
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>



