[datatable-help] (no subject)

DUPREZ Cédric Cedric.DUPREZ at ign.fr
Tue Feb 7 17:46:17 CET 2012


Great! That's it.

Thanks a lot for your help and for the data.table package.

Cheers,

Cedric

-----Original Message-----
From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
Sent: Tuesday, February 7, 2012 17:01
To: DUPREZ Cédric
Cc: datatable-help at r-forge.wu-wien.ac.at
Subject: RE: [datatable-help] (no subject)

Starting with your data, something like the below.  In practice you wouldn't
put id2 into the table with the NAs in the first place.  You'd leave DT as
the irregular data without NAs, and then join to it from wherever the id2
values are coming from.

> X = DT[,list(id1,id2)]
> DT = DT[!is.na(val)]
> DT[X,roll=TRUE]
      id1 id2 val
 [1,]  n1   1   1
 [2,]  n1   2   2
 [3,]  n1   3   2
 [4,]  n1   4   5
 [5,]  n1   5   6
 [6,]  n2   1   1
 [7,]  n2   2   1
 [8,]  n2   3   1
 [9,]  n2   4   4
[10,]  n3   2  NA
[11,]  n3   3   2
[12,]  n3   4   4
[13,]  n3   5   6
[14,]  n3   6   7
[15,]  n3   8   8
>
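
For reference, the three console lines above can be run as one self-contained
script. This is only a sketch: the names obs and filled are illustrative, and
the explicit on= argument assumes a data.table release newer than this thread
(the original relies on the key of DT instead).

```r
library(data.table)

# Cedric's irregular data, keyed by (id1, id2)
DT <- data.table(
  id1 = c("n1","n1","n1","n1","n1","n2","n2","n2","n2","n3","n3","n3","n3","n3","n3"),
  id2 = c(1,2,3,4,5,1,2,3,4,2,3,4,5,6,8),
  val = c(1,2,NA,5,6,1,NA,NA,4,NA,2,4,6,7,8),
  key = c("id1", "id2")
)

X   <- DT[, list(id1, id2)]   # every (id1, id2) pair a value is wanted for
obs <- DT[!is.na(val)]        # keep only the rows that were actually observed

# Rolling join: exact match on id1, last observation carried forward along id2.
# A pair with no earlier observation in its id1 group (n3, 2) stays NA.
filled <- obs[X, on = c("id1", "id2"), roll = TRUE]
print(filled)
```

The join rolls only on the last join column (id2), so values never leak from
one id1 group into the next.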



> Thank you for your fast answer.
>
> In fact, I realize that my example was too simple, since it was based on
> sequences for the ids.
> Let's imagine that my data.table looks like this:
>
> DT <-
> data.table('id1'=c("n1","n1","n1","n1","n1","n2","n2","n2","n2","n3","n3","n3","n3","n3","n3")
> 	, 'id2'= c(1,2,3,4,5,1,2,3,4,2,3,4,5,6,8)
> 	, 'val'=c(1,2,NA,5,6,1,NA,NA,4,NA,2,4,6,7,8), key=c("id1", "id2"))
>
> How can I fill in the NAs using the same rule?
>
> Cedric
>
>
> -----Original Message-----
> From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
> Sent: Tuesday, February 7, 2012 14:58
> To: DUPREZ Cédric
> Cc: datatable-help at r-forge.wu-wien.ac.at
> Subject: Re: [datatable-help] (no subject)
>
>
> In data.table this operation is roll=TRUE; see example(data.table). It is
> also known as a prevailing join, or last observation carried forward (locf).
>
> The trick is to exclude the NA rows from the dataset in the first place.
> There is no need to create the NAs and then fill in the NAs.  Just
> roll=TRUE to the irregular data. This is faster, more memory efficient and
> more flexible.
>
> DT = DT[!is.na(val)]   # but better not to have NAs in first place
>
> DT[CJ(1:3,1:4),roll=TRUE]   # CJ = Cross Join
>       id1 id2 val
>  [1,]   1   1   1
>  [2,]   1   2   2
>  [3,]   1   3   2
>  [4,]   1   4   5
>  [5,]   2   1   1
>  [6,]   2   2   1
>  [7,]   2   3   1
>  [8,]   2   4   4
>  [9,]   3   1  NA
> [10,]   3   2   2
> [11,]   3   3   4
> [12,]   3   4   6
>>
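
The cross-join example above can be reproduced end to end roughly as follows.
This is a sketch under assumptions: the names obs, grid and filled are
illustrative, CJ() is given named double vectors so the columns match DT by
name and type, and on= assumes a release newer than this thread.

```r
library(data.table)

# The regular example from the original question, keyed by (id1, id2)
DT <- data.table(
  id1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
  id2 = rep(c(1,2,3,4), times = 3),
  val = c(1,2,NA,5,1,NA,NA,4,NA,2,4,6),
  key = c("id1", "id2")
)
obs <- DT[!is.na(val)]   # irregular data: observed rows only

# CJ() builds every combination of its arguments (3 x 4 = 12 rows here)
grid <- CJ(id1 = c(1, 2, 3), id2 = c(1, 2, 3, 4))

# Roll the last observed val forward along id2 within each id1
filled <- obs[grid, on = c("id1", "id2"), roll = TRUE]
print(filled)
```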
>
> More typically it might be :
>
>    DT[CJ(ids,dates), roll=TRUE]
>
> and to not roll forward the last observation in each group :
>
>    DT[CJ(ids,dates), rolltolast=TRUE]
>
> and only to roll within each day (not to last of previous day) :
>
>    DT[CJ(ids,dates,times), roll=TRUE]   # where key(DT) is (id,date,time)
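
The last two variants can be sketched on a small made-up table. Everything
below is hypothetical data, not from the thread. It also assumes a later
data.table release, where rolltolast=TRUE was superseded by roll=TRUE together
with rollends=FALSE, and where on= is available.

```r
library(data.table)

# Hypothetical intraday data keyed by (id, date, time)
obs <- data.table(
  id   = "a",
  date = c(1, 1, 2),
  time = c(9, 11, 10),
  val  = c(100, 110, 200),
  key  = c("id", "date", "time")
)
grid <- CJ(id = "a", date = c(1, 2), time = c(9, 10, 11, 12))
cols <- c("id", "date", "time")

# Only the last join column (time) rolls; id and date must match exactly,
# so time 9 on day 2 stays NA rather than inheriting day 1's last value.
within_day <- obs[grid, on = cols, roll = TRUE]

# rollends = FALSE: nothing is carried past the last observation of each
# (id, date) group, which is what rolltolast = TRUE did in this thread's era.
no_tail <- obs[grid, on = cols, roll = TRUE, rollends = FALSE]

print(within_day)
print(no_tail)
```

Gaps between two observations are still filled in both cases; only the
stretch after a group's final observation differs.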
>
> We haven't had many questions about roll=TRUE so I'm not sure if people
> haven't discovered it, or whether it just works and people don't have
> issues with it.  It is very well tested and several years old, so the
> latter is possible.
>
> Matthew
>
>
>> Dear all,
>>
>> I am looking for the best way to fill in missing values in a data.table,
>> according to a particular rule.
>>
>> Having the following datatable:
>> DT <- data.table('id1'=c(1,1,1,1,2,2,2,2,3,3,3,3)
>>       , 'id2'= c(1,2,3,4)
>>       , 'val'=c(1,2,NA,5,1,NA,NA,4,NA,2,4,6)
>>       , key=c("id1", "id2"))
>>
>> I get:
>>       id1 id2 val
>>  [1,]   1   1   1
>>  [2,]   1   2   2
>>  [3,]   1   3  NA
>>  [4,]   1   4   5
>>  [5,]   2   1   1
>>  [6,]   2   2  NA
>>  [7,]   2   3  NA
>>  [8,]   2   4   4
>>  [9,]   3   1  NA
>> [10,]   3   2   2
>> [11,]   3   3   4
>> [12,]   3   4   6
>>
>> The rule for filling in missing values is the following: use the immediately
>> preceding non-missing value (val) within the same id1.
>> In my example, lines with missing values are :
>>
>> DT[is.na(val)]
>>
>>      id1 id2 val
>> [1,]   1   3  NA
>> [2,]   2   2  NA
>> [3,]   2   3  NA
>> [4,]   3   1  NA
>>
>> The final result for my datatable should be:
>>
>> DT
>>       id1 id2 val
>>  [1,]   1   1   1
>>  [2,]   1   2   2
>>  [3,]   1   3   2
>>  [4,]   1   4   5
>>  [5,]   2   1   1
>>  [6,]   2   2   1
>>  [7,]   2   3   1
>>  [8,]   2   4   4
>>  [9,]   3   1  NA
>> [10,]   3   2   2
>> [11,]   3   3   4
>> [12,]   3   4   6
>>
>> What is the best and easiest way to fill in missing values with such a
>> rule? I tried joins and the := operator, but I often get error messages
>> like
>> "combining bywithoutby with := in j is not yet implemented."
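
For filling in place with :=, one option that postdates this thread is
data.table's nafill() applied per group. This is a swapped-in technique, not
the rolling join recommended in the replies, and the sketch assumes a release
recent enough to provide nafill() and a numeric val column.

```r
library(data.table)

DT <- data.table(
  id1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
  id2 = rep(c(1,2,3,4), times = 3),
  val = c(1,2,NA,5,1,NA,NA,4,NA,2,4,6),
  key = c("id1", "id2")
)

# Carry the last non-missing val forward within each id1, updating DT by reference.
# A leading NA in a group has nothing before it, so it stays NA.
DT[, val := nafill(val, type = "locf"), by = id1]
print(DT)
```

Unlike the rolling join, this keeps the NA rows in the table and overwrites
them, so it only helps when the full grid of ids already exists in DT.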
>>
>> Thanks in advance for your help,
>>
>> Cedric
>>
>
>
>



