[datatable-help] Remove some data table rows based on three conditions

Frank S. f_j_rod at hotmail.com
Wed Oct 22 19:19:05 CEST 2014


Dear all,
I'm working with a large database in which some rows have identical values of the id and datep variables. Of these
duplicated rows, I only want to keep the row with the maximum value of the marker variable. As an example:
DT <- data.table(
 id = rep(c(2,5),c(3,2)),
 datep = as.Date(c('1995-04-20','1995-04-20', '1997-02-19', '1998-01-15','1998-01-15')),
 marker = c(2,8,5,7,5),
 group=rep(c("A","B"),c(3,2))
 )
First, I sort by key variables: id, marker
DT[order(id,marker)]
 
But afterwards I've tried different things and I'm not able to get what I want:
DT[!duplicated(DT[c('id', 'datep')])]
DT[ !(duplicated %chin% c('id','datep'))]
DT[ !(duplicated %in% c('id','datep'))]
DT[,!(duplicated(DT[c("id","datep")])), by=list(id,datep)]
unique(DT[c('id','datep')])
Please, does anyone know how to do it?
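
One possible approach, given only as a minimal sketch: group by id and datep, then keep the row with the largest marker in each group. The name res is hypothetical, and the sketch assumes that a tie in marker should yield a single row (which.max() returns the first maximum).

```r
library(data.table)

# Example data from the question
DT <- data.table(
  id     = rep(c(2, 5), c(3, 2)),
  datep  = as.Date(c("1995-04-20", "1995-04-20", "1997-02-19",
                     "1998-01-15", "1998-01-15")),
  marker = c(2, 8, 5, 7, 5),
  group  = rep(c("A", "B"), c(3, 2))
)

# For each (id, datep) pair, keep only the row whose marker is largest.
# .SD is the subset of DT for the current group; which.max() picks the
# position of the first maximum, so each group contributes one row.
res <- DT[, .SD[which.max(marker)], by = list(id, datep)]
print(res)
```

Rows with a unique (id, datep) combination pass through unchanged, because each one forms a group of size one.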