[datatable-help] Subsetting By Row Function

Arunkumar Srinivasan aragorn168b at gmail.com
Sat Jul 19 01:31:56 CEST 2014


Looking at ?data.table, there’s another way which doesn’t require sorting on “Group, Date”:

DT[DT[, .I[which.max(idx)], by=Group]$V1]
#    Group Value   Date idx
# 1:     1   yyy   July   7
# 2:     2  qqqq August   8
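
To see why this works, it may help to split the one-liner into its two steps. The sketch below uses made-up data: only the two result rows shown above come from this thread, so the other rows and the exact column types are assumptions.

```r
library(data.table)

# Hypothetical input; only the 'yyy' and 'qqqq' rows appear in the thread.
DT <- data.table(
  Group = c(1L, 1L, 2L, 2L),
  Value = c("xxx", "yyy", "pppp", "qqqq"),
  Date  = c("June", "July", "July", "August"),
  idx   = c(6L, 7L, 7L, 8L)
)

# Step 1: for each Group, the row number (within DT) of the largest idx.
# The unnamed j expression is returned in a column called V1.
rows <- DT[, .I[which.max(idx)], by = Group]

# Step 2: use those row numbers to subset the original table.
res <- DT[rows$V1]
```

Here `rows$V1` holds one row number per group, so no prior sorting of `DT` is needed.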
HTH


Arun

From: Arunkumar Srinivasan aragorn168b at gmail.com
Reply: Arunkumar Srinivasan aragorn168b at gmail.com
Date: July 19, 2014 at 1:20:40 AM
To: bgoldstein ben.goldstein at gmail.com, datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:  Re: [datatable-help] Subsetting By Row Function  

I was wondering - if you don't mind - you could briefly explain some of the 
syntax. I have seen .I and .SD but am not familiar quite with what they 
mean. I'm assuming .N is last? Is there a syntax for the first (.n?) or the 
5th (.5?)

All special variables are explained in `?data.table`. It'd be much easier for you in the future if you go through it and try it out yourself with some dummy examples. They can be very powerful tools!

Briefly: .N contains the number of observations in each group, as an integer vector of length 1. To refer to the first row of each group, use .I[1] (its row number in the original data.table); use .I[5] for the 5th. If 5 > .N, .I[5] returns NA (as base R does when we index beyond a vector's length).
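
A small sketch of the above, on a toy table chosen only for illustration (the 4th row is requested rather than the 5th, so that one group has it and the other does not):

```r
library(data.table)

# Toy data: x == 1 has 4 rows, x == 2 has 3 rows.
DT <- data.table(x = c(1, 2, 1, 1, 2, 1, 2), y = 10:16)

# .N: number of rows in each group (one integer per group).
counts <- DT[, .N, by = x]

# .I[1]: row number in DT of the first row of each group.
first <- DT[, .I[1], by = x]

# .I[4]: row number of the 4th row of each group.
# The x == 2 group has only 3 rows, so its entry is NA.
fourth <- DT[, .I[4], by = x]
```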

.I contains, for each group, the row numbers of that group's rows in the original data.table. Ex: `DT <- data.table(x=c(1,2,1,1,2,1,2), y=10:16); DT[, print(.I), by=x]` prints the row numbers in `DT` where x=1 first, followed by the row numbers in `DT` where x=2.
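
The same example can be written so that the row numbers are returned rather than printed. This is only a sketch; the column name `row` is chosen here for illustration.

```r
library(data.table)

DT <- data.table(x = c(1, 2, 1, 1, 2, 1, 2), y = 10:16)

# One output row per input row: the group value and its row number in DT.
pos <- DT[, list(row = .I), by = x]

# .I combines with .N: the row number of the last row of each group,
# then a subset of DT by those row numbers.
last_rows <- DT[DT[, .I[.N], by = x]$V1]
```

Groups appear in order of first occurrence, so the x=1 rows are listed before the x=2 rows.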

HTH

Arun

From: bgoldstein ben.goldstein at gmail.com
Reply: bgoldstein ben.goldstein at gmail.com
Date: July 19, 2014 at 1:12:17 AM
To: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:  Re: [datatable-help] Subsetting By Row Function

Arun,

This worked perfectly - Thank you. The dates are actually full Chron dates
so it was easy to sort first by date. I ended up using your Method 2 for
speed.

I was wondering - if you don't mind - you could briefly explain some of the
syntax. I have seen .I and .SD but am not familiar quite with what they
mean. I'm assuming .N is last? Is there a syntax for the first (.n?) or the
5th (.5?)

Is '.I' saying find the index that meets this criterion? And .SD find the
group that meet the criterion?

Thank you,

Ben


