[datatable-help] Subsetting By Row Function

Arunkumar Srinivasan aragorn168b at gmail.com
Sat Jul 19 01:02:41 CEST 2014


Hi Ben,

If the “Date” column (which seems to be just month names) is already in order - meaning you just want to pick the last item for each group, then this is fairly straightforward:

I assume Date is of type “character”.

Method 1:

DT[, .SD[.N], by=Group]
#    Group Value   Date
# 1:     1   yyy   July
# 2:     2  qqqq August
Method 2:

In this case, .SD is not optimised for speed yet. So, if this is slow, then you can overcome it by using .I in place of .SD as follows:

DT[DT[, .I[.N], by=Group]$V1]
#    Group Value   Date
# 1:     1   yyy   July
# 2:     2  qqqq August
Instead of subsetting entire data per group (.SD), we get the row number (.I) in DT for each group (in column V1) and then just subset those rows.

If the Date column is not necessarily sorted for each group, then we create an extra column:

Method 3:

DT[, idx := chmatch(Date, month.name)]
setkey(DT, Group, idx) # sort by group, idx
DT[DT[, .I[.N], by=Group]$V1]
#    Group Value   Date idx
# 1:     1   yyy   July   7
# 2:     2  qqqq August   8
Or if you use v1.9.3, you can use setorder instead of setkey which allows for ordering in ascending and descending order:

Method 4:

DT[, idx := chmatch(Date, month.name)]
setorder(DT, Group, -idx) # sort by group, and descending order on idx
Now we’ll need to pick the first element instead of the .Nth (last) element per group.

DT[DT[, .I[1L], by=Group]$V1]
#    Group Value   Date idx
# 1:     1   yyy   July   7
# 2:     2  qqqq August   8
And alternatively, if you don’t wish to add the extra column, you can use order(.) as follows:

Method 5:

DT[order(Group, -chmatch(Date, month.name))][, .SD[1L], by=Group]
If you want to use .I here, you’ll have to save the first part onto a variable, which essentially means you’ll use up twice the memory of your data set.. So, I’d prefer this least. But just to show all possible ways I could think of.

HTH


Arun

From: Arunkumar Srinivasan aragorn168b at gmail.com
Reply: Arunkumar Srinivasan aragorn168b at gmail.com
Date: July 19, 2014 at 12:51:04 AM
To: bgoldstein ben.goldstein at gmail.com
Cc: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:  Re: [datatable-help] Subsetting By Row Function  

Hi Ben,

If the "Date" column (which seems to be just month names) is already in order - meaning you just want to pick the last item for each group, then this is fairly straightforward:

I assume `Date` is of type "character".

Method 1:
DT[, .SD[.N], by=Group]
#    Group Value   Date
# 1:     1   yyy   July
# 2:     2  qqqq August

Method 2:
In this case, `.SD` is not optimised for speed yet. So, if this is slow, then you can overcome it by using `.I` in place of `.SD` as follows:

DT[DT[, .I[.N], by=Group]$V1]
#    Group Value   Date
# 1:     1   yyy   July
# 2:     2  qqqq August

Instead of subsetting entire data per group (.SD), we get the row number (.I) in DT for each group (in column V1) and then just subset those rows.

---

If 


On Sat, Jul 19, 2014 at 12:40 AM, bgoldstein <ben.goldstein at gmail.com> wrote:
I am having trouble defining (and therefore searching) for this problem. I
have data like this:

Group Value Date
1         xxx   June
1         yyy   July
2         zzzz   May
2         qqqq  August
etc.


I want to subset the 'Value' of each 'Group' by the latest 'Date'. So my
output should be:

Group Value Date
1         yyy   July
2         qqqq  August
etc.

The doBy package has a firstobs() function that works but is quite slow.

What would be a data.table way to do this?

Thank you,

Ben



--
View this message in context: http://r.789695.n4.nabble.com/Subsetting-By-Row-Function-tp4694221.html
Sent from the datatable-help mailing list archive at Nabble.com.
_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20140719/a2f10516/attachment.html>


More information about the datatable-help mailing list