[datatable-help] data.table and sp classes - any best practices?

Bacou, Melanie mel at mbacou.com
Fri Feb 14 13:47:16 CET 2014


I often use data.table in combination with large spatial objects 
(SpatialPolygonsDataFrame, SpatialPixelsDataFrame, etc.), but I am 
always worried that calling setkey() on a @data slot might break the 
link between the data attributes and the spatial features (polygons, 
points, pixels).

I am hoping some of you might be able to clarify how best to manipulate 
data attributes inside a spatial object using data.table without running 
into potential errors.

Here is a typical use case:

# Load a sample SpatialPolygonsDataFrame from GADM
load(url("http://biogeo.ucdavis.edu/data/gadm2/R/ETH_adm3.RData"))

# My understanding is the data.frame row names should always match the
# polygon ID slots
gadm.rn <- row.names(gadm)
gadm.rn[1:5]
# [1] "1" "2" "3" "4" "5"

pid <- lapply(gadm@polygons, slot, "ID")
pid[1:5]
# [[1]]
# [1] "1"
#
# [[2]]
# [1] "2"
#
# [[3]]
# [1] "3"
#
# [[4]]
# [1] "4"
#
# [[5]]
# [1] "5"
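
The same comparison can be collapsed into a single check. This is only a minimal sketch on a toy two-polygon object, assuming the sp package is installed; the names `p1`, `p2`, `spdf` and `ids_match` are made up for illustration and are not part of the GADM data.

```r
library(sp)

# Two hypothetical unit squares with explicit polygon IDs "a" and "b"
p1 <- Polygons(list(Polygon(cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0)))), ID = "a")
p2 <- Polygons(list(Polygon(cbind(c(1, 2, 2, 1, 1), c(0, 0, 1, 1, 0)))), ID = "b")

# Attribute table whose row names carry the same IDs
spdf <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(p1, p2)),
  data.frame(x = 1:2, row.names = c("a", "b"))
)

# Polygon IDs as a plain character vector (sapply instead of lapply)
pid <- sapply(spdf@polygons, slot, "ID")

# TRUE when attribute rows and polygons are in the same order
ids_match <- identical(row.names(spdf@data), pid)
```

Running the same `identical()` comparison on `gadm` before and after any manipulation of the @data slot would be one way to confirm the ordering is intact.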


# Let's say I need to merge external data into gadm@data using setkey()
# Here is my approach
gadm@data <- data.table(gadm@data)
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# So far the row names are preserved, good.

# Let's create an explicit `rn` column to keep the initial `gadm` row names
gadm@data[, rn := gadm.rn]

# Check the ordering of the first data column
gadm@data[, PID][1:5]
# [1] 30825 30826 30827 30828 30829

# Now index gadm@data by another column
setkey(gadm@data, NAME_3)

# Verify that the row order has changed
gadm@data[, PID][1:5]
# [1] 30859 31100 31101 31145 31016

# What about row names?
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# Row names did not follow the reordered rows (they still read 1 to 5).
# Does that mean attributes are now associated with the wrong polygons?

# Let's try to fix that
setkey(gadm@data, rn)
gadm@data <- gadm@data[gadm.rn]
gadm@data[, PID][1:5]
# [1] 30825 30826 30827 30828 30829
# I'm now back to the original row order; note that row names are still
# unchanged
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# I assume my spatial object is now correct
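
One variation I could imagine, shown here only as a rough sketch on made-up data, is to do the keyed join on a copy and then restore the original order with match() on the saved IDs, so the table that sits in the slot is never reordered by reference. The objects `ids`, `attrs`, `ext` and the `value` column are hypothetical stand-ins for the polygon IDs, the @data slot and the external data.

```r
library(data.table)

# Hypothetical stand-ins: `ids` for the polygon ID slots, `attrs` for @data
ids   <- c("1", "2", "3", "4", "5")
attrs <- data.table(rn = ids, NAME_3 = c("e", "b", "d", "a", "c"))

# Made-up external data to merge in by name
ext <- data.table(NAME_3 = c("a", "b", "c", "d", "e"),
                  value  = c(10, 20, 30, 40, 50))

# Key a copy, so `attrs` itself keeps its row order
tmp <- copy(attrs)
setkey(tmp, NAME_3)
setkey(ext, NAME_3)

# Keyed join: one row per row of `tmp`, now sorted by NAME_3
merged <- ext[tmp]

# Put the rows back in the original feature order using the saved IDs
merged <- merged[match(ids, rn)]

# Fail loudly if the order does not match the IDs
stopifnot(identical(merged$rn, ids))
```

Only after the stopifnot() check passes would the result be assigned back to the @data slot.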

I don't know whether this approach makes sense at all, or whether I 
should stay away from using data.table inside sp classes.

I would much appreciate any suggestions.
Thanks, --Mel.

-- 
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org


