[datatable-help] data.table and sp classes - any best practices?
Bacou, Melanie
mel at mbacou.com
Fri Feb 14 13:47:16 CET 2014
I often use data.table in combination with large spatial objects
(SpatialPolygonsDataFrame, SpatialPixelsDataFrame, etc.), but I am
always worried about using setkey() on a @data slot thinking that I
might mess up the link between the data attributes and the spatial
features (polygons, points, pixels).
I am hoping some of you might be able to clarify how best to manipulate
data attributes inside a spatial object using data.table without running
into potential errors.
Here is a typical use case:
# Load a sample SpatialPolygonsDataFrame from GADM
load(url("http://biogeo.ucdavis.edu/data/gadm2/R/ETH_adm3.RData"))
# My understanding is that the data.frame row names should always match the
# polygon ID slots
gadm.rn <- row.names(gadm)
gadm.rn[1:5]
# [1] "1" "2" "3" "4" "5"
pid <- lapply(gadm@polygons, slot, "ID")
pid[1:5]
# [[1]]
# [1] "1"
#
# [[2]]
# [1] "2"
#
# [[3]]
# [1] "3"
#
# [[4]]
# [1] "4"
#
# [[5]]
# [1] "5"
# Let's say I need to merge external data into gadm@data using setkey()
# Here is my approach
gadm@data <- data.table(gadm@data)
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# So far the row names are preserved, good.
# Let's create an explicit `rn` column to keep the initial `gadm` row names
gadm@data[, rn := gadm.rn]
# Check the ordering of the first data column
gadm@data[, PID][1:5]
# [1] 30825 30826 30827 30828 30829
# Now index gadm@data by another column
setkey(gadm@data, NAME_3)
# Verify that the row order has changed
gadm@data[, PID][1:5]
# [1] 30859 31100 31101 31145 31016
# What about row names?
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# Row names did not follow the reordering; does that mean attributes are now
# associated with the wrong polygons?
# Let's try to fix that
setkey(gadm@data, rn)
gadm@data <- gadm@data[gadm.rn]
gadm@data[, PID][1:5]
# [1] 30825 30826 30827 30828 30829
# I'm now back to the original row order; note that row names are still
# unchanged
row.names(gadm@data)[1:5]
# [1] "1" "2" "3" "4" "5"
# I assume my spatial object is now correct
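
A minimal sketch on toy data (not the GADM object) of why the row names above look "unchanged" after setkey(): as far as I can tell, a data.table does not carry custom row names, so row.names() simply reports the current row positions, and only an explicit column such as `rn` moves with the rows. The objects `df` and `dt` are made up for illustration.

```r
library(data.table)

# Toy data.frame with custom row names (stand-in for a @data slot)
df <- data.frame(x = c(3, 1, 2), row.names = c("a", "b", "c"))

# keep.rownames = TRUE copies the row names into a column called `rn`
dt <- as.data.table(df, keep.rownames = TRUE)

setkey(dt, x)      # physically reorders the rows by x

row.names(dt)      # still "1" "2" "3": positions, not identities
dt$rn              # "b" "c" "a": the explicit column followed the rows
```

So identical row.names() output before and after setkey() says nothing about whether the rows still line up with the polygons; the `rn` column is the thing to check.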
I don't know whether this approach makes sense at all, or whether I should
stay away from using data.table inside sp classes?
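
One alternative that might be safer, sketched here under assumptions and on toy data only: leave the @data slot alone, do the keyed merge on a copy, and put the result back in the original row order with match() on the saved row names before converting to a plain data.frame. The objects `attrs` (stand-in for gadm@data), `ext` (a hypothetical external table), and the `pop` column are invented for illustration.

```r
library(data.table)

# Toy stand-in for gadm@data; row names play the role of polygon IDs
attrs <- data.frame(
  PID    = c(30825L, 30826L, 30827L, 30828L),
  NAME_3 = c("D", "A", "C", "B"),
  row.names = c("1", "2", "3", "4"),
  stringsAsFactors = FALSE
)

# Hypothetical external table to merge in
ext <- data.table(NAME_3 = c("A", "B", "C", "D"), pop = c(10, 20, 30, 40))

# Work on a copy, keeping the row names in an explicit `rn` column
dt <- as.data.table(attrs, keep.rownames = TRUE)
merged <- merge(dt, ext, by = "NAME_3", all.x = TRUE)  # result is sorted by NAME_3

# Restore the original row order using the saved row names
merged <- merged[match(row.names(attrs), merged$rn)]

# Convert back to a data.frame and reinstate the row names
out <- as.data.frame(merged)
row.names(out) <- out$rn
out$rn <- NULL

# Sanity checks before assigning back to the spatial object's @data slot
stopifnot(identical(row.names(out), row.names(attrs)),
          identical(out$PID, attrs$PID))
```

With a real object, the last step would presumably be `gadm@data <- out`, done only after the checks pass.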
I would much appreciate any suggestions.
Thanks, --Mel.
--
Melanie BACOU
International Food Policy Research Institute
Agricultural Economist, HarvestChoice
Work +1(202)862-5699
E-mail mel at mbacou.com
Visit harvestchoice.org