[tlocoh-info] parameter selection, resampling, thin bursts
Andy Lyons
lyons.andy at gmail.com
Wed May 14 04:47:52 CEST 2014
Hi Anna,
Thanks for your questions. I have also added some responses below.
Andy
On 5/12/2014 8:16 AM, Wayne Marcus GETZ wrote:
> Hi Anna:
>
> Here is my response
>
>
> On Mon, May 12, 2014 at 6:56 AM, Anna Schweiger
> <anna.schweiger at nationalpark.ch> wrote:
>
> Dear T-LoCoH group, dear Andy
>
> First of all: I want to thank Andy and his colleagues for the
> great effort you are putting into T-LoCoH! I started using the
> package a few weeks ago and I have to say that I've hardly ever
> come across such a well-explained tutorial (everything worked
> right from the start!). Your publication is also really helpful
> and easy to follow, and so are the R help files. Thank you so much!
> Good documentation makes life a lot easier (and work fun)!!!
>
> However, I have a couple of questions I could not figure out
> myself. Maybe someone has some ideas on the following:
>
> 1.The first is a methodological question: I'm comparing the
> feeding areas of ibex and chamois in the summers of four
> consecutive years in one valley where they co-occur. For each year
> I have several (1-7) individuals per species. My assumption is
> that individuals of the same species behave more similarly than
> individuals of different species. In a first step, I chose the
> same values for s and a (I use the "a" method) for all individuals
> of the same species, across the years; i.e. all ibex have the same
> s and a value and all chamois another s and a value. However, I
> could also argue that the individuals of one species behave more
> similarly within the same year than across years (maybe because of
> environmental variability). Therefore, I was wondering if
> selecting different s and a values for every year makes sense? In
> the end I'm extracting environmental data based on the polygons
> defined by what can be called "core feeding areas" (I select them
> based on duration of stay and number of separate visits). Then I
> compare the two species in GLMMs. So I'm basically pooling all
> ibex data (different individuals, different years) and comparing
> them to all chamois data. I can account for the individual and
> yearly differences
> by including animal ID and year as a random effect. Still, I
> believe the parameters of all individuals from one species should
> be somewhat comparable. So far I could not quite get my head
> around this problem: Should I choose only one s and a value per
> species, or maybe only one for both species, or is it possible to
> vary s and a per year or even per individual? Do you have any
> suggestions? For me this is really tricky.
>
>
> You need to use the same s and a values for all species. However, you
> can ask the question: how robust is my result to variations in a and
> s? Thus you could see whether your result holds up for all a and s or
> breaks down as these change. If it does break down, that breakdown
> might have significant implications, because differences may emerge
> or disappear, as the case may be, when time is given more or less
> weighting.
I can see your quandary. I agree you need to be consistent across
individuals or datasets that you categorize in one group for subsequent
parts of the analysis. One way to be consistent is to use the same value
of 's' and 'a', another way to be consistent is to use the same process
for selecting 's' and 'a'. You've stumbled on one of T-LoCoH's
functional limitations - there isn't a magic formula for finding the
best 's' or 'a' (this can also be seen as a strength, because it keeps
the analyst and system knowledge in the picture). An alternative way for
selecting a value of 's', that you can apply consistently and fairly
easily across all data sets, is to pick the 's' value that returns the
same proportion of time-selected hulls. This is a reasonable and
defensible way of selecting 's' when you consider that the term in the
time-scaled distance equation that 's' controls is essentially the
distance the individual could have traveled in a given time period if it
had been travelling at the maximum observed speed. In other words, two
data sets that are in many ways similar (same species, same type of
behavior) could have different values of 's' for the same time-distance
balance, because maximum observed speed of a dataset is affected by
sampling as well as behavior. I am currently of the belief that the best
way to get a consistent balancing of space-time across data sets is to
pick the 's' that corresponds to a consistent proportion of time-selected
hulls (which will probably result in very close 's' values in absolute
terms; we need to do some more work in this area).
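If you want to try that approach, the lxy.ptsh.add() function will
estimate the 's' values that correspond to a set of target proportions
of time-selected hulls and save the results in the ptsh element of the
LoCoH-xy object. Something along these lines (the object name
ibex01.lxy is just a placeholder):

## Estimate the 's' values corresponding to a range of target
## proportions of time-selected hulls (these are estimates based
## on a sample of points)
ibex01.lxy <- lxy.ptsh.add(ibex01.lxy)
## Inspect the saved s vs. ptsh table and pick the 's' value for,
## say, ptsh = 0.5, applying the same target to every individual
ibex01.lxy$ptsh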
The same principle applies for selecting 'a' across datasets that should
be analyzed in a similar manner - use the same value or the same
process. If you define the optimal 'a' as the one that fills spurious
holes in core areas, and minimizes cross-overs in areas where the animal
wasn't seen (which are the principles we present in the paper), you'll
probably find the same value of 'a' will do that pretty well across
similar datasets (this again, however, presumes the data sampling is
consistent; if the sampling frequency is higher in one dataset you'll
obviously need larger values of 'a' for the same size hulls, because
'a' is a cumulative distance).
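In practice, you can create hullsets for several values of 'a' and
inspect the isopleths side by side. A sketch of that workflow (the
object name and parameter values are placeholders; check the help
files for the exact arguments):

## Find nearest neighbors using the largest 'a' you want to try
ibex01.lxy <- lxy.nn.add(ibex01.lxy, s=0.01, a=15000)
## Create hullsets, with isopleths, for a range of 'a' values
ibex01.lhs <- lxy.lhs(ibex01.lxy, s=0.01, a=seq(9000, 15000, by=2000),
    iso.add=TRUE)
## Plot the isopleths, looking for spurious holes in core areas
## and cross-overs into areas the animal never used
plot(ibex01.lhs, iso=TRUE, record=TRUE)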
Hope this helps. Let us know how it goes because parameter selection is
one of the most common challenges when trying to create comparable space
use models for different datasets (which all methods have to contend with).
> My other questions are more technical:
>
> 2.I want to manually offset duplicate xy points in xyt.lxy. Is
> this possible? I want to avoid random offset when constructing
> hulls, to make the analysis repeatable. Maybe the explanation is
> somewhere in the help, but I couldn't find it...
>
> Since time is unique, I don't see how you can have overlapping points
> unless they are true duplicates. Such duplicates must be removed. So
> I am not sure I understand your question.
Duplicate locations with identical time stamps are usually a data
processing error (which is why the function that creates a LoCoH-xy
object from a series of locations checks for that). Duplicate locations
with different time stamps are usually not an error, but the result of
the animal resting, returning to the same spot, or a rounding issue. But
even when the time stamps are different, duplicate locations can still
be an issue with T-LoCoH because you need unique locations to make a
polygon.
The current version of T-LoCoH handles duplicate locations when creating
hulls. There are two options based on the value of the offset.dups
parameter: ignoring them (at the risk of some points not having enough
unique neighbors to draw a polygon), or randomly offsetting them by a
fixed amount (the default). (The original LoCoH package had a third
option, deleting them, which didn't seem like a good idea so it was
removed). As an aside, we have discussed the possibility of adding
another option in future versions of T-LoCoH, whereby different rules
could be used to construct hulls around duplicate locations, for example
constructing a circle with a fixed radius representing the size of the
nest, water hole, etc. This would require some a priori knowledge of
behavior when the animal is observed at the same location multiple
times. If anyone has thoughts about this please let me know.
The current version of T-LoCoH has an option to add a random offset
(fixed distance, random direction) to duplicate locations when
constructing hulls. This is a problem, as Anna points out, for
reproducibility, because every time you construct hulls (and
subsequently isopleths), the duplicate locations will be offset somewhat
differently. An alternative approach is to randomly offset the duplicate
locations before constructing hulls (i.e., in the Locoh-xy object). This
should be done in full recognition that you are effectively altering the
input data (which may be fine for home range construction, but for other
analyses you would probably want to use the original locations). There
is not a function in the T-LoCoH package to randomly offset duplicate
locations in a LoCoH-xy object (I've added this to my to-do list). It
can be done with the following commands:
## These commands illustrate how to apply a random offset to
## duplicate locations in LoCoH-xy object. Note that this
## cannot be undone, and any subsequent analysis or construction
## of hulls will be based on the altered data.
## Given a LoCoH-xy object called fredo.lxy
## Get the coordinates of the Locoh-xy object
xy <- coordinates(fredo.lxy$pts)
## Identify the duplicate rows
dup_idx <- duplicated(xy)
## See how many locations are duplicate
table(dup_idx)
## Define the amount that duplicate locations will be randomly offset
## This is in map units.
offset <- 1
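## For reproducibility of the offsets themselves, set the random
## number seed first (any fixed value will do)
set.seed(1)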
## Apply a random offset to the duplicate rows
theta <- runif(n=sum(dup_idx), min=0, max=2*pi)
xy[dup_idx,1] <- xy[dup_idx,1] + offset * cos(theta)
xy[dup_idx,2] <- xy[dup_idx,2] + offset * sin(theta)
## See if there are any more duplicate rows. (Should all be false)
table(duplicated(xy))
## Next, we create a new SpatialPointsDataFrame by
## i. Grabbing the attribute table of the existing locations
## ii. Assigning the new locations (with offsets) as the locations
pts_df <- fredo.lxy$pts@data
coordinates(pts_df) <- xy
fredo.lxy$pts <- pts_df
## The nearest neighbor lookup table is no longer valid and will need
## to be recreated. Likewise with the ptsh (proportion of time-selected
## hulls vs. s) table. Set these to NULL.
fredo.lxy$nn <- NULL
fredo.lxy$ptsh <- NULL
## Lastly, we need to recreate the movement parameters.
fredo.lxy <- lxy.repair(fredo.lxy)
## Should be done. Inspect results
summary(fredo.lxy)
plot(fredo.lxy)
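Note that because the nn element was set to NULL, you will need to run
lxy.nn.add() again to identify nearest neighbors (based on the offset
locations) before creating hulls.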
> 3.I'm resampling my data by using lxy.thin.byfreq (common sampling
> interval should be 4h, some individuals have 2h, some 10 min
> frequencies). Now, I have some cases with time gaps of about 1
> month. I would still like to include these data. Is it possible to
> split the data and include the two time periods separately? Can
> this be done by setting a value for tct in the auto.a method? I
> don't quite understand how tct works.
>
I don't fully understand this question. lxy.thin.byfreq will selectively
remove locations to get as close as it can to the desired sampling
interval. If the desired sampling interval is 4 hours, and there is a
gap of 30 days, it won't remove the points on either end of the gap. It
will only remove points where the sampling interval is shorter than the
desired interval (i.e., where the data were sampled more frequently
than desired). If you're seeing a different effect with
lxy.thin.byfreq let me know. The tct argument in the auto.a() function
is very different; it acts as a filter for identifying which points
should be used in the computation of 'a' (it doesn't remove any points).
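For reference, the resampling call for your case would look something
like this (the object name is a placeholder, and note that samp.freq is
in seconds, if I'm remembering the arguments correctly):

## Resample to a common 4-hour sampling interval
anna.lxy <- lxy.thin.byfreq(anna.lxy, samp.freq=3600*4, byfreq=TRUE)
summary(anna.lxy)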
> Andy will have to explain how this works.
>
> 4.Again about resampling: As recommended in the help I thin bursts
> before resampling the data to a common time interval. I was
> wondering if the following is correct: First I inspect the
> sampling frequency plot with lxy.plot.freq. Then I thought: When
> tau.diff.max (default) = 0.02 and tau (median) = 120 min, sampling
> intervals between 117.6 and 122.4 min should be fine. If I now see
> points in the plot with, let's say, delta t/tau = 0.95, then the
> sampling interval = 0.95 * 120 = 108 min, which is outside the range
> allowed by tau.diff.max. In that case, should I set the threshold value in
> lxy.thin.bursts to thresh=0.98, to make sure all remaining points
> fall within the range 117.6 - 122.4? I think that having a
> sampling interval of 108 min in a dataset that should have 120 min
> is not uncommon and normally I would not think it is problematic.
> But I have only a very vague idea about the effects of such data
> intervals when the algorithms start working. Is it possible to
> provide any guidelines on thresholds for thinning bursts?
>
I can see how that can be confusing. The tau.diff.max argument in the
lxy.thin.bursts() function actually has nothing to do with how points
within "a burst" are removed (which in this context refers to a series
of locations spaced closely in time and presumed to be an error or
artifact of a hyperactive GPS recording device), or how it identifies
what group of points constitutes a burst. The tau.diff.max argument is
used downstream, after points have been removed, in computing the
movement parameters for the trajectory as a whole. The only argument by
which lxy.thin.bursts() defines a 'burst' is the thresh argument. If the
median sampling interval of the data (not the desired interval, but the
actual interval), is 120 minutes, and thresh = 0.05 (i.e., 6 minutes),
then any pair of points sampled within 6 minutes of each other or less
are considered to be part of a burst, and will be thinned down to a
single point.
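In other words (the object name is again a placeholder):

## With a median sampling interval tau = 120 min, thresh=0.05 means
## any points within 0.05 * 120 = 6 minutes of each other are treated
## as a burst and thinned down to a single point
anna.lxy <- lxy.thin.bursts(anna.lxy, thresh=0.05)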
Note that if your ultimate goal is to resample the data to achieve a
uniform sampling interval, thinning out bursts may not be necessary;
lxy.thin.byfreq() will do the same thing. The lxy.thin.byfreq() function
basically lays down a time line of desired sampling times, based on the
desired sampling interval, and then grabs the closest point in time to
each one. It's a little more complicated than that but that's the gist
of it.
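If it helps, the core idea can be sketched in a few lines of plain R
(an illustration of the principle, not the actual package internals):

## Toy example: 50 time stamps with a mix of 10-min and 2-hour gaps
dt <- as.POSIXct("2014-05-01 00:00", tz="UTC") +
      cumsum(sample(c(600, 7200), 50, replace=TRUE))
## Desired sampling interval: 4 hours (in seconds)
samp.freq <- 3600 * 4
## Lay down a timeline of desired sampling times
timeline <- seq(min(dt), max(dt), by=samp.freq)
## For each desired time, grab the observed point closest in time
idx <- sapply(seq_along(timeline), function(i)
    which.min(abs(as.numeric(dt - timeline[i], units="secs"))))
## Each point can only be kept once
dt_thinned <- dt[sort(unique(idx))]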
I should also note for reference that the way the term 'burst' is used
in T-LoCoH is quite different from how the term is used in the Move
package and other movement analysis packages.
> Again, Andy will have to explain how this works.
>
> 5.And related to the question above: Should I check and thin burst
> again after resampling to a new time interval (with the new range
> of tau values?)?
>
>
After you resample to a uniform sampling interval, you shouldn't see
any sampling intervals substantially smaller than that (i.e., a burst;
see above).
> 6.Generally, it is a bit hard for me to choose parameters based on
> visual interpretation (s, a, delta/tau etc. ). So far I came to
> the conclusion that this is the best I can do. However, I was
> wondering if there are any general arguments to support the
> choices one makes based on visual interpretation. Do you have an
> opinion on this? How could you argue (I'm thinking about future
> referees...)?
>
Are you still speaking in terms of resampling? The graphs of sampling
intervals, for example, are of course a guide. If you have a basis or
theory for quantifying more rigorously the rules or criteria you want
for resampling, that would certainly be another way to do it. You can
extract pretty much any statistical summary you want from movement data.
> There are arguments that one can use to justify one choice over
> another. These are based on entropy concepts, but we have yet to
> discuss or implement these methods. So I cannot be more specific at
> this time.
>
> I think that's it for the moment. I would really appreciate any
> help or comments!
>
>
> Good luck and all the best
>
> wayne
>
>
> All the best,
>
> Anna
>
> P.S.: I'm not sure if this helps, but I think I came across some
> typos in the R help file. Just in case somebody is collecting them:
>
> xyt.lxy: To disable the checking for duplicate time stamps, pass
> dup.dt.check=TRUE.
>
> lxy.thin.bursts {tlocoh}: To identify whether there are bursts in
> a LoCoH-xy dataset, and the sampling frequency of those bursts
> (i.e., the value ... TBC
>
> THANKS FOR FINDING THE TYPOS. I'M SURE THERE ARE MORE, PLEASE
> KEEP SENDING THEM TO ME.
>
> --
> __________________________________________________________
> ___________________________________________________________
>
> Professor Wayne M. Getz
> A. Starker Leopold Professor of Wildlife Ecology
> Department Environmental Science Policy & Management
> 130 Mulford Hall
> University of California at Berkeley
> CA 94720-3112, USA
>
> Campus Visitors: My office is in 5052 VLSB
>
> Fax: (1-510) 666-2352
> Office: (1-510) 642-8745
> Lab: (1-510) 643-1227
> email: wgetz at berkeley.edu
> lab: http://www.CNR.Berkeley.EDU/~getz/
> ___________________________________________________________
> ___________________________________________________________
>
>
>
> _______________________________________________
> Tlocoh-info mailing list
> Tlocoh-info at lists.r-forge.r-project.org
> http://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/tlocoh-info