[tlocoh-info] parameter selection, resampling, thin bursts

Andy Lyons lyons.andy at gmail.com
Wed May 14 04:47:52 CEST 2014


Hi Anna,

Thanks for your questions. I have also added some responses below.

Andy

On 5/12/2014 8:16 AM, Wayne Marcus GETZ wrote:
> Hi Anna:
>
> Here is my response
>
>
> On Mon, May 12, 2014 at 6:56 AM, Anna Schweiger
> <anna.schweiger at nationalpark.ch> wrote:
>
>     Dear T-LoCoH group, dear Andy
>
>     First of all: I want to thank Andy and his colleagues for the
>     great effort you are putting into T-LoCoH! I started using the
>     package a few weeks ago and I have to say that I've hardly ever
>     come across such a well-explained tutorial (everything worked
>     right from the start!). Your publication is also really helpful
>     and easy to follow, and so are the R help files. Thank you so much!
>     Good documentation makes life a lot easier (and work fun)!!!
>
>     However, I have a couple of questions I could not figure out
>     myself. Maybe someone has some ideas on the following:
>
>     1. The first is a methodological question: I'm comparing the
>     feeding areas of ibex and chamois in the summers of four
>     consecutive years in one valley where they co-occur. For each year
>     I have several (1-7) individuals per species. My assumption is
>     that individuals of the same species behave more similarly than
>     individuals of different species. In a first step, I chose the
>     same values for s and a (I use the "a" method) for all individuals
>     of the same species, across the years; i.e., all ibex have the
>     same s and a values, and all chamois another pair of s and a
>     values. However, I could also argue that the individuals of one
>     species behave more similarly within the same year than across
>     years (maybe because of environmental variability). Therefore, I
>     was wondering if selecting different s and a values for every year
>     makes sense? In the end I'm extracting environmental data based on
>     the polygons defined by what can be called "core feeding areas" (I
>     select them based on duration of stay and number of separate
>     visits). Then I compare the two species in GLMMs. So I'm basically
>     pooling all ibex data (different individuals, different years) and
>     comparing them to all chamois. I can account for individual and
>     yearly differences by including animal ID and year as random
>     effects. Still, I believe the parameters of all individuals from
>     one species should be somewhat comparable. So far I could not
>     quite get my head around this problem: Should I choose only one s
>     and a value per species, or maybe only one for both species, or is
>     it possible to vary s and a per year or even per individual? Do
>     you have any suggestions? For me this is really tricky.
>
>
> You need to use the same s and a values for all species. However, you
> can ask the question: how robust is my result to variations in a and
> s? Thus you could see if your result holds up for all a and s, or
> breaks down as these change. If it does break down, that breakdown
> might have some significant implications, because differences may
> emerge or disappear, as the case may be, when time is given more or
> less weighting.

I can see your quandary. I agree you need to be consistent across 
individuals or datasets that you categorize in one group for subsequent 
parts of the analysis. One way to be consistent is to use the same value 
of 's' and 'a'; another way is to use the same process for selecting 's' 
and 'a'. You've stumbled on one of T-LoCoH's functional limitations: 
there isn't a magic formula for finding the best 's' or 'a' (this can 
also be seen as a strength, because it keeps the analyst and system 
knowledge in the picture). An alternative way of selecting a value of 
's', which you can apply consistently and fairly easily across all 
datasets, is to pick the 's' value that returns the same proportion of 
time-selected hulls. This is a reasonable and defensible way of 
selecting 's' when you consider that the term in the time-scaled 
distance (TSD) equation that 's' controls, 
sqrt(dx^2 + dy^2 + (s * v_max * dt)^2), is essentially the distance the 
individual could have traveled in a given time period if it had been 
travelling at the maximum observed speed. In other words, two datasets 
that are in many ways similar (same species, same type of behavior) 
could have different values of 's' for the same time-distance balance, 
because the maximum observed speed of a dataset is affected by sampling 
as well as behavior. I am currently of the belief that the best way to 
get a consistent balancing of space-time across datasets is to pick the 
's' that corresponds to a consistent proportion of time-selected hulls 
(which will probably result in very close 's' values in absolute terms; 
we need to do some more work in this area).

The same principle applies when selecting 'a' across datasets that 
should be analyzed in a similar manner: use the same value or the same 
process. If you define the optimal 'a' as the one that fills spurious 
holes in core areas and minimizes cross-overs into areas where the 
animal wasn't seen (which are the principles we present in the paper), 
you'll probably find the same value of 'a' will do that pretty well 
across similar datasets. (This again presumes the data sampling is 
consistent; if sampling is denser in one dataset you'll obviously need 
larger values of 'a' for the same size hulls, because 'a' is a 
cumulative distance.)
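
If you go the same-process route, one option is the auto.a() method, 
which finds the 'a' value such that a given proportion of points have at 
least a minimum number of neighbors. A minimal sketch (the s, nnn, and 
ptp values below are illustrative only):

    ## Find nearest neighbors using an 'a' chosen so that 98% of
    ## points (ptp=0.98) have at least 15 neighbors (nnn=15)
    fredo.lxy <- lxy.nn.add(fredo.lxy, s=0.01, a=auto.a(nnn=15, ptp=0.98))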

Hope this helps. Let us know how it goes, because parameter selection is 
one of the most common challenges when trying to create comparable 
space-use models for different datasets (something all methods have to 
contend with).

>     My other questions are more technical:
>
>     2. I want to manually offset duplicate xy points in xyt.lxy. Is
>     this possible? I want to avoid the random offset when constructing
>     hulls, to make the analysis repeatable. Maybe the explanation is
>     somewhere in the help, but I couldn't find it...
>
> Since time is unique, I don't see how you can have overlapping points 
> unless they are true duplicates. Such duplicates must be removed.  So 
> I am not sure I understand your question.

Duplicate locations with identical time stamps are usually a data 
processing error (which is why the function that creates a LoCoH-xy 
object from a series of locations checks for that). Duplicate locations 
with different time stamps are usually not an error, but the result of 
the animal resting, returning to the same spot, or a rounding issue. But 
even when the time stamps are different, duplicate locations can still 
be an issue with T-LoCoH because you need unique locations to make a 
polygon.

The current version of T-LoCoH handles duplicate locations when creating 
hulls. There are two options, based on the value of the offset.dups 
parameter: ignoring them (at the risk of some points not having enough 
unique neighbors to draw a polygon), or randomly offsetting them by a 
fixed amount (the default). (The original LoCoH package had a third 
option, deleting them, which didn't seem like a good idea, so it was 
removed.) As an aside, we have discussed the possibility of adding 
another option in future versions of T-LoCoH, whereby different rules 
could be used to construct hulls around duplicate locations, for example 
constructing a circle with a fixed radius representing the size of the 
nest, water hole, etc. This would require some a priori knowledge of 
behavior when the animal is observed at the same location multiple 
times. If anyone has thoughts about this please let me know.

The current version of T-LoCoH has an option to add a random offset 
(fixed distance, random direction) to duplicate locations when 
constructing hulls. As Anna points out, this is a problem for 
reproducibility, because every time you construct hulls (and 
subsequently isopleths), the duplicate locations will be offset somewhat 
differently. An alternative approach is to randomly offset the duplicate 
locations before constructing hulls (i.e., in the LoCoH-xy object). This 
should be done in full recognition that you are effectively altering the 
input data (which may be fine for home range construction, but for other 
analyses you would probably want to use the original locations). There 
is no function in the T-LoCoH package to randomly offset duplicate 
locations in a LoCoH-xy object (I've added this to my to-do list), but 
it can be done with the following commands:

    ## These commands illustrate how to apply a random offset to
    ## duplicate locations in a LoCoH-xy object. Note that this
    ## can not be undone, and any subsequent analysis or construction
    ## of hulls will be based on the altered data.

    library(sp)
    library(tlocoh)

    ## Given a LoCoH-xy object called fredo.lxy

    ## Get the coordinates of the LoCoH-xy object
    xy <- coordinates(fredo.lxy$pts)

    ## Identify the duplicate rows
    dup_idx <- duplicated(xy)

    ## See how many locations are duplicates
    table(dup_idx)

    ## Define the amount that duplicate locations will be randomly
    ## offset. This is in map units.
    offset <- 1

    ## Apply a random offset to the duplicate rows
    theta <- runif(n=sum(dup_idx), min=0, max=2*pi)
    xy[dup_idx,1] <- xy[dup_idx,1] + offset * cos(theta)
    xy[dup_idx,2] <- xy[dup_idx,2] + offset * sin(theta)

    ## See if there are any more duplicate rows (should all be FALSE)
    table(duplicated(xy))

    ## Next, we create a new SpatialPointsDataFrame by
    ## i.  grabbing the attribute table of the existing locations
    ## ii. assigning the new locations (with offsets) as the coordinates

    pts_df <- fredo.lxy$pts@data
    coordinates(pts_df) <- xy

    ## Carry over the coordinate reference system (assuming one is
    ## defined on the original points)
    proj4string(pts_df) <- proj4string(fredo.lxy$pts)
    fredo.lxy$pts <- pts_df

    ## The nearest neighbor lookup table is no longer valid and will
    ## need to be recreated. Likewise with the ptsh (proportion of
    ## time-selected hulls vs. s) table. Set these to NULL.
    fredo.lxy$nn <- NULL
    fredo.lxy$ptsh <- NULL

    ## Lastly, we need to recreate the movement parameters.
    fredo.lxy <- lxy.repair(fredo.lxy)

    ## Should be done. Inspect the results
    summary(fredo.lxy)
    plot(fredo.lxy)

>     3. I'm resampling my data using lxy.thin.byfreq (the common
>     sampling interval should be 4 h; some individuals have 2 h, some
>     10 min frequencies). Now, I have some cases with time gaps of
>     about 1 month. I would still like to include these data. Is it
>     possible to split the data and include the two time periods
>     separately? Can this be done by setting a value for tct in the
>     auto.a method? I don't quite understand how tct works.
>
I don't fully understand this question. lxy.thin.byfreq() will 
selectively remove locations to get as close as it can to the desired 
sampling interval. If the desired sampling interval is 4 hours, and 
there is a gap of 30 days, it won't remove the points on either end of 
the gap. It will only remove points where the sampling interval is 
higher than the desired interval. If you're seeing a different effect 
with lxy.thin.byfreq() let me know. The tct argument in the auto.a() 
function is very different; it acts as a filter for identifying which 
points should be used in the computation of 'a' (it doesn't remove any 
points).
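
As a minimal sketch of the resampling step (assuming samp.freq is given 
in seconds; the 4-hour interval is just your example):

    ## Resample to a uniform 4-hour sampling interval
    ## (samp.freq is in seconds: 4 * 3600 = 14400)
    fredo.lxy <- lxy.thin.byfreq(fredo.lxy, samp.freq=3600*4, byfreq=TRUE)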

> Andy will have to explain how this works.
>
>     4. Again about resampling: As recommended in the help, I thin
>     bursts before resampling the data to a common time interval. I was
>     wondering if the following is correct: First I inspect the
>     sampling frequency plot with lxy.plot.freq. Then I thought: when
>     tau.diff.max (default) = 0.02 and tau (median) = 120 min, sampling
>     frequencies between 117.6 and 122.4 min should be fine. If I now
>     see points in the plot with, let's say, delta t/tau = 0.95, then
>     the sampling frequency = 0.95*120 = 108 min, which is outside the
>     range of tau.diff.max. In that case, should I set the threshold
>     value in lxy.thin.bursts to thresh=0.98, to make sure all
>     remaining points fall within the range 117.6 - 122.4? I think that
>     having a sampling interval of 108 min in a dataset that should
>     have 120 min is not uncommon, and normally I would not think it is
>     problematic. But I have only a very vague idea about the effects
>     of such data intervals when the algorithms start working. Is it
>     possible to provide any guidelines on thresholds for thinning
>     bursts?
>

I can see how that can be confusing. The tau.diff.max argument in the 
lxy.thin.bursts() function actually has nothing to do with how points 
within a 'burst' are removed (which in this context refers to a series 
of locations spaced closely in time, presumed to be an error or artifact 
of a hyperactive GPS recording device), nor with how the function 
identifies which group of points constitutes a burst. The tau.diff.max 
argument is used downstream, after points have been removed, in 
computing the movement parameters for the trajectory as a whole. The 
only argument by which lxy.thin.bursts() defines a 'burst' is the thresh 
argument. If the median sampling interval of the data (not the desired 
interval, but the actual interval) is 120 minutes, and thresh = 0.05 
(i.e., 6 minutes), then any pair of points sampled within 6 minutes of 
each other or less is considered to be part of a burst, and will be 
thinned down to a single point.
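
For example, a minimal sketch matching the 120-minute case above (the 
thresh value is illustrative):

    ## With a median sampling interval of 120 minutes, thresh=0.05
    ## treats any points within 0.05 * 120 = 6 minutes of each other
    ## as a burst, and thins each burst down to a single point
    fredo.lxy <- lxy.thin.bursts(fredo.lxy, thresh=0.05)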

Note that if your ultimate goal is to resample the data to achieve a 
uniform sampling interval, thinning out bursts may not be necessary; 
lxy.thin.byfreq() will have the same effect. The lxy.thin.byfreq() 
function basically lays down a timeline of desired sampling times, based 
on the desired sampling interval, and then grabs the closest point in 
time to each one. It's a little more complicated than that, but that's 
the gist of it.

I should also note for reference that the way the term 'burst' is used 
in T-LoCoH is quite different from how the term is used in the Move 
package and other movement analysis packages.


> Again, Andy will have to explain how this works.
>
>     5. And related to the question above: Should I check and thin
>     bursts again after resampling to a new time interval (with the
>     new range of tau values)?
>
>
After you resample to a uniform sampling interval, you shouldn't see any 
sampling intervals substantially smaller than that (i.e., a burst; see 
above), so there should be no need to thin bursts again.

>     6. Generally, it is a bit hard for me to choose parameters based
>     on visual interpretation (s, a, delta t/tau, etc.). So far I have
>     come to the conclusion that this is the best I can do. However, I
>     was wondering if there are any general arguments to support the
>     choices one makes based on visual interpretation. Do you have an
>     opinion on this? How could you argue (I'm thinking about future
>     referees...)?
>

Are you still speaking in terms of resampling? The graphs of sampling 
intervals, for example, are of course a guide. If you have a basis or 
theory for quantifying more rigorously the rules or criteria you want 
for resampling, that would certainly be another way to do it. You can 
extract pretty much any statistical summary you want from movement data.

> There are arguments that one can use to justify one choice over 
> another.  These are based on entropy concepts, but we have yet to 
> discuss or implement these methods.  So I cannot be more specific at 
> this time.
>
>     I think that's it for the moment. I would really appreciate any
>     help or comments!
>
>
> Good luck and all the best
>
> wayne
>
>
>     All the best,
>
>     Anna
>
>     P.S.: I'm not sure if this helps, but I think I came across some
>     typos in the R help file. Just in case somebody is collecting them:
>
>     xyt.lxy: To disable the checking for duplicate time stamps, pass
>     dup.dt.check=TRUE.
>
>     lxy.thin.bursts {tlocoh}: To identify whether there are bursts in
>     a LoCoH-xy dataset, and the sampling frequency of those bursts
>     (i.e., the value ... TBC
>
>      THANKS FOR FINDING THE TYPOS. I'M SURE THERE ARE MORE, PLEASE
>     KEEP SENDING THEM TO ME.
>
>     *************************************************
>
>     Please consider the environment before printing this email.
>
>
> -- 
> __________________________________________________________
> ___________________________________________________________
>
> Professor Wayne M. Getz
> A. Starker Leopold Professor of Wildlife Ecology
> Department Environmental Science Policy & Management
> 130 Mulford Hall
> University of California at Berkeley
> CA 94720-3112, USA
>
> Campus Visitors: My office is in 5052 VLSB
>
> Fax:    (1-510) 666-2352
> Office:    (1-510) 642-8745
> Lab:  (1-510) 643-1227
> email: wgetz at berkeley.edu
> lab: http://www.CNR.Berkeley.EDU/~getz/
> ___________________________________________________________
> ___________________________________________________________
>
>
>
> _______________________________________________
> Tlocoh-info mailing list
> Tlocoh-info at lists.r-forge.r-project.org
> http://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/tlocoh-info
