[adegenet-forum] Fwd: Significance of allelic contribution to discriminant functions

Thu Oct 16 13:38:42 CEST 2014

Hello again Andrea,

Glad you found what you were looking for!

Incidentally, in case anyone else on the forum wants to visualise the
variable contributions to discriminant axes beyond the first, here is some
code that does so for a toy example (the last chunk is the relevant bit for
creating the loading plots):

library(adegenet)

# make a simulated dataset with 5 "groups"
simpop <- glSim(200, 1000, 40, k=5, sort.pop=TRUE)
snps <- as.matrix(simpop)
phen <- simpop@other$ancestral.pops

# for fun / as a check, quickly visualise the clusters
dapc1 <- dapc(snps, phen, n.pca=50, n.da=4)
scatter(dapc1)

# create an object called foo that contains the results of running snpzip
# on your dataset
foo <- snpzip(snps, phen, xval.plot=TRUE, loading.plot=TRUE,
              method="centroid")

# isolate the DAPC component of the snpzip results, calling it "dapc1"
dapc1 <- foo$DAPC
# run the following lines for every discriminant axis, i.e. from
# DA = 1 to DA = K-1, where K is the number of groups in your dataset
DA <- seq_len(dapc1$n.da)
par(ask=TRUE)  # wait for a keypress between successive plots
# generate a separate loading plot for each DA
for(i in DA){
  plot.title <- paste("Loading Plot for DA", i)
  # indices of the variables selected by snpzip for this DA
  maximus <- foo$FS[[i]][[2]]
  # place the threshold just below the smallest selected contribution,
  # so that every selected variable is labelled on the loading plot
  cutoff <- min(dapc1$var.contr[maximus, i]) - 0.000001
  loadingplot(dapc1$var.contr[, i], threshold=cutoff, main=plot.title)
}
par(ask=FALSE)  # restore the default plotting behaviour

Hope that helps!
And thanks for your input: I'll try to implement the above code within
snpzip to generate loading plots for all DAs automatically in the next
release of adegenet.

Cheers,
Caitlin.

On Mon, Oct 6, 2014 at 6:09 PM, Andrea Garavito <neagef at gmail.com> wrote:

> Hello again!
>
> I took a closer look at the object created by the snpzip tool, and I
> found the contributions for all the different axes.
> I hadn't noticed them before, as I was looking only at the plot obtained.
>
> Thanks anyway!
> Andrea
>
>
> 2014-10-06 12:30 GMT-03:00 Andrea Garavito <neagef at gmail.com>:
>
> Hello Caitlin,
>> I was taking a look at the adegenet forum and I found this previous
>> answer about a statistical threshold for marker contributions.
>>
>> Originally I was planning to retain, for each one of my discriminant
>> functions, around 0.3% of the markers with the highest contributions by
>> establishing a threshold of 3 sigma. I'm not sure whether these data are
>> normally distributed, but as I have almost 5000 markers I was assuming so.
>> Then I saw your post about the snpzip analysis and decided to give it a try.
>> I tested the function with all the methods available, and I think I'll
>> use the "median" method, as with the others I'm getting too many markers
>> retained (and only one with the "single" method).
>> I see that snpzip performs the analysis for the first discriminant
>> function, but is there a way to do it for the other discriminant
>> functions found with DAPC as well?
>>
>> Thanks for your answer
>> Andrea
>>
>>
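The 3-sigma rule Andrea describes can be sketched in base R as below. This is a minimal illustration under stated assumptions, not part of adegenet: `select_3sigma` is a hypothetical helper, and `contr` stands in for one column of `dapc1$var.contr`, simulated here with `rexp()` purely because real contributions are non-negative and right-skewed. Note that even under normality, the upper tail beyond mean + 3 SD holds only about 0.13% of values (0.27% is the two-sided figure), and with skewed contributions the proportion actually retained can differ considerably from either figure.

```r
# Hypothetical helper: keep markers whose contribution exceeds mean + n.sd * SD.
select_3sigma <- function(contr, n.sd = 3) {
  cutoff <- mean(contr) + n.sd * sd(contr)
  which(contr > cutoff)                 # indices of markers above the cutoff
}

# Simulated stand-in for one column of dapc1$var.contr (an assumption):
# 5000 skewed, non-negative values plus 10 strongly contributing markers.
set.seed(1)
contr <- rexp(5000)
contr[1:10] <- contr[1:10] + 20

sel <- select_3sigma(contr)
length(sel)                             # number of markers retained
length(sel) / length(contr)             # proportion retained
```

With real data, `contr` would be `dapc1$var.contr[, i]` for the discriminant function of interest.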
>> 2014-08-26 12:58 GMT-03:00 Caitlin Collins <caitiecollins at gmail.com>:
>>
>>> Yeah, it's new!
>>>
>>> I might as well note, in case you decide only to try a subset of the
>>> methods available:
>>> - Ward's method is most likely to select a very large number of
>>> variables to get the most complete picture
>>> - Single linkage hierarchical clustering will probably select the fewest
>>> - Centroid clustering will probably select a useful middle-ground.
>>>
>>> You can always check to see what proportion of the variance is contained
>>> in the subset of variables retained, or you could even try running a
>>> DAPC/PCA with just those variables to compare the discriminatory power of the
>>> entire set with that of the subset selected.
>>>
>>> Good luck.
>>>
>>> Cheers,
>>> Caitlin.
>>>
>>>
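One way to read Caitlin's suggestion of checking what proportion of the variance is contained in the retained subset is sketched below: the summed variance of the selected columns divided by the total variance of all columns (for centred, unscaled data this total equals the sum of all PCA eigenvalues). This is an assumption about what "proportion of the variance" means here; the helper `prop_var_retained`, the simulated 0/1/2 genotype matrix and the selected indices are hypothetical placeholders. In practice `X` would be your SNP matrix and `sel` the variables returned by snpzip.

```r
# Hypothetical helper: share of total (unscaled) variance carried by columns `sel`.
prop_var_retained <- function(X, sel) {
  v <- apply(X, 2, var)        # per-column (per-SNP) variance
  sum(v[sel]) / sum(v)
}

# Toy data (an assumption): 200 individuals x 1000 SNPs coded 0/1/2
set.seed(42)
X <- matrix(rbinom(200 * 1000, size = 2, prob = 0.3), nrow = 200)
sel <- c(5, 17, 250)           # placeholder for indices selected by snpzip

prop_var_retained(X, sel)
```

To compare discriminatory power as well, one could rerun adegenet's `dapc()` on `X[, sel]` with the same grouping and compare it with the DAPC on the full set of variables.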
>>> On Tue, Aug 26, 2014 at 4:31 PM, Charlie Waters <cwaters8 at uw.edu> wrote:
>>>
>>>> Thanks Caitlin! I've never come across the snpzip function so I'll give
>>>> those clustering methods a try.
>>>>
>>>> Thanks,
>>>> Charlie
>>>>
>>>>
>>>> On Tue, Aug 26, 2014 at 3:49 AM, Caitlin Collins <
>>>> caitiecollins at gmail.com> wrote:
>>>>
>>>>> Hi Charlie,
>>>>>
>>>>> Good question. Technically, there is no one "correct" statistical
>>>>> solution to your problem. But there *are* a number of ways of
>>>>> approaching the problem with more statistical rigour than simply using an
>>>>> arbitrary threshold as you have done.
>>>>>
>>>>> Have you taken a look at the snpzip function in the adegenet package?
>>>>> If not, just type "?snpzip" into R with the adegenet package loaded. With
>>>>> this function, you can apply one of seven different hierarchical clustering
>>>>> methods to the allelic contributions generated by dapc. Essentially, each
>>>>> hierarchical clustering method uses a unique approach to determine where
>>>>> the threshold should be drawn. I should note, however, that this
>>>>> descriptive approach will not have an associated p-value. You may want to
>>>>> try out a few different methods before deciding which variables you want
>>>>> to consider "most significant".
>>>>>
>>>>> I hope that helps!
>>>>>
>>>>> Best,
>>>>> Caitlin
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Charlie Waters
>>>> Box 355020
>>>> School of Aquatic and Fishery Sciences
>>>> University of Washington
>>>> Seattle, WA 98105
>>>>
>>>>
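The general idea of letting hierarchical clustering decide where the threshold falls can be illustrated with a minimal base-R sketch: cluster the one-dimensional contributions with `hclust()`, cut the tree into two groups, and keep the group with the larger mean contribution. This is an assumption about the general approach, not the snpzip source code; `select_by_hclust` is a hypothetical helper and the contributions are simulated. Base R's method names differ slightly from snpzip's (for example, `hclust()` offers `"ward.D"` and `"ward.D2"`), and its `"centroid"` and `"median"` methods are meant to be used with squared Euclidean distances.

```r
# Minimal sketch (not the snpzip source): split 1-D contributions into two
# clusters with hclust() and keep the cluster with the larger mean.
select_by_hclust <- function(contr, method = "complete") {
  d <- dist(contr)                          # pairwise distances between contributions
  if (method %in% c("centroid", "median")) {
    d <- d^2                                # these methods expect squared Euclidean distances
  }
  tree <- hclust(d, method = method)
  grp <- cutree(tree, k = 2)                # two groups: higher vs lower contributions
  grp_means <- tapply(contr, grp, mean)
  high <- as.integer(names(which.max(grp_means)))
  which(grp == high)                        # indices of the "high" group
}

# Toy example (an assumption): skewed, non-negative stand-ins for one
# column of dapc1$var.contr
set.seed(1)
contr <- rexp(1000)
for (m in c("single", "complete", "average", "centroid", "ward.D2")) {
  n_sel <- length(select_by_hclust(contr, method = m))
  cat(sprintf("%-9s %4d markers selected\n", m, n_sel))
}
```

On skewed toy data like this the linkage methods will often disagree on how many markers to keep, which is why trying a few methods, as suggested above, is worthwhile; the printed counts say nothing about real datasets.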
>>>
>>> _______________________________________________
>>> adegenet-forum mailing list
>>> adegenet-forum at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>>
>>
>>
>