[adegenet-forum] Fwd: Significance of allelic contribution to discriminant functions

Andrea Garavito neagef at gmail.com
Mon Oct 6 17:30:14 CEST 2014


Hello Caitlin,
I was looking through the adegenet forum and found this earlier answer
about a statistical threshold for marker contributions.

Originally I was planning to retain, for each of my discriminant
functions, roughly the 0.3% of markers with the highest contributions by
setting a 3-sigma threshold. I'm not sure whether these data are normally
distributed, but as I have almost 5000 markers I was assuming so.
Then I saw your post about the snpzip analysis and decided to give it a try.
I tested the function with all the available methods, and I think I'll use
the "median" method, since the others retain too many markers
(and the "single" method retains only one).
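
To make the two thresholding ideas above concrete, here is a minimal
sketch in Python (used only for illustration; it does not call adegenet).
It assumes the contributions for one discriminant function are available
as a plain list of non-negative numbers. The simulated values, the function
names and the 0.003 fraction are all hypothetical. It shows that a
"mean + 3 sd" cut-off and a fixed "top 0.3%" rule need not select the same
number of markers when the values are skewed.

```python
import random
import statistics


def three_sigma_selection(contributions):
    """Indices of markers whose contribution exceeds mean + 3 sd."""
    mean = statistics.fmean(contributions)
    sd = statistics.stdev(contributions)
    threshold = mean + 3 * sd
    selected = [i for i, c in enumerate(contributions) if c > threshold]
    return threshold, selected


def top_fraction_selection(contributions, fraction=0.003):
    """Indices of the top `fraction` of markers, ranked by contribution."""
    n_keep = max(1, round(len(contributions) * fraction))
    ranked = sorted(range(len(contributions)),
                    key=lambda i: contributions[i], reverse=True)
    return ranked[:n_keep]


# Hypothetical data: 5000 skewed, non-negative values rescaled to sum to 1.
random.seed(1)
raw = [random.gauss(0, 1) ** 2 for _ in range(5000)]
total = sum(raw)
contributions = [r / total for r in raw]

threshold, by_sigma = three_sigma_selection(contributions)
by_rank = top_fraction_selection(contributions)
print("3-sigma rule keeps", len(by_sigma), "markers")
print("top 0.3% rule keeps", len(by_rank), "markers")
```

Comparing the two counts on real contributions is a quick way to see
whether the normality assumption behind the 3-sigma rule is reasonable.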
I see that snpzip runs the analysis only on the first discriminant
function. Is there a way to run it on the other discriminant functions
found with DAPC as well?
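
One way to picture the idea of repeating the selection for every
discriminant function is the sketch below. It is not snpzip's code and
makes several assumptions: the contributions are laid out as a matrix with
one row per marker and one column per discriminant function (as far as I
know, this is how the var.contr element of a dapc object is organised), and
the threshold is a simple largest-gap split, which is what single-linkage
clustering cut into two groups reduces to for one-dimensional data. The
function names and the toy matrix are hypothetical.

```python
def largest_gap_split(values):
    """Indices of the values lying above the largest gap in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]]
            for k in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=lambda k: gaps[k])
    return sorted(order[cut + 1:])


def select_per_function(contrib_matrix):
    """Apply the same split independently to each column (function)."""
    n_functions = len(contrib_matrix[0])
    return {axis: largest_gap_split([row[axis] for row in contrib_matrix])
            for axis in range(n_functions)}


# Hypothetical toy matrix: 5 markers (rows) x 2 discriminant functions.
toy = [
    [0.02, 0.40],
    [0.03, 0.05],
    [0.55, 0.04],
    [0.38, 0.06],
    [0.02, 0.45],
]

selected = select_per_function(toy)

# Share of each function's total contribution held by the retained markers.
share = {axis: sum(toy[i][axis] for i in idx)
         for axis, idx in selected.items()}
print(selected)
print(share)
```

The share computed at the end corresponds to the check suggested in the
quoted reply below: how much of the total contribution the retained subset
of markers accounts for on each function.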

Thanks for your answer
Andrea


2014-08-26 12:58 GMT-03:00 Caitlin Collins <caitiecollins at gmail.com>:

> Yeah, it's new!
>
> I might as well note, in case you decide only to try a subset of the
> methods available:
> - Ward's method is most likely to select a very large number of variables
> to get the most complete picture
> - Single linkage hierarchical clustering will probably select the fewest
> - Centroid clustering will probably select a useful middle-ground.
>
> You can always check to see what proportion of the variance is contained
> in the subset of variables retained, or you could even try running a DAPC/
> PCA with just those variables to compare the discriminatory power of the
> entire set with that of the subset selected.
>
> Good luck.
>
> Cheers,
> Caitlin.
>
>
> On Tue, Aug 26, 2014 at 4:31 PM, Charlie Waters <cwaters8 at uw.edu> wrote:
>
>> Thanks Caitlin! I've never come across the snpzip function so I'll give
>> those clustering methods a try.
>>
>> Thanks,
>> Charlie
>>
>>
>> On Tue, Aug 26, 2014 at 3:49 AM, Caitlin Collins <caitiecollins at gmail.com> wrote:
>>
>>> Hi Charlie,
>>>
>>> Good question. Technically, there is no one "correct" statistical
>>> solution to your problem. But, there *are* a number of ways of
>>> approaching the problem with more statistical rigour than simply using an
>>> arbitrary threshold as you have done.
>>>
>>> Have you taken a look at the snpzip function in the adegenet package? If
>>> not, just type "?snpzip" into R with the adegenet package loaded. With this
>>> function, you can apply one of seven different hierarchical clustering
>>> formulas to the allelic contributions generated by dapc. Essentially, each
>>> hierarchical clustering method uses a unique approach to determine where
>>> the threshold should be drawn. I should note, however, that this
>>> descriptive approach will not have an associated p-value. You may want to
>>> try out a few different methods before deciding which variables you want
>>> to consider "most significant".
>>>
>>> I hope that helps!
>>>
>>> Best,
>>> Caitlin
>>>
>>
>>
>>
>> --
>> Charlie Waters
>> Box 355020
>> School of Aquatic and Fishery Sciences
>> University of Washington
>> Seattle, WA 98105
>>
>>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>