<div dir="ltr">Yeah, it's new! <div><br>I might as well note, in case you decide to try only a subset of the methods available:<div>- Ward's method is the most likely to select a very large number of variables, which gives the most complete picture</div>

<div>- Single linkage hierarchical clustering will probably select the fewest</div><div>- Centroid clustering will probably select a useful middle ground.</div><div><br></div><div>You can always check what proportion of the variance is contained in the subset of variables retained, or you could even try running a DAPC/PCA with just those variables to compare the discriminatory power of the entire set with that of the selected subset. <br>
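<div>If it helps, the comparison could be sketched in base R roughly like this. It is a toy illustration rather than what snpzip does internally: the contributions are invented, <code>select_by_method</code> is a hypothetical helper, and I'm using each variable's share of the total contribution as a stand-in for the proportion of variance retained.</div>

```r
# Invented contributions for 12 variables, rescaled to sum to 1
# (stand-ins for DAPC variable contributions; not real data).
contrib <- c(30, 18, 12, 9, 7, 6, 5, 4, 3, 3, 2, 1)
contrib <- contrib / sum(contrib)

# Hypothetical helper: split the variables into two groups with the given
# hclust method and return the indices of the group holding the maximum.
select_by_method <- function(contrib, method) {
  d <- dist(contrib)
  # hclust's help page recommends squared distances for "centroid".
  if (method == "centroid") d <- d^2
  grp <- cutree(hclust(d, method = method), k = 2)
  which(grp == grp[which.max(contrib)])
}

# "ward.D" is the name used by R >= 3.1.0 for what was previously "ward".
methods <- c("ward.D", "single", "centroid")
summary_table <- data.frame(
  method = methods,
  n_selected = NA_integer_,
  prop_retained = NA_real_
)
for (i in seq_along(methods)) {
  sel <- select_by_method(contrib, methods[i])
  summary_table$n_selected[i] <- length(sel)
  summary_table$prop_retained[i] <- sum(contrib[sel]) / sum(contrib)
}
print(summary_table)
```

<div>With real data you would swap <code>contrib</code> for the contributions from your own dapc object, and the follow-up check would be to re-run the DAPC on only the selected columns.</div>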

<br>Good luck. <br><br>Cheers, <br>Caitlin. </div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Aug 26, 2014 at 4:31 PM, Charlie Waters <span dir="ltr">&lt;<a href="mailto:cwaters8@uw.edu" target="_blank">cwaters8@uw.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks Caitlin! I've never come across the snpzip function, so I'll give those clustering methods a try. <div>

<br></div><div>Thanks,</div><div>Charlie</div></div><div class="gmail_extra"><div><div class="h5"><br><br><div class="gmail_quote">

On Tue, Aug 26, 2014 at 3:49 AM, Caitlin Collins <span dir="ltr">&lt;<a href="mailto:caitiecollins@gmail.com" target="_blank">caitiecollins@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div dir="ltr">Hi Charlie, <br><br>Good question. Technically, there is no one "correct" statistical solution to your problem. But there <i>are</i> a number of ways of approaching the problem with more statistical rigour than simply using an arbitrary threshold as you have done. <br>


<br>Have you taken a look at the snpzip function in the adegenet package? If not, just type "?snpzip" into R with the adegenet package loaded. With this function, you can apply one of seven different hierarchical clustering methods to the allelic contributions generated by dapc. Essentially, each hierarchical clustering method uses a different approach to determine where the threshold should be drawn. I should note, however, that this is a descriptive approach and will not have an associated p-value. You may want to try out a few different methods before deciding which variables you want to consider "most significant". <div>
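<div>To make the idea concrete, here is a rough sketch in base R of how a hierarchical clustering can draw the threshold. It is only an illustration, not snpzip's actual code: the contribution values are made up, the name <code>contrib</code> is hypothetical, and I'm assuming the variables are simply split into two groups with the high-contribution group kept.</div>

```r
# Made-up contributions for nine variables (stand-ins for the allelic
# contributions a DAPC would give you; not real data).
contrib <- c(0.001, 0.002, 0.0015, 0.003, 0.0025, 0.004, 0.05, 0.045, 0.06)

# Build a tree from the pairwise distances between contributions,
# then cut it into two groups.
hc <- hclust(dist(contrib), method = "single")
grp <- cutree(hc, k = 2)

# Keep the group that contains the largest contribution.
selected <- which(grp == grp[which.max(contrib)])

# The implied threshold is the smallest contribution that was kept.
threshold <- min(contrib[selected])

print(selected)
print(threshold)
```

<div>Swapping the <code>method</code> argument of hclust (for example "complete" or "average") changes how the tree is built, and so where the cut can fall.</div>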


<br></div><div>I hope that helps! <br><br>Best, <span><font color="#888888"><br>Caitlin</font></span></div></div>

</blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br><div dir="ltr">Charlie Waters<div>Box 355020<br><div>School of Aquatic and Fishery Sciences</div><div>University of Washington</div>

<div>Seattle, WA 98105</div>

<div><br></div></div></div>

</font></span></div>

</blockquote></div><br></div>