From karl.fetter at gmail.com  Sat Oct 12 23:16:21 2013
From: karl.fetter at gmail.com (Karl Fetter)
Date: Sat, 12 Oct 2013 17:16:21 -0400
Subject: [adegenet-forum] Selecting K in DAPC
Message-ID: <CACZQHzw9pF9vFmaj7LSiWL11Em5DFqdzh9PTviQs+buK-teiYg@mail.gmail.com>

Hello,

I'm new to DAPC and to adegenet in general. I just went through the
DAPC vignette using my own data instead of the data provided. Unless I
missed something, it appears that DAPC doesn't actually select the
most likely value of K. The choice seems to be left to the user, and even
if the number of retained PCs is optimized with the a-score
(optim.a.score), the entire analysis still depends on the value of K you
select when you run find.clusters.

Am I missing something?

On a related note, I'm using several different methods to select K:
STRUCTURE, Structurama, Fst clusters, and my own hypothesis about the
number of groups. For DAPC, I chose K=9 because that's where the "elbow"
is in the BIC vs. K plot. All my other clustering methods suggest K=4 or
K=5. When I run DAPC with K=9 and make a scatter plot, there appear to be
3 clusters that are widely and obviously separated from each other.
Inferring K from this plot makes more sense to me than continuing with the
analyses outlined in the DAPC vignette. Would it be appropriate to infer K
from this plot, and then run a DAPC with K=3 that is subsequently
visualized with compoplot?
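
To make the question concrete, here is a rough sketch of the workflow
described above. It is not the exact set of commands that were run: the
nancycats data set shipped with adegenet stands in for the real data, and
n.pca = 50, max.n.clust = 20, and K = 3 are arbitrary illustrative
assumptions.

```r
library(adegenet)

# Stand-in data (assumption): replace with your own genind object.
data(nancycats)
x <- nancycats

# Step 1: run k-means for K = 1..20 without the interactive prompt and
# keep the BIC values so the "elbow" can be inspected by eye.
bic_run <- find.clusters(x, n.pca = 50, max.n.clust = 20,
                         choose.n.clust = FALSE)
plot(bic_run$Kstat, type = "b", xlab = "K", ylab = "BIC")

# Step 2: fix K by hand (here K = 3, the value suggested by the scatter
# plot in the text) and get the group assignments.
K <- 3
grp_k <- find.clusters(x, n.pca = 50, n.clust = K)

# Step 3: run DAPC on those groups. K itself is an input at this point;
# DAPC does not re-estimate it.
dapc_k <- dapc(x, grp_k$grp, n.pca = 50, n.da = K - 1)

# Step 4: a-score optimization only tunes the number of retained PCs.
ascore <- optim.a.score(dapc_k)

# Step 5: visualize the clusters and the membership probabilities.
scatter(dapc_k)
compoplot(dapc_k)
```

The point of the sketch is that K enters at step 2 as a user choice, and
every later step is conditional on it.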

I don't think I fully understand the rationale of DAPC. Is it a method for
selecting K when you do not have, or prefer not to use, any a priori
information about your groups? Or is it appropriate only if you are
willing to use a priori information?

Thanks for your ideas and help,

Karl Fetter