[adegenet-forum] DAPC

Jombart, Thibaut t.jombart at imperial.ac.uk
Wed Sep 23 16:11:10 CEST 2015


Hi there,

Sorry, I did not go into the details, but here are two potential mistakes I spotted:

#1
 'na.action' is not an argument in find.clusters
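
One possible workaround, as a rough sketch only: drop incomplete rows before calling find.clusters, instead of passing 'na.action'. The data frame 'toy' below is made up and merely stands in for the 12 predictor columns; the values of n.pca and n.clust are illustrative and are set only so that the call does not prompt interactively.

```r
# Hypothetical stand-in for the 12 predictor columns; not the real LDA.scores
set.seed(1)
toy <- as.data.frame(matrix(rnorm(60 * 12), nrow = 60))
toy[c(3, 17), 2] <- NA   # introduce two incomplete rows

# Drop incomplete rows up front rather than relying on an 'na.action' argument
x <- na.omit(toy)
nrow(x)                  # 58 rows remain

if (requireNamespace("adegenet", quietly = TRUE)) {
  # n.pca and n.clust are illustrative; omit them to choose interactively
  grp <- adegenet::find.clusters(x, max.n.clust = 12, n.pca = 5, n.clust = 3)
  print(table(grp$grp))  # cluster sizes
}
```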

#2

x<-Just.V4[2:13]

this looks like a vector to me, not a matrix/data.frame; I am not quite sure what you expect 'grp' to be then.
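
A quick way to settle what 'x' actually is would be to inspect it before clustering. The sketch below uses a made-up data frame ('toy' and 'just_v4' are hypothetical names) with a Family column followed by 12 numeric columns. Note that single-bracket indexing with one index selects columns on a data frame but elements on a vector, so the explicit [rows, columns] form is less ambiguous.

```r
# Hypothetical data: a Family column followed by 12 numeric columns
set.seed(2)
toy <- data.frame(Family = rep(c("V4", "G8"), each = 10),
                  matrix(rnorm(20 * 12), nrow = 20))
just_v4 <- toy[toy$Family == "V4", ]

x <- just_v4[, 2:13]        # explicit [rows, columns] form
is.data.frame(x)            # TRUE
dim(x)                      # 10 rows, 12 columns
all(sapply(x, is.numeric))  # TRUE only if every column is numeric
```

If the last check is FALSE, a non-numeric column is a plausible source of the "NAs introduced by coercion" warning.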

As for producing scatterplots with varying factors, see argument 'grp' in ?scatter.dapc
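
As a minimal sketch of that argument, on made-up data: 'fam' and 'site' are hypothetical factors (not columns of LDA.scores), and the n.pca / n.da values are illustrative. Whatever factor is passed as 'grp' needs one entry per row of 'x'.

```r
# Made-up data: 60 rows, 12 numeric columns, plus two hypothetical factors
set.seed(3)
x    <- as.data.frame(matrix(rnorm(60 * 12), nrow = 60))
fam  <- factor(rep(c("A", "B", "C"), each = 20))
site <- factor(rep(c("s1", "s2"), times = 30))
x[fam == "B", 1:3] <- x[fam == "B", 1:3] + 3   # shift groups apart
x[fam == "C", 4:6] <- x[fam == "C", 4:6] + 3

# dapc() stops with "Inconsistent length for grp" unless this holds
stopifnot(length(fam) == nrow(x), length(site) == nrow(x))

if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  dapc1 <- dapc(x, fam, n.pca = 5, n.da = 2)   # illustrative values
  scatter(dapc1)               # points grouped by the factor used in dapc()
  scatter(dapc1, grp = site)   # same coordinates, grouped by another factor
}
```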

Cheers
Thibaut
==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart


________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Kirsty Medcalf [kirsty.m.medcalf at gmail.com]
Sent: 23 September 2015 05:45
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] DAPC


Dear Forum

This is my first post, so thank you in advance for your patience. My multivariate data contain a grouping factor with two levels (V4 and G8) in the column Family (the response variable) and 12 accompanying predictor variables. The data frame is called LDA.scores and can be found at the bottom of my Stack Overflow question, linked below, which also shows my step-by-step attempts and figures.

http://stackoverflow.com/questions/32704902/discriminant-analysis-of-principal-components-and-how-to-graphically-show-the-di

I have been attempting to show graphically the distance of each data point to its multivariate centroid, using DAPC and the function 'scatter' in the 'adegenet' package in R. After splitting the data by the two levels into two separate data frames (code below), I tried to produce these scatterplots. I understand this package is intended for the analysis of genetic markers; however, I am also under the impression that all types of multivariate data can be analysed with it. I tried to manipulate the data, but to no avail.

Code used to produce figure
#Split the data frame into just V4 and G8

Just.V4<-LDA.scores[LDA.scores$Family=="V4",]
Just.G8 <-LDA.scores[LDA.scores$Family=="G8",]


#Attempt to produce a scatterplot for the categorical factor V4
library(adegenet)
x<-Just.V4[2:13]

#Find the clusters

grp<-find.clusters(x, max.n.clust=12, na.action="omit")

The next step is to perform the discriminant analysis of principal components

dapc1<-dapc(x, grp$grp)
scatter(dapc1)

I have tried many different combinations of code, and here are some of the error messages:

Error in dapc.data.frame(x, grp1$grp1) : Inconsistent length for grp
Warning in find.clusters.data.frame(as.data.frame(x), ...) :
NAs introduced by coercion
Error in if (n.pca >= N) warning("number of retained PCs of PCA is  greater than N") :
missing value where TRUE/FALSE needed

If anyone has a solution for producing two figures, one for each level of the factor, that show the distance of the data points (12 parameters measured) to their multivariate centroid, then thank you so much. I have followed lots of tutorials, searched online and read papers, and I still do not understand these error and warning messages.

Thank you if anyone can help.

Best wishes,
Kaikash
