[adegenet-forum] problems with na.replace and values for n.pca & n.da

D. Magdalena Sorger dm.sorger at gmail.com
Thu Oct 15 03:36:21 CEST 2015


Dear all,

I updated my adegenet package; however, I still have trouble confirming
that it is reading in my missing values correctly.

I now use this code to read in my genepop file:
    msts_m2 <- read.genepop("BOR-Od_m2_301w.gen")
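
A minimal sketch of one way to check how missing data were read in, assuming adegenet >= 2.0. The bundled nancycats data set stands in for the file above, and the names na_cells and na_genotypes are illustrative only. The sketch counts NA cells in the allele table and, separately, missing genotypes per locus. A single missing genotype puts NA in every allele column of that locus, so the cell count can be much larger than the number of missing genotypes.

```r
library(adegenet)

# Stand-in data (assumption): nancycats ships with adegenet and contains
# missing genotypes. With real data, x would be the read.genepop() result.
data(nancycats)
x <- nancycats

# Number of NA cells in the individuals x alleles table
na_cells <- sum(is.na(tab(x)))

# Number of missing genotypes per locus: an (individual, locus) pair
# counts once if any allele column of that locus is NA for that individual
na_genotypes <- sapply(seploc(x), function(loc) {
  sum(rowSums(is.na(tab(loc))) > 0)
})

na_cells            # NA cells across all allele columns
na_genotypes        # missing genotypes, locus by locus
sum(na_genotypes)   # total missing genotypes
```

Comparing sum(na_genotypes) with the number of missing entries in the input file is probably the more direct check; na_cells scales with the number of alleles at each affected locus.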

I tried this line of code for replacing NA's with means:
    msts_m2 <- scaleGen(msts_m2, NA.method="mean")

...but this seems to be appropriate only for the regular PCA; my DAPC
code (see below) won't work and gives me this error:

    Error in (function (classes, fdef, mtable):
      unable to find an inherited method for function ‘pop’ for signature ‘"matrix"’

Therefore: What is the proper way to treat missing values (denoted as NA in
my genepop file) for a DAPC - do they need to be replaced with means (which
I assume) and if so what is the proper line of code to do that?
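
A hedged sketch of how this could look with adegenet >= 2.0, again using the bundled nancycats data as a stand-in; the n.pca and n.da values are arbitrary placeholders, not recommendations. scaleGen() returns a plain matrix, so the sketch stores its result under a new name (X) instead of overwriting the genind object. dapc() is then given the genind object itself; in adegenet 2.x missing values appear to be replaced by allele means internally in that case, which is worth confirming in ?dapc for the installed version.

```r
library(adegenet)

# Stand-in data (assumption): a genind object with missing genotypes
data(nancycats)
x <- nancycats

# Mean-imputed, centred and scaled allele matrix for an ordinary PCA.
# Stored under a new name so that x stays a genind object.
X <- scaleGen(x, NA.method = "mean")

is.genind(x)   # TRUE: pop(x) still works
is.matrix(X)   # TRUE: pop(X) has no method for class "matrix"

# DAPC on the genind object, using the predefined populations as groups.
# n.pca and n.da are placeholder values for illustration only.
dapc_x <- dapc(x, pop(x), n.pca = 20, n.da = 4)
```

Keeping the genind object and the imputed matrix under separate names is the point here: later calls that need population information then still receive a genind object.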

If I don't run the scaleGen line of code and continue with the DAPC code
(see below), I still get the same output I had questions about earlier (see
below).

Best,
Magdalena




On Mon, Oct 12, 2015 at 6:21 AM, Jombart, Thibaut <t.jombart at imperial.ac.uk>
wrote:

> Hi there,
>
> your post suggests you are using an outdated version of adegenet -
> na.replace and some of the arguments you are using have been removed from
> the package since version 2.0.0. This means you have been consulting an
> outdated version of the tutorials. Where did you find it? If there is
> outdated doc around I need to get rid of it.
>
> Please update to the devel version (2.0.1) from github:
> https://github.com/thibautjombart/adegenet
>
> Current tutorials should answer your first question. Missing data are
> detected automatically if the right format is used. For the other
> questions, let's first make sure your data were imported correctly.
>
> Cheers
> Thibaut
>
> ==============================
> Dr Thibaut Jombart
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> Norfolk Place, London W2 1PG, UK
> Tel. : 0044 (0)20 7594 3658
> http://sites.google.com/site/thibautjombart/
> http://sites.google.com/site/therepiproject/
> http://adegenet.r-forge.r-project.org/
> Twitter: @thibautjombart
>
>
> ------------------------------
> *From:* adegenet-forum-bounces at lists.r-forge.r-project.org [
> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of D.
> Magdalena Sorger [dm.sorger at gmail.com]
> *Sent:* 09 October 2015 20:00
> *To:* adegenet-forum at lists.r-forge.r-project.org
> *Subject:* [adegenet-forum] problems with na.replace and values for n.pca
> & n.da
>
> Dear all,
>
> I am trying to construct a DAPC scatterplot with adegenet and have three
> questions that after consulting online resources, tutorials, etc. still
> haven't been answered definitively.
>
>
> My data set consists of (diploid) microsatellite data (8 markers) for 20
> ant colonies of 9-28 workers each (mean=15), 301 workers total. I have
> missing data in 7 spots (i.e. individuals with missing data at one or more
> loci). My first question is about reading in the genepop file:
>
> *1) What is the proper command for reading in my file with regard to
> missing data?*
>
> I had replaced all missing data ("0000") with "NA" in the genepop file and
> used the below code assuming that it would recognize my missing data as NA
> (first line of code) and replace missing values with means (second line):
>
>
>     msts_m2 <- read.genepop("BOR-Od_m2_301w.gen", missing="NA")
>     na.replace(msts_m2, "mean", quiet=FALSE)
>
> However, when I run this code, it informs me that it replaced 119 missing
> values. This seems like far too many, as it should have replaced only 7.
> I'm not sure why this isn't working; see the output below.
>
>
> OUTPUT:
>
>     > na.replace(msts_m2,"mean", quiet=FALSE)
>
>      Replaced 119 missing values
>
>        #####################
>        ### Genind object ###
>        #####################
>     - genotypes of individuals -
>
>     S4 class:  genind
>     @call: read.genepop(file = "BOR-Od_m2_301w.gen", missing = "NA")
>
>     @tab:  301 x 93 matrix of genotypes
>
>     @ind.names: vector of  301 individual names
>     @loc.names: vector of  8 locus names
>     @loc.nall: number of alleles per locus
>     @loc.fac: locus factor for the  93 columns of @tab
>     @all.names: list of  8 components yielding allele names for each locus
>     @ploidy:  2
>     @type:  codom
>
>     Optionnal contents:
>     @pop:  factor giving the population of each individual
>     @pop.names:  factor giving the population of each individual
>
>     @other: - empty -
>
>
>
>
>
> My second and third questions relate to the selection of the # of PCs and
> the # of discriminant functions to retain. It seems that each time I make
> slight changes to these numbers, the output changes vastly and so I want to
> make sure I input the proper numbers.
>
> First I use the find.clusters function, I designate 20 for n.pca here
> since I have 20 colonies:
>
>     clusters <- find.clusters(msts_m2, max.n.pca=20)
>
> When asked for the number of PCs to retain I select 30 which is the level
> at which the points seem to level off. The BIC graph looks nothing like the
> graph in the vignette (it does not level off but below a certain level
> shows a downward zigzag pattern) so I choose 20 given that I have 20
> colonies.
>
>     dapc_m2 <- dapc(msts_m2, clusters$grp)
> [image: Inline image 1][image: Inline image 2]
> Next, when asked for PCs to retain I again choose the level at which the
> points start to level off (30) but when asked for discriminant functions to
> retain, I'm at a loss. I have about 18 relatively constantly decreasing
> bars. I usually choose the point between the first few ones and the rest
> where there is some kind of bigger (sometimes arbitrary) break and use that
> number (in this case: 4):
>
> *2) How to choose the appropriate number for PCs to retain and discr.
> functions to retain?*
>
>
> [image: Inline image 3][image: Inline image 4]
>
>
>
>     best.n.pca <- a.score(dapc_m2)
>     temp <- optim.a.score(dapc_m2)
>
> After running the optim.a.score function I receive a graph that tells me
> at the top "optimal number of PCs". This is the number I put into my last
> line of code (below) for n.pca, and for n.da I choose the number I used
> when prompted earlier:
>
> *3) Are these the correct numbers to use for this last line of code? *
>
>     dapc_m2 <- dapc(msts_m2, n.pca=7, n.da=4)
>
>
>
> --
> Magdalena Sorger
> ____________
> Department of Applied Ecology
> North Carolina State University
> 127 David Clark Labs, Box 7617
> 100 Eugene Brooks Ave.
> Raleigh, NC 27695, USA
> # 919-513-7464
> dmsorger at ncsu.edu
> *www.theantlife.com* <http://www.theantlife.com>
>


