[adegenet-forum] problems with na.replace and values for n.pca & n.da

D. Magdalena Sorger dm.sorger at gmail.com
Thu Oct 15 19:45:18 CEST 2015


Dear Thibaut,

I found the information on scaleGen() in a tutorial from July 29, 2015 on
the adegenet website:
http://adegenet.r-forge.r-project.org/files/tutorial-basics.pdf

Other than this function, I didn't find any instructions on how to treat
missing values in that tutorial. I also didn't find instructions for how to
use tab() to replace missing values.

If I just read my raw data in (missing allele values denoted as "NA" in
genepop file), it informs me that there is 0% missing data which is what
concerns me:

*> msts_m2<-read.genepop("BOR-Od_m2_301w.gen") *
*> summary(msts_m2)*

*// Number of individuals: 301*
*// Group sizes: 12 9 18 12 20 11 18 18 12 12 17 18 20 28 12 12 12 11 12 17*
*// Number of alleles per locus: 4 4 13 19 20 10 20 7*
*// Number of alleles per group: 16 17 18 18 26 18 23 21 18 19 22 17 17 31
19 16 21 17 18 16*
*// Percentage of missing data: 0 %*
*// Observed heterozygosity: 0.41 0.55 0.81 0.66 0.79 0.69 0.78 0.74*
*// Expected heterozygosity: 0.39 0.6 0.88 0.87 0.92 0.75 0.92 0.76*
*> *



When I call tab, it lists the missing data at e.g., locus loc21 as loc21.NA:

> msts_m2[loc="loc21"]@tab
    loc21.02 loc21.03 loc21.NA loc21.04 loc21.05 loc21.06 loc21.01
1          1        1        0        0        0        0        0
2          2        0        0        0        0        0        0
3          2        0        0        0        0        0        0
4          1        1        0        0        0        0        0
5          1        1        0        0        0        0        0



I also tried this, but the output remains the same as above:


*tab(msts_m2, missing="mean")*



Best,
Magdalena


On Thu, Oct 15, 2015 at 9:10 AM, Jombart, Thibaut <t.jombart at imperial.ac.uk>
wrote:

> Hi there,
>
> please check the current tutorials. scaleGen is no longer used, and has
> been replaced by tab. Then you should be able to feed the resulting object
> to dapc without problem.
>
> Cheers
> Thibaut
>
>
>
> ------------------------------
> *From:* D. Magdalena Sorger [dm.sorger at gmail.com]
> *Sent:* 15 October 2015 02:36
> *To:* Jombart, Thibaut
> *Cc:* adegenet-forum at lists.r-forge.r-project.org
> *Subject:* Re: [adegenet-forum] problems with na.replace and values for
> n.pca & n.da
>
> Dear all,
>
> I updated my adegenet package, however, I still have issues with
> confirming that it is reading my missing values in correctly.
>
> I now use this code to read in my genepop file:
> *msts_m2<-read.genepop("BOR-Od_m2_301w.gen") *
>
> I tried this line of code for replacing NA's with means:
> *msts_m2 <- scaleGen(msts_m2, NA.method="mean")*
>
> ....but this just seems to be appropriate for the regular PCA and my DAPC
> code (see below) won't work and give me this error: Error in (function
> (classes, fdef, mtable):   unable to find an inherited method for function
> ‘pop’ for signature ‘"matrix"’
>
> Therefore: What is the proper way to treat missing values (denoted as NA
> in my genepop file) for a DAPC - do they need to be replaced with means
> (which I assume) and if so what is the proper line of code to do that?
>
> If I don't run the scaleGen line of code and continue with the DAPC code
> (see below), I still get the same output I had questions about earlier (see
> below).
>
> Best,
> Magdalena
>
>
>
>
> On Mon, Oct 12, 2015 at 6:21 AM, Jombart, Thibaut <
> t.jombart at imperial.ac.uk> wrote:
>
>> Hi there,
>>
>> your post suggests you are using an outdated version of adegenet -
>> na.replace and some  arguments you are using have been removed from the
>> package since version 2.0.0. This means you have been consulting an
>> outdated version of the tutorials.. where did you find it? If there is
>> outdated doc around I need to get rid of it.
>>
>> Please update to the devel version (2.0.1) from github:
>> https://github.com/thibautjombart/adegenet
>>
>> Current tutorials should answer your first question. Missing data are
>> detected automatically if the right format is used. Let's wait to make sure
>> your data were imported fine for the others.
>>
>> Cheers
>> Thibaut
>>
>> ==============================
>> Dr Thibaut Jombart
>> MRC Centre for Outbreak Analysis and Modelling
>> Department of Infectious Disease Epidemiology
>> Imperial College - School of Public Health
>> Norfolk Place, London W2 1PG, UK
>> Tel. : 0044 (0)20 7594 3658
>> http://sites.google.com/site/thibautjombart/
>> http://sites.google.com/site/therepiproject/
>> http://adegenet.r-forge.r-project.org/
>> Twitter: @thibautjombart
>>
>>
>> ------------------------------
>> *From:* adegenet-forum-bounces at lists.r-forge.r-project.org [
>> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of D.
>> Magdalena Sorger [dm.sorger at gmail.com]
>> *Sent:* 09 October 2015 20:00
>> *To:* adegenet-forum at lists.r-forge.r-project.org
>> *Subject:* [adegenet-forum] problems with na.replace and values for
>> n.pca & n.da
>>
>> Dear all,
>>
>> I am trying to constuct a DAPC scatterplot with adegenet and have three
>> questions that after consulting online resources, tutorials, etc. still
>> haven't been answered definitively.
>>
>>
>> My data set consists of (diploid) microsatellite data (8 markers) for 20
>> ant colonies of 9-28 workers each (mean=15), 301 workers total. I have
>> missing data in 7 spots (i.e. individuals with missing data at one or more
>> loci). My first question is about reading in the genepop file:
>>
>> *1) What is the proper command for reading in my file in regards to
>> missing data? *
>>
>> I had replaced all missing data ("0000") with "NA" in the genepop file
>> and used the below code assuming that it would recognize my missing data as
>> NA (first line of code) and replace missing values with means (second line):
>>
>>
>> *msts_m2<-read.genepop("BOR-Od_m2_301w.gen",missing="NA")
>> na.replace(msts_m2,"mean", quiet=FALSE)*
>>
>> However, when I run this code, it informs me that it replaced 119 missing
>> values. This obviously seems too much as it should have only replaced 7.
>> I'm not sure why this isn't working, see output below
>>
>>
>> *OUTPUT:*
>> *> na.replace(msts_m2,"mean", quiet=FALSE) *
>>
>> * Replaced 119 missing values *
>>
>> *   #####################*
>> *   ### Genind object ### *
>> *   #####################*
>> *- genotypes of individuals - *
>>
>> *S4 class:  genind*
>> *@call: read.genepop(file = "BOR-Od_m2_301w.gen", missing = "NA")*
>>
>> *@tab:  301 x 93 matrix of genotypes*
>>
>> *@ind.names: vector of  301 individual names*
>> *@loc.names: vector of  8 locus names*
>> *@loc.nall: number of alleles per locus*
>> *@loc.fac: locus factor for the  93 columns of @tab*
>> *@all.names: list of  8 components yielding allele names for each locus*
>> *@ploidy:  2*
>> *@type:  codom*
>>
>> *Optionnal contents: *
>> *@pop:  factor giving the population of each individual*
>> *@pop.names:  factor giving the population of each individual*
>>
>> *@other: - empty -*
>>
>>
>>
>>
>>
>> My second and third questions relate to the selection of the # of PCs and
>> the # of discriminant functions to retain. It seems that each time I make
>> slight changes to these numbers, the output changes vastly and so I want to
>> make sure I input the proper numbers.
>>
>> First I use the find.clusters function, I designate 20 for n.pca here
>> since I have 20 colonies:
>>
>> *clusters<-find.clusters(msts_m2,max.n.pca=20)*
>>
>> When asked for the number of PCs to retain I select 30 which is the level
>> at which the points seem to level off. The BIC graph looks nothing like the
>> graph in the vignette (it does not level off but below a certain level
>> shows a downward zigzag pattern) so I choose 20 given that I have 20
>> colonies.
>>
>> *dapc_m2<-dapc(msts_m2,clusters$grp)*
>> [image: Inline image 1][image: Inline image 2]
>> Next, when asked for PCs to retain I again choose the level at which the
>> points start to level off (30) but when asked for discriminant functions to
>> retain, I'm at a loss. I have about 18 relatively constantly decreasing
>> bars. I usually choose the point between the first few ones and the rest
>> where there is some kind of bigger (sometimes arbitrary) break and use that
>> number (in this case: 4):
>>
>> *2) How to choose the appropriate number for PCs to retain and discr.
>> functions to retain?*
>>
>>
>> *[image: Inline image 3][image: Inline image 4] *
>>
>>
>>
>> *best.n.pca<-a.score(dapc_m2)*
>> *temp<-optim.a.score(dapc_m2)*
>>
>> After running the optim.a.score function I receive a graph that tells me
>> at the top "optimal number of PCs". This is the number I put into my last
>> line of code (below) for n.pca, and for n.da I choose the number I used
>> when prompted earlier:
>>
>> *3) Are these the correct numbers to use for this last line of code? *
>>
>> *dapc_m2<-dapc(msts_m2,n.pca=7,n.da=4)*
>>
>>
>>
>> --
>> Magdalena Sorger
>> ____________
>> Department of Applied Ecology
>> North Carolina State University
>> 127 David Clark Labs, Box 7617
>> 100 Eugene Brooks Ave.
>> Raleigh, NC 27695, USA
>> # 919-513-7464
>> dmsorger at ncsu.edu
>> *www.theantlife.com* <http://www.theantlife.com>
>>
>
>
>
> --
> Magdalena Sorger
> ____________
> Department of Applied Ecology
> North Carolina State University
> 127 David Clark Labs, Box 7617
> 100 Eugene Brooks Ave.
> Raleigh, NC 27695, USA
> # 919-513-7464
> dmsorger at ncsu.edu
> *www.theantlife.com* <http://www.theantlife.com>
>



-- 
Magdalena Sorger
____________
Department of Applied Ecology
North Carolina State University
127 David Clark Labs, Box 7617
100 Eugene Brooks Ave.
Raleigh, NC 27695, USA
# 919-513-7464
dmsorger at ncsu.edu
*www.theantlife.com* <http://www.theantlife.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151015/8dbfaf17/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 19507 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151015/8dbfaf17/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 15234 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151015/8dbfaf17/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 18215 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151015/8dbfaf17/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 19074 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151015/8dbfaf17/attachment-0007.png>


More information about the adegenet-forum mailing list