[adegenet-forum] Difference in assignment between versions of adegenet (DAPC)

Thibaut Jombart thibautjombart at gmail.com
Mon Jan 2 16:59:59 CET 2017


Dear Ole and Team,

Sorry, this message was stuck in moderation limbo for a while. It is best
to subscribe to the list to skip this step.

There have been many changes since the older versions, but the main one I
can think of in this case is a bug fix in predict.dapc affecting
supplementary individuals. In some versions prior to 2.0, the new data
were not scaled before being projected onto the discriminant functions,
which resulted in some erroneous predictions. In practice, supplementary
individuals seem to have been assigned too strongly to a given group,
though not badly wrongly (in the microbov example, the predictions still
made sense). That said, I'm not aware of any systematic study of the issue.
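As a rough illustration of why the missing scaling step matters, here is a minimal base-R sketch. It uses `prcomp` on simulated allele counts, not adegenet's own code, and all data are hypothetical. It only shows the centring part: supplementary individuals projected without subtracting the reference means are all shifted by the same offset along each axis learned from the reference set (here plain PCA axes, as in the first step of DAPC).

```r
## Minimal sketch in base R (not adegenet's implementation): supplementary
## individuals must be centred on the *reference* means before projection.
set.seed(1)

## Hypothetical allele counts (0/1/2) at 10 biallelic loci for
## 20 reference and 5 supplementary individuals.
ref <- matrix(rbinom(20 * 10, size = 2, prob = 0.3), nrow = 20)
sup <- matrix(rbinom(5 * 10, size = 2, prob = 0.3), nrow = 5)
colnames(ref) <- colnames(sup) <- paste0("loc", 1:10)

## PCA fitted on the reference set only (centred, not scaled).
pca <- prcomp(ref, center = TRUE, scale. = FALSE)

## Correct projection: predict() subtracts pca$center, then rotates.
good <- predict(pca, newdata = sup)
manual <- sweep(sup, 2, pca$center, "-") %*% pca$rotation

## Faulty projection: raw counts rotated without centring.
bad <- sup %*% pca$rotation

## The error is the same vector (pca$center %*% pca$rotation) for every row.
offset <- bad - good
print(round(offset[, 1:2], 3))
```

Because the offset is identical for every supplementary individual, it moves all of them together in the space of the reference axes rather than adding random noise.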

At any rate, trust the current version.

Best
Thibaut


--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR <http://twitter.com/TeebzR>
+44(0)20 7594 3658

On 14 December 2016 at 14:05, Ole Henriksen <ohen at aqua.dtu.dk> wrote:

> Hi Thibaut and Co,
>
> We're a team that has used the DAPC assignment method in *adegenet*
> (versions *1.4-1* and *1.4-2*) for some earlier studies. We are now
> encountering problems with the assignment method: the new version,
> *adegenet 2.0.1*, assigns "old individuals" that we used in earlier
> studies differently than earlier versions of the package did.
>
>
>
> We use SNP data, and our .gen files look like the example below, with
> alleles coded by three digits:
>
> ______________________________________________
> GenePop file, with 5 samples & 96 loci
> cgpGmo-S1017
> cgpGmo-S1018a
> cgpGmo-S1026
> cgpGmo-S1070
> cgpGmo-S1095
> cgpGmo-S1103
> POP
> DAB08_01 , 001001 002002 001002 002001 001001 001001
> DAB08_02 , 001001 002002 001001 002002 001001 002002
> DAB08_03 , 001001 002002 001001 002002 001001 002001
> POP
> INC02_01 , 001001 002002 001002 002002 001001 002001
> INC02_02 , 001001 002002 002002 002002 001001 002002
> INC02_03 , 001001 002002 001002 002002 001001 002001
> __________________________________________
>
> We have two issues.
>
> 1) Last year we assigned individuals using *adegenet 1.4-1*. We suspected
> that the problem must be related to how the files are read, and we wanted
> to check and compare with the older versions (*1.4-1* and *1.4-2*). We
> tried installing the older versions with *install_version()* to compare
> versions *1.4-1*, *1.4-2* and *2.0.1*, but we keep getting the following
> error message when using the older versions:
>
> ___________________________________________
> Converting data from a Genepop .gen file to a genind object...
>
> File description:  GenePop file, with 5 samples & 96 loci
>
> Error in while (keepCheck) { : missing value where TRUE/FALSE needed
> ____________________________________________________________
>
> We do not understand why we get this error message, since we are using
> exactly the same files as always. Any ideas?
>
> 2) When we use the newest version, we get different assignment results
> compared to earlier versions of the package.
>
> I have my previous assignment results for individuals assigned with
> *1.4-1* and *1.4-2*. I reassigned these individuals with the new version
> (*2.0.1*) and compared the assignments between package versions: they
> differ, even though we retain the same number of PCs, use the same
> reference file, and use the same script (with some minor changes to the
> file-reading step to accommodate the new version). Any idea why this is
> the case? Have there been any changes in how loci and alleles are read
> from version to version?
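To see exactly which individuals change between two runs, a small comparison by individual name can help. This is a minimal sketch: `compare_assignments` is a hypothetical helper, not an adegenet function, and it assumes each run's assignments were saved as a vector named by individual label (a DAPC `assign` factor would first need its names set to the individual labels). The toy values below are made up.

```r
## Hypothetical helper (not part of adegenet): list individuals whose
## assigned group differs between two runs, matched by individual name.
compare_assignments <- function(old, new) {
  ids <- intersect(names(old), names(new))
  if (length(ids) < max(length(old), length(new))) {
    warning("some individuals are present in only one run and are ignored")
  }
  old <- as.character(old[ids])
  new <- as.character(new[ids])
  changed <- old != new
  data.frame(id = ids[changed], old = old[changed], new = new[changed],
             stringsAsFactors = FALSE)
}

## Toy example with made-up assignments for three individuals.
old_run <- c(A1 = "TAS10_30", A2 = "UMM45_39", A3 = "ISC02_39")
new_run <- c(A1 = "TAS10_30", A2 = "ISC02_39", A3 = "ISC02_39")
compare_assignments(old_run, new_run)
```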
>
>
>
> I have also noticed that, with *adegenet 2.0.1*, the assignment depends
> on which individuals I include in the .gen file to be assigned. When I
> assign all individuals from all years in one file, I get a different
> result than when I assign separate files, one per year.
>
> Could it be that the positioning of alleles at each locus has changed? We
> are not sure what is going wrong, but we suspect it has something to do
> with how our files are read.
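One way to rule out a mismatch in allele columns between the reference data and the data to be assigned (for instance, alleles listed in a different order, or an allele with no column because nobody in a given file carries it) is to align the new allele-count matrix to the reference columns by name. This is a minimal sketch: `align_alleles` is a hypothetical helper, not an adegenet function, and it assumes there are no missing genotypes. In adegenet 2.x, such matrices can be extracted from a genind object with `tab()`.

```r
## Hypothetical helper (not part of adegenet): reorder the allele columns
## of a new allele-count matrix to match the reference matrix by name.
## Assumes no missing genotypes; reference alleles absent from the new
## file are filled with zero counts, alleles unknown to the reference are
## dropped with a warning.
align_alleles <- function(ref_tab, new_tab) {
  ref_cols <- colnames(ref_tab)
  extra <- setdiff(colnames(new_tab), ref_cols)
  if (length(extra) > 0) {
    warning("alleles not in the reference were dropped: ",
            paste(extra, collapse = ", "))
  }
  out <- matrix(0, nrow = nrow(new_tab), ncol = length(ref_cols),
                dimnames = list(rownames(new_tab), ref_cols))
  shared <- intersect(ref_cols, colnames(new_tab))
  out[, shared] <- new_tab[, shared, drop = FALSE]
  out
}

## Toy example: the new file lists the alleles of locus L1 in a different
## order and has no column for allele L2.002.
ref <- matrix(c(2, 0, 1, 1,
                0, 2, 2, 0), nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("L1.001", "L1.002", "L2.001", "L2.002")))
new <- matrix(c(1, 1, 2,
                0, 2, 2), nrow = 2, byrow = TRUE,
              dimnames = list(c("a", "b"), c("L1.002", "L1.001", "L2.001")))
align_alleles(ref, new)
```

Since all three genind objects shown below report 192 alleles with 2 alleles per locus, a difference in the number of columns seems unlikely here; a difference in column order would still be worth ruling out.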
>
>
>
> Below is some R history, which will hopefully be helpful.
>
> R script:
>
> ______________________________________________
> library(adegenet)
>
> # Read files
> Ref <- read.genepop("Ref.gen", ncode = 3)
> Assign <- read.genepop("TBA_All.gen", ncode = 3)
>
> # DAPC on the reference individuals
> DAPC_Ref <- dapc(Ref, pop(Ref), n.pca = 100, n.da = 3)
>
> # Assignment of supplementary individuals
> Predict <- predict(DAPC_Ref, newdata = Assign)
> Predict$assign
>
>
> Genind objects after *read.genepop()*:
>
> ___________________________________
> >Reference
> /// GENIND OBJECT /////////
>
>  // 487 individuals; 96 loci; 192 alleles; size: 451.5 Kb
>
>  // Basic content
>    @tab:  487 x 192 matrix of allele counts
>    @loc.n.all: number of alleles per locus (range: 2-2)
>    @loc.fac: locus factor for the 192 columns of @tab
>    @all.names: list of allele names for each locus
>    @ploidy: ploidy of each individual  (range: 2-2)
>    @type:  codom
>    @call: read.genepop(file = "Ref.gen", ncode = 3)
>
>  // Optional content
>    @pop: population of each individual (group size range: 62-215)
>
> >AssignAll #All individuals for all years
> /// GENIND OBJECT /////////
>
>  // 1,357 individuals; 96 loci; 192 alleles; size: 1.1 Mb
>
>  // Basic content
>    @tab:  1357 x 192 matrix of allele counts
>    @loc.n.all: number of alleles per locus (range: 2-2)
>    @loc.fac: locus factor for the 192 columns of @tab
>    @all.names: list of allele names for each locus
>    @ploidy: ploidy of each individual  (range: 2-2)
>    @type:  codom
>    @call: read.genepop(file = "TBA_All.gen", ncode = 3)
>
>  // Optional content
>    @pop: population of each individual (group size range: 1357-1357)
>
> > Assign2015 #individuals for year 2015 only
> /// GENIND OBJECT /////////
>
>  // 469 individuals; 96 loci; 192 alleles; size: 434.2 Kb
>
>  // Basic content
>    @tab:  469 x 192 matrix of allele counts
>    @loc.n.all: number of alleles per locus (range: 2-2)
>    @loc.fac: locus factor for the 192 columns of @tab
>    @all.names: list of allele names for each locus
>    @ploidy: ploidy of each individual  (range: 2-2)
>    @type:  codom
>    @call: read.genepop(file = "TBA_Fisk2015.gen", ncode = 3)
>
>  // Optional content
>    @pop: population of each individual (group size range: 469-469)
>
> Assignment results after *predict.dapc()*, showing that the assignment
> depends on which individuals are included in the input (.gen) file:
>
> _______________________________________________________
> > Predict$assign #All individuals for all years
>    [1] TAS10_30 TAS10_30 TAS10_30 TAS10_30 UMM45_39 UMM45_39
>    [7] UMM45_39 UMM45_39 TAS10_30 TAS10_30 UMM45_39 UMM45_39
>   [13] ISC02_39 UMM45_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39
>   [19] UMM45_39 QOR08_30 ISC02_39 UMM45_39 TAS10_30 UMM45_39
>   [25] QOR08_30 QOR08_30 UMM45_39 QOR08_30 QOR08_30 UMM45_39
>   [31] UMM45_39 UMM45_39 QOR08_30 UMM45_39 UMM45_39 ISC02_39
>   [37] ISC02_39 UMM45_39 UMM45_39 QOR08_30 UMM45_39 QOR08_30
>   [43] UMM45_39 UMM45_39 UMM45_39 UMM45_39 QOR08_30 UMM45_39
>                              etc.
>
> > Predict$assign #individuals for year 2015 only
>   [1] TAS10_30 TAS10_30 TAS10_30 TAS10_30 TAS10_30 TAS10_30
>   [7] TAS10_30 TAS10_30 TAS10_30 TAS10_30 UMM45_39 UMM45_39
>  [13] UMM45_39 UMM45_39 UMM45_39 UMM45_39 UMM45_39 UMM45_39
>  [19] UMM45_39 UMM45_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39
>  [25] ISC02_39 ISC02_39 TAS10_30 ISC02_39 ISC02_39 ISC02_39
>  [31] ISC02_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39 TAS10_30
>  [37] ISC02_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39 TAS10_30
>  [43] ISC02_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39
>                              etc.
>
> Thank you.
>
> Sincerely,
> Ole and team