[adegenet-forum] error in gstat.randtest

Thibaut Jombart thibautjombart at gmail.com
Thu Feb 9 11:53:01 CET 2017


Hi Melanie,

it's quite hard to tell without seeing the data, but yes, my suspicion is
the same as yours: NAs are the culprit. Entirely non-typed loci are
normally removed from genind objects during their construction, but it is
still possible that some loci are entirely missing within some of the
groups in your data.
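
If you want to check this directly, here is a rough sketch of one way to
do it. It uses microbov as a stand-in for your data and assumes your groups
are stored in the pop slot of the genind object. The names typed_by_pop and
empty_somewhere are made up for the illustration:

```r
library(adegenet)
data(microbov)

# Proportion of typed individuals per locus, computed separately within
# each group; rows are loci, columns are populations.
# drop = FALSE keeps every locus in each sub-object, even if untyped there.
typed_by_pop <- sapply(seppop(microbov, drop = FALSE), propTyped, by = "loc")

# Flag loci for which at least one group has no typed individual at all
empty_somewhere <- apply(typed_by_pop == 0, 1, any)

# How many loci are affected, and which ones
sum(empty_somewhere)
names(which(empty_somewhere))
```

Any locus flagged here is a candidate for removal before running the test,
e.g. with x[loc = !empty_somewhere].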

Given how many SNPs you have, you can probably afford to remove loci with
a lot of missing data (just make sure you don't end up throwing too much
away). propTyped(..., by = "loc") may be your friend here. Here's an
example using microbov:


> data(microbov)

> propTyped(microbov, by = "loc")
   INRA63     INRA5    ETH225    ILSTS5      HEL5      HEL1    INRA35    ETH152
0.9914773 0.9786932 0.9914773 0.9673295 0.9786932 0.9914773 0.9829545 0.9573864
   INRA23     ETH10      HEL9    CSSM66    INRA32      ETH3    BM2113    BM1824
0.9644886 0.9943182 0.9701705 0.9886364 0.9687500 0.9829545 0.9914773 0.9900568
    HEL13    INRA37    BM1818    ILSTS6      MM12    CSRM60    ETH185    HAUT24
0.9772727 0.9815341 0.9588068 0.9446023 0.9744318 0.9730114 0.9275568 0.9872159
   HAUT27   TGLA227   TGLA126   TGLA122    TGLA53    SPS115
0.9914773 0.9914773 0.9914773 0.9914773 0.9531250 0.9701705

> to_keep <- propTyped(microbov, by = "loc") > .99  # i.e. less than 1% missing data

> to_keep
 INRA63   INRA5  ETH225  ILSTS5    HEL5    HEL1  INRA35  ETH152  INRA23   ETH10
   TRUE   FALSE    TRUE   FALSE   FALSE    TRUE   FALSE   FALSE   FALSE    TRUE
   HEL9  CSSM66  INRA32    ETH3  BM2113  BM1824   HEL13  INRA37  BM1818  ILSTS6
  FALSE   FALSE   FALSE   FALSE    TRUE    TRUE   FALSE   FALSE   FALSE   FALSE
   MM12  CSRM60  ETH185  HAUT24  HAUT27 TGLA227 TGLA126 TGLA122  TGLA53  SPS115
  FALSE   FALSE   FALSE   FALSE    TRUE    TRUE    TRUE    TRUE   FALSE   FALSE

> x <- microbov[loc = to_keep]

> nLoc(x)
[1] 10

> nLoc(microbov)
[1] 30


This is just an illustration: this dataset actually has very little
missing data. In your case you probably want to play with the threshold
(requiring 99% non-NA is likely overkill).
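
To get a feel for the trade-off, you could count how many loci survive at
a few different thresholds before settling on one. A minimal sketch, again
on microbov; the threshold values are arbitrary and the names typed,
thresholds and n_kept are just for the illustration:

```r
library(adegenet)
data(microbov)

# Proportion of typed (non-missing) individuals per locus
typed <- propTyped(microbov, by = "loc")

# Candidate thresholds (arbitrary values, chosen only for illustration)
thresholds <- c(0.90, 0.95, 0.97, 0.99)

# Number of loci that would be retained at each threshold
n_kept <- sapply(thresholds, function(th) sum(typed > th))
names(n_kept) <- thresholds
n_kept
```

The last entry matches the example above (10 loci kept at .99). On your
SNP data, pick the most stringent threshold that still leaves you enough
loci.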

Cheers
Thibaut



--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR <http://twitter.com/TeebzR>
+44(0)20 7594 3658

On 8 February 2017 at 11:14, Melanie Montes <melaniesmontes at gmail.com>
wrote:

> Hello all,
> I recently finished running fstat on my dataset of about 50 000 snps / 56
> individuals, and successfully got f-statistics in return. However, when I
> tried to run gstat.randtest to see if the structure was significant:
>
> fstat.sig <-gstat.randtest(nr2014, nsim=1000)
> ...I got 50+ warnings like this:
>
> 50: In max(y, na.rm = TRUE) : no non-missing arguments to max; returning
> -Inf
>
> and my results file looked like this:
>
> > fstat.sig
>
> Monte-Carlo test
>
> Call: gstat.randtest(x = nr2014, nsim = 1000)
>
> Observation: 0
>
> Based on 1000 replicates
>
> Simulated p-value: 1
>
>
> which leads me to suspect that it did not work. Does this have something
> to do with the missing data in my dataset? Sorry if this is a naive
> question, I am an R novice.
>
> Thanks for your time and the awesome package, I've been using it a lot!
>
> Sincerely,
> Melanie