[adegenet-forum] error in gstat.randtest
Thibaut Jombart
thibautjombart at gmail.com
Thu Feb 9 11:53:01 CET 2017
Hi Melanie,
It's quite hard to tell without seeing the data, but yes, my suspicion is
the same as yours: NAs are the culprit. Entirely non-typed loci are
normally removed from genind objects during their construction, but it is
still possible that, for some groups in your data, some loci are entirely
missing.
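A rough sketch of how one might check for loci that are entirely missing
within a group, shown here on microbov rather than on your data. It assumes
the groups are stored as populations in the genind object; the names by_pop,
typed and all_missing are made up for this illustration:

```r
library(adegenet)
data(microbov)

# Split the genind object into one genind per population.
by_pop <- seppop(microbov)

# Proportion of typed individuals per locus within each population
# (loci in rows, populations in columns).
typed <- sapply(by_pop, propTyped, by = "loc")

# Loci with no typed individual (proportion == 0) in at least one population.
all_missing <- rownames(typed)[apply(typed == 0, 1, any)]
all_missing
```

Any locus listed in all_missing is a candidate for removal before running
the test.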
Given how many SNPs you have, you can probably afford to remove loci with
a lot of missing data (just make sure you don't end up throwing too much away).
propTyped(..., by = "loc") may be your friend here. Here's an example using
microbov:
> data(microbov)
> propTyped(microbov, by = "loc")
   INRA63     INRA5    ETH225    ILSTS5      HEL5      HEL1    INRA35    ETH152
0.9914773 0.9786932 0.9914773 0.9673295 0.9786932 0.9914773 0.9829545 0.9573864
   INRA23     ETH10      HEL9    CSSM66    INRA32      ETH3    BM2113    BM1824
0.9644886 0.9943182 0.9701705 0.9886364 0.9687500 0.9829545 0.9914773 0.9900568
    HEL13    INRA37    BM1818    ILSTS6      MM12    CSRM60    ETH185    HAUT24
0.9772727 0.9815341 0.9588068 0.9446023 0.9744318 0.9730114 0.9275568 0.9872159
   HAUT27   TGLA227   TGLA126   TGLA122    TGLA53    SPS115
0.9914773 0.9914773 0.9914773 0.9914773 0.9531250 0.9701705
> to_keep <- propTyped(microbov, by = "loc") > .99 # i.e. less than 1% missing data
> to_keep
 INRA63   INRA5  ETH225  ILSTS5    HEL5    HEL1  INRA35  ETH152  INRA23   ETH10
   TRUE   FALSE    TRUE   FALSE   FALSE    TRUE   FALSE   FALSE   FALSE    TRUE
   HEL9  CSSM66  INRA32    ETH3  BM2113  BM1824   HEL13  INRA37  BM1818  ILSTS6
  FALSE   FALSE   FALSE   FALSE    TRUE    TRUE   FALSE   FALSE   FALSE   FALSE
   MM12  CSRM60  ETH185  HAUT24  HAUT27 TGLA227 TGLA126 TGLA122  TGLA53  SPS115
  FALSE   FALSE   FALSE   FALSE    TRUE    TRUE    TRUE    TRUE   FALSE   FALSE
> x <- microbov[loc = to_keep]
> nLoc(x)
[1] 10
> nLoc(microbov)
[1] 30
This is just an illustration - this dataset actually has very little
missing data. In your case you probably want to play with the threshold
(requiring 99% non-NA is likely overkill).
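To get a feel for the threshold, something along these lines could help. It
is only a sketch on microbov; the candidate values in thresholds and the
names typed and n_kept are arbitrary choices for the illustration, not
recommendations:

```r
library(adegenet)
data(microbov)

# Proportion of typed individuals for each locus.
typed <- propTyped(microbov, by = "loc")

# Candidate thresholds (arbitrary values; adjust to your data).
thresholds <- c(0.90, 0.95, 0.97, 0.99)

# Number of loci that would be retained at each threshold.
n_kept <- sapply(thresholds, function(th) sum(typed > th))
data.frame(threshold = thresholds, n_loci_kept = n_kept)
```

Once a threshold keeps a reasonable number of loci, the subsetting works as
above, e.g. microbov[loc = typed > 0.95].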
Cheers
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR <http://twitter.com/TeebzR>
+44(0)20 7594 3658
On 8 February 2017 at 11:14, Melanie Montes <melaniesmontes at gmail.com> wrote:
> Hello all,
> I recently finished running fstat on my dataset of about 50 000 snps / 56
> individuals, and successfully got f-statistics in return. However, when I
> tried to run gstat.randtest to see if the structure was significant:
>
> fstat.sig <-gstat.randtest(nr2014, nsim=1000)
> ...I got 50+ warnings like this:
>
> 50: In max(y, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
>
> and my results file looked like this:
>
> > fstat.sig
>
> Monte-Carlo test
>
> Call: gstat.randtest(x = nr2014, nsim = 1000)
>
> Observation: 0
>
> Based on 1000 replicates
>
> Simulated p-value: 1
>
>
> which leads me to suspect that it did not work. Does this have something
> to do with the missing data in my dataset? Sorry if this is a naive
> question, I am an R novice.
>
> Thanks for your time and the awesome package, I've been using it a lot!
>
> Sincerely,
> Melanie
>