Jombart, Thibaut t.jombart at imperial.ac.uk
Mon Feb 7 15:15:23 CET 2011

```Dear Rudy,

there is currently no implementation for getting directly a p-value for each Fst. This can be done fairly easily, albeit it will take a bit of time to run. The idea is simple:
1) compute original Fst matrix
2) permute groups randomly
3) recompute Fst - this is one matrix under Ho: individuals are randomly distributed between the groups (panmixie)
4) redo 2-3) xxx times to get a full reference distribution.
5) compare observed values to the ref distribution to get a p-value

Here is an example using a small dataset:
####
data(nancycats)
x <- nancycats[sample(1:nrow(nancycats at tab), 50)] # keep 50 individuals
mat.obs <- pairwise.fst(x, res.type="matrix") # this takes about 17 seconds on my computer

NBPERM <- 10 # this is the number of permutations used for the p-values; for a publication at least 999 would be required.
mat.perm <- lapply(1:NBPERM, function(i) pairwise.fst(x, pop=sample(pop(x)), res.type="matrix"))
####

mat.obs contains original Fst values, mat.perm is a list with NPERM matrices of permuted Fst values. To get e.g. right-tail p-values, we can just count the proportion of mat.obs >= mat.perm; e.g. for the first pair of populations:
####
> mean(c(mat.obs[1,2] < na.omit(sapply(1:NBPERM, function(i) mat.perm[[i]][1,2])), TRUE))
 0.2
####
In the above command, "na.omit(sapply(1:NBPERM, function(i) mat.perm[[i]][1,2]))" is a vector of permuted values for this pair of populations across all replicates; c(..., TRUE) is added because the observed value is always added to the permuted values (it is one of the possible permutations of groups).

In practice, it is easier to convert the results as objects of the class randtest (class for Monte Carlo test in ade4):
####
test12 <- as.randtest(na.omit(sapply(1:NBPERM, function(i) mat.perm[[i]][1,2])), mat.obs[1,2], alter="greater")
> test12
Monte-Carlo test
Call: as.randtest(sim = na.omit(sapply(1:NBPERM, function(i) mat.perm[[i]][1,
2])), obs = mat.obs[1, 2], alter = "greater")

Observation: 0.1548673

Based on 9 replicates
Simulated p-value: 0.2
Alternative hypothesis: greater

Std.Obs Expectation    Variance
1.91197702  0.08374942  0.00138354
####

To have it done for all pairs of populations:
####
allTests <- list()
for(i in 1:(nrow(mat.obs)-1)){
for(j in 2:nrow(mat.obs)){
allTests[[paste(rownames(mat.obs)[i],rownames(mat.obs)[j],sep="-")]] <- as.randtest(na.omit(sapply(1:NBPERM, function(k) mat.perm[[k]][i,j])), mat.obs[i,j], alter="greater")
}
}
####

allTests is a list with one test of the class "randtest" for each pair of populations. The printing of the object gives the corresponding p-value; note that you can even plot these objects.

As a last note, getting the permuted matrices takes some time: NPERM x 17 seconds in my case would make it 4.7 hours for 1,000 permutations. This could be speeded up drastically by recoding everything in C, but I have no time for this at the moment. To reduce the time taken by the whole stuff, replace lapply by mclapply from the multicore package if you have a multiple core machine. On mine, it decreases the computational time fairly efficiently (from >170 seconds to 52seconds).

Best regards,

Thibaut

________________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] On Behalf Of Jonker, Rudy [Rudy.Jonker at wur.nl]
Sent: 07 February 2011 13:33

Hello Thibaut,

When using the pairwise.fst method I obtain Fst values between all populations, such as:

P1          P2          P3          P4          P5
P1 0.000000000 0.006336151 0.007354116 0.006710538 0.014397947
P2 0.006336151 0.000000000 0.009318789 0.015509769 0.010039293
P3 0.007354116 0.009318789 0.000000000 0.012273043 0.005988692
P4 0.006710538 0.015509769 0.012273043 0.000000000 0.014574640
P5 0.014397947 0.010039293 0.005988692 0.014574640 0.000000000

To test if these Fst's are significant, I'd like to get a similar table with p-avlues for each combination of populations. If I do gstat.randtest I get one p-value for the entire set, also when I select all three methods (global, within & between).

If I make subsets of my data two do each Fst analysis between two populations separately (which allows testing with gstat.randtest) to obtain a p-value for the fst between these two populations, results in different Fst values than via the method above and is thus not comparable?

Is there any way to get such a p-value matrix?