[adegenet-forum] Help...

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue Mar 3 17:40:47 CET 2009


Dear Adrien,

this is an important question for sPCA, although not restricted to it. Input from other spatial experts is welcome (some are hiding on this list)!

The question is quite general: how can we model the connectivity between a set of locations, some of which are identical? I don't think there is a single answer, and the best solution is likely contingent on the data.

Adrien's data represent one tricky and interesting case.

> xy=read.table("coord.txt",header=TRUE)
> foo1=sunflowerplot(xy) # this figure explains the problem
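To see how many samples are affected before choosing a fix, duplicates can be counted directly. A minimal base-R sketch on made-up coordinates (the attached coord.txt is not reproduced here):

```r
# Hypothetical coordinates (not the attached coord.txt): five samples, three locations
xy <- data.frame(x = c(0, 0, 1, 2, 2), y = c(0, 0, 1, 0, 0))

n_dup <- sum(duplicated(xy))             # samples repeating an earlier location
xy_unique <- unique(xy)                  # what "removing duplicates" would keep
loc_counts <- table(paste(xy$x, xy$y))   # number of samples at each location

n_dup             # 2
nrow(xy_unique)   # 3
loc_counts
```

Here dropping duplicates would discard 2 of the 5 samples.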

The roughest approach is to remove duplicate locations, but I would hardly call it a solution. In this case it is unthinkable.
Adding some random noise to coordinates can be another option:
> xy=as.matrix(xy)
> xyrand=jitter(xy, factor=2)
> sunflowerplot(xyrand)
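It may help to know what jitter() does here. With the default amount = NULL, it adds uniform noise on [-a, a], where a is factor/5 times (roughly) the smallest gap between distinct values; applied to a matrix, x and y are pooled as a single vector. A sketch on made-up coordinates, where that gap is 1 and therefore a = 0.4:

```r
set.seed(1)  # only to make this sketch reproducible

# Hypothetical coordinates with two duplicated locations
xy <- cbind(x = c(0, 0, 1, 2, 2), y = c(0, 0, 1, 0, 0))

# jitter() treats the whole matrix as one vector: smallest gap between
# distinct values (x and y pooled) is 1, so with factor = 2 the noise is
# uniform on [-0.4, 0.4]
xyrand <- jitter(xy, factor = 2)

anyDuplicated(xyrand)    # 0: no two samples share a location any more
max(abs(xyrand - xy))    # never more than 0.4
```

Since 0.4 is less than half the smallest gap, two distinct coordinate values cannot swap order, which is why the overall sampling design survives the noise.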

Now all locations have been disentangled, and yet the sampling design is preserved. Here is the Delaunay triangulation:
> library(adegenet)
> cn1=chooseCN(xyrand,type=1)

However, a problem arises: Delaunay triangulation is sensitive to small changes in location. So each new draw of random noise yields a different graph:
> xyrand=jitter(xy, factor=2)
> sunflowerplot(xyrand)
> cn2=chooseCN(xyrand,type=1)
> identical(cn1,cn2) # this is usually FALSE
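The sensitivity comes from how Delaunay edges are chosen: for four sites forming a convex quadrilateral, the diagonal that is kept is decided by an in-circle test, and when the sites are nearly cocircular (as on a regular grid) a tiny shift flips it. A minimal base-R sketch of that test on hypothetical points; it illustrates the geometry only and is not the code used by tri.mesh or chooseCN:

```r
# In-circle test: for p1, p2, p3 in counter-clockwise order, the determinant
# is positive when p4 lies strictly inside the circle through p1, p2 and p3.
in_circle <- function(p1, p2, p3, p4) {
  m <- rbind(p1 - p4, p2 - p4, p3 - p4)
  m <- cbind(m, rowSums(m^2))
  det(m)
}

# For the convex quadrilateral p1-p2-p3-p4 (counter-clockwise), Delaunay keeps
# the diagonal p1-p3 unless p4 falls inside the circumcircle of p1, p2, p3,
# in which case the diagonal is flipped to p2-p4.
delaunay_diagonal <- function(p1, p2, p3, p4) {
  if (in_circle(p1, p2, p3, p4) > 0) "p2-p4" else "p1-p3"
}

# Hypothetical sites near the corners of a unit square; only the fourth moves
p1 <- c(0, 0); p2 <- c(1, 0); p3 <- c(1, 1)
delaunay_diagonal(p1, p2, p3, c(0, 0.99))   # "p2-p4"
delaunay_diagonal(p1, p2, p3, c(0, 1.01))   # "p1-p3"
```

Moving one site by 0.02 swaps the neighbour pair p1-p3 for p2-p4. At exactly (0, 1) the four corners are cocircular, the determinant is 0, and both diagonals are equally valid Delaunay choices.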

How many differences are there?
> library(spdep) # provides nb2mat
> temp1=nb2mat(cn1, style="B")
> temp2=nb2mat(cn2, style="B")

> temp= temp1 == temp2
> sum(temp) / length(temp) * 100 # I got around 99 % of similar entries
> cor(as.vector(temp1),as.vector(temp2)) # correlation of only (around) 0.5
> cor.test(as.vector(temp1),as.vector(temp2))

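The adjacency matrices are only weakly correlated, but this is because they are very sparse: most of the ~99% agreement comes from pairs that are neighbours in neither graph, so a few changed edges weigh heavily on the correlation.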
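A hypothetical example of how sparsity plays out: two ring-shaped graphs on 100 sites that share only half of their edges still agree on 98% of the entries of their adjacency matrices, while their correlation is about 0.49. Base R only; the graphs are made up for illustration:

```r
n <- 100

# Graph A: each site is linked to the next one around a ring (n wraps to 1)
A <- matrix(0L, n, n)
for (i in seq_len(n)) {
  j <- i %% n + 1
  A[i, j] <- A[j, i] <- 1L
}

# Graph B: for every odd site i, drop the edge i--(i+1) and link i to the
# site two steps ahead instead; B has as many edges as A, half of them shared
B <- A
for (i in seq(1, n, by = 2)) {
  j <- i %% n + 1
  k <- (i + 1) %% n + 1
  B[i, j] <- B[j, i] <- 0L
  B[i, k] <- B[k, i] <- 1L
}

agreement <- mean(A == B)               # share of identical entries: 0.98
r <- cor(as.vector(A), as.vector(B))    # correlation: about 0.49
c(agreement = agreement, correlation = r)
```

Of the 10,000 entries, 9,800 match, and nearly all of those matches are pairs that are neighbours in neither graph.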
One reassuring fact is that structures of autocorrelation on these maps are fairly similar:
> library(ade4)
> s.image(xy,orthobasis.mat(temp1)[,3], kgrid=10)
> s.image(xy,orthobasis.mat(temp2)[,3], kgrid=10)

So adding a small amount of random noise could do the trick, though it is not very elegant.

Another alternative is to use graphs that do not require locations to be unique, for instance connectivity based on distance:
> cn3=chooseCN(xy,type=5, d1=0,d2=.08)

But then other problems arise, first and foremost the definition of the range within which two locations are neighbours.
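To illustrate the range problem, here is a sketch of a distance-band graph in base R. The helper distance_band_adjacency() is hypothetical (it is not the adegenet or spdep implementation, whose bound conventions may differ); it links two samples when d1 <= distance <= d2, so with d1 = 0 samples at identical locations become neighbours:

```r
# Hypothetical helper: binary adjacency matrix linking samples whose
# Euclidean distance d satisfies d1 <= d <= d2
distance_band_adjacency <- function(xy, d1, d2) {
  d <- as.matrix(dist(xy))
  A <- (d >= d1 & d <= d2) * 1L
  diag(A) <- 0L   # a sample is not its own neighbour
  A
}

# Hypothetical samples along a transect; the first two share a location
xy <- cbind(x = c(0, 0, 0.05, 0.1, 0.3), y = 0)

A_small <- distance_band_adjacency(xy, d1 = 0, d2 = 0.08)
A_large <- distance_band_adjacency(xy, d1 = 0, d2 = 0.22)

sum(A_small) / 2   # 4 edges with d2 = 0.08
sum(A_large) / 2   # 7 edges with d2 = 0.22
rowSums(A_small)   # the last sample has no neighbour with d2 = 0.08
```

Moving d2 from 0.08 to 0.22 takes the graph from 4 to 7 edges, and with the smaller value the last sample is isolated; nothing in the coordinates alone says which range is right.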

In the present case, we might think about pooling data by location, and working with allele frequencies observed at each unique location. But this has a strong empirical implication: genetic similarities among genotypes within a given location are no longer investigated, and are no longer part of what we call 'spatial genetic structure'. They become 'genetic similarities between locations', which is different. It also implies that 10 genotypes taken from the same location will have the same 'weight' in the analysis as one genotype taken at a unique location. So the choice depends not only on the data, but also on the question being asked.
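A toy illustration of the weighting point, with one biallelic locus and made-up genotypes (10 at location A, 1 at location B). It only shows how pooling changes the weight of each genotype; it is not what sPCA computes:

```r
# Hypothetical diploid data: copies (0, 1 or 2) of allele "a" per genotype
geno <- c(2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 0)
loc  <- factor(c(rep("A", 10), "B"))

# Individual level: every genotype contributes equally
freq_individuals <- sum(geno) / (2 * length(geno))

# Location level (after pooling): one allele frequency per location
freq_by_location <- tapply(geno, loc, function(g) sum(g) / (2 * length(g)))

# Giving each location the same weight: B (1 genotype) counts as much as A (10)
freq_locations_equal_weight <- mean(freq_by_location)

freq_individuals              # 18/22, about 0.82
freq_by_location              # A: 0.9, B: 0
freq_locations_equal_weight   # 0.45
```

At the individual level the frequency of allele a is about 0.82; once each location counts once, the single genotype at B pulls the average down to 0.45.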

To pool data by location, use the devel version of adegenet.
Define the pop factor as follows:
> myPop = paste(xy[,1],xy[,2])
> myPop=factor(myPop, levels=unique(myPop))
> levels(myPop)= 1:length(levels(myPop))
> plot(xy, type="n")
> text(xy,lab=myPop)


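Put the xy coordinates in the @other slot of your genind object (here called myObj):
> myObj@other$xy = xy
and then use 'genind2genpop', passing myPop as the 'pop' argument and specifying "process.other = TRUE, other.action = mean". This will generate a genpop object with the appropriate xy coordinates (the mean coordinates of the genotypes at each location) in the @other slot.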
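What other.action = mean does to the coordinates can be mimicked in base R with aggregate(). A sketch on made-up coordinates, reusing the myPop construction above; this is not adegenet's own code:

```r
# Hypothetical sample coordinates: five samples at three distinct locations
xy <- cbind(x = c(1, 1, 2, 3, 1), y = c(5, 5, 5, 6, 5))

# Location factor, built as in the myPop code above
myPop <- paste(xy[, 1], xy[, 2])
myPop <- factor(myPop, levels = unique(myPop))
levels(myPop) <- seq_along(levels(myPop))

# Average coordinates within each location (base-R stand-in for other.action = mean)
xy_pop <- aggregate(as.data.frame(xy), by = list(pop = myPop), FUN = mean)
xy_pop
```

Since all genotypes of a location share its coordinates, the mean is simply that location's position, and the pooled coordinates contain no duplicates, so a Delaunay triangulation can be computed on them.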

Then run the sPCA on this genpop object, using Delaunay triangulation as you wished.


To summarise:
- adding noise works, but it is the quick & dirty approach; it might be an issue for testing procedures relying on the connection network, since not only the p-values but also the test statistics would change from one jittered network to another.

- using a graph other than Delaunay is always an option, but may not always be satisfying.

- working at a 'population' level may be the best option in some cases: when the patterns of interest exclude genetic similarities between genotypes sampled at exactly the same place, and when the question is more about patterns among groups of genotypes.

Best regards,

Thibaut.

> Hi,
> I'd like to perform an sPCA analysis with adegenet, but a little bug appears during the "connection network choice" step. I'd like to know if someone could help me with that.
> I'd like to use the Delaunay triangulation type (type 1), but when I choose this option I get this error message: "Error in tri.mesh(x = coords[, 1], y = coords[, 2]) : duplicate data points". I think it's because my data set contains some samples with the same spatial coordinates. I'd like to know if it's possible to keep these samples in the analysis (I've seen that R can automatically delete all the data with repeated coords, or keep only one sample per coord, but this solution doesn't suit me), and if so, how should I do it? Thanks for your help. (The coord file is attached to this message; maybe it could help.)
> Thanks a lot
> Adrien RIEUX, PhD student in Montpellier (France)
> //
> --------------------------------------------------------------------------------
>
> Adrien RIEUX
>
> PhD Student
>
> CIRAD - Département BIOS
> UMR Biologie et Génétique des Interactions Plantes-Parasites
> TA A 54 / K - Campus International de Baillarguet - Bureau 118
> 34398 Montpellier Cedex 5
> France
>
> Tel : + 33 4 99 62 41 84
> Fax : + 33 4 99 62 48 48
> Mail : adrien.rieux at cirad.fr
>
>
>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum


-- 
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - Faculty of Medicine
St Mary’s Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://biomserv.univ-lyon1.fr/%7Ejombart/
http://adegenet.r-forge.r-project.org/