[adegenet-forum] spatial pca with large SNP data
Jombart, Thibaut
t.jombart at imperial.ac.uk
Tue Oct 20 16:00:26 CEST 2015
Dear Anne,
this relates to:
https://github.com/thibautjombart/adegenet/issues/95
And there is no simple solution to the problem. I can see two options:
#1 PCA then multispati
Run a PCA on your data first, then use multispati, ade4's equivalent of the sPCA, to get a spatial analysis. Your new object won't have exactly the same structure as an sPCA object, but most outputs will be there.
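A minimal sketch of option #1, not tested on a dataset of 24,000 SNPs. It is one possible reading of "PCA then multispati". Assumptions: the small spcaIllus example data shipped with adegenet stands in for the real data, the connection network is a Delaunay triangulation built with chooseCN, and multispati is taken from the adespatial package (in older ade4 versions it was provided by ade4 itself). The object names pca0, lw and ms1 are only illustrative.

```r
library(ade4)        # dudi.pca
library(adegenet)    # genind tools, chooseCN
library(adespatial)  # assumption: multispati() is loaded from adespatial

data(spcaIllus)               # example data shipped with adegenet
x <- spcaIllus$dat2A          # genind object with coordinates in x$other$xy

## step 1: ordinary PCA on allele frequencies (NAs replaced by the mean)
pca0 <- dudi.pca(tab(x, freq = TRUE, NA.method = "mean"),
                 center = TRUE, scale = FALSE, scannf = FALSE, nf = 10)

## step 2: spatial weights from a connection network
## (type = 1 is a Delaunay triangulation)
lw <- chooseCN(x$other$xy, ask = FALSE, type = 1,
               result.type = "listw", plot.nb = FALSE)

## step 3: spatial analysis of the PCA, keeping 2 positive (global) axes
ms1 <- multispati(pca0, lw, scannf = FALSE, nfposi = 2, nfnega = 0)
summary(ms1)
```

The scores of the individuals are then in ms1$li and the loadings in ms1$c1, which roughly correspond to the $li and $c1 slots of a spca object.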
#2 PCA, retain most contributing loci, then sPCA
You can do a PCA on your whole dataset and then compute the average contribution (squared loadings) of each locus over the first xxx axes. Then you can keep the xxx loci which are most informative. I think I'd slightly prefer this option, as you then still work with alleles in your sPCA rather than synthetic variables, but it is up to you.
Here's an example that builds a new dataset from the 25% most contributing loci (loadingplot's default threshold is the 75% quantile):
> library(adegenet)
> data(H3N2)
## make your PCA
> pca1 <- dudi.pca(tab(H3N2, freq=TRUE,NA.method="mean"),scannf=FALSE,scale=FALSE, nf=3)
## use loadingplot to ID most contributing loci
> toKeep <- loadingplot(pca1$c1^2, byfac=TRUE,fac=locFac(H3N2))$var.idx
> toKeep
45 60 73 148 168 171 225 247 317 351 391 396 435 463 464 468 476 483 490 566 577 578 582 600 604 673 676 679 763 807 963
5 7 9 16 19 20 22 25 31 35 40 41 51 53 54 55 57 58 59 68 70 71 72 77 79 91 93 94 98 100 116
> new.x <- H3N2[loc=toKeep]
> new.x
/// GENIND OBJECT /////////
// 1,903 individuals; 31 loci; 78 alleles; size: 1.4 Mb
// Basic content
@tab: 1903 x 78 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 2-4)
@loc.fac: locus factor for the 78 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 1-1)
@type: codom
@call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)
// Optional content
@other: a list containing: x xy epid
> locNames(new.x)
[1] "45" "60" "73" "148" "168" "171" "225" "247" "317" "351" "391" "396" "435" "463" "464" "468" "476" "483" "490" "566" "577"
[22] "578" "582" "600" "604" "673" "676" "679" "763" "807" "963"
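The loadingplot call above ranks loci on one axis only (its default is axis=1). A minimal sketch of the averaging over several axes described in option #2 could look like the following. The number of axes (n.axes), the proportion of loci kept (prop.keep) and the object names contrib.allele, contrib.locus, toKeep2 and new.x2 are arbitrary choices for illustration, not recommended values.

```r
library(ade4)
library(adegenet)
data(H3N2)

n.axes <- 3        # illustrative number of PCA axes to average over
prop.keep <- 0.25  # illustrative proportion of loci to retain

## PCA on allele frequencies, as in the example above
pca1 <- dudi.pca(tab(H3N2, freq = TRUE, NA.method = "mean"),
                 scannf = FALSE, scale = FALSE, nf = n.axes)

## squared loadings of each allele, averaged over the retained axes
contrib.allele <- rowMeans(as.matrix(pca1$c1[, 1:n.axes, drop = FALSE])^2)

## sum the contributions of the alleles belonging to the same locus
contrib.locus <- tapply(contrib.allele, locFac(H3N2), sum)

## keep the loci whose contribution exceeds the chosen quantile
cutoff <- quantile(contrib.locus, 1 - prop.keep)
toKeep2 <- names(contrib.locus)[contrib.locus > cutoff]

## subset the genind object by locus name
new.x2 <- H3N2[loc = toKeep2]
new.x2
```

The reduced genind object can then be passed to spca in the usual way.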
Cheers
Thibaut
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of anne DaSilva [anne_dasilva at hotmail.com]
Sent: 16 October 2015 18:02
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] spatial pca with large SNP data
Dear all,
I would like to conduct a spatial PCA with 24,000 SNPs (after pruning). If I understand correctly, spca is not possible with a genlight object, so I used PLINK to recode my data in STRUCTURE format, and then I work on a genind object in R... but the analysis stops, because of the size of the genind object, I imagine.
Is there a solution (split the data? but how?)?
I am losing all my hair over this problem (perhaps pathetically simple...) and I would be really grateful if someone could help me escape from my ignorance.
Kind regards
Anne
Anne Blondeau Da Silva
Unité de Génétique Moléculaire Animale
UMR 1061 INRA-Université de Limoges
Faculté des Sciences et Techniques
123 Avenue Albert Thomas
87060 LIMOGES Cedex
Tél. 05 55 45 76 75
Fax. 05 55 45 76 53
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum