[adegenet-forum] spatial pca with large SNP data

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue Oct 20 16:00:26 CEST 2015


Dear Anne,

This relates to:
https://github.com/thibautjombart/adegenet/issues/95

There is no simple solution to the problem, but I can see two options:
#1 PCA then multispati
Run a PCA on your data first, then use multispati, ade4's equivalent of the sPCA, to get a spatial analysis. Your new object won't have exactly the same structure as an spca object, but most outputs will be there.
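
As a rough illustration of option #1, here is a minimal sketch under several assumptions that are not in the original message: it uses the small simulated dataset spcaIllus shipped with adegenet rather than real SNP data, it reads "PCA then multispati" as feeding the retained PC scores (the synthetic variables) to multispati, and it builds a Delaunay connection network with chooseCN. The number of retained axes (10) is arbitrary. In recent releases multispati is provided by adespatial rather than ade4, so that package is loaded here.

```r
library(adegenet)    # also attaches ade4 (dudi.pca)
library(adespatial)  # multispati lives here in recent releases (it was in ade4 in 2015)

## small simulated genind with coordinates in @other$xy (assumption: stand-in for your data)
data(spcaIllus)
obj <- spcaIllus$dat2A

## step 1: ordinary PCA on allele frequencies, keeping 10 axes (arbitrary choice)
pca1 <- dudi.pca(tab(obj, freq = TRUE, NA.method = "mean"),
                 scannf = FALSE, scale = FALSE, nf = 10)

## step 2: wrap the PC scores in a new dudi so they become the analysed variables
pcaScores <- dudi.pca(pca1$li, scannf = FALSE, scale = FALSE, nf = 10)

## step 3: connection network returned as spatial weights (type 1 = Delaunay triangulation)
cn <- chooseCN(obj$other$xy, ask = FALSE, type = 1,
               plot.nb = FALSE, result.type = "listw")

## step 4: spatial analysis of the PC scores, keeping 2 positive axes
ms1 <- multispati(pcaScores, cn, scannf = FALSE, nfposi = 2, nfnega = 0)
ms1
```

The scores in ms1$li play the role of the sPCA scores, but the loadings in ms1$c1 refer to principal components, not to alleles.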

#2 PCA, retain most contributing loci, then sPCA
You can do a PCA on your whole dataset and then compute the average contribution (squared loadings) of each locus over the first xxx axes. Then you can keep the xxx loci which are most informative. I'd slightly prefer this option, as your sPCA then still works with alleles rather than synthetic variables, but it's up to you.

Here's an example to get a new dataset with the 25% most contributing loci:
> library(adegenet)
> data(H3N2)

## make your PCA
> pca1 <- dudi.pca(tab(H3N2, freq=TRUE,NA.method="mean"),scannf=FALSE,scale=FALSE, nf=3)

## use loadingplot to ID the most contributing loci (by default: first axis, contributions above the 75% quantile)
> toKeep <- loadingplot(pca1$c1^2, byfac=TRUE,fac=locFac(H3N2))$var.idx
> toKeep
 45  60  73 148 168 171 225 247 317 351 391 396 435 463 464 468 476 483 490 566 577 578 582 600 604 673 676 679 763 807 963
  5   7   9  16  19  20  22  25  31  35  40  41  51  53  54  55  57  58  59  68  70  71  72  77  79  91  93  94  98 100 116
> new.x <- H3N2[loc=toKeep]
> new.x
/// GENIND OBJECT /////////

 // 1,903 individuals; 31 loci; 78 alleles; size: 1.4 Mb

 // Basic content
   @tab:  1903 x 78 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-4)
   @loc.fac: locus factor for the 78 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 1-1)
   @type:  codom
   @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)

 // Optional content
   @other: a list containing: x  xy  epid

> locNames(new.x)
 [1] "45"  "60"  "73"  "148" "168" "171" "225" "247" "317" "351" "391" "396" "435" "463" "464" "468" "476" "483" "490" "566" "577"
[22] "578" "582" "600" "604" "673" "676" "679" "763" "807" "963"
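
The loadingplot call above relies on its defaults, which as far as I can tell only look at the first axis. If you want the contributions averaged over several axes, as described in option #2, something like the following sketch might do. The names nAxes, alleleContrib, locusContrib, toKeep2 and new.x2 are made up for this illustration, and averaging the axes with equal weight (rather than weighting by eigenvalue) is an assumption.

```r
library(adegenet)
data(H3N2)

## same PCA as above
pca1 <- dudi.pca(tab(H3N2, freq = TRUE, NA.method = "mean"),
                 scannf = FALSE, scale = FALSE, nf = 3)

## average squared loading of each allele over the first nAxes axes
nAxes <- 3
alleleContrib <- rowMeans(as.matrix(pca1$c1[, 1:nAxes])^2)

## sum allele contributions within each locus
locusContrib <- tapply(alleleContrib, locFac(H3N2), sum)

## keep loci whose contribution exceeds the 75% quantile (roughly the top 25%)
toKeep2 <- which(locusContrib > quantile(locusContrib, 0.75))

## subset the genind object by locus index, as in the example above
new.x2 <- H3N2[loc = toKeep2]
nLoc(new.x2)
```

The resulting object can then be passed to spca like any other genind object.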


Cheers
Thibaut




________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of anne DaSilva [anne_dasilva at hotmail.com]
Sent: 16 October 2015 18:02
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] spatial pca with large SNP data

Dear all,
I would like to conduct a spatial PCA with 24,000 SNPs (after pruning). If I understand correctly, sPCA is not possible with a genlight object, so I used PLINK to recode my data in STRUCTURE format, and I then work on a genind object in R... but the analysis stops, because of the size of the genind object I imagine.
Is there a solution (split the data? but how?)?
I am losing all my hair over this problem (perhaps pathetically simple...) and I would be really grateful if someone could help me escape from my ignorance...
Kind regards
Anne



 Anne Blondeau Da Silva
 Unité de Génétique Moléculaire Animale
 UMR 1061 INRA-Université de Limoges
 Faculté des Sciences et Techniques
 123 Avenue Albert Thomas
 87060 LIMOGES Cedex
 Tél. 05 55 45 76 75
 Fax. 05 55 45 76 53
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

