[adegenet-forum] genind object too big for sPCA?

Thibaut Jombart thibautjombart at gmail.com
Fri Nov 17 14:56:46 CET 2017


Hello,

this probably comes from the fact that you have to run the
eigenanalysis on a large matrix: with 6,451 biallelic SNPs the allele
table has about 12,900 columns, so I would guess around 13,000 x 13,000
in this case. Unlike regular PCA, sPCA cannot diagonalise in the
smallest dimension (number of individuals vs. number of alleles).
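As a rough check of the 13,000 x 13,000 estimate, the arithmetic can be redone in R. This sketch assumes that all 6,451 SNPs are biallelic (2 allele columns per locus); on the real object, 'ncol(tab(data))' should give the actual column count. The 8 bytes per value is R's double precision, and the O(d^3) cost of a dense eigendecomposition is a standard rule of thumb, not a measured figure for adegenet.

```r
# Back-of-the-envelope size of the PCA vs sPCA eigenproblems for this data set.
# Assumption: every SNP is biallelic, so the allele table has 2 columns per locus.
n_ind <- 158L
n_loc <- 6451L
n_all <- 2L * n_loc                      # assumed number of allele columns

bytes_per_double <- 8
gib <- 2^30

# Regular PCA can diagonalise the smaller of the two cross-product matrices
pca_dim <- min(n_ind, n_all)
# sPCA works in allele space, so its matrix is n_all x n_all
spca_dim <- n_all

pca_gib  <- pca_dim^2  * bytes_per_double / gib
spca_gib <- spca_dim^2 * bytes_per_double / gib

# Dense symmetric eigendecomposition costs roughly O(d^3) operations
cost_ratio <- (spca_dim / pca_dim)^3

cat(sprintf("PCA matrix:  %d x %d (%.4f GiB)\n", pca_dim, pca_dim, pca_gib))
cat(sprintf("sPCA matrix: %d x %d (%.2f GiB)\n", spca_dim, spca_dim, spca_gib))
cat(sprintf("Relative eigendecomposition cost: ~%.0f times\n", cost_ratio))
```

Under these assumptions the sPCA matrix alone takes about 1.24 GiB (versus about 0.2 MB for 158 x 158), a full set of eigenvectors needs another matrix of the same size, and the decomposition is roughly 5 x 10^5 times more work. That is consistent with R appearing to hang even on the 64 GB machine.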

One quick solution would be to reduce the number of alleles, e.g. by
keeping only the alleles that contribute strongly (e.g. squared
loadings > .01 or 0.5) to the first xxx axes of a PCA. Otherwise, the
same trick used in DAPC can be used in sPCA: run the data through a PCA
first, then run the sPCA on the principal components.
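Both workarounds can be sketched in base R on simulated data. This is not adegenet's implementation: it assumes sPCA reduces to the eigenanalysis of t(X) S X / n, where X is the centred allele table and S the symmetrised, row-standardised spatial weight matrix (the actual ade4/adegenet code may differ in scaling), and it uses inverse-distance weights as a stand-in for the Delaunay network requested by type=1. 'n_axes', 'threshold' and 'spatial_eigen' are hypothetical names chosen for illustration. Workaround 1 keeps alleles whose squared loading exceeds the cut-off on at least one of the first axes; workaround 2 checks that, when every component with non-zero variance is kept, the small eigenproblem on the scores has the same non-zero eigenvalues as the full allele-space one.

```r
# Base-R sketch of the two workarounds on simulated data (not adegenet code).
# Assumed sPCA form: eigenanalysis of t(X) %*% S %*% X / n (scaling may differ).
set.seed(42)
n <- 40                                   # hypothetical number of individuals
p <- 300                                  # hypothetical number of allele columns
X <- matrix(rbinom(n * p, size = 2, prob = 0.3) / 2, nrow = n, ncol = p)
X <- scale(X, center = TRUE, scale = FALSE)   # centre only (assumed to match spca's default)
xy <- matrix(runif(2 * n), ncol = 2)          # simulated coordinates

# Spatial weights: inverse distances, row-standardised, then symmetrised
W <- 1 / as.matrix(dist(xy))
diag(W) <- 0
W <- W / rowSums(W)
S <- (W + t(W)) / 2

# Workaround 1: keep only alleles that contribute to the first few PCA axes
pca <- prcomp(X, center = FALSE)          # X is already centred
n_axes <- 3                               # hypothetical stand-in for "the first xxx axes"
threshold <- 0.01                         # squared-loading cut-off
sq_load <- pca$rotation[, 1:n_axes]^2     # each column sums to 1 (unit-norm loadings)
keep <- apply(sq_load, 1, max) > threshold
X_small <- X[, keep, drop = FALSE]        # reduced allele table

# Workaround 2: run the spatial eigenanalysis on principal component scores
spatial_eigen <- function(Z, S) {
  M <- crossprod(Z, S %*% Z) / nrow(Z)    # ncol(Z) x ncol(Z) matrix
  eigen(M, symmetric = TRUE, only.values = TRUE)$values
}
ev_full <- spatial_eigen(X, S)            # p x p problem (slow when p is ~13,000)
k <- sum(pca$sdev > 1e-8)                 # components with non-zero variance
ev_pc <- spatial_eigen(pca$x[, 1:k], S)   # k x k problem, with k <= n - 1

# With all non-zero components kept, the non-zero eigenvalues coincide
all.equal(sort(ev_full), sort(c(ev_pc, rep(0, p - k))), tolerance = 1e-6)
```

With 158 individuals, at most 157 components have non-zero variance, so this exact form of the PCA-first trick would shrink the eigenproblem from about 12,900 x 12,900 to at most 157 x 157 without discarding information; keeping fewer components, as is usual in DAPC, trades some information for speed. In adegenet itself the PCA step would presumably use 'dudi.pca' on the allele table, centred but not scaled; whether 'spca' accepts the resulting scores directly depends on the installed adegenet version, so check '?spca'.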

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658


On 17 November 2017 at 09:33, Roman Luštrik <roman.lustrik at biolitika.si> wrote:
> Can you check the memory usage? Is it consuming all the RAM? If not, then
> it's probably not a memory issue and the culprit is somewhere else. Have you
> tried running the analysis on a subset of the data?
>
> Cheers,
> Roman
>
>
> ----
> In god we trust, all others bring data.
>> Request public information (IJZ) at https://kurc.biolitika.si
>
> ________________________________
> From: "Judy (Duffie), Caroline" <JudyC at si.edu>
> To: adegenet-forum at lists.r-forge.r-project.org
> Sent: Wednesday, November 8, 2017 7:06:05 PM
> Subject: Re: [adegenet-forum] genind object too big for sPCA?
>
> Update - I tried running the same script on a computer with 64 GB of memory.
> Same issues.
>
> On Nov 7, 2017, at 12:13 PM, Judy (Duffie), Caroline <JudyC at si.edu> wrote:
>
> Hi all,
>
> I’m having trouble running an sPCA on a genind object (10.6 MB) that contains
> about 160 individuals and 6,500 SNPs. When I run the command 'mySpca <-
> spca(data, ask=FALSE, type=1, scannf=FALSE)', R crashes, i.e. I get the
> “whirling ball of death” and the program becomes unresponsive.
>
> I’ve seen some older messages on the forum that similarly report problems
> with larger genind objects, but responses indicate that there shouldn’t be a
> memory issue
> (http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2012-June/000513.html).
> I’m running this on a MacBook Pro with 16 GB of memory.
>
> Any tips or tricks for running an object of this size? Interestingly I’ve
> been able to run a PCA and DAPC without issue.
>
> #Convert structure file to a genind object.
>> data <-
>> read.structure("~/Documents/Trochilus/second_chapter/Analysis/structure/input/GBS_all_pop_pheno.stru",
> + n.ind=158,
> + n.loc=6451,
> + onerowperind=TRUE,
> + col.lab=1,
> + col.pop=2,
> + col.others=3:8,
> + row.marknames=0,
> + ask=FALSE
> + )
>
>  Converting data from a STRUCTURE .stru file to a genind object...
>
>> #add xy data as a separate element in the list $other
>> other(data)$xy <- other(data)$X[, 5:6]
>> mode(other(data)$xy) <- "numeric"
>> colnames(other(data)$xy) <- c("x", "y")
>> #define strata
>> strata(data) <- as.data.frame(other(data)$X[, 1:3])
>> nameStrata(data) <-c("sex","phenotype", "HI")
>>
>> # add jitter
>> data$other$xy <-jitter(data$other$xy, factor = 1, amount = NULL)
>
>>  mySpca <- spca(data, ask=FALSE, type=1, scannf=FALSE)
>
>
> Caroline D. Judy
> PhD Candidate (LSU)
> Peter Buck Predoctoral Fellow (NMNH)
> email: judyc at si.edu
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

