[adegenet-forum] li vs. ls in sPCA analysis

Hanan Sela hans at tauex.tau.ac.il
Sun Aug 11 07:21:42 CEST 2013


Hello
I have plotted the first PC of an sPCA analysis using s.value, once with
z=my.pca$li[,1]
and once with z=my.pca$ls[,1]. The patterns seem to differ (see attached
file). I do not understand what the lagged PC represents. What is the
meaning of "denoisified" in the practical day presentation (Google does
not know)? How do I interpret the difference? Please explain.
Thank you

Mr. Hanan Sela Ph.D.
Curator of the Lieberman Cereal Germplasm Bank
The Institute for Cereal Crops Improvement
Tel-Aviv University
P.O. Box 39040
Tel Aviv 69978
Israel

hans at tauex.tau.ac.il
Phone: 972-3-6405773
Cell: 972-50-5727458 , local U.S 17203600603
Fax: 972-3-6407857
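
For readers comparing the two plots: the adegenet documentation for spca describes li as the principal components (scores) and ls as the lag vectors of those components. A minimal base-R sketch of what a lag vector is follows. It assumes row-standardised spatial weights, and the six sites, their chain-like neighbourhood and the score values are all made up for illustration; this is not output from any real analysis.

```r
# Toy example in base R: six sampling sites along a transect (hypothetical).
n <- 6

# Made-up scores on the first sPCA axis, playing the role of my.pca$li[, 1].
li1 <- c(-1.2, -0.8, -1.0, 1.1, 0.7, 1.0)

# Binary neighbourhood matrix: each site is connected to its adjacent sites.
A <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}

# Row-standardised weights (assumption): every row sums to 1.
W <- A / rowSums(A)

# Lagged score of a site = weighted mean of the scores of its neighbours,
# playing the role of my.pca$ls[, 1].
ls1 <- as.vector(W %*% li1)

print(data.frame(site = seq_len(n), li = li1, ls = ls1))
```

Under these assumptions each lagged score is an average over neighbouring sites, so individual irregularities are smoothed and the map of ls tends to look more regular than the map of li. That smoothing is presumably what "denoisified" refers to in the presentation.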


On Thu, Aug 1, 2013 at 7:15 PM, <
adegenet-forum-request at lists.r-forge.r-project.org> wrote:

> Today's Topics:
>
>    1. Fwd: Question about pre-processing of SNP data for machine
>       learning (Daniel Murrell)
>    2. Re: Fwd: Question about pre-processing of SNP data for
>       machine learning (Jombart, Thibaut)
>    3. Re: Fwd: Question about pre-processing of SNP data for
>       machine learning (Daniel Murrell)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 1 Aug 2013 15:26:00 +0100
> From: Daniel Murrell <dsm38 at cam.ac.uk>
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] Fwd: Question about pre-processing of SNP
>         data for        machine learning
> Message-ID:
>         <CADK=3HwmiEO5v6fCQUYNkHFQ520avQJ9LFOAdu=
> Yu-Z+8h7BCg at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi All
>
> This is my first time using adegenet. I'm trying to perform some
> pre-processing on 1.3M SNPs (~800 individuals) so that I can use them for a
> machine learning task. My data was stored in a format which had to be
> converted to a genlight object. The data was split so that the information
> for the SNPs in each chromosome was in a separate file. I've read each file
> in, converted that to a genlight object and then concatenated the genlight
> objects using cbind. All of that seems to work ok (except the position and
> chromosome data went back to NULL during the concatenation and I had to
> reset it on the combined genlight object).
>
> So, now I want to do my own processing on each SNP and when I try to access
> the information for this SNP over the 800 individuals, it takes ages to
> extract. Is this because the encoding is done row wise, and so the whole
> object needs to be decoded for me to get out the information I require? Is
> there a way to transpose this genlight object so that I can access the data
> for a single SNP across all individuals quickly?
>
> Thank you
> Daniel
>
> ---------- Forwarded message ----------
> From: Jombart, Thibaut <t.jombart at imperial.ac.uk>
> Date: Fri, Jul 19, 2013 at 4:27 PM
> Subject: RE: Question about pre-processing of SNP data for machine learning
> To: Daniel Murrell <dsm38 at cam.ac.uk>
>
>
> Dear Daniel,
>
> yes, adegenet is designed for that kind of task. Please look at the
> tutorial on adegenet-basics where you'll find examples of dimension
> reduction for SNP data, to be found on:
> http://adegenet.r-forge.r-project.org/
>
> Don't hesitate to use the adegenet-forum for further questions (see
> contacts on the website).
> Best
> Thibaut
>
> --
> ######################################
> Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary's Campus
> Norfolk Place
> London W2 1PG
> United Kingdom
> Tel. : 0044 (0)20 7594 3658
> t.jombart at imperial.ac.uk
> http://sites.google.com/site/thibautjombart/
> http://adegenet.r-forge.r-project.org/
> ________________________________________
> From: dsmurrell at gmail.com [dsmurrell at gmail.com] on behalf of Daniel
> Murrell
> [dsm38 at cam.ac.uk]
> Sent: 19 July 2013 16:23
> To: Jombart, Thibaut
> Subject: Question about pre-processing of SNP data for machine learning
>
> Dear Thibaut
>
> I'm trying to build a model that uses SNP data as input. The problem I have
> is that there is too much of it and I need a way to reduce the number or
> the dimensionality of the data points so that I can use them as input to
> machine learning algorithms (genome wide, 1.3 million SNPs, 800
> individuals). I've done some searching and found this paper:
> http://www.ncbi.nlm.nih.gov/pubmed/18076475 (pdf attached).
>
> I also found your adegenet package and wondered if it's designed for doing
> something like this? I'm not from this field and I'm having some trouble
> working this out. Can you point me to anything that might help?
>
> I'm not sure whether I should be keeping a subset of SNPs and how to find
> that subset from the 1.3 million, or whether I should be reducing the
> dimensionality.
>
> Thank you
> Daniel
>
> ------------------------------
>
> Message: 2
> Date: Thu, 1 Aug 2013 15:22:27 +0000
> From: "Jombart, Thibaut" <t.jombart at imperial.ac.uk>
> To: Daniel Murrell <dsm38 at cam.ac.uk>,
>         "adegenet-forum at lists.r-forge.r-project.org"
>         <adegenet-forum at lists.r-forge.r-project.org>
> Subject: Re: [adegenet-forum] Fwd: Question about pre-processing of
>         SNP data for    machine learning
> Message-ID:
>         <2CB2DA8E426F3541AB1907F98ABA6570638ABF4F at icexch-m1.ic.ac.uk>
> Content-Type: text/plain; charset="Windows-1252"
>
>
> Dear Daniel,
>
> the loss of attributes after cbind indeed is a glitch. Would you mind
> creating a ticket about it?
> https://sourceforge.net/p/adegenet/tickets/
>
> You're right about the issue. The encoding is indeed done row-wise so the
> conversion is done many times over. There's no option for transposing the
> data, but one solution would be converting your data to integers by blocks
> so that conversion takes place less often, while still keeping RAM
> requirements reasonable.
>
> All the best
>
> Thibaut
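
The block-wise conversion suggested in this reply might be sketched as follows. This is only an illustration under stated assumptions: the tiny genlight object, the block size of 2 and the per-SNP statistic (a column mean) are invented for the example, and a real data set with 1.3M SNPs would use a much larger block size chosen to fit the available RAM.

```r
library(adegenet)

# Hypothetical toy data: 3 individuals x 6 SNPs, genotypes coded 0/1/2.
gen <- list(c(0, 1, 2, 0, 1, 2),
            c(1, 1, 0, 2, 0, 0),
            c(2, 0, 1, 1, 2, 0))
x <- new("genlight", gen, parallel = FALSE)

block.size <- 2                      # assumption: tune to available memory
starts <- seq(1, nLoc(x), by = block.size)
snp.means <- numeric(nLoc(x))        # one result per SNP

for (s in starts) {
  idx <- s:min(s + block.size - 1, nLoc(x))
  # Decode one block of SNPs to an ordinary integer matrix
  # (individuals in rows, SNPs in columns), once per block.
  block <- as.matrix(x[, idx])
  # Any per-SNP processing then works column-wise on the decoded block.
  snp.means[idx] <- colMeans(block, na.rm = TRUE)
}

print(snp.means)
```

The point of the design is that the binary-to-integer decoding happens once per block rather than once per SNP, while only one block is held in memory as integers at a time.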
>
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 1 Aug 2013 17:14:37 +0100
> From: Daniel Murrell <dsm38 at cam.ac.uk>
> To: "Jombart, Thibaut" <t.jombart at imperial.ac.uk>
> Cc: "adegenet-forum at lists.r-forge.r-project.org"
>         <adegenet-forum at lists.r-forge.r-project.org>
> Subject: Re: [adegenet-forum] Fwd: Question about pre-processing of
>         SNP data for machine learning
> Message-ID:
>         <CADK=3Hz=iJSJePuCOSwCkFOQUWHQyAmk+YS=-
> qWD+EO5vOBihA at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Dear Thibaut
>
> Ok, I could try that. I could also try and use the genlight object in a
> transposed manner just for the purposes of holding the data so that I can
> access individual SNPs easily. I mean nothing else would work except the
> containment.
>
> Thanks for the help
> Regards
> Daniel
>
>
> ------------------------------
>
> End of adegenet-forum Digest, Vol 60, Issue 2
> *********************************************
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: li_vs_ls.pdf
Type: application/pdf
Size: 93855 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20130811/8511ba1e/attachment-0001.pdf>