[adegenet-forum] $li in sPCA analysis

Valeria Montano mirainoshojo at gmail.com
Wed Sep 4 16:45:40 CEST 2013


Hi Nate,

the $li scores are the scores of each location on a given component, just
as in a classic PCA: they are simply the coordinates of the entities on the
component you are interested in. As the component is centred on zero, the
values are both positive and negative, and they represent the position of a
specific location along that component. This holds for components with
positive eigenvalues and for those with negative eigenvalues, which are
associated with global and local spatial structure, respectively. Whether
there is significant global (positive) or local (negative) structure is
assessed by the global and local rtests (global.rtest and local.rtest),
based on the overall correlation between the genetic data and the spatial
distribution of the localities. Each positive and negative component (with
its own share of genetic variance and Moran's I) is thus a partial
representation of the global and local spatial structure. So in your case,
since you have a significant local structure, you may plot the first,
second, third, etc. negative components one by one and see what pattern
each of them shows. Sometimes there's interesting information in the
smaller components.
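To make the point about scores concrete, here is a minimal sketch in Python (not adegenet's R code) of what a score is: each location's centred profile projected onto a unit-length component axis. The allele frequencies and the axis are hypothetical; in sPCA the axis is chosen by the sPCA criterion rather than by hand, but the scores are still coordinates along that axis. Because the data are centred, the scores average to zero and take both signs, so the sign alone says nothing about global versus local structure.

```python
# Toy illustration in Python (not adegenet code): a component score is the
# coordinate of a location's centred profile on a unit-length axis.

def centre_columns(rows):
    """Subtract each column's mean so that every variable averages zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def scores_on_axis(rows, axis):
    """Project each centred row onto `axis` (assumed to have unit length)."""
    return [sum(v * a for v, a in zip(row, axis)) for row in centre_columns(rows)]

# Hypothetical allele frequencies (2 alleles) at 4 locations, north to south.
freqs = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.4, 0.6],
    [0.2, 0.8],
]
# Hypothetical unit-length axis contrasting allele 1 with allele 2.
axis = [2 ** -0.5, -(2 ** -0.5)]

li = scores_on_axis(freqs, axis)
print([round(s, 3) for s in li])  # [0.495, 0.212, -0.212, -0.495]
print(round(sum(li), 12))         # scores sum to (numerically) zero
```

The northern locations land on the positive side and the southern ones on the negative side purely because of where they sit relative to the mean.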

Ehm, as usual it's a somewhat messy explanation (I am not good at
explaining), but I hope this helps. Otherwise, I hope you will get better
replies.

Ciao

Valeria

On 3 September 2013 14:44, Nathan Truelove
<nathan.truelove at manchester.ac.uk> wrote:

>  Hi Adegenet Forum,
>
>  Thanks in advance to anyone who has some advice to share with the forum
> on sPCA. If you're in a rush, just read the parts in bold.
>
>  *I've been using sPCA to look at spatial genetic patterns among lobster
> populations*. I found significant local structure with the function
> local.rtest and no global structure with global.rtest. I've followed
> Thibaut's advice in his previous sPCA email to the forum and used $li to
> interpret local structure. Using the screeplot function, I selected for
> interpretation the local eigenvalue with the highest levels of negative
> spatial autocorrelation and genetic variance. The $li scores of the
> corresponding component were then used to create an interpolated map.
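As a rough illustration of the two quantities the screeplot separates, the Python sketch below computes Moran's I and the variance of a score vector, using row-standardised weights on a hypothetical ring of six neighbouring locations. It assumes the usual sPCA formulation, in which each eigenvalue equals the variance of the scores multiplied by their Moran's I, so a strong local axis combines high variance with strongly negative autocorrelation. The neighbour structure and score values are made up, and this is not adegenet's implementation.

```python
def ring_neighbours(n):
    """Hypothetical neighbour list: location i is linked to i-1 and i+1 on a ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def row_standardised(neighbours, n):
    """Weight matrix whose rows sum to 1 over each location's neighbours."""
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbours.items():
        for j in nbrs:
            W[i][j] = 1.0 / len(nbrs)
    return W

def variance(x):
    """Population variance (divisor n)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def morans_i(x, W):
    """Moran's I for row-standardised W, where the usual n / S0 factor is 1."""
    m = sum(x) / len(x)
    z = [v - m for v in x]
    n = len(z)
    num = sum(z[i] * W[i][j] * z[j] for i in range(n) for j in range(n))
    return num / sum(v * v for v in z)

W = row_standardised(ring_neighbours(6), 6)
smooth = [2, 1, -1, -2, -1, 1]       # neighbours resemble each other (global-like)
alternating = [1, -1, 1, -1, 1, -1]  # neighbours differ sharply (local-like)

for name, x in [("smooth", smooth), ("alternating", alternating)]:
    I = morans_i(x, W)
    print(name, "I =", I, "variance x I =", variance(x) * I)
# smooth I = 0.5 variance x I = 1.0
# alternating I = -1.0 variance x I = -1.0
```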
>
>  *My question for the forum is*: *What do the positive and negative $li
> values associated with the local eigenvalue mean?* Do they correspond to
> levels of local (positive) and global (negative) scores at each location?
> Or are the $li values associated with the local eigenvalues simply a score
> for detecting local spatial genetic structure among sites and have nothing
> to do with global structure?
>
>  Best Wishes,
>
>  Nate
>
>   On Aug 11, 2013, at 4:35 PM, Jombart, Thibaut wrote:
>
>
> Hello,
>
> I think you attached the wrong file.
>
> Negative score values and local structure are not related. Local
> structure = sharp differences between neighbours. These would be
> overlooked by the lagged vector.
>
> If the structure is clear enough, use $li.
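A tiny Python illustration of why the lagged vector can hide local structure: on a hypothetical chain of seven locations, one location differs sharply from its neighbours. Lagging replaces each value by the mean of its neighbours' values (equal, row-standardised weights are assumed here; adegenet's own weights come from the connection network you supply), so the spike vanishes at its own location and is spread over the adjacent ones.

```python
def lag(x, neighbours):
    """Lagged vector: each value replaced by the mean of its neighbours' values
    (equal, row-standardised weights assumed)."""
    return [sum(x[j] for j in neighbours[i]) / len(neighbours[i]) for i in range(len(x))]

# Hypothetical chain of 7 locations: 0 - 1 - 2 - 3 - 4 - 5 - 6
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}

# Location 3 differs sharply from its neighbours (a local pattern).
spike = [0, 0, 0, 6, 0, 0, 0]
print(lag(spike, chain))  # [0.0, 0.0, 3.0, 0.0, 3.0, 0.0, 0.0]
```

The sharp difference at location 3 is absent from the lagged vector, which is consistent with reading local patterns from the scores themselves.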
>
> As you have many overlapping points, s.value is suboptimal. You should
> consider using colorplot, or interpolated maps. See the tutorial on
> sPCA for some examples:
> http://cran.r-project.org/web/packages/adegenet/vignettes/adegenet-spca.pdf
>
> Best
> Thibaut
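For the interpolated-map route, one simple option is inverse distance weighting (IDW). The Python sketch below is a swapped-in technique for illustration only: the sPCA tutorial may use a different interpolation method, and the site coordinates, scores and `power` parameter are hypothetical. Each grid cell receives a weighted average of the sampled scores, with weights 1 / distance**power.

```python
def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate of a score at `target` (x, y)."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        d2 = (px - target[0]) ** 2 + (py - target[1]) ** 2
        if d2 == 0:
            return float(v)  # target is a sampled site: keep its score
        w = d2 ** (-power / 2)  # equals 1 / distance**power
        num += w * v
        den += w
    return num / den

def idw_grid(points, values, xs, ys, power=2.0):
    """Interpolate onto a regular grid; returns one row per y value."""
    return [[idw(points, values, (x, y), power) for x in xs] for y in ys]

# Hypothetical sampling sites and their scores on one sPCA axis
sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
scores = [1.0, -1.0, 1.0, -1.0]
grid = idw_grid(sites, scores, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0, 2.0])
for row in grid:
    print([round(v, 2) for v in row])
```

The grid can then be passed to any plotting routine that draws filled contours or an image.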
> ________________________________________
> From: dooshra at gmail.com [dooshra at gmail.com] on behalf of Hanan Sela
> [hans at tauex.tau.ac.il]
> Sent: 11 August 2013 12:19
> To: Jombart, Thibaut
> Subject: Re: [adegenet-forum] li vs. ls in sPCA analysis
>
> Hello Thibaut,
> Thank you for the response.
> In the file I have attached, I see that with the $li variable there are no
> negative values in the southern sites, while with the $ls values there are
> negative values in the south. It seems that I see more local spatial
> structure with $ls than with $li. When I tested the data with the local
> test, I got significant results. Which variable is better to present in a
> paper?
> Thank you
> Hanan
> Mr. Hanan Sela Ph.D.
> Curator of the Lieberman Cereal Germplasm Bank
> The Institute for Cereal Crops Improvement
> Tel-Aviv University
> P.O. Box 39040
> Tel Aviv 69978
> Israel
>
> hans at tauex.tau.ac.il
> Phone: 972-3-6405773
> Cell: 972-50-5727458 , local U.S 17203600603
> Fax: 972-3-6407857
>
>
> On Sun, Aug 11, 2013 at 12:37 PM, Jombart, Thibaut
> <t.jombart at imperial.ac.uk> wrote:
> Hello,
>
> the lagged vector is the spatially weighted average of the original
> vector. That is, the lagged score at a given location is the weighted
> average of the scores at neighbouring locations. This basically smooths the
> patterns so that they can be detected / visualized more easily.
>
> Cheers
> Thibaut.
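Thibaut's definition can be written out directly. In this minimal Python sketch, which is not adegenet's code, `weights[i]` is a hypothetical mapping from each neighbour j of location i to a non-negative weight (for instance, larger for closer sites). The weights are rescaled to sum to one per location, and the lagged score is the resulting weighted average of the neighbours' scores.

```python
def lagged_vector(x, weights):
    """Return the spatially lagged version of x.

    weights[i] maps each neighbour j of location i to a non-negative weight;
    weights are rescaled to sum to 1 for each location (row-standardised),
    so lag[i] is a weighted average of x over i's neighbours.
    """
    lag = []
    for i in range(len(x)):
        total = sum(weights[i].values())
        lag.append(sum(w * x[j] for j, w in weights[i].items()) / total)
    return lag

# Hypothetical scores at 4 locations and hypothetical neighbour weights
scores = [1.0, 3.0, 6.0, 2.0]
weights = {
    0: {1: 2.0, 2: 1.0},  # location 1 is assumed closer to 0 than location 2
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0},
}
print(lagged_vector(scores, weights))  # [4.0, 3.5, 2.5, 6.0]
```

Note that a location's own score never enters its lagged value, which is why plots of $li and $ls can look quite different.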
>
> --
> ######################################
> Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary’s Campus
> Norfolk Place
> London W2 1PG
> United Kingdom
> Tel. : 0044 (0)20 7594 3658
> t.jombart at imperial.ac.uk
> http://sites.google.com/site/thibautjombart/
> http://adegenet.r-forge.r-project.org/
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org
> [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Hanan
> Sela [hans at tauex.tau.ac.il]
> Sent: 11 August 2013 06:21
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] li vs. ls in sPCA analysis
>
> Hello
> I have plotted the first PC of an sPCA analysis using s.value, once with
> z=my.pca$li[,1]
> and once with z=my.pca$ls[,1]. The patterns seem to differ (see attached
> file). I do not understand what the lagged PC is representing. What is the
> meaning of "denoisified" in the practical day presentation (Google does
> not know)? How do I interpret the difference? Please explain.
> Thank you
>
> Mr. Hanan Sela Ph.D.
>
>
> On Thu, Aug 1, 2013 at 7:15 PM,
> <adegenet-forum-request at lists.r-forge.r-project.org> wrote:
> Today's Topics:
>
>   1. Fwd: Question about pre-processing of SNP data for machine
>      learning (Daniel Murrell)
>   2. Re: Fwd: Question about pre-processing of SNP data for
>      machine learning (Jombart, Thibaut)
>   3. Re: Fwd: Question about pre-processing of SNP data for
>      machine learning (Daniel Murrell)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 1 Aug 2013 15:26:00 +0100
> From: Daniel Murrell <dsm38 at cam.ac.uk>
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] Fwd: Question about pre-processing of SNP
>        data for machine learning
> Message-ID:
>        <CADK=3HwmiEO5v6fCQUYNkHFQ520avQJ9LFOAdu=Yu-Z+8h7BCg at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi All
>
> This is my first time using adegenet. I'm trying to perform some
> pre-processing on 1.3M SNPs (~800 individuals) so that I can use them for a
> machine learning task. My data was stored in a format which had to be
> converted to a genlight object. The data was split so that the information
> for the SNPs in each chromosome was in a separate file. I've read each file
> in, converted that to a genlight object and then concatenated the genlight
> objects using cbind. All of that seems to work ok (except the position and
> chromosome data went back to NULL during the concatenation and I had to
> reset it on the combined genlight object).
>
> So, now I want to do my own processing on each SNP, and when I try to
> access the information for one SNP over the 800 individuals, it takes ages
> to extract. Is this because the encoding is done row-wise, so the whole
> object needs to be decoded for me to get out the information I require? Is
> there a way to transpose this genlight object so that I can access the data
> for a single SNP across all individuals quickly?
>
> Thank you
> Daniel
>
> ---------- Forwarded message ----------
> From: Jombart, Thibaut <t.jombart at imperial.ac.uk>
> Date: Fri, Jul 19, 2013 at 4:27 PM
> Subject: RE: Question about pre-processing of SNP data for machine learning
> To: Daniel Murrell <dsm38 at cam.ac.uk>
>
>
> Dear Daniel,
>
> yes, adegenet is designed for that kind of task. Please look at the
> adegenet-basics tutorial, where you'll find examples of dimension
> reduction for SNP data; it can be found at:
> http://adegenet.r-forge.r-project.org/
>
> Don't hesitate to use the adegenet-forum for further questions (see
> contacts on the website).
> Best
> Thibaut
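For the dimension-reduction option, one common choice is PCA of the individuals-by-SNPs matrix. The pure-Python sketch below only shows the idea on a hypothetical toy matrix: it centres each SNP column and finds the first principal component by power iteration on the individuals-by-individuals Gram matrix, which stays small when SNPs vastly outnumber individuals. It is far too slow for 1.3 million SNPs; in R, adegenet's glPca function is the one intended for genlight objects.

```python
import math

def centre(X):
    """Centre each column (SNP) of X to mean zero."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def first_pc_scores(X, iters=100):
    """Scores of individuals (rows of X) on the first principal component.

    Power iteration on G = Xc Xc^T (n x n); the leading eigenvector of G,
    scaled by the square root of its eigenvalue, gives the PC1 scores.
    The sign of the result is arbitrary.
    """
    Xc = centre(X)
    n = len(Xc)
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)]
         for i in range(n)]
    # Non-constant start: for centred data the all-ones vector lies in the
    # null space of G, so it would make the iteration collapse to zero.
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(G[i][k] * v[k] for k in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))  # ||G v|| -> eigenvalue as v converges
        v = [x / lam for x in w]
    return [math.sqrt(lam) * x for x in v]

# Hypothetical toy data: 4 individuals x 3 SNPs coded 0/1/2
genotypes = [
    [0, 0, 1],
    [2, 2, 1],
    [1, 1, 1],
    [1, 1, 1],
]
print(first_pc_scores(genotypes))
```

The scores (one number per individual) could then replace the raw SNPs as model inputs, keeping as many components as needed.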
>
> ________________________________________
> From: dsmurrell at gmail.com [dsmurrell at gmail.com] on behalf of Daniel Murrell
> [dsm38 at cam.ac.uk]
> Sent: 19 July 2013 16:23
> To: Jombart, Thibaut
> Subject: Question about pre-processing of SNP data for machine learning
>
> Dear Thibaut
>
> I'm trying to build a model that uses SNP data as input. The problem I have
> is that there is too much of it, and I need a way to reduce the number or
> the dimensionality of the data points so that I can use them as input to
> machine learning algorithms (genome-wide, 1.3 million SNPs, 800
> individuals). I've done some searching and found this paper:
> http://www.ncbi.nlm.nih.gov/pubmed/18076475 (pdf attached).
>
> I also found your adegenet package and wondered whether it's designed for
> something like this. I'm not from this field and I'm having some trouble
> working this out. Can you point me to anything that might help?
>
> I'm not sure whether I should be keeping a subset of SNPs (and, if so, how
> to find that subset among the 1.3 million), or whether I should be
> reducing the dimensionality.
>
> Thank you
> Daniel
>
> ------------------------------
>
> Message: 2
> Date: Thu, 1 Aug 2013 15:22:27 +0000
> From: "Jombart, Thibaut" <t.jombart at imperial.ac.uk>
> To: Daniel Murrell <dsm38 at cam.ac.uk>,
>        "adegenet-forum at lists.r-forge.r-project.org"
>        <adegenet-forum at lists.r-forge.r-project.org>
> Subject: Re: [adegenet-forum] Fwd: Question about pre-processing of
>        SNP data for machine learning
> Message-ID:
>        <2CB2DA8E426F3541AB1907F98ABA6570638ABF4F at icexch-m1.ic.ac.uk>
> Content-Type: text/plain; charset="Windows-1252"
>
>
> Dear Daniel,
>
> the loss of attributes after cbind is indeed a glitch. Would you mind
> creating a ticket about it?
> https://sourceforge.net/p/adegenet/tickets/
>
> You're right about the issue. The encoding is indeed done row-wise, so the
> conversion is done many times over. There's no option for transposing the
> data, but one solution would be to convert your data to integers block by
> block, so that conversion takes place less often while still keeping RAM
> requirements reasonable.
>
> All the best
>
> Thibaut
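The block-wise idea can be sketched in plain Python. This is a simplified model, not genlight's actual storage: each individual's 0/1 genotypes are packed eight per byte in a single row (a row-wise layout like the one Daniel describes), so reading one SNP means touching every row. Decoding a block of SNP columns per pass trades a bounded amount of memory for far fewer passes. The packing scheme, function names and block size are hypothetical; in R the analogous step would presumably be converting column subsets of the genlight object with as.matrix(), one block at a time.

```python
def pack_row(genotypes):
    """Pack one individual's 0/1 genotypes into bytes, 8 SNPs per byte."""
    packed = bytearray((len(genotypes) + 7) // 8)
    for j, g in enumerate(genotypes):
        if g:
            packed[j // 8] |= 1 << (j % 8)
    return bytes(packed)

def unpack_range(packed_row, start, stop):
    """Decode SNPs start..stop-1 from one packed row."""
    return [(packed_row[j // 8] >> (j % 8)) & 1 for j in range(start, stop)]

def iter_snp_blocks(packed_rows, n_snps, block_size):
    """Yield (start, block), where block[k] lists SNP start+k for every individual.

    Each row is decoded once per block rather than once per SNP.
    """
    for start in range(0, n_snps, block_size):
        stop = min(start + block_size, n_snps)
        decoded = [unpack_range(row, start, stop) for row in packed_rows]
        # Transpose: one list per SNP, ordered by individual.
        yield start, [list(col) for col in zip(*decoded)]

# Hypothetical data: 3 individuals x 10 SNPs
rows = [
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
]
packed = [pack_row(r) for r in rows]
for start, block in iter_snp_blocks(packed, n_snps=10, block_size=4):
    for k, snp in enumerate(block):
        print("SNP", start + k, snp)  # per-SNP processing would go here
```

A larger block size means fewer passes over the rows but more decoded values held in memory at once.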
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 1 Aug 2013 17:14:37 +0100
> From: Daniel Murrell <dsm38 at cam.ac.uk>
> To: "Jombart, Thibaut" <t.jombart at imperial.ac.uk>
> Cc: "adegenet-forum at lists.r-forge.r-project.org"
>        <adegenet-forum at lists.r-forge.r-project.org>
> Subject: Re: [adegenet-forum] Fwd: Question about pre-processing of
>        SNP data for machine learning
> Message-ID:
>        <CADK=3Hz=iJSJePuCOSwCkFOQUWHQyAmk+YS=-qWD+EO5vOBihA at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Dear Thibaut
>
> Ok, I could try that. I could also try using the genlight object in a
> transposed manner, just for the purpose of holding the data, so that I can
> access individual SNPs easily. I mean that nothing except holding the data
> would work on such a transposed object.
>
> Thanks for the help
> Regards
> Daniel
>
> ------------------------------
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
> End of adegenet-forum Digest, Vol 60, Issue 2
> *********************************************