From t.jombart at imperial.ac.uk  Mon Jun  1 11:44:33 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 1 Jun 2015 09:44:33 +0000
Subject: [adegenet-forum] DAPC with mtDNA data
In-Reply-To: <CA+LS-JRTFw6KOuRjK6Xbkp9zqMh0hrqevD5Q=-wckUvNx77u2g@mail.gmail.com>
References: <CA+LS-JRTFw6KOuRjK6Xbkp9zqMh0hrqevD5Q=-wckUvNx77u2g@mail.gmail.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3044B@icexch-m1.ic.ac.uk>

Hi Francesca,

I think a bunch of emails have been exchanged on this topic on the forum.

See for instance:
http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2014-May/000838.html

To find them, use the search engine on the adegenet website:
http://adegenet.r-forge.r-project.org/search.html

If you don't find your answer, please repost here.

Best
Thibaut


==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart


________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Francesca Tassi [tssfnc at unife.it]
Sent: 29 May 2015 11:33
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] DAPC with mtDNA data


Hi,

I'm trying to do a DAPC with a sequence matrix of mtDNA.

What is the best way to run the find.clusters procedure and then the DAPC analysis?

Many thanks

Francesca

--
Francesca Tassi, PhD
Dipartimento di Scienze della Vita e Biotecnologie
Università di Ferrara
via Borsari 46
I-44121 Ferrara
Phone: +39 0532 455951  Fax: +39 0532 249761


From simon.crameri at env.ethz.ch  Tue Jun  2 14:03:09 2015
From: simon.crameri at env.ethz.ch (Crameri  Simon)
Date: Tue, 2 Jun 2015 12:03:09 +0000
Subject: [adegenet-forum] DAPC,
	number of retained PCs and number of saved LDFs
References: <9D61E83C-092E-402D-B4D0-BE066EE3E4AA@ethz.ch>
Message-ID: <A437426C-D731-4904-A79F-5BFF88D855EC@ethz.ch>

Hi Thibaut

I have a genetic dataset of 125 individuals belonging to 11 different closely related plant species (saved in the @pop slot) and I would like to model the species-genotype relationship using dapc().

Of course one major issue is to find the best number of retained principal components during the PCA step of DAPC. This is my approach to find the best n.pca:

- create 100 permuted training sets of my complete dataset, each containing 50% of the samples (sampling stratified for @pop since I have groups that contain only very few individuals)
- do DAPC with all 100 training sets  and each time predict the species of the validation samples

dapc.train <- dapc(training.set, n.pca = n.pca, n.da = n.pca)
val <- predict(dapc.train, newdata = validation.set)

- look at the prediction successes, calculate mean overall prediction success over the 100 runs that used the identical n.pca
- do the steps above for say n.pca = 1:30
- select the optimal n.pca for my validated model according to the first local prediction success maximum (alternatively, take the global maximum)

I think this is a similar procedure to doing

optim.a.score(dapc(complete.set, n.pca = 30, n.da = 30), smart = FALSE, n.sim = 100, n.da = 30)

but the resulting best n.pca is somewhat larger if I do it "by hand", and the resulting mean overall prediction successes are much larger than the respective mean a-scores.

Question 1)
Given these different results: where lies the difference between the two approaches (doing it "by hand" or using optim.a.score)? Does my approach make any sense?


In addition, I would like to compare the accuracy of different DAPC models using different datasets. I have a cpDNA dataset and a microsatellite dataset and would like to compare DAPC models that contain one, the other or a combination of both datasets. To do this, I need the best n.pca for each case, found with the same procedure as described above. However, I observe that at n.pca ≥ 10, fewer than n.pca discriminant functions are saved in the case of the cpDNA dataset. This behaviour occurs with some of the training sets only, and causes problems when I want to automate the script for different n.pca. I think this has something to do with the proportion of conserved variance, which reaches >0.98 at n.pca ≥ 10.

Question 2)
Why can't dapc() always save as many discriminant functions as there are available principal components (as indicated in the dapc argument n.da), and why is this the case for some training sets only?

I sent you the data and an R script that hopefully shows the problem.
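
One possible explanation for Question 2, offered as a sketch and not as a confirmed description of dapc() internals: a linear discriminant analysis can produce at most min(number of predictors, number of groups - 1) discriminant functions, so 11 species allow at most 10, and a training set whose group means span fewer dimensions allows fewer still. The example below uses MASS::lda on the built-in iris data (not the data from this post) to show the cap.

```r
library(MASS)  # ships with standard R installations

# Three groups, four predictors: at most min(4, 3 - 1) = 2 discriminant functions
fit3 <- lda(iris[, 1:4], grouping = iris$Species)
n_df3 <- ncol(fit3$scaling)

# Keep only two groups: at most min(4, 2 - 1) = 1 discriminant function
two <- droplevels(iris[iris$Species != "setosa", ])
fit2 <- lda(two[, 1:4], grouping = two$Species)
n_df2 <- ncol(fit2$scaling)

c(three_groups = n_df3, two_groups = n_df2)
```

In an automated loop it may therefore be safer to read the number of functions actually kept from the fitted object (for a dapc object, its n.da element) than to assume it equals the requested value.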


With regards,
Simon

*********************************************
Simon Crameri

PhD student
ETH Zurich
Plant Ecological Genetics


From t.jombart at imperial.ac.uk  Tue Jun  2 15:12:27 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 2 Jun 2015 13:12:27 +0000
Subject: [adegenet-forum] DAPC,
 number of retained PCs and number of saved LDFs
In-Reply-To: <A437426C-D731-4904-A79F-5BFF88D855EC@ethz.ch>
References: <9D61E83C-092E-402D-B4D0-BE066EE3E4AA@ethz.ch>,
 <A437426C-D731-4904-A79F-5BFF88D855EC@ethz.ch>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3065A@icexch-m1.ic.ac.uk>

Hello,

It looks like you have reinvented xvalDapc... maybe worth trying it? ;)

?xvalDapc
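
A minimal sketch of what such a call might look like, using the nancycats data shipped with adegenet and not the dataset from this thread. The 50% training fraction and the range 1:30 mirror the manual procedure described in the question; the seed and the number of replicates are arbitrary choices.

```r
library(adegenet)
data(nancycats)

# xvalDapc() needs a complete numeric matrix, so replace missing values by the mean
x <- scaleGen(nancycats, NA.method = "mean")
grp <- pop(nancycats)

set.seed(1)  # arbitrary, only for reproducibility
xval <- xvalDapc(x, grp,
                 n.pca = 1:30,        # numbers of retained PCs to compare
                 training.set = 0.5,  # fraction of individuals used for training
                 result = "groupMean",
                 n.rep = 20,          # raise to 100 to mirror the manual procedure
                 xval.plot = FALSE)

names(xval)       # summaries of assignment success per number of PCs
xval$DAPC$n.pca   # number of PCs retained in the final DAPC
```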

Cheers
Thibaut



From 16187393 at sun.ac.za  Wed Jun  3 14:57:34 2015
From: 16187393 at sun.ac.za (Phair, D, Mnr <16187393@sun.ac.za>)
Date: Wed, 3 Jun 2015 12:57:34 +0000
Subject: [adegenet-forum] Calculating Distances from a connection Network
Message-ID: <DBXPR07MB016B87B8518C119B6E2E9D48BB40@DBXPR07MB016.eurprd07.prod.outlook.com>

Hi there


I am a Masters student running an MSPA to look for spatial structuring in an invasive species in a South African and Australian context. I am able to run the analyses with no apparent issues, but was wondering if anyone knew of a method to calculate the minimum and maximum distances from a Delaunay triangulation connection network.

I would like to compare my results between South Africa and Australia, but want to be sure the connection extents are comparable, i.e. that the minimum/maximum distances between connected individuals are similar within South Africa and Australia.

I am fairly new to both R and adegenet and so have only a basic working knowledge.


Regards


David Phair

From t.jombart at imperial.ac.uk  Thu Jun  4 12:16:12 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 4 Jun 2015 10:16:12 +0000
Subject: [adegenet-forum] Calculating Distances from a connection Network
In-Reply-To: <DBXPR07MB016B87B8518C119B6E2E9D48BB40@DBXPR07MB016.eurprd07.prod.outlook.com>
References: <DBXPR07MB016B87B8518C119B6E2E9D48BB40@DBXPR07MB016.eurprd07.prod.outlook.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF30973@icexch-m1.ic.ac.uk>

Hi there,

Yes, the simplest way might be to get the adjacency matrix from the graph and use it to select the matching entries of the matrix of geographic distances. Example using nancycats:

## get network
library(adegenet)
data(nancycats)
cn1 <- chooseCN(nancycats@other$xy, ask=FALSE, type=1)

## get adj matrix
M <- neig2mat(nb2neig(cn1))

## get geo dist matrix
G <- as.matrix(dist(other(nancycats)$xy))

## get distances on Delaunay graph
d.delau <- G[M>0]

## range
range(d.delau)
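
To compare two study areas, the same steps can be wrapped in a small helper. The sketch below is base R only; the function name edge_range and the toy coordinates are made up for illustration, and the adjacency matrix is written by hand instead of coming from chooseCN().

```r
# Range of geographic distances over the connected pairs of a network.
# adj: square 0/1 adjacency matrix; xy: two-column matrix of coordinates
edge_range <- function(adj, xy) {
  geo <- as.matrix(dist(xy))
  stopifnot(all(dim(adj) == dim(geo)))
  range(geo[adj > 0])
}

# Toy data: four points on a line, connected as a chain 1-2-3-4
xy <- cbind(x = c(0, 1, 3, 7), y = c(0, 0, 0, 0))
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)   # make the matrix symmetric

edge_range(adj, xy)   # shortest and longest link: 1 and 4
```

Calling this once per dataset (for instance once for South Africa and once for Australia) gives two ranges that can be compared directly, provided both sets of coordinates use the same units.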

Cheers
Thibaut



From laura.benestan at icloud.com  Fri Jun  5 16:41:50 2015
From: laura.benestan at icloud.com (Laura Benestan)
Date: Fri, 05 Jun 2015 10:41:50 -0400
Subject: [adegenet-forum] Find the slope (R square) value for Mantel test
Message-ID: <2D9828EC-D0D1-44DB-863E-7DBE84B02C1A@icloud.com>

Hi, 

I would like to extract the R square value (or determination coefficient) from the Mantel test.
How could I do it after obtaining results from this command:
ibd <- mantel.rtest(dist_geo, dist_fst, 10000)

Thanks,

Laura Benestan
PhD student
Institute of Integrative Biology and Systems (IBIS)
Laboratoire Louis Bernatchez 
Pavillon Charles-Eugène-Marchand
1030 Avenue of Medicine
Université Laval
Quebec
G1V 0A6
Canada
418-265-7756
laura.benestan at icloud.com


From roman.lustrik at biolitika.si  Mon Jun  8 12:25:29 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Mon, 8 Jun 2015 12:25:29 +0200 (CEST)
Subject: [adegenet-forum] Find the slope (R square) value for Mantel test
In-Reply-To: <2D9828EC-D0D1-44DB-863E-7DBE84B02C1A@icloud.com>
References: <2D9828EC-D0D1-44DB-863E-7DBE84B02C1A@icloud.com>
Message-ID: <1173502431.1508091.1433759129869.JavaMail.zimbra@biolitika.si>

All information from the test is stored as elements of the returned list. To see which elements are available, inspect the object's structure with str().

library(ade4)

data(yanomama)
gen <- quasieuclid(as.dist(yanomama$gen))
geo <- quasieuclid(as.dist(yanomama$geo))
r1 <- mantel.rtest(geo,gen)

str(r1)

List of 5
 $ sim   : num [1:99] -0.152 0.272 0.128 -0.198 -0.259 ...
 $ obs   : num 0.51
 $ rep   : int 99
 $ pvalue: num 0.01
 $ call  : language mantel.rtest(m1 = geo, m2 = gen)
 - attr(*, "class")= chr "rtest"


If you want the p-value, use r1$pvalue. Can you explain where the R square (coefficient of determination) comes from in this test?
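
One common reading, stated here as an assumption about what is being asked: the Mantel statistic stored in the obs element is a correlation between the two sets of pairwise distances, so its square plays the role of R square, and the slope can be taken from an ordinary regression of one distance vector on the other. The sketch below uses simulated distances and base R only; all object names are illustrative.

```r
set.seed(42)
xy <- matrix(runif(40), ncol = 2)                             # 20 simulated sites
dist_geo <- dist(xy)                                          # geographic distances
dist_gen <- dist(xy + matrix(rnorm(40, sd = 0.2), ncol = 2))  # noisy "genetic" distances

geo <- as.vector(dist_geo)
gen <- as.vector(dist_gen)

r  <- cor(geo, gen)   # Pearson correlation between the distance vectors
r2 <- r^2             # coefficient of determination
fit <- lm(gen ~ geo)  # regression used only for the slope
slope <- unname(coef(fit)["geo"])

c(r = r, r2 = r2, slope = slope)
```

With an object such as ibd from the original post, the equivalent would be ibd$obs^2. The p-value should still come from the permutation test, because pairwise distances are not independent observations and the regression's own p-value is not valid.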

Cheers,
Roman

----
In god we trust, all others bring data.


From 16187393 at sun.ac.za  Thu Jun 18 10:33:52 2015
From: 16187393 at sun.ac.za (Phair, D, Mnr <16187393@sun.ac.za>)
Date: Thu, 18 Jun 2015 08:33:52 +0000
Subject: [adegenet-forum] Specifying Maximum Distance in a Connection network
Message-ID: <DBXPR07MB016CE47797D0FC8377CCE0E8BA50@DBXPR07MB016.eurprd07.prod.outlook.com>

Hi there


I am a Masters student working on spatial sorting in an invasive bird.

I am looking at comparing patterns of spatial sorting in a species over two continents.

I am not having any issues with running the analysis, but I wondered if it is possible to specify a maximum distance for any of the connection network methods other than the minimum spanning tree, as the connectivity in that is too high.

I.e. I would like to use something like Delaunay triangulation but limit the maximum distance so that it is the same in the South African and Australian datasets.


Regards


David Phair


From t.jombart at imperial.ac.uk  Thu Jun 18 10:53:55 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 18 Jun 2015 08:53:55 +0000
Subject: [adegenet-forum] Specifying Maximum Distance in a Connection
	network
In-Reply-To: <DBXPR07MB016CE47797D0FC8377CCE0E8BA50@DBXPR07MB016.eurprd07.prod.outlook.com>
References: <DBXPR07MB016CE47797D0FC8377CCE0E8BA50@DBXPR07MB016.eurprd07.prod.outlook.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3BE64@icexch-m1.ic.ac.uk>

Hi David, 

yes you can do that, though it is not directly implemented in chooseCN. The idea is to get the adjacency matrix, put '0's where needed, and convert it back to a nb object.

Here's an example:
## load data
> library(adegenet)
> data(nancycats)

## Delaunay triangulation
> cn1 <- chooseCN(nancycats@other$xy,ask=FALSE,type=1)

## that's the adj. matrix
> neig2mat(nb2neig(cn1))
   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1  0 0 0 0 1 0 0 0 1  0  0  1  0  0  0  1  1
2  0 0 0 0 1 0 0 1 1  1  0  0  0  0  0  0  0
3  0 0 0 0 0 1 0 0 0  0  1  0  0  1  0  1  1
4  0 0 0 0 0 0 0 0 0  1  0  1  1  0  1  1  0
5  1 1 0 0 0 1 1 0 1  0  0  0  0  0  0  0  1
6  0 0 1 0 1 0 1 0 0  0  0  0  0  1  0  0  1
7  0 0 0 0 1 1 0 0 0  0  1  0  0  1  0  0  0
8  0 1 0 0 0 0 0 0 1  1  0  1  0  0  0  0  0
9  1 1 0 0 1 0 0 1 0  0  0  1  0  0  0  0  0
10 0 1 0 1 0 0 0 1 0  0  0  1  0  0  0  0  0
11 0 0 1 0 0 0 1 0 0  0  0  0  1  1  1  1  0
12 1 0 0 1 0 0 0 1 1  1  0  0  0  0  0  1  0
13 0 0 0 1 0 0 0 0 0  0  1  0  0  0  1  1  0
14 0 0 1 0 0 1 1 0 0  0  1  0  0  0  0  0  0
15 0 0 0 1 0 0 0 0 0  0  1  0  1  0  0  0  0
16 1 0 1 1 0 0 0 0 0  0  1  1  1  0  0  0  1
17 1 0 1 0 1 1 0 0 0  0  0  0  0  0  0  1  0

## store it
> matConnect <- neig2mat(nb2neig(cn1))

## get geographic distances
> D <- as.matrix(dist(nancycats$other$xy))
> range(D)
[1]   0.0000 369.1358

## new adj. matrix to be pruned
> matConnect2 <- matConnect

## these are links for distances > 150m
> matConnect2[D>150]
  [1] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 [38] 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
 [75] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 0
[186] 0 0 0 0 1

## set these to be all 0
> matConnect2[D>150] <- 0
> library(spdep)

## convert back to nb object
> cn2 <- mat2listw(matConnect2)$neighbours
> cn2
Neighbour list object:
Number of regions: 17 
Number of nonzero links: 60 
Percentage nonzero weights: 20.76125 
Average number of links: 3.529412 

## plot to check the differences
> plot(cn1, coords=nancycats$other$xy)

> plot(cn2, coords=nancycats$other$xy)
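
The pruning step can also be written as a small function so that the same threshold is applied to both datasets. This is a base R sketch; prune_links and the toy data are illustrative names, not part of adegenet or spdep.

```r
# Remove links longer than max_dist from a 0/1 adjacency matrix
prune_links <- function(adj, xy, max_dist) {
  geo <- as.matrix(dist(xy))
  stopifnot(all(dim(adj) == dim(geo)))
  adj[geo > max_dist] <- 0
  adj
}

# Toy data: four points on a line, all pairs connected
xy <- cbind(x = c(0, 1, 3, 7), y = 0)
adj <- matrix(1, 4, 4)
diag(adj) <- 0

pruned <- prune_links(adj, xy, max_dist = 2.5)
sum(pruned) / 2   # number of links kept
```

A point left without any neighbour after pruning may need special handling when the matrix is converted back to an nb object.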

If others think it is useful, post a feature request on github:
https://github.com/thibautjombart/adegenet/issues

Cheers
Thibaut


From Mark.Coulson.ic at uhi.ac.uk  Fri Jun 19 12:23:13 2015
From: Mark.Coulson.ic at uhi.ac.uk (Mark Coulson)
Date: Fri, 19 Jun 2015 10:23:13 +0000
Subject: [adegenet-forum] supplementary individuals
Message-ID: <DB4PR06MB014E6427433C94CC544B220EAA40@DB4PR06MB014.eurprd06.prod.outlook.com>

Hi Thibaut,

I am trying to use the pred.sup function to assign 'test' individuals against my baseline data. Both baseline and supplementary individuals files load fine in adegenet but when I run the pred.sup function I get the following:

Error in predict.dapc(dapc1, newdata=sup):
                Number of variables in newdata does not match original data.


Looking at the dataframes, the baseline is a matrix of 1800 x 70 (which I expect), however the supplementary has 69 columns(?). When inspecting @loc.fac for the supplementary file, I found that locus27 is listed only once, while all others are listed twice. Perhaps a coincidence, but this locus is the only one that is monomorphic in the supplementary individuals while it is polymorphic in the baseline - would this have any effect?

Suggestions?

Thanks,
Mark
Inverness College UHI, a partner in the University of the Highlands and Islands www.inverness.uhi.ac.uk Board of Management of Inverness College (known as Inverness College UHI), Scottish Charity No SC021197.

From t.jombart at imperial.ac.uk  Fri Jun 19 12:41:27 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 19 Jun 2015 10:41:27 +0000
Subject: [adegenet-forum] supplementary individuals
In-Reply-To: <DB4PR06MB014E6427433C94CC544B220EAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
References: <DB4PR06MB014E6427433C94CC544B220EAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3D067@icexch-m1.ic.ac.uk>

Hi Mark

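I think you identified the problem. genind objects keep only the alleles actually observed in the data, so a locus that is monomorphic in the supplementary individuals has one column fewer there.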

You would need to 'repool' your supplementary individuals to make sure loci/alleles match, and then just extract the relevant individuals for the prediction.

Makes sense?

Best
Thibaut



From t.jombart at imperial.ac.uk  Fri Jun 19 12:47:18 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 19 Jun 2015 10:47:18 +0000
Subject: [adegenet-forum] supplementary individuals
In-Reply-To: <DB4PR06MB01499DA92F6664BAB29ED8EEAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
References: <DB4PR06MB014E6427433C94CC544B220EAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
 <2CB2DA8E426F3541AB1907F98ABA6570ABF3D067@icexch-m1.ic.ac.uk>,
 <DB4PR06MB01499DA92F6664BAB29ED8EEAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3D081@icexch-m1.ic.ac.uk>


There is a function 'repool' to do what you need (see ?repool). If A and B are two geninds with different alleles, then it merges the datasets together to have matching alleles and dimensions.

Dropping the locus is possible here indeed, but that's a potentially big loss of information if it is informative in the training set - this locus alone could define the most likely group assignment.

Cheers
Thibaut


________________________________
From: Mark Coulson [Mark.Coulson.ic at uhi.ac.uk]
Sent: 19 June 2015 11:43
To: Jombart, Thibaut
Subject: RE: supplementary individuals

Thanks Thibaut!

Not sure what you mean about the repool. All individuals in the supplementary are fixed '0202'. My initial reaction was to simply drop this locus from both datasets and re-run the DAPC - what's the easiest way to tell adegenet to omit a locus?

Best,
Mark



From t.jombart at imperial.ac.uk  Fri Jun 19 14:47:00 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 19 Jun 2015 12:47:00 +0000
Subject: [adegenet-forum] supplementary individuals
In-Reply-To: <DB4PR06MB014588C87272F8729E23FCDEAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
References: <DB4PR06MB014E6427433C94CC544B220EAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
 <2CB2DA8E426F3541AB1907F98ABA6570ABF3D067@icexch-m1.ic.ac.uk>,
 <DB4PR06MB01499DA92F6664BAB29ED8EEAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
 <2CB2DA8E426F3541AB1907F98ABA6570ABF3D081@icexch-m1.ic.ac.uk>,
 <DB4PR06MB014588C87272F8729E23FCDEAA40@DB4PR06MB014.eurprd06.prod.outlook.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3D0D5@icexch-m1.ic.ac.uk>


Hi there,

(please keep the forum posted)

Easiest way is to subset the individuals you want to keep. genind objects can be subsetted like matrices, i.e. x[i,] where 'x' is your repooled genind and 'i' indicates individuals to keep.
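
Put together, the repool-then-subset route might look like the sketch below. It uses the microbov data from adegenet with an arbitrary split into "baseline" and "supplementary" individuals, not the data from this thread. The values of n.pca and n.da are arbitrary, and the code assumes that repool() keeps the individuals of its first argument ahead of those of the second.

```r
library(adegenet)
data(microbov)

# Arbitrary split: every 10th individual plays the role of a supplementary individual
sup_id <- seq(10, nInd(microbov), by = 10)
A <- microbov[-sup_id, , drop = TRUE]  # baseline; drop = TRUE discards unobserved alleles
B <- microbov[sup_id, , drop = TRUE]   # supplementary

# Pool both sets so that they share the same allele columns
AB <- repool(A, B)

# Split again; plain subsetting keeps all allele columns
base <- AB[seq_len(nInd(A)), ]
sup  <- AB[nInd(A) + seq_len(nInd(B)), ]
ncol(base@tab) == ncol(sup@tab)        # should be TRUE once the columns match

# Fit on the baseline, then assign the supplementary individuals
dapc1 <- dapc(base, n.pca = 20, n.da = 14)
pred  <- predict(dapc1, newdata = sup)
head(pred$assign)
```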

Cheers
Thibaut


________________________________
From: Mark Coulson [Mark.Coulson.ic at uhi.ac.uk]
Sent: 19 June 2015 12:51
To: Jombart, Thibaut
Subject: RE: supplementary individuals

Ok, so I did repool(A,B) and got a matrix with the correct dimensions. How do I extract, say, the last 7 populations? I've used seppop on the combined dataframe now, but obviously repool from here for the supplementary individuals will simply reverse the last action and still give me the wrong locus count.

I have also tried popsub from the poppr package, but with the same result.

Mark

From: Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk]
Sent: 19 June 2015 11:47
To: Mark Coulson; adegenet-forum at lists.r-forge.r-project.org
Subject: RE: supplementary individuals


There is a function 'repool' to do what you need (see ?repool). If A and B are two geninds with different alleles, then it merges the datasets together to have matching alleles and dimensions.

Dropping the locus is possible here indeed, but that's a potentially big loss of information if it is informative in the training set - this locus alone could define the most likely group assignment.
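
For completeness, a hedged sketch of how a locus could be omitted, assuming a recent adegenet (2.x) where the 'loc' argument of the genind subsetting operator accepts locus names; older versions may match 'loc' differently. The microbov dataset and the choice of the first locus are purely illustrative:

```r
library(adegenet)

data(microbov)

# hypothetical locus to omit (stand-in for the monomorphic locus)
drop.me <- locNames(microbov)[1]

# keep every locus except that one
kept <- setdiff(locNames(microbov), drop.me)
reduced <- microbov[loc = kept]

nLoc(microbov)   # number of loci before
nLoc(reduced)    # one locus fewer
```

The same call would have to be applied to both the baseline and the supplementary object so that their columns still match.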

Cheers
Thibaut

________________________________
From: Mark Coulson [Mark.Coulson.ic at uhi.ac.uk]
Sent: 19 June 2015 11:43
To: Jombart, Thibaut
Subject: RE: supplementary individuals
Thanks Thibaut!

Not sure what you mean about the repool. All individuals in the supplementary set are fixed for '0202'. My initial reaction was to simply drop this locus from both datasets and re-run the DAPC. What's the easiest way to tell adegenet to omit a locus?

Best,
Mark


From: Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk]
Sent: 19 June 2015 11:41
To: Mark Coulson; adegenet-forum at lists.r-forge.r-project.org
Subject: RE: supplementary individuals

Hi Mark

I think you identified the problem. genind objects keep only polymorphic sites.

You would need to 'repool' your supplementary individuals to make sure loci/alleles match, and then just extract the relevant individuals for the prediction.
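
The steps above can be sketched as follows. This is only an illustration under stated assumptions: microbov stands in for the real data, every 10th individual is arbitrarily treated as "supplementary", repool is assumed to keep individuals in the order the objects are given, and the DAPC is refitted on the baseline part of the repooled object so that both parts share the same allele columns.

```r
library(adegenet)

data(microbov)

# mimic the problem: drop = TRUE discards alleles absent from each subset,
# so the two objects can end up with different numbers of allele columns
sup.id <- seq(1, nInd(microbov), by = 10)
base <- microbov[-sup.id, , drop = TRUE]
sup  <- microbov[sup.id, , drop = TRUE]
c(baseline = ncol(base@tab), supplementary = ncol(sup@tab))

# repool so that loci/alleles match, then extract each part again
combined <- repool(base, sup)
i.base <- seq_len(nInd(base))
i.sup  <- nInd(base) + seq_len(nInd(sup))
base2 <- combined[i.base, ]
sup2  <- combined[i.sup, ]

# fit on the baseline, predict the supplementary individuals
dapc1 <- dapc(base2, pop = pop(base), n.pca = 20, n.da = 10)
pred <- predict(dapc1, newdata = sup2)

head(pred$assign)               # predicted group for each supplementary individual
round(head(pred$posterior), 3)  # membership probabilities
```

The values of n.pca and n.da are arbitrary here and would need to be chosen for the real data.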

Makes sense?

Best
Thibaut


==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Mark Coulson [Mark.Coulson.ic at uhi.ac.uk]
Sent: 19 June 2015 11:23
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] supplementary individuals

Hi Thibaut,

I am trying to use the pred.sup function to assign 'test' individuals against my baseline data. Both the baseline and the supplementary individuals files load fine in adegenet, but when I run the pred.sup function I get the following:

Error in predict.dapc(dapc1, newdata=sup):
                Number of variables in newdata does not match original data.


Looking at the dataframes, the baseline says it's a matrix of 1800 x 70 (which I expect), however the supplementary says 69 for the latter(?). I have found that when interrogating @loc.fac for the supplementary file, locus27 is only listed once, while all others are listed 2x. Perhaps a coincidence, but this locus is the only one that is monomorphic in the supplementary individuals, whereas it is polymorphic in the baseline. Would this have any effect?
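
A toy illustration of this observation, with made-up genotypes and locus names (nothing here comes from the actual data): a locus that is monomorphic in one object contributes a single allele column, so the two tables end up with different widths.

```r
library(adegenet)

# hypothetical genotypes, alleles separated by "/"
base.df <- data.frame(locus26 = c("1/1", "1/2", "2/2", "1/2"),
                      locus27 = c("1/1", "1/2", "1/2", "2/2"),
                      row.names = paste0("base", 1:4),
                      stringsAsFactors = FALSE)
sup.df <- data.frame(locus26 = c("1/1", "1/2"),
                     locus27 = c("2/2", "2/2"),   # monomorphic here
                     row.names = paste0("sup", 1:2),
                     stringsAsFactors = FALSE)

base <- df2genind(base.df, sep = "/", ploidy = 2)
sup  <- df2genind(sup.df,  sep = "/", ploidy = 2)

dim(base@tab)        # 4 individuals, 4 allele columns
dim(sup@tab)         # 2 individuals, 3 allele columns
table(sup@loc.fac)   # locus27 is listed only once
```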

Suggestions?

Thanks,
Mark
Inverness College UHI, a partner in the University of the Highlands and Islands www.inverness.uhi.ac.uk Board of Management of Inverness College (known as Inverness College UHI), Scottish Charity No SC021197.

From goatsrunfaster at gmail.com  Tue Jun 23 15:08:48 2015
From: goatsrunfaster at gmail.com (Spencer Bruce)
Date: Tue, 23 Jun 2015 09:08:48 -0400
Subject: [adegenet-forum] Compoplot as table
Message-ID: <CAGjKGebrMEuARx_dnSU4C6Y0EXoGX-CHAcqbrhpSSiZ1tG2J4w@mail.gmail.com>

Hello All,

I'm simply looking to get an output using the compoplot function but in the
form of a table with Q values similar to what is produced by STRUCTURE (as
opposed to the visual output). Does anybody have some simple code that will
produce this?

Thanks in advance!

Best!
-Spencer

-- 
Spencer A Bruce
113 Hill St.
Troy, NY 12180
518 225 0787

From roman.lustrik at biolitika.si  Tue Jun 23 15:23:04 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Tue, 23 Jun 2015 15:23:04 +0200 (CEST)
Subject: [adegenet-forum] Compoplot as table
In-Reply-To: <CAGjKGebrMEuARx_dnSU4C6Y0EXoGX-CHAcqbrhpSSiZ1tG2J4w@mail.gmail.com>
References: <CAGjKGebrMEuARx_dnSU4C6Y0EXoGX-CHAcqbrhpSSiZ1tG2J4w@mail.gmail.com>
Message-ID: <360856397.1665492.1435065784970.JavaMail.zimbra@biolitika.si>

You have two options. One is to locally hack the function definition so that it returns the data (it currently returns match.call()); the other is to file a feature request on GitHub (https://github.com/thibautjombart/adegenet/issues).

Cheers, 
Roman 

---- 
In god we trust, all others bring data. 



From t.jombart at imperial.ac.uk  Tue Jun 23 15:16:36 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 23 Jun 2015 13:16:36 +0000
Subject: [adegenet-forum] Compoplot as table
In-Reply-To: <CAGjKGebrMEuARx_dnSU4C6Y0EXoGX-CHAcqbrhpSSiZ1tG2J4w@mail.gmail.com>
References: <CAGjKGebrMEuARx_dnSU4C6Y0EXoGX-CHAcqbrhpSSiZ1tG2J4w@mail.gmail.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF447CE@icexch-m1.ic.ac.uk>

Hi,

yes, the function 'predict' does what you want:

data(H3N2)
pop(H3N2) <- factor(H3N2$other$epid)
dapc1 <- dapc(H3N2, var.contrib=FALSE, scale=FALSE, n.pca=150, n.da=5)
predict(dapc1)
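
A short sketch of how the output above could be turned into a STRUCTURE-like table. The smaller n.pca value and the output file location are arbitrary choices for illustration; the membership probabilities are taken from the 'posterior' element returned by predict.

```r
library(adegenet)

data(H3N2)
pop(H3N2) <- factor(H3N2$other$epid)
dapc1 <- dapc(H3N2, n.pca = 30, n.da = 5)

# membership probabilities: one row per individual, one column per group
q <- predict(dapc1)$posterior

q.table <- data.frame(individual = indNames(H3N2), round(q, 3),
                      check.names = FALSE)
head(q.table)

# optional: save the table (written to a temporary file here)
out.file <- file.path(tempdir(), "dapc_membership.csv")
write.csv(q.table, file = out.file, row.names = FALSE)
```

Each row of q should sum to 1, which is what makes it comparable to the Q values of STRUCTURE.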


Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart



From legrasjl at supagro.inra.fr  Wed Jun 24 16:04:41 2015
From: legrasjl at supagro.inra.fr (Jean-Luc LEGRAS)
Date: Wed, 24 Jun 2015 16:04:41 +0200
Subject: [adegenet-forum] extracting subset of SNPs with the highest weight
Message-ID: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr>

Hello
I am using adegenet 1.4-2 on a set of genomic data. I have converted my data to the PLINK raw format: 326000 SNPs for 82 diploid individuals. Each variant position has an ID of the form chromosome number + coordinates.
I performed a PCA on the genotypes, which nicely separates the main groups, and I wanted to extract the SNPs with the highest contributions (5%) to the PCA to make a subset of the initial genotype matrix. I can obtain the list of SNPs with the highest loadings, but I cannot subset the data with it: when subsetting I obtain an empty list. Is this wrong? Do you have any suggestions?

Thank you in advance.
Best regards.
Jean-Luc
here is the code I used:

GWEVariant <- read.PLINK(file = "GWE.raw", map.file = "GWE.map", multicore = FALSE)

# PCA on the genotypes, keeping 7 axes and the SNP loadings
GWEVariant.PCA <- glPca(GWEVariant, center = TRUE, scale = FALSE, nf = 7, loadings = TRUE, alleleAsUnit = FALSE, useC = TRUE, n.cores = 4, returnDotProd = FALSE, matDotProd = NULL)
DTloadings <- data.frame(GWEVariant@loc.names, GWEVariant.PCA$loadings)

top <- matrix(nrow = 7, ncol = 2)
Mqdiscriminants <- matrix(, ncol = 8)
colnames(Mqdiscriminants) <- colnames(DTloadings)
liste <- list()
# for each axis, keep the SNPs in the lower and upper 2.5% tails of the loadings
for (i in 1:7) {
  top[i, 1] <- quantile(DTloadings[, i+1], probs = .025)
  top[i, 2] <- quantile(DTloadings[, i+1], probs = .975)
  liste <- which(DTloadings[, i+1] < top[i, 1] | DTloadings[, i+1] > top[i, 2])
  Mqdiscriminants <- rbind(Mqdiscriminants, DTloadings[liste, ])
}

Mqdiscriminants <- unique(Mqdiscriminants)
Mqdiscriminants <- na.omit(Mqdiscriminants)

subset <- as.matrix(GWEVariant[, Mqdiscriminants[, 1]])



From t.jombart at imperial.ac.uk  Wed Jun 24 17:00:57 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Wed, 24 Jun 2015 15:00:57 +0000
Subject: [adegenet-forum] extracting subset of SNPs with the highest
 weight
In-Reply-To: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr>
References: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4699D@icexch-m1.ic.ac.uk>

Hi there, 

can you try with 'loadingplot'? It invisibly returns the list of most contributing alleles.
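
A small self-contained sketch of that idea, using simulated genotypes rather than the real PLINK data (the data, object names and the 95% threshold are assumptions made for illustration). The list returned invisibly by loadingplot contains the indices of the variables above the threshold, which can then be used to subset the genlight object:

```r
library(adegenet)

set.seed(1)
# hypothetical toy data: 30 diploid individuals x 200 SNPs coded 0/1/2
geno <- matrix(rbinom(30 * 200, size = 2, prob = 0.3), nrow = 30,
               dimnames = list(paste0("ind", 1:30), paste0("snp", 1:200)))
x <- new("genlight", gen = geno, ploidy = 2, parallel = FALSE)

pca1 <- glPca(x, nf = 2, parallel = FALSE)

# absolute loadings on the first axis; flag the top 5%
contrib <- abs(pca1$loadings[, 1])
top <- loadingplot(contrib, threshold = quantile(contrib, 0.95),
                   lab = locNames(x))

top$var.idx                       # indices of the flagged SNPs
top.snps <- x[, top$var.idx]      # genlight restricted to those SNPs
dim(as.matrix(top.snps))
```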

Best
Thibaut 


From legrasjl at supagro.inra.fr  Thu Jun 25 16:32:03 2015
From: legrasjl at supagro.inra.fr (Jean-Luc LEGRAS)
Date: Thu, 25 Jun 2015 16:32:03 +0200
Subject: [adegenet-forum] extracting subset of SNPs with the highest
	weight
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF4699D@icexch-m1.ic.ac.uk>
References: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr>
 <2CB2DA8E426F3541AB1907F98ABA6570ABF4699D@icexch-m1.ic.ac.uk>
Message-ID: <8544F55D-F649-49DC-B0F8-D4D2741C7C1C@supagro.inra.fr>

Hello
Thank you for your answer and solution.

Indeed, I could obtain a plot and the list of SNPs with the highest contribution using

Axis1 <- loadingplot(abs(GWEVariant.PCA$loadings[,1]), threshold=quantile(abs(DTloadings[, 2]), probs = .95), lab=rownames(GWEVariant.PCA$loadings), cex.lab=0.7, cex.fac=1, lab.jitter=0, main="Loading plot", xlab="SNP positions", ylab="Contributions", srt = 90, adj = c(0, 0.5))

and then

subset <- as.matrix(GWEVariant[,Axis1$var.idx])


Best regards.
Jean-Luc



From t.jombart at imperial.ac.uk  Thu Jun 25 16:36:43 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 25 Jun 2015 14:36:43 +0000
Subject: [adegenet-forum] extracting subset of SNPs with the highest
 weight
In-Reply-To: <8544F55D-F649-49DC-B0F8-D4D2741C7C1C@supagro.inra.fr>
References: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr>
 <2CB2DA8E426F3541AB1907F98ABA6570ABF4699D@icexch-m1.ic.ac.uk>,
 <8544F55D-F649-49DC-B0F8-D4D2741C7C1C@supagro.inra.fr>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF46B04@icexch-m1.ic.ac.uk>

Great, glad to see it worked. 

Best
Thibaut

