[adegenet-forum] addendum to question about how to remove outliers (http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2015-July/001194.html)
Thibaut Jombart
thibautjombart at gmail.com
Tue Apr 4 15:18:28 CEST 2017
Hi Ella,
What you want is possible; the workflow would be:
1) get a vector of loci to remove (different options here)
2) subset the genind
Example using microbov:
> locNames(microbov)
[1] "INRA63" "INRA5" "ETH225" "ILSTS5" "HEL5" "HEL1" "INRA35"
[8] "ETH152" "INRA23" "ETH10" "HEL9" "CSSM66" "INRA32" "ETH3"
[15] "BM2113" "BM1824" "HEL13" "INRA37" "BM1818" "ILSTS6" "MM12"
[22] "CSRM60" "ETH185" "HAUT24" "HAUT27" "TGLA227" "TGLA126" "TGLA122"
[29] "TGLA53" "SPS115"
Let's say I want to remove all 'INRA ...' markers, and my input is the
allele names:
> x <- grep("INRA", locNames(microbov, TRUE), value = TRUE)
> x
[1] "INRA63.167" "INRA63.171" "INRA63.173" "INRA63.175" "INRA63.177"
[6] "INRA63.179" "INRA63.181" "INRA63.183" "INRA63.185" "INRA5.137"
[11] "INRA5.139" "INRA5.141" "INRA5.143" "INRA5.145" "INRA5.147"
[16] "INRA5.149" "INRA35.102" "INRA35.104" "INRA35.106" "INRA35.108"
[21] "INRA35.110" "INRA35.114" "INRA35.120" "INRA23.193" "INRA23.197"
[26] "INRA23.199" "INRA23.201" "INRA23.203" "INRA23.205" "INRA23.207"
[31] "INRA23.209" "INRA23.211" "INRA23.213" "INRA23.215" "INRA23.217"
[36] "INRA23.219" "INRA32.160" "INRA32.162" "INRA32.164" "INRA32.166"
[41] "INRA32.168" "INRA32.174" "INRA32.176" "INRA32.178" "INRA32.180"
[46] "INRA32.182" "INRA32.184" "INRA32.186" "INRA32.202" "INRA32.204"
[51] "INRA37.112" "INRA37.114" "INRA37.116" "INRA37.118" "INRA37.120"
[56] "INRA37.122" "INRA37.124" "INRA37.126" "INRA37.128" "INRA37.130"
[61] "INRA37.132" "INRA37.134" "INRA37.136" "INRA37.138" "INRA37.140"
[66] "INRA37.142" "INRA37.144" "INRA37.146" "INRA37.148"
> loc_to_remove <- unique(sub("[.].*", "", x))
> loc_to_remove
[1] "INRA63" "INRA5" "INRA35" "INRA23" "INRA32" "INRA37"
> loc_to_keep <- setdiff(locNames(microbov), loc_to_remove)
> loc_to_keep
[1] "ETH225" "ILSTS5" "HEL5" "HEL1" "ETH152" "ETH10" "HEL9"
[8] "CSSM66" "ETH3" "BM2113" "BM1824" "HEL13" "BM1818" "ILSTS6"
[15] "MM12" "CSRM60" "ETH185" "HAUT24" "HAUT27" "TGLA227" "TGLA126"
[22] "TGLA122" "TGLA53" "SPS115"
> new_data <- microbov[loc = loc_to_keep]
> new_data
/// GENIND OBJECT /////////
// 704 individuals; 24 loci; 304 alleles; size: 944.1 Kb
// Basic content
@tab: 704 x 304 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 5-22)
@loc.fac: locus factor for the 304 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)
// Optional content
@pop: population of each individual (group size range: 30-61)
@other: a list containing: coun breed spe
> locNames(new_data)
[1] "ETH225" "ILSTS5" "HEL5" "HEL1" "ETH152" "ETH10" "HEL9"
[8] "CSSM66" "ETH3" "BM2113" "BM1824" "HEL13" "BM1818" "ILSTS6"
[15] "MM12" "CSRM60" "ETH185" "HAUT24" "HAUT27" "TGLA227" "TGLA126"
[22] "TGLA122" "TGLA53" "SPS115"
Done!
The only difference in your case will be the regular expression to use
for your data, likely something like:
sub("_.*", "", x)
Otherwise, it should all work fine.
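
As a minimal sketch of that adaptation in base R, assuming locus names of the form "<identifier>_<position>" as in the question: loc_names and ids_to_remove are hypothetical stand-ins (the first would come from locNames(dat.s) on the real data, the second from your own analysis), and the final subsetting step is left as a comment because it needs the actual genind object.

```r
# Hypothetical locus names, copied from the small example in the question;
# on real data these would come from locNames(dat.s)
loc_names <- c("12706_10", "14223_16", "14481_7")

# Hypothetical vector of identifiers to remove (the part before the underscore)
ids_to_remove <- c("14223")

# Strip everything from the first underscore onwards to recover the identifier
loc_ids <- sub("_.*", "", loc_names)

# Keep only the loci whose identifier is not in the removal vector
loc_to_keep <- loc_names[!loc_ids %in% ids_to_remove]
loc_to_keep

# With adegenet loaded, the subsetting step would then be:
# new_data <- dat.s[loc = loc_to_keep]
```

Comparing whole identifiers with %in% rather than using a wildcard pattern avoids partial matches, so "14223" would not also remove a locus named, say, "142231_4".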
Cheers
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR
+44(0)20 7594 3658
On 30 March 2017 at 21:27, Ella Bowles <bowlese at gmail.com> wrote:
> Hello,
>
> I am trying to remove a set of loci, as in the question posted here:
> http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2015-July/001194.html.
> However, I'm wondering if there is a way to list the exact locus names
> instead of simply the position of the locus. And, what is more, if there is
> a way to provide a vector of loci to remove, with the identifier being only
> part of the full locus name?
>
> Say I have the following
>> dat.s_subset <- dat.s[1:5,1:5]
>> dat.s_subset
> /// GENIND OBJECT /////////
>
> // 5 individuals; 3 loci; 5 alleles; size: 6.7 Kb
>
> // Basic content
> @tab: 5 x 5 matrix of allele counts
> @loc.n.all: number of alleles per locus (range: 1-2)
> @loc.fac: locus factor for the 5 columns of @tab
> @all.names: list of allele names for each locus
> @ploidy: ploidy of each individual (range: 2-2)
> @type: codom
> @call: .local(x = x, i = i, j = j, drop = drop)
>
> // Optional content
> @pop: population of each individual (group size range: 5-5)
>> locNames(dat.s_subset)
> [1] "12706_10" "14223_16" "14481_7"
>
> As I understand it, if I want to remove locus 14223_16, I can use
>> toRemove=c(2)
>> x=dat.s_subset[loc=-toRemove]
>> x
> /// GENIND OBJECT /////////
>
> // 5 individuals; 2 loci; 3 alleles; size: 6.3 Kb
>
> // Basic content
> @tab: 5 x 3 matrix of allele counts
> @loc.n.all: number of alleles per locus (range: 1-2)
> @loc.fac: locus factor for the 3 columns of @tab
> @all.names: list of allele names for each locus
> @ploidy: ploidy of each individual (range: 2-2)
> @type: codom
> @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)
>
> // Optional content
> @pop: population of each individual (group size range: 5-5)
>> locNames(x)
> [1] "12706_10" "14481_7"
>
> However, I have thousands of loci, and from the analysis that I have done,
> my vector of loci that I want to remove has the number of the locus before
> the underscore. Is there a way of specifying loci using only this
> information? So, I'd need something like the unix wildcard "*", and to be
> able to say something like toRemove=c(14223*).
>
> I've done a bunch of searching on the web to see if it would be easier to do
> this outside of adegenet, but it seems like it is going to be hard.
>
> Any help would be much appreciated.
>
> Sincerely,
> Ella
>
> --
> Ella Bowles, PhD
> Postdoctoral Researcher
> Department of Biology
> Concordia University
>
> Website: https://ellabowlesphd.wordpress.com/
> Email: bowlese at gmail.com
>