[adegenet-forum] addendum to question about how to remove outliers (http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2015-July/001194.html)

Thibaut Jombart thibautjombart at gmail.com
Tue Apr 4 15:18:28 CEST 2017


Hi Ella,

What you want is possible; the workflow would be:
1) get a vector of loci to remove (different options here)
2) subset the genind object to keep the remaining loci

Example using microbov:
> locNames(microbov)
 [1] "INRA63"  "INRA5"   "ETH225"  "ILSTS5"  "HEL5"    "HEL1"    "INRA35"
 [8] "ETH152"  "INRA23"  "ETH10"   "HEL9"    "CSSM66"  "INRA32"  "ETH3"
[15] "BM2113"  "BM1824"  "HEL13"   "INRA37"  "BM1818"  "ILSTS6"  "MM12"
[22] "CSRM60"  "ETH185"  "HAUT24"  "HAUT27"  "TGLA227" "TGLA126" "TGLA122"
[29] "TGLA53"  "SPS115"

Let's say I want to remove all 'INRA ...' markers, and my input is the
allele names:
> x <- grep("INRA", locNames(microbov, TRUE), value = TRUE)
> x
 [1] "INRA63.167" "INRA63.171" "INRA63.173" "INRA63.175" "INRA63.177"
 [6] "INRA63.179" "INRA63.181" "INRA63.183" "INRA63.185" "INRA5.137"
[11] "INRA5.139"  "INRA5.141"  "INRA5.143"  "INRA5.145"  "INRA5.147"
[16] "INRA5.149"  "INRA35.102" "INRA35.104" "INRA35.106" "INRA35.108"
[21] "INRA35.110" "INRA35.114" "INRA35.120" "INRA23.193" "INRA23.197"
[26] "INRA23.199" "INRA23.201" "INRA23.203" "INRA23.205" "INRA23.207"
[31] "INRA23.209" "INRA23.211" "INRA23.213" "INRA23.215" "INRA23.217"
[36] "INRA23.219" "INRA32.160" "INRA32.162" "INRA32.164" "INRA32.166"
[41] "INRA32.168" "INRA32.174" "INRA32.176" "INRA32.178" "INRA32.180"
[46] "INRA32.182" "INRA32.184" "INRA32.186" "INRA32.202" "INRA32.204"
[51] "INRA37.112" "INRA37.114" "INRA37.116" "INRA37.118" "INRA37.120"
[56] "INRA37.122" "INRA37.124" "INRA37.126" "INRA37.128" "INRA37.130"
[61] "INRA37.132" "INRA37.134" "INRA37.136" "INRA37.138" "INRA37.140"
[66] "INRA37.142" "INRA37.144" "INRA37.146" "INRA37.148"

> loc_to_remove <- unique(sub("[.].*", "", x))
> loc_to_remove
[1] "INRA63" "INRA5"  "INRA35" "INRA23" "INRA32" "INRA37"

> loc_to_keep <- setdiff(locNames(microbov), loc_to_remove)
> loc_to_keep
 [1] "ETH225"  "ILSTS5"  "HEL5"    "HEL1"    "ETH152"  "ETH10"   "HEL9"
 [8] "CSSM66"  "ETH3"    "BM2113"  "BM1824"  "HEL13"   "BM1818"  "ILSTS6"
[15] "MM12"    "CSRM60"  "ETH185"  "HAUT24"  "HAUT27"  "TGLA227" "TGLA126"
[22] "TGLA122" "TGLA53"  "SPS115"

> new_data <- microbov[loc = loc_to_keep]
> new_data
/// GENIND OBJECT /////////

 // 704 individuals; 24 loci; 304 alleles; size: 944.1 Kb

 // Basic content
   @tab:  704 x 304 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 5-22)
   @loc.fac: locus factor for the 304 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)

 // Optional content
   @pop: population of each individual (group size range: 30-61)
   @other: a list containing: coun  breed  spe

> locNames(new_data)
 [1] "ETH225"  "ILSTS5"  "HEL5"    "HEL1"    "ETH152"  "ETH10"   "HEL9"
 [8] "CSSM66"  "ETH3"    "BM2113"  "BM1824"  "HEL13"   "BM1818"  "ILSTS6"
[15] "MM12"    "CSRM60"  "ETH185"  "HAUT24"  "HAUT27"  "TGLA227" "TGLA126"
[22] "TGLA122" "TGLA53"  "SPS115"


Done!

The only difference in your case will be the regular expression to use
for your data, likely something like:
sub("_.*", "", x)
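
A minimal sketch of that variant, assuming locus names follow the
"ID_suffix" pattern from your message (e.g. "12706_10"). The vector
ids_to_remove is hypothetical and stands in for the IDs from your own
analysis; loc_names stands in for locNames(dat.s) so the example runs
without adegenet:

```r
# Locus names as in the quoted message; with real data use locNames(dat.s)
loc_names <- c("12706_10", "14223_16", "14481_7")

# Hypothetical vector of locus IDs to drop (the part before the underscore)
ids_to_remove <- c("14223")

# Strip the underscore and everything after it to recover each locus ID
loc_ids <- sub("_.*", "", loc_names)

# Match on the whole ID rather than with grep(), so that "14223" cannot
# accidentally hit a longer name such as "114223_2"
loc_to_remove <- loc_names[loc_ids %in% ids_to_remove]
loc_to_keep <- setdiff(loc_names, loc_to_remove)
loc_to_keep

# With adegenet loaded, the subsetting step would then be:
# new_data <- dat.s[loc = loc_to_keep]
```

Matching with %in% on the stripped IDs is a design choice here: it compares
whole IDs, whereas a plain grep() on the full names would also match
partial strings.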

Otherwise, it should all work fine.
Cheers
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR
+44(0)20 7594 3658


On 30 March 2017 at 21:27, Ella Bowles <bowlese at gmail.com> wrote:
> Hello,
>
> I am trying to remove a set of loci, as in the question posted here:
> http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2015-July/001194.html.
> However, I'm wondering if there is a way to list the exact locus names
> instead of simply the position of the locus. And, what is more, if there is
> a way to provide a vector of loci to remove, with the identifier being only
> part of the full locus name?
>
> Say I have the following
>> dat.s_subset <- dat.s[1:5,1:5]
>> dat.s_subset
> /// GENIND OBJECT /////////
>
>  // 5 individuals; 3 loci; 5 alleles; size: 6.7 Kb
>
>  // Basic content
>    @tab:  5 x 5 matrix of allele counts
>    @loc.n.all: number of alleles per locus (range: 1-2)
>    @loc.fac: locus factor for the 5 columns of @tab
>    @all.names: list of allele names for each locus
>    @ploidy: ploidy of each individual  (range: 2-2)
>    @type:  codom
>    @call: .local(x = x, i = i, j = j, drop = drop)
>
>  // Optional content
>    @pop: population of each individual (group size range: 5-5)
>> locNames(dat.s_subset)
> [1] "12706_10" "14223_16" "14481_7"
>
> As I understand it, if I want to remove locus 14223_16, I can use
>> toRemove=c(2)
>> x=dat.s_subset[loc=-toRemove]
>> x
> /// GENIND OBJECT /////////
>
>  // 5 individuals; 2 loci; 3 alleles; size: 6.3 Kb
>
>  // Basic content
>    @tab:  5 x 3 matrix of allele counts
>    @loc.n.all: number of alleles per locus (range: 1-2)
>    @loc.fac: locus factor for the 3 columns of @tab
>    @all.names: list of allele names for each locus
>    @ploidy: ploidy of each individual  (range: 2-2)
>    @type:  codom
>    @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)
>
>  // Optional content
>    @pop: population of each individual (group size range: 5-5)
>> locNames(x)
> [1] "12706_10" "14481_7"
>
> However, I have thousands of loci, and from the analysis that I have done,
> my vector of loci that I want to remove has the number of the locus before
> the underscore. Is there a way of specifying loci using only this
> information? So, I'd need something like the unix wildcard "*", and to be
> able to say something like toRemove=c(14223*).
>
> I've done a bunch of searching on the web to see if it would be easier to do
> this outside of adegenet, but it seems like it is going to be hard.
>
> Any help would be much appreciated.
>
> Sincerely,
> Ella
>
> --
> Ella Bowles, PhD
> Postdoctoral Researcher
> Department of Biology
> Concordia University
>
> Website: https://ellabowlesphd.wordpress.com/
> Email: bowlese at gmail.com

