[GenABEL-dev] sequencing, VCF and tabix
Yurii Aulchenko
yurii.aulchenko at gmail.com
Wed Dec 15 12:16:06 CET 2010
Something to think about -- if we want to move to sequencing data, we
need some means to read them in.
VCF seems to be one of the formats that may be used for statistical
analyses, but these are big files... Using 'tabix' indexes (below) may
be another out-of-RAM model we should consider next.
Yurii
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2010_11/README.20100804_merged_snp_set
File format
-----------
The variant sites themselves are in vcf format which is documented here
http://vcftools.sourceforge.net/
Each file has a tabix index associated with it allowing subsections to
be downloaded using the tabix program which can be downloaded here:
http://sourceforge.net/projects/samtools/files/tabix/
To install tabix it is best to check out the code from SourceForge and
follow these instructions:
svn co https://samtools.svn.sourceforge.net/svnroot/samtools/trunk/tabix
cd tabix
make
./tabix
Then this style of command will give you subsets of data from the remote files
tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2010_11/ALL.2of4intersection.20100804.sites.vcf.gz 19:9185575-9186513
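
To make the region syntax concrete, here is a minimal Python sketch of what such a query selects. It is not how tabix works internally: tabix uses its index to jump straight to the relevant compressed blocks, while this sketch scans every line. It only shows the meaning of a "chrom:start-end" region (1-based, inclusive) applied to VCF site lines. The function names and the keep_header parameter are hypothetical, chosen for illustration; keep_header loosely mimics the -h flag in the command above.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split 'chrom:start-end' into (chrom, start, end); 1-based, inclusive."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def filter_vcf_lines(lines: Iterable[str], region: str,
                     keep_header: bool = True) -> Iterator[str]:
    """Yield VCF lines whose CHROM/POS fall inside the region.

    Linear scan over a stream: only one line is held in memory at a time,
    but every line is read (no index is used, unlike tabix).
    """
    chrom, start, end = parse_region(region)
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column-header lines.
            if keep_header:
                yield line
            continue
        fields = line.split("\t", 2)  # only CHROM and POS are needed
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line


def filter_vcf_file(path: str, region: str) -> Iterator[str]:
    """Stream a local gzip/bgzip-compressed VCF file through the filter."""
    with gzip.open(path, "rt") as handle:
        yield from filter_vcf_lines(handle, region)
```

For example, assuming a local copy of the sites file saved under the hypothetical name sites.vcf.gz, the call filter_vcf_file("sites.vcf.gz", "19:9185575-9186513") would iterate over the header plus the sites in that interval. For files of this size the indexed tabix lookup should be far faster; the sketch is only meant to show what an out-of-RAM reader has to return.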