[GenABEL-dev] sequencing, VCF and tabix

Yurii Aulchenko yurii.aulchenko at gmail.com
Wed Dec 15 12:16:06 CET 2010


Something to think about -- if we want to move to sequencing data, we
need some means to read them in.

VCF seems to be one of the formats that may be used for statistical
analyses, but these are big files... Using 'tabix' indexes (below) may
be another out-of-RAM model we should consider next.

Yurii

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2010_11/README.20100804_merged_snp_set

File format
-----------

The variant sites themselves are in vcf format which is documented here
http://vcftools.sourceforge.net/

Each file has a tabix index associated with it allowing subsections to
be downloaded using the tabix program which can be downloaded here:
http://sourceforge.net/projects/samtools/files/tabix/

To install tabix it is best to check out the code from sourceforge and
follow these instructions:

svn co https://samtools.svn.sourceforge.net/svnroot/samtools/trunk/tabix
cd tabix
make
./tabix

Then this style of command will give you subsets of data from the remote files

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2010_11/ALL.2of4intersection.20100804.sites.vcf.gz \
    19:9185575-9186513
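
To make the out-of-RAM idea a bit more concrete, the region query above
could be driven from a script so that only one region at a time is read.
What follows is only a rough sketch in Python, not an existing GenABEL
interface: the function names (tabix_command, parse_vcf_lines,
stream_region) are made up for illustration, and it assumes that the
tabix binary is on the PATH and that the file has the usual VCF layout
of eight tab-separated fixed columns (CHROM, POS, ID, REF, ALT, QUAL,
FILTER, INFO).

```python
import subprocess
from typing import Dict, Iterable, Iterator, List

# The eight fixed columns that start every VCF data line.
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def tabix_command(source: str, region: str) -> List[str]:
    """Build the argument list for 'tabix SOURCE REGION'.

    The -h flag is left out, so tabix prints data lines only.
    """
    return ["tabix", source, region]


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Lazily turn VCF text lines into dicts, one record at a time.

    Header lines (starting with '#') and empty lines are skipped, so the
    parser also works on output produced with 'tabix -h'.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record: Dict[str, object] = dict(zip(VCF_FIXED_COLUMNS, fields[:8]))
        record["POS"] = int(fields[1])  # position as a number, not text
        yield record


def stream_region(source: str, region: str) -> Iterator[Dict[str, object]]:
    """Run tabix on one region and yield its records as they arrive.

    Assumption: 'tabix' is installed and 'source' has a tabix index.
    Nothing beyond the current line is kept in memory.
    """
    proc = subprocess.Popen(
        tabix_command(source, region), stdout=subprocess.PIPE, text=True
    )
    try:
        yield from parse_vcf_lines(proc.stdout)
    finally:
        proc.stdout.close()
        proc.wait()
```

Calling stream_region() with the URL from the command above and the
region "19:9185575-9186513" should then yield one record per variant
site in that interval, without the whole file being downloaded or held
in RAM.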

