Welcome to VCFPy’s documentation!

VCFPy is a Python 3 library with good support for both reading and writing VCF files. The documentation is split into three parts (accessible through the navigation on the left):

Installation & Getting Started
Instructions for the installation of the module and some examples to get you started.
API Documentation
This section contains the API documentation for the module
Project Info
More information on the project, including the changelog, list of contributing authors, and contribution instructions.

Quick Example

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import vcfpy

# Open input, add FILTER header, and open output file
reader = vcfpy.Reader.from_path('input.vcf')
reader.header.add_filter_line(vcfpy.OrderedDict([
    ('ID', 'DP10'), ('Description', 'total DP < 10')]))
writer = vcfpy.Writer.from_path('/dev/stdout', reader.header)

# Add "DP10" filter to records having less than 10 reads
for record in reader:
    ad = sum(c.data.get('DP', 0) for c in record.calls)
    if ad < 10:
        record.add_filter('DP10')
    writer.write_record(record)

Features

  • Support for reading and writing VCF v4.3
  • Interface to INFO and FORMAT fields is based on OrderedDict allows for easier modification than PyVCF (also I find this more pythonic)
  • Read (and jump in) and write BGZF files just using vcfpy

Frequently Asked Questions

Why another Python library for VCF?

I’ve been using PyVCF with quite some success in the past. However, the main bottleneck of PyVCF is when you want to modify the per-sample genotype information. There are some issues in the tracker of PyVCF but none of them can really be considered solved. I tried several hours to solve these problems within PyVCF but this never got far or towards a complete rewrite...

For this reason, VCFPy was born and here it is!

Why Python 3 only?
As I’m only using Python 3 code, I see no advantage in carrying around support for legacy Python 2 and maintaining it. At a later point when VCFPy is known to be stable, Python 2 support might be added if someone contributes a pull request.
What’s the state?
VCFPy is the result of two full days of development plus some maintenance work later now (right now). I’m using it in several projects but it is not as battle-tested as PyVCF.
What’s the difference to PyVCF?
The main difference is technical. Instead of using collections.namedtuple for storing the call annotation, VCFPy uses collections.OrderedDict. This has the advantage that (1) access to optional settings is much more pythonic using .get(KEY, DEFAULT) instead of getattr(). Further, (2) adding call annotations (FORMAT) fields is able without any performance penalty where for PyVCF, copy.deepcopy has to be used at some point which is very slow. There has not been any movement in supporting modifying FORMAT fields in PyVCF and here is a library that does this well.
What’s the aim?
The aim of the project is to provide simple yet efficient read and write access to VCF files. Eventually, PySAM will probably be a better choice once it has a Python wrapper for the VCF part of htslib. However, as this is still misssing, VCFPy is a good solution for the time being.

Installation

Stable release

To install vcfpy, run this command in your terminal:

$ pip install vcfpy

This is the preferred method to install VCFPy, as it will always install the most recent stable release.

If you don’t have pip installed, this Python installation guide can guide you through the process.

From sources

The sources for vcfpy can be downloaded from the Github repo.

You can either clone the public repository:

$ git clone git://github.com/bihealth/vcfpy

Or download the tarball:

$ curl  -OL https://github.com/bihealth/vcfpy/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install

Getting Started

After installation, you can use VCFPy in your project simply by importing the module.

import vcfpy

That’s all, continue and look at the list of examples.

Examples

This chapter contains several examples for the most important use cases of VCFPy.

Reading VCF Files

The following is an example for reading VCF files and writing out a TSV file with the genotype calls of all SNVs. You can find the example Python and VCF file in the sources below the directory examples/vcf_to_tsv.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import vcfpy

# Open file, this will read in the header
reader = vcfpy.Reader.from_path('input.vcf')

# Build and print header
header = ['#CHROM', 'POS', 'REF', 'ALT'] + reader.header.samples.names
print('\t'.join(header))

for record in reader:
    if not record.is_snv():
        continue
    line = [record.CHROM, record.POS, record.REF]
    line += [alt.value for alt in record.ALT]
    line += [call.data.get('GT') or './.' for call in record.calls]
    print('\t'.join(map(str, line)))

The program call looks as follows.

$ ./vcf_to_tsv.py
#CHROM	POS	REF	ALT	BLANK	NA12878	NA12891	NA12892	NA19238	NA19239	NA19240
chr22	42522392	G	A	0/0	0/1	0/1	0/0	0/0	0/0	0/0
chr22	42522597	C	T	0/1	0/0	0/0	0/0	0/0	0/0	0/0
chr22	42522613	G	C	0/1	0/1	0/0	0/1	0/1	0/1	0/1
chr22	42523003	A	G	0/1	1/1	0/1	0/1	0/1	0/1	0/1
chr22	42523209	T	C	0/1	1/1	0/1	0/1	0/1	0/1	0/1
chr22	42523211	T	C	0/0	0/1	0/1	0/0	0/0	0/0	0/0
chr22	42523409	G	T	0/1	0/1	0/0	0/1	0/1	0/1	0/1
chr22	42523491	C	T	0/1	0/0	0/0	0/0	0/0	0/0	0/0
chr22	42523507	A	G	0/1	0/0	0/0	0/0	0/0	0/0	0/0
chr22	42523805	C	T	0/0	0/0	0/1	0/0	0/0	0/0	0/0
chr22	42523943	A	G	0/1	1/1	0/1	0/1	0/1	0/1	0/1
chr22	42524435	T	A	0/1	0/0	0/0	0/0	0/0	0/1	0/1
[...]

Writing VCF Files

The following shows how to add values to the FILTER column to records of an existing VCF file. Adding to existing records is simpler than constructing them from scratch, of course.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import vcfpy

# Open input, add FILTER header, and open output file
reader = vcfpy.Reader.from_path('input.vcf')
reader.header.add_filter_line(vcfpy.OrderedDict([
    ('ID', 'DP10'), ('Description', 'total DP < 10')]))
writer = vcfpy.Writer.from_path('/dev/stdout', reader.header)

# Add "DP10" filter to records having less than 10 reads
for record in reader:
    ad = sum(c.data.get('DP', 0) for c in record.calls)
    if ad < 10:
        record.add_filter('DP10')
    writer.write_record(record)

The program call looks as follows.

##fileformat=VCFv4.3
##contig=<ID=20,length=62435964>
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
##FILTER=<ID=DP10,Description="total DP < 10">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
20	14370	rs6054257	G	A	29	PASS	NS=3;DP=14;AF=0.5;DB;H2	GT:GQ:DP:HQ	0|0:48:1:51,51	1|0:48:8:51,51	1/1:43:5:.,.
20	17330	.	T	A	3	q10	NS=3;DP=11;AF=0.017	GT:GQ:DP:HQ	0|0:49:3:58,50	0|1:3:5:65,3	0/0:41:3:40,30
20	1110696	rs6040355	A	G,T	67	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4:40,30
20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:7:56,60	0|0:48:4:51,51	0/0:61:2:40,30
20	1234567	microsat1	GTC	G,GTCT	50	PASS;DP10	NS=3;DP=9;AA=G	GT:GQ:DP	0/1:35:4	0/2:17:2	1/1:40:3

Jumping in Tabix-indexed Files

The following shows a small program that extracts a genomic region from the input VCF file and writes it to stdout.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import vcfpy

# Open input, add FILTER header, and open output file
reader = vcfpy.Reader.from_path('input.vcf.gz')
writer = vcfpy.Writer.from_path('/dev/stdout', reader.header)

# Fetch region 20:1,110,694-1,230,237.  Note that the coordinates
# in the API call are zero-based and describe half-open intervals.
for record in reader.fetch('20', 1110695, 1230237):
    writer.write_record(record)

The program call looks as follows.

##fileformat=VCFv4.3
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=20,length=62435964>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
20	1110696	rs6040355	A	G,T	67	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4:40,30
20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:7:56,60	0|0:48:4:51,51	0/0:61:2:40,30

Best Practice

While not strictly part of the documentation of VCFPy, we include some notes on hints that we consider best practice when building VCF processing applications.

Keep Input Verbatim Where Possible

Try to keep the input verbatim if there is no strong reason for adjusting it. Strong reasons include fixing Type or Number in header lines describing arrays of strings, for example.

Whenever possible, keep the header order intact. VCFPy does this automatically for you (in contrast to PyVCF).

Prefer Soft-Filters over Hard-Filters

Soft-filters mean annotating your VCF records in the FILTER column whereas Hard-filters mean removing records from VCF file. In many situations, it is useful to keep around all VCF records and just annotate why they are to be dropped. Then, in the last step, only the interesting ones are kept.

This makes tracing back easier when and why a record was removed.

Input/Output

vcfpy.Reader

class vcfpy.Reader(stream, path=None, tabix_path=None, record_checks=None, parse_samples=None)[source]

Class for parsing of files from file-like objects

Instead of using the constructor, use the class methods from_stream() and from_path().

On construction, the header will be read from the file which can cause problems. After construction, Reader can be used as an iterable of Record.

Raises:InvalidHeaderException in the case of problems reading the header

Note

It is important to note that the header member is used during the parsing of the file. If you need a modified version then create a copy, e.g., using :py:method:`~vcfpy.header.Header.copy`.

Note

If you use the parse_samples feature and you write out records then you must not change the FORMAT of the record.

close()[source]

Close underlying stream

fetch(chrom_or_region, begin=None, end=None)[source]

Jump to the start position of the given chromosomal position and limit iteration to the end position

Parameters:
  • chrom_or_region (str) – name of the chromosome to jump to if begin and end are given and a samtools region string otherwise (e.g. “chr1:123,456-123,900”).
  • begin (int) – 0-based begin position (inclusive)
  • end (int) – 0-based end position (exclusive)
classmethod from_path(klass, path, tabix_path=None, record_checks=None, parse_samples=None)[source]

Create new Reader from path

Note

If you use the parse_samples feature and you write out records then you must not change the FORMAT of the record.

Parameters:
  • path – the path to load from (converted to str for compatibility with path.py)
  • tabix_path – optional string with path to TBI index, automatic inferral from path will be tried on the fly if not given
  • record_checks (list) – record checks to perform, can contain ‘INFO’ and ‘FORMAT’
classmethod from_stream(klass, stream, path=None, tabix_path=None, record_checks=None, parse_samples=None)[source]

Create new Reader from file

Note

If you use the parse_samples feature and you write out records then you must not change the FORMAT of the record.

Parameters:
  • streamfile-like object to read from
  • path – optional string with path to store (for display only)
  • record_checks (list) – record checks to perform, can contain ‘INFO’ and ‘FORMAT’
  • parse_samples (list) – list of str values with names of samples to parse call information for (for speedup); leave to None for ignoring
header = None

the Header

parse_samples = None

if set, list of samples to parse for

parser = None

the parser to use

path = None

optional str with the path to the stream

record_checks = None

checks to perform on records, can contain ‘FORMAT’ and ‘INFO’

stream = None

stream (file-like object) to read from

tabix_file = None

the pysam.TabixFile used for reading from index bgzip-ed VCF; constructed on the fly

tabix_path = None

optional str with path to tabix file

vcfpy.Writer

class vcfpy.Writer(stream, header, path=None)[source]

Class for writing VCF files to file-like objects

Instead of using the constructor, use the class methods from_stream() and from_path().

The writer has to be constructed with a Header object and the full VCF header will be written immediately on construction. This, of course, implies that modifying the header after construction is illegal.

close()[source]

Close underlying stream

classmethod from_path(klass, path, header)[source]

Create new Writer from path

Parameters:
  • path – the path to load from (converted to str for compatibility with path.py)
  • header – VCF header to use, lines and samples are deep-copied
classmethod from_stream(klass, stream, header, path=None, use_bgzf=None)[source]

Create new Writer from file

Note that for getting bgzf support, you have to pass in a stream opened in binary mode. Further, you either have to provide a path ending in ".gz" or set use_bgzf=True. Otherwise, you will get the notorious “TypeError: ‘str’ does not support the buffer interface”.

Parameters:
  • streamfile-like object to write to
  • header – VCF header to use, lines and samples are deep-copied
  • path – optional string with path to store (for display only)
  • use_bgzf – indicator whether to write bgzf to stream if True, prevent if False, interpret path if None
header = None

the :py:class:~vcfpy.header.Header` to write out, will be deep-copied into the Writer on initialization

path = None

optional str with the path to the stream

stream = None

stream (file-like object) to read from

write_record(record)[source]

Write out the given vcfpy.record.Record to this Writer

Exceptions

vcfpy.VCFPyException

exception vcfpy.VCFPyException[source]

Base class for module’s exception

vcfpy.InvalidHeaderException

exception vcfpy.InvalidHeaderException[source]

Raised in the case of invalid header formatting

vcfpy.InvalidRecordException

exception vcfpy.InvalidRecordException[source]

Raised in the case of invalid record formatting

vcfpy.IncorrectVCFFormat

exception vcfpy.IncorrectVCFFormat[source]

Raised on problems parsing VCF

vcfpy.HeaderNotFound

exception vcfpy.HeaderNotFound[source]

Raised when a VCF header could not be found

Records

vcfpy.Record

class vcfpy.Record(CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, calls)[source]

Represent one record from the VCF file

Record objects are iterators of their calls

ALT = None

A list of alternative allele records of type AltRecord

CHROM = None

A str with the chromosome name

FILTER = None

A list of strings for the FILTER column

FORMAT = None

A list of strings for the FORMAT column

ID = None

A list of the semicolon-separated values of the ID column

INFO = None

An OrderedDict giving the values of the INFO column, flags are mapped to True

POS = None

An int with a 1-based begin position

QUAL = None

The quality value, can be None

REF = None

A str with the REF value

add_filter(label)[source]

Add label to FILTER if not set yet, removing PASS entry if present

add_format(key, value=None)[source]

Add an entry to format

The record’s calls data[key] will be set to value if not yet set and value is not None. If key is already in FORMAT then nothing is done.

affected_end

Return affected start position in 0-based coordinates

For SNVs, MNVs, and deletions, the behaviour is based on the start position and the length of the REF. In the case of insertions, the position behind the insert position is returned, yielding a 0-length interval together with affected_start()

affected_start

Return affected start position in 0-based coordinates

For SNVs, MNVs, and deletions, the behaviour is the start position. In the case of insertions, the position behind the insert position is returned, yielding a 0-length interval together with affected_end()

begin = None

An int with a 0-based begin position

call_for_sample = None

A mapping from sample name to entry in self.calls

calls = None

A list of genotype Call objects

end = None

An int with a 0-based end position

is_snv()[source]

Return True if it is a SNV

vcfpy.Call

class vcfpy.Call(sample, data, site=None)[source]

The information for a genotype callable

By VCF, this should always include the genotype information and can contain an arbitrary number of further annotation, e.g., the coverage at the variant position.

called = None

whether or not the variant is fully called

data = None

an OrderedDict with the key/value pair information from the call’s data

gt_alleles = None

the allele numbers (0, 1, ...) in this calls or None for no-call

gt_bases

Return the actual genotype bases, e.g. if VCF genotype is 0/1, could return (‘A’, ‘T’)

gt_phase_char

Return character to use for phasing

gt_type

The type of genotype, returns one of HOM_REF, HOM_ALT, and HET.

is_filtered(require=None, ignore=None)[source]

Return True for filtered calls

Parameters:
  • ignore (iterable) – if set, the filters to ignore, make sure to include ‘PASS’, when setting, default is ['PASS']
  • require (iterable) – if set, the filters to require for returning True
is_het

Return True for heterozygous calls

is_phased

Return boolean indicating whether this call is phased

is_variant

Return True for non-hom-ref calls

plodity = None

the number of alleles in this sample’s call

sample = None

the name of the sample for which the call was made

site = None

the Record of this Call

vcfpy.AltRecord

class vcfpy.AltRecord(type_=None)[source]

An alternative allele Record

Currently, can be a substitution, an SV placeholder, or breakend

serialize()[source]

Return str with representation for VCF file

type = None

String describing the type of the variant, could be one of SNV, MNV, could be any of teh types described in the ALT header lines, such as DUP, DEL, INS, ...

vcfpy.Substitution

class vcfpy.Substitution(type_, value)[source]

A basic alternative allele record describing a REF->AltRecord substitution

Note that this subsumes MNVs, insertions, and deletions.

value = None

The alternative base sequence to use in the substitution

vcfpy.SV

vcfpy.SV

vcfpy.BreakEnd

class vcfpy.BreakEnd(mate_chrom, mate_pos, orientation, mate_orientation, sequence, within_main_assembly)[source]

A placeholder for a breakend

mate_chrom = None

chromosome of the mate breakend

mate_orientation = None

orientation breakend’s mate

mate_pos = None

position of the mate breakend

orientation = None

orientation of this breakend

sequence = None

breakpoint’s connecting sequence

serialize()[source]

Return string representation for VCF

within_main_assembly = None

bool specifying if the breakend mate is within the assembly (True) or in an ancillary assembly (False)

vcfpy.SingleBreakEnd

class vcfpy.SingleBreakEnd(orientation, sequence)[source]

A placeholder for a single breakend

vcfpy.SymbolicAllele

class vcfpy.SymbolicAllele(value)[source]

A placeholder for a symbolic allele

The allele symbol must be defined in the header using an ALT header before being parsed. Usually, this is used for succinct descriptions of structural variants or IUPAC parameters.

value = None

The symbolic value, e.g. ‘DUP’

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at https://github.com/bihealth/vcfpy/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.
Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation

vcfpy could always use more documentation, whether as part of the official vcfpy docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/bihealth/vcfpy/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up vcfpy for local development.

  1. Fork the vcfpy repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/vcfpy.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv vcfpy
    $ cd vcfpy/
    $ python setup.py develop
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 vcfpy tests
    $ python setup.py test or py.test
    $ tox
    

    To get flake8 and tox, just pip install them into your virtualenv.

  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.
  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
  3. The pull request should work for Python 3.3, 3.4 and 3.5. Check https://travis-ci.org/bihealth/vcfpy/pull_requests and make sure that the tests pass for all supported Python versions.

Tips

To run a subset of tests:

$ py.test tests.test_vcfpy

Credits

Development Lead

Contributors

None yet. Why not be the first?

History

0.10.0 (2017-02-17)

  • Extending API to allow for reading subsets of records. (Writing for sample subsets or reordered samples is possible through using the appropriate names list in the SamplesInfos for the Writer).
  • Deep-copying header lines and samples infos on Writer construction
  • Using samples attribute from Header in Reader and Writer instead of passing explicitely

0.9.0 (2017-02-26)

  • Restructuring of requirements.txt files
  • Fixing parsing of no-call GT fields

0.8.1 (2017-02-08)

  • PEP8 style adjustments
  • Using versioneer for versioning
  • Using requirements*.txt files now from setup.py
  • Fixing dependency on cyordereddict to be for Python <3.6 instead of <3.5
  • Jumping by samtools coordinate string now also allowed

0.8.0 (2016-10-31)

  • Adding Header.has_header_line for querying existence of header line
  • Header.add_*_line return a bool no indicating any conflicts
  • Construction of Writer uses samples within header and no extra parameter (breaks API)

0.7.0 (2016-09-25)

  • Smaller improvements and fixes to documentation
  • Adding Codacy coverage and static code analysis results to README
  • Various smaller code cleanup triggered by Codacy results
  • Adding __eq__, __neq__ and __hash__ to data types (where applicable)

0.6.0 (2016-09-25

  • Refining implementation for breakend and symbolic allele class
  • Removing record.SV_CODES
  • Refactoring parser module a bit to make the code cleaner
  • Fixing small typos and problems in documentation

0.5.0 (2016-09-24)

  • Deactivating warnings on record parsing by default because of performance
  • Adding validation for INFO and FORMAT fields on reading (#8)
  • Adding predefined INFO and FORMAT fields to pyvcf.header (#32)

0.4.1 (2016-09-22)

  • Initially enabling codeclimate

0.4.0 (2016-09-22)

  • Exporting constants for encoding variant types
  • Exporting genotype constants HOM_REF, HOM_ALT, HET
  • Implementing Call.is_phased, Call.is_het, Call.is_variant, Call.is_phased, Call.is_hom_ref, Call.is_hom_alt
  • Removing Call.phased (breaks API, next release is 0.4.0)
  • Adding tests, fixing bugs for methods of Call

0.3.1 (2016-09-21)

  • Work around FORMAT/FT being a string; this is done so in the Delly output

0.3.0 (2016-09-21)

  • Reader and Writer can now be used as context manager (with with)
  • Including license in documentation, including Biopython license
  • Adding support for writing bgzf files (taken from Biopython)
  • Adding support for parsing arrays in header lines
  • Removing example-4.1-bnd.vcf example file because v4.1 tumor derival lacks ID field
  • Adding AltAlleleHeaderLine, MetaHeaderLine, PedigreeHeaderLine, and SampleHeaderLine
  • Renaming SimpleHeaderFile to SimpleHeaderLine
  • Warn on missing FILTER entries on parsing
  • Reordered parameters in from_stream and from_file (#18)
  • Renamed from_file to from_stream (#18)
  • Renamed Reader.jump_to to Reader.fetch
  • Adding header_without_lines function
  • Generally extending API to make it esier to use
  • Upgrading dependencies, enabling pyup-bot
  • Greatly extending documentation

0.2.1 (2016-09-19)

  • First release on PyPI

License

VCFPy License

You can find the License of VCFPy below.

MIT License

Copyright (c) 2016, Berlin Institute of Health

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Biopython License Agreement

The bgzf writing code is taken from the Biopython project. You can find a copy of the license below.

                Biopython License Agreement

Permission to use, copy, modify, and distribute this software and its
documentation with or without modifications and for any purpose and
without fee is hereby granted, provided that any copyright notices
appear in all copies and that both those copyright notices and this
permission notice appear in supporting documentation, and that the
names of the contributors or copyright holders not be used in
advertising or publicity pertaining to distribution of the software
without specific prior permission.

THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
OR PERFORMANCE OF THIS SOFTWARE.