vcfpy package

Submodules

vcfpy.exceptions module

Exceptions for the vcfpy module

exception vcfpy.exceptions.HeaderNotFound[source]

Bases: vcfpy.exceptions.VCFPyException

Raised when a VCF header could not be found

exception vcfpy.exceptions.IncorrectVCFFormat[source]

Bases: vcfpy.exceptions.VCFPyException

Raised on problems parsing VCF

exception vcfpy.exceptions.InvalidHeaderException[source]

Bases: vcfpy.exceptions.VCFPyException

Raised in the case of invalid header formatting

exception vcfpy.exceptions.InvalidRecordException[source]

Bases: vcfpy.exceptions.VCFPyException

Raised in the case of invalid record formatting

exception vcfpy.exceptions.VCFPyException[source]

Bases: RuntimeError

Base class for all exceptions of the module
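Since every exception listed above derives from VCFPyException, catching the base class handles all of them. The following is a minimal sketch of that pattern. It re-declares part of the documented hierarchy locally so that it runs without vcfpy installed; load_header and describe are hypothetical helpers for illustration and are not part of the library.

```python
class VCFPyException(RuntimeError):
    """Local stand-in for vcfpy.exceptions.VCFPyException."""


class InvalidHeaderException(VCFPyException):
    """Local stand-in: raised in the case of invalid header formatting."""


class InvalidRecordException(VCFPyException):
    """Local stand-in: raised in the case of invalid record formatting."""


def load_header(lines):
    """Hypothetical helper: return the version from a ##fileformat line."""
    prefix = "##fileformat="
    if not lines or not lines[0].startswith(prefix):
        raise InvalidHeaderException("missing ##fileformat line")
    return lines[0][len(prefix):]


def describe(lines):
    """Catch the base class so that any module exception is handled."""
    try:
        return load_header(lines)
    except VCFPyException as exc:
        return "error: {}".format(exc)
```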

vcfpy.header module

Code for representing the VCF header part

The VCF header class structure is modeled after HTSJDK

vcfpy.header.FORMAT_TYPES = ('Integer', 'Float', 'Character', 'String')

valid FORMAT value types

class vcfpy.header.FieldInfo(type_, number)[source]

Bases: object

Core information for describing field type and number

number = None

Number description, either an int or constant

type = None

The type, one of INFO_TYPES or FORMAT_TYPES

vcfpy.header.HEADER_NUMBER_ALLELES = 'A'

number of alleles excluding reference

vcfpy.header.HEADER_NUMBER_GENOTYPES = 'G'

number of genotypes

vcfpy.header.HEADER_NUMBER_REF = 'R'

number of alleles including reference

vcfpy.header.HEADER_NUMBER_UNBOUNDED = '.'

unbounded number of values

vcfpy.header.INFO_TYPES = ('Integer', 'Float', 'Flag', 'Character', 'String')

valid INFO value types

vcfpy.header.LINES_WITH_ID = ('FORMAT', 'INFO', 'FILTER', 'contig')

header lines that contain an “ID” entry

class vcfpy.header.SamplesInfos(sample_names)[source]

Bases: object

Helper class for handling and mapping of sample names to numeric indices

name_to_idx = None

mapping from sample name to index

names = None

list of sample names
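The two attributes above can be pictured with a small stand-in class. This is only a sketch of the name-to-index mapping; SampleIndex is a hypothetical name, and the real constructor may do more than shown here.

```python
class SampleIndex:
    """Hypothetical stand-in mirroring the documented attributes."""

    def __init__(self, sample_names):
        # list of sample names, in file order
        self.names = list(sample_names)
        # mapping from sample name to its position in `names`
        self.name_to_idx = {name: idx for idx, name in enumerate(self.names)}


samples = SampleIndex(["NA00001", "NA00002", "NA00003"])
second = samples.name_to_idx["NA00002"]  # index of the second sample
```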

vcfpy.header.VALID_NUMBERS = ('A', 'R', 'G', '.')

valid values for “Number” entries, except for integers

class vcfpy.header.VCFCompoundHeaderLine(key, value, mapping)[source]

Bases: vcfpy.header.VCFHeaderLine

Base class for compound header lines, currently INFO and FORMAT header lines

Compound header lines describe fields that can have more than one entry.

mapping = None

OrderedDict with key/value mapping

serialize()[source]

class vcfpy.header.VCFContigHeaderLine(key, value, mapping)[source]

Bases: vcfpy.header.VCFSimpleHeaderLine

Contig header line

Most importantly, parses the 'length' key into an integer

id = None

name of the contig

length = None

length of the contig, None if missing

class vcfpy.header.VCFFilterHeaderLine(key, value, mapping)[source]

Bases: vcfpy.header.VCFSimpleHeaderLine

FILTER header line

description = None

description for the filter, None if missing

id = None

token for the filter

class vcfpy.header.VCFFormatHeaderLine(key, value, mapping)[source]

Bases: vcfpy.header.VCFCompoundHeaderLine

Header line for FORMAT fields

description = None

description, should be given, None if not given

id = None

key in the FORMAT field

source = None

source of FORMAT field, None if not given

type = None

value type

version = None

version of FORMAT field, None if not given

class vcfpy.header.VCFHeader(lines=[], samples=None)[source]

Bases: object

Represent header of VCF file

While this class allows mutating records, it should not be changed once it has been assigned to

get_format_field_info(key)[source]

Return FieldInfo for the given FORMAT field

get_info_field_info(key)[source]

Return FieldInfo for the given INFO field

lines = None

list of VCFHeaderLine objects

samples = None

SamplesInfos object

class vcfpy.header.VCFHeaderLine(key, value)[source]

Bases: object

Base class for VCF header lines

key = None

str with key of header line

serialize()[source]

Return VCF-serialized version of this header line

value = None

str with raw value of header line

class vcfpy.header.VCFInfoHeaderLine(key, value, mapping)[source]

Bases: vcfpy.header.VCFCompoundHeaderLine

Header line for INFO fields

Note that the Number field will be parsed into an int if possible. Otherwise, the constants HEADER_NUMBER_* will be used.

description = None

description, should be given, None if not given

id = None

key in the INFO field

source = None

source of INFO field, None if not given

type = None

value type

version = None

version of INFO field, None if not given
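The note on the Number field for INFO header lines can be sketched as follows. This is an illustration only: parse_number is a hypothetical helper, and the error raised for unknown values is an assumption rather than the library's documented behavior.

```python
# Constant values as listed for vcfpy.header.VALID_NUMBERS
VALID_NUMBERS = ("A", "R", "G", ".")


def parse_number(raw):
    """Return an int if possible, otherwise one of the constant strings."""
    try:
        return int(raw)
    except ValueError:
        if raw in VALID_NUMBERS:
            return raw
        # Assumption: anything else is rejected
        raise ValueError("invalid Number value: {!r}".format(raw))
```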

class vcfpy.header.VCFSimpleHeaderLine(key, value, mapping)[source]

Bases: vcfpy.header.VCFHeaderLine

Base class for simple header lines, currently contig and filter header lines

Raises: vcfpy.exceptions.InvalidHeaderException in the case of missing key "ID"

mapping = None

collections.OrderedDict with key/value mapping of the attributes

serialize()[source]

vcfpy.header.serialize_for_header(key, value)[source]

Serialize value for the given mapping key for a VCF header line

vcfpy.parser module

Parsing of VCF files from str

class vcfpy.parser.MappingVCFHeaderLineParser(line_class)[source]

Bases: vcfpy.parser.VCFHeaderLineParserBase

Parse the value as a mapping and create a header line of class line_class

line_class = None

the class to use for the VCF header line

parse_key_value(key, value)[source]

vcfpy.parser.SUPPORTED_VCF_VERSIONS = ('VCFv4.0', 'VCFv4.1', 'VCFv4.2', 'VCFv4.3')

Supported VCF versions; a warning is issued for any other version
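A version check of this kind could look like the sketch below. check_version is a hypothetical helper, and the warning text and category are assumptions; only the tuple of versions is taken from the documentation above.

```python
import warnings

SUPPORTED_VCF_VERSIONS = ("VCFv4.0", "VCFv4.1", "VCFv4.2", "VCFv4.3")


def check_version(fileformat):
    """Return True if supported; otherwise warn and return False."""
    if fileformat in SUPPORTED_VCF_VERSIONS:
        return True
    warnings.warn("Unsupported VCF version: {}".format(fileformat))
    return False
```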

class vcfpy.parser.StupidVCFHeaderLineParser[source]

Bases: vcfpy.parser.VCFHeaderLineParserBase

Parse into VCFHeaderLine (no particular structure)

parse_key_value(key, value)[source]

class vcfpy.parser.VCFHeaderLineParserBase[source]

Bases: object

Parse into appropriate VCFHeaderLine

parse_key_value(key, value)[source]

Parse the key/value pair

Parameters:
  • key (str) – the key to use in parsing
  • value (str) – the value to parse
Returns: vcfpy.header.VCFHeaderLine object

class vcfpy.parser.VCFHeaderParser(sub_parsers)[source]

Bases: object

Helper class for parsing a VCF header

parse_line(line)[source]

Parse VCF header line (trailing '\r\n' or '\n' is ignored)

Parameters:
  • line (str) – str with line to parse
  • sub_parsers (dict) – dict mapping header line types to appropriate parser objects
Returns: appropriate VCFHeaderLine parsed from line
Raises: vcfpy.exceptions.InvalidHeaderException if there was a problem parsing the file

class vcfpy.parser.VCFParser(stream, path=None)[source]

Bases: object

Class for line-wise parsing of VCF files

In most cases, you want to use vcfpy.reader.VCFReader instead.

Parameters:
  • stream – file-like object to read from
  • path (str) – path the VCF is parsed from, for display purposes only, optional

header = None

header, once it has been read

parse_header()[source]

Read and parse vcfpy.header.VCFHeader from file, set into self.header and return it

Returns: vcfpy.header.VCFHeader
Raises: vcfpy.exceptions.InvalidHeaderException in the case of problems reading the header

parse_next_record()[source]

Read, parse and return next vcfpy.record.VCFRecord

Returns: next VCF record or None if at end
Raises: vcfpy.exceptions.InvalidRecordException in the case of problems reading the record

samples = None

vcfpy.header.SamplesInfos with sample information; set on parsing the header

class vcfpy.parser.VCFRecordParser(header, samples)[source]

Bases: object

Helper class for parsing VCF records

header = None

VCFHeader with the meta information

parse_line(line_str)[source]

Parse line from file (including trailing line break) and return resulting VCFRecord

samples = None

SamplesInfos with sample information

vcfpy.parser.convert_field_value(key, type_, value)[source]

Convert atomic field value according to the type

vcfpy.parser.parse_field_value(key, field_info, value)[source]

Parse value according to field_info
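The two functions above split the work into converting one atomic value and handling multi-valued fields. A minimal sketch of that division follows. The function names, the treatment of "." as a missing value, and the rule that only Number=1 yields a scalar are assumptions made for illustration, not the library's confirmed behavior.

```python
def convert_value(type_, value):
    """Convert one atomic string according to a VCF type name."""
    if value == ".":
        return None  # assumption: "." marks a missing value
    if type_ == "Integer":
        return int(value)
    if type_ == "Float":
        return float(value)
    return value  # 'Character' and 'String' stay as str


def parse_value(type_, number, value):
    """Return True for flags, a scalar for Number=1, else a list."""
    if type_ == "Flag":
        return True
    if number == 1:
        return convert_value(type_, value)
    return [convert_value(type_, part) for part in value.split(",")]
```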

vcfpy.parser.parse_mapping(value)[source]

Parse the given VCF header line mapping

Such a mapping consists of “key=value” pairs, separated by commas and wrapped in angle brackets (“<...>”). String values are usually quoted, with exceptions for certain known keys. This only becomes important when serializing.

Raises: vcfpy.exceptions.InvalidHeaderException if there was a problem parsing the file
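The mapping format described above can be parsed with a quote-aware split. The sketch below is not the library's implementation: split_outside_quotes and parse_mapping_sketch are hypothetical names, escaped quotes are not handled, and a plain ValueError stands in for InvalidHeaderException.

```python
from collections import OrderedDict


def split_outside_quotes(text, delim=",", quote='"'):
    """Split text at delim, ignoring delimiters inside quoted runs."""
    parts, current, in_quote = [], [], False
    for char in text:
        if char == quote:
            in_quote = not in_quote
            current.append(char)
        elif char == delim and not in_quote:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


def parse_mapping_sketch(value):
    """Parse '<k=v,...>' into an OrderedDict, stripping surrounding quotes."""
    if not (value.startswith("<") and value.endswith(">")):
        raise ValueError("mapping must be wrapped in angle brackets")
    result = OrderedDict()
    for pair in split_outside_quotes(value[1:-1]):
        key, sep, val = pair.partition("=")
        if not sep:
            raise ValueError("expected key=value, got {!r}".format(pair))
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]  # drop the surrounding quotes
        result[key.strip()] = val
    return result


mapping = parse_mapping_sketch(
    '<ID=DP,Number=1,Type=Integer,Description="Total depth, all reads">'
)
```

The comma inside the quoted Description stays part of the value, which is why a plain str.split(",") would not be enough here.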
vcfpy.parser.process_alt(header, ref, alt_str)[source]

Process alternative value using VCFHeader in header

vcfpy.parser.split_quoted_string(s, delim=', ', quote='"')[source]

Split string s at delimiter, correctly interpreting quotes

vcfpy.reader module

Parsing of VCF files from file-like objects

class vcfpy.reader.VCFReader(stream, path=None)[source]

Bases: object

Class for parsing of files from file-like objects

Instead of using the constructor, use the class methods from_file() and from_path().

On construction, the header is read from the file, which can raise an exception if the header is invalid. After construction, VCFReader can be used as an iterable of VCFRecord.

Raises: InvalidHeaderException in the case of problems reading the header

close()[source]

Close underlying stream

classmethod from_file(klass, stream, path=None)[source]

Create new VCFReader from file

Parameters:
  • stream – file-like object to read from
  • path – optional string with path to store (for display only)

classmethod from_path(klass, path)[source]

Create new VCFReader from path

Parameters: path – the path to load from (converted to str for compatibility with path.py)

header = None

the VCFHeader

jump_to(chrom, begin, end)[source]

Jump to the start of the given chromosomal interval and limit iteration to its end position

Parameters:
  • chrom (str) – name of the chromosome to jump to
  • begin (int) – 0-based begin position (inclusive)
  • end (int) – 0-based end position (exclusive)

parser = None

the parser to use

path = None

optional str with the path to the stream

samples = None

the vcfpy.header.SamplesInfos object with the sample name information

stream = None

stream (file-like object) to read from

vcfpy.record module

Code for representing a VCF record

The VCF record structure is modeled after that of PyVCF

class vcfpy.record.AltRecord(type_=None)[source]

Bases: object

An alternative allele Record

Currently, can be a substitution, an SV placeholder, or breakend

type = None

String describing the type of the variant. Can be SNV or MNV, or any of the types described in the ALT header lines, such as DUP, DEL, INS, ...

vcfpy.record.BND = 'BND'

Code for break-end

class vcfpy.record.BreakEnd(type_, value)[source]

Bases: vcfpy.record.AltRecord

A placeholder for a breakend

value = None

The alternative base sequence to use in the substitution

class vcfpy.record.Call(sample, data, site=None)[source]

Bases: object

The information for a genotype call

According to the VCF specification, this should always include the genotype information and can contain an arbitrary number of further annotations, e.g., the coverage at the variant position.

data = None

an OrderedDict with the key/value pair information from the call’s data

gt_bases

Return the actual genotype alleles, e.g. if VCF genotype is 0/1, could return A/T

gt_type

The type of genotype, mapping is

  • hom_ref = 0
  • het = 1
  • hom_alt = 2 (which alt is untracked)
  • uncalled = None

is_filtered

Return True for filtered calls

is_het

Return True for heterozygous calls

is_phased

Return True for phased calls

is_variant

Return True for variant calls, i.e. calls that are not hom_ref

sample = None

the name of the sample for which the call was made

site = None

the Record of this Call
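The gt_type mapping, is_phased and gt_bases properties of Call can be illustrated on a raw GT string. The functions below are a sketch only: they are free functions rather than properties, and treating any partially missing genotype as uncalled is an assumption.

```python
import re


def gt_type(gt):
    """Map a GT string such as '0/1' to 0 (hom_ref), 1 (het), 2 (hom_alt) or None."""
    alleles = re.split(r"[/|]", gt)
    if any(allele == "." for allele in alleles):
        return None  # uncalled (assumption: also for partial calls)
    if len(set(alleles)) > 1:
        return 1  # het
    return 0 if alleles[0] == "0" else 2  # which alt is untracked


def is_phased(gt):
    """A '|' separator marks a phased genotype."""
    return "|" in gt


def gt_bases(gt, ref, alts):
    """Replace allele indices with bases, keeping the separator."""
    bases = [ref] + list(alts)
    sep = "|" if is_phased(gt) else "/"
    alleles = re.split(r"[/|]", gt)
    return sep.join("." if a == "." else bases[int(a)] for a in alleles)
```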

vcfpy.record.DEL = 'DEL'

Code for “clean” deletion

vcfpy.record.INDEL = 'INDEL'

Code for indel, includes substitutions of unequal length

vcfpy.record.INS = 'INS'

Code for “clean” insertion

vcfpy.record.MNV = 'MNV'

Code for a multi nucleotide variant

class vcfpy.record.Record(CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, calls)[source]

Bases: object

Represent one record from the VCF file

Record objects can be iterated over to obtain their calls

ALT = None

A list of alternative allele records of type AltRecord

CHROM = None

A str with the chromosome name

FILTER = None

A list of strings for the FILTER column

FORMAT = None

A list of strings for the FORMAT column

ID = None

A list of the semicolon-separated values of the ID column

INFO = None

An OrderedDict giving the values of the INFO column, flags are mapped to True

POS = None

An int with a 1-based begin position

QUAL = None

The quality value, can be None

REF = None

A str with the REF value

add_filter(label)[source]

Add label to FILTER if not set yet

add_format(key, value=None)[source]

Add an entry to format

The record’s calls data[key] will be set to value if not yet set and value is not None. If key is already in FORMAT then nothing is done.
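The documented behavior of add_filter() and add_format() can be sketched with plain lists and dicts. The functions below are hypothetical stand-ins operating on a FILTER list, a FORMAT list and one data mapping per call; they are not the methods themselves.

```python
from collections import OrderedDict


def add_filter(filters, label):
    """Append label to the FILTER list unless it is already present."""
    if label not in filters:
        filters.append(label)


def add_format(format_keys, calls_data, key, value=None):
    """Register key in FORMAT and optionally set a default in every call."""
    if key in format_keys:
        return  # key already in FORMAT: nothing is done
    format_keys.append(key)
    if value is not None:
        for data in calls_data:
            data.setdefault(key, value)  # only set if not yet set


filters = ["q10"]
add_filter(filters, "q10")  # no duplicate is added
add_filter(filters, "s50")

fmt = ["GT"]
calls = [OrderedDict(GT="0/1"), OrderedDict(GT="1/1", DP=7)]
add_format(fmt, calls, "DP", 0)  # DP=7 in the second call is kept
```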

begin = None

An int with a 0-based begin position

call_for_sample = None

A mapping from sample name to entry in self.calls

calls = None

A list of genotype Call objects

end = None

An int with a 0-based end position
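The relationship between the 1-based POS and the 0-based begin and end can be sketched as below. Note the assumptions: the documentation above does not say how end is derived, so the sketch takes it to be exclusive and to span the REF bases; both function names are hypothetical.

```python
def zero_based_interval(pos, ref):
    """Return (begin, end) as a 0-based, half-open interval for POS and REF."""
    begin = pos - 1
    end = begin + len(ref)  # assumption: the record spans the REF bases
    return begin, end


def overlaps(begin, end, query_begin, query_end):
    """True if two half-open intervals share at least one position."""
    return begin < query_end and query_begin < end
```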

vcfpy.record.SNV = 'SNV'

Code for single nucleotide variant

class vcfpy.record.SV(type_, value)[source]

Bases: vcfpy.record.AltRecord

Code for structural variant

value = None

The alternative base sequence to use in the substitution

vcfpy.record.SV_CODES = ('DEL', 'INS', 'DUP', 'INV', 'CNV')

Codes for structural variants

vcfpy.record.SYMBOLIC = 'SYMBOLIC'

Code for symbolic allele that is neither SV nor BND

class vcfpy.record.SingleBreakEnd(type_, value)[source]

Bases: vcfpy.record.AltRecord

A placeholder for a single breakend

value = None

The alternative base sequence to use in the substitution

class vcfpy.record.Substitution(type_, value)[source]

Bases: vcfpy.record.AltRecord

A basic alternative allele record describing a REF->AltRecord substitution

Note that this subsumes MNVs, insertions, and deletions.

value = None

The alternative base sequence to use in the substitution
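How a substitution might be assigned one of the type codes SNV, MNV, INS, DEL or INDEL can be sketched from the lengths of REF and ALT. The prefix rule used here for “clean” insertions and deletions is an assumption, and classify_substitution is a hypothetical helper, not the library's process_alt().

```python
SNV, MNV, INS, DEL, INDEL = "SNV", "MNV", "INS", "DEL", "INDEL"


def classify_substitution(ref, alt):
    """Pick a type code from REF and ALT base sequences (both non-empty)."""
    if len(ref) == len(alt):
        return SNV if len(ref) == 1 else MNV
    if alt.startswith(ref):
        return INS  # "clean" insertion: ALT extends REF
    if ref.startswith(alt):
        return DEL  # "clean" deletion: ALT is a prefix of REF
    return INDEL  # substitution of unequal length
```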

class vcfpy.record.SymbolicAllele(type_, value)[source]

Bases: vcfpy.record.AltRecord

A placeholder for a symbolic allele

value = None

The alternative base sequence to use in the substitution

Module contents