vcfpy package¶

Submodules¶

vcfpy.bgzf module¶

Support code for writing BGZF files

Shamelessly taken from Biopython

class vcfpy.bgzf.BgzfWriter(filename=None, mode='w', fileobj=None, compresslevel=6)[source]¶

Bases: object

close()[source]¶: Flush data, write 28 bytes BGZF EOF marker, and close BGZF file. samtools will look for a magic EOF marker, just a 28 byte empty BGZF block, and if it is missing warns the BAM file may be truncated. In addition to samtools writing this block, so too does bgzip - so this implementation does too.
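The EOF marker mentioned for close() is a fixed, empty BGZF block. The following is a minimal sketch of how its presence could be checked, assuming the byte values published in the SAM/BAM specification. The names `BGZF_EOF`, `EMPTY_PAYLOAD` and `has_bgzf_eof` are illustrative and are not listed in this reference.

```python
import gzip
import io

# Empty BGZF block used as EOF marker. The byte values are taken from the
# SAM/BAM specification; the constant name is an assumption of this sketch.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)

# The marker is a valid gzip member that decompresses to nothing.
EMPTY_PAYLOAD = gzip.decompress(BGZF_EOF)


def has_bgzf_eof(fileobj):
    """Return True if the seekable binary file object ends with the EOF block."""
    fileobj.seek(0, io.SEEK_END)
    size = fileobj.tell()
    if size < len(BGZF_EOF):
        return False
    fileobj.seek(size - len(BGZF_EOF))
    return fileobj.read() == BGZF_EOF
```

A file that was cut off before close() ran would fail this check, which is the situation samtools warns about.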

fileno()[source]¶

flush()[source]¶

isatty()[source]¶

seekable()[source]¶

tell()[source]¶: Returns a BGZF 64-bit virtual offset.

write(data)[source]¶

vcfpy.bgzf.make_virtual_offset(block_start_offset, within_block_offset)[source]¶: Compute a BGZF virtual offset from block start and within block offsets.

The BAM indexing scheme records read positions using a 64 bit ‘virtual offset’, comprising in C terms:

    block_start_offset << 16 | within_block_offset

Here block_start_offset is the file offset of the BGZF block start (unsigned integer using up to 64-16 = 48 bits), and within_block_offset is the offset within the (decompressed) block (unsigned 16 bit integer).

    >>> make_virtual_offset(0, 0)
    0
    >>> make_virtual_offset(0, 1)
    1
    >>> make_virtual_offset(0, 2**16 - 1)
    65535
    >>> make_virtual_offset(0, 2**16)
    Traceback (most recent call last):
    ...
    ValueError: Require 0 <= within_block_offset < 2**16, got 65536
    >>> 65536 == make_virtual_offset(1, 0)
    True
    >>> 65537 == make_virtual_offset(1, 1)
    True
    >>> 131071 == make_virtual_offset(1, 2**16 - 1)
    True
    >>> 6553600000 == make_virtual_offset(100000, 0)
    True
    >>> 6553600001 == make_virtual_offset(100000, 1)
    True
    >>> 6553600010 == make_virtual_offset(100000, 10)
    True
    >>> make_virtual_offset(2**48, 0)
    Traceback (most recent call last):
    ...
    ValueError: Require 0 <= block_start_offset < 2**48, got 281474976710656
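The packing described for make_virtual_offset can be sketched in a few lines. This is a standalone re-implementation written from the description above, not the library's source, and `split_virtual_offset` is a hypothetical inverse helper that does not appear in this reference.

```python
def make_virtual_offset(block_start_offset, within_block_offset):
    """Pack both offsets into one integer: upper 48 bits, lower 16 bits."""
    if not 0 <= within_block_offset < 2**16:
        raise ValueError(
            "Require 0 <= within_block_offset < 2**16, got %i" % within_block_offset
        )
    if not 0 <= block_start_offset < 2**48:
        raise ValueError(
            "Require 0 <= block_start_offset < 2**48, got %i" % block_start_offset
        )
    return (block_start_offset << 16) | within_block_offset


def split_virtual_offset(virtual_offset):
    """Hypothetical inverse: recover (block_start_offset, within_block_offset)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF
```

Because the within-block part occupies exactly the low 16 bits, virtual offsets compare in file order when treated as plain integers.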

vcfpy.exceptions module¶

Exceptions for the vcfpy module

exception vcfpy.exceptions.HeaderNotFound[source]¶

Bases: vcfpy.exceptions.VCFPyException

Raised when a VCF header could not be found

exception vcfpy.exceptions.IncorrectVCFFormat[source]¶

Bases: vcfpy.exceptions.VCFPyException

Raised on problems parsing VCF

exception vcfpy.exceptions.InvalidHeaderException[source]¶

Bases: vcfpy.exceptions.VCFPyException

Raised in the case of invalid header formatting

exception vcfpy.exceptions.InvalidRecordException[source]¶

Bases: vcfpy.exceptions.VCFPyException

Raised in the case of invalid record formatting

exception vcfpy.exceptions.VCFPyException[source]¶

Bases: RuntimeError

Base class for the module’s exceptions

vcfpy.header module¶

Code for representing the VCF header part

The VCF header class structure is modeled after HTSJDK

class vcfpy.header.AltAlleleHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.SimpleHeaderLine

Alternative allele header line

Mostly used for defining symbolic alleles for structural variants and IUPAC ambiguity codes

classmethod from_mapping(klass, mapping)[source]¶: Construct from mapping, not requiring the string value

id = None¶: name of the alternative allele

class vcfpy.header.CompoundHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.HeaderLine

Base class for compound header lines, currently FORMAT and INFO header lines

Compound header lines describe fields that can have more than one entry.

mapping = None¶: OrderedDict with key/value mapping

serialize()[source]¶

value¶

class vcfpy.header.ContigHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.SimpleHeaderLine

Contig header line

Most importantly, parses the 'length' key into an integer

classmethod from_mapping(klass, mapping)[source]¶: Construct from mapping, not requiring the string value

id = None¶: name of the contig

length = None¶: length of the contig, None if missing

vcfpy.header.FORMAT_TYPES = ('Integer', 'Float', 'Character', 'String')¶: valid FORMAT value types

class vcfpy.header.FieldInfo(type_, number, description=None)[source]¶

Bases: object

Core information for describing field type and number

description = None¶: Description for the header field, optional

number = None¶: Number description, either an int or constant

type = None¶: The type, one of INFO_TYPES or FORMAT_TYPES

class vcfpy.header.FilterHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.SimpleHeaderLine

FILTER header line

description = None¶: description for the filter, None if missing

classmethod from_mapping(klass, mapping)[source]¶: Construct from mapping, not requiring the string value

id = None¶: token for the filter

class vcfpy.header.FormatHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.CompoundHeaderLine

Header line for FORMAT fields

description = None¶: description, should be given, None if not given

classmethod from_mapping(klass, mapping)[source]¶: Construct from mapping, not requiring the string value

id = None¶: key in the FORMAT field

source = None¶: source of FORMAT field, None if not given

type = None¶: value type

version = None¶: version of FORMAT field, None if not given

vcfpy.header.HEADER_NUMBER_ALLELES = 'A'¶: number of alleles excluding reference

vcfpy.header.HEADER_NUMBER_GENOTYPES = 'G'¶: number of genotypes

vcfpy.header.HEADER_NUMBER_REF = 'R'¶: number of alleles including reference

vcfpy.header.HEADER_NUMBER_UNBOUNDED = '.'¶: unbounded number of values
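The four Number constants above translate into concrete value counts once the number of ALT alleles is known. The sketch below shows one way to compute the expected count. The function name `expected_count` and the `ploidy` parameter are assumptions for illustration, and the genotype count uses the combinatorial formula from the VCF specification.

```python
from math import comb

HEADER_NUMBER_ALLELES = "A"    # one value per ALT allele
HEADER_NUMBER_REF = "R"        # one value per allele including REF
HEADER_NUMBER_GENOTYPES = "G"  # one value per possible genotype
HEADER_NUMBER_UNBOUNDED = "."  # any number of values


def expected_count(number, num_alts, ploidy=2):
    """Return the expected number of values, or None if it is unbounded."""
    if isinstance(number, int):
        return number
    if number == HEADER_NUMBER_ALLELES:
        return num_alts
    if number == HEADER_NUMBER_REF:
        return num_alts + 1
    if number == HEADER_NUMBER_GENOTYPES:
        # Multisets of size `ploidy` drawn from (num_alts + 1) alleles.
        return comb(num_alts + ploidy, ploidy)
    if number == HEADER_NUMBER_UNBOUNDED:
        return None
    raise ValueError("unknown Number value: %r" % (number,))
```

For a diploid site with one ALT allele this gives three genotypes (0/0, 0/1, 1/1).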

class vcfpy.header.Header(lines=[], samples=None, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: object

Represent header of VCF file

While this class allows mutating records, it should not be changed once it has been assigned to

This class provides functions for adding lines to a header and updating the supporting index data structures. There is no explicit API for removing header lines; the best way is to construct a new Header instance with a filtered list of header lines.

add_contig_line(mapping)[source]¶: Add “contig” header line constructed from the given mapping

add_filter_line(mapping)[source]¶: Add FILTER header line constructed from the given mapping

add_format_line(mapping)[source]¶: Add FORMAT header line constructed from the given mapping

add_info_line(mapping)[source]¶: Add INFO header line constructed from the given mapping

add_line(header_line)[source]¶: Add header line, updating any necessary support indices

filter_ids()[source]¶: Return list of all filter IDs

format_ids()[source]¶: Return list of all format IDs

get_format_field_info(key)[source]¶: Return FieldInfo for the given FORMAT field

get_info_field_info(key)[source]¶: Return FieldInfo for the given INFO field

get_lines(key)[source]¶: Return header lines having the given key as their type

info_ids()[source]¶: Return list of all info IDs

lines = None¶: list of HeaderLine objects

samples = None¶: SamplesInfos object

class vcfpy.header.HeaderLine(key, value, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: object

Base class for VCF header lines

key = None¶: str with key of header line

serialize()[source]¶: Return VCF-serialized version of this header line

value¶

warning_helper = None¶: Helper for printing warnings

vcfpy.header.INFO_TYPES = ('Integer', 'Float', 'Flag', 'Character', 'String')¶: valid INFO value types

class vcfpy.header.InfoHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.CompoundHeaderLine

Header line for INFO fields

Note that the Number field will be parsed into an int if possible. Otherwise, the constants HEADER_NUMBER_* will be used.

description = None¶: description, should be given, None if not given

classmethod from_mapping(klass, mapping)[source]¶: Construct from mapping, not requiring the string value

id = None¶: key in the INFO field

source = None¶: source of INFO field, None if not given

type = None¶: value type

version = None¶: version of INFO field, None if not given

vcfpy.header.LINES_WITH_ID = ('ALT', 'contig', 'FILTER', 'FORMAT', 'INFO', 'META', 'PEDIGREE', 'SAMPLE')¶: header lines that contain an “ID” entry

class vcfpy.header.MetaHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.SimpleHeaderLine

META header line

Used for defining the set of valid values for sample keys

classmethod from_mapping(klass, mapping)[source]¶: Construct from mapping, not requiring the string value

id = None¶: ID of the META entry

class vcfpy.header.PedigreeHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.SimpleHeaderLine

Header line for defining a pedigree entry

classmethod from_mapping(klass, mapping)[source]¶: Construct from mapping, not requiring the string value

id = None¶: ID of the pedigree entry

vcfpy.header.RESERVED_INFO = {'CIEND': FieldInfo('Integer', 2, 'Confidence interval around END for imprecise variants'), 'DPADJ': FieldInfo('Integer', '.', 'Read Depth of adjacency'), 'ADR': FieldInfo('Integer', 'R', 'Reverse read depth for each allele'), 'CIPOS': FieldInfo('Integer', 2, 'Confidence interval around POS for imprecise variants'), 'CILEN': FieldInfo('Integer', 2, 'Confidence interval around the inserted material between breakends'), 'DB': FieldInfo('Flag', 0, 'dbSNP membership'), 'SVTYPE': FieldInfo('String', 1, 'Type of structural variant'), 'DBVARID': FieldInfo('String', 1, 'ID of this element in DBVAR'), 'IMPRECISE': FieldInfo('Flag', 0, 'Imprecise structural variation'), 'CIGAR': FieldInfo('String', 'A', 'CIGAR string describing how to align each ALT allele to the reference allele'), 'EVENT': FieldInfo('String', 1, 'ID of event associated to breakend'), 'H2': FieldInfo('Flag', 0, 'Membership in HapMap 2'), 'AF': FieldInfo('Float', 'A', 'Allele frequency for each ALT allele in the same order as listed: used for estimating from primary data not called genotypes'), 'AC': FieldInfo('Integer', 'A', 'Allele count in genotypes, for each ALT allele, in the same order as listed'), 'CNADJ': FieldInfo('Integer', '.', 'Copy number of adjacency'), 'PARID': FieldInfo('String', 1, 'ID of partner breakend'), 'NOVEL': FieldInfo('Flag', 0, 'Indicates a novel structural variation'), 'DBRIPID': FieldInfo('String', 1, 'ID of this element in DBRIP'), 'ADF': FieldInfo('Integer', 'R', 'Forward read depth for each allele'), 'AA': FieldInfo('String', 1, 'Ancestral Allele'), 'DP': FieldInfo('Integer', 1, 'Read Depth of segment containing breakend'), 'CICNADJ': FieldInfo('Integer', '.', 'Confidence interval around copy number for the adjacency'), 'HOMSEQ': FieldInfo('String', '.', 'Sequence of base pair identical micro-homology at event breakpoints'), 'NS': FieldInfo('Integer', 1, 'Number of samples with data'), 'MATEID': FieldInfo('String', '.', 'ID of mate breakends'), 
'AN': FieldInfo('Integer', 1, 'Total number of alleles in called genotypes'), 'SB': FieldInfo('Integer', 4, 'Strand bias at this position'), 'HOMLEN': FieldInfo('Integer', '.', 'Length of base pair identical micro-homology at event breakpoints'), 'H3': FieldInfo('Flag', 0, 'Membership in HapMap 3'), 'CN': FieldInfo('Integer', 1, 'Copy number of segment containing breakend'), 'MQ': FieldInfo('Integer', 1, 'RMS mapping quality'), 'CICN': FieldInfo('Integer', 2, 'Confidence interval around copy number for the segment'), '1000G': FieldInfo('Flag', 0, 'Membership in 1000 Genomes'), 'AD': FieldInfo('Integer', 'R', 'Total read depth for each allele'), 'END': FieldInfo('Integer', 1, 'End position of the variant described in this record (for symbolic alleles)'), 'MEINFO': FieldInfo('String', 4, 'Mobile element info of the form NAME,START,END,POLARITY'), 'BKPTID': FieldInfo('String', '.', 'ID of the assembled alternate allele in the assembly file'), 'METRANS': FieldInfo('String', 4, 'Mobile element transduction info of the form CHR,START,END,POLARITY'), 'MQ0': FieldInfo('Integer', 1, 'Number of MAPQ == 0 reads covering this record'), 'SVLEN': FieldInfo('Integer', 1, 'Difference in length between REF and ALT alleles'), 'BQ': FieldInfo('Float', 1, 'RMS base quality at this position'), 'VALIDATED': FieldInfo('Flag', 0, 'Validated by follow-up experiment'), 'SOMATIC': FieldInfo('Flag', 0, 'Indicates that the record is a somatic mutation, for cancer genomics'), 'DGVID': FieldInfo('String', 1, 'ID of this element in Database of Genomic Variation')}¶: Reserved fields for INFO from VCF v4.3

class vcfpy.header.SampleHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.SimpleHeaderLine

Header line for defining a SAMPLE entry

classmethod from_mapping(klass, mapping)[source]¶: Construct from mapping, not requiring the string value

id = None¶: ID of the sample entry

class vcfpy.header.SamplesInfos(sample_names)[source]¶

Bases: object

Helper class for handling and mapping of sample names to numeric indices

name_to_idx = None¶: mapping from sample name to index

names = None¶: list of sample names
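The two attributes of SamplesInfos are related in a simple way. The sketch below uses a stand-in class, here called `SamplesIndex`, to show how such a name-to-index mapping can be derived from the list of names. It mirrors the documented attributes only and is not the library's implementation.

```python
class SamplesIndex:
    """Stand-in with the two documented attributes: names and name_to_idx."""

    def __init__(self, sample_names):
        # Keep the column order of the samples as given.
        self.names = list(sample_names)
        # Map each sample name to its 0-based column index.
        self.name_to_idx = {name: idx for idx, name in enumerate(self.names)}


samples = SamplesIndex(["NA00001", "NA00002", "NA00003"])
second = samples.name_to_idx["NA00002"]
```

The index is what lets per-sample calls be looked up by name without scanning the list.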

class vcfpy.header.SimpleHeaderLine(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶

Bases: vcfpy.header.HeaderLine

Base class for simple header lines, currently contig and filter header lines

Raises:	`vcfpy.exceptions.InvalidHeaderException` in the case of missing key `"ID"`

mapping = None¶: collections.OrderedDict with key/value mapping of the attributes

serialize()[source]¶

value¶

vcfpy.header.VALID_NUMBERS = ('A', 'R', 'G', '.')¶: valid values for “Number” entries, except for integers

vcfpy.header.header_without_lines(header, remove)[source]¶

Return Header without lines given in remove

remove is an iterable of pairs key/ID with the VCF header key and ID of entry to remove. In the case that a line does not have a mapping entry, you can give the full value to remove.
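The filtering described for header_without_lines can be sketched without the library. The `Line` tuple below is a stand-in for a header line, and `without_lines` is a hypothetical helper that applies the key/ID rule from the description, falling back to the full value when a line has no mapping.

```python
from collections import namedtuple

# Stand-in for a header line: key, raw value and optional attribute mapping.
Line = namedtuple("Line", ["key", "value", "mapping"])


def without_lines(lines, remove):
    """Return the lines whose (key, ID-or-value) pair is not listed in remove."""
    remove = set(remove)
    kept = []
    for line in lines:
        if line.mapping is not None:
            ident = line.mapping.get("ID")
        else:
            ident = line.value  # no mapping: match on the full value
        if (line.key, ident) not in remove:
            kept.append(line)
    return kept


lines = [
    Line("fileformat", "VCFv4.3", None),
    Line("FILTER", "<ID=q10>", {"ID": "q10"}),
    Line("INFO", "<ID=DP>", {"ID": "DP"}),
]
filtered = without_lines(lines, [("FILTER", "q10"), ("fileformat", "VCFv4.3")])
```

The input list is left untouched, which matches the advice to build a new Header rather than mutate an existing one.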

vcfpy.header.mapping_to_str(mapping)[source]¶: Convert mapping to string

vcfpy.header.serialize_for_header(key, value)[source]¶: Serialize value for the given mapping key for a VCF header line

vcfpy.parser module¶

Parsing of VCF files from str

class vcfpy.parser.FormatChecker[source]¶

Bases: object

Helper class for checking a FORMAT field

header = None¶: VCFHeader to use for checking

run(call, num_alts)[source]¶

Check FORMAT of a record.Call

Currently, only checks for consistent counts are implemented

warning_helper = None¶: helper class for printing warnings

class vcfpy.parser.HeaderChecker(warning_helper)[source]¶

Bases: object

Helper class for checking a VCF header

run(header)[source]¶

Check the header

Warnings will be printed using self.warning_helper while errors will raise an exception.

Raises:	`vcfpy.exceptions.InvalidHeaderException` in the case of severe errors reading the header

warning_helper = None¶: helper class for printing warnings

class vcfpy.parser.HeaderLineParserBase(warning_helper)[source]¶

Bases: object

Parse into appropriate HeaderLine

parse_key_value(key, value)[source]¶

Parse the key/value pair

Parameters:	key (str) – the key to use in parsing
	value (str) – the value to parse
Returns:	`vcfpy.header.HeaderLine` object

warning_helper = None¶: WarningHelper to use for printing warnings

class vcfpy.parser.HeaderParser(warning_helper)[source]¶

Bases: object

Helper class for parsing a VCF header

parse_line(line)[source]¶

Parse VCF header line (trailing '\r\n' or '\n' is ignored)

Parameters:	line (str) – str with line to parse
	sub_parsers (dict) – dict mapping header line types to appropriate parser objects
Returns:	appropriate HeaderLine parsed from line
Raises:	`vcfpy.exceptions.InvalidHeaderException` if there was a problem parsing the file

sub_parsers = None¶: Sub parsers to use for parsing the header lines

warning_helper = None¶: WarningHelper to use for printing warnings

class vcfpy.parser.InfoChecker(header, warning_helper)[source]¶

Bases: object

Helper class for checking an INFO field

header = None¶: VCFHeader to use for checking

run(key, value, num_alts)[source]¶

Check value in INFO[key] of record

Currently, only checks for consistent counts are implemented

Parameters:	key (str) – key of INFO entry to check
	value – value to check
	num_alts (int) – number of alternative alleles

warning_helper = None¶: helper class for printing warnings

class vcfpy.parser.MappingHeaderLineParser(warning_helper, line_class)[source]¶

Bases: vcfpy.parser.HeaderLineParserBase

Parse into HeaderLine (no particular structure)

line_class = None¶: the class to use for the VCF header line

parse_key_value(key, value)[source]¶

class vcfpy.parser.NoopFormatChecker[source]¶

Bases: object

Helper class that performs no checks

run(call, num_alts)[source]¶

class vcfpy.parser.NoopInfoChecker[source]¶

Bases: object

Helper class that performs no checks

run(key, value, num_alts)[source]¶

class vcfpy.parser.Parser(stream, path=None, record_checks=[])[source]¶

Bases: object

Class for line-wise parsing of VCF files

In most cases, you want to use vcfpy.reader.Reader instead.

Parameters:	stream – `file`-like object to read from
	path (str) – path the VCF is parsed from, for display purposes only, optional

header = None¶: header, once it has been read

parse_header()[source]¶

Read and parse vcfpy.header.Header from file, set into self.header and return it

Returns:	`vcfpy.header.Header`
Raises:	`vcfpy.exceptions.InvalidHeaderException` in the case of problems reading the header

parse_line(line)[source]¶: Parse the given line without reading another one from the stream

parse_next_record()[source]¶

Read, parse and return next vcfpy.record.Record

Returns:	next VCF record or `None` if at end
Raises:	`vcfpy.exceptions.InvalidRecordException` in the case of problems reading the record

print_warn_summary()[source]¶: If there were any warnings, print summary with warnings

record_checks = None¶: checks to perform, can contain ‘INFO’ and ‘FORMAT’

samples = None¶: vcfpy.header.SamplesInfos with sample information; set on parsing the header

warning_helper = None¶: helper for printing warnings

class vcfpy.parser.QuotedStringSplitter(delim=', ', quote='"', brackets='[]')[source]¶

Bases: object

Helper class for splitting quoted strings

Has support for interpreting quoting strings but also brackets. Meant for splitting the VCF header line dicts

ARRAY = 3¶: state constant for array

DELIM = 4¶: state constant for delimiter

ESCAPED = 2¶: state constant for escaped

NORMAL = 0¶: state constant for normal

QUOTED = 1¶: state constant for quoted

delim = None¶: string delimiter

quote = None¶: quote character

run(s)[source]¶

Split string s at delimiter, correctly interpreting quotes

Further, interprets arrays wrapped in one level of []. No recursive brackets are interpreted (as this would make the grammar non-regular and currently this complexity is not needed). Currently, quoting inside of brackets is not supported either. This is just to support the example from VCF v4.3.
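The splitting behaviour described for run() can be sketched as a small character-by-character state machine. This is an illustration written from the description, not the library's code. It assumes a single-character delimiter, and the function name `split_quoted` is hypothetical.

```python
def split_quoted(s, delim=",", quote='"', brackets="[]"):
    """Split s at delim, ignoring delimiters inside quotes or one level of []."""
    parts, current = [], []
    in_quote = escaped = in_array = False
    for ch in s:
        if escaped:
            # Character after a backslash inside quotes is taken literally.
            current.append(ch)
            escaped = False
        elif in_quote:
            current.append(ch)
            if ch == "\\":
                escaped = True
            elif ch == quote:
                in_quote = False
        elif in_array:
            current.append(ch)
            if ch == brackets[1]:
                in_array = False
        elif ch == quote:
            in_quote = True
            current.append(ch)
        elif ch == brackets[0]:
            in_array = True
            current.append(ch)
        elif ch == delim:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


fields = split_quoted('ID=DP,Description="Total, depth",Values=[a, b]')
```

Quotes and brackets are kept in the returned pieces; stripping them is left to the caller.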

class vcfpy.parser.RecordParser(header, samples, warning_helper, record_checks=[])[source]¶

Bases: object

Helper class for parsing VCF records

header = None¶: Header with the meta information

parse_line(line_str)[source]¶: Parse line from file (including trailing line break) and return resulting Record

record_checks = None¶: The checks to perform, can contain ‘INFO’ and ‘FORMAT’

samples = None¶: SamplesInfos with sample information

warning_helper = None¶: Helper class for printing warnings

vcfpy.parser.SUPPORTED_VCF_VERSIONS = ('VCFv4.0', 'VCFv4.1', 'VCFv4.2', 'VCFv4.3')¶: Supported VCF versions, a warning will be issued otherwise

class vcfpy.parser.StupidHeaderLineParser(warning_helper)[source]¶

Bases: vcfpy.parser.HeaderLineParserBase

Parse into HeaderLine (no particular structure)

parse_key_value(key, value)[source]¶

vcfpy.parser.binomial(n, k)[source]¶

vcfpy.parser.build_header_parsers(warning_helper)[source]¶

Return mapping for parsers to use for each VCF header type

Inject the WarningHelper into the parsers.

vcfpy.parser.convert_field_value(key, type_, value)[source]¶: Convert atomic field value according to the type

vcfpy.parser.parse_breakend(alt_str)[source]¶: Parse breakend and return tuple with results, parameters for BreakEnd constructor

vcfpy.parser.parse_field_value(key, field_info, value)[source]¶: Parse value according to field_info

vcfpy.parser.parse_mapping(value, warning_helper)[source]¶

Parse the given VCF header line mapping

Such a mapping consists of “key=value” pairs, separated by commas and wrapped in angle brackets (“<...>”). Strings are usually quoted; exceptions are made for certain known keys, depending on the tag key. This, however, only becomes important when serializing.

Parameters:	warning_helper (WarningHelper) – object to use for printing warning messages
Raises:	`vcfpy.exceptions.InvalidHeaderException` if there was a problem parsing the file
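To make the mapping format concrete, the sketch below parses a header value such as `<ID=DP,...>` into an ordered mapping. It is a simplified, regex-based illustration under the rules stated above (comma-separated pairs, optional quoting, one level of square brackets). The name `parse_mapping_sketch` is hypothetical, and the sketch raises ValueError rather than the library's exception.

```python
import re
from collections import OrderedDict

# key=value where value is a quoted string, a [..] array, or a bare token.
_PAIR = re.compile(r'\s*([^=,]+)=("(?:[^"\\]|\\.)*"|\[[^\]]*\]|[^,]*)\s*(?:,|$)')


def parse_mapping_sketch(value):
    """Parse '<k1=v1,k2="v, 2">' into an OrderedDict of strings and lists."""
    if not (value.startswith("<") and value.endswith(">")):
        raise ValueError("mapping must be wrapped in angle brackets: %r" % value)
    result = OrderedDict()
    for match in _PAIR.finditer(value[1:-1]):
        key, raw = match.group(1).strip(), match.group(2)
        if raw.startswith('"'):
            # Drop the surrounding quotes and resolve backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        elif raw.startswith("["):
            raw = [item.strip() for item in raw[1:-1].split(",")]
        result[key] = raw
    return result


parsed = parse_mapping_sketch(
    '<ID=DP,Number=1,Type=Integer,Description="Total, depth">'
)
```

All values stay strings here; converting Number to an int is a separate step.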

vcfpy.parser.process_alt(header, ref, alt_str)[source]¶: Process alternative value using Header in header

vcfpy.parser.process_sub(ref, alt_str)[source]¶: Process substitution

vcfpy.parser.process_sub_grow(ref, alt_str)[source]¶: Process substitution where the string grows

vcfpy.parser.process_sub_shrink(ref, alt_str)[source]¶: Process substitution where the string shrinks

vcfpy.parser.split_mapping(pair_str, warning_helper)[source]¶

Split the str in pair_str at '='

Warn if key needs to be stripped

vcfpy.parser.split_quoted_string(s, delim=', ', quote='"', brackets='[]')[source]¶

vcfpy.reader module¶

Parsing of VCF files from file-like objects

class vcfpy.reader.Reader(stream, path=None, tabix_path=None, record_checks=[])[source]¶

Bases: object

Class for parsing of files from file-like objects

Instead of using the constructor, use the class methods from_stream() and from_path().

On construction, the header is read from the file, which raises an exception if the header cannot be parsed. After construction, Reader can be used as an iterable of Record.

Raises:	`InvalidHeaderException` in the case of problems reading the header

close()[source]¶: Close underlying stream

fetch(chrom, begin, end)[source]¶

Jump to the start position of the given chromosomal position and limit iteration to the end position

Parameters:	chrom (str) – name of the chromosome to jump to
	begin (int) – 0-based begin position (inclusive)
	end (int) – 0-based end position (exclusive)
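Since fetch() takes 0-based, half-open coordinates while the POS column of a VCF file is 1-based, an off-by-one error is easy to make. The sketch below shows the conversion and an overlap test. Both helper names are hypothetical and the code does not call the library.

```python
def region_to_fetch_args(chrom, first_pos, last_pos):
    """Convert a 1-based, inclusive region into (chrom, begin, end) for fetch()."""
    # 1-based inclusive [first_pos, last_pos] equals 0-based half-open
    # [first_pos - 1, last_pos).
    return chrom, first_pos - 1, last_pos


def overlaps(record_pos, ref_length, begin, end):
    """Return True if a record at 1-based POS overlaps 0-based [begin, end)."""
    rec_begin = record_pos - 1
    rec_end = rec_begin + ref_length
    return rec_begin < end and rec_end > begin


args = region_to_fetch_args("20", 1110696, 1230237)
```

With these arguments a single-base record at POS 1230237 is still inside the region, while one at POS 1230238 is not.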

classmethod from_path(klass, path, tabix_path=None, record_checks=[])[source]¶

Create new Reader from path

Parameters:	path – the path to load from (converted to `str` for compatibility with `path.py`)
	tabix_path – optional string with path to TBI index, automatic inference from `path` will be tried on the fly if not given
	record_checks (list) – record checks to perform, can contain ‘INFO’ and ‘FORMAT’

classmethod from_stream(klass, stream, path=None, tabix_path=None, record_checks=[])[source]¶

Create new Reader from file

Parameters:	stream – `file`-like object to read from
	path – optional string with path to store (for display only)
	record_checks (list) – record checks to perform, can contain ‘INFO’ and ‘FORMAT’

header = None¶: the Header

parser = None¶: the parser to use

path = None¶: optional str with the path to the stream

record_checks = None¶: checks to perform on records, can contain ‘FORMAT’ and ‘INFO’

samples = None¶: the vcfpy.header.SamplesInfos object with the sample name information

stream = None¶: stream (file-like object) to read from

tabix_file = None¶: the pysam.TabixFile used for reading from an indexed, bgzip-compressed VCF; constructed on the fly

tabix_path = None¶: optional str with path to tabix file

vcfpy.record module¶

Code for representing a VCF record

The VCF record structure is modeled after the one of PyVCF

class vcfpy.record.AltRecord(type_=None)[source]¶

Bases: object

An alternative allele Record

Currently, can be a substitution, an SV placeholder, or breakend

serialize()[source]¶: Return str with representation for VCF file

type = None¶: String describing the type of the variant; could be one of SNV, MNV, or any of the types described in the ALT header lines, such as DUP, DEL, INS, ...

vcfpy.record.BND = 'BND'¶: Code for break-end allele

class vcfpy.record.BreakEnd(mate_chrom, mate_pos, orientation, mate_orientation, sequence, within_main_assembly)[source]¶

Bases: vcfpy.record.AltRecord

A placeholder for a breakend

mate_chrom = None¶: chromosome of the mate breakend

mate_orientation = None¶: orientation of the breakend’s mate

mate_pos = None¶: position of the mate breakend

orientation = None¶: orientation of this breakend

sequence = None¶: breakpoint’s connecting sequence

serialize()[source]¶: Return string representation for VCF

within_main_assembly = None¶: bool specifying if the breakend mate is within the assembly (True) or in an ancillary assembly (False)

class vcfpy.record.Call(sample, data, site=None)[source]¶

Bases: object

The information for a genotype call

Per the VCF specification, this should always include the genotype information and can contain an arbitrary number of further annotations, e.g., the coverage at the variant position.

called = None¶: whether or not the variant is fully called

data = None¶: an OrderedDict with the key/value pair information from the call’s data

gt_alleles = None¶: the allele numbers (0, 1, ...) in this call or None for no-call

gt_bases¶: Return the actual genotype bases, e.g. if VCF genotype is 0/1, could return (‘A’, ‘T’)

gt_phase_char¶: Return character to use for phasing

gt_type¶: The type of genotype, returns one of HOM_REF, HOM_ALT, and HET.

is_filtered(require=None, ignore=['PASS'])[source]¶

Return True for filtered calls

Parameters:	ignore (iterable) – if set, the filters to ignore; make sure to include ‘PASS’ when setting this
	require (iterable) – if set, the filters to require for returning `True`
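The interplay of `require` and `ignore` is easiest to see in code. The sketch below is one plausible reading of the parameter descriptions, not the library's implementation. It takes the call's filter list directly as `call_filters`, which is an assumption of this example.

```python
def is_filtered(call_filters, require=None, ignore=("PASS",)):
    """Return True if any non-ignored (and, if given, required) filter is set."""
    for label in call_filters or []:
        if label in ignore:
            continue  # ignored filters never mark a call as filtered
        if not require or label in require:
            return True
    return False


only_pass = is_filtered(["PASS"])
low_quality = is_filtered(["q10"])
```

Passing a custom `ignore` without 'PASS' would make passing calls count as filtered, which is why the description asks to keep it in the list.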

is_het¶: Return True for heterozygous calls

is_phased¶: Return boolean indicating whether this call is phased

is_variant¶: Return True for non-hom-ref calls

plodity = None¶: the number of alleles in this sample’s call

sample = None¶: the name of the sample for which the call was made

site = None¶: the Record of this Call

vcfpy.record.DEL = 'DEL'¶: Code for “clean” deletion allele

vcfpy.record.ESCAPE_MAPPING = [('%', '%25'), (':', '%3A'), (';', '%3B'), ('=', '%3D'), (',', '%2C'), ('\r', '%0D'), ('\n', '%0A'), ('\t', '%09')]¶: Mapping for escaping reserved characters

vcfpy.record.FIVE_PRIME = '5'¶: code for five prime orientation of a BreakEnd

vcfpy.record.FORWARD = '+'¶: code for forward orientation

vcfpy.record.HET = 1¶: Code for heterozygous

vcfpy.record.HOM_ALT = 2¶: Code for homozygous alternative

vcfpy.record.HOM_REF = 0¶: Code for homozygous reference
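The three genotype codes can be derived from a call's allele numbers. The sketch below shows one way to do this classification. The function `genotype_type` is hypothetical, and returning None for no-calls or partially called genotypes is an assumption of this example.

```python
HOM_REF = 0
HET = 1
HOM_ALT = 2


def genotype_type(gt_alleles):
    """Classify allele numbers such as [0, 1] into HOM_REF, HET or HOM_ALT."""
    if not gt_alleles or any(allele is None for allele in gt_alleles):
        return None  # no-call or partially called genotype
    if all(allele == 0 for allele in gt_alleles):
        return HOM_REF
    if len(set(gt_alleles)) == 1:
        return HOM_ALT  # all alleles identical and not the reference
    return HET
```

Note that a genotype such as 1/2 carries no reference allele but is still heterozygous under this reading.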

vcfpy.record.INDEL = 'INDEL'¶: Code for indel allele, includes substitutions of unequal length

vcfpy.record.INS = 'INS'¶: Code for “clean” insertion allele

vcfpy.record.MIXED = 'MIXED'¶: Code for mixed variant type

vcfpy.record.MNV = 'MNV'¶: Code for a multi nucleotide variant allele

vcfpy.record.RESERVED_CHARS = ':;=%,\r\n\t'¶: Characters reserved in VCF, have to be escaped

vcfpy.record.REVERSE = '-'¶: code for reverse orientation

class vcfpy.record.Record(CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, calls)[source]¶

Bases: object

Represent one record from the VCF file

Record objects are iterators of their calls

ALT = None¶: A list of alternative allele records of type AltRecord

CHROM = None¶: A str with the chromosome name

FILTER = None¶: A list of strings for the FILTER column

FORMAT = None¶: A list of strings for the FORMAT column

ID = None¶: A list of the semicolon-separated values of the ID column

INFO = None¶: An OrderedDict giving the values of the INFO column, flags are mapped to True

POS = None¶: An int with a 1-based begin position

QUAL = None¶: The quality value, can be None

REF = None¶: A str with the REF value

add_filter(label)[source]¶: Add label to FILTER if not set yet, removing PASS entry if present
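The rule stated for add_filter() can be sketched on a plain list. The version below returns a new list instead of mutating a record, which is a choice made for this example only.

```python
def add_filter(filters, label):
    """Return filters with label added, dropping 'PASS' when a label is added."""
    if label in filters:
        return list(filters)  # already set: nothing to do
    return [entry for entry in filters if entry != "PASS"] + [label]


flagged = add_filter(["PASS"], "q10")
```

Dropping 'PASS' keeps the FILTER column consistent, since a record cannot both pass and fail.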

add_format(key, value=None)[source]¶

Add an entry to format

The record’s calls data[key] will be set to value if not yet set and value is not None. If key is already in FORMAT then nothing is done.

affected_end¶

Return affected end position in 0-based coordinates

For SNVs, MNVs, and deletions, the behaviour is based on the start position and the length of the REF. In the case of insertions, the position behind the insert position is returned, yielding a 0-length interval together with affected_start()

affected_start¶

Return affected start position in 0-based coordinates

For SNVs, MNVs, and deletions, the behaviour is the start position. In the case of insertions, the position behind the insert position is returned, yielding a 0-length interval together with affected_end()
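The two properties together describe a 0-based, half-open interval. The sketch below is one reading of the descriptions above and not the library's code. The helper `affected_interval` is hypothetical, and it assumes an insertion is written VCF-style with a single anchor base in REF.

```python
def affected_interval(pos, ref, alt):
    """Return (start, end), 0-based half-open, for a 1-based POS with REF/ALT."""
    # Assumption: an insertion has a one-base REF that prefixes a longer ALT.
    is_insertion = len(ref) == 1 and len(alt) > 1 and alt.startswith(ref)
    if is_insertion:
        # The position behind the anchor base, giving a 0-length interval.
        return pos, pos
    begin = pos - 1
    return begin, begin + len(ref)


snv = affected_interval(100, "A", "T")
deletion = affected_interval(100, "ACG", "A")
insertion = affected_interval(100, "A", "ACG")
```

The zero-length interval for insertions reflects that no reference base is replaced.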

begin = None¶: An int with a 0-based begin position

call_for_sample = None¶: A mapping from sample name to entry in self.calls

calls = None¶: A list of genotype Call objects

end = None¶: An int with a 0-based end position

is_snv()[source]¶: Return True if it is a SNV

vcfpy.record.SNV = 'SNV'¶: Code for single nucleotide variant allele

vcfpy.record.SV = 'SV'¶: Code for structural variant allele

vcfpy.record.SYMBOLIC = 'SYMBOLIC'¶: Code for symbolic allele that is neither SV nor BND

class vcfpy.record.SingleBreakEnd(orientation, sequence)[source]¶

Bases: vcfpy.record.BreakEnd

A placeholder for a single breakend

class vcfpy.record.Substitution(type_, value)[source]¶

Bases: vcfpy.record.AltRecord

A basic alternative allele record describing a REF->AltRecord substitution

Note that this subsumes MNVs, insertions, and deletions.

serialize()[source]¶

value = None¶: The alternative base sequence to use in the substitution

class vcfpy.record.SymbolicAllele(value)[source]¶

Bases: vcfpy.record.AltRecord

A placeholder for a symbolic allele

The allele symbol must be defined in the header using an ALT header line before being parsed. Usually, this is used for succinct descriptions of structural variants or IUPAC ambiguity codes.

serialize()[source]¶

value = None¶: The symbolic value, e.g. ‘DUP’

vcfpy.record.THREE_PRIME = '3'¶: code for three prime orientation of a BreakEnd

vcfpy.record.UNESCAPE_MAPPING = [('%25', '%'), ('%3A', ':'), ('%3B', ';'), ('%3D', '='), ('%2C', ','), ('%0D', '\r'), ('%0A', '\n'), ('%09', '\t')]¶: Mapping from escaped characters to reserved one
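The escape mapping listed for this module can be applied as follows. This is a sketch rather than the library's code: `escape` and `unescape` are hypothetical helper names, and unescaping is done in a single regex pass, a choice of this sketch that avoids decoding the same text twice.

```python
import re

ESCAPE_MAPPING = [
    ("%", "%25"), (":", "%3A"), (";", "%3B"), ("=", "%3D"),
    (",", "%2C"), ("\r", "%0D"), ("\n", "%0A"), ("\t", "%09"),
]
_UNESCAPE = {escaped: raw for raw, escaped in ESCAPE_MAPPING}
_ESCAPED_RE = re.compile("|".join(re.escape(code) for code in _UNESCAPE))


def escape(value):
    """Percent-encode reserved characters; '%' is handled first on purpose."""
    for raw, escaped in ESCAPE_MAPPING:
        value = value.replace(raw, escaped)
    return value


def unescape(value):
    """Decode each percent code exactly once, scanning left to right."""
    return _ESCAPED_RE.sub(lambda match: _UNESCAPE[match.group(0)], value)


encoded = escape("a=1;b=50%")
```

Escaping '%' before the other characters matters: otherwise the percent signs introduced by later replacements would be escaped again.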

vcfpy.warn_utils module¶

Code for printing warnings

class vcfpy.warn_utils.WarningHelper(prefix='[vcfpy] ', stream=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>)[source]¶

Bases: object

Helper class for checkers

This class implements a “warn_once” function that prints each warning only once and a “print_summary” function that prints a final summary table with the number of times each warning occurred.

prefix = None¶: string to prepend before all warnings

print_summary(title='WARNINGS', format='{: 6}\t{}')[source]¶: Print warning messages and count to self.stream

stream = None¶: the stream to write warnings to

warn_once(message)[source]¶: Warn once with message

warning_counter = None¶: mapping from warning string to counter
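The warn-once pattern described for this class can be sketched with a counter. The class `WarnOnce` below is a stand-in that mirrors the documented attributes and method names. Its behaviour is an assumption based on the descriptions above, not the library's code.

```python
import io
import sys
from collections import Counter


class WarnOnce:
    """Print each distinct warning once, but count every occurrence."""

    def __init__(self, prefix="[vcfpy] ", stream=None):
        self.prefix = prefix
        self.stream = stream if stream is not None else sys.stderr
        self.warning_counter = Counter()

    def warn_once(self, message):
        if self.warning_counter[message] == 0:
            print(self.prefix + message, file=self.stream)
        self.warning_counter[message] += 1

    def print_summary(self, title="WARNINGS", format="{: 6}\t{}"):
        if not self.warning_counter:
            return  # nothing to report
        print(title, file=self.stream)
        for message, count in sorted(self.warning_counter.items()):
            print(format.format(count, message), file=self.stream)


buffer = io.StringIO()
helper = WarnOnce(stream=buffer)
helper.warn_once("missing header")
helper.warn_once("missing header")
```

After the two calls the message appears once in the stream while the counter records two occurrences.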

vcfpy.writer module¶

Writing of VCF files to file-like objects

Currently, only writing to plain-text files is supported

class vcfpy.writer.Writer(stream, header, samples, path=None)[source]¶

Bases: object

Class for writing VCF files to file-like objects

Instead of using the constructor, use the class methods from_stream() and from_path().

The writer has to be constructed with a Header and a SamplesInfos object and the full VCF header will be written immediately on construction. This, of course, implies that modifying the header after construction is illegal.

close()[source]¶: Close underlying stream

classmethod from_path(klass, path, header, samples)[source]¶

Create new Writer from path

Parameters:	path – the path to write to (converted to `str` for compatibility with `path.py`)
	header – VCF header to use
	samples – SamplesInfos to use

classmethod from_stream(klass, stream, header, samples, path=None, use_bgzf=None)[source]¶

Create new Writer from file

Note that for getting bgzf support, you have to pass in a stream opened in binary mode. Further, you either have to provide a path ending in ".gz" or set use_bgzf=True. Otherwise, you will get the notorious “TypeError: ‘str’ does not support the buffer interface”.

Parameters:	stream – `file`-like object to write to
	header – VCF header to use
	samples – SamplesInfos to use
	path – optional string with path to store (for display only)
	use_bgzf – indicator whether to write bgzf to `stream` if `True`, prevent if `False`, interpret `path` if `None`
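The three-way meaning of `use_bgzf` can be written as a small decision function. This is a sketch of the rule as documented above. The name `should_use_bgzf` is hypothetical and the function does not open or write any stream.

```python
def should_use_bgzf(path=None, use_bgzf=None):
    """Decide on BGZF output: explicit flag wins, else look at the path suffix."""
    if use_bgzf is not None:
        return bool(use_bgzf)  # True forces BGZF, False prevents it
    # use_bgzf is None: interpret the path, if one was given.
    return bool(path) and str(path).endswith(".gz")


compressed = should_use_bgzf(path="calls.vcf.gz")
```

When this rule selects BGZF, the stream has to be opened in binary mode, as noted above.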

header = None¶: the vcfpy.header.Header written out

path = None¶: optional str with the path to the stream

samples = None¶: the vcfpy.header.SamplesInfos written out

stream = None¶: stream (file-like object) to write to

write_record(record)[source]¶: Write out the given vcfpy.record.Record to this Writer

vcfpy.writer.format_atomic(value)[source]¶

Format atomic value

This function also takes care of escaping the value in case one of the reserved characters occurs in the value.

vcfpy.writer.format_value(field_info, value)[source]¶: Format possibly compound value given the FieldInfo
