vcfpy package¶
Submodules¶
vcfpy.exceptions module¶
Exceptions for the vcfpy module
-
exception
vcfpy.exceptions.
HeaderNotFound
[source]¶ Bases:
vcfpy.exceptions.VCFPyException
Raised when a VCF header could not be found
-
exception
vcfpy.exceptions.
IncorrectVCFFormat
[source]¶ Bases:
vcfpy.exceptions.VCFPyException
Raised on problems parsing VCF
-
exception
vcfpy.exceptions.
InvalidHeaderException
[source]¶ Bases:
vcfpy.exceptions.VCFPyException
Raised in the case of invalid header formatting
-
exception
vcfpy.exceptions.
InvalidRecordException
[source]¶ Bases:
vcfpy.exceptions.VCFPyException
Raised in the case of invalid record formatting
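Since all four exceptions above list `vcfpy.exceptions.VCFPyException` as their base, a single `except` clause on the base class should catch any of them. The sketch below illustrates this with local stand-in classes that mirror the documented names. It does not import vcfpy, `load_header` and `describe` are hypothetical helpers, and deriving the base from `Exception` is an assumption.

```python
# Stand-in classes mirroring the documented hierarchy (NOT the real vcfpy classes).
class VCFPyException(Exception):
    """Assumed common base class; its own parent is not documented above."""


class InvalidHeaderException(VCFPyException):
    """Raised in the case of invalid header formatting."""


class InvalidRecordException(VCFPyException):
    """Raised in the case of invalid record formatting."""


def load_header(first_line):
    # Hypothetical helper: reject input that does not start with a "##" meta line.
    if not first_line.startswith("##"):
        raise InvalidHeaderException("expected a '##' header line")
    return first_line[2:]


def describe(first_line):
    try:
        return load_header(first_line)
    except VCFPyException as exc:  # catches every subclass of the base
        return "error: {}".format(exc)
```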
vcfpy.header module¶
Code for representing the VCF header part
The VCF header class structure is modeled after HTSJDK
-
vcfpy.header.
FORMAT_TYPES
= ('Integer', 'Float', 'Character', 'String')¶ valid FORMAT value types
-
class
vcfpy.header.
FieldInfo
(type_, number)[source]¶ Bases:
object
Core information for describing field type and number
-
number
= None¶ Number description, either an int or constant
-
type
= None¶ The type, one of INFO_TYPES or FORMAT_TYPES
-
-
vcfpy.header.
HEADER_NUMBER_ALLELES
= 'A'¶ number of alleles excluding reference
-
vcfpy.header.
HEADER_NUMBER_GENOTYPES
= 'G'¶ number of genotypes
-
vcfpy.header.
HEADER_NUMBER_REF
= 'R'¶ number of alleles including reference
-
vcfpy.header.
HEADER_NUMBER_UNBOUNDED
= '.'¶ unbounded number of values
-
vcfpy.header.
INFO_TYPES
= ('Integer', 'Float', 'Flag', 'Character', 'String')¶ valid INFO value types
-
vcfpy.header.
LINES_WITH_ID
= ('FORMAT', 'INFO', 'FILTER', 'contig')¶ header lines that contain an “ID” entry
-
class
vcfpy.header.
SamplesInfos
(sample_names)[source]¶ Bases:
object
Helper class for handling and mapping of sample names to numeric indices
-
name_to_idx
= None¶ mapping from sample name to index
-
names
= None¶ list of sample names
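A minimal sketch of the name-to-index mapping described for `SamplesInfos`, assuming that indices follow the order of the sample columns. `SamplesInfosSketch` is an illustrative stand-in, not the vcfpy class.

```python
class SamplesInfosSketch:
    """Illustrative stand-in for the documented helper class."""

    def __init__(self, sample_names):
        # list of sample names, kept in column order
        self.names = list(sample_names)
        # mapping from sample name to its numeric index (assumed 0-based)
        self.name_to_idx = {name: idx for idx, name in enumerate(self.names)}


samples = SamplesInfosSketch(["NA00001", "NA00002", "NA00003"])
second_idx = samples.name_to_idx["NA00002"]
```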
-
-
vcfpy.header.
VALID_NUMBERS
= ('A', 'R', 'G', '.')¶ valid values for “Number” entries, except for integers
-
class
vcfpy.header.
VCFCompoundHeaderLine
(key, value, mapping)[source]¶ Bases:
vcfpy.header.VCFHeaderLine
Base class for compound header lines, currently format and info lines
Compound header lines describe fields that can have more than one entry.
-
mapping
= None¶ OrderedDict with key/value mapping
-
-
class
vcfpy.header.
VCFContigHeaderLine
(key, value, mapping)[source]¶ Bases:
vcfpy.header.VCFSimpleHeaderLine
Contig header line
Most importantly, parses the 'length' key into an integer
-
id
= None¶ name of the contig
-
length
= None¶ length of the contig,
None
if missing
-
-
class
vcfpy.header.
VCFFilterHeaderLine
(key, value, mapping)[source]¶ Bases:
vcfpy.header.VCFSimpleHeaderLine
FILTER header line
-
description
= None¶ description for the filter,
None
if missing
-
id
= None¶ token for the filter
-
-
class
vcfpy.header.
VCFFormatHeaderLine
(key, value, mapping)[source]¶ Bases:
vcfpy.header.VCFCompoundHeaderLine
Header line for FORMAT fields
-
description
= None¶ description, should be given,
None
if not given
-
id
= None¶ key in the FORMAT field
-
source
= None¶ source of FORMAT field,
None
if not given
-
type
= None¶ value type
-
version
= None¶ version of FORMAT field,
None
if not given
-
-
class
vcfpy.header.
VCFHeader
(lines=[], samples=None)[source]¶ Bases:
object
Represent header of VCF file
While this class allows mutating records, it should not be changed once it has been assigned to
-
lines
= None¶ list of VCFHeaderLine objects
-
samples
= None¶ SamplesInfos object
-
-
class
vcfpy.header.
VCFHeaderLine
(key, value)[source]¶ Bases:
object
Base class for VCF header lines
-
key
= None¶ str
with key of header line
-
value
= None¶ str
with raw value of header line
-
-
class
vcfpy.header.
VCFInfoHeaderLine
(key, value, mapping)[source]¶ Bases:
vcfpy.header.VCFCompoundHeaderLine
Header line for INFO fields
Note that the Number field will be parsed into an int if possible. Otherwise, the constants HEADER_NUMBER_* will be used.
-
description
= None¶ description, should be given,
None
if not given
-
id
= None¶ key in the INFO field
-
source
= None¶ source of INFO field,
None
if not given
-
type
= None¶ value type
-
version
= None¶ version of INFO field,
None
if not given
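The `Number` handling described for `VCFInfoHeaderLine` can be sketched as follows. The constants repeat the values documented above; `parse_number` is a hypothetical helper, and raising `ValueError` for unknown values is an assumption made for this sketch.

```python
# Constants repeated from the documentation above.
HEADER_NUMBER_ALLELES = "A"
HEADER_NUMBER_REF = "R"
HEADER_NUMBER_GENOTYPES = "G"
HEADER_NUMBER_UNBOUNDED = "."
VALID_NUMBERS = (
    HEADER_NUMBER_ALLELES,
    HEADER_NUMBER_REF,
    HEADER_NUMBER_GENOTYPES,
    HEADER_NUMBER_UNBOUNDED,
)


def parse_number(raw):
    """Hypothetical helper: return an int if possible, else one of VALID_NUMBERS."""
    try:
        return int(raw)
    except ValueError:
        if raw in VALID_NUMBERS:
            return raw
        # Assumption: anything else is rejected.
        raise ValueError("invalid Number value: {!r}".format(raw))
```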
-
-
class
vcfpy.header.
VCFSimpleHeaderLine
(key, value, mapping)[source]¶ Bases:
vcfpy.header.VCFHeaderLine
Base class for simple header lines, currently contig and filter header lines
Raises: vcfpy.exceptions.InvalidHeaderException in the case of missing key "ID"
-
mapping
= None¶ collections.OrderedDict
with key/value mapping of the attributes
-
vcfpy.parser module¶
Parsing of VCF files from str
-
class
vcfpy.parser.
MappingVCFHeaderLineParser
(line_class)[source]¶ Bases:
vcfpy.parser.VCFHeaderLineParserBase
Parse into VCFHeaderLine (no particular structure)
-
line_class
= None¶ the class to use for the VCF header line
-
-
vcfpy.parser.
SUPPORTED_VCF_VERSIONS
= ('VCFv4.0', 'VCFv4.1', 'VCFv4.2', 'VCFv4.3')¶ Supported VCF versions, a warning will be issued otherwise
-
class
vcfpy.parser.
StupidVCFHeaderLineParser
[source]¶ Bases:
vcfpy.parser.VCFHeaderLineParserBase
Parse into VCFHeaderLine (no particular structure)
-
class
vcfpy.parser.
VCFHeaderLineParserBase
[source]¶ Bases:
object
Parse into appropriate VCFHeaderLine
-
parse_key_value
(key, value)[source]¶ Parse the key/value pair
Parameters: - key (str) – the key to use in parsing
- value (str) – the value to parse
Returns: vcfpy.header.VCFHeaderLine
object
-
-
class
vcfpy.parser.
VCFHeaderParser
(sub_parsers)[source]¶ Bases:
object
Helper class for parsing a VCF header
-
parse_line
(line)[source]¶ Parse VCF header line (a trailing line break is ignored)
Parameters: - line (str) – str with line to parse
- sub_parsers (dict) – dict mapping header line types to appropriate parser objects
Returns: appropriate VCFHeaderLine parsed from line
Raises: vcfpy.exceptions.InvalidHeaderException if there was a problem parsing the file
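As a rough illustration of the first step of header line parsing, the sketch below splits a `##key=value` meta line into its key and raw value. `split_header_line` is a hypothetical helper, and it raises `ValueError` where the documented parser raises `InvalidHeaderException`.

```python
def split_header_line(line):
    """Hypothetical helper: split '##key=value' into (key, value)."""
    line = line.rstrip("\r\n")  # ignore a trailing line break
    if not line.startswith("##"):
        raise ValueError("header line must start with '##'")
    if "=" not in line:
        raise ValueError("header line must contain '='")
    # Split on the first '=' only, since the value may contain further '='.
    key, value = line[2:].split("=", 1)
    return key, value


key, value = split_header_line("##INFO=<ID=DP,Number=1>\n")
```

The key could then be used to pick a sub-parser for the value, in the spirit of the `sub_parsers` mapping.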
-
-
class
vcfpy.parser.
VCFParser
(stream, path=None)[source]¶ Bases:
object
Class for line-wise parsing of VCF files
In most cases, you want to use vcfpy.reader.VCFReader instead.
Parameters: - stream – file-like object to read from
- path (str) – path the VCF is parsed from, for display purposes only, optional
-
header
= None¶ header, once it has been read
-
parse_header
()[source]¶ Read and parse vcfpy.header.VCFHeader from file, set into self.header and return it
Returns: vcfpy.header.VCFHeader
Raises: vcfpy.exceptions.InvalidHeaderException in the case of problems reading the header
-
parse_next_record
()[source]¶ Read, parse and return next vcfpy.record.VCFRecord
Returns: next VCF record or None if at end
Raises: vcfpy.exceptions.InvalidRecordException in the case of problems reading the record
-
samples
= None¶ vcfpy.header.SamplesInfos
with sample information; set on parsing the header
-
class
vcfpy.parser.
VCFRecordParser
(header, samples)[source]¶ Bases:
object
Helper class for parsing VCF records
-
header
= None¶ VCFHeader with the meta information
-
parse_line
(line_str)[source]¶ Parse line from file (including trailing line break) and return resulting VCFRecord
-
samples
= None¶ SamplesInfos with sample information
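To give an idea of what parsing one record line involves, here is a minimal sketch that splits a tab-separated VCF data line into its columns. `split_record_line` is a hypothetical helper that returns a plain dict of mostly raw strings. It is not the documented `parse_line`, which returns a record object.

```python
REQUIRED_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def split_record_line(line_str):
    """Hypothetical helper: split one VCF data line into a dict of columns."""
    fields = line_str.rstrip("\r\n").split("\t")
    if len(fields) < len(REQUIRED_COLUMNS):
        raise ValueError("expected at least 8 tab-separated columns")
    record = dict(zip(REQUIRED_COLUMNS, fields))
    record["POS"] = int(record["POS"])        # 1-based position
    record["ALT"] = record["ALT"].split(",")  # one entry per alternative allele
    # FORMAT and the per-sample columns are optional.
    record["FORMAT"] = fields[8].split(":") if len(fields) > 8 else []
    record["samples"] = fields[9:]
    return record


line = "20\t14370\trs6054257\tG\tA,T\t29\tPASS\tDP=14\tGT:DP\t0|0:1\t1|0:8\n"
record = split_record_line(line)
```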
-
-
vcfpy.parser.
convert_field_value
(key, type_, value)[source]¶ Convert atomic field value according to the type
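A sketch of type-driven conversion of a single value, using the type names from `INFO_TYPES` and `FORMAT_TYPES`. `convert_value` is a hypothetical helper with a simplified signature (no `key` argument); treating `.` as a missing value and mapping flags to `True` are assumptions of this sketch.

```python
def convert_value(type_, value):
    """Hypothetical converter for a single (atomic) field value."""
    if value == ".":  # assumption: "." marks a missing value
        return None
    if type_ == "Integer":
        return int(value)
    if type_ == "Float":
        return float(value)
    if type_ == "Flag":
        return True
    if type_ in ("Character", "String"):
        return value
    raise ValueError("unknown type: {!r}".format(type_))
```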
-
vcfpy.parser.
parse_mapping
(value)[source]¶ Parse the given VCF header line mapping
Such a mapping consists of “key=value” pairs, separated by commas and wrapped in angle brackets (“<...>”). String values are usually quoted; exceptions are made for certain known keys. This only becomes important when serializing.
Raises: vcfpy.exceptions.InvalidHeaderException
if there was a problem parsing the file
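The mapping syntax described above can be parsed roughly as follows. This is an illustrative sketch, not the library's implementation: `parse_mapping_sketch` is a hypothetical name, escaped quotes inside strings are not handled, and `ValueError` stands in for `InvalidHeaderException`.

```python
from collections import OrderedDict


def parse_mapping_sketch(value):
    """Parse '<k1=v1,k2="v 2",...>' into an OrderedDict (illustrative only)."""
    if not (value.startswith("<") and value.endswith(">")):
        raise ValueError("mapping must be wrapped in angle brackets")
    pairs, current, in_quotes = [], [], False
    for char in value[1:-1]:
        if char == '"':
            in_quotes = not in_quotes  # toggle quoted state, drop the quote itself
        elif char == "," and not in_quotes:
            pairs.append("".join(current))  # a comma outside quotes ends a pair
            current = []
        else:
            current.append(char)
    pairs.append("".join(current))
    result = OrderedDict()
    for pair in pairs:
        if "=" not in pair:
            raise ValueError("expected key=value, got {!r}".format(pair))
        key, val = pair.split("=", 1)
        result[key] = val
    return result


mapping = parse_mapping_sketch(
    '<ID=DP,Number=1,Type=Integer,Description="Total depth, raw">'
)
```

Tracking the quoted state is what keeps the comma inside the description from being treated as a pair separator.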
vcfpy.reader module¶
Parsing of VCF files from file-like objects
-
class
vcfpy.reader.
VCFReader
(stream, path=None)[source]¶ Bases:
object
Class for parsing of files from file-like objects
Instead of using the constructor, use the class methods from_file() and from_path().
On construction, the header is read from the file, so construction itself can fail if the header is malformed. After construction, VCFReader can be used as an iterable of VCFRecord.
Raises: InvalidHeaderException in the case of problems reading the header
-
classmethod
from_file
(klass, stream, path=None)[source]¶ Create new VCFReader from file
Parameters: - stream – file-like object to read from
- path – optional string with path to store (for display only)
-
classmethod
from_path
(klass, path)[source]¶ Create new VCFReader from path
Parameters: path – the path to load from (converted to str for compatibility with path.py)
-
header
= None¶ the VCFHeader
-
jump_to
(chrom, begin, end)[source]¶ Jump to the start position of the given chromosomal position and limit iteration to the end position
Parameters: - chrom (str) – name of the chromosome to jump to
- begin (int) – 0-based begin position (inclusive)
- end (int) – 0-based end position (exclusive)
-
parser
= None¶ the parser to use
-
path
= None¶ optional
str
with the path to the stream
-
samples
= None¶ the
vcfpy.header.SamplesInfos
object with the sample name information
-
stream
= None¶ stream (
file
-like object) to read from
-
vcfpy.record module¶
Code for representing a VCF record
The VCF record structure is modeled after the one of PyVCF
-
class
vcfpy.record.
AltRecord
(type_=None)[source]¶ Bases:
object
An alternative allele Record
Currently, can be a substitution, an SV placeholder, or breakend
-
type
= None¶ String describing the type of the variant, could be one of SNV, MNV, or any of the types described in the ALT header lines, such as DUP, DEL, INS, ...
-
-
vcfpy.record.
BND
= 'BND'¶ Code for break-end
-
class
vcfpy.record.
BreakEnd
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
A placeholder for a breakend
-
value
= None¶ The alternative base sequence to use in the substitution
-
-
class
vcfpy.record.
Call
(sample, data, site=None)[source]¶ Bases:
object
The information for a genotype call
In VCF, a call should always include the genotype information and can contain an arbitrary number of further annotations, e.g., the coverage at the variant position.
-
data
= None¶ an OrderedDict with the key/value pair information from the call’s data
-
gt_bases
¶ Return the actual genotype alleles, e.g. if VCF genotype is 0/1, could return A/T
-
gt_type
¶ The type of genotype, mapping is
- hom_ref = 0
- het = 1
- hom_alt = 2 (which alt is untracked)
- uncalled =
None
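The `gt_type` mapping and the `gt_bases` idea can be sketched from a raw `GT` string as below. The function names mirror the documented properties but are free functions written for illustration. Returning a string from `gt_bases` and treating any missing allele as uncalled are assumptions of this sketch.

```python
import re

HOM_REF, HET, HOM_ALT = 0, 1, 2


def gt_alleles(gt):
    """Split a GT string such as '0/1' or '1|1' into allele indices (None for '.')."""
    return [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]


def gt_type(gt):
    alleles = gt_alleles(gt)
    if any(a is None for a in alleles):
        return None  # uncalled (assumption: any missing allele counts)
    if all(a == 0 for a in alleles):
        return HOM_REF
    if len(set(alleles)) == 1:
        return HOM_ALT  # which alt is not tracked
    return HET


def gt_bases(gt, ref, alts):
    """Map allele indices to bases: index 0 is REF, index i is alts[i - 1]."""
    bases = [ref] + list(alts)
    sep = "|" if "|" in gt else "/"
    return sep.join("." if a is None else bases[a] for a in gt_alleles(gt))
```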
-
is_filtered
¶ Return
True
for filtered calls
-
is_het
¶ Return
True
for heterozygous calls
-
is_phased
¶ Return
True
for phased calls
-
is_variant
¶ Return
True
for variant calls
-
sample
= None¶ the name of the sample for which the call was made
-
-
vcfpy.record.
DEL
= 'DEL'¶ Code for “clean” deletion
-
vcfpy.record.
INDEL
= 'INDEL'¶ Code for indel, includes substitutions of unequal length
-
vcfpy.record.
INS
= 'INS'¶ Code for “clean” insertion
-
vcfpy.record.
MNV
= 'MNV'¶ Code for a multi nucleotide variant
-
class
vcfpy.record.
Record
(CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, calls)[source]¶ Bases:
object
Represent one record from the VCF file
Record objects are iterators of their calls
-
CHROM
= None¶ A
str
with the chromosome name
-
FILTER
= None¶ A list of strings for the FILTER column
-
FORMAT
= None¶ A list of strings for the FORMAT column
-
ID
= None¶ A list of the semicolon-separated values of the ID column
-
INFO
= None¶ An OrderedDict giving the values of the INFO column, flags are mapped to
True
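A minimal sketch of how an INFO column could be turned into an ordered mapping in which flags map to `True`. `parse_info` is a hypothetical helper; keeping non-flag values as lists of raw strings (without type conversion) is a simplification of this sketch.

```python
from collections import OrderedDict


def parse_info(info_str):
    """Hypothetical helper: parse an INFO column into an OrderedDict."""
    result = OrderedDict()
    if info_str == ".":
        return result  # "." means there are no INFO entries
    for entry in info_str.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            result[key] = value.split(",")  # raw strings, no type conversion
        else:
            result[entry] = True  # a bare key is a flag
    return result


info = parse_info("NS=3;DP=14;AF=0.5,0.2;DB")
```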
-
POS
= None¶ An
int
with a 1-based begin position
-
QUAL
= None¶ The quality value, can be
None
-
REF
= None¶ A
str
with the REF value
-
add_format
(key, value=None)[source]¶ Add an entry to format
The record’s calls data[key] will be set to value if not yet set and value is not None. If key is already in FORMAT then nothing is done.
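The behaviour described for `add_format` can be illustrated with plain lists and dicts. The function below is a stand-in that takes the FORMAT list and the per-call data dicts as explicit arguments. It is not the method on `Record`.

```python
def add_format(format_keys, calls_data, key, value=None):
    """Illustrative stand-in operating on plain lists/dicts, not on a Record."""
    if key in format_keys:
        return  # key already in FORMAT: nothing is done
    format_keys.append(key)
    if value is not None:
        for data in calls_data:
            data.setdefault(key, value)  # set only where not yet set


fmt = ["GT"]
calls = [{"GT": "0/1"}, {"GT": "1/1", "DP": 7}]
add_format(fmt, calls, "DP", 0)    # adds DP, fills in the first call only
add_format(fmt, calls, "GT", "x")  # GT already present, so no change
```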
-
begin
= None¶ An
int
with a 0-based begin position
-
call_for_sample
= None¶ A mapping from sample name to entry in self.calls
-
end
= None¶ An
int
with a 0-based end position
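The relation between the 1-based `POS` and the 0-based `begin` can be sketched as below. How `end` is derived is not stated above, so the sketch assumes an exclusive end that spans the `REF` allele. `zero_based_interval` is a hypothetical helper.

```python
def zero_based_interval(pos, ref):
    """Hypothetical helper: convert 1-based POS to a 0-based (begin, end) pair."""
    begin = pos - 1         # 1-based POS to 0-based begin (inclusive)
    end = begin + len(ref)  # ASSUMPTION: exclusive end covering the REF bases
    return begin, end


begin, end = zero_based_interval(14370, "G")
```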
-
-
vcfpy.record.
SNV
= 'SNV'¶ Code for single nucleotide variant
-
class
vcfpy.record.
SV
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
Code for structural variant
-
value
= None¶ The alternative base sequence to use in the substitution
-
-
vcfpy.record.
SV_CODES
= ('DEL', 'INS', 'DUP', 'INV', 'CNV')¶ Codes for structural variants
-
vcfpy.record.
SYMBOLIC
= 'SYMBOLIC'¶ Code for symbolic allele that is neither SV nor BND
-
class
vcfpy.record.
SingleBreakEnd
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
A placeholder for a single breakend
-
value
= None¶ The alternative base sequence to use in the substitution
-
-
class
vcfpy.record.
Substitution
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
A basic alternative allele record describing a REF->AltRecord substitution
Note that this subsumes MNVs, insertions, and deletions.
-
value
= None¶ The alternative base sequence to use in the substitution
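One way to assign the type codes above to a substitution is sketched below. The rule used for "clean" insertions and deletions (one allele is a prefix of the other) is an assumption made for illustration and may differ from the library's own rule. `classify_substitution` is a hypothetical helper.

```python
SNV, MNV, INS, DEL, INDEL = "SNV", "MNV", "INS", "DEL", "INDEL"


def classify_substitution(ref, alt):
    """Hypothetical classifier for a REF/ALT pair of base sequences."""
    if len(ref) == len(alt):
        return SNV if len(ref) == 1 else MNV
    if len(alt) > len(ref) and alt.startswith(ref):
        return INS  # ASSUMPTION: "clean" insertion means ALT extends REF
    if len(ref) > len(alt) and ref.startswith(alt):
        return DEL  # ASSUMPTION: "clean" deletion means REF extends ALT
    return INDEL    # substitution of unequal length
```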
-
-
class
vcfpy.record.
SymbolicAllele
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
A placeholder for a symbolic allele
-
value
= None¶ The alternative base sequence to use in the substitution
-