vcfpy package¶
Submodules¶
vcfpy.exceptions module¶
Exceptions for the vcfpy module
-
exception
vcfpy.exceptions.
HeaderNotFound
[source]¶ Bases:
vcfpy.exceptions.VCFPyException
Raised when a VCF header could not be found
-
exception
vcfpy.exceptions.
IncorrectVCFFormat
[source]¶ Bases:
vcfpy.exceptions.VCFPyException
Raised on problems parsing VCF
-
exception
vcfpy.exceptions.
InvalidHeaderException
[source]¶ Bases:
vcfpy.exceptions.VCFPyException
Raised in the case of invalid header formatting
-
exception
vcfpy.exceptions.
InvalidRecordException
[source]¶ Bases:
vcfpy.exceptions.VCFPyException
Raised in the case of invalid record formatting
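Because all four exceptions share the base class VCFPyException, a caller can handle one specific failure and fall back to the base class for the rest. The following is a minimal sketch using local stand-in classes that mirror the hierarchy listed above; it does not import vcfpy, the stand-ins derive from plain Exception as an assumption, and describe_failure is a hypothetical helper.

```python
# Stand-ins mirroring the documented hierarchy (assumption: the real classes
# live in vcfpy.exceptions; deriving the base from Exception is a simplification).
class VCFPyException(Exception):
    """Stand-in for the common base class."""


class InvalidHeaderException(VCFPyException):
    """Stand-in: invalid header formatting."""


class InvalidRecordException(VCFPyException):
    """Stand-in: invalid record formatting."""


def describe_failure(action):
    """Run ``action`` and summarize any VCF-related failure (hypothetical helper)."""
    try:
        action()
    except InvalidRecordException as exc:
        # Most specific handler first.
        return "bad record: {}".format(exc)
    except VCFPyException as exc:
        # Catches every other exception of the family.
        return "other VCF problem: {}".format(exc)
    return "ok"


def raise_header_error():
    raise InvalidHeaderException("missing ID")


def raise_record_error():
    raise InvalidRecordException("too few columns")
```

Ordering the handlers from most to least specific keeps the base-class handler from swallowing the record error.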
vcfpy.header module¶
Code for representing the VCF header part
The VCF header class structure is modeled after HTSJDK
-
class
vcfpy.header.
AltAlleleHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.SimpleHeaderLine
Alternative allele header line
Mostly used for defining symbolic alleles for structural variants and IUPAC ambiguity codes
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ name of the alternative allele
-
class
vcfpy.header.
CompoundHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.HeaderLine
Base class for compound header lines, currently FORMAT and INFO header lines
Compound header lines describe fields that can have more than one entry.
-
mapping
= None¶ OrderedDict with key/value mapping
-
value
¶
-
-
class
vcfpy.header.
ContigHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.SimpleHeaderLine
Contig header line
Most importantly, parses the 'length' key into an integer
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ name of the contig
-
length
= None¶ length of the contig,
None
if missing
-
vcfpy.header.
FORMAT_TYPES
= ('Integer', 'Float', 'Character', 'String')¶ valid FORMAT value types
-
class
vcfpy.header.
FieldInfo
(type_, number, description=None)[source]¶ Bases:
object
Core information for describing field type and number
-
description
= None¶ Description for the header field, optional
-
number
= None¶ Number description, either an int or constant
-
type
= None¶ The type, one of INFO_TYPES or FORMAT_TYPES
-
-
class
vcfpy.header.
FilterHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.SimpleHeaderLine
FILTER header line
-
description
= None¶ description for the filter,
None
if missing
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ token for the filter
-
-
class
vcfpy.header.
FormatHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.CompoundHeaderLine
Header line for FORMAT fields
-
description
= None¶ description, should be given,
None
if not given
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ key in the FORMAT field
-
source
= None¶ source of FORMAT field,
None
if not given
-
type
= None¶ value type
-
version
= None¶ version of FORMAT field,
None
if not given
-
-
vcfpy.header.
HEADER_NUMBER_ALLELES
= 'A'¶ number of alleles excluding reference
-
vcfpy.header.
HEADER_NUMBER_GENOTYPES
= 'G'¶ number of genotypes
-
vcfpy.header.
HEADER_NUMBER_REF
= 'R'¶ number of alleles including reference
-
vcfpy.header.
HEADER_NUMBER_UNBOUNDED
= '.'¶ unbounded number of values
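The four constants above describe how many values a field carries relative to the number of alleles. As a rough sketch (not vcfpy's own code), the expected count for a record could be derived as follows; expected_value_count and its ploidy parameter are hypothetical names, and the genotype count assumes unordered genotypes.

```python
import math

# Local copies of the documented constants.
HEADER_NUMBER_ALLELES = "A"
HEADER_NUMBER_GENOTYPES = "G"
HEADER_NUMBER_REF = "R"
HEADER_NUMBER_UNBOUNDED = "."


def expected_value_count(number, num_alts, ploidy=2):
    """Return the expected number of values, or None if unbounded.

    ``number`` is an int or one of the HEADER_NUMBER_* constants.
    """
    if isinstance(number, int):
        return number
    if number == HEADER_NUMBER_ALLELES:
        return num_alts  # one value per ALT allele
    if number == HEADER_NUMBER_REF:
        return num_alts + 1  # REF plus every ALT allele
    if number == HEADER_NUMBER_GENOTYPES:
        # Unordered genotypes: multisets of size `ploidy` over num_alts + 1 alleles.
        return math.comb(num_alts + ploidy, ploidy)
    if number == HEADER_NUMBER_UNBOUNDED:
        return None
    raise ValueError("invalid Number value: {!r}".format(number))
```

For a diploid sample and two ALT alleles, this gives 2 values for 'A', 3 for 'R', and 6 for 'G'.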
-
class
vcfpy.header.
Header
(lines=[], samples=None, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
object
Represent header of VCF file
While this class allows mutating records, it should not be changed once it has been assigned to
This class provides functions for adding lines to a header and updating the supporting index data structures. There is no explicit API for removing header lines; the best way is to construct a new Header instance with a filtered list of header lines.
-
lines
= None¶ list of HeaderLine objects
-
samples
= None¶ SamplesInfos object
-
-
class
vcfpy.header.
HeaderLine
(key, value, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
object
Base class for VCF header lines
-
key
= None¶ str
with key of header line
-
value
¶
-
warning_helper
= None¶ Helper for printing warnings
-
-
vcfpy.header.
INFO_TYPES
= ('Integer', 'Float', 'Flag', 'Character', 'String')¶ valid INFO value types
-
class
vcfpy.header.
InfoHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.CompoundHeaderLine
Header line for INFO fields
Note that the Number field will be parsed into an int if possible. Otherwise, the constants HEADER_NUMBER_* will be used.
-
description
= None¶ description, should be given,
None
if not given
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ key in the INFO field
-
source
= None¶ source of INFO field,
None
if not given
-
type
= None¶ value type
-
version
= None¶ version of INFO field,
None
if not given
-
-
vcfpy.header.
LINES_WITH_ID
= ('ALT', 'contig', 'FILTER', 'FORMAT', 'INFO', 'META', 'PEDIGREE', 'SAMPLE')¶ header lines that contain an “ID” entry
-
class
vcfpy.header.
MetaHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.SimpleHeaderLine
META header line
Used for defining the set of valid values for sample keys
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ ID of the META entry
-
class
vcfpy.header.
PedigreeHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.SimpleHeaderLine
Header line for defining a pedigree entry
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ ID of the pedigree entry
-
vcfpy.header.
RESERVED_INFO
= {'HOMSEQ': FieldInfo('String', '.', 'Sequence of base pair identical micro-homology at event breakpoints'), 'CN': FieldInfo('Integer', 1, 'Copy number of segment containing breakend'), 'NS': FieldInfo('Integer', 1, 'Number of samples with data'), 'METRANS': FieldInfo('String', 4, 'Mobile element transduction info of the form CHR,START,END,POLARITY'), 'SB': FieldInfo('Integer', 4, 'Strand bias at this position'), 'MEINFO': FieldInfo('String', 4, 'Mobile element info of the form NAME,START,END,POLARITY'), 'BKPTID': FieldInfo('String', '.', 'ID of the assembled alternate allele in the assembly file'), 'CICNADJ': FieldInfo('Integer', '.', 'Confidence interval around copy number for the adjacency'), 'MQ0': FieldInfo('Integer', 1, 'Number of MAPQ == 0 reads covering this record'), 'MATEID': FieldInfo('String', '.', 'ID of mate breakends'), 'AD': FieldInfo('Integer', 'R', 'Total read depth for each allele'), 'ADR': FieldInfo('Integer', 'R', 'Reverse read depth for each allele'), 'END': FieldInfo('Integer', 1, 'End position of the variant described in this record (for symbolic alleles)'), 'H3': FieldInfo('Flag', 0, 'Membership in HapMap 3'), 'AC': FieldInfo('Integer', 'A', 'Allele count in genotypes, for each ALT allele, in the same order as listed'), 'CIGAR': FieldInfo('String', 'A', 'CIGAR string describing how to align each ALT allele to the reference allele'), 'DP': FieldInfo('Integer', 1, 'Read Depth of segment containing breakend'), 'AA': FieldInfo('String', 1, 'Ancestral Allele'), 'DPADJ': FieldInfo('Integer', '.', 'Read Depth of adjacency'), 'CIEND': FieldInfo('Integer', 2, 'Confidence interval around END for imprecise variants'), 'ADF': FieldInfo('Integer', 'R', 'Forward read depth for each allele'), 'CIPOS': FieldInfo('Integer', 2, 'Confidence interval around POS for imprecise variants'), 'PARID': FieldInfo('String', 1, 'ID of partner breakend'), 'HOMLEN': FieldInfo('Integer', '.', 'Length of base pair identical micro-homology at event breakpoints'), 'CILEN': 
FieldInfo('Integer', 2, 'Confidence interval around the inserted material between breakends'), 'AF': FieldInfo('Float', 'A', 'Allele frequency for each ALT allele in the same order as listed: used for estimating from primary data not called genotypes'), 'EVENT': FieldInfo('String', 1, 'ID of event associated to breakend'), '1000G': FieldInfo('Flag', 0, 'Membership in 1000 Genomes'), 'H2': FieldInfo('Flag', 0, 'Membership in HapMap 2'), 'NOVEL': FieldInfo('Flag', 0, 'Indicates a novel structural variation'), 'AN': FieldInfo('Integer', 1, 'Total number of alleles in called genotypes'), 'DB': FieldInfo('Flag', 0, 'dbSNP membership'), 'BQ': FieldInfo('Float', 1, 'RMS base quality at this position'), 'SVTYPE': FieldInfo('String', 1, 'Type of structural variant'), 'DGVID': FieldInfo('String', 1, 'ID of this element in Database of Genomic Variation'), 'CICN': FieldInfo('Integer', 2, 'Confidence interval around copy number for the segment'), 'DBVARID': FieldInfo('String', 1, 'ID of this element in DBVAR'), 'VALIDATED': FieldInfo('Flag', 0, 'Validated by follow-up experiment'), 'DBRIPID': FieldInfo('String', 1, 'ID of this element in DBRIP'), 'CNADJ': FieldInfo('Integer', '.', 'Copy number of adjacency'), 'IMPRECISE': FieldInfo('Flag', 0, 'Imprecise structural variation'), 'MQ': FieldInfo('Integer', 1, 'RMS mapping quality'), 'SOMATIC': FieldInfo('Flag', 0, 'Indicates that the record is a somatic mutation, for cancer genomics'), 'SVLEN': FieldInfo('Integer', 1, 'Difference in length between REF and ALT alleles')}¶ Reserved fields for INFO from VCF v4.3
-
class
vcfpy.header.
SampleHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.SimpleHeaderLine
Header line for defining a SAMPLE entry
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ ID of the sample entry
-
class
vcfpy.header.
SamplesInfos
(sample_names)[source]¶ Bases:
object
Helper class for handling and mapping of sample names to numeric indices
-
name_to_idx
= None¶ mapping from sample name to index
-
names
= None¶ list of sample names
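The relationship between the two attributes can be pictured with a small stand-in class. This is only a sketch of the idea (a list of names plus a reverse lookup), not the library's implementation; SampleIndex is a hypothetical name.

```python
class SampleIndex:
    """Hypothetical stand-in illustrating names / name_to_idx."""

    def __init__(self, sample_names):
        # Keep the names in file order.
        self.names = list(sample_names)
        # Reverse lookup from sample name to its column index.
        self.name_to_idx = {name: idx for idx, name in enumerate(self.names)}


samples = SampleIndex(["NA00001", "NA00002", "NA00003"])
```

With such a mapping, the call column for a given sample can be located without scanning the list of names.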
-
-
class
vcfpy.header.
SimpleHeaderLine
(key, value, mapping, warning_helper=<vcfpy.warn_utils.WarningHelper object>)[source]¶ Bases:
vcfpy.header.HeaderLine
Base class for simple header lines, currently contig and filter header lines
Raises: vcfpy.exceptions.InvalidHeaderException in the case of a missing "ID" key
-
mapping
= None¶ collections.OrderedDict
with key/value mapping of the attributes
-
value
¶
-
-
vcfpy.header.
VALID_NUMBERS
= ('A', 'R', 'G', '.')¶ valid values for “Number” entries, except for integers
vcfpy.parser module¶
Parsing of VCF files from str
-
class
vcfpy.parser.
FormatChecker
[source]¶ Bases:
object
Helper class for checking a FORMAT field
-
header
= None¶ VCFHeader to use for checking
-
run
(call, num_alts)[source]¶ Check FORMAT of a record.Call
Currently, only checks for consistent counts are implemented
-
warning_helper
= None¶ helper class for printing warnings
-
-
class
vcfpy.parser.
HeaderChecker
(warning_helper)[source]¶ Bases:
object
Helper class for checking a VCF header
-
run
(header)[source]¶ Check the header
Warnings will be printed using self.warning_helper while errors will raise an exception.
Raises: vcfpy.exceptions.InvalidHeaderException in the case of severe errors reading the header
-
warning_helper
= None¶ helper class for printing warnings
-
-
class
vcfpy.parser.
HeaderLineParserBase
(warning_helper)[source]¶ Bases:
object
Parse into appropriate HeaderLine
-
parse_key_value
(key, value)[source]¶ Parse the key/value pair
Parameters: - key (str) – the key to use in parsing
- value (str) – the value to parse
Returns: vcfpy.header.HeaderLine
object
-
warning_helper
= None¶ WarningHelper to use for printing warnings
-
-
class
vcfpy.parser.
HeaderParser
(warning_helper)[source]¶ Bases:
object
Helper class for parsing a VCF header
-
parse_line
(line)[source]¶ Parse VCF header line (a trailing line break is ignored)
Parameters: - line (str) – str with line to parse
- sub_parsers (dict) – dict mapping header line types to appropriate parser objects
Returns: appropriate HeaderLine parsed from line
Raises: vcfpy.exceptions.InvalidHeaderException if there was a problem parsing the file
-
sub_parsers
= None¶ Sub parsers to use for parsing the header lines
-
warning_helper
= None¶ WarningHelper to use for printing warnings
-
-
class
vcfpy.parser.
InfoChecker
(header, warning_helper)[source]¶ Bases:
object
Helper class for checking an INFO field
-
header
= None¶ VCFHeader to use for checking
-
run
(key, value, num_alts)[source]¶ Check value in INFO[key] of record
Currently, only checks for consistent counts are implemented
Parameters: - key (str) – key of INFO entry to check
- value – value to check
- num_alts (int) – number of alternative alleles, used for checking value counts
-
warning_helper
= None¶ helper class for printing warnings
-
-
class
vcfpy.parser.
MappingHeaderLineParser
(warning_helper, line_class)[source]¶ Bases:
vcfpy.parser.HeaderLineParserBase
Parse into HeaderLine (no particular structure)
-
line_class
= None¶ the class to use for the VCF header line
-
-
class
vcfpy.parser.
Parser
(stream, path=None, record_checks=[])[source]¶ Bases:
object
Class for line-wise parsing of VCF files
In most cases, you want to use vcfpy.reader.Reader instead.
Parameters: - stream – file-like object to read from
- path (str) – path the VCF is parsed from, for display purposes only, optional
-
header
= None¶ header, once it has been read
-
parse_header
()[source]¶ Read and parse vcfpy.header.Header from file, set into self.header and return it
Returns: vcfpy.header.Header
Raises: vcfpy.exceptions.InvalidHeaderException in the case of problems reading the header
-
parse_next_record
()[source]¶ Read, parse and return next vcfpy.record.Record
Returns: next VCF record or None if at end
Raises: vcfpy.exceptions.InvalidRecordException in the case of problems reading the record
-
record_checks
= None¶ checks to perform, can contain ‘INFO’ and ‘FORMAT’
-
samples
= None¶ vcfpy.header.SamplesInfos
with sample information; set on parsing the header
-
warning_helper
= None¶ helper for printing warnings
-
class
vcfpy.parser.
RecordParser
(header, samples, warning_helper, record_checks=[])[source]¶ Bases:
object
Helper class for parsing VCF records
-
header
= None¶ Header with the meta information
-
parse_line
(line_str)[source]¶ Parse line from file (including trailing line break) and return resulting Record
-
record_checks
= None¶ The checks to perform, can contain ‘INFO’ and ‘FORMAT’
-
samples
= None¶ SamplesInfos with sample information
-
warning_helper
= None¶ Helper class for printing warnings
-
-
vcfpy.parser.
SUPPORTED_VCF_VERSIONS
= ('VCFv4.0', 'VCFv4.1', 'VCFv4.2', 'VCFv4.3')¶ Supported VCF versions, a warning will be issued otherwise
-
class
vcfpy.parser.
StupidHeaderLineParser
(warning_helper)[source]¶ Bases:
vcfpy.parser.HeaderLineParserBase
Parse into HeaderLine (no particular structure)
-
vcfpy.parser.
build_header_parsers
(warning_helper)[source]¶ Return mapping for parsers to use for each VCF header type
Inject the WarningHelper into the parsers.
-
vcfpy.parser.
convert_field_value
(key, type_, value)[source]¶ Convert atomic field value according to the type
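As an illustration of type-driven conversion, the sketch below maps the VCF value types to Python constructors and treats the missing-value marker '.' as None. It is an assumption about reasonable behaviour rather than a copy of the library function; convert_atomic is a hypothetical name, and the Flag type is left out because flags carry no value to convert.

```python
# Hypothetical converter table for atomic (non-list) values.
_CONVERTERS = {
    "Integer": int,
    "Float": float,
    "Character": str,
    "String": str,
}


def convert_atomic(type_, value):
    """Convert one string ``value`` according to the VCF type ``type_``."""
    if value == ".":
        # '.' marks a missing value in VCF.
        return None
    try:
        converter = _CONVERTERS[type_]
    except KeyError:
        raise ValueError("unsupported type: {!r}".format(type_))
    return converter(value)
```

Keeping the table separate from the function makes it easy to see which types are handled and to extend it.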
-
vcfpy.parser.
parse_mapping
(value, warning_helper)[source]¶ Parse the given VCF header line mapping
Such a mapping consists of “key=value” pairs, separated by commas and wrapped in angle brackets (“<...>”). Strings are usually quoted; for certain known keys, exceptions are made, depending on the tag key. This, however, only becomes important when serializing.
Parameters: warning_helper (WarningHelper) – object to use for printing warning messages Raises: vcfpy.exceptions.InvalidHeaderException
if there was a problem parsing the file
-
vcfpy.parser.
process_alt
(header, ref, alt_str)[source]¶ Process alternative value using Header in
header
-
vcfpy.parser.
split_mapping
(pair_str, warning_helper)[source]¶ Split the str in pair_str at '='
Warn if key needs to be stripped
-
vcfpy.parser.
split_quoted_string
(s, delim=', ', quote='"', brackets='[]')[source]¶ Split string s at delimiter, correctly interpreting quotes
Further, interprets arrays wrapped in one level of []. No recursive brackets are interpreted (as this would make the grammar non-regular and currently this complexity is not needed). Currently, quoting inside of brackets is not supported either. This is just to support the example from VCF v4.3.
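The splitting rule described above (delimiters count only outside quotes and outside one level of brackets) can be sketched with a simple character scanner. This is an independent illustration, not the library's code; split_outside_quotes is a hypothetical name, a single-character ',' delimiter is assumed, and escape sequences inside quotes are not handled.

```python
def split_outside_quotes(s, delim=",", quote='"', brackets="[]"):
    """Split ``s`` at ``delim``, ignoring delimiters inside quotes or brackets."""
    open_br, close_br = brackets
    parts, current = [], []
    in_quote = False
    in_bracket = False
    for ch in s:
        if in_quote:
            # Everything inside quotes is literal until the closing quote.
            if ch == quote:
                in_quote = False
            current.append(ch)
        elif in_bracket:
            # Only one bracket level is tracked; nesting is not interpreted.
            if ch == close_br:
                in_bracket = False
            current.append(ch)
        elif ch == delim:
            parts.append("".join(current))
            current = []
        else:
            if ch == quote:
                in_quote = True
            elif ch == open_br:
                in_bracket = True
            current.append(ch)
    parts.append("".join(current))
    return parts
```

Applied to the inside of a header mapping, each resulting part can then be split once at '=' to obtain the key and value.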
vcfpy.reader module¶
Parsing of VCF files from file-like objects
-
class
vcfpy.reader.
Reader
(stream, path=None, tabix_path=None, record_checks=[])[source]¶ Bases:
object
Class for parsing of files from file-like objects
Instead of using the constructor, use the class methods from_stream() and from_path().
On construction, the header is read from the file, so problems with the header surface at that point. After construction, Reader can be used as an iterable of Record.
Raises: InvalidHeaderException in the case of problems reading the header
-
fetch
(chrom, begin, end)[source]¶ Jump to the start position of the given chromosomal position and limit iteration to the end position
Parameters: - chrom (str) – name of the chromosome to jump to
- begin (int) – 0-based begin position (inclusive)
- end (int) – 0-based end position (exclusive)
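Since fetch() takes a 0-based, half-open interval while VCF positions and typical region strings are 1-based and inclusive, a small conversion step is usually needed. The helper below is a hypothetical sketch of that arithmetic only and does not call vcfpy.

```python
def region_to_fetch_args(chrom, start_1based, end_1based):
    """Convert a 1-based inclusive region to (chrom, begin, end), 0-based half-open."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("invalid 1-based region")
    # Shifting the start by one converts to 0-based; the inclusive 1-based end
    # is numerically equal to the exclusive 0-based end.
    return chrom, start_1based - 1, end_1based


args = region_to_fetch_args("chr1", 100, 200)
```

The resulting interval has length end - begin, which equals the number of bases in the original inclusive region.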
-
classmethod
from_path
(klass, path, tabix_path=None, record_checks=[])[source]¶ Create new Reader from path
Parameters: - path – the path to load from (converted to str for compatibility with path.py)
- tabix_path – optional string with path to TBI index; automatic inference from path will be tried on the fly if not given
- record_checks (list) – record checks to perform, can contain ‘INFO’ and ‘FORMAT’
-
classmethod
from_stream
(klass, stream, path=None, tabix_path=None, record_checks=[])[source]¶ Create new Reader from file
Parameters: - stream – file-like object to read from
- path – optional string with path to store (for display only)
- record_checks (list) – record checks to perform, can contain ‘INFO’ and ‘FORMAT’
-
header
= None¶ the Header
-
parser
= None¶ the parser to use
-
path
= None¶ optional
str
with the path to the stream
-
record_checks
= None¶ checks to perform on records, can contain ‘FORMAT’ and ‘INFO’
-
samples
= None¶ the
vcfpy.header.SamplesInfos
object with the sample name information
-
stream
= None¶ stream (file-like object) to read from
-
tabix_file
= None¶ the pysam.TabixFile used for reading from an indexed bgzip-ed VCF; constructed on the fly
-
tabix_path
= None¶ optional
str
with path to tabix file
-
vcfpy.record module¶
Code for representing a VCF record
The VCF record structure is modeled after the one of PyVCF
-
class
vcfpy.record.
AltRecord
(type_=None)[source]¶ Bases:
object
An alternative allele Record
Currently, can be a substitution, an SV placeholder, or breakend
-
type
= None¶ String describing the type of the variant; could be one of SNV, MNV, or any of the types described in the ALT header lines, such as DUP, DEL, INS, ...
-
-
vcfpy.record.
BND
= 'BND'¶ Code for break-end allele
-
class
vcfpy.record.
BreakEnd
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
A placeholder for a breakend
-
value
= None¶ The alternative base sequence to use in the substitution
-
-
class
vcfpy.record.
Call
(sample, data, site=None)[source]¶ Bases:
object
The information for a genotype call
According to VCF, this should always include the genotype information and can contain an arbitrary number of further annotations, e.g., the coverage at the variant position.
-
called
= None¶ whether or not the variant is fully called
-
data
= None¶ an OrderedDict with the key/value pair information from the call’s data
-
gt_alleles
= None¶ the allele numbers (0, 1, ...) in this call or None for no-call
-
gt_bases
¶ Return the actual genotype bases, e.g. if VCF genotype is 0/1, could return (‘A’, ‘T’)
-
gt_phase_char
¶ Return character to use for phasing
-
gt_type
¶ The type of genotype, returns one of HOM_REF, HOM_ALT, and HET.
-
is_filtered
(require=None, ignore=['PASS'])[source]¶ Return True for filtered calls
Parameters: - ignore (iterable) – if set, the filters to ignore, make sure to include ‘PASS’, when setting
- require (iterable) – if set, the filters to require for returning True
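One plausible reading of the require and ignore parameters is sketched below as a standalone function over a list of filter names. The real method may differ in details; call_is_filtered is a hypothetical name, and taking the filters as an explicit argument (rather than from the call's data) is an assumption made to keep the example self-contained.

```python
def call_is_filtered(filters, require=None, ignore=("PASS",)):
    """Return True if any relevant filter is set on the call.

    ``filters`` is the list of filter names attached to the call (may be None).
    """
    for name in filters or []:
        if name in ignore:
            # Ignored filters never mark the call as filtered.
            continue
        if not require or name in require:
            # Without ``require`` any remaining filter counts; otherwise only
            # the required ones do.
            return True
    return False
```

Note that overriding ignore replaces the default, which is why the documentation asks to include 'PASS' again when setting it.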
-
is_het
¶ Return
True
for heterozygous calls
-
is_phased
¶ Return boolean indicating whether this call is phased
-
is_variant
¶ Return
True
for non-hom-ref calls
-
plodity
= None¶ the number of alleles in this sample’s call
-
sample
= None¶ the name of the sample for which the call was made
-
-
vcfpy.record.
DEL
= 'DEL'¶ Code for “clean” deletion allele
-
vcfpy.record.
ESCAPE_MAPPING
= [('%', '%25'), (':', '%3A'), (';', '%3B'), ('=', '%3D'), (',', '%2C'), ('\r', '%0D'), ('\n', '%0A'), ('\t', '%09')]¶ Mapping for escaping reserved characters
-
vcfpy.record.
HET
= 1¶ Code for heterozygous
-
vcfpy.record.
HOM_ALT
= 2¶ Code for homozygous alternative
-
vcfpy.record.
HOM_REF
= 0¶ Code for homozygous reference
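The three genotype codes can be derived from a call's allele numbers. The sketch below shows one way to do so; genotype_type is a hypothetical function, and its treatment of no-calls (returning None) and of calls with two different ALT alleles (classified as HET) is an assumption rather than documented behaviour.

```python
# Local copies of the documented codes.
HOM_REF = 0
HET = 1
HOM_ALT = 2


def genotype_type(gt_alleles):
    """Classify a list of allele numbers such as [0, 1]."""
    if not gt_alleles or any(allele is None for allele in gt_alleles):
        return None  # no-call or partially called genotype
    if all(allele == 0 for allele in gt_alleles):
        return HOM_REF
    if len(set(gt_alleles)) == 1:
        return HOM_ALT  # all alleles identical and non-reference
    return HET
```

For example, a 0/1 genotype corresponds to the allele list [0, 1] and is classified as HET.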
-
vcfpy.record.
INDEL
= 'INDEL'¶ Code for indel allele, includes substitutions of unequal length
-
vcfpy.record.
INS
= 'INS'¶ Code for “clean” insertion allele
-
vcfpy.record.
MIXED
= 'MIXED'¶ Code for mixed variant type
-
vcfpy.record.
MNV
= 'MNV'¶ Code for a multi nucleotide variant allele
-
vcfpy.record.
RESERVED_CHARS
= ':;=%,\r\n\t'¶ Characters reserved in VCF, have to be escaped
-
class
vcfpy.record.
Record
(CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, calls)[source]¶ Bases:
object
Represent one record from the VCF file
Record objects are iterators of their calls
-
CHROM
= None¶ A
str
with the chromosome name
-
FILTER
= None¶ A list of strings for the FILTER column
-
FORMAT
= None¶ A list of strings for the FORMAT column
-
ID
= None¶ A list of the semicolon-separated values of the ID column
-
INFO
= None¶ An OrderedDict giving the values of the INFO column, flags are mapped to
True
-
POS
= None¶ An
int
with a 1-based begin position
-
QUAL
= None¶ The quality value, can be
None
-
REF
= None¶ A
str
with the REF value
-
add_format
(key, value=None)[source]¶ Add an entry to format
The record’s calls data[key] will be set to value if not yet set and value is not None. If key is already in FORMAT then nothing is done.
-
affected_end
¶ Return affected end position in 0-based coordinates
For SNVs, MNVs, and deletions, the behaviour is based on the start position and the length of the REF. In the case of insertions, the position behind the insert position is returned, yielding a 0-length interval together with affected_start
-
affected_start
¶ Return affected start position in 0-based coordinates
For SNVs, MNVs, and deletions, the result is the start position. In the case of insertions, the position behind the insert position is returned, yielding a 0-length interval together with affected_end
-
begin
= None¶ An
int
with a 0-based begin position
-
call_for_sample
= None¶ A mapping from sample name to entry in self.calls
-
end
= None¶ An
int
with a 0-based end position
-
-
vcfpy.record.
SNV
= 'SNV'¶ Code for single nucleotide variant allele
-
class
vcfpy.record.
SV
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
Code for structural variant allele
-
value
= None¶ The alternative base sequence to use in the substitution
-
-
vcfpy.record.
SV_CODES
= ('DEL', 'INS', 'DUP', 'INV', 'CNV')¶ Codes for structural variants
-
vcfpy.record.
SYMBOLIC
= 'SYMBOLIC'¶ Code for symbolic allele that is neither SV nor BND
-
class
vcfpy.record.
SingleBreakEnd
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
A placeholder for a single breakend
-
value
= None¶ The alternative base sequence to use in the substitution
-
-
class
vcfpy.record.
Substitution
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
A basic alternative allele record describing a REF->ALT substitution
Note that this subsumes MNVs, insertions, and deletions.
-
value
= None¶ The alternative base sequence to use in the substitution
-
-
class
vcfpy.record.
SymbolicAllele
(type_, value)[source]¶ Bases:
vcfpy.record.AltRecord
A placeholder for a symbolic allele
-
value
= None¶ The alternative base sequence to use in the substitution
-
-
vcfpy.record.
UNESCAPE_MAPPING
= [('%25', '%'), ('%3A', ':'), ('%3B', ';'), ('%3D', '='), ('%2C', ','), ('%0D', '\r'), ('%0A', '\n'), ('%09', '\t')]¶ Mapping from escaped characters to reserved one
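The two mappings are inverses of each other, and the order of replacement matters: '%' has to be escaped first, and unescaping is safest in a single pass so that a decoded '%' is not combined with the characters that follow it. The sketch below illustrates this with local copies of the data; escape and unescape are hypothetical helper names, not functions confirmed to exist in vcfpy.

```python
import re

# Local copy of the documented escape table ('%' comes first on purpose).
ESCAPE_MAPPING = [
    ("%", "%25"), (":", "%3A"), (";", "%3B"), ("=", "%3D"),
    (",", "%2C"), ("\r", "%0D"), ("\n", "%0A"), ("\t", "%09"),
]
_UNESCAPE = {escaped: raw for raw, escaped in ESCAPE_MAPPING}
_ESCAPED_RE = re.compile("|".join(re.escape(e) for e in _UNESCAPE))


def escape(value):
    """Percent-encode the reserved characters in ``value``."""
    for raw, escaped in ESCAPE_MAPPING:
        # '%' is handled first, so percent signs added later are not re-escaped.
        value = value.replace(raw, escaped)
    return value


def unescape(value):
    """Decode the percent-encoded sequences in a single left-to-right pass."""
    return _ESCAPED_RE.sub(lambda match: _UNESCAPE[match.group(0)], value)
```

A sequential str.replace over the unescape table would turn the escaped text "%253A" into ":" instead of "%3A", which is why the sketch uses one regular-expression pass.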
vcfpy.writer module¶
Writing of VCF files to file-like objects
Currently, only writing to plain-text files is supported
-
class
vcfpy.writer.
Writer
(stream, header, samples, path=None)[source]¶ Bases:
object
Class for writing VCF files to file-like objects
Instead of using the constructor, use the class methods from_stream() and from_path().
The writer has to be constructed with a Header and a SamplesInfos object and the full VCF header will be written immediately on construction. This, of course, implies that modifying the header after construction is illegal.
-
classmethod
from_path
(klass, path, header, samples)[source]¶ Create new Writer from path
Parameters: - path – the path to write to (converted to str for compatibility with path.py)
- header – VCF header to use
- samples – SamplesInfos to use
-
classmethod
from_stream
(klass, stream, header, samples, path=None, use_bgzf=None)[source]¶ Create new Writer from file
Note that for getting bgzf support, you have to pass in a stream opened in binary mode. Further, you either have to provide a path ending in ".gz" or set use_bgzf=True. Otherwise, you will get the notorious “TypeError: ‘str’ does not support the buffer interface”.
Parameters: - stream – file-like object to write to
- header – VCF header to use
- samples – SamplesInfos to use
- path – optional string with path to store (for display only)
- use_bgzf – indicator whether to write bgzf to stream: write bgzf if True, prevent if False, interpret path if None
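The three-way meaning of use_bgzf (True, False, or None for "decide from the path") can be sketched as a small decision function. This is an illustration of the rule as described above, not the library's code; resolve_use_bgzf is a hypothetical name.

```python
def resolve_use_bgzf(path, use_bgzf=None):
    """Decide whether output should be bgzf-compressed."""
    if use_bgzf is None:
        # No explicit choice: infer from a path ending in ".gz".
        return path is not None and str(path).endswith(".gz")
    # An explicit True or False overrides whatever the path suggests.
    return bool(use_bgzf)
```

Under this reading, passing use_bgzf=False with a path ending in ".gz" still produces plain-text output.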
-
header
= None¶ the vcfpy.header.Header written out
-
path
= None¶ optional
str
with the path to the stream
-
samples
= None¶ the vcfpy.header.SamplesInfos written out
-
stream
= None¶ stream (file-like object) to write to
-
write_record
(record)[source]¶ Write out the given
vcfpy.record.Record
to this Writer