Header¶
Contents
- Header
- vcfpy.OrderedDict
- vcfpy.Header
- vcfpy.HeaderLine
- vcfpy.header_without_lines
- vcfpy.SimpleHeaderLine
- vcfpy.AltAlleleHeaderLine
- vcfpy.MetaHeaderLine
- vcfpy.PedigreeHeaderLine
- vcfpy.SampleHeaderLine
- vcfpy.ContigHeaderLine
- vcfpy.FilterHeaderLine
- vcfpy.CompoundHeaderLine
- vcfpy.InfoHeaderLine
- vcfpy.FormatHeaderLine
- vcfpy.FieldInfo
- vcfpy.SamplesInfos
vcfpy.OrderedDict¶
Convenience export of OrderedDict.
When available, cyordereddict, a Cython reimplementation of OrderedDict, is used for Python before 3.5 (from 3.5 on, Python ships with a fast C implementation of OrderedDict).
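The fallback described above can be sketched as a guarded import. This only illustrates the pattern, assuming the optional cyordereddict package exposes a class named OrderedDict; it is not necessarily how vcfpy implements the export.

```python
# Prefer the Cython implementation when it is installed, otherwise fall
# back to the standard library (which is fast from Python 3.5 on).
try:
    from cyordereddict import OrderedDict  # optional third-party package
except ImportError:
    from collections import OrderedDict

od = OrderedDict([("ID", "DP"), ("Number", "1")])
od["Type"] = "Integer"
keys_in_order = list(od.keys())  # insertion order is preserved
```

Either implementation offers the same interface, so calling code does not need to know which one was imported.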
-
class
vcfpy.
OrderedDict
¶ Dictionary that remembers insertion order
-
clear
() → None. Remove all items from od.¶
-
copy
() → a shallow copy of od¶
-
fromkeys
(S[, v]) → New ordered dictionary with keys from S.¶ If not specified, the value defaults to None.
-
items
() → a set-like object providing a view on D's items¶
-
keys
() → a set-like object providing a view on D's keys¶
-
move_to_end
()¶ Move an existing element to the end (or beginning if last==False). Raises KeyError if the element does not exist. When last=True, acts like a fast version of self[key]=self.pop(key).
-
pop
(k[, d]) → v, remove specified key and return the corresponding value.¶ If key is not found, d is returned if given, otherwise KeyError is raised.
-
popitem
() → (k, v), return and remove a (key, value) pair.¶ Pairs are returned in LIFO order if last is true or FIFO order if false.
-
setdefault
(k[, d]) → od.get(k,d), also set od[k]=d if k not in od¶
-
update
([E, ]**F) → None. Update D from mapping/iterable E and F.¶
If E is present and has a .keys() method, does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
-
values
() → an object providing a view on D's values¶
-
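The ordering methods listed above are easiest to follow on a small example. The sketch below uses collections.OrderedDict from the standard library, on the assumption that vcfpy.OrderedDict behaves identically; the keys and values are arbitrary.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

od.move_to_end("a")              # "a" becomes the last key
od.move_to_end("c", last=False)  # "c" becomes the first key
order_after_moves = list(od)

last_pair = od.popitem()             # LIFO: removes the last pair
first_pair = od.popitem(last=False)  # FIFO: removes the first pair
missing = od.pop("zzz", None)        # default returned instead of KeyError
```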
vcfpy.Header¶
-
class
vcfpy.
Header
(lines=None, samples=None)[source]¶ Represent header of VCF file
While this class allows mutation, a header should not be changed once it has been assigned to a writer. Use Header.copy() to create a copy that can be modified without problems.
This class provides functions for adding lines to a header and updating the supporting index data structures. There is no explicit API for removing header lines; the best way is to construct a new Header instance with a filtered list of header lines.
-
add_contig_line
(mapping)[source]¶ Add “contig” header line constructed from the given mapping
Parameters: mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible
Returns: False on conflicting line and True otherwise
-
add_filter_line
(mapping)[source]¶ Add FILTER header line constructed from the given mapping
Parameters: mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible
Returns: False on conflicting line and True otherwise
-
add_format_line
(mapping)[source]¶ Add FORMAT header line constructed from the given mapping
Parameters: mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible
Returns: False on conflicting line and True otherwise
-
add_info_line
(mapping)[source]¶ Add INFO header line constructed from the given mapping
Parameters: mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible
Returns: False on conflicting line and True otherwise
-
add_line
(header_line)[source]¶ Add header line, updating any necessary support indices
Returns: False on conflicting line and True otherwise
-
has_header_line
(key, id_)[source]¶ Return whether there is a header line with the given ID of the type given by key
Parameters: - key – The VCF header key/line type.
- id_ – The ID value to compare for
Returns: True if there is a header line starting with ##${key}= in the VCF file having the mapping entry ID set to id_.
-
lines
= None¶ list of HeaderLine objects
-
samples
= None¶ SamplesInfos object
-
vcfpy.header_without_lines¶
-
vcfpy.
header_without_lines
(header, remove)[source]¶ Return Header without lines given in remove
remove is an iterable of pairs key/ID with the VCF header key and ID of entry to remove. In the case that a line does not have a mapping entry, you can give the full value to remove.
    # header is a vcfpy.Header, e.g., as read earlier from file
    new_header = vcfpy.header_without_lines(
        header, [('assembly', None), ('FILTER', 'PASS')])
    # now, the header lines starting with "##assembly=" and the "PASS"
    # filter line will be missing from new_header
vcfpy.SimpleHeaderLine¶
-
class
vcfpy.
SimpleHeaderLine
(key, value, mapping)[source]¶ Base class for simple header lines, currently contig and filter header lines
Don’t use this class directly but rather the subclasses.
Raises: vcfpy.exceptions.InvalidHeaderException in the case of missing key "ID"
-
mapping
= None¶ collections.OrderedDict with key/value mapping of the attributes
-
vcfpy.AltAlleleHeaderLine¶
-
class
vcfpy.
AltAlleleHeaderLine
(key, value, mapping)[source]¶ Alternative allele header line
Mostly used for defining symbolic alleles for structural variants and IUPAC ambiguity codes
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ name of the alternative allele
-
vcfpy.ContigHeaderLine¶
-
class
vcfpy.
ContigHeaderLine
(key, value, mapping)[source]¶ Contig header line
Most importantly, parses the 'length' key into an integer
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ name of the contig
-
length
= None¶ length of the contig, None if missing
-
vcfpy.CompoundHeaderLine¶
-
class
vcfpy.
CompoundHeaderLine
(key, value, mapping)[source]¶ Base class for compound header lines, currently INFO and FORMAT header lines
Compound header lines describe fields that can have more than one entry.
Don’t use this class directly but rather the subclasses.
-
mapping
= None¶ OrderedDict with key/value mapping
-
vcfpy.InfoHeaderLine¶
-
class
vcfpy.
InfoHeaderLine
(key, value, mapping)[source]¶ Header line for INFO fields
Note that the Number field will be parsed into an int if possible. Otherwise, the constants HEADER_NUMBER_* will be used.
-
description
= None¶ description, should be given, None if not given
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ key in the INFO field
-
source
= None¶ source of INFO field, None if not given
-
type
= None¶ value type
-
version
= None¶ version of INFO field, None if not given
-
vcfpy.FormatHeaderLine¶
-
class
vcfpy.
FormatHeaderLine
(key, value, mapping)[source]¶ Header line for FORMAT fields
-
description
= None¶ description, should be given, None if not given
-
classmethod
from_mapping
(klass, mapping)[source]¶ Construct from mapping, not requiring the string value
-
id
= None¶ key in the FORMAT field
-
source
= None¶ source of FORMAT field, None if not given
-
type
= None¶ value type
-
version
= None¶ version of FORMAT field, None if not given
-
vcfpy.FieldInfo¶
-
class
vcfpy.
FieldInfo
(type_, number, description=None, id_=None)[source]¶ Core information for describing field type and number
-
description
= None¶ Description for the header field, optional
-
id
= None¶ The id of the field, optional.
-
number
= None¶ Number description, either an int or constant
-
type
= None¶ The type, one of INFO_TYPES or FORMAT_TYPES
-
vcfpy.SamplesInfos¶
-
class
vcfpy.
SamplesInfos
(sample_names, parsed_samples=None)[source]¶ Helper class for handling the samples in VCF files
The purpose of this class is to decouple the sample name list somewhat from Header. This encapsulates subsetting samples for which the genotype should be parsed and reordering samples into output files.
Note that when subsetting is used and the records are to be written out again then the FORMAT field must not be touched.
-
name_to_idx
= None¶ mapping from sample name to index
-
names
= None¶ list of samples that are read from/written to the VCF file at hand in the given order
-
parsed_samples
= None¶ set with the samples for which the genotype call fields should be read; can be used for partial parsing (speedup) and defaults to the full list of samples, None if all are parsed
-