Header
vcfpy.Header
- class vcfpy.Header(lines: list[HeaderLine] | None = None, samples: SamplesInfos | None = None)
Represent header of VCF file
Although this class allows mutation, a header should not be changed once it has been assigned to a writer. Use copy() to create a copy that can be modified without problems.
This class provides functions for adding lines to a header and updating the supporting index data structures. There is no explicit API for removing header lines; the best way is to construct a new Header instance with a filtered list of header lines.
- add_contig_line(mapping: dict[str, Any])
Add “contig” header line constructed from the given mapping
- Parameters:
mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible
- Returns:
False on conflicting line and True otherwise
- add_filter_line(mapping: dict[str, Any])
Add FILTER header line constructed from the given mapping
- Parameters:
mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible
- Returns:
False on conflicting line and True otherwise
- add_format_line(mapping: dict[str, Any])
Add FORMAT header line constructed from the given mapping
- Parameters:
mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible
- Returns:
False on conflicting line and True otherwise
- add_info_line(mapping: dict[str, Any])
Add INFO header line constructed from the given mapping
- Parameters:
mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible
- Returns:
False on conflicting line and True otherwise
- add_line(header_line: HeaderLine)
Add header line, updating any necessary support indices
- Returns:
False on conflicting line and True otherwise
- get_lines(key: str) → Iterable[HeaderLine]
Return header lines having the given key as their type
- has_header_line(key: str, id_: str)
Return whether there is a header line with the given ID of the type given by key
- Parameters:
key – The VCF header key/line type.
id_ – The ID value to compare for
- Returns:
True if there is a header line starting with ##${key}= in the VCF file having the mapping entry ID set to id_.
- lines
list of HeaderLine objects
- samples
SamplesInfos object
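The relationship between add_line(), get_lines() and has_header_line() can be illustrated with a small standalone model. This is a minimal sketch and not vcfpy's implementation: `Line` and `MiniHeader` are hypothetical names, a "conflict" is assumed to mean that a line with the same key and ID is already indexed, and the sketch chooses to skip the conflicting line.

```python
from typing import Any, Iterable, Iterator, NamedTuple, Optional


class Line(NamedTuple):
    """Hypothetical stand-in for a header line; not vcfpy.HeaderLine."""

    key: str                 # line type, e.g. "FILTER" or "INFO"
    mapping: dict[str, Any]  # parsed attributes, must contain "ID"


class MiniHeader:
    """Toy header: a list of lines plus a key -> ID -> line index."""

    def __init__(self, lines: Optional[Iterable[Line]] = None) -> None:
        self.lines = []
        self._index = {}
        for line in lines or ():
            self.add_line(line)

    def add_line(self, line: Line) -> bool:
        """Add the line; return False if its key/ID pair is already indexed."""
        by_id = self._index.setdefault(line.key, {})
        line_id = line.mapping["ID"]
        if line_id in by_id:
            return False  # assumption: a conflicting line is skipped
        by_id[line_id] = line
        self.lines.append(line)
        return True

    def get_lines(self, key: str) -> Iterator[Line]:
        """Yield the lines whose type equals key, in insertion order."""
        return (line for line in self.lines if line.key == key)

    def has_header_line(self, key: str, id_: str) -> bool:
        """Return whether a line of type key with the given ID is indexed."""
        return id_ in self._index.get(key, {})


header = MiniHeader()
first = header.add_line(
    Line("FILTER", {"ID": "PASS", "Description": "All filters passed"}))
duplicate = header.add_line(
    Line("FILTER", {"ID": "PASS", "Description": "Duplicate"}))
```

Keeping the index next to the list is what makes the ID lookup cheap. It is also why a header in this model should only be changed through add_line().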
vcfpy.HeaderLine
vcfpy.header_without_lines
- vcfpy.header_without_lines(header: Header, remove: Iterable[tuple[str, str]]) → Header
Return Header without lines given in remove
remove is an iterable of pairs key/ID with the VCF header key and ID of entry to remove. In the case that a line does not have a mapping entry, you can give the full value to remove.
    # header is a vcfpy.Header, e.g., as read earlier from file
    new_header = vcfpy.header_without_lines(
        header, [('assembly', None), ('FILTER', 'PASS')])
    # now, the header lines starting with "##assembly=" and the "PASS"
    # filter line will be missing from new_header
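The matching rule described above (key/ID for lines with a mapping, key/full value for lines without one) can be sketched without vcfpy. The names `Line` and `without_lines` are hypothetical, and the sketch only covers exact pair matches; it is an illustration of the rule, not the library's code.

```python
from typing import Iterable, NamedTuple, Optional


class Line(NamedTuple):
    """Hypothetical stand-in for a header line; not a vcfpy class."""

    key: str
    value: str
    mapping: Optional[dict] = None  # None for lines without a mapping


def without_lines(lines: Iterable[Line], remove: Iterable[tuple]) -> list:
    """Return the lines whose (key, ID) or (key, value) pair is not in remove."""
    to_remove = set(remove)
    kept = []
    for line in lines:
        if line.mapping is not None:
            ident = line.mapping.get("ID")  # lines with a mapping match on ID
        else:
            ident = line.value              # other lines match on the full value
        if (line.key, ident) not in to_remove:
            kept.append(line)
    return kept


lines = [
    Line("fileformat", "VCFv4.3"),
    Line("FILTER", '<ID=PASS,Description="All filters passed">',
         {"ID": "PASS", "Description": "All filters passed"}),
    Line("FILTER", '<ID=q10,Description="Quality below 10">',
         {"ID": "q10", "Description": "Quality below 10"}),
]
filtered = without_lines(
    lines, [("FILTER", "PASS"), ("fileformat", "VCFv4.3")])
```

The input list is left untouched and a new list is returned, mirroring the advice to build a new header rather than remove lines in place.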
vcfpy.SimpleHeaderLine
- class vcfpy.SimpleHeaderLine(key: str, value: str, mapping: dict[str, Any])
Base class for simple header lines, currently contig and filter header lines
Don’t use this class directly but rather the subclasses.
- Raises:
vcfpy.exceptions.InvalidHeaderException in the case of missing key "ID"
- mapping
collections.OrderedDict with key/value mapping of the attributes
vcfpy.AltAlleleHeaderLine
- class vcfpy.AltAlleleHeaderLine(key: str, value: str, mapping: dict[str, Any])
Alternative allele header line
Mostly used for defining symbolic alleles for structural variants and IUPAC ambiguity codes
- classmethod from_mapping(mapping: dict[str, Any]) → AltAlleleHeaderLine
Construct from mapping, not requiring the string value
- id
name of the alternative allele
vcfpy.MetaHeaderLine
- class vcfpy.MetaHeaderLine(key: str, value: str, mapping: dict[str, Any])
META header line
Used for defining the set of valid values for sample keys
- classmethod from_mapping(mapping: dict[str, Any]) → MetaHeaderLine
Construct from mapping, not requiring the string value
- id
name of the META entry
vcfpy.PedigreeHeaderLine
vcfpy.SampleHeaderLine
vcfpy.ContigHeaderLine
- class vcfpy.ContigHeaderLine(key: str, value: str, mapping: dict[str, Any])
Contig header line
Most importantly, parses the 'length' key into an integer
- classmethod from_mapping(mapping: dict[str, Any]) → ContigHeaderLine
Construct from mapping, not requiring the string value
- id
name of the contig
- length
length of the contig, None if missing
vcfpy.FilterHeaderLine
- class vcfpy.FilterHeaderLine(key: str, value: str, mapping: dict[str, Any])
FILTER header line
- description
description for the filter, None if missing
- classmethod from_mapping(mapping: dict[str, Any]) → FilterHeaderLine
Construct from mapping, not requiring the string value
- id
token for the filter
vcfpy.CompoundHeaderLine
- class vcfpy.CompoundHeaderLine(key: str, value: str, mapping: dict[str, Any])
Base class for compound header lines, currently INFO and FORMAT header lines
Compound header lines describe fields that can have more than one entry.
Don’t use this class directly but rather the subclasses.
- mapping
OrderedDict with key/value mapping
vcfpy.InfoHeaderLine
- class vcfpy.InfoHeaderLine(key: str, value: str, mapping: dict[str, Any])
Header line for INFO fields
Note that the Number field will be parsed into an int if possible. Otherwise, the constants HEADER_NUMBER_* will be used.
- description
description, should be given, None if not given
- classmethod from_mapping(mapping: dict[str, Any]) → InfoHeaderLine
Construct from mapping, not requiring the string value
- id
key in the INFO field
- source
source of INFO field, None if not given
- type
value type
- version
version of INFO field, None if not given
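The "int if possible, otherwise a constant" handling of the Number field can be sketched as follows. The constant names below are hypothetical local names and not vcfpy's HEADER_NUMBER_* constants. Their values are the special Number codes defined by the VCF specification. `parse_number` is likewise a hypothetical helper written for this illustration.

```python
from typing import Union

# Hypothetical local names; the values are the special Number codes
# from the VCF specification.
NUMBER_PER_ALT = "A"       # one value per alternate allele
NUMBER_PER_ALLELE = "R"    # one value per allele, including the reference
NUMBER_PER_GENOTYPE = "G"  # one value per possible genotype
NUMBER_UNBOUNDED = "."     # unknown or unbounded number of values

SPECIAL_NUMBERS = frozenset({
    NUMBER_PER_ALT,
    NUMBER_PER_ALLELE,
    NUMBER_PER_GENOTYPE,
    NUMBER_UNBOUNDED,
})


def parse_number(raw: str) -> Union[int, str]:
    """Return raw as an int if it parses, else one of the special codes."""
    try:
        return int(raw)
    except ValueError:
        pass  # not an integer, fall through to the special codes
    if raw not in SPECIAL_NUMBERS:
        raise ValueError(f"unsupported Number value: {raw!r}")
    return raw


fixed = parse_number("2")
per_alt = parse_number("A")
```

Callers can then branch on `isinstance(value, int)` to tell a fixed count from a symbolic one.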
vcfpy.FormatHeaderLine
- class vcfpy.FormatHeaderLine(key: str, value: str, mapping: dict[str, Any])
Header line for FORMAT fields
- description
description, should be given, None if not given
- classmethod from_mapping(mapping: dict[str, Any]) → FormatHeaderLine
Construct from mapping, not requiring the string value
- id
key in the FORMAT field
- source
source of FORMAT field, None if not given
- type
value type
- version
version of FORMAT field, None if not given
vcfpy.FieldInfo
- class vcfpy.FieldInfo(type_: Literal['Integer', 'Float', 'Flag', 'Character', 'String'], number: int | str, description: str | None = None, id_: str | None = None)
Core information for describing field type and number
- description: str | None
Description for the header field, optional
- id: str | None
The id of the field, optional.
- number: int | str
Number description, either an int or constant
- type: Literal['Integer', 'Float', 'Flag', 'Character', 'String']
The type, one of INFO_TYPES or FORMAT_TYPES
vcfpy.SamplesInfos
- class vcfpy.SamplesInfos(sample_names: list[str], parsed_samples: list[str] | None = None)
Helper class for handling the samples in VCF files
The purpose of this class is to decouple the sample name list somewhat from Header. This encapsulates subsetting samples for which the genotype should be parsed and reordering samples into output files.
Note that when subsetting is used and the records are to be written out again, the FORMAT field must not be touched.
- name_to_idx
mapping from sample name to index
- names
list of samples that are read from/written to the VCF file at hand, in the given order
- parsed_samples
set with the samples for which the genotype call fields should be read; can be used for partial parsing (speedup); None if all samples are parsed, which is the default
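How names, name_to_idx and parsed_samples relate can be shown with a small standalone model. This is a sketch under stated assumptions, not vcfpy code: `MiniSamples` and its `is_parsed` helper are hypothetical, and rejecting unknown sample names is a choice made for this illustration.

```python
from typing import Iterable, Optional


class MiniSamples:
    """Toy model of a sample list with an optional parsed subset."""

    def __init__(self, sample_names: Iterable[str],
                 parsed_samples: Optional[Iterable[str]] = None) -> None:
        # Sample names in file order
        self.names = list(sample_names)
        # Mapping from sample name to its column index
        self.name_to_idx = {name: idx for idx, name in enumerate(self.names)}
        # None means "parse every sample"; otherwise a set of names
        self.parsed_samples = (
            None if parsed_samples is None else set(parsed_samples))
        if self.parsed_samples is not None:
            unknown = self.parsed_samples - set(self.names)
            if unknown:  # assumption: unknown names are treated as an error
                raise ValueError(f"unknown samples: {sorted(unknown)}")

    def is_parsed(self, name: str) -> bool:
        """Return whether genotype fields of the named sample are parsed."""
        return self.parsed_samples is None or name in self.parsed_samples


everything = MiniSamples(["NA001", "NA002", "NA003"])
subset = MiniSamples(["NA001", "NA002", "NA003"], parsed_samples=["NA002"])
```

In this model the subset only limits which samples are parsed. The full name list and its order stay intact, so all columns can still be written out.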