vcfpy

Header

Contents

  • Header
    • vcfpy.Header
    • vcfpy.HeaderLine
    • vcfpy.header_without_lines
    • vcfpy.SimpleHeaderLine
    • vcfpy.AltAlleleHeaderLine
    • vcfpy.MetaHeaderLine
    • vcfpy.PedigreeHeaderLine
    • vcfpy.SampleHeaderLine
    • vcfpy.ContigHeaderLine
    • vcfpy.FilterHeaderLine
    • vcfpy.CompoundHeaderLine
    • vcfpy.InfoHeaderLine
    • vcfpy.FormatHeaderLine
    • vcfpy.FieldInfo
    • vcfpy.SamplesInfos
vcfpy.Header

class vcfpy.Header(lines: list[HeaderLine] | None = None, samples: SamplesInfos | None = None)[source]

Represent header of VCF file

While this class allows mutation, a header should not be changed once it has been assigned to a writer. Use copy() to create a copy that can be modified without problems.

This class provides functions for adding lines to a header and updating the supporting index data structures. There is no explicit API for removing header lines; the best way is to construct a new Header instance from a filtered list of header lines.

add_contig_line(mapping: dict[str, Any])[source]

Add “contig” header line constructed from the given mapping

Parameters:

mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible

Returns:

False on conflicting line and True otherwise

add_filter_line(mapping: dict[str, Any])[source]

Add FILTER header line constructed from the given mapping

Parameters:

mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible

Returns:

False on conflicting line and True otherwise

add_format_line(mapping: dict[str, Any])[source]

Add FORMAT header line constructed from the given mapping

Parameters:

mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible

Returns:

False on conflicting line and True otherwise

add_info_line(mapping: dict[str, Any])[source]

Add INFO header line constructed from the given mapping

Parameters:

mapping – OrderedDict with mapping to add. It is recommended to use OrderedDict over dict as this makes the result reproducible

Returns:

False on conflicting line and True otherwise

add_line(header_line: HeaderLine)[source]

Add header line, updating any necessary support indices

Returns:

False on conflicting line and True otherwise

copy()[source]

Return a copy of this header

filter_ids()[source]

Return list of all filter IDs

format_ids() → list[str][source]

Return list of all format IDs

get_format_field_info(key: str) → FieldInfo[source]

Return FieldInfo for the given FORMAT field

get_info_field_info(key: str) → FieldInfo[source]

Return FieldInfo for the given INFO field

get_lines(key: str) → Iterable[HeaderLine][source]

Return header lines having the given key as their type

has_header_line(key: str, id_: str)[source]

Return whether there is a header line with the given ID of the type given by key

Parameters:
  • key – The VCF header key/line type.

  • id_ – The ID value to look for

Returns:

True if there is a header line starting with ##${key}= in the VCF file having the mapping entry ID set to id_.

info_ids()[source]

Return list of all info IDs

lines

list of HeaderLine objects

samples

SamplesInfos object

vcfpy.HeaderLine

class vcfpy.HeaderLine(key: str, value: str)[source]

Base class for VCF header lines

copy()[source]

Return a copy

key

str with key of header line

serialize()[source]

Return VCF-serialized version of this header line

vcfpy.header_without_lines

vcfpy.header_without_lines(header: Header, remove: Iterable[tuple[str, str]]) → Header[source]

Return Header without lines given in remove

remove is an iterable of pairs key/ID with the VCF header key and ID of entry to remove. In the case that a line does not have a mapping entry, you can give the full value to remove.

# header is a vcfpy.Header, e.g., as read earlier from file
new_header = vcfpy.header_without_lines(
    header, [('assembly', None), ('FILTER', 'PASS')])
# now, the header lines starting with "##assembly=" and the "PASS"
# filter line will be missing from new_header

vcfpy.SimpleHeaderLine

class vcfpy.SimpleHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

Base class for simple header lines, currently contig and filter header lines

Don’t use this class directly but rather the subclasses.

Raises:

vcfpy.exceptions.InvalidHeaderException in the case of missing key "ID"

copy()[source]

Return a copy

mapping

collections.OrderedDict with key/value mapping of the attributes

serialize()[source]

Return VCF-serialized version of this header line

vcfpy.AltAlleleHeaderLine

class vcfpy.AltAlleleHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

Alternative allele header line

Mostly used for defining symbolic alleles for structural variants and IUPAC ambiguity codes

classmethod from_mapping(mapping: dict[str, Any]) → AltAlleleHeaderLine[source]

Construct from mapping, not requiring the string value

id

name of the alternative allele

vcfpy.MetaHeaderLine

class vcfpy.MetaHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

META header line

Used for defining the set of valid values for sample keys

classmethod from_mapping(mapping: dict[str, Any]) → MetaHeaderLine[source]

Construct from mapping, not requiring the string value

id

ID of the META entry

vcfpy.PedigreeHeaderLine

class vcfpy.PedigreeHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

Header line for defining a pedigree entry

classmethod from_mapping(mapping: dict[str, Any]) → PedigreeHeaderLine[source]

Construct from mapping, not requiring the string value

id

ID of the pedigree entry

vcfpy.SampleHeaderLine

class vcfpy.SampleHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

Header line for defining a SAMPLE entry

classmethod from_mapping(mapping: dict[str, Any]) → SampleHeaderLine[source]

Construct from mapping, not requiring the string value

id

ID of the sample entry

vcfpy.ContigHeaderLine

class vcfpy.ContigHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

Contig header line

Most importantly, parses the 'length' key into an integer

classmethod from_mapping(mapping: dict[str, Any]) → ContigHeaderLine[source]

Construct from mapping, not requiring the string value

id

name of the contig

length

length of the contig, None if missing

vcfpy.FilterHeaderLine

class vcfpy.FilterHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

FILTER header line

description

description for the filter, None if missing

classmethod from_mapping(mapping: dict[str, Any]) → FilterHeaderLine[source]

Construct from mapping, not requiring the string value

id

token for the filter

vcfpy.CompoundHeaderLine

class vcfpy.CompoundHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

Base class for compound header lines, currently FORMAT and INFO header lines

Compound header lines describe fields that can have more than one entry.

Don’t use this class directly but rather the subclasses.

copy()[source]

Return a copy

mapping

OrderedDict with key/value mapping

serialize()[source]

Return VCF-serialized version of this header line

vcfpy.InfoHeaderLine

class vcfpy.InfoHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

Header line for INFO fields

Note that the Number field will be parsed into an int if possible. Otherwise, the constants HEADER_NUMBER_* will be used.

description

description, should be given, None if not given

classmethod from_mapping(mapping: dict[str, Any]) → InfoHeaderLine[source]

Construct from mapping, not requiring the string value

id

key in the INFO field

source

source of INFO field, None if not given

type

value type

version

version of INFO field, None if not given

vcfpy.FormatHeaderLine

class vcfpy.FormatHeaderLine(key: str, value: str, mapping: dict[str, Any])[source]

Header line for FORMAT fields

description

description, should be given, None if not given

classmethod from_mapping(mapping: dict[str, Any]) → FormatHeaderLine[source]

Construct from mapping, not requiring the string value

id

key in the FORMAT field

source

source of FORMAT field, None if not given

type

value type

version

version of FORMAT field, None if not given

vcfpy.FieldInfo

class vcfpy.FieldInfo(type_: Literal['Integer', 'Float', 'Flag', 'Character', 'String'], number: int | str, description: str | None = None, id_: str | None = None)[source]

Core information for describing field type and number

description: str | None

Description for the header field, optional

id: str | None

The id of the field, optional.

number: int | str

Number description, either an int or constant

type: Literal['Integer', 'Float', 'Flag', 'Character', 'String']

The type, one of INFO_TYPES or FORMAT_TYPES

vcfpy.SamplesInfos

class vcfpy.SamplesInfos(sample_names: list[str], parsed_samples: list[str] | None = None)[source]

Helper class for handling the samples in VCF files

The purpose of this class is to decouple the sample name list somewhat from Header. This encapsulates subsetting samples for which the genotype should be parsed and reordering samples into output files.

Note that when subsetting is used and the records are to be written out again, the FORMAT field must not be touched.

copy()[source]

Return a copy of the object

is_parsed(name: str) → bool[source]

Return whether the sample name is parsed

name_to_idx

mapping from sample name to index

names

list of samples that are read from/written to the VCF file at hand in the given order

parsed_samples

set with the samples for which the genotype call fields should be read; can be used for partial parsing (speedup); None if all samples are parsed, which is the default


© Copyright 2016, Manuel Holtgrewe.
