Records
vcfpy.Record
- class vcfpy.Record(CHROM: str, POS: int, ID: list[str], REF: str, ALT: list[AltRecord], QUAL: float | None, FILTER: list[str], INFO: dict[str, Any], FORMAT: list[str] | None = None, calls: Sequence[Call | UnparsedCall] | None = None)[source]
Represent one record from the VCF file
Record objects are iterators of their calls
- CHROM
A str with the chromosome name
- FILTER
A list of strings for the FILTER column
- FORMAT
A list of strings for the FORMAT column. Optional, must be given if and only if calls is also given.
- ID
A list of the semicolon-separated values of the ID column
- INFO
An OrderedDict giving the values of the INFO column; flags are mapped to True
- POS
An int with a 1-based begin position
- QUAL
The quality value, can be None
- REF
A str with the REF value
- add_format(key: str, value: Any | None = None)[source]
Add an entry to FORMAT
For each of the record's calls, data[key] will be set to value if not yet set and value is not None. If key is already in FORMAT then nothing is done.
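The behaviour described for add_format can be illustrated with plain lists and dicts. This is a minimal standalone sketch, not vcfpy's implementation: `add_format_sketch`, `format_keys`, and `calls_data` are hypothetical names standing in for the record's FORMAT list and the data dicts of its calls.

```python
def add_format_sketch(format_keys, calls_data, key, value=None):
    """Mimic the documented add_format semantics on plain containers.

    format_keys: list of FORMAT keys (stand-in for Record.FORMAT)
    calls_data:  list of dicts (stand-in for each call's data)
    """
    if key in format_keys:
        # Key already present in FORMAT: nothing is done.
        return
    format_keys.append(key)
    if value is not None:
        for data in calls_data:
            # Only set the value where the call does not have one yet.
            data.setdefault(key, value)


format_keys = ["GT"]
calls_data = [{"GT": "0/1"}, {"GT": "1/1", "DP": 7}]
add_format_sketch(format_keys, calls_data, "DP", 0)
```

After the call, the first dict gains the default for DP while the second keeps its existing value.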
- property affected_end
Return affected end position in 0-based coordinates
For SNVs, MNVs, and deletions, the behaviour is based on the start position and the length of the REF. In the case of insertions, the position behind the insert position is returned, yielding a 0-length interval together with affected_start
- property affected_start
Return affected start position in 0-based coordinates
For SNVs, MNVs, and deletions, this is the start position. In the case of insertions, the position behind the insert position is returned, yielding a 0-length interval together with affected_end
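The two properties together describe an interval. The following standalone sketch shows one reading of that description. It assumes the interval is half-open and that "the position behind the insert position" means the 0-based index directly after the anchor base at POS; `affected_interval` and its `is_insertion` flag are hypothetical and not part of vcfpy.

```python
def affected_interval(pos, ref, is_insertion):
    """Return (start, end) in 0-based coordinates for a 1-based POS.

    Assumption: the interval is half-open, so end - start is its length.
    """
    if is_insertion:
        # The anchor base sits at 0-based index pos - 1; the position
        # behind it is pos, which gives a 0-length interval.
        return pos, pos
    start = pos - 1
    # SNVs, MNVs, and deletions: start plus the length of REF.
    return start, start + len(ref)


print(affected_interval(10, "A", False))    # SNV
print(affected_interval(10, "ATG", False))  # deletion or MNV with 3-base REF
print(affected_interval(10, "A", True))     # insertion
```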
- begin
An int with a 0-based begin position
- call_for_sample
A mapping from sample name to entry in self.calls.
- calls
A list of genotype Call objects. Optional, must be given if and only if FORMAT is also given.
- end
An int with a 0-based end position
- update_calls(calls: Sequence[Call | UnparsedCall])[source]
Update self.calls and other fields as necessary.
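To make the attribute descriptions above concrete, here is a minimal standalone sketch of how the first eight columns of a VCF data line could map onto them. It does not use vcfpy: `split_record_line` is a hypothetical helper, the handling of "." as a missing value is an assumption, and INFO values are left as strings rather than being converted to typed values.

```python
def split_record_line(line):
    """Map the fixed columns of one VCF data line to Record-like fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, id_, ref, alt, qual, filt, info = fields[:8]

    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            if "=" in entry:
                key, value = entry.split("=", 1)
                info_dict[key] = value  # kept as a string in this sketch
            else:
                info_dict[entry] = True  # flags are mapped to True

    return {
        "CHROM": chrom,
        "POS": int(pos),                                    # 1-based
        "ID": [] if id_ == "." else id_.split(";"),         # semicolon-separated
        "REF": ref,
        "ALT": [] if alt == "." else alt.split(","),
        "QUAL": None if qual == "." else float(qual),       # can be None
        "FILTER": [] if filt == "." else filt.split(";"),
        "INFO": info_dict,
        "begin": int(pos) - 1,                              # 0-based
    }


record = split_record_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DB\n")
print(record["POS"], record["begin"], record["INFO"])
```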
vcfpy.Call
- class vcfpy.Call(sample: str, data: dict[str, Any], site: Record | None = None)[source]
The information for a genotype call
Per the VCF specification, this should always include the genotype information and can contain an arbitrary number of further annotations, e.g., the coverage at the variant position.
- called
whether or not the variant is fully called
- data: dict[str, Any]
an OrderedDict with the key/value pair information from the call’s data
- gt_alleles: list[int | None] | None
the allele numbers (0, 1, …) in this call or None for no-call
- property gt_bases: tuple[str | None, ...]
Return the actual genotype bases, e.g. if VCF genotype is 0/1, could return ('A', 'T')
- property gt_phase_char
Return character to use for phasing
- property gt_type: Literal[0, 1, 2] | None
The type of genotype; returns one of HOM_REF, HOM_ALT, and HET.
- is_filtered(require: list[str] | None = None, ignore: list[str] | None = None)[source]
Return True for filtered calls
- Parameters:
ignore (iterable) – if set, the filters to ignore; make sure to include 'PASS' when setting. Default is ['PASS']
require (iterable) – if set, the filters to require for returning True
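One plausible reading of how ignore and require interact is sketched below. This is an illustration only, not vcfpy's code: `is_filtered_sketch` is a hypothetical function, and it assumes the call's filter values are available as a list of strings (`ft_values`).

```python
def is_filtered_sketch(ft_values, require=None, ignore=None):
    """Return True if any non-ignored (and, if given, required) filter is set."""
    ignore = ignore or ["PASS"]
    for ft in ft_values or []:
        if ft in ignore:
            continue  # ignored filters never mark the call as filtered
        if not require or ft in require:
            return True
    return False


print(is_filtered_sketch(["PASS"]))
print(is_filtered_sketch(["q10"]))
print(is_filtered_sketch(["q10"], require=["s50"]))
```

Passing a custom ignore list replaces the default in this sketch, which is why the description asks callers to include 'PASS' themselves.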
- property is_het: bool
Return True for heterozygous calls
- property is_phased
Return boolean indicating whether this call is phased
- property is_variant: bool
Return True for non-hom-ref calls
- ploidy
the number of alleles in this sample’s call
- sample
the name of the sample for which the call was made
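The genotype-related attributes above can be illustrated with a small standalone sketch that works on a raw GT string. Nothing here is vcfpy's implementation: the helper names are hypothetical, and the numeric codes chosen for HOM_REF, HET, and HOM_ALT are an assumption that is merely consistent with the Literal[0, 1, 2] annotation.

```python
import re

# Assumed numeric codes for this sketch only.
HOM_REF, HET, HOM_ALT = 0, 1, 2


def parse_gt(gt):
    """Split e.g. '0/1' or '1|.' into allele numbers; '.' becomes None."""
    return [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]


def gt_type_sketch(alleles):
    """Classify a list of allele numbers; None if not fully called."""
    if any(a is None for a in alleles):
        return None
    if all(a == 0 for a in alleles):
        return HOM_REF
    if len(set(alleles)) == 1:
        return HOM_ALT
    return HET


def gt_bases_sketch(alleles, ref, alts):
    """Translate allele numbers into sequences; 0 is REF, 1.. index ALT."""
    sequences = [ref] + list(alts)
    return tuple(None if a is None else sequences[a] for a in alleles)


def is_phased_sketch(gt):
    """A '|' separator marks a phased genotype in VCF."""
    return "|" in gt


alleles = parse_gt("0/1")
print(alleles, gt_type_sketch(alleles), gt_bases_sketch(alleles, "A", ["T"]))
```

In this sketch the ploidy of a call is simply `len(alleles)`.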
vcfpy.AltRecord
- class vcfpy.AltRecord(type_: Literal['SNV', 'MNV', 'DEL', 'INS', 'INDEL', 'SV', 'BND', 'SYMBOLIC', 'MIXED'] | None = None)[source]
An alternative allele Record
Currently, can be a substitution, an SV placeholder, or a breakend
- type
String describing the type of the variant; could be one of SNV, MNV, or any of the types described in the ALT header lines, such as DUP, DEL, INS, …
vcfpy.Substitution
- class vcfpy.Substitution(type_: Literal['SNV', 'MNV', 'DEL', 'INS', 'INDEL', 'SV', 'BND', 'SYMBOLIC', 'MIXED'], value: str)[source]
A basic alternative allele record describing a REF->ALT substitution
Note that this subsumes MNVs, insertions, and deletions.
- value
The alternative base sequence to use in the substitution
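Since a substitution covers SNVs, MNVs, insertions, and deletions, a type has to be derived from REF and the alternative sequence. The sketch below shows one common way to do that from the sequence lengths and the shared anchor base. It is an assumption about reasonable classification rules, not a statement of how vcfpy assigns type_; `classify_substitution` is a hypothetical helper.

```python
def classify_substitution(ref, alt):
    """Guess a substitution type from REF and ALT sequences."""
    if not ref or not alt:
        raise ValueError("REF and ALT must be non-empty")
    if len(ref) == len(alt):
        # Same length: single base change or multi-nucleotide variant.
        return "SNV" if len(ref) == 1 else "MNV"
    if len(alt) == 1 and ref[0] == alt[0]:
        # Only the anchor base remains: pure deletion.
        return "DEL"
    if len(ref) == 1 and ref[0] == alt[0]:
        # Bases added after the anchor base: pure insertion.
        return "INS"
    # Anything else changes length and content at once.
    return "INDEL"


for ref, alt in [("A", "T"), ("AT", "GC"), ("AT", "A"), ("A", "AT"), ("ATG", "AC")]:
    print(ref, alt, classify_substitution(ref, alt))
```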
vcfpy.SV
- vcfpy.SV
alias of SV
vcfpy.BreakEnd
- class vcfpy.BreakEnd(mate_chrom: str | None, mate_pos: int | None, orientation: str | None, mate_orientation: Literal['+', '-'] | None, sequence: str, within_main_assembly: bool | None)[source]
A placeholder for a breakend
- mate_chrom
chromosome of the mate breakend
- mate_orientation
orientation of the breakend's mate
- mate_pos
position of the mate breakend
- orientation
orientation of this breakend
- sequence
breakpoint’s connecting sequence
- within_main_assembly
bool specifying whether the breakend mate is within the main assembly (True) or in an ancillary assembly (False)
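As an illustration of where these fields come from, the following standalone sketch splits a VCF breakend ALT string such as G]17:198982] into mate location and connecting sequence. It is not vcfpy's parser: `parse_breakend` is a hypothetical helper, it assumes that a mate contig written in angle brackets marks an ancillary assembly, and it reports only the raw bracket character instead of deriving the two orientation fields.

```python
import re

# lead/trail hold the bases before or after the bracketed mate position.
_BND_RE = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<bracket>[\[\]])"
    r"(?P<mate>.+):(?P<pos>\d+)(?P=bracket)(?P<trail>[^\[\]]*)$"
)


def parse_breakend(alt):
    """Split a breakend ALT string into a dict of BreakEnd-like fields."""
    match = _BND_RE.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT string: {alt!r}")
    mate = match["mate"]
    # Assumption: "<ctg>" denotes a contig outside the main assembly.
    in_main = not (mate.startswith("<") and mate.endswith(">"))
    return {
        "mate_chrom": mate if in_main else mate[1:-1],
        "mate_pos": int(match["pos"]),
        "sequence": match["lead"] or match["trail"],
        "bracket": match["bracket"],
        "within_main_assembly": in_main,
    }


print(parse_breakend("G]17:198982]"))
print(parse_breakend("]13:123456]T"))
```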
vcfpy.SingleBreakEnd
vcfpy.SymbolicAllele
- class vcfpy.SymbolicAllele(value: str)[source]
A placeholder for a symbolic allele
The allele symbol must be defined in the header using an ALT header before being parsed. Usually, this is used for succinct descriptions of structural variants or IUPAC parameters.
- value
The symbolic value, e.g. 'DUP'