Tabix

The Tabix module provides support for working with Tabix-indexed files.

vcfpy.tabix.TabixFile

class vcfpy.tabix.TabixFile(*, filename: Path | str, index: Path | str | None = None)

Provides easy access for reading tabix files.

close() → None

Close the TabixFile.

fetch(*, reference: str | None = None, start: int | None = None, end: int | None = None, region: str | None = None) → TabixFileIter

Return an iterator over the given region.
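The following is a minimal usage sketch built only from the signatures documented above. The helper name `collect_region`, the file name, and the region string are hypothetical, and this page does not state what type of item the iterator yields, so the sketch collects the items without interpreting them.

```python
from contextlib import closing


def collect_region(tabix_file, region):
    """Collect everything the fetch iterator yields for `region`.

    `tabix_file` is expected to offer the documented `fetch(region=...)`
    and `close()` methods; the file is closed even if iteration fails.
    """
    with closing(tabix_file) as tbx:
        return list(tbx.fetch(region=region))


# Hypothetical usage (file name and region string are placeholders):
# from vcfpy.tabix import TabixFile
# items = collect_region(TabixFile(filename="calls.vcf.gz"), "chr1:1-1000")
```

Using `contextlib.closing` is one way to make sure `close()` is always called; calling `close()` in a `try`/`finally` block would work just as well.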

vcfpy.tabix.TabixFileIter

class vcfpy.tabix.TabixFileIter(*, index: TabixIndex, reference: str, start: int | None = None, end: int | None = None, bgzf_file: BgzfReader | None = None)

Allows for easy iteration over a tabix file.

vcfpy.tabix.TabixIndex

class vcfpy.tabix.TabixIndex(format: FileFormat, col_seq: int, col_beg: int, col_end: int, meta: bytes, skip: int, indices: dict[str, SequenceIndex], num_no_coord: int | None = None)

Index data as read from a tabix index file, holding the fields that remain relevant after the index has been read.

col_beg: int

Column for begin position.

col_end: int

Column for end position.

col_seq: int

Column for sequence name.

format: FileFormat

Format of underlying file.

indices: dict[str, SequenceIndex]

Per-sequence indices.

meta: bytes

Meta character that marks header/comment lines (typically `#`).

num_no_coord: int | None = None

Optional number of unmapped reads.

skip: int

Lines to skip at the beginning.

Tabix Data Structures

class vcfpy.tabix.Chunk(beg: int, end: int)

Chunk of the compressed file, delimited by a begin and an end virtual offset.

beg: int

Begin virtual offset.

end: int

End virtual offset.
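For orientation, a BGZF virtual offset (as defined in the SAM/BGZF specification) packs two numbers into one integer: the file offset of the compressed block in the upper bits and the offset within the uncompressed block in the lower 16 bits. The helper names below are hypothetical and are not part of vcfpy; this is only a sketch of the encoding.

```python
def make_virtual_offset(block_offset, within_block):
    """Pack a compressed-block file offset and an in-block offset."""
    if not 0 <= within_block < (1 << 16):
        raise ValueError("within-block offset must fit in 16 bits")
    return (block_offset << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Return (compressed block offset, offset within uncompressed block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF
```

Under this encoding, a chunk's `beg` and `end` can be split with `split_virtual_offset` to find which compressed block to seek to and where to start reading inside it.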

class vcfpy.tabix.Bin(number: int, chunks: list[Chunk])

Bin with chunks.

chunks: list[Chunk]

Chunks in this bin.

number: int

Bin number.

class vcfpy.tabix.SequenceIndex(bins: list[Bin], offsets: list[int])

Per-sequence index.

bins: list[Bin]

Bins containing chunks.

offsets: list[int]

Linear index intervals.
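As an illustration of how a linear index is commonly used, the sketch below looks up the smallest virtual offset worth reading for a query start position. It assumes the 16 kbp (2**14 bp) window size given in the tabix format specification; the function name is hypothetical, and this page does not say how vcfpy itself consumes `offsets`.

```python
LINEAR_SHIFT = 14  # assumed window size: 2**14 = 16384 bp per entry


def linear_min_offset(offsets, beg):
    """Return the linear-index entry for the window containing `beg`.

    `beg` is a 0-based position. Positions beyond the last window fall
    back to the last entry; an empty index yields 0 (start of file).
    """
    if not offsets:
        return 0
    window = beg >> LINEAR_SHIFT
    return offsets[min(window, len(offsets) - 1)]
```

Chunks that end before the returned offset cannot contain records overlapping the query, which is what makes the linear index useful for pruning the bin-based candidates.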

class vcfpy.tabix.FileFormat(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Enum for file formats supported by tabix.

GENERIC = 0

Generic tabix file.

SAM = 1

SAM file.

VCF = 2

VCF file.

Tabix Utilities

vcfpy.tabix.read_index(path_tbi: Path | str) → TabixIndex

Read tabix index from given path.

Parameters:

path_tbi – path to the tabix index file

Returns:

the read index
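Once an index has been read, its documented fields can be walked directly. The sketch below counts bins, chunks, and linear-index entries per sequence. `summarize_index` is a hypothetical helper, and the demo uses `SimpleNamespace` stand-ins that only mirror the documented attribute names, so that the block runs without an index file.

```python
from types import SimpleNamespace


def summarize_index(index):
    """Map each sequence name to counts of bins, chunks and offsets."""
    summary = {}
    for name, seq in index.indices.items():
        summary[name] = {
            "bins": len(seq.bins),
            "chunks": sum(len(b.chunks) for b in seq.bins),
            "linear_offsets": len(seq.offsets),
        }
    return summary


# Stand-in objects mirroring the documented fields (not real vcfpy types).
demo_index = SimpleNamespace(
    indices={
        "chr1": SimpleNamespace(
            bins=[
                SimpleNamespace(number=4681, chunks=[SimpleNamespace(beg=0, end=100)]),
                SimpleNamespace(number=4682, chunks=[]),
            ],
            offsets=[0, 100],
        )
    }
)
demo_summary = summarize_index(demo_index)

# Hypothetical usage with a real index (path is a placeholder):
# from vcfpy.tabix import read_index
# print(summarize_index(read_index("calls.vcf.gz.tbi")))
```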

vcfpy.tabix.reg2bins(beg: int, end: int) → list[int]

Get list of bins that may overlap a region [beg, end).

Based on the UCSC binning scheme used by tabix.

Parameters:
  • beg – 0-based start position (inclusive)

  • end – 0-based end position (exclusive)

Returns:

list of bin numbers that may overlap the region (in reverse order)
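To make the binning scheme concrete, here is a standalone sketch that follows the reference `reg2bins` routine from the tabix/SAM format specifications. It is named `reg2bins_sketch` to make clear that it is not vcfpy's implementation; it lists bins from the coarsest level to the finest, so the ordering may differ from what `vcfpy.tabix.reg2bins` returns.

```python
def reg2bins_sketch(beg, end):
    """List bins that may overlap the 0-based half-open region [beg, end)."""
    end -= 1  # switch to an inclusive end coordinate
    bins = [0]  # bin 0 spans the whole sequence
    # (first bin number of the level, shift giving the bin width in bits)
    for first_bin, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        lo = first_bin + (beg >> shift)
        hi = first_bin + (end >> shift)
        bins.extend(range(lo, hi + 1))
    return bins
```

For example, a region that lies entirely within the first 16 kbp of a sequence touches exactly one bin per level, while a region crossing a 16 kbp boundary picks up two bins at the finest level.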