BGZF
The BGZF module provides support for reading and writing BGZF (Blocked GNU Zip Format) files.
vcfpy.bgzf.BgzfReader
- class vcfpy.bgzf.BgzfReader(filename: str | None = None, mode: str = 'r', fileobj: IO[bytes] | None = None, max_cache: int = 100)
BGZF reader. Acts like a read-only file handle, but seek and tell behave differently from those of a regular file.
- property closed: bool
Return True if the file is closed.
- property mode: str
Return the file mode.
- property name: str
Return the file name.
vcfpy.bgzf.BgzfWriter
- class vcfpy.bgzf.BgzfWriter(filename: str | None = None, mode: str = 'w', fileobj: IO[bytes] | None = None, compresslevel: int = 6)
- close() → None
Flush data, write the 28-byte BGZF EOF marker, and close the BGZF file. samtools looks for a magic EOF marker, which is a 28-byte empty BGZF block, and warns that the BAM file may be truncated if the marker is missing. Both samtools and bgzip write this block, so this implementation does too.
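The EOF marker described above can be checked independently of vcfpy. The following is a minimal sketch, not part of the vcfpy API: the constant BGZF_EOF holds the 28-byte empty block as given in the SAM/BAM specification, and has_bgzf_eof is a hypothetical helper name introduced only for illustration.

```python
import os
import tempfile

# The 28-byte empty BGZF block used as the EOF marker
# (byte values as listed in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    Hypothetical helper for illustration; it is not a vcfpy function.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


# Small demonstration on a throwaway file that ends with the marker.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bgz")
    with open(demo_path, "wb") as handle:
        handle.write(BGZF_EOF)
    demo_result = has_bgzf_eof(demo_path)
```

A file for which this check returns False was probably not closed cleanly, which is the situation the samtools truncation warning is meant to catch.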
- property closed: bool
Return True if the file is closed.
- property mode: str
Return the file mode.
- property name: str
Return the file name.
- readlines(hint: int = -1) → list[str]
The readlines operation is not supported on a write-only BGZF file.
BGZF Utilities
- vcfpy.bgzf.make_virtual_offset(block_start_offset: int, within_block_offset: int) → int
Compute a BGZF virtual offset from the block start and within-block offsets.
The BAM indexing scheme records read positions using a 64-bit 'virtual offset', which in C terms is:
    block_start_offset << 16 | within_block_offset
Here block_start_offset is the file offset of the start of the BGZF block (an unsigned integer using up to 64 - 16 = 48 bits), and within_block_offset is the offset within the decompressed block (an unsigned 16-bit integer).
>>> make_virtual_offset(0, 0)
0
>>> make_virtual_offset(0, 1)
1
>>> make_virtual_offset(0, 2**16 - 1)
65535
>>> make_virtual_offset(0, 2**16)
Traceback (most recent call last):
    ...
ValueError: Require 0 <= within_block_offset < 2**16, got 65536
>>> 65536 == make_virtual_offset(1, 0)
True
>>> 65537 == make_virtual_offset(1, 1)
True
>>> 131071 == make_virtual_offset(1, 2**16 - 1)
True
>>> 6553600000 == make_virtual_offset(100000, 0)
True
>>> 6553600001 == make_virtual_offset(100000, 1)
True
>>> 6553600010 == make_virtual_offset(100000, 10)
True
>>> make_virtual_offset(2**48, 0)
Traceback (most recent call last):
    ...
ValueError: Require 0 <= block_start_offset < 2**48, got 281474976710656
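The packing arithmetic can be reproduced in a few lines of plain Python. This is a sketch of the documented formula and range checks, not the vcfpy source; the name pack_virtual_offset is hypothetical and chosen to avoid confusion with the real function.

```python
def pack_virtual_offset(block_start_offset, within_block_offset):
    """Pack two offsets into one 64-bit BGZF virtual offset.

    Illustrative stand-in for make_virtual_offset, mirroring the
    documented formula and the range checks shown in its doctests.
    """
    if not 0 <= within_block_offset < 2**16:
        raise ValueError(
            f"Require 0 <= within_block_offset < 2**16, got {within_block_offset}"
        )
    if not 0 <= block_start_offset < 2**48:
        raise ValueError(
            f"Require 0 <= block_start_offset < 2**48, got {block_start_offset}"
        )
    # Upper 48 bits: block start in the compressed file.
    # Lower 16 bits: position inside the decompressed block.
    return (block_start_offset << 16) | within_block_offset


packed = pack_virtual_offset(100000, 10)  # 100000 * 65536 + 10 = 6553600010
```

Because the within-block offset always fits in 16 bits, the bitwise OR never collides with the shifted block start, so the two parts can be recovered exactly.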
- vcfpy.bgzf.split_virtual_offset(virtual_offset: int) → tuple[int, int]
Split a 64-bit BGZF virtual offset into block start and within block offsets.
Returns a tuple of (block_start_offset, within_block_offset).
>>> split_virtual_offset(0)
(0, 0)
>>> split_virtual_offset(1)
(0, 1)
>>> split_virtual_offset(65535)
(0, 65535)
>>> split_virtual_offset(65536)
(1, 0)
>>> split_virtual_offset(65537)
(1, 1)
>>> split_virtual_offset(1195311108)
(18239, 4)
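Splitting is the inverse of packing: a right shift recovers the block start and a 16-bit mask recovers the within-block offset. The sketch below uses the hypothetical name unpack_virtual_offset and is an illustration of the arithmetic, not the vcfpy implementation.

```python
def unpack_virtual_offset(virtual_offset):
    """Split a BGZF virtual offset into (block_start_offset, within_block_offset).

    Illustrative stand-in for split_virtual_offset.
    """
    block_start_offset = virtual_offset >> 16      # upper 48 bits
    within_block_offset = virtual_offset & 0xFFFF  # lower 16 bits
    return block_start_offset, within_block_offset


# Round trip: packing the two parts again yields the original value.
start, within = unpack_virtual_offset(1195311108)
repacked = (start << 16) | within
```

Here start is 18239 and within is 4, matching the last doctest above, and repacked equals the original 1195311108.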