BGZF

The BGZF module provides support for reading and writing BGZF (Blocked GNU Zip Format) files.

vcfpy.bgzf.BgzfReader

class vcfpy.bgzf.BgzfReader(filename: str | None = None, mode: str = 'r', fileobj: IO[bytes] | None = None, max_cache: int = 100)

BGZF reader. It acts like a read-only file handle, except that seek() and tell() use BGZF virtual offsets instead of plain byte offsets.

close() None

Close BGZF file.

property closed: bool

Return True if the file is closed.

fileno() int

Return integer file descriptor.

flush() None

Flush - no-op for read-only file.

isatty() bool

Return True if connected to a TTY device.

property mode: str

Return the file mode.

property name: str

Return the file name.

read(size: int = -1) str

Read decompressed text from the BGZF file.

readable() bool

Return True indicating the BGZF file is readable.

readline(size: int = -1) str

Read a single line from the BGZF file.

readlines(hint: int = -1) list[str]

Read all lines from the file.

seek(virtual_offset: int, whence: int = 0) int

Seek to a 64-bit unsigned BGZF virtual offset.
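As a rough illustration of what resolving a virtual offset involves, here is a minimal stdlib-only sketch (not vcfpy's implementation): take the compressed block start from the high 48 bits, parse and decompress that block, then skip the low 16 bits into the decompressed data. The helpers read_block and read_at_virtual_offset are hypothetical, the sketch assumes the whole file is already in memory as bytes, and it returns bytes rather than str for simplicity.

```python
import struct
import zlib


def read_block(raw: bytes, block_start: int) -> tuple[int, bytes]:
    """Parse one BGZF block at byte block_start; return (block_size, data)."""
    # Fixed gzip header: ID1, ID2, CM, FLG, MTIME, XFL, OS, XLEN (12 bytes).
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
        "<BBBBIBBH", raw, block_start
    )
    if (id1, id2, cm, flg) != (0x1F, 0x8B, 8, 4):
        raise ValueError(f"no BGZF block header at offset {block_start}")
    # Scan the extra subfields for the 'BC' subfield that stores BSIZE.
    pos = block_start + 12
    extra_end = pos + xlen
    bsize = None
    while pos < extra_end:
        si1, si2, slen = struct.unpack_from("<BBH", raw, pos)
        if (si1, si2, slen) == (66, 67, 2):
            (bsize,) = struct.unpack_from("<H", raw, pos + 4)
        pos += 4 + slen
    if bsize is None:
        raise ValueError("missing BC subfield; not a BGZF block")
    block_size = bsize + 1  # BSIZE holds the total block size minus 1
    # Compressed payload lies between the extra field and the 8-byte trailer.
    cdata = raw[extra_end : block_start + block_size - 8]
    data = zlib.decompress(cdata, -15)  # raw deflate stream, no zlib header
    crc, isize = struct.unpack_from("<II", raw, block_start + block_size - 8)
    if crc != zlib.crc32(data) or isize != len(data):
        raise ValueError("CRC32 or ISIZE mismatch")
    return block_size, data


def read_at_virtual_offset(raw: bytes, virtual_offset: int, size: int) -> bytes:
    """Return up to size decompressed bytes starting at a virtual offset."""
    block_start, within = virtual_offset >> 16, virtual_offset & 0xFFFF
    out = b""
    while len(out) < size and block_start < len(raw):
        block_size, data = read_block(raw, block_start)
        out += data[within:]
        within = 0  # any following block is read from its beginning
        block_start += block_size
    return out[:size]
```

A real reader would read blocks lazily from the file handle instead of slicing an in-memory buffer, but the offset arithmetic is the same.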

seekable() bool

Return True, indicating that the BGZF file supports random access.

tell()

Return a 64-bit unsigned BGZF virtual offset.

truncate(size: int | None = None) int

Truncate - not supported for read-only file.

writable() bool

Return False indicating the BGZF file is not writable.

write(s: str) int

Write - not supported for read-only file.

writelines(lines: Iterable[str]) None

Write lines - not supported for read-only file.

vcfpy.bgzf.BgzfWriter

class vcfpy.bgzf.BgzfWriter(filename: str | None = None, mode: str = 'w', fileobj: IO[bytes] | None = None, compresslevel: int = 6)

close() None

Flush any buffered data, write the 28-byte BGZF EOF marker, and close the BGZF file. samtools looks for this magic EOF marker (an empty 28-byte BGZF block) and, if it is missing, warns that the BAM file may be truncated. Both samtools and bgzip write this block, so this implementation does too.
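Checking a file for the marker the way samtools does only requires comparing its last 28 bytes with the fixed EOF block. A minimal sketch, assuming the 28-byte marker defined in the SAM/BAM specification; the names BGZF_EOF and file_has_bgzf_eof are hypothetical and not part of vcfpy.bgzf.

```python
import os

# The empty BGZF block used as an EOF marker: gzip header with FEXTRA set,
# a 'BC' subfield with BSIZE = 27, an empty deflate stream (03 00),
# then CRC32 = 0 and ISIZE = 0.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def file_has_bgzf_eof(path: str) -> bool:
    """Return True if the file at path ends with the BGZF EOF marker."""
    with open(path, "rb") as handle:
        size = handle.seek(0, os.SEEK_END)  # seek() returns the new position
        if size < len(BGZF_EOF):
            return False
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```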

property closed: bool

Return True if the file is closed.

isatty() bool

Return False, as BGZF files are not TTY devices.

property mode: str

Return the file mode.

property name: str

Return the file name.

read(size: int = -1) str

Read operation not supported for write-only BGZF file.

readable() bool

Return False as this is a write-only file.

readline(size: int = -1) str

Readline operation not supported for write-only BGZF file.

readlines(hint: int = -1) list[str]

Readlines operation not supported for write-only BGZF file.

seek(offset: int, whence: int = 0) int

Seek operation not supported when writing BGZF files.

tell() int

Return the current BGZF 64-bit virtual offset.

truncate(size: int | None = None) int

Truncate operation not supported for BGZF files.

writable() bool

Return True as this is a writable file.

write(data: str) int

Write string data to the BGZF file.

Args:
    data: String data to write.

Returns:
    Number of characters written.
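A BGZF file is a series of ordinary gzip members, each carrying a 'BC' extra subfield that records the block's total size. The sketch below (hypothetical names, not vcfpy's BgzfWriter) builds such blocks with zlib and shows how a writer's tell() can form a virtual offset: the byte position where the next compressed block will start goes in the high bits, and the number of buffered uncompressed bytes goes in the low 16 bits. The UTF-8 encoding and the 0xFF00 cap on uncompressed data per block are assumptions of this sketch.

```python
import struct
import zlib

# Assumed cap on uncompressed bytes per block: keeps within-block offsets
# below 2**16 and leaves room for deflate overhead (htslib uses a similar cap).
MAX_BLOCK_DATA = 0xFF00


def make_bgzf_block(payload: bytes, level: int = 6) -> bytes:
    """Compress payload into one BGZF block (a gzip member with a BC field)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate
    cdata = compressor.compress(payload) + compressor.flush()
    block_size = 12 + 6 + len(cdata) + 8  # header + BC subfield + data + trailer
    if block_size > 0x10000:
        raise ValueError("compressed block too large for a 16-bit BSIZE")
    header = struct.pack(
        "<BBBBIBBHBBHH",
        0x1F, 0x8B, 8, 4,  # gzip magic, CM = deflate, FLG = FEXTRA
        0, 0, 0xFF,        # MTIME, XFL, OS = unknown
        6,                 # XLEN: a single 6-byte extra subfield
        66, 67, 2,         # SI1 = 'B', SI2 = 'C', SLEN = 2
        block_size - 1,    # BSIZE: total block size minus 1
    )
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + cdata + trailer


class SketchBgzfWriter:
    """In-memory writer illustrating how tell() can build a virtual offset."""

    def __init__(self) -> None:
        self.out = bytearray()     # compressed blocks emitted so far
        self.buffer = bytearray()  # uncompressed data for the pending block

    def write(self, data: str) -> int:
        self.buffer += data.encode("utf-8")
        # Emit full blocks; the leftover stays buffered for the next block.
        while len(self.buffer) >= MAX_BLOCK_DATA:
            self.out += make_bgzf_block(bytes(self.buffer[:MAX_BLOCK_DATA]))
            del self.buffer[:MAX_BLOCK_DATA]
        return len(data)

    def tell(self) -> int:
        # Pending data will start at len(self.out) once compressed.
        return (len(self.out) << 16) | len(self.buffer)

    def close(self) -> bytes:
        if self.buffer:
            self.out += make_bgzf_block(bytes(self.buffer))
            self.buffer.clear()
        self.out += make_bgzf_block(b"")  # trailing empty block as EOF marker
        return bytes(self.out)
```

Because each block is a valid gzip member, Python's gzip.decompress() can read the result back in one call.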

writelines(lines: Iterable[str]) None

Write a list of strings to the file.

BGZF Utilities

vcfpy.bgzf.make_virtual_offset(block_start_offset: int, within_block_offset: int) int

Compute a BGZF virtual offset from block start and within-block offsets.

The BAM indexing scheme records read positions using a 64-bit ‘virtual offset’, which in C terms is:

block_start_offset << 16 | within_block_offset

Here block_start_offset is the file offset of the BGZF block start (an unsigned integer using up to 64 - 16 = 48 bits), and within_block_offset is the offset within the (decompressed) block (an unsigned 16-bit integer).

>>> make_virtual_offset(0, 0)
0
>>> make_virtual_offset(0, 1)
1
>>> make_virtual_offset(0, 2**16 - 1)
65535
>>> make_virtual_offset(0, 2**16)
Traceback (most recent call last):
...
ValueError: Require 0 <= within_block_offset < 2**16, got 65536
>>> 65536 == make_virtual_offset(1, 0)
True
>>> 65537 == make_virtual_offset(1, 1)
True
>>> 131071 == make_virtual_offset(1, 2**16 - 1)
True
>>> 6553600000 == make_virtual_offset(100000, 0)
True
>>> 6553600001 == make_virtual_offset(100000, 1)
True
>>> 6553600010 == make_virtual_offset(100000, 10)
True
>>> make_virtual_offset(2**48, 0)
Traceback (most recent call last):
...
ValueError: Require 0 <= block_start_offset < 2**48, got 281474976710656

vcfpy.bgzf.split_virtual_offset(virtual_offset: int) tuple[int, int]

Split a 64-bit BGZF virtual offset into block start and within block offsets.

Returns a tuple of (block_start_offset, within_block_offset).

>>> split_virtual_offset(0)
(0, 0)
>>> split_virtual_offset(1)
(0, 1)
>>> split_virtual_offset(65535)
(0, 65535)
>>> split_virtual_offset(65536)
(1, 0)
>>> split_virtual_offset(65537)
(1, 1)
>>> split_virtual_offset(1195311108)
(18239, 4)
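Because the block start occupies the high 48 bits, comparing virtual offsets as plain integers orders them first by block and then by position within the block, which is convenient when comparing or sorting index positions. The stand-ins below use hypothetical names and sketch the documented arithmetic; they are not vcfpy's source.

```python
def pack_virtual_offset(block_start: int, within: int) -> int:
    """Sketch of make_virtual_offset: block_start << 16 | within."""
    if not 0 <= within < 1 << 16:
        raise ValueError(f"Require 0 <= within < 2**16, got {within}")
    if not 0 <= block_start < 1 << 48:
        raise ValueError(f"Require 0 <= block_start < 2**48, got {block_start}")
    return (block_start << 16) | within


def unpack_virtual_offset(virtual_offset: int) -> tuple[int, int]:
    """Sketch of split_virtual_offset: (high 48 bits, low 16 bits)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# Worked example matching the doctest above: 18239 * 65536 + 4 == 1195311108.
assert pack_virtual_offset(18239, 4) == 1195311108
assert unpack_virtual_offset(1195311108) == (18239, 4)

# Integer order of virtual offsets matches (block, within-block) tuple order.
pairs = [(1, 0), (0, 65535), (18239, 4), (1, 1)]
by_offset = sorted(pairs, key=lambda p: pack_virtual_offset(*p))
assert by_offset == sorted(pairs)
```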