band_hic_matrix Class
The band_hic_matrix class is the core data structure of the BandHiC package. It stores Hi-C contact matrices in a banded representation and supports many NumPy-like operations while keeping memory usage low and access fast.
Class Overview and Initialization
- class bandhic.band_hic_matrix(contacts, diag_num=1, mask_row_col=None, mask=None, dtype=None, default_value=0, band_data_input=False)
Symmetric banded matrix stored in upper-triangular format. This storage format is motivated by the following characteristics of high-resolution Hi-C data:
Symmetry of contact maps.
Interaction frequency concentrated near the diagonal; long-range contacts are sparse (mostly zero).
Contact frequency decays sharply with genomic distance.
By storing only the main diagonal and a fixed number of super-diagonals as columns of a band matrix (diagonal-major storage: diagonal k is stored in column k), we drastically reduce memory usage while still allowing random access to Hi-C contacts. In addition, the mask and mask_row_col arrays track invalid or masked contacts to support downstream analysis.
Operations on a band_hic_matrix are as simple as on a numpy.ndarray; users can ignore these storage details.
This class stores only the main diagonal and up to (diag_num - 1) super-diagonals, exploiting symmetry by mirroring values for lower-triangular access.
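The diagonal-major layout can be illustrated with a minimal pure-Python sketch. The helpers `to_band` and `band_get` are hypothetical and not part of the bandhic API; they assume `band[i][k]` holds `A[i][i + k]` and that slots with `i + k >= bin_num` are unused padding.

```python
# Sketch of diagonal-major band storage in pure Python (no bandhic import).
# Hypothetical helpers; assumes band[i][k] holds A[i][i + k], and slots with
# i + k >= n are padding that is never read.

def to_band(full, diag_num, pad=0):
    """Pack the main diagonal and (diag_num - 1) super-diagonals of a square
    matrix into an n x diag_num table."""
    n = len(full)
    band = [[pad] * diag_num for _ in range(n)]
    for i in range(n):
        for k in range(diag_num):
            if i + k < n:  # slots past the last column stay as padding
                band[i][k] = full[i][i + k]
    return band


def band_get(band, i, j, default_value=0):
    """Read A[i][j] from band storage, mirroring lower-triangular access."""
    if i > j:  # symmetry: A[i][j] == A[j][i]
        i, j = j, i
    k = j - i  # diagonal offset, which is also the storage column
    if k >= len(band[0]):  # outside the stored band
        return default_value
    return band[i][k]


full = [[1, 2, 3, 4],
        [2, 5, 6, 7],
        [3, 6, 8, 9],
        [4, 7, 9, 10]]
band = to_band(full, diag_num=2)
print(band)                  # [[1, 2], [5, 6], [8, 9], [10, 0]]
print(band_get(band, 2, 1))  # 6, read from the super-diagonal slot band[1][1]
print(band_get(band, 0, 3))  # 0, out of band, so default_value is returned
```

With diag_num=2 the 4 x 4 matrix occupies 8 slots instead of 16; for a fixed diag_num, storage grows linearly rather than quadratically with the number of bins.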
- Parameters:
contacts (coo_array | coo_matrix | tuple | ndarray)
diag_num (int)
mask_row_col (ndarray | None)
mask (Tuple[ndarray, ndarray] | None)
dtype (type | None)
default_value (int | float)
band_data_input (bool)
- shape
Shape of the original full Hi-C contact matrix, (bin_num, bin_num), regardless of the internal band storage format.
- Type:
tuple of int
- dtype
Data type of the matrix elements; compatible with NumPy dtypes.
- Type:
data-type
- diag_num
Number of diagonals stored.
- Type:
int
- bin_num
Number of bins (rows/columns) of the Hi-C matrix.
- Type:
int
- data
Array of shape (bin_num, diag_num) storing the banded Hi-C data.
- Type:
ndarray
- mask
Mask for individual invalid entries, stored as a boolean ndarray of shape (bin_num, diag_num), the same shape as data.
- Type:
ndarray of bool or None
- mask_row_col
Mask for entire rows and the corresponding columns, indicating invalid bins, stored as a boolean ndarray of shape (bin_num,). For computational convenience, row/column masks are also applied to the mask array to track masked entries.
- Type:
ndarray of bool or None
- default_value
Default value for out-of-band entries. Entries outside the banded region are not stored in the data array and read as this value.
- Type:
scalar
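The note that row/column masks are also applied to the mask array can be sketched as follows. `propagate_row_col_mask` is a hypothetical pure-Python helper; it assumes an entry is masked when either of its two bins is masked and leaves padding slots unmasked, which may differ from how bandhic treats padding.

```python
# Hypothetical helper: fold a per-bin mask (like mask_row_col) into a
# band-shaped entry mask (like mask). Entry (i, j), stored at mask[i][j - i],
# is masked when bin i or bin j is invalid, because masking a bin removes both
# its row and its column from the symmetric matrix.

def propagate_row_col_mask(mask_row_col, diag_num):
    n = len(mask_row_col)
    mask = [[False] * diag_num for _ in range(n)]
    for i in range(n):
        for k in range(diag_num):
            j = i + k
            # padding slots (j >= n) are left unmasked in this sketch
            if j < n and (mask_row_col[i] or mask_row_col[j]):
                mask[i][k] = True
    return mask


mask = propagate_row_col_mask([True, False, False, True], diag_num=2)
print(mask)  # [[True, True], [False, False], [False, True], [True, False]]
```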
Examples
>>> import bandhic as bh
>>> import numpy as np
>>> mat = bh.band_hic_matrix(np.eye(4), diag_num=2)
>>> mat.shape
(4, 4)
- __init__(contacts, diag_num=1, mask_row_col=None, mask=None, dtype=None, default_value=0, band_data_input=False)
Initialize a band_hic_matrix instance.
- Parameters:
contacts ({coo_array, coo_matrix, tuple, ndarray}) – Input Hi-C data as a SciPy COO array or matrix, a tuple (data, (row, col)), a full square array, or an ndarray already in band storage (when band_data_input is True). For non-symmetric full arrays, only the upper-triangular part is used and the matrix is symmetrized. Full square arrays are not recommended for large matrices because of their memory footprint.
diag_num (int, optional) – Number of diagonals to store. Must satisfy 1 <= diag_num <= bin_num. Default is 1.
mask_row_col (ndarray of bool or indices, optional) –
Mask for invalid rows/columns. Can be specified as:
A boolean array of shape (bin_num,) indicating which rows/columns to mask.
A list of indices to mask.
Defaults to None (no masking).
mask (ndarray pair of (row_indices, col_indices), optional) –
Mask for invalid matrix entries. Can be specified as:
A tuple of two ndarray (row_indices, col_indices) listing positions to mask.
Defaults to None (no masking).
dtype (data-type, optional) – Desired NumPy dtype; defaults to the dtype of the contacts data.
default_value (scalar, optional) – Default value for unstored out-of-band entries. Default is 0.
band_data_input (bool, optional) – If True, contacts is treated as precomputed band storage. Default is False.
- Raises:
ValueError – If contacts type is invalid, diag_num out of range, or array shape invalid.
- Return type:
None
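The documented input constraints can be expressed as a short validation sketch. `check_diag_num` and `normalize_row_col_mask` are hypothetical names, not bandhic functions, and the sketch only recognizes Python `bool` values (a NumPy boolean array would need an extra dtype check).

```python
# Hypothetical input checks mirroring the documented constraints:
# 1 <= diag_num <= bin_num, and mask_row_col given as a boolean sequence
# of length bin_num or as a list of bin indices.

def check_diag_num(diag_num, bin_num):
    if not 1 <= diag_num <= bin_num:
        raise ValueError(f"diag_num must be in [1, {bin_num}], got {diag_num}")


def normalize_row_col_mask(mask_row_col, bin_num):
    """Return a list of bin_num booleans (True = masked bin)."""
    if mask_row_col is None:
        return [False] * bin_num  # no masking
    values = list(mask_row_col)
    if values and all(isinstance(v, bool) for v in values):
        # boolean form: one flag per bin
        if len(values) != bin_num:
            raise ValueError("boolean mask_row_col must have length bin_num")
        return values
    # index form: mark each listed bin as masked
    result = [False] * bin_num
    for idx in values:
        if not 0 <= idx < bin_num:
            raise ValueError(f"bin index {idx} out of range")
        result[idx] = True
    return result


check_diag_num(2, 4)  # passes silently
print(normalize_row_col_mask([0, 3], 4))  # [True, False, False, True]
print(normalize_row_col_mask([True, False, False, True], 4))  # same result
```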
Examples
Initialize from a SciPy COO matrix:
>>> import bandhic as bh
>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> coo = coo_matrix(([1, 2, 3], ([0, 1, 2], [0, 1, 2])), shape=(3, 3))
>>> mat1 = bh.band_hic_matrix(coo, diag_num=2)
>>> mat1.data.shape
(3, 2)
Initialize from a tuple (data, (row, col)):
>>> mat2 = bh.band_hic_matrix(([4, 5, 6], ([0, 1, 2], [2, 1, 0])), diag_num=1)
>>> mat2.data.shape
(3, 1)
Initialize from a full dense array; only the upper-triangular part is stored, and the lower part is obtained by symmetry:
>>> arr = np.arange(16).reshape(4, 4)
>>> mat3 = bh.band_hic_matrix(arr, diag_num=3)
>>> mat3.data.shape
(4, 3)
Initialize with a row/column mask, which masks entire rows and the corresponding columns:
>>> mask = np.array([True, False, False, True])
>>> mat4 = bh.band_hic_matrix(arr, diag_num=2, mask_row_col=mask)
>>> mat4.mask_row_col
array([ True, False, False,  True])
mask_row_col can also be given as a list of indices:
>>> mat4 = bh.band_hic_matrix(arr, diag_num=2, mask_row_col=[0, 3])
>>> mat4.mask_row_col
array([ True, False, False,  True])
Initialize from precomputed banded storage:
>>> band = mat3.data.copy()
>>> mat5 = bh.band_hic_matrix(band, band_data_input=True)
>>> mat5.data.shape
(4, 3)
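The tuple input path can be sketched as folding each `(data, (row, col))` triplet onto the upper triangle and keeping only offsets below diag_num. `coo_to_band` is a hypothetical helper; in particular, it assumes a later duplicate overwrites an earlier one, whereas SciPy sums duplicate COO entries when converting to dense or CSR form. Which policy bandhic follows is not stated here.

```python
# Sketch of folding (data, (row, col)) triplets into band storage.
# Hypothetical helper; assumes a later duplicate overwrites an earlier one.

def coo_to_band(values, rows, cols, bin_num, diag_num, fill=0):
    band = [[fill] * diag_num for _ in range(bin_num)]
    for v, i, j in zip(values, rows, cols):
        if i > j:  # move lower-triangular entries to the upper triangle
            i, j = j, i
        k = j - i
        if k < diag_num:  # entries farther from the diagonal are dropped
            band[i][k] = v
    return band


# Same triplets as the tuple example above.
print(coo_to_band([4, 5, 6], [0, 1, 2], [2, 1, 0], bin_num=3, diag_num=1))
# [[0], [5], [0]]: (0, 2) and (2, 0) lie outside a one-diagonal band
print(coo_to_band([4, 5, 6], [0, 1, 2], [2, 1, 0], bin_num=3, diag_num=3))
# [[0, 0, 6], [5, 0, 0], [0, 0, 0]]: (2, 0) folds onto (0, 2) and overwrites 4
```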
Masking Methods
These methods allow for the application and manipulation of masks within the matrix. Masks can be used to exclude or highlight specific parts of the matrix during computations.
- Initialize the mask for invalid entries based on the matrix shape.
- Add mask entries for specified indices.
- Get the current mask array.
- Remove mask entries for specified indices, or clear all of them.
- Clear the entry-level mask but retain the row/column mask.
- Mask entire rows and the corresponding columns.
- Get the current row/column mask.
- Clear the current row/column mask.
- Clear all masks (both entry-level and row/column-level).
- Count the number of masked entries in the banded matrix.
- Count the number of unmasked entries in the banded matrix.
- Count the number of masked entries in the in-band region.
- Count the number of masked entries in the out-of-band region.
- Count the number of valid entries in the in-band region.
- Count the number of valid entries in the out-of-band region.
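The in-band and out-of-band counts are related by simple arithmetic: the valid entries of a region are its total entries minus its masked entries. The sketch below assumes counting over the upper triangle including the diagonal and considers only a per-bin mask; the bandhic counting methods may instead count the full symmetric matrix or include entry-level masks. Both helpers are hypothetical.

```python
# Hypothetical counters over the upper triangle (i <= j), using only a
# per-bin mask; entry-level masks are ignored in this sketch.

def band_entry_totals(n, diag_num):
    """Number of upper-triangular entries inside and outside the band."""
    in_total = sum(min(diag_num, n - i) for i in range(n))
    return in_total, n * (n + 1) // 2 - in_total


def count_masked_upper(mask_row_col, diag_num):
    """Masked upper-triangular entries, split into in-band and out-of-band."""
    n = len(mask_row_col)
    in_band = out_band = 0
    for i in range(n):
        for j in range(i, n):  # O(n^2): fine for a sketch, not for real data
            if mask_row_col[i] or mask_row_col[j]:
                if j - i < diag_num:
                    in_band += 1
                else:
                    out_band += 1
    return in_band, out_band


mask_row_col = [True, False, False, True]
totals = band_entry_totals(4, diag_num=2)              # (7, 3)
masked = count_masked_upper(mask_row_col, 2)           # (4, 3)
valid = tuple(t - m for t, m in zip(totals, masked))   # (3, 0)
print(totals, masked, valid)
```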
Data Indexing and Modification
The following methods provide functionality to access, modify, and index the matrix data. These are essential for manipulating individual elements or subsets of the matrix.
- Retrieve matrix entries or a submatrix using NumPy-like indexing.
- Assign values to matrix entries using NumPy-like indexing.
- Retrieve values, taking the mask into account.
- Set values at specified row and column indices.
- Retrieve the k-th diagonal of the matrix.
- Set values in the k-th diagonal of the matrix.
- Extract stored, unmasked band values for a given row or column.
- Iterate over the diagonals of the matrix.
- Iterate over the diagonals of the matrix with a specified window size.
- Iterate over the rows of the band_hic_matrix object.
- Iterate over the columns of the band_hic_matrix object.
- Fill masked entries in data with the default value.
- Clip data values to a given range.
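Diagonal and row access on band storage reduce to index arithmetic. The helpers below are hypothetical sketches: `get_diag` reads column k of the band, and `band_row` gathers the in-band entries of row i, mirroring entries left of the diagonal. Raising an error for a diagonal outside the band is an assumption of this sketch.

```python
# Hypothetical accessors on band storage (band[i][k] == A[i][i + k]).

def get_diag(band, k):
    """Return the k-th diagonal; by symmetry, diagonal -k equals diagonal k."""
    k = abs(k)
    if k >= len(band[0]):
        raise ValueError("diagonal lies outside the stored band")
    return [band[i][k] for i in range(len(band) - k)]


def band_row(band, i):
    """Return (column, value) pairs of row i that fall inside the band."""
    n, d = len(band), len(band[0])
    pairs = []
    for j in range(max(0, i - d + 1), min(n, i + d)):
        a, b = (i, j) if i <= j else (j, i)  # mirror lower-triangular entries
        pairs.append((j, band[a][b - a]))
    return pairs


band = [[1, 2], [5, 6], [8, 9], [10, 0]]  # diag_num = 2, bin_num = 4
print(get_diag(band, 1))   # [2, 6, 9]
print(get_diag(band, -1))  # [2, 6, 9]
print(band_row(band, 1))   # [(0, 2), (1, 5), (2, 6)]
```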
Data Reduction and Normalization
These methods perform data reductions, such as summing, averaging, or otherwise aggregating matrix values over the whole matrix or along a given axis, as well as per-diagonal normalization.
- Compute the minimum value in the matrix or along a given axis.
- Compute the maximum value in the matrix or along a given axis.
- Compute the sum of the values in the matrix or along a given axis.
- Compute the mean value of the matrix or along a given axis.
- Compute the standard deviation of the values in the matrix or along a given axis.
- Compute the variance of the values in the matrix or along a given axis.
- Compute the product of the values in the matrix or along a given axis.
- Compute the peak-to-peak (maximum - minimum) value of the matrix or along a given axis.
- Test whether all (or any) array elements along a given axis evaluate to True.
- Test whether all (or any) array elements along a given axis evaluate to True.
- Normalize each diagonal of the matrix to have zero mean and unit variance.
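Per-diagonal normalization can be sketched as a z-score on each stored column of the band, using only the `bin_num - k` real entries of diagonal k. This is an illustration under stated assumptions (masks ignored, population standard deviation as in NumPy's default `ddof=0`, constant diagonals mapped to 0), not bandhic's implementation.

```python
import math

# Sketch of per-diagonal z-score normalization on band storage.
# Assumptions: masks are ignored, the population standard deviation
# (ddof=0, NumPy's default) is used, and a constant diagonal maps to 0.

def normalize_diagonals(band):
    n, d = len(band), len(band[0])
    out = [list(row) for row in band]
    for k in range(d):
        vals = [band[i][k] for i in range(n - k)]  # skip padding slots
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        for i in range(n - k):
            out[i][k] = (band[i][k] - mean) / std if std > 0 else 0.0
    return out


band = [[1, 2], [5, 6], [8, 9], [10, 0]]
z = normalize_diagonals(band)
diag0 = [row[0] for row in z]  # main diagonal: zero mean, unit variance
print(diag0)
```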
Vectorized Computation Methods
These methods allow for the application of universal functions (ufuncs) to the matrix, enabling element-wise operations similar to those in NumPy.
- Perform element-wise 'absolute' operation.
- Perform element-wise 'add' operation with two inputs.
- Perform element-wise 'arccos' operation.
- Perform element-wise 'arccosh' operation.
- Perform element-wise 'arcsin' operation.
- Perform element-wise 'arcsinh' operation.
- Perform element-wise 'arctan' operation.
- Perform element-wise 'arctan2' operation with two inputs.
- Perform element-wise 'arctanh' operation.
- Perform element-wise 'bitwise_and' operation with two inputs.
- Perform element-wise 'bitwise_or' operation with two inputs.
- Perform element-wise 'bitwise_xor' operation with two inputs.
- Perform element-wise 'cbrt' operation.
- Perform element-wise 'conj' operation.
- Perform element-wise 'conjugate' operation.
- Perform element-wise 'cos' operation.
- Perform element-wise 'cosh' operation.
- Perform element-wise 'deg2rad' operation.
- Perform element-wise 'degrees' operation.
- Perform element-wise 'divide' operation with two inputs.
- Perform element-wise 'divmod' operation with two inputs.
- Perform element-wise 'equal' operation with two inputs.
- Perform element-wise 'exp' operation.
- Perform element-wise 'exp2' operation.
- Perform element-wise 'expm1' operation.
- Perform element-wise 'fabs' operation.
- Perform element-wise 'float_power' operation with two inputs.
- Perform element-wise 'floor_divide' operation with two inputs.
- Perform element-wise 'fmod' operation with two inputs.
- Perform element-wise 'gcd' operation with two inputs.
- Perform element-wise 'greater' operation with two inputs.
- Perform element-wise 'greater_equal' operation with two inputs.
- Perform element-wise 'heaviside' operation with two inputs.
- Perform element-wise 'hypot' operation with two inputs.
- Perform element-wise 'invert' operation.
- Perform element-wise 'lcm' operation with two inputs.
- Perform element-wise 'left_shift' operation with two inputs.
- Perform element-wise 'less' operation with two inputs.
- Perform element-wise 'less_equal' operation with two inputs.
- Perform element-wise 'log' operation.
- Perform element-wise 'log1p' operation.
- Perform element-wise 'log2' operation.
- Perform element-wise 'log10' operation.
- Perform element-wise 'logaddexp' operation with two inputs.
- Perform element-wise 'logaddexp2' operation with two inputs.
- Perform element-wise 'logical_and' operation with two inputs.
- Perform element-wise 'logical_or' operation with two inputs.
- Perform element-wise 'logical_xor' operation with two inputs.
- Perform element-wise 'maximum' operation with two inputs.
- Perform element-wise 'minimum' operation with two inputs.
- Perform element-wise 'mod' operation with two inputs.
- Perform element-wise 'multiply' operation with two inputs.
- Perform element-wise 'negative' operation.
- Perform element-wise 'not_equal' operation with two inputs.
- Perform element-wise 'positive' operation.
- Perform element-wise 'power' operation with two inputs.
- Perform element-wise 'rad2deg' operation.
- Perform element-wise 'radians' operation.
- Perform element-wise 'reciprocal' operation.
- Perform element-wise 'remainder' operation with two inputs.
- Perform element-wise 'right_shift' operation with two inputs.
- Perform element-wise 'rint' operation.
- Perform element-wise 'sign' operation.
- Perform element-wise 'sin' operation.
- Perform element-wise 'sinh' operation.
- Perform element-wise 'sqrt' operation.
- Perform element-wise 'square' operation.
- Perform element-wise 'subtract' operation with two inputs.
- Perform element-wise 'tan' operation.
- Perform element-wise 'tanh' operation.
- Perform element-wise 'true_divide' operation with two inputs.
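Because out-of-band entries are not stored, an element-wise operation has to transform default_value as well as the band data; otherwise `exp` of an out-of-band zero would still read as 0 instead of 1. The sketch below shows one consistent way to do this with hypothetical helpers; it assumes both operands of a binary operation share the same band shape and does not claim to reproduce bandhic's internals.

```python
import math
import operator

# Sketch: applying an element-wise function to band storage. Out-of-band
# entries are not stored, so the shared default_value must be transformed
# too (for example, exp(0) = 1, so the result's out-of-band entries are 1).

def apply_unary(band, default_value, func):
    return [[func(v) for v in row] for row in band], func(default_value)


def apply_binary(a, a_default, b, b_default, func):
    # assumption: both operands have identical band shapes (same diag_num)
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("this sketch requires identical band shapes")
    data = [[func(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return data, func(a_default, b_default)


exp_data, exp_default = apply_unary([[0.0, 1.0], [2.0, 0.0]], 0.0, math.exp)
print(exp_default)  # 1.0
sum_data, sum_default = apply_binary([[1, 2]], 0, [[3, 4]], 5, operator.add)
print(sum_data, sum_default)  # [[4, 6]] 5
```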
Data Type Conversion and Representation
This section includes methods for converting the matrix to different data types or representations, such as dense, sparse, or masked arrays.
- Convert the band matrix to a dense format.
- Convert the matrix to COO format.
- Convert the matrix to CSR format.
- Deep copy the object.
- Cast the data to a new dtype.
- Return a string representation of the band_hic_matrix object.
- Return a string representation of the band_hic_matrix object.
- Return the data as a NumPy array.
- Return the priority for array operations.
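Conversion to dense or COO form amounts to unpacking each stored diagonal and mirroring it below the main diagonal. `band_to_dense` and `band_to_coo` are hypothetical helpers; whether bandhic's COO output contains both triangles or only the upper one, and whether it drops explicit zeros, are assumptions here.

```python
# Hypothetical converters from band storage (band[i][k] == A[i][i + k]).

def band_to_dense(band, default_value=0):
    n, d = len(band), len(band[0])
    dense = [[default_value] * n for _ in range(n)]  # out-of-band fill
    for i in range(n):
        for k in range(min(d, n - i)):  # skip padding slots
            dense[i][i + k] = dense[i + k][i] = band[i][k]  # mirror
    return dense


def band_to_coo(band, symmetric=True):
    """Return (data, (row, col)) for nonzero in-band entries."""
    n, d = len(band), len(band[0])
    data, rows, cols = [], [], []
    for i in range(n):
        for k in range(min(d, n - i)):
            v = band[i][k]
            if v == 0:  # assumption: explicit zeros are not emitted
                continue
            data.append(v)
            rows.append(i)
            cols.append(i + k)
            if symmetric and k > 0:  # add the mirrored lower-triangular entry
                data.append(v)
                rows.append(i + k)
                cols.append(i)
    return data, (rows, cols)


band = [[1, 2], [5, 6], [8, 9], [10, 0]]
print(band_to_dense(band))  # [[1, 2, 0, 0], [2, 5, 6, 0], [0, 6, 8, 9], [0, 0, 9, 10]]
data, (rows, cols) = band_to_coo(band)
print(len(data))  # 10: 4 diagonal entries plus 3 super-diagonal entries, each mirrored
```

The `(data, (row, col))` result has the same layout that `scipy.sparse.coo_matrix` accepts, and that the constructor accepts as tuple input.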
Other Methods
This section includes additional methods that do not fall into the categories above but provide other useful operations for band_hic_matrix.
- Compute the memory usage of the band_hic_matrix object.
- Return the number of rows in the band_hic_matrix object.
- Return a hash value for the band_hic_matrix object.
- Truth value of the band_hic_matrix, following NumPy semantics.
- Save the band_hic_matrix object to a file.
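The memory advantage can be estimated with back-of-the-envelope arithmetic: a dense matrix needs bin_num² elements, while band storage needs bin_num × diag_num elements plus masks. The sizes below are hypothetical, and the estimate assumes float64 data and one byte per boolean mask entry; the library's memory-usage method may account for additional overhead.

```python
# Back-of-envelope memory estimate; assumes 8-byte float64 data, one byte per
# boolean mask entry, and ignores Python object overhead.

def dense_bytes(bin_num, itemsize=8):
    return bin_num * bin_num * itemsize


def band_bytes(bin_num, diag_num, itemsize=8, with_masks=True):
    total = bin_num * diag_num * itemsize  # data array
    if with_masks:
        total += bin_num * diag_num  # entry-level mask, same shape as data
        total += bin_num  # mask_row_col, one flag per bin
    return total


# Hypothetical sizes, not a measured dataset.
print(dense_bytes(100_000))      # 80000000000 (80 GB)
print(band_bytes(100_000, 200))  # 180100000 (about 180 MB)
```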