bandhic.compute_bin_bias

bandhic.compute_bin_bias(hic_coo, verbose=False, bias_lowerbound=0.5, bias_upperbound=2)

Compute bias values for Hi-C contact matrices using the Knight-Ruiz normalization algorithm.

Parameters:
  • hic_coo (scipy.sparse.coo_array) – A sparse COO matrix representing Hi-C contact data.

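  • verbose (bool, optional) – If True, print detailed information during processing. Default is False.

  • bias_lowerbound (float, optional) – Lower bound of the range used to decide whether the bias vector is valid. Default is 0.5.

  • bias_upperbound (float, optional) – Upper bound of the range used to decide whether the bias vector is valid. Default is 2.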

Returns:

  • bias (numpy.ndarray) – A 1D array containing the bias values for each bin in the Hi-C matrix.

  • is_valid (bool) – Whether the bias vector is valid, i.e. whether its mean and median fall within the typical range.

Examples

>>> import numpy as np
>>> import scipy.sparse as sps
>>> from bandhic import compute_bin_bias
>>> # Create a sample sparse COO matrix
>>> row = np.array([0, 1, 2, 0, 1, 2])
>>> col = np.array([0, 1, 2, 1, 2, 0])
>>> data = np.array([10, 10, 10, 0, 0, 0])
>>> hic_coo = sps.coo_array((data, (row, col)), shape=(3, 3))
>>> bias, is_valid = compute_bin_bias(hic_coo, verbose=False)
>>> print(bias)
[1. 1. 1.]
>>> print(is_valid)
True
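
A returned bias vector is typically used to normalize the contact counts. The sketch below is illustrative only and is not part of bandhic: it assumes the bias acts as a divisor, so that each count is divided by the product of the biases of its two bins. The helper name apply_bias_sketch and the dense input are hypothetical choices for the illustration. Check the convention used by the rest of the library before relying on it.

```python
import numpy as np


def apply_bias_sketch(dense_counts, bias):
    """Hypothetical helper: divide each count by bias[i] * bias[j].

    Assumes the bias is a divisor; this convention is an assumption,
    not something stated in the bandhic documentation.
    """
    counts = np.asarray(dense_counts, dtype=float)
    bias = np.asarray(bias, dtype=float)
    outer = np.outer(bias, bias)
    normalized = np.zeros_like(counts)
    # Entries whose bias product is zero (filtered bins) are left at 0.
    np.divide(counts, outer, out=normalized, where=outer > 0)
    return normalized


counts = np.array([[10, 0, 0], [0, 10, 0], [0, 0, 10]])
normalized = apply_bias_sketch(counts, np.ones(3))
```

With a bias of all ones, as in the example above, the counts come back unchanged. A sparse matrix can be passed after converting it with hic_coo.toarray().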

Notes

Before computing the bias values, this function removes a percentage of the sparsest bins from the Hi-C matrix and applies the Knight-Ruiz normalization algorithm to the filtered matrix. If the resulting bias vector is not valid, the function iteratively removes further bins with low interaction counts until a valid bias vector is obtained. A valid bias vector is expected to have a mean and median close to 1, indicating balanced interaction frequencies across bins.
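
The balancing and validity ideas described above can be sketched as follows. This is a minimal illustration, not the bandhic implementation. It uses a simple damped iterative scaling rather than the Knight-Ruiz solver, although both aim at the same balanced matrix. It assumes a symmetric, dense input, it omits the removal of sparse bins, and it assumes the bias is rescaled to a mean of 1. The names balance_sketch and is_valid_sketch are hypothetical, and the use of the two bounds as limits on the mean and median is an assumption based on the parameter names.

```python
import numpy as np


def balance_sketch(dense_counts, max_iter=500, tol=1e-10):
    """Hypothetical helper: damped iterative balancing of a symmetric matrix.

    This is NOT the Knight-Ruiz solver; it only targets the same fixed point.
    """
    mat = np.asarray(dense_counts, dtype=float)
    keep = mat.sum(axis=1) > 0          # bins without contacts are skipped
    scale = np.ones(mat.shape[0])
    for _ in range(max_iter):
        # Row sums of diag(scale) @ mat @ diag(scale).
        sums = scale * (mat @ scale)
        if np.all(np.abs(sums[keep] - 1.0) < tol):
            break
        scale[keep] /= np.sqrt(sums[keep])
    bias = np.zeros_like(scale)
    bias[keep] = 1.0 / scale[keep]
    bias[keep] /= bias[keep].mean()     # assumed convention: mean bias of 1
    return bias


def is_valid_sketch(bias, lower=0.5, upper=2.0):
    """Hypothetical check: mean and median of nonzero bias lie in [lower, upper]."""
    nonzero = bias[bias > 0]
    if nonzero.size == 0:
        return False
    mean_ok = lower <= nonzero.mean() <= upper
    median_ok = lower <= np.median(nonzero) <= upper
    return bool(mean_ok and median_ok)


counts = np.array([[4.0, 2.0, 0.0], [2.0, 4.0, 2.0], [0.0, 2.0, 4.0]])
bias = balance_sketch(counts)
balanced = counts / np.outer(bias, bias)
```

After balancing, every row of balanced sums to the same value, which is the property the bias vector is meant to produce.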