bandhic.compute_bin_bias
- bandhic.compute_bin_bias(hic_coo, verbose=False, bias_lowerbound=0.5, bias_upperbound=2)
Compute bias values for Hi-C contact matrices using the Knight-Ruiz normalization algorithm.
- Parameters:
hic_coo (scipy.sparse.coo_array) – A sparse COO matrix representing Hi-C contact data.
verbose (bool, optional) – If True, print detailed information during processing. Default is False.
bias_lowerbound (float, optional) – Lower bound used when validating the bias vector. Default is 0.5.
bias_upperbound (float, optional) – Upper bound used when validating the bias vector. Default is 2.
- Returns:
bias (numpy.ndarray) – A 1D array containing the bias values for each bin in the Hi-C matrix.
is_valid (bool) – True if the bias vector is valid, that is, its mean and median both fall within the typical range.
Examples
>>> import numpy as np
>>> import scipy.sparse as sps
>>> from bandhic import compute_bin_bias
>>> # Create a sample sparse COO matrix
>>> row = np.array([0, 1, 2, 0, 1, 2])
>>> col = np.array([0, 1, 2, 1, 2, 0])
>>> data = np.array([10, 10, 10, 0, 0, 0])
>>> hic_coo = sps.coo_array((data, (row, col)), shape=(3, 3))
>>> bias, is_valid = compute_bin_bias(hic_coo, verbose=False)
>>> print(bias)
[1. 1. 1.]
>>> print(is_valid)
True
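The validity flag returned alongside the bias vector can be pictured with a small stand-alone check. This is only a sketch, not bandhic's implementation: the helper name `looks_valid` is hypothetical, and it assumes that the signature defaults 0.5 and 2 bound both the mean and the median of the usable (finite, positive) bias values.

```python
import numpy as np


def looks_valid(bias, lower=0.5, upper=2.0):
    """Hypothetical check: mean and median of the usable bias values lie in [lower, upper]."""
    bias = np.asarray(bias, dtype=float)
    # Ignore bins without a usable bias (NaN, inf, zero or negative values).
    usable = bias[np.isfinite(bias) & (bias > 0)]
    if usable.size == 0:
        return False
    mean_ok = lower <= usable.mean() <= upper
    median_ok = lower <= np.median(usable) <= upper
    return bool(mean_ok and median_ok)


print(looks_valid([1.0, 1.0, 1.0]))
print(looks_valid([0.1, 0.2, 0.1]))
```

Under these assumptions the first call reports a valid vector and the second does not, because its mean and median both fall below the lower bound.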
Notes
Before computing the bias values, this function removes a percentage of the sparsest bins from the Hi-C matrix, then applies the Knight-Ruiz normalization algorithm to the filtered matrix. If the resulting bias vector is not valid, bins with low interaction counts are removed and the computation is repeated until a valid bias vector is obtained. A valid bias vector has a mean and median close to 1, which indicates balanced interaction frequencies across bins.
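The filter-then-balance loop described in these notes can be sketched as follows. This is an illustration under several assumptions, not bandhic's code: it works on a dense symmetric array, it swaps in a simple damped iterative scaling in place of the Knight-Ruiz Newton method, and the names `balance_symmetric`, `sketch_bin_bias` and `drop_fractions` are hypothetical. It also assumes that the bias is the reciprocal of the scaling vector, rescaled to mean 1.

```python
import numpy as np


def balance_symmetric(mat, n_iter=500, tol=1e-8):
    """Damped iterative scaling (not Knight-Ruiz): find x with x_i * sum_j(mat_ij * x_j) ~= 1."""
    x = np.ones(mat.shape[0])
    for _ in range(n_iter):
        new_x = np.sqrt(x / (mat @ x))
        if np.max(np.abs(new_x - x)) < tol:
            return new_x
        x = new_x
    return x


def sketch_bin_bias(dense, drop_fractions=(0.0, 0.05, 0.1, 0.2), lower=0.5, upper=2.0):
    """Drop progressively more sparse bins until the bias vector passes the validity check."""
    dense = np.asarray(dense, dtype=float)
    n = dense.shape[0]
    nnz_per_bin = np.count_nonzero(dense, axis=1)
    order = np.argsort(nnz_per_bin, kind="stable")  # sparsest bins first
    bias = np.full(n, np.nan)
    for frac in drop_fractions:
        keep = nnz_per_bin > 0  # empty bins cannot be balanced
        keep[order[: int(frac * n)]] = False
        sub = dense[np.ix_(keep, keep)]
        if sub.size == 0 or np.any(sub.sum(axis=1) == 0):
            continue  # a kept bin lost all its contacts; try the next fraction
        bias = np.full(n, np.nan)
        bias[keep] = 1.0 / balance_symmetric(sub)
        bias /= np.nanmean(bias)  # assumption: bias is rescaled to mean 1
        mean, median = np.nanmean(bias), np.nanmedian(bias)
        if lower <= mean <= upper and lower <= median <= upper:
            return bias, True
    return bias, False


bias, is_valid = sketch_bin_bias(np.diag([10.0, 10.0, 10.0]))
print(bias, is_valid)
```

Bins removed by the filter keep a NaN bias in this sketch. How the library itself represents removed bins is not stated in the source.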