bandhic.apa

bandhic.apa(hic_path, resolution, loops_df, window=10, region_width=6, min_peak_dist=0.0, max_peak_dist=8000000, njobs=-1)

Compute Aggregate Peak Analysis (APA) around a set of loop anchors.

This is a lightweight Python port of the Juicer/Juicebox APA workflow. For each loop (x, y), we extract a (2*window+1)×(2*window+1) cutout centered at (x, y), accumulate the cutouts across all loops, and report common APA normalizations and region-based summary statistics.
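The cutout-and-accumulate step described above can be illustrated with a minimal NumPy sketch. It is not the bandhic implementation: the helper name `accumulate_cutouts`, the dense toy matrix, and the policy of skipping loops whose cutout would cross the matrix edge are all assumptions made for illustration.

```python
import numpy as np


def accumulate_cutouts(contact_map, loop_bins, window=10):
    """Sum (2*window+1) x (2*window+1) cutouts centered on each (x, y) bin pair.

    Hypothetical helper for illustration only; not part of bandhic.
    """
    size = 2 * window + 1
    total = np.zeros((size, size), dtype=float)
    n_used = 0
    n_rows, n_cols = contact_map.shape
    for x, y in loop_bins:
        # Assumed policy: skip loops whose cutout would run off the matrix.
        if (x - window < 0 or y - window < 0
                or x + window >= n_rows or y + window >= n_cols):
            continue
        total += contact_map[x - window:x + window + 1,
                             y - window:y + window + 1]
        n_used += 1
    return total, n_used


# Toy data: a random 50 x 50 "contact map" and three loops given as bin indices.
rng = np.random.default_rng(0)
toy_map = rng.random((50, 50))
apa_sum, n_used = accumulate_cutouts(toy_map, [(20, 30), (10, 25), (2, 40)], window=3)
```

With `window=3`, the third loop is skipped because its cutout would start at row -1, so the 7×7 aggregate is built from two loops.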

Notes

  • Loops are processed per chromosome. For each chromosome, a contact map is loaded from hic_path at the requested resolution using bandhic.straw_chr (with normalization='KR' in the current implementation).

  • Loop distance filtering is applied in bin units after converting loop coordinates to bins. By default, max_peak_dist is interpreted as a genomic distance in bp and converted to bins via resolution (max_peak_dist // resolution), whereas min_peak_dist is already expressed in bins.
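The filtering described in these notes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper `filter_loops_by_distance` is hypothetical, bp coordinates are assumed to map to bins by floor division, and both distance bounds are assumed to be inclusive.

```python
def filter_loops_by_distance(loops, resolution, min_peak_dist=0.0,
                             max_peak_dist=8_000_000):
    """Keep intra-chromosomal loops whose bin distance lies within the bounds.

    `loops` is an iterable of (chr1, x1, chr2, y1) tuples with x1/y1 in bp.
    Hypothetical helper; binning rule and inclusive bounds are assumptions.
    """
    max_bins = max_peak_dist // resolution  # bp -> bins, as documented
    kept = []
    for chr1, x1, chr2, y1 in loops:
        if chr1 != chr2:
            continue  # only intra-chromosomal loops are used
        x_bin = x1 // resolution  # assumed binning rule
        y_bin = y1 // resolution
        dist = abs(y_bin - x_bin)  # distance from the diagonal, in bins
        if min_peak_dist <= dist <= max_bins:
            kept.append((chr1, x_bin, y_bin))
    return kept


toy_loops = [
    ("chr1", 1_000_000, "chr1", 1_500_000),   # 50 bins apart at 10 kb: kept
    ("chr1", 1_000_000, "chr1", 12_000_000),  # 1100 bins apart: beyond 800 bins
    ("chr1", 500_000, "chr2", 700_000),       # inter-chromosomal: dropped
]
kept_loops = filter_loops_by_distance(toy_loops, resolution=10_000)
```

At 10 kb resolution the default max_peak_dist of 8,000,000 bp corresponds to 800 bins, so only the first toy loop survives.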

Parameters:
  • hic_path (str) – Path to an input .hic file.

  • resolution (int) – Hi-C bin size in base pairs.

  • loops_df (DataFrame) – Loop list as a DataFrame (e.g., BEDPE). The current implementation expects at least the columns '#chr1', 'chr2', 'x1', and 'y1' (coordinates in bp). Only intra-chromosomal loops (#chr1 == chr2) are used.

  • window (int) – Number of bins to include on each side of the loop center; the final cutout size is 2*window+1.

  • region_width (int) – Corner box size (in bins) used for APA region statistics.

  • min_peak_dist (float) – Minimum loop distance from the diagonal (in bins, after binning).

  • max_peak_dist (float) – Maximum loop distance from the diagonal (in bp; converted to bins as max_peak_dist // resolution for filtering and matrix loading).

  • njobs (Optional[int]) – Number of parallel worker processes for loop cutout extraction. -1 uses up to os.cpu_count() workers (capped by the number of loops).
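To make the role of region_width concrete, the following sketch computes corner-box means and peak-to-corner ratios on an aggregated matrix, in the spirit of Juicer-style scores such as peak to lower-left (P2LL). The function names, the corner naming, and the choice of the central pixel as the peak are assumptions for illustration; the statistics actually stored on APAResult may be defined differently.

```python
import numpy as np


def corner_means(apa, region_width=6):
    """Mean signal in each region_width x region_width corner box (hypothetical helper)."""
    w = region_width
    return {
        "upper_left": apa[:w, :w].mean(),
        "upper_right": apa[:w, -w:].mean(),
        "lower_left": apa[-w:, :w].mean(),
        "lower_right": apa[-w:, -w:].mean(),
    }


def peak_to_corner(apa, region_width=6):
    """Ratio of the central pixel to each corner mean (assumed score definition)."""
    center = apa.shape[0] // 2
    peak = apa[center, center]
    return {name: peak / mean
            for name, mean in corner_means(apa, region_width).items()}


# Toy aggregate for window=10: flat background of 1 with a central peak of 5.
toy_apa = np.ones((21, 21))
toy_apa[10, 10] = 5.0
ratios = peak_to_corner(toy_apa, region_width=6)
```

On this toy matrix every corner mean is 1, so each peak-to-corner ratio equals the peak value of 5.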

Returns:

Aggregated APA matrices (raw and normalized), enhancement scores, peak counts, and region-based summary statistics.

Return type:

APAResult