bandhic.apa
- bandhic.apa(hic_path, resolution, loops_df, window=10, region_width=6, min_peak_dist=0.0, max_peak_dist=8000000, njobs=-1)
Compute Aggregate Peak Analysis (APA) around a set of loop anchors.
This is a lightweight Python port of the Juicer/juicebox APA workflow. For each loop (x, y), we extract a (2*window+1)×(2*window+1) cutout centered at (x, y), accumulate cutouts across loops, and report common APA normalizations and region-based summary statistics.
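The cutout accumulation described above can be sketched with NumPy as follows. This is a minimal illustration, not the bandhic implementation: the helper name aggregate_cutouts, the dense toy matrix, and the choice to skip loops whose cutout would cross the matrix edge are all assumptions made for the example.

```python
import numpy as np

def aggregate_cutouts(contact_map, loop_bins, window=10):
    """Sum (2*window+1) x (2*window+1) cutouts centered on each loop.

    Hypothetical helper for illustration; loop_bins holds (x, y) bin indices.
    """
    size = 2 * window + 1
    total = np.zeros((size, size), dtype=float)
    n_used = 0
    n_rows, n_cols = contact_map.shape
    for x, y in loop_bins:
        # Assumption: loops too close to the matrix edge are skipped
        # rather than padded.
        if x - window < 0 or y - window < 0:
            continue
        if x + window >= n_rows or y + window >= n_cols:
            continue
        total += contact_map[x - window:x + window + 1,
                             y - window:y + window + 1]
        n_used += 1
    return total, n_used

# Toy dense map with one enriched pixel at bin pair (20, 30).
toy_map = np.ones((50, 50))
toy_map[20, 30] = 5.0

# The second loop sits too close to the edge for window=3 and is skipped.
apa_sum, n_used = aggregate_cutouts(toy_map, [(20, 30), (2, 40)], window=3)
```

The enriched pixel lands at the center of the aggregate, apa_sum[window, window], which is where APA normalizations look for the peak.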
Notes
Loops are processed per chromosome. For each chromosome, a contact map is loaded from hic_path at the requested resolution using bandhic.straw_chr (with normalization='KR' in the current implementation).
Loop distance filtering is applied in bin units after converting loop coordinates to bins. By default, max_peak_dist is interpreted as a genomic distance in bp and converted to bins via resolution.
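The unit handling in the distance filter can be sketched as below. The function name loop_passes_distance_filter is hypothetical, and two details are assumptions rather than documented behavior: coordinates are binned with floor division, and both bounds are treated as inclusive.

```python
def loop_passes_distance_filter(x_bp, y_bp, resolution,
                                min_peak_dist=0.0, max_peak_dist=8_000_000):
    """Return True if a loop's distance from the diagonal is within bounds.

    Hypothetical helper: min_peak_dist is in bins, max_peak_dist is in bp.
    """
    # Assumption: bp coordinates map to bins by floor division.
    x_bin = x_bp // resolution
    y_bin = y_bp // resolution
    dist_bins = abs(y_bin - x_bin)
    # max_peak_dist is given in bp, so convert it to bins first.
    max_bins = max_peak_dist // resolution
    # Assumption: both bounds are inclusive.
    return min_peak_dist <= dist_bins <= max_bins

# At 10 kb resolution the default 8 Mb limit corresponds to 800 bins.
near = loop_passes_distance_filter(1_000_000, 1_500_000, resolution=10_000)
far = loop_passes_distance_filter(1_000_000, 10_000_000, resolution=10_000)
```

Here the first loop spans 50 bins and passes, while the second spans 900 bins and is rejected.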
- Parameters:
hic_path (str) – Path to an input .hic file.
resolution (int) – Hi-C bin size in base pairs.
loops_df (DataFrame) – Loop list as a DataFrame (e.g., BEDPE). The current implementation expects at least the columns '#chr1', 'chr2', 'x1', and 'y1' (coordinates in bp). Only intra-chromosomal loops (#chr1 == chr2) are used.
window (int) – Number of bins to include on each side of the loop center; the final cutout size is 2*window+1.
region_width (int) – Corner box size (in bins) used for APA region statistics.
min_peak_dist (float) – Minimum loop distance from the diagonal (in bins, after binning).
max_peak_dist (float) – Maximum loop distance from the diagonal (in bp; converted to bins as max_peak_dist // resolution for filtering and matrix loading).
njobs (Optional[int]) – Number of parallel worker processes for loop cutout extraction. -1 uses up to os.cpu_count() workers (capped by the number of loops).
- Returns:
Aggregated APA matrices (raw and normalized), enhancement scores, peak counts, and region-based summary statistics.
- Return type:
APAResult
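As an illustration of how region_width can enter a region-based statistic, the sketch below computes a peak-to-lower-left ratio in the style of Juicer's P2LL score. It works on a plain NumPy array because the attribute names of APAResult are not listed here; the function name peak_to_lower_left and the exact corner definition are assumptions, and the statistics that bandhic actually reports may be defined differently.

```python
import numpy as np

def peak_to_lower_left(apa_matrix, region_width=6):
    """Ratio of the center pixel to the mean of the lower-left corner box.

    Hypothetical helper modeled on the Juicer P2LL score.
    """
    center = apa_matrix.shape[0] // 2
    peak = apa_matrix[center, center]
    # Lower-left corner: last region_width rows, first region_width columns.
    lower_left = apa_matrix[-region_width:, :region_width]
    return peak / lower_left.mean()

# Toy 21 x 21 aggregate (window=10) with a 4-fold enriched center pixel.
toy_apa = np.ones((21, 21))
toy_apa[10, 10] = 4.0
p2ll = peak_to_lower_left(toy_apa, region_width=6)
```

A ratio well above 1 indicates that the aggregated loop center is enriched relative to the corner background; the sketch does not guard against an all-zero corner.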