Skip to content

Assessing differences in interactions between peaks

This is a workflow on how to use AQuA-tools in your local environment, and is designed to help you:

  • Prepare input files
  • Define and organize the necessary variables
  • Run a full workflow step by step
  • Interpret the biological outcome

We will use the following AQuA-tools:

Motivation

With the assumption that relevant 2D interactions observed in a contact matrix (HiChIP) stem from 1D peaks (ChIP-seq), the value at the matrix bin (pixel) emanating from any two given peaks represents its contact frequency in nuclear space. Therefore, by ‘casting a net’ on the contact matrix using the combinatorics of all possible peaks within a given TAD, we get the entire interaction space between peaks of interest for a given sample. This interaction space can then be compared across conditions to ask: “is there a global observable difference in interaction counts between the interaction spaces of peaks between treatment and control?”

In this recipe, we’ll use two experimental conditions:

Case: H3K27ac HiChIP performed on the cell-line RH4 (DMSO)
Control: H3K27ac HiChIP performed on the cell-line RH4 treated with entinostat, an HDAC inhibitor.

For both experiments, we have paired linear 1D peaks from H3K27ac ChIP-seq experiments.

Chapter 0: Ingredients

I. Prerequisites

You must complete the AQuA-tools local installation before starting, which can be found here.

Installing locally will create the following structure in your home directory:

$HOME/
├── aqua_tools/ # AQuA tools
├── lab-data/ # Sample data
├── outputs/ # To store output
├── genome-wide_loops/ # Sample loops files
├── bed_files/ # Sample peaks files
├── juicer_tools_1.19.02.jar # Needed for plot_APA

NOTE: to know your $HOME path, simply type: echo $HOME

II. Format sample .hic data and .mergeStats.txt files

Running AQuA-tools locally requires a specific folder and filename structure, described here. We’ll format the sample data to match those requirements for illustration purposes here. This formatting step is only relevant for using the local installation of AQuA-tools; to use these samples on Docker or Tinker, no renaming is necessary.

Terminal window
# sample1 = RH4 DMSO H3K27ac HiChIP
sample1=$HOME/lab-data/hg38/RH4_DMSO_H3K27ac/RH4_DMSO_H3K27ac_version_1
# Slightly reformatting sample 1 folder structure adapted for Tinker/Docker use to match local/standalone use
mv "$sample1/RH4_DMSO_H3K27ac_version_1.allValidPairs.hic" \
"$sample1/RH4_DMSO_H3K27ac_version_1.hic"
mv "$sample1/mergeStats.txt" \
"$sample1/RH4_DMSO_H3K27ac_version_1.mergeStats.txt"
# sample2 = RH4 HDACi H3K27ac HiChIP
sample2=$HOME/lab-data/hg38/RH4_HDACi_H3K27ac/RH4_HDACi_H3K27ac_version_1
# Slightly reformatting sample 2 folder structure adapted for Tinker/Docker use to match local/standalone use
mv "$sample2/RH4_HDACi_H3K27ac_version_1.allValidPairs.hic" \
"$sample2/RH4_HDACi_H3K27ac_version_1.hic"
mv "$sample2/mergeStats.txt" \
"$sample2/RH4_HDACi_H3K27ac_version_1.mergeStats.txt"

III. Gathering necessary files

For this workflow, we will need H3K27ac peaks files (.bed), a reference TAD file (.bed), a reference CTCF file (.bed from ENCODE-3 cCRE) and a TSS annotation file (.bed from GENCODE). We will assign these files to variable names for simpler use.

NOTE: we are using reference TAD and CTCF annotations for this recipe, as HiC-informed TAD calls and ChIP-informed CTCF peaks are lacking.

Initialise files in variables

Terminal window
DMSO_peaks=$HOME/bed_files/RH4_DMSO_H3K27ac-peaks_GSM3229491_hg38.bed
HDACi_peaks=$HOME/bed_files/RH4_HDACi_H3K27ac-peaks_GSM3229493_hg38.bed
TSS=$HOME/lab-data/hg38/reference/GENCODE_TSSs_hg38.bed
CTCF=$HOME/lab-data/hg38/reference/ENCODE3_cCRE-CTCF_hg38.bed
TAD=$HOME/lab-data/hg38/reference/TAD_goldsorted_span_centromeres-removed_hg38.bed

Chapter 1: Building interaction spaces

I. Building the .bedpe

The AQuA tool build_bedpe can create the scaffold for assessing the combinatorial space of all possible 2D interactions using .bed files. The resulting output is a .bedpe file, which can be then used to query values from a contact matrix. For more information about how build_bedpe works, see our Docs.

In this example, we’ll create two self-interaction .bedpe files, one for each condition (DMSO and HDACi), using their respective peak files. We set a minimum distance of 25kb between interaction anchors to avoid short-range contacts near the diagonal, and we restrict interactions to occur within TADs. This setup gives us the best chance of capturing interactions anchored in open chromatin with minimal noise from short-range contacts.

Terminal window
cd $HOME
build_bedpe \
--bed_A $DMSO_peaks \
--bed_B $DMSO_peaks \
--min_dist 25000 \
--TAD $TAD > $HOME/outputs/RH4_DMSO_peaks.bedpe
build_bedpe \
--bed_A $HDACi_peaks \
--bed_B $HDACi_peaks \
--min_dist 25000 \
--TAD $TAD > $HOME/outputs/RH4_HDACi_peaks.bedpe

Chapter 2: Constraining interaction spaces

I. Intersecting .bedpe with a TSS file

We can then reduce this combinatorial space of interacting peaks to interactions that have at least one TSS on either end of a .bedpe row using intersect_bedpe. This narrows our combinatorial scaffold of interactions to contain genes found in open chromatin and possibly poised for transcription. For more information about how intersect_bedpe works, see our Docs.

Terminal window
intersect_bedpe \
--bed_A $TSS \
--bedpe $HOME/outputs/RH4_DMSO_peaks.bedpe \
> $HOME/outputs/RH4_DMSO_peaks-tss.bedpe
intersect_bedpe \
--bed_A $TSS \
--bedpe $HOME/outputs/RH4_HDACi_peaks.bedpe \
> $HOME/outputs/RH4_HDACi_peaks-tss.bedpe

II. Intersecting .bedpe with a CTCF file

We’ll use intersect_bedpe again, this time with a .bed file containing reference CTCF sites obtained from ENCODE-3. By intersecting with CTCF sites, we narrow our focus to interactions mediated or stabilized by CTCF among regions already anchored at transcription start sites. This gives us a focused set of promoter-associated loops to continue our analysis with.

Terminal window
intersect_bedpe \
--bed_A $CTCF \
--bedpe $HOME/outputs/RH4_HDACi_peaks-tss.bedpe \
> $HOME/outputs/RH4_DMSO_peaks-tss-ctcf.bedpe
intersect_bedpe \
--bed_A $CTCF \
--bedpe $HOME/outputs/RH4_HDACi_peaks-tss.bedpe \
> $HOME/outputs/RH4_HDACi_peaks-tss-ctcf.bedpe

Chapter 3: Refining interaction spaces

I. Querying the .bedpe

The constrained set of interaction space obtained using steps above can then be paired with a contact matrix to query the respective interaction values for each row.

We will use query_bedpe with --formula max and --fix false to refine our input .bedpe coordinates. As the interaction space for any two given peaks can span multiple pixels in the contact matrix, choosing the max pixel hones into a single number with the most signal and keeps all .bedpe coordinates fixed to the resolution of choice (in this case the default of 5kb). For more information about how query_bedpe works, see our Docs.

For this step, we’ll be using our sample data folders which contain our .hic and .mergeStats.txt files.

NOTE: It’s important to use the --sample1_dir parameter when working with your own samples, rather than the --sample1 parameter which is for pre-loaded samples on Docker or Tinker.

Terminal window
query_bedpe \
--bedpe $HOME/outputs/RH4_DMSO_peaks-tss-ctcf.bedpe \
--sample1_dir $sample1 \
--formula max \
--fix FALSE > $HOME/outputs/RH4_DMSO_query.bedpe
query_bedpe \
--bedpe $HOME/outputs/RH4_HDACi_peaks-tss-ctcf.bedpe \
--sample1_dir $sample2 \
--formula max \
--fix FALSE > $HOME/outputs/RH4_HDACi_query.bedpe

Chapter 4: Merging interaction spaces

I. Merging the .bedpe

So far, we have been working with each condition (DMSO and HDACi) separately, creating and refining the interaction spaces of each dataset.

Now, we will use union_bedpe to merge the spaces into a single .bedpe file. This step takes the union of overlapping interaction regions from the two conditions, creating a unified set of candidate loops for comparison. For more information about how union_bedpe works, see our Docs.

Terminal window
union_bedpe \
--bedpe $HOME/outputs/RH4_DMSO_query.bedpe \
--bedpe $HOME/outputs/RH4_HDACi_query.bedpe
> $HOME/outputs/RH4_union.bedpe

Chapter 5: Visualizing

I. Two-sample plot_APA with AQuA normalization

Finally, we can visualize the aggregate contact signal of the merged interaction space across the two conditions using plot_APA.

We’ll make a two-sample delta APA plot using both the RH4_DMSO and RH4_HDACi samples and our unified .bedpe. We’ll run plot_APA twice: once using CPM normalization and once using AQuA normalization. Because this dataset includes spike-in mouse reads, AQuA normalization is applied by default. For more information about how plot_APA works, see our Docs.

Terminal window
plot_APA \
--bedpe $HOME/outputs/RH4_union.bedpe \
--sample1_dir $sample1 \
--sample2_dir $sample2 \
--out-dir $HOME/outputs/RH4_aqua

II. Two-sample plot_APA with CPM normalization

Terminal window
plot_APA \
--bedpe $HOME/outputs/RH4_union.bedpe \
--sample1_dir $sample1 \
--sample2_dir $sample2 \
--norm cpm \
--out-dir $HOME/outputs/RH4_cpm

To view the resulting APA plots, navigate to the output directories created for each plot and view the .pdf.

CPM output: APA_cpm

AQuA output: APA_AQUA

Conclusion

In this recipe, we:

  • Built the entire possible interaction space between linear H3K27ac peaks for each condition
  • Constrained the interaction space to contain TSSs
  • Further constrained the interaction space to contain CTCF sites
  • Refined the coordinates of the interaction space to correspond to the maximum contact values
  • Merged the interaction spaces across conditions
  • Visualized the aggregate of the merged space in CPM and AQuA normalizations

Using CPM normalization, we observe lost interactions between peaks (centred ‘+’ blue) amongst a crackle of surrounding gains. However, when using AQuA, we observe a global overall gain in interactions. This pattern is consistent with findings from Gryder et al. 2020, who showed that AQuA normalization using an orthogonal spike-in can uncover global changes in chromatin interactions that CPM may miss. Specifically, AQuA reveals broad increases in H3K27ac-mediated looping after HDAC inhibition, which points to a shift in enhancer–promoter architecture beyond individual loops.

Why does that matter? Many biological processes depend not just on a handful of punctate loops, but on an overall loosening or tightening of the chromatin landscape. In other words, if you only focus on individual hotspots, you risk missing the forest for the trees: AQuA helps reveal when the entire forest is greening.

We designed the AQuA-tools suite to make it easy to take full advantage of the AQuA normalization method. AQuA-tools that use sample data automatically check the .mergeStats.txt file to determine whether spike-in reads are present. If spike-in data is found, AQuA normalization is applied by default. If not, the tools default to CPM normalization. Our hope is to simplify the process of analyzing your own samples, whether they include spike-in or not.

Once you’ve completed this recipe, we recommend exploring the next recipe designed for Tinker and Docker environments. There, you’ll be introduced to extract_bedpe, a powerful AQuA tool that uses inherent normalization to call loops directly from HiChIP contact matrices, with no peak file required. These next steps offer a novel and flexible approach to discovering chromatin interactions and are perfect for diving deeper into what AQuA-tools can do.



Gryder, B. E., Khan, J., & Stanton, B. Z. (2020). Measurement of differential chromatin interactions with absolute quantification of architecture (AQuA-HiChIP). Nature protocols, 15(3), 1209–1236. https://doi.org/10.1038/s41596-019-0285-9