
extract_bedpe

Concepts

The aqua_tools suite provides two primary methods for gathering data for analysis:

  1. build

    If you are starting with known regions in a BED file, such as promoters or enhancers, build_bedpe can convert these regions into paired-end regions for 3D contact analysis.

  2. extract

    If you do not know which regions you need, extract_bedpe starts from a .hic file and identifies regions in contact, regardless of whether they are enhancers, promoters, or other genomic elements. Because it requires no prior knowledge of genomic elements or their biological significance, it is well suited to exploratory research.

This tutorial covers the basics of extract_bedpe and explains its customizable parameters, which let you tailor your analysis to the most relevant contacts, filter out noise, and improve signal clarity.

Using inherent normalization

[Figure: intro]

For more detail, see the documentation on inherent normalization.

Extracting contacts from a .hic

[Figure: inh_norm]

Changing the contact score

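The --score parameter sets the inherent normalization score used as the threshold to seed cluster formation (default 1). Adjusting it lets you choose the level of contact that extract_bedpe considers, so you can quickly identify the strongest contacts within a region.

[Figure: 3_score]

[Figure: 4_score]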

Minimum distance constraint

Once you have obtained regions with the desired level of contact, you may want to narrow your search further. The --min_dist parameter (in base pairs, default 0) filters out regions that lie within that distance of the diagonal, that is, regions whose two anchors are close together in linear space.

[Figure: 5_min_dist]
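To make the idea of a minimum distance concrete, here is a minimal sketch that applies a comparable filter to BEDPE records after the fact. It is not how extract_bedpe implements --min_dist: the helper name filter_min_dist, the toy coordinates, and the choice to measure distance between the two anchor start coordinates are all assumptions made for illustration.

```shell
# Toy BEDPE records (chrom1 start1 end1 chrom2 start2 end2).
# Coordinates are invented for illustration only.
bedpe=$(printf 'chr1\t100000\t105000\tchr1\t110000\t115000\nchr1\t100000\t105000\tchr1\t400000\t405000\nchr1\t250000\t255000\tchr1\t900000\t905000\n')

# Hypothetical helper: keep records whose anchors are at least min_dist apart.
# Assumption: distance is the gap between the two anchor start coordinates.
filter_min_dist() {
  awk -v d="$1" 'BEGIN { FS = OFS = "\t" }
    {
      gap = $5 - $2
      if (gap < 0) gap = -gap
      if (gap >= d + 0) print
    }'
}

kept=$(printf '%s\n' "$bedpe" | filter_min_dist 50000)
printf '%s\n' "$kept"
```

With a threshold of 50000, the first toy record (anchors 10 kb apart) is dropped and the other two are kept.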

Understanding your output

To illustrate the meaning of the output from extract_bedpe, we have drawn shadows for four example regions to show which regions the output refers to.

[Figure: 6_bedpe]
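As a rough aid to reading the output, the sketch below spells out one BEDPE record as its two anchors. It assumes only the standard first six BEDPE columns (chrom1, start1, end1, chrom2, start2, end2); whether extract_bedpe appends further columns is not stated here, and the helper name describe_pair and the toy record are hypothetical.

```shell
# Toy BEDPE record; coordinates are invented for illustration only.
record=$(printf 'chr1\t100000\t105000\tchr1\t400000\t405000')

# Hypothetical helper: print the two anchors that a BEDPE line pairs together.
# Only the first six columns are read; any extra columns are ignored.
describe_pair() {
  awk 'BEGIN { FS = "\t" }
    { printf "anchor 1 = %s:%s-%s, anchor 2 = %s:%s-%s\n", $1, $2, $3, $4, $5, $6 }'
}

described=$(printf '%s\n' "$record" | describe_pair)
printf '%s\n' "$described"
```

Each output line names the pair of regions in contact, which is what each shaded region in the figure corresponds to.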

Conglomerating regions

Additional parameters such as --radius make it easy to refine your output. If the output seems too granular, increase --radius, the distance in bins searched for neighbours (default 1), to conglomerate nearby loops.

[Figure: 7_radius]

[Figure: 8_radius_2]

Changing the extract_bedpe --mode

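The --mode parameter sets the shape of the BEDPE regions that are called: loop, flare, minimal, or glob (the default).

[Figure: 9_glob]

[Figure: 10_flare]

[Figure: 11_minimal]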

Searching within a range

Depending on your analysis, you may only be interested in contacts (whether loops or globs) within a specific genomic range. Pass that range to --range in chr:start:end format.

[Figure: 12_range]

Searching within TAD boundaries

If you have a BED file of TADs, or of any other type of boundary, pass its full path to --TAD so that extract_bedpe only considers contacts within those boundaries.

[Figure: 13_tad]
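One way to picture "contacts within those boundaries" is the post-hoc filter sketched below. It is an illustration under stated assumptions, not the tool's algorithm: it assumes a contact counts as "within" a boundary when both anchors fall inside the same interval, and the helper name within_tads, the temporary files, and all coordinates are hypothetical.

```shell
# Toy inputs written to temporary files; all coordinates are invented.
tads=$(mktemp)
pairs=$(mktemp)
printf 'chr1\t0\t500000\nchr1\t500000\t1000000\n' > "$tads"
printf 'chr1\t100000\t105000\tchr1\t400000\t405000\nchr1\t450000\t455000\tchr1\t600000\t605000\n' > "$pairs"

# Hypothetical helper: keep BEDPE records whose two anchors lie in one interval.
# Assumption: "within boundaries" means start1 >= interval start and
# end2 <= interval end, with both anchors on the interval's chromosome.
within_tads() {
  awk 'BEGIN { FS = OFS = "\t" }
    NR == FNR { c[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; n = NR; next }
    {
      for (i = 1; i <= n; i++)
        if ($1 == c[i] && $4 == c[i] && $2 + 0 >= s[i] && $6 + 0 <= e[i]) { print; next }
    }' "$1" "$2"
}

kept_in_tad=$(within_tads "$tads" "$pairs")
rm -f "$tads" "$pairs"
printf '%s\n' "$kept_in_tad"
```

Here the first toy contact sits inside the first interval and is kept, while the second spans two intervals and is dropped.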

Usage

extract_bedpe obtains clusters of interacting loops based on a score threshold. Scores are calculated using inherent normalization, and the resulting clusters are printed in BEDPE format to standard output.

Usage and Option Summary

extract_bedpe -A sample -G genome -R range

Required

Short Option   Long Option    Description
-A             --sample1      Name of sample on Tinkerbox
-G             --genome       Genome build the sample was processed using
-R             --range        Range to obtain clusters from, in chr:start:end format
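Because the range must be given as chr:start:end, it can help to check the string before launching a run. The sketch below is one way to do that; valid_range is a hypothetical helper that is not part of aqua_tools, the requirement that start be smaller than end is an assumption, and the sample name, genome build, and range are placeholders. The command is printed rather than executed so the sketch works without aqua_tools installed.

```shell
# Hypothetical helper: succeed only for chr:start:end with numeric start < end.
valid_range() {
  printf '%s\n' "$1" | awk -F: '
    NF == 3 && $1 != "" && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && ($2 + 0) < ($3 + 0) { ok = 1 }
    END { exit (ok ? 0 : 1) }'
}

sample="my_sample"            # placeholder sample name
genome="hg38"                 # placeholder genome build
range="chr1:1000000:2000000"  # placeholder range

if valid_range "$range"; then
  # Assemble the required options shown in the usage line.
  cmd="extract_bedpe -A $sample -G $genome -R $range"
  echo "$cmd"
else
  echo "range must look like chr:start:end" >&2
fi
```

A string such as chr1:1000000-2000000, which uses a dash instead of a second colon, is rejected by this check.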

Optional

Short Option   Long Option    Description
-r             --resolution   Resolution in base pairs. Only 5000 and 1000 supported. Default: 5000
-T             --TAD          Full path to TAD file, the boundaries of which will be used to obtain clusters
-S             --score        Inherent score to seed cluster formation. Default: 1
-m             --mode         Shape of bedpe to be called. Strictly loop, flare, minimal, or glob. Default: glob
               --radius       Bin distance units to search for neighbours. Default: 1
               --min_dist     Distance in base pairs to filter out extracted elements. Default: 0
-h             --help         Help message
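Two of the optional parameters accept only a fixed set of values, so a small pre-flight check can catch typos early. The following is a sketch only: check_options is a hypothetical helper that is not part of aqua_tools, and the sample, genome, and range in the printed command are placeholders. The optional values shown are the documented defaults, and the command is printed rather than run.

```shell
# Hypothetical pre-flight check for the two options with restricted values.
check_options() {
  case "$1" in
    5000|1000) ;;
    *) echo "resolution must be 5000 or 1000" >&2; return 1 ;;
  esac
  case "$2" in
    loop|flare|minimal|glob) ;;
    *) echo "mode must be loop, flare, minimal, or glob" >&2; return 1 ;;
  esac
}

resolution=5000
mode="glob"

if check_options "$resolution" "$mode"; then
  # Placeholder sample, genome, and range; optional values are the defaults.
  full_cmd="extract_bedpe -A my_sample -G hg38 -R chr1:1000000:2000000 -r $resolution -m $mode -S 1 --radius 1 --min_dist 0"
  echo "$full_cmd"
fi
```

Changing resolution to an unsupported value such as 2500, or mode to a misspelling, makes the check fail before any command is assembled.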