annotate_cluster
Concepts
annotate_cluster takes the 7-column .bedpe output from cluster_bedpe and produces a table in which each row summarizes one cluster’s properties and its intersections with reference genome annotations.
For every cluster, the output includes:
Column 1 = cluster id
Column 2 = cluster chr
Column 3 = cluster start
Column 4 = cluster end
Column 5 = number of loop-structures in cluster
Column 6 = total CPM count of cluster
Column 7 = total size of cluster (end - start)
Column 8 = total size of loops in the cluster
Column 9 = total size of bed_B peaks in the cluster
Column 10 = total number of bed_B peaks in the cluster
Column 11 = total number of alternate TSSs
Column 12 = total number of lncRNAs
Column 13 = total number of housekeeping genes
Column 14 = total number of protein-coding genes
Column 15 = total number of ENCODE-3 cCRE enhancers
Column 16 = total number of ENCODE-3 cCRE CTCF sites
Column 17 = total number of UCSC CpG sites
Column 18 = names of lncRNAs
Column 19 = names of housekeeping genes
Column 20 = names of protein-coding genes
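The column layout above can be attached to a parsed row as sketched here. This is only an illustration: the short field names are hypothetical labels, and the tab delimiter is an assumption, since the on-disk format of the output is not specified on this page. The sample values are taken from the first row of the example table on this page.

```python
# Hypothetical short labels for the 20 output columns, in the documented order.
COLUMNS = [
    "cluster_id", "chr", "start", "end", "n_loops", "total_cpm",
    "range_span", "loop_span", "peak_span", "n_peaks", "n_alt_tss",
    "n_lncrna", "n_housekeeping", "n_protein_coding", "n_enhancers",
    "n_ctcf", "n_cpg", "lncrna_names", "housekeeping_names",
    "protein_coding_names",
]


def parse_row(line, sep="\t"):
    """Split one output line into a dict keyed by COLUMNS (sep is an assumption)."""
    fields = line.rstrip("\n").split(sep)
    # Pad trailing name columns that may be missing from short rows.
    fields += [""] * (len(COLUMNS) - len(fields))
    row = dict(zip(COLUMNS, fields))
    for key in ("start", "end", "range_span"):
        row[key] = int(row[key])
    return row


sample = "\t".join([
    "cluster-601", "chr18", "8700000", "8915000", "1", "0.231", "215000",
    "15000", "3279", "3", "0", "0", "0", "0", "8", "0", "1", "", "", "",
])
row = parse_row(sample)
# Column 7 is documented as end - start, so the two should agree.
span_matches = row["range_span"] == row["end"] - row["start"]
print(row["cluster_id"], span_matches)
```

Keeping the labels in one list makes it easy to adjust them if the real header names (shown in the example table) are preferred.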
Example table output:
Cluster | chr | start | end | Degree | Total_sum | Range_span | Bin_span | Peak_span | Num_bed-B_peaks | Num_Alternate_TSSs | Num_lncRNA | Num_Housekeeping_Genes | Num_All_Genes(protein_coding) | Num_ENCODE-3_Enh | Num_ENCODE-3_CTCF | Num_UCSC_CpG | LncRNAs | Housekeeping_Genes | All_Genes(protein_coding) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
cluster‑601 | chr18 | 8700000 | 8915000 | 1 | 0.231 | 215000 | 15000 | 3279 | 3 | 0 | 0 | 0 | 0 | 8 | 0 | 1 | |||
cluster‑602 | chr18 | 9330000 | 9480000 | 1 | 0.323 | 150000 | 20000 | 3714 | 5 | 6 | 0 | 2 | 2 | 9 | 0 | 3 | RALBP1, TWSG1 | RALBP1, TWSG1 | RALBP1, TWSG1 |
cluster‑603 | chr18 | 9615000 | 9955000 | 5 | 2.129 | 340000 | 50000 | 23034 | 20 | 1 | 0 | 0 | 1 | 33 | 0 | 1 | | | TXNDC2 |
cluster‑604 | chr18 | 12375000 | 12695000 | 2 | 0.379 | 320000 | 30000 | 3609 | 4 | 1 | 0 | 1 | 1 | 10 | | | | SPIRE1 | SPIRE1 |
cluster‑605 | chr18 | 12700000 | 12950000 | 3 | 0.976 | 250000 | 50000 | 4503 | 4 | 9 | 0 | 3 | 3 | 8 | | | | CEP76, PSMG2, SEH1L | CEP76, PSMG2, SEH1L |
cluster‑606 | chr18 | 21595000 | 21745000 | 4 | 1.835 | 150000 | 40000 | 11839 | 10 | 6 | 0 | 3 | 3 | 27 | | | | ABHD3, ESCO1, SNRPD1 | ABHD3, ESCO1, SNRPD1 |
cluster‑607 | chr18 | 22090000 | 22195000 | 1 | 0.458 | 105000 | 20000 | 1752 | 2 | 0 | 0 | 0 | 0 | 3 | 0 | ||||
cluster‑608 | chr18 | 25220000 | 25355000 | 2 | 0.933 | 135000 | 25000 | 223 | 1 | 2 | 0 | 1 | 1 | 1 | | | | ZNF521 | ZNF521 |
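The name columns (18 to 20) hold comma-separated gene symbols, so each cell can be split into a list and compared against its matching count column. A minimal sketch, using a hypothetical helper name and the housekeeping values shown for cluster‑605:

```python
def split_names(cell):
    """Turn a comma-separated name cell into a list; an empty cell gives []."""
    return [name.strip() for name in cell.split(",") if name.strip()]


# Values copied from the cluster-605 row of the example table.
housekeeping_count = 3
housekeeping_names = split_names("CEP76, PSMG2, SEH1L")
print(housekeeping_names, len(housekeeping_names) == housekeeping_count)
```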
Usage
annotate_cluster annotates the output of cluster_bedpe with biological signals. The input .bedpe must have exactly 7 columns, with column 7 indicating cluster membership.
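Because the input must have exactly 7 columns, it can be worth checking the field count before running the tool. The following is a rough sketch, not part of annotate_cluster: the function name is hypothetical, the tab delimiter is an assumption, and the field values are placeholders rather than real .bedpe content.

```python
def check_bedpe_columns(lines, expected=7, sep="\t"):
    """Return (line_number, n_fields) for each non-empty line with the wrong field count."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped:
            continue  # ignore blank lines
        n_fields = len(stripped.split(sep))
        if n_fields != expected:
            problems.append((number, n_fields))
    return problems


# Placeholder lines: one with 7 fields, one with only 6.
good = "\t".join(f"field{i}" for i in range(1, 8))
bad = "\t".join(f"field{i}" for i in range(1, 7))
problems = check_bedpe_columns([good, bad])
print(problems)
```

An open file handle can be passed in place of the list, since the function only iterates over lines.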
Usage and Option Summary
annotate_cluster -P path/to/cluster_file.bedpe -A sample_name -G hg38 -B path/to/peaks.bed
Required
Short Option | Long Option | Description |
---|---|---|
-P | --bedpe | Path to 7-col .bedpe |
-A | --sample1 | Name of sample |
-G | --genome | Genome build used for sample processing |
-B | --bed | Path to the sample’s H3K27ac/ATAC/DNase peaks .bed file |
Optional
Short Option | Long Option | Description |
---|---|---|
-U | --user_bed | Paths to any other .bed files |
-h | --help | Help message |
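When the tool is driven from a script, the options above can be assembled into an argument list. This sketch only builds and prints the command; the helper name is hypothetical, and it passes a single path to -U because this page does not say how multiple paths should be supplied.

```python
import shlex


def build_command(bedpe, sample, genome, bed, user_bed=None):
    """Assemble an annotate_cluster argument list from the documented short options."""
    cmd = ["annotate_cluster", "-P", bedpe, "-A", sample, "-G", genome, "-B", bed]
    if user_bed is not None:
        # Assumption: one extra .bed path; the format for several paths is not documented here.
        cmd += ["-U", user_bed]
    return cmd


cmd = build_command(
    "path/to/cluster_file.bedpe", "sample_name", "hg38", "path/to/peaks.bed"
)
print(shlex.join(cmd))
```

The resulting list could then be handed to `subprocess.run(cmd, check=True)`, assuming annotate_cluster is on the PATH.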