
annotate_cluster

Concepts

annotate_cluster takes the 7-column .bedpe output from cluster_bedpe and produces a table in which each row summarizes one cluster's properties and its intersections with reference genome annotations.

For every cluster, the output includes:

Column 1 = cluster id
Column 2 = cluster chr
Column 3 = cluster start
Column 4 = cluster end
Column 5 = number of loop-structures in cluster
Column 6 = total CPM count of cluster
Column 7 = total size of cluster (end - start)
Column 8 = total size of loops in the cluster
Column 9 = total size of bed_B peaks in the cluster
Column 10 = total number of bed_B peaks in the cluster
Column 11 = total number of alternate TSSs
Column 12 = total number of lncRNAs
Column 13 = total number of housekeeping genes
Column 14 = total number of protein-coding genes
Column 15 = total number of ENCODE-3 cCRE enhancers
Column 16 = total number of ENCODE-3 cCRE CTCF sites
Column 17 = total number of UCSC CpG sites
Column 18 = names of lncRNAs
Column 19 = names of housekeeping genes
Column 20 = names of protein-coding genes
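
To make the column layout concrete, here is a minimal Python sketch that attaches names to the twenty columns listed above when reading the table. It assumes the output is tab-delimited text with one header row, which this page does not confirm. The short column names, the `read_annotations` helper, and the demo row are all illustrative and made up, not part of the tool.

```python
import csv
import io

# Hypothetical short names for the 20 columns listed above
# (these are NOT the header names the tool itself writes).
COLUMNS = [
    "cluster_id", "chr", "start", "end", "n_loops", "total_cpm",
    "range_span", "loop_span", "peak_span", "n_peaks", "n_alt_tss",
    "n_lncrna", "n_housekeeping", "n_protein_coding", "n_enhancers",
    "n_ctcf", "n_cpg", "lncrna_names", "housekeeping_names",
    "protein_coding_names",
]


def read_annotations(handle, delimiter="\t", has_header=True):
    """Read rows into dicts keyed by COLUMNS (assumes delimited text)."""
    reader = csv.reader(handle, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, found {len(fields)}")
        row = dict(zip(COLUMNS, fields))
        for key in ("start", "end", "range_span"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


# Made-up demo data: one header line and one cluster with no overlapping genes.
header = "\t".join(COLUMNS)
record = "\t".join(
    ["cluster-1", "chr1", "1000", "6000", "1", "0.5", "5000", "2000",
     "300", "1", "0", "0", "0", "0", "0", "0", "0", "", "", ""]
)
rows = read_annotations(io.StringIO(header + "\n" + record + "\n"))

# Column 7 is described as (end - start), so it can be cross-checked.
consistent = all(r["range_span"] == r["end"] - r["start"] for r in rows)
```

The cross-check on column 7 is a cheap way to confirm that the columns were read in the expected order.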

Example table output:

Cluster  chr  start  end  Degree  Total_sum  Range_span  Bin_span  Peak_span  Num_bed-B_peaks  Num_Alternate_TSSs  Num_lncRNA  Num_Housekeeping_Genes  Num_All_Genes(protein_coding)  Num_ENCODE-3_Enh  Num_ENCODE-3_CTCF  Num_UCSC_CpG  LncRNAs  Housekeeping_Genes  All_Genes(protein_coding)
cluster‑601chr188700000891500010.23121500015000327930000801
cluster‑602chr189330000948000010.32315000020000371456022903RALBP1, TWSG1RALBP1, TWSG1RALBP1, TWSG1
cluster‑603chr189615000995500052.12934000050000230342010013301TXNDC2
cluster‑604chr18123750001269500020.3793200003000036094101110SPIRE1SPIRE1
cluster‑605chr18127000001295000030.976250000500004503490338CEP76, PSMG2, SEH1LCEP76, PSMG2, SEH1L
cluster‑606chr18215950002174500041.835150000400001183910603327ABHD3, ESCO1, SNRPD1ABHD3, ESCO1, SNRPD1
cluster‑607chr18220900002219500010.4581050002000017522000030
cluster‑608chr18252200002535500020.93313500025000223120111ZNF521ZNF521

Usage

annotate_cluster annotates the clusters in a cluster_bedpe output file with biological signals. The input .bedpe must contain exactly 7 columns, with column 7 indicating cluster membership.
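
The 7-column requirement can be checked before running the tool. The following is a minimal sketch, assuming the .bedpe is tab-delimited with no header line. The `check_cluster_bedpe` helper and the example records are hypothetical and are not part of annotate_cluster.

```python
from collections import Counter


def check_cluster_bedpe(lines):
    """Verify every record has exactly 7 tab-separated fields.

    Returns a Counter mapping each cluster label (column 7) to the
    number of records carrying that label.
    """
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError(
                f"line {number}: expected 7 columns, found {len(fields)}"
            )
        counts[fields[6]] += 1
    return counts


# Made-up records: two loops in one cluster, one loop in another.
example = [
    "chr1\t1000\t2000\tchr1\t5000\t6000\tcluster-1\n",
    "chr1\t1500\t2500\tchr1\t7000\t8000\tcluster-1\n",
    "chr2\t100\t200\tchr2\t900\t1000\tcluster-2\n",
]
counts = check_cluster_bedpe(example)
```

For a real file, the same function can be applied to an open file handle, for example `check_cluster_bedpe(open(path))`.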

Usage and Option Summary

annotate_cluster -P path/to/cluster_file.bedpe -A sample_name -G hg38 -B path/to/peaks.bed

Required

Short Option  Long Option  Description
-P            --bedpe      Path to the 7-column .bedpe file
-A            --sample1    Name of sample
-G            --genome     Genome build used for sample processing
-B            --bed        Path to the sample’s H3K27ac/ATAC/DNase peaks .bed file

Optional

Short Option  Long Option  Description
-U            --user_bed   Paths to any other .bed files
-h            --help       Help message
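
When annotate_cluster is called from a script, the options above can be assembled as an argument list. This is a minimal sketch that uses only the flags documented on this page. The `build_annotate_command` helper is hypothetical, and because the page does not state how several paths should be passed to -U, the sketch forwards a single string unchanged.

```python
import shlex


def build_annotate_command(bedpe, sample, genome, bed, user_bed=None):
    """Assemble the annotate_cluster argument list from the documented flags."""
    command = [
        "annotate_cluster",
        "-P", bedpe,    # path to the 7-column .bedpe
        "-A", sample,   # sample name
        "-G", genome,   # genome build
        "-B", bed,      # peaks .bed file
    ]
    if user_bed is not None:
        # Assumption: the value is passed through exactly as given.
        command += ["-U", user_bed]
    return command


command = build_annotate_command(
    "path/to/cluster_file.bedpe", "sample_name", "hg38", "path/to/peaks.bed"
)
printable = shlex.join(command)  # shell-quoted string for logging
```

The list can then be handed to `subprocess.run(command, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.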