annotate_cluster

Concepts

annotate_cluster takes the 7-column .bedpe output from cluster_bedpe and produces a table with one row per cluster. Each row summarizes the cluster’s properties and its intersections with reference genome annotations.

For every cluster, the output includes:

Column 1 = cluster id
Column 2 = cluster chr
Column 3 = cluster start
Column 4 = cluster end
Column 5 = number of loop-structures in cluster
Column 6 = total CPM count of cluster
Column 7 = total size of cluster (end - start)
Column 8 = total size of loops in the cluster
Column 9 = total size of bed_B peaks in the cluster
Column 10 = total number of bed_B peaks in the cluster
Column 11 = total number of alternate TSSs
Column 12 = total number of lncRNAs
Column 13 = total number of housekeeping genes
Column 14 = total number of protein-coding genes
Column 15 = total number of ENCODE-3 cCRE enhancers
Column 16 = total number of ENCODE-3 cCRE CTCF sites
Column 17 = total number of UCSC CpG sites
Column 18 = names of lncRNAs
Column 19 = names of housekeeping genes
Column 20 = names of protein-coding genes
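As a rough illustration of how the table might be consumed downstream, the sketch below reads it into a list of dictionaries keyed by the header names. It assumes the output is tab-separated with a single header line, as in the example table, and that trailing empty fields may be missing from some rows. The helper name `read_annotation_table` is hypothetical and not part of annotate_cluster. The check at the end uses the documented relationship for column 7 (end - start).

```python
import csv
import io


def read_annotation_table(handle):
    """Read a tab-separated annotation table into a list of dicts.

    Rows shorter than the header are padded with empty strings, so every
    header column is present in every row.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        padded = fields + [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, padded)))
    return rows


# First eight columns of two rows from the example table. The second row is
# deliberately truncated here to show the padding behaviour.
example = (
    "Cluster\tchr\tstart\tend\tDegree\tTotal_sum\tRange_span\tBin_span\n"
    "cluster-601\tchr18\t8700000\t8915000\t1\t0.231\t215000\t15000\n"
    "cluster-602\tchr18\t9330000\t9480000\t1\t0.323\t150000\n"
)

rows = read_annotation_table(io.StringIO(example))
for row in rows:
    # Column 7 (Range_span) is documented as end - start.
    assert int(row["end"]) - int(row["start"]) == int(row["Range_span"])
```

Reading the column names from the file's own header, rather than hard-coding them, keeps the sketch independent of the exact set of columns a given run produces.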

Example table output:

Cluster	chr	start	end	Degree	Total_sum	Range_span	Bin_span	Peak_span	Num_bed-B_peaks	Num_Alternate_TSSs	Num_Housekeeping_Genes	Num_All_Genes(protein_coding)	Num_ENCODE-3_Enh	Num_ENCODE-3_CTCF	Num_UCSC_CpG	LncRNAs	Housekeeping_Genes	All_Genes(protein_coding)
cluster‑601	chr18	8700000	8915000	1	0.231	215000	15000	3279	3	0	0	0	8	0	1
cluster‑602	chr18	9330000	9480000	1	0.323	150000	20000	3714	5	6	2	2	9	0	3	RALBP1, TWSG1	RALBP1, TWSG1	RALBP1, TWSG1
cluster‑603	chr18	9615000	9955000	5	2.129	340000	50000	23034	20	1	0	1	33	0	1			TXNDC2
cluster‑604	chr18	12375000	12695000	2	0.379	320000	30000	3609	4	1	1	1	10				SPIRE1	SPIRE1
cluster‑605	chr18	12700000	12950000	3	0.976	250000	50000	4503	4	9	3	3	8				CEP76, PSMG2, SEH1L	CEP76, PSMG2, SEH1L
cluster‑606	chr18	21595000	21745000	4	1.835	150000	40000	11839	10	6	3	3	27				ABHD3, ESCO1, SNRPD1	ABHD3, ESCO1, SNRPD1
cluster‑607	chr18	22090000	22195000	1	0.458	105000	20000	1752	2	0	0	0	3	0
cluster‑608	chr18	25220000	25355000	2	0.933	135000	25000	223	1	2	1	1	1				ZNF521	ZNF521

Usage

annotate_cluster annotates the clustered .bedpe produced by cluster_bedpe with biological signals. The input .bedpe must have exactly 7 columns, with column 7 indicating cluster membership.
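Before running the tool, it can help to confirm that the input meets the 7-column requirement. The following is a minimal sketch, assuming the file is tab-separated and has no header line. The function name `check_clustered_bedpe` is hypothetical, and the coordinates in the sample data are made up for illustration.

```python
import io


def check_clustered_bedpe(handle, expected_columns=7):
    """Return the set of cluster labels found in the last column.

    Raises ValueError if any non-empty line does not have exactly
    `expected_columns` tab-separated fields.
    """
    clusters = set()
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != expected_columns:
            raise ValueError(
                f"line {line_number}: expected {expected_columns} columns, "
                f"found {len(fields)}"
            )
        clusters.add(fields[-1])  # last column = cluster membership
    return clusters


# Made-up loop coordinates; only the 7-column layout matters here.
sample = (
    "chr18\t8700000\t8705000\tchr18\t8910000\t8915000\tcluster-601\n"
    "chr18\t9330000\t9340000\tchr18\t9470000\t9480000\tcluster-602\n"
)
labels = check_clustered_bedpe(io.StringIO(sample))
```

A file with extra columns (for example, one that still carries score or strand fields) would raise a `ValueError` naming the first offending line.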

Usage and Option Summary

annotate_cluster -P path/to/cluster_file.bedpe -A sample_name -G hg38 -B path/to/peaks.bed

Required

Short Option	Long Option	Description
`-P`	`--bedpe`	Path to the 7-column .bedpe file
`-A`	`--sample1`	Name of sample
`-G`	`--genome`	Genome build used for sample processing
`-B`	`--bed`	Path to the sample’s H3K27ac/ATAC/DNase peaks .bed file

Optional

Short Option	Long Option	Description
`-U`	`--user_bed`	Paths to any other .bed files
`-h`	`--help`	Help message
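
When annotate_cluster is called from a script, the required options can be assembled programmatically. The sketch below builds the same command line shown in the usage summary, using only the four required short options. The helper name `build_annotate_cluster_command` is hypothetical. The `-U` option is left out because the way multiple paths should be passed to it is not specified here.

```python
import shlex


def build_annotate_cluster_command(bedpe, sample, genome, peaks_bed):
    """Return the annotate_cluster argument list for the required options."""
    return [
        "annotate_cluster",
        "-P", bedpe,      # path to the 7-column .bedpe
        "-A", sample,     # sample name
        "-G", genome,     # genome build
        "-B", peaks_bed,  # path to the peaks .bed file
    ]


cmd = build_annotate_cluster_command(
    "path/to/cluster_file.bedpe", "sample_name", "hg38", "path/to/peaks.bed"
)
command_line = shlex.join(cmd)
print(command_line)

# To actually run it (requires annotate_cluster on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.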