Predict Copy Number Variation Using HiChIP Data

Background

Depth-of-coverage (DOC) off-target CNV approaches have been applied to exome and ChIP-seq data [PMC9039557][PMC4396974].
With non-input immunoprecipitation (IP) samples, DOC off-target methods must first filter out peak regions so that binding enrichment is not counted toward the copy-number signal.
Peak-calling outcomes depend heavily on the algorithm used.
Most ChIP-seq and ATAC-seq peak callers are not designed for detecting complex, non-symmetric peak patterns.
- For instance, H3K4me3 peaks are typically sharply localized, while H3K4me1 peaks span broader domains.
- H3K27ac marks both large regions, such as super-enhancers, and smaller, discrete regions like promoters, exhibiting both broad and narrow peak characteristics [35788238][ref..].
- Pol2 ..
Some CNV tools exist for Hi-C data:
- LOIC [PMC6127909]
- HiNT [PMC7087379]
- HiCnv and OneD [ref]
No CNV tools exist for HiChIP data.

Example of CNV calls on MG63.3 H3K27ac at Chr6 Regions

Problem

Different CNVs with Different Data

HiChIP (black), ChIP-seq (blue), and ChIP-seq input (gold)

Different Peaks with Different Algorithms

CopywriteR: non-parametric, FDR-based; expands peaks within segment boundaries
HOMER: simple Poisson model; a peak must be 4-fold greater than the surrounding 10 kb region; a maximum distance is used to stitch peaks together
MACS2: dynamic Poisson parameter (λlocal = max(λBG, λ1k, λ5k, λ10k)); poor at peak expansion
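
The two Poisson-based rules above can be sketched in a few lines of Python. This is an illustration only, not the code of MACS2 or HOMER: the function names, the pseudo-count, and the way flanking counts are passed in are all assumptions made for the example.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def local_lambda(lam_bg, counts_by_window, peak_width):
    """Dynamic rate in the spirit of lambda_local = max(lambda_BG, lambda_1k, ...).

    counts_by_window maps a window size in bp to the read count in that
    window; each count is rescaled to the width of the candidate peak.
    """
    scaled = [count * peak_width / window
              for window, count in counts_by_window.items()]
    return max([lam_bg] + scaled)

def passes_local_fold(peak_reads, peak_width, flank_reads,
                      flank_width=10_000, fold=4.0):
    """Local-enrichment filter: peak density must be at least `fold` times
    the density of the surrounding region (hypothetical helper)."""
    peak_density = peak_reads / peak_width
    # A pseudo-count of one read avoids division by zero in empty flanks.
    flank_density = max(flank_reads, 1) / flank_width
    return peak_density / flank_density >= fold
```

For a 200 bp candidate with a genome-wide rate of 2.0 and window counts `{1000: 15, 5000: 40, 10000: 60}`, the 1 kb window dominates and `local_lambda` returns 3.0; `poisson_sf(observed_reads, 3.0)` then gives the enrichment p-value under that local rate.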

CNV score is measured after filtering peaks.
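
A minimal sketch of scoring after peak filtering, assuming fixed-size bins, whole-bin masking of any bin that overlaps a peak, and a log2 ratio against the median of the remaining bins. These choices are simplifications for illustration and are not taken from CopywriteR or any other tool named here; all function names are hypothetical.

```python
import math
from statistics import median

def bin_counts(read_starts, chrom_length, bin_size):
    """Count read start positions (0-based) in consecutive fixed-size bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts

def mask_peak_bins(n_bins, peaks, bin_size):
    """Return True for every bin that overlaps a peak.

    Peaks are half-open (start, end) intervals in bp.
    """
    masked = [False] * n_bins
    for start, end in peaks:
        last_bin = min(n_bins - 1, (end - 1) // bin_size)
        for b in range(start // bin_size, last_bin + 1):
            masked[b] = True
    return masked

def cnv_scores(counts, masked, pseudocount=0.5):
    """log2 ratio of each unmasked bin to the median unmasked bin.

    Masked bins get None so that peak enrichment never enters the score.
    """
    baseline = median(c for c, m in zip(counts, masked) if not m)
    return [None if m else math.log2((c + pseudocount) / (baseline + pseudocount))
            for c, m in zip(counts, masked)]
```

With this layout a bin at baseline depth scores 0, a bin with roughly doubled off-target depth scores close to 1, and a bin under a peak is skipped rather than reported as a gain.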

Peaks by CopywriteR (top), HOMER (p-53) (middle), MACS2 (p-4) (bottom)

Methods

To make the peak caller smarter

Understand CopywriteR Model
Modify the CopywriteR algorithm (reduce peak expansion)

Peak Expansion Algorithm: fullcode

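    # For each candidate peak, define one flanking window of width `resolution`
    # on each side and compute a coverage cutoff inside each window.
    retest.peak.ranges <- apply(test, 1, function(x) {
        # Left window: the `resolution` bases immediately before the peak start.
        left.lower.boundary <- max(0, (as.integer(x["start"]) - (resolution + 1)))
        left.higher.boundary <- max(0, (as.integer(x["start"]) - 1))
        # Right window: the bases immediately after the peak end,
        # clipped to the chromosome length.
        right.lower.boundary <- min(chromosomes[selection],
                                    (as.integer(x["end"]) + 1))
        right.higher.boundary <- min(chromosomes[selection],
                                     (as.integer(x["end"]) + (resolution + 1)))
        # Coverage cutoffs in each flank at the chosen FDR threshold (FDRT).
        left.peakCutoff <- ceiling(.peakCutoff(cov.chr[left.lower.boundary:left.higher.boundary], fdr.cutoff = FDRT))
        right.peakCutoff <- ceiling(.peakCutoff(cov.chr[right.lower.boundary:right.higher.boundary], fdr.cutoff = FDRT))
        # ... (excerpt; the rest of the function is in the full code)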

The original algorithm uses FDR = 0.1 when adding peaks. A more stringent FDRT = 0.0001 was applied here.
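
To see why a stricter FDR reduces expansion, the sketch below picks a depth cutoff in a flanking window under a simple Poisson background. It is not CopywriteR's `.peakCutoff`; the background estimate (the window mean) and the function names are assumptions chosen to keep the example short.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def depth_cutoff(coverage, fdr_cutoff):
    """Smallest depth k whose estimated FDR is at or below fdr_cutoff.

    Estimated FDR at k = (positions expected at depth >= k under the
    Poisson background) / (positions observed at depth >= k).
    `coverage` must be a non-empty list of per-base depths.
    """
    n = len(coverage)
    lam = sum(coverage) / n  # crude background rate (assumption)
    max_depth = max(coverage)
    for k in range(1, max_depth + 1):
        observed = sum(1 for c in coverage if c >= k)  # >= 1 because k <= max_depth
        expected = n * poisson_sf(k, lam)
        if expected / observed <= fdr_cutoff:
            return k
    return max_depth + 1  # no depth in this window qualifies
```

Because every depth that satisfies the stricter threshold also satisfies the looser one, `depth_cutoff(window, 0.0001)` is never lower than `depth_cutoff(window, 0.1)`. A higher cutoff in the flanks means fewer flanking bases qualify as peak, so the peak grows less.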

Results

Recovered the CNV call at the 5′ body of SCIRT

Discussion

Are loops good indicators of CNV?
New model: \(\log(C) = \beta_0(\mathrm{GCcontent}) + \beta_1(\mathrm{Mappability}) + \beta_2(\mathrm{3Dcontact}) + \epsilon\)
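
One way such a model could be fitted is ordinary least squares on per-bin covariates. The sketch below follows the formula as written (three coefficients, no intercept) and uses only the standard library; the function names and the choice of least squares are assumptions, not a method stated in these notes.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_log_counts(counts, gc, mappability, contact):
    """Least-squares estimates of (beta_0, beta_1, beta_2) in
    log(C) = beta_0*GC + beta_1*Mappability + beta_2*3Dcontact + eps.

    All inputs are per-bin lists of equal length; counts must be positive.
    """
    rows = list(zip(gc, mappability, contact))
    y = [math.log(c) for c in counts]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)
```

On noise-free synthetic bins generated from known coefficients, the fit recovers those coefficients up to floating-point error, which is a quick sanity check before applying it to real coverage.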
Other examples