CNV-HiChIP Performance Comparisons

Off-Target CNV Prediction Approaches:

ChIP-seq Input: Considered the gold standard.
ChIP-seq Treatment: Used as the standard comparison.
HiChIP-1D Treatment: Treated as standard ChIP-seq data.
HiChIP-3D Treatment (Loop): Utilizes looping information exclusively (Neoloop).

Proper Fragment Realignment Scoring

   [b          e]           # input fragment, score x
[a       c]   [d        f]  # target fragments, scores y and z
=> # score of target on [b    e]

   [b          e]           # w1 = |c-b|/|e-b|, w2 = |e-d|/|e-b|  (overlap fractions of [b, e])
                            # score = (w1*y + w2*z) / (w1+w2)  ## w1 + w2 is not always 1
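
The overlap weighting above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the function name is hypothetical, coordinates are assumed to be simple numeric positions, and each target fragment's own score is assumed to be weighted by the fraction of the input fragment it covers.

```python
def overlap_weighted_score(ref, targets):
    """Project target-fragment scores onto one input fragment.

    ref     -- (start, end) of the input fragment, e.g. (b, e)
    targets -- list of (start, end, score) target fragments
    Returns the overlap-weighted mean score, or None if nothing overlaps.
    """
    ref_start, ref_end = ref
    ref_len = ref_end - ref_start
    weighted_sum = 0.0
    weight_total = 0.0
    for start, end, score in targets:
        # Length of the part of this target that lies inside the input fragment
        overlap = min(end, ref_end) - max(start, ref_start)
        if overlap <= 0:
            continue
        w = overlap / ref_len          # e.g. w1 = |c-b| / |e-b|
        weighted_sum += w * score
        weight_total += w
    if weight_total == 0:
        return None
    # Dividing by the weight total handles the case where w1 + w2 != 1
    return weighted_sum / weight_total


# Input fragment [10, 30]; two targets each cover a quarter of it
print(overlap_weighted_score((10, 30), [(0, 15, 2.0), (25, 40, 4.0)]))  # 3.0
```

Normalizing by the sum of weights means a gap between the target fragments does not pull the projected score toward zero.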

Comparison of MG63 Datasets with MG63 ChIP-seq Input

Informative Value: The HiChIP Loop method is less informative than the other approaches.
Noise Levels: HiChIP-1D data is less noisy than ChIP-seq data, primarily because of its higher sequencing depth.
False Negatives: Off-target selection in both ChIP-seq and HiChIP results in more false negatives, that is, CNVs detected in the input are missed.
False Positives: ChIP-seq exhibits a higher rate of false positives when calling CNV gains.

Average Plots

TP segments: Input CNV > 1 and Target CNV - Input CNV < 0.5
FN segments: Input CNV > 1 and Target CNV - Input CNV < -0.5
FP segments: Target CNV > 1 and Input CNV < 0.5
Signals (color): normalized log2 FC (after correcting for GC bias, off-target effects, and mappability)
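
The segment definitions above can be expressed as a small classifier. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and because the TP condition as written also covers segments that satisfy the FN condition, the FN test is applied first here.

```python
def classify_segment(input_cnv, target_cnv, gain=1.0, tol=0.5, fp_input_max=0.5):
    """Label one segment as "TP", "FN", "FP", or None (unclassified).

    input_cnv  -- CNV value of the segment in the input sample
    target_cnv -- CNV value of the same segment in the treatment sample
    """
    diff = target_cnv - input_cnv
    if input_cnv > gain:
        # Assumption: FN is checked before TP, since diff < -tol also meets diff < tol
        if diff < -tol:
            return "FN"
        if diff < tol:
            return "TP"
        return None
    if target_cnv > gain and input_cnv < fp_input_max:
        return "FP"
    return None


for pair in [(1.5, 1.6), (1.5, 0.8), (0.2, 1.4), (0.7, 1.4)]:
    print(pair, classify_segment(*pair))
```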

Summary

File	Regions	Length distribution (bp)
region_fn.bed	15	min=40000, med=180000, avg=444000, max=2280000
region_fn_MG63_ChIP.bed	7
region_fn_MG63_HiChIP.bed	15
region_tp.bed	188	min=40000, med=310000, avg=405957, max=2680000
region_tp_MG63_ChIP.bed	188
region_tp_MG63_HiChIP.bed	188
region_fp.bed	20
region_fp_MG63_ChIP.bed	13
region_fp_MG63_HiChIP.bed	7

Figure panels: Input, ChIP, HiChIP

False Negatives (region_fn.bed; * marks regions shown in the figures)

chr1	122500001	124780000 *
chr1	143180001	143320000 *
chr1	219580001	219620000 *
chr13	16020001	18040000
chr5	49660001	49860000 *
chr8	108700001	108960000
chr8	117840001	117960000
chr8	117960001	118240000
chr9	13860001	14020000
chr9	14020001	14200000
chr9	14200001	14580000
chr9	14580001	14700000
chr9	21500001	21860000
chrX	141000001	141040000
chrY	56680001	56760000
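
Length statistics for a region list like the one above can be computed with a short helper. This is a minimal sketch: the function name is hypothetical, and the coordinates are assumed to be 1-based and inclusive (starts end in ...0001), so a region's length is end - start + 1.

```python
from statistics import mean, median


def region_length_stats(lines):
    """Summarize region lengths from whitespace-separated 'chrom start end' lines.

    Extra columns (such as a trailing '*') are ignored.
    Returns None when no valid region is found.
    """
    lengths = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[1]), int(fields[2])
        # Assumption: 1-based inclusive coordinates
        lengths.append(end - start + 1)
    if not lengths:
        return None
    return {
        "n": len(lengths),
        "min": min(lengths),
        "med": median(lengths),
        "avg": mean(lengths),
        "max": max(lengths),
    }


example = [
    "chr1\t143180001\t143320000 *",
    "chr1\t219580001\t219620000 *",
    "chr5\t49660001\t49860000 *",
]
print(region_length_stats(example))
```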

Examples

Figures	Genomic Loc	Comments
	chr1 122500001 124780000	Off-target fragmented
	chr1 143180001 143320000	Off-target missing
	chr1 219580001 219620000	Off-target missing+fragmented
	chr5 49660001 49860000	Low signal

False Positives

chr1	120280001	120320000 *
chr1	120320001	120600000
chr1	144380001	144580000
chr1	144880001	145300000
chr1	148820001	149320000
chr10	16940001	17240000 *
chr11	123060001	123260000
chr12	80240001	80680000
chr13	16000001	16020000
chr13	33100001	33300000
chr13	75620001	75800000
chr17	47460001	47560000 *
chr2	150460001	150680000
chr2	187540001	187600000
chr2	97380001	97440000
chr4	65480001	65680000
chr5	42980001	43080000
chr9	340001	820000
chr9	40700001	41820000
chrX	3840001	3920000

Examples

Figures	Genomic Loc	Comments
	chr1 120280001 120320000
	chr10 16940001 17240000
	chr17 47460001 47560000

Summary

The causes of false positives can be categorized by comparison with the INPUT sample.
Peak calling with various window sizes is a significant factor (impact factor=4).
3D off-target effects (loops) need to be investigated (impact factor > 4).