Background

  • Focal amplification involving enhancers and target oncogenes has been observed in many cancers, such as EGFR in glioblastoma, MYC in group 3 medulloblastoma, and MYCN in both neuroblastoma and Wilms tumors cell2019scacheri.
    • H3K27ac ChIP-seq, ATAC-seq, POLR2A ChIP-seq, and RNA-seq signals at two EGFR enhancers for four glioblastoma lines (GBM3565, GBM3094, GSC23, G459)
    • HiChIP datasets used: GSE73865 (O’Brien et al., 2016), GSE90683 (Boeva et al., 2017)
  • These oncogenes were co-amplified with super-enhancers, not only in contiguous regions but also in more complex, non-contiguous amplicons. On the linear reference genome, such amplicons break into separate cis and trans loci associated with the oncogene, which points to a role for ecDNAs.
  • These regulatory elements have been preserved and have evolved within cells in circular form, referred to as extrachromosomal circular DNA (ecDNA) cell2013korbel.
  • Bioinformatics tools for analyzing whole genome sequencing (WGS) data vary in performance depending on their underlying assumptions and the quality of the input data 38746056,39209966.

image

Methods

  • Convert contacts to network
  • Assortativity (https://networkx.org/nx-guides/content/algorithms/assortativity/correlation.html)
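
The two steps above can be sketched in plain Python. This is a minimal illustration, not the pipeline used for these notes: the contact record format `(bin_a, bin_b, count)`, the `min_count` threshold, the bin labels, and both function names are assumptions made for the example. For an unweighted, undirected graph the result should match what networkx reports from `degree_assortativity_coefficient`.

```python
import math
from collections import defaultdict


def contacts_to_graph(contacts, min_count=1):
    """Build an undirected adjacency map from (bin_a, bin_b, count) records.

    Both the record layout and the min_count cutoff are hypothetical.
    """
    adjacency = defaultdict(set)
    for bin_a, bin_b, count in contacts:
        if bin_a != bin_b and count >= min_count:
            adjacency[bin_a].add(bin_b)
            adjacency[bin_b].add(bin_a)
    return adjacency


def degree_assortativity(adjacency):
    """Pearson correlation of node degrees across the two ends of every edge."""
    xs, ys = [], []
    for node, neighbours in adjacency.items():
        for other in neighbours:  # each edge is visited once per direction
            xs.append(len(neighbours))
            ys.append(len(adjacency[other]))
    if not xs:
        return float("nan")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when all edge endpoints share one degree
    return cov / math.sqrt(var_x * var_y)


# Toy contacts (made-up labels and counts): bin "a" acts as a hub.
toy_contacts = [("a", "b", 40), ("a", "c", 25), ("a", "d", 12), ("b", "d", 1)]
graph = contacts_to_graph(toy_contacts, min_count=5)
print(degree_assortativity(graph))  # -1.0 for this star-shaped toy network
```

A strongly negative value means high-degree bins mostly contact low-degree bins, which is the hub-like pattern one might look for around an amplified enhancer; whether that holds in real HiChIP data is an open question for these notes.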

image

  • HiNT: Gini-ranking 32293513
    • github: https://github.com/parklab/HiNT
  • Developed App go
Image 1 Image 2 Image 3

Public Datasets

  • Database 35388171
  • ecDNA HiChIP datasets 31748743.
  • In a MYC-amplified colorectal cancer cell line, ecDNA hubs are tethered by the BET protein BRD4 34819668.
  • HiChIP datasets from SNU16 cells (amplified for MYC and FGFR2) 31748743.

Previous Results

Results

image

image

image

Code Analysis

The HiNT source code (https://github.com/parklab/HiNT):

import os
from operator import itemgetter

import numpy as np
from scipy.stats import rankdata


def gini(x):
    # Warning: this concise implementation is O(n**2) in time and memory,
    # where n = len(x). Do not pass in huge samples.

    # Mean absolute difference over all pairs (NaNs ignored)
    mad = np.nanmean(np.abs(np.subtract.outer(x, x)))
    # Relative mean absolute difference
    rmad = mad / np.nanmean(x)
    # Gini coefficient
    g = 0.5 * rmad
    return g

def getGini(mat1, mat2):
    # Load the sample (mat1) and background (mat2) matrices for one chromosome pair
    matrix1 = np.genfromtxt(mat1, delimiter="\t")
    matrix2 = np.genfromtxt(mat2, delimiter="\t")
    # Replace NaN and infinite entries with 0
    matrix1[~np.isfinite(matrix1)] = 0
    matrix2[~np.isfinite(matrix2)] = 0
    # Find rows and columns that sum to zero in either matrix
    rowsum1 = np.sum(matrix1, axis=1)
    rowsum2 = np.sum(matrix2, axis=1)
    colsum1 = np.sum(matrix1, axis=0)
    colsum2 = np.sum(matrix2, axis=0)
    ridx1 = np.where(rowsum1 == 0)
    cidx1 = np.where(colsum1 == 0)
    ridx2 = np.where(rowsum2 == 0)
    cidx2 = np.where(colsum2 == 0)
    ridx = np.union1d(ridx1[0], ridx2[0])
    cidx = np.union1d(cidx1[0], cidx2[0])

    # Drop those rows and columns from both matrices
    temp1 = np.delete(matrix1, ridx, 0)
    temp2 = np.delete(matrix2, ridx, 0)
    selectedData1 = np.delete(temp1, cidx, 1)
    selectedData2 = np.delete(temp2, cidx, 1)

    # Scale each matrix by its own mean, then take the element-wise ratio
    average1 = np.mean(selectedData1)
    average2 = np.mean(selectedData2)
    tm1 = np.divide(selectedData1, average1)
    tm2 = np.divide(selectedData2, average2)
    division = np.divide(tm1, tm2)
    # Gini index and maximum of the flattened ratio matrix
    giniIndex = gini(np.asarray(division).reshape(-1))
    maximum = np.nanmax(np.asarray(division).reshape(-1))

    return giniIndex, maximum

def getRankProduct(matrix1MbInfo, background1MbInfo, outdir, name):
    rpout = os.path.join(outdir, name + '_chrompairs_rankProduct.txt')
    ginis = []
    maximums = []
    chrompairs = []
    for chrompair in matrix1MbInfo:
        matrix1 = matrix1MbInfo[chrompair]
        matrix2 = background1MbInfo[chrompair]
        giniIndex, maximum = getGini(matrix1, matrix2)
        chrompairs.append(chrompair)
        ginis.append(giniIndex)
        maximums.append(maximum)
    # Rank so that the largest Gini index and the largest maximum get rank 0
    rankgini = len(ginis) - rankdata(ginis)
    rankmaximum = len(maximums) - rankdata(maximums)
    # Product of the two normalized ranks
    rps = (np.divide(rankgini, len(ginis) * 1.0)) * (np.divide(rankmaximum, len(maximums) * 1.0))
    # Sort on the numeric rank product (kept as tuples so values are not cast to strings);
    # pairs with a high Gini index and a high maximum come first
    result = zip(chrompairs, ginis, maximums, rps)
    sortedResult = sorted(result, key=itemgetter(-1))
    with open(rpout, 'w') as outf:
        outf.write('\t'.join(['ChromPair', "GiniIndex", "Maximum", "RankProduct"]) + '\n')
        for chrompair, giniIndex, maximum, rp in sortedResult:
            newres = [str(chrompair), str(giniIndex), str(maximum), str(rp)]
            outf.write('\t'.join(newres) + '\n')

    return rpout
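
The warning in gini() about O(n**2) time and memory matters for large ratio matrices. A possible alternative, sketched here in plain Python, computes the same quantity from sorted values in O(n log n) using the identity sum over all pairs of |x_i - x_j| = 2 * sum_i (2i - n - 1) * x_(i). The helper names `gini_sorted`, `average_ranks`, and `rank_product` are hypothetical and not part of HiNT; the ranking helper assumes ties share their average rank, and the toy values are made up.

```python
import math


def gini_sorted(values):
    """Gini coefficient in O(n log n) time via the sorted-rank identity.

    NaN entries are dropped, mirroring the nanmean calls in gini().
    """
    xs = sorted(v for v in values if not math.isnan(v))
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return float("nan")
    # i runs from 1 to n over the values in ascending order
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def rank_product(ginis, maximums):
    """Product of normalized descending ranks, as in getRankProduct."""
    n = len(ginis)
    rank_g = [n - r for r in average_ranks(ginis)]
    rank_m = [n - r for r in average_ranks(maximums)]
    return [(g / n) * (m / n) for g, m in zip(rank_g, rank_m)]


print(gini_sorted([0, 0, 0, 1]))  # 0.75
# Made-up scores for three chromosome pairs; the second pair has the
# highest Gini index and maximum, so its rank product is 0.0.
print(rank_product([0.10, 0.45, 0.20], [3.0, 12.0, 5.0]))
```

The sorted form avoids building the n-by-n difference matrix, so memory stays linear in the number of matrix entries. It does not treat infinite ratios specially, so those would need filtering beforehand.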