Please see Xiao-Ou Zhang's Google Scholar for the full list of publications.

  • The transcription factor ZEB2 drives the formation of age-associated B cells
    Dai D*, Gu S*, Han X*, Ding H, Jiang Y, Zhang XO, Yao C, Hong S, Zhang J, Shen Y, Hou G, Qu B, Zhou H, Qin Y, He Y, Ma J, Yin Z, Ye Z, Qian J, Jiang Q, Wu L, Guo Q, Chen S, Huang C, Kottyan LC, Weirauch MT, Vinuesa CG†, Shen N†
    Science. 2024, 383:413-421. DOI: 10.1126/science.adf8531
    Abstract: Age-associated B cells (ABCs) accumulate during infection, aging, and autoimmunity, contributing to lupus pathogenesis. In this study, we screened for transcription factors driving ABC formation and found that zinc finger E-box binding homeobox 2 (ZEB2) is required for human and mouse ABC differentiation in vitro. ABCs are reduced in ZEB2 haploinsufficient individuals and in mice lacking Zeb2 in B cells. In mice with toll-like receptor 7 (TLR7)-driven lupus, ZEB2 is essential for ABC formation and autoimmune pathology. ZEB2 binds to +20-kb myocyte enhancer factor 2b (Mef2b)'s intronic enhancer, repressing MEF2B-mediated germinal center B cell differentiation and promoting ABC formation. ZEB2 also targets genes important for ABC specification and function, including Itgax. ZEB2-driven ABC differentiation requires JAK-STAT (Janus kinase-signal transducer and activator of transcription), and treatment with JAK1/3 inhibitor reduces ABC accumulation in autoimmune mice and patients. Thus, ZEB2 emerges as a driver of B cell autoimmunity.
  • TE-TSS: an integrated data resource of human and mouse transposable element (TE)-derived transcription start site (TSS)
    Gu X, Wang M, Zhang XO
    Nucleic Acids Res. 2024, 52:D322-D333. DOI: 10.1093/nar/gkad1048
    Abstract: Transposable elements (TEs) are abundant in the genome and serve as crucial regulatory elements. Some TEs function as epigenetically regulated promoters, and these TE-derived transcription start sites (TSSs) play a crucial role in regulating genes associated with specific functions, such as cancer and embryogenesis. However, the lack of an accessible database that systematically gathers TE-derived TSS data is a current research gap. To address this, we established TE-TSS, an integrated data resource of human and mouse TE-derived TSSs. TE-TSS has compiled 2681 RNA sequencing datasets, spanning various tissues, cell lines and developmental stages. From these, we identified 5768 human TE-derived TSSs and 2797 mouse TE-derived TSSs, with 47% and 38% being experimentally validated, respectively. TE-TSS enables comprehensive exploration of TSS usage in diverse samples, providing insights into tissue-specific gene expression patterns and transcriptional regulatory elements. Furthermore, TE-TSS compares TE-derived TSS regions across 15 mammalian species, enhancing our understanding of their evolutionary and functional aspects. The establishment of TE-TSS facilitates further investigations into the roles of TEs in shaping the transcriptomic landscape and offers valuable resources for comprehending their involvement in diverse biological processes.
  • Allele-specific binding (ASB) analyzer for annotation of allele-specific binding SNPs
    Li Y, Zhang XO, Liu Y, Lu A†
    BMC Bioinf. 2023, 24:464. DOI: 10.1186/s12859-023-05604-6
    Abstract: Background: Allele-specific binding (ASB) events occur when transcription factors (TFs) bind more favorably to one of the two parental alleles at heterozygous single nucleotide polymorphisms (SNPs). Evidence suggests that ASB events could reveal the impact of sequence variations on TF binding and may have implications for the risk of diseases. Results: Here we present ASB-analyzer, a software platform that enables the users to quickly and efficiently input raw sequencing data to generate individual reports containing the cytogenetic map of ASB SNPs and their associated phenotypes. This interactive tool thereby combines ASB SNP identification, biological annotation, motif analysis, phenotype associations and report summary in one pipeline. With this pipeline, we identified 3772 ASB SNPs from thirty GM12878 ChIP-seq datasets and demonstrated that the ASB SNPs were more likely to be enriched at important sites in TF-binding domains. Conclusions: ASB-analyzer is a user-friendly tool that enables the detection, characterization and visualization of ASB SNPs. It is implemented in Python, R and bash shell and packaged in the Conda environment. It is available as an open-source tool on GitHub.
  • BCL2 is a major regulator of haploidy maintenance in murine embryonic stem cells
    Sun S*, Zhao Q*, Zhao Y, Geng M, Wang Q, Gao Q, Zhang XO†, Zhang W†, Shuai L†
    Cell Prolif. 2023, e13498. DOI: 10.1111/cpr.13498
    Abstract: Mammalian haploid cells are important resources for forward genetic screening and are important in genetic medicine and drug development. However, the self-diploidization of murine haploid embryonic stem cells (haESCs) during daily culture or differentiation jeopardizes their use in genetic approaches. Here, we show that overexpression (OE) of an antiapoptosis gene, BCL2, in haESCs robustly ensures their haploidy maintenance in various situations, even under strict differentiation in vivo (embryonic 10.5 chimeric fetus or 21-day teratoma). Haploid cell lines of many lineages, including epiblasts, trophectodermal lineages, and neuroectodermal lineages, can be easily derived by the differentiation of BCL2-OE haESCs in vitro. Transcriptome analysis revealed that BCL2-OE activates another regulatory gene, Has2, which is also sufficient for haploidy maintenance. Together, our findings provide an effective and secure strategy to reduce diploidization during differentiation, which will contribute to the generation of haploid cell lines of the desired lineage and related genetic screening.
  • Epigenetic and chromosomal features drive transposon insertion in Drosophila melanogaster
    Cao J*, Yu T*, Xu B, Hu Z, Zhang XO, Theurkauf WE, Weng Z†
    Nucleic Acids Res. 2023, 51:2066-2086. DOI: 10.1093/nar/gkad054
    Abstract: Transposons are mobile genetic elements prevalent in the genomes of most species. The distribution of transposons within a genome reflects the actions of two opposing processes: initial insertion site selection, and selective pressure from the host. By analyzing whole-genome sequencing data from transposon-activated Drosophila melanogaster, we identified 43 316 de novo and 237 germline insertions from four long-terminal-repeat (LTR) transposons, one LINE transposon (I-element), and one DNA transposon (P-element). We found that all transposon types favored insertion into promoters de novo, but otherwise displayed distinct insertion patterns. De novo and germline P-element insertions preferred replication origins, often landing in a narrow region around transcription start sites and in regions of high chromatin accessibility. De novo LTR transposon insertions preferred regions with high H3K36me3, promoters and exons of active genes; within genes, LTR insertion frequency correlated with gene expression. De novo I-element insertion density increased with distance from the centromere. Germline I-element and LTR transposon insertions were depleted in promoters and exons, suggesting strong selective pressure to remove transposons from functional elements. Transposon movement is associated with genome evolution and disease; therefore, our results can improve our understanding of genome and disease biology.
  • Arih2 regulates Hedgehog signaling through smoothened ubiquitylation and ER-associated degradation.
    Lv B, Zhang XO, Pazour GJ†
    J Cell Sci. 2022, 135:jcs260299. DOI: 10.1242/jcs.260299
    Abstract: During Hedgehog signaling, the ciliary levels of Ptch1 and Smo are regulated by the pathway. At the basal state, Ptch1 localizes to cilia and prevents the ciliary accumulation and activation of Smo. Upon binding a Hedgehog ligand, Ptch1 exits cilia, relieving inhibition of Smo. Smo then concentrates in cilia, becomes activated and activates downstream signaling. Loss of the ubiquitin E3 ligase Arih2 elevates basal Hedgehog signaling, elevates the cellular level of Smo and increases basal levels of ciliary Smo. Mice express two isoforms of Arih2 with Arih2α found primarily in the nucleus and Arih2β found on the cytoplasmic face of the endoplasmic reticulum (ER). Re-expression of ER-localized Arih2β but not nuclear-localized Arih2α rescues the Arih2 mutant phenotypes. When Arih2 is defective, protein aggregates accumulate in the ER and the unfolded protein response is activated. Arih2β appears to regulate the ER-associated degradation (ERAD) of Smo preventing excess and potentially misfolded Smo from reaching the cilium and interfering with pathway regulation.
  • PRMT5 activates AKT via methylation to promote tumor metastasis.
    Huang L, Zhang XO, Rozen EJ, Sun X, Sallis B, Verdejo-Torres O, Wigglesworth K, Moon D, Huang T, Cavaretta JP, Wang G, Zhang L, Shohet JM, Lee MM†, Wu Q†
    Nat Commun. 2022, 13:3955. DOI: 10.1038/s41467-022-31645-1
    Abstract: Protein arginine methyltransferase 5 (PRMT5) is the primary methyltransferase generating symmetric-dimethyl-arginine marks on histone and non-histone proteins. PRMT5 dysregulation is implicated in multiple oncogenic processes. Here, we report that PRMT5-mediated methylation of protein kinase B (AKT) is required for its subsequent phosphorylation at Thr308 and Ser473. Moreover, pharmacologic or genetic inhibition of PRMT5 abolishes AKT1 arginine 15 methylation, thereby preventing AKT1 translocation to the plasma membrane and subsequent recruitment of its upstream activating kinases PDK1 and mTOR2. We show that PRMT5/AKT signaling controls the expression of the epithelial-mesenchymal-transition transcription factors ZEB1, SNAIL, and TWIST1. PRMT5 inhibition significantly attenuates primary tumor growth and broadly blocks metastasis in multiple organs in xenograft tumor models of high-risk neuroblastoma. Collectively, our results suggest that PRMT5 inhibition augments anti-AKT or other downstream targeted therapeutics in high-risk metastatic cancers.
  • Integration of high-resolution promoter profiling assays reveals novel, cell type–specific transcription start sites across 115 human cell and tissue types.
    Moore JE, Zhang XO, Elhajjajy SI, Fan K, Pratt HE, Reese F, Mortazavi A, Weng Z†
    Genome Res. 2022, 32:389-402. DOI: 10.1101/gr.275723.121
    Abstract: Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks are primarily proximal to GENCODE-annotated TSSs and are concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations are supported by epigenomic and other transcriptomic data sets. To show the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association study (GWAS) catalog and identified new candidate GWAS genes. Overall, our work shows the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.
  • Deletion and replacement of long genomic sequences using prime editing.
    Jiang T, Zhang XO, Weng Z, Xue W†
    Nat Biotechnol. 2022, 40:227-234. DOI: 10.1038/s41587-021-01026-y
    Abstract: Genomic insertions, duplications and insertion/deletions (indels), which account for ~14% of human pathogenic mutations, cannot be accurately or efficiently corrected by current gene-editing methods, especially those that involve larger alterations (>100 base pairs (bp)). Here, we optimize prime editing (PE) tools for creating precise genomic deletions and direct the replacement of a genomic fragment ranging from ~1 kilobases (kb) to ~10 kb with a desired sequence (up to 60 bp) in the absence of an exogenous DNA template. By conjugating Cas9 nuclease to reverse transcriptase (PE-Cas9) and combining it with two PE guide RNAs (pegRNAs) targeting complementary DNA strands, we achieve precise and specific deletion and repair of target sequences via using this PE-Cas9-based deletion and repair (PEDAR) method. PEDAR outperformed other genome-editing methods in a reporter system and at endogenous loci, efficiently creating large and precise genomic alterations. In a mouse model of tyrosinemia, PEDAR removed a 1.38-kb pathogenic insertion within the Fah gene and precisely repaired the deletion junction to restore FAH expression in liver.
  • Optimized RNA-targeting CRISPR/Cas13d technology outperforms shRNA in identifying functional circRNAs.
    Zhang Y, Nguyen TM, Zhang XO, Wang L, Phan T, Clohessy JG†, Pandolfi PP†
    Genome Biol. 2021, 22:41. DOI: 10.1186/s13059-021-02263-9
    Abstract: Short hairpin RNAs (shRNAs) are used to deplete circRNAs by targeting back-splicing junction (BSJ) sites. However, frequent discrepancies exist between shRNA-mediated circRNA knockdown and the corresponding biological effect, querying their robustness. By leveraging CRISPR/Cas13d tool and optimizing the strategy for designing single-guide RNAs against circRNA BSJ sites, we markedly enhance specificity of circRNA silencing. This specificity is validated in parallel screenings by shRNA and CRISPR/Cas13d libraries. Using a CRISPR/Cas13d screening library targeting > 2500 human hepatocellular carcinoma-related circRNAs, we subsequently identify a subset of sorafenib-resistant circRNAs. Thus, CRISPR/Cas13d represents an effective approach for high-throughput study of functional circRNAs.
  • An organ-on-a-chip model for pre-clinical drug evaluation in progressive non-genetic cardiomyopathy.
    Wang EY, Kuzmanov U, Smith JB, Dou W, Rafatian N, Lai BFL, Lu RXZ, Wu Q, Yazbeck J, Zhang XO, Sun Y, Gramolini A, Radisic M†
    J Mol Cell Cardiol. 2021, 160:97-110. DOI: 10.1016/j.yjmcc.2021.06.012
    Abstract: Angiotensin II (Ang II) presents a critical mediator in various pathological conditions such as non-genetic cardiomyopathy. Osmotic pump infusion in rodents is a commonly used approach to model cardiomyopathy associated with Ang II. However, profound differences in electrophysiology and pharmacokinetics between rodent and human cardiomyocytes may limit predictability of animal-based experiments. This study investigates the application of an Organ-on-a-chip (OOC) system in modeling Ang II-induced progressive cardiomyopathy. The disease model is constructed to recapitulate myocardial response to Ang II in a temporal manner. The long-term tissue cultivation and non-invasive functional readouts enable monitoring of both acute and chronic cardiac responses to Ang II stimulation. Along with mapping of cytokine secretion and proteomic profiles, this model presents an opportunity to quantitatively measure the dynamic pathological changes that could not be otherwise identified in animals. Further, we present this model as a testbed to evaluate compounds that target Ang II-induced cardiac remodeling. Through assessing the effects of losartan, relaxin, and saracatinib, the drug screening data implicated multifaceted cardioprotective effects of relaxin in restoring contractile function and reducing fibrotic remodeling. Overall, this study provides a controllable platform where cardiac activities can be explicitly observed and tested over the pathological process. The facile and high-content screening can facilitate the evaluation of potential drug candidates in the pre-clinical stage.
  • 5'-Modifications improve potency and efficacy of DNA donors for precision genome editing.
    Ghanta KS*, Chen Z*, Mir A, Dokshin GA, Krishnamurthy PM, Yoon Y, Gallant J, Xu P, Zhang XO, Ozturk AR, Shin M, Idrizi F, Liu P, Gneid H, Edraki A, Lawson ND, Rivera-Pérez JA, Sontheimer E†, Watts JK†, Mello CC†
    Elife. 2021, 10:e72216. DOI: 10.7554/eLife.72216
    Abstract: Nuclease-directed genome editing is a powerful tool for investigating physiology and has great promise as a therapeutic approach to correct mutations that cause disease. In its most precise form, genome editing can use cellular homology-directed repair (HDR) pathways to insert information from an exogenously supplied DNA-repair template (donor) directly into a targeted genomic location. Unfortunately, particularly for long insertions, toxicity and delivery considerations associated with repair template DNA can limit HDR efficacy. Here, we explore chemical modifications to both double-stranded and single-stranded DNA-repair templates. We describe 5'-terminal modifications, including in its simplest form the incorporation of triethylene glycol (TEG) moieties, that consistently increase the frequency of precision editing in the germlines of three animal models (Caenorhabditis elegans, zebrafish, mice) and in cultured human cells.
  • Investigating the potential roles of SINEs in the human genome.
    Zhang XO, Pratt HE, Weng Z†
    Annu Rev Genomics Hum Genet. 2021, 22:199-218. DOI: 10.1146/annurev-genom-111620-100736
    Abstract: Short interspersed nuclear elements (SINEs) are nonautonomous retrotransposons that occupy approximately 13% of the human genome. They are transcribed by RNA polymerase III and can be retrotranscribed and inserted back into the genome with the help of other autonomous retroelements. Because they are preferentially located close to or within gene-rich regions, they can regulate gene expression by various mechanisms that act at both the DNA and the RNA levels. In this review, we summarize recent findings on the involvement of SINEs in different types of gene regulation and discuss the potential regulatory functions of SINEs that are in close proximity to genes, Pol III-transcribed SINE RNAs, and embedded SINE sequences within Pol II-transcribed genes in the human genome. These discoveries illustrate how the human genome has exapted some SINEs into functional regulatory elements.
  • Genetic and epigenetic features of promoters with ubiquitous chromatin accessibility support ubiquitous transcription of cell-essential genes.
    Fan K, Moore JE, Zhang XO, Weng Z†
    Nucleic Acids Res. 2021, 49:5705-5725. DOI: 10.1093/nar/gkab345
    Abstract: Gene expression is controlled by regulatory elements within accessible chromatin. Although most regulatory elements are cell type-specific, a subset is accessible in nearly all the 517 human and 94 mouse cell and tissue types assayed by the ENCODE consortium. We systematically analyzed 9000 human and 8000 mouse ubiquitously-accessible candidate cis-regulatory elements (cCREs) with promoter-like signatures (PLSs) from ENCODE, which we denote ubi-PLSs. These are more CpG-rich than non-ubi-PLSs and correspond to genes with ubiquitously high transcription, including a majority of cell-essential genes. ubi-PLSs are enriched with motifs of ubiquitously-expressed transcription factors and preferentially bound by transcriptional cofactors regulating ubiquitously-expressed genes. They are highly conserved between human and mouse at the synteny level but exhibit frequent turnover of motif sites; accordingly, ubi-PLSs show increased variation at their centers compared with flanking regions among the ∼186 thousand human genomes sequenced by the TOPMed project. Finally, ubi-PLSs are enriched in genes implicated in Mendelian diseases, especially diseases broadly impacting most cell types, such as deficiencies in mitochondrial functions. Thus, a set of roughly 9000 mammalian promoters are actively maintained in an accessible state across cell types by a distinct set of transcription factors and cofactors to ensure the transcriptional programs of cell-essential genes.
  • YAP1 withdrawal in hepatoblastoma drives therapeutic differentiation of tumor cells to functional hepatocyte-like cells.
    Smith JL, Rodríguez TC, Mou H, Kwan SY, Pratt HE, Zhang XO, Cao Y, Liang S, Ozata DM, Yu T, Yin Q, Hazeltine M, Weng Z, Sontheimer EJ, Xue W†
    Hepatology. 2021, 73:1011-1027. DOI: 10.1002/hep.31389
    Abstract: Despite surgical and chemotherapeutic advances, the 5-year survival rate for stage IV hepatoblastoma (HB), the predominant pediatric liver tumor, remains at 27%. Yes-associated protein 1 (YAP1) and β-catenin co-activation occurs in 80% of children's HB; however, a lack of conditional genetic models precludes tumor maintenance exploration. Thus, the need for a targeted therapy remains unmet. Given the predominance of YAP1 and β-catenin activation in HB, we sought to evaluate YAP1 as a therapeutic target in HB. We engineered the conditional HB murine model using hydrodynamic injection to deliver transposon plasmids encoding inducible YAP1S127A, constitutive β-cateninDelN90, and a luciferase reporter to murine liver. Tumor regression was evaluated using bioluminescent imaging, tumor landscape characterized using RNA and ATAC sequencing, and DNA footprinting. Here we show that YAP1S127A withdrawal mediates more than 90% tumor regression with survival for 230+ days in mice. YAP1S127A withdrawal promotes apoptosis in a subset of tumor cells, and in remaining cells induces a cell fate switch that drives therapeutic differentiation of HB tumors into Ki-67-negative hepatocyte-like HB cells ("hbHeps") with hepatocyte-like morphology and mature hepatocyte gene expression. YAP1S127A withdrawal drives the formation of hbHeps by modulating liver differentiation transcription factor occupancy. Indeed, tumor-derived hbHeps, consistent with their reprogrammed transcriptional landscape, regain partial hepatocyte function and rescue liver damage in mice. YAP1S127A withdrawal, without silencing oncogenic β-catenin, significantly regresses hepatoblastoma, providing in vivo data to support YAP1 as a therapeutic target for HB. YAP1S127A withdrawal alone sufficiently drives long-term regression in HB, as it promotes cell death in a subset of tumor cells and modulates transcription factor occupancy to reverse the fate of residual tumor cells to mimic functional hepatocytes.
  • Perspectives on ENCODE.
    The ENCODE Project Consortium (Zhang XO is one co-author of The ENCODE Project Consortium)
    Nature. 2020, 583:693-698. DOI: 10.1038/s41586-020-2449-8
    Abstract: The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.
  • Expanded encyclopedias of DNA elements in the human and mouse genomes.
    The ENCODE Project Consortium (Zhang XO is one co-author of The ENCODE Project Consortium)
    Nature. 2020, 583:699-710. DOI: 10.1038/s41586-020-2493-4
    Abstract: The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal, including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.
  • Chemical modifications of adenine base editor mRNA and guide RNA expand its application scope.
    Jiang T, Henderson JM, Coote K, Cheng Y, Valley HC, Zhang XO, Wang Q, Rhym LH, Cao Y, Newby GA, Bihler H, Mense M, Weng Z, Anderson DG, McCaffrey AP, Liu DR, Xue W†
    Nat Commun. 2020, 11:1979. DOI: 10.1038/s41467-020-15892-8
    Abstract: CRISPR-Cas9-associated base editing is a promising tool to correct pathogenic single nucleotide mutations in research or therapeutic settings. Efficient base editing requires cellular exposure to levels of base editors that can be difficult to attain in hard-to-transfect cells or in vivo. Here we engineer a chemically modified mRNA-encoded adenine base editor that mediates robust editing at various cellular genomic sites together with moderately modified guide RNA, and show its therapeutic potential in correcting pathogenic single nucleotide mutations in cell and animal models of diseases. The optimized chemical modifications of adenine base editor mRNA and guide RNA expand the applicability of CRISPR-associated gene editing tools in vitro and in vivo.
  • Comprehensive identification of alternative back-splicing in human tissue transcriptomes.
    Zhang P*, Zhang XO*, Jiang T, Cai L, Huang X, Liu Q, Li D, Lu A, Liu Y, Xue W, Zhang P†, Weng Z†
    Nucleic Acids Res. 2020, 48:1779-1789. DOI: 10.1093/nar/gkaa005
    Abstract: Circular RNAs (circRNAs) are covalently closed RNAs derived from back-splicing of genes across eukaryotes. Through alternative back-splicing (ABS), a single gene produces multiple circRNAs sharing the same back-splice site. Although many ABS events have recently been discovered, to what extent ABS involves in circRNA biogenesis and how it is regulated in different human tissues still remain elusive. Here, we reported an in-depth analysis of ABS events in 90 human tissue transcriptomes. We observed that ABS occurred for about 84% circRNAs. Interestingly, alternative 5' back-splicing occurs more prevalently than alternative 3' back-splicing, and both of them are tissue-specific, especially enriched in brain tissues. In addition, the patterns of ABS events in different brain regions are similar to each other and are more complex than the patterns in non-brain tissues. Finally, the intron length and abundance of Alu elements positively correlated with ABS event complexity, and the predominant circRNAs had longer flanking introns and more Alu elements than other circRNAs in the same ABS event. Together, our results represent a resource for circRNA research: we expanded the repertoire of ABS events of circRNAs in human tissue transcriptomes and provided insights into the complexity of circRNA biogenesis, expression, and regulation.
  • Depletion of TRRAP/KAT5 induces p53-independent senescence in liver cancer by regulating G2/M genes.
    Kwan SY*, Sheel A*, Song CQ, Zhang XO, Dang H, Cao Y, Mou H, Yin H, Weng Z, Wang XW, Xue W†
    Hepatology. 2020, 71:275-290. DOI: 10.1002/hep.30807
    Abstract: Hepatocellular carcinoma (HCC) is an aggressive subtype of liver cancer with few effective treatments, and the underlying mechanisms that drive HCC pathogenesis remain poorly characterized. Identifying genes and pathways essential for HCC cell growth will aid the development of new targeted therapies for HCC. Using a kinome CRISPR screen in three human HCC cell lines, we identified transformation/transcription domain-associated protein (TRRAP) as an essential gene for HCC cell proliferation. TRRAP has been implicated in oncogenic transformation, but how it functions in cancer cell proliferation is not established. Here, we show that depletion of TRRAP or its co-factor, histone acetyltransferase KAT5, inhibits HCC cell growth through induction of p53-independent and p21-independent senescence. Integrated cancer genomics analyses using patient data and RNA sequencing identified mitotic genes as key TRRAP/KAT5 targets in HCC, and subsequent cell cycle analyses revealed that TRRAP-depleted and KAT5-depleted cells are arrested at the G2/M phase. Depletion of topoisomerase II alpha (TOP2A), a mitotic gene and TRRAP/KAT5 target, was sufficient to recapitulate the senescent phenotype of TRRAP/KAT5 knockdown. Conclusion: Our results uncover a role for TRRAP/KAT5 in promoting HCC cell proliferation by activating mitotic genes. Targeting the TRRAP/KAT5 complex is a potential therapeutic strategy for HCC.
  • Mitochondrial DNA stress signaling protects the nuclear genome.
    Wu Z, Oeck S, West AP, Mangalhara KC, Sainz AG, Newman LE, Zhang XO, Wu L, Yan Q, Bosenberg M, Liu Y, Sulkowski PL, Tripple V, Kaech SM, Glazer PM, Shadel GS†
    Nat Metab. 2019, 1:1209-1218. DOI: 10.1038/s42255-019-0150-8
    Abstract: The mammalian genome comprises nuclear DNA (nDNA) derived from both parents and mitochondrial DNA (mtDNA) that is maternally inherited and encodes essential proteins required for oxidative phosphorylation. Thousands of copies of the circular mtDNA are present in most cell types that are packaged by TFAM into higher-order structures called nucleoids. Mitochondria are also platforms for antiviral signalling and, due to their bacterial origin, mtDNA and other mitochondrial components trigger innate immune responses and inflammatory pathology. We showed previously that instability and cytoplasmic release of mtDNA activates the cGAS-STING-TBK1 pathway resulting in interferon stimulated gene (ISG) expression that promotes antiviral immunity. Here, we find that persistent mtDNA stress is not associated with basally activated NF-κB signalling or interferon gene expression typical of an acute antiviral response. Instead, a specific subset of ISGs, that includes Parp9, remains activated by the unphosphorylated form of ISGF3 (U-ISGF3) that enhances nDNA damage and repair responses. In cultured primary fibroblasts and cancer cells, the chemotherapeutic drug doxorubicin causes mtDNA damage and release, which leads to cGAS-STING-dependent ISG activation. In addition, mtDNA stress in TFAM-deficient mouse melanoma cells produces tumours that are more resistant to doxorubicin in vivo. Finally, Tfam+/- mice exposed to ionizing radiation exhibit enhanced nDNA repair responses in spleen. Therefore, we propose that damage to and subsequent release of mtDNA elicits a protective signalling response that enhances nDNA repair in cells and tissues, suggesting mtDNA is a genotoxic stress sentinel.
  • Genome-wide analysis of polymerase III-transcribed Alu elements suggests cell-type-specific enhancer function.
    Zhang XO, Gingeras TR, Weng Z†
    Genome Res. 2019, 29:1402-1414. DOI: 10.1101/gr.249789.119
    Abstract: Alu elements are one of the most successful families of transposons in the human genome. A portion of Alu elements is transcribed by RNA Pol III, whereas the remaining ones are part of Pol II transcripts. Because Alu elements are highly repetitive, it has been difficult to identify the Pol III-transcribed elements and quantify their expression levels. In this study, we generated high-resolution, long-genomic-span RAMPAGE data in 155 biosamples all with matching RNA-seq data and built an atlas of 17,249 Pol III-transcribed Alu elements. We further performed an integrative analysis on the ChIP-seq data of 10 histone marks and hundreds of transcription factors, whole-genome bisulfite sequencing data, ChIA-PET data, and functional data in several biosamples, and our results revealed that although the human-specific Alu elements are transcriptionally repressed, the older, expressed Alu elements may be exapted by the human host to function as cell-type-specific enhancers for their nearby protein-coding genes.
  • The temporal landscape of recursive splicing during Pol II transcription elongation in human cells.
    Zhang XO, Fu Y, Mou H, Xue W, Weng Z†
    PLoS Genet. 2018, 14:e1007579. DOI: 10.1371/journal.pgen.1007579
    Abstract: Recursive splicing (RS) is an evolutionarily conserved process of removing long introns via multiple steps of splicing. It was first discovered in Drosophila and recently proven to occur also in humans. The detailed mechanism of recursive splicing is not well understood, in particular, whether it is kinetically coupled with transcription. To investigate the dynamic process that underlies recursive splicing, we systematically characterized 342 RS sites in three human cell types using published time-series data that monitored synchronized Pol II elongation and nascent RNA production with 4-thiouridine labeling. We found that half of the RS events occurred post-transcriptionally with long delays. For at least 18-47% RS introns, we detected RS junction reads only after detecting canonical splicing junction reads, supporting the notion that these introns were removed by both recursive splicing and canonical splicing. Furthermore, the choice of which splicing mechanism was used showed cell type specificity. Our results suggest that recursive splicing supplements, rather than replaces, canonical splicing for removing long introns.
  • Bioconda: sustainable and comprehensive software distribution for the life sciences.
    Grüning B*, Dale R*, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J†, The Bioconda Team (Zhang XO is one co-author of The Bioconda Team)
    Nat Methods. 2018, 15:475-476. DOI: 10.1038/s41592-018-0046-7
  • Inhibition of protein arginine methyltransferase 5 enhances hepatic mitochondrial biogenesis.
    Huang L, Liu J, Zhang XO, Sibley K, Najjar SM, Lee MM†, Wu JQ†
    J Biol Chem. 2018, 17:jbc-RA118. DOI: 10.1074/jbc.RA118.002377
    Abstract: Protein arginine methyltransferase 5 (PRMT5) regulates gene expression either transcriptionally by symmetric dimethylation of arginine residues on histones H4R3, H3R8, and H2AR3 or at the posttranslational level by methylation of nonhistone target proteins. Although emerging evidence suggests that PRMT5 functions as an oncogene, its role in metabolic diseases is not well-defined. We investigated the role of PRMT5 in promoting high-fat-induced hepatic steatosis. A high-fat diet up-regulated PRMT5 levels in the liver but not in other metabolically relevant tissues such as skeletal muscle or white and brown adipose tissue. This was associated with repression of master transcription regulators involved in mitochondrial biogenesis. In contrast, lentiviral short hairpin RNA-mediated reduction of PRMT5 significantly decreased phosphatidylinositol 3-kinase/AKT signaling in mouse AML12 liver cells. PRMT5 knockdown or knockout decreased basal AKT phosphorylation but boosted the expression of peroxisome proliferator-activated receptor α (PPARα) and PGC-1α with a concomitant increase in mitochondrial biogenesis. Moreover, by overexpressing an exogenous WT or enzyme-dead mutant PRMT5 or by inhibiting PRMT5 enzymatic activity with a small-molecule inhibitor, we demonstrated that the enzymatic activity of PRMT5 is required for regulation of PPARα and PGC-1α expression and mitochondrial biogenesis. Our results suggest that targeting PRMT5 may have therapeutic potential for the treatment of fatty liver.
  • Co-dependent assembly of Drosophila piRNA precursor complexes and piRNA cluster heterochromatin.
    Zhang G*, Tu S*, Yu T, Zhang XO, Parhad SS, Weng Z†, Theurkauf WE†
    Cell Rep. 2018, 24:3413-3422.e4. DOI: 10.1016/j.celrep.2018.08.081
    Abstract: In Drosophila, the piRNAs that guide germline transposon silencing are produced from heterochromatic clusters marked by the HP1 homolog Rhino. We show that Rhino promotes cluster transcript association with UAP56 and the THO complex, forming RNA-protein assemblies that are unique to piRNA precursors. UAP56 and THO are ubiquitous RNA-processing factors, and null alleles of uap56 and the THO subunit gene tho2 are lethal. However, uap56sz15 and mutations in the THO subunit genes thoc5 and thoc7 are viable but sterile and disrupt piRNA biogenesis. The uap56sz15 allele reduces UAP56 binding to THO, and the thoc5 and thoc7 mutations disrupt interactions among the remaining THO subunits and UAP56 binding to the core THO subunit Hpr1. These mutations also reduce Rhino binding to clusters and trigger Rhino binding to ectopic sites across the genome. Rhino thus promotes assembly of piRNA precursor complexes, and these complexes restrict Rhino at cluster heterochromatin.
  • CRISPR/Cas9-mediated genome editing induces exon skipping by alternative splicing or exon deletion.
    Mou H*, Smith JL*, Peng L, Yin H, Moore JE, Zhang XO, Song CQ, Sheel A, Wu Q, Ozata DM, Li Y, Anderson DG, Emerson CP, Sontheimer EJ, Moore MJ†, Weng Z†, Xue W†
    Genome Biol. 2017, 18:108. DOI: 10.1186/s13059-017-1237-8
    Abstract: CRISPR is widely used to disrupt gene function by inducing small insertions and deletions. Here, we show that some single-guide RNAs (sgRNAs) can induce exon skipping or large genomic deletions that delete exons. For example, CRISPR-mediated editing of β-catenin exon 3, which encodes an autoinhibitory domain, induces partial skipping of the in-frame exon and nuclear accumulation of β-catenin. A single sgRNA can induce small insertions or deletions that partially alter splicing or unexpected larger deletions that remove exons. Exon skipping adds to the unexpected outcomes that must be accounted for, and perhaps taken advantage of, in CRISPR experiments.
  • Diverse alternative back-splicing and alternative splicing landscape of circular RNAs.
    Zhang XO*, Dong R*, Zhang Y*, Zhang JL, Luo Z, Zhang J, Chen LL†, Yang L†
    Genome Res. 2016, 26:1277-1287. DOI: 10.1101/gr.202895.115
    Abstract: Circular RNAs (circRNAs) derived from back-spliced exons have been widely identified as being co-expressed with their linear counterparts. A single gene locus can produce multiple circRNAs through alternative back-splice site selection and/or alternative splice site selection; however, a detailed map of alternative back-splicing/splicing in circRNAs is lacking. Here, with the upgraded CIRCexplorer2 pipeline, we systematically annotated different types of alternative back-splicing and alternative splicing events in circRNAs from various cell lines. Compared with their linear cognate RNAs, circRNAs exhibited distinct patterns of alternative back-splicing and alternative splicing. Alternative back-splice site selection was correlated with the competition of putative RNA pairs across introns that bracket alternative back-splice sites. In addition, all four basic types of alternative splicing that have been identified in the (linear) mRNA process were found within circRNAs, and many exons were predominantly spliced in circRNAs. Unexpectedly, thousands of previously unannotated exons were detected in circRNAs from the examined cell lines. Although these novel exons had similar splice site strength, they were much less conserved than known exons in sequences. Finally, both alternative back-splicing and circRNA-predominant alternative splicing were highly diverse among the examined cell lines. All of the identified alternative back-splicing and alternative splicing in circRNAs are available in the CIRCpedia database. Collectively, the annotation of alternative back-splicing and alternative splicing in circRNAs provides a valuable resource for depicting the complexity of circRNA biogenesis and for studying the potential functions of circRNAs in different cells.
  • CircRNA-derived pseudogenes.
    Dong R, Zhang XO, Zhang Y, Ma XK, Chen LL, Yang L†
    Cell Res. 2016, 26:747-750. DOI: 10.1038/cr.2016.42
  • ADAR1 is required for differentiation and neural induction by regulating microRNA processing in a catalytically independent manner.
    Chen T*, Xiang JF*, Zhu S*, Chen S, Yin QF, Zhang XO, Zhang J, Feng H, Dong R, Li XJ, Yang L†, Chen LL†
    Cell Res. 2015, 25:459-476. DOI: 10.1038/cr.2015.24
    Abstract: Adenosine deaminases acting on RNA (ADARs) are involved in adenosine-to-inosine RNA editing and are implicated in development and diseases. Here we observed that ADAR1 deficiency in human embryonic stem cells (hESCs) significantly affected hESC differentiation and neural induction with widespread changes in mRNA and miRNA expression, including upregulation of self-renewal-related miRNAs, such as miR302s. Global editing analyses revealed that ADAR1 editing activity contributes little to the altered miRNA/mRNA expression in ADAR1-deficient hESCs upon neural induction. Genome-wide iCLIP studies identified that ADAR1 binds directly to pri-miRNAs to interfere with miRNA processing by acting as an RNA-binding protein. Importantly, aberrant expression of miRNAs and phenotypes observed in ADAR1-depleted hESCs upon neural differentiation could be reversed by an enzymatically inactive ADAR1 mutant, but not by the RNA-binding-null ADAR1 mutant. These findings reveal that ADAR1, but not its editing activity, is critical for hESC differentiation and neural induction by regulating miRNA biogenesis via direct RNA interaction.
  • Gene expression profiling of non-polyadenylated RNA-seq across species.
    Genomics Data. 2014, 2:237-241. DOI: 10.1016/j.gdata.2014.07.005
    Abstract: Transcriptomes are dynamic and unique, with each cell type/tissue, developmental stage and species expressing a different repertoire of RNA transcripts. Most mRNAs and well-characterized long noncoding RNAs are shaped with a 5' cap and 3' poly(A) tail, thus conventional transcriptome analyses typically start with the enrichment of poly(A)+ RNAs by oligo(dT) selection, followed by deep sequencing approaches. However, accumulated lines of evidence suggest that many RNA transcripts are processed by alternative mechanisms without 3' poly(A) tails and, therefore, fail to be enriched by oligo(dT) purification and are absent following deep sequencing analyses. We have described an enrichment strategy to purify non-polyadenylated (poly(A)-/ribo-) RNAs from human total RNAs by removal of both poly(A)+ RNA transcripts and ribosomal RNAs, which led to the identification of many novel RNA transcripts with non-canonical 3' ends in human. Here, we describe the application of non-polyadenylated RNA-sequencing in rhesus monkey and mouse cell lines/tissue, and further profile the transcription of non-polyadenylated RNAs across species, providing new resources for non-polyadenylated RNA identification and comparison across species.
  • Species-specific alternative splicing leads to unique expression of sno-lncRNAs.
    Zhang XO*, Yin QF*, Wang HB, Zhang Y, Chen T, Zheng P, Lu X, Chen LL†, Yang L†
    BMC Genomics. 2014, 15:287. DOI: 10.1186/1471-2164-15-287
    Abstract: Intron-derived long noncoding RNAs with snoRNA ends (sno-lncRNAs) are highly expressed from the imprinted Prader-Willi syndrome (PWS) region on human chromosome 15. However, sno-lncRNAs from other regions of the human genome or from other genomes have not yet been documented. By exploring non-polyadenylated transcriptomes from human, rhesus and mouse, we have systematically annotated sno-lncRNAs expressed in all three species. In total, using available data from a limited set of cell lines, 19 sno-lncRNAs have been identified with tissue- and species-specific expression patterns. Although primary sequence analysis revealed that snoRNAs themselves are conserved from human to mouse, sno-lncRNAs are not. PWS region sno-lncRNAs are highly expressed in human and rhesus monkey, but are undetectable in mouse. Importantly, the absence of PWS region sno-lncRNAs in mouse suggested a possible reason why current mouse models fail to fully recapitulate pathological features of human PWS. In addition, a RPL13A region sno-lncRNA was specifically revealed in mouse embryonic stem cells, and its snoRNA ends were reported to influence lipid metabolism. Interestingly, the RPL13A region sno-lncRNA is barely detectable in human. We further demonstrated that the formation of sno-lncRNAs is often associated with alternative splicing of exons within their parent genes, and species-specific alternative splicing leads to unique expression pattern of sno-lncRNAs in different animals. Comparative transcriptomes of non-polyadenylated RNAs among human, rhesus and mouse revealed that the expression of sno-lncRNAs is species-specific and that their processing is closely linked to alternative splicing of their parent genes. This study thus further demonstrates a complex regulatory network of coding and noncoding parts of the mammalian genome.
  • Identification of multipotent mammary stem cells by protein C receptor expression.
    Wang D*, Cai C*, Dong X, Yu QC, Zhang XO, Yang L, Zeng YA†
    Nature. 2015, 517:81-84. DOI: 10.1038/nature13851
    Abstract: The mammary gland is composed of multiple types of epithelial cells, which are generated by mammary stem cells (MaSCs) residing at the top of the hierarchy. However, the existence of these multipotent MaSCs remains controversial and the nature of such cells is unknown. Here we demonstrate that protein C receptor (Procr), a novel Wnt target in the mammary gland, marks a unique population of multipotent mouse MaSCs. Procr-positive cells localize to the basal layer, exhibit epithelial-to-mesenchymal transition characteristics, and express low levels of basal keratins. Procr-expressing cells have a high regenerative capacity in transplantation assays and differentiate into all lineages of the mammary epithelium by lineage tracing. These results define a novel multipotent mammary stem cell population that could be important in the initiation of breast cancer.
  • Complementary sequence-mediated exon circularization.
    Zhang XO*, Wang HB*, Zhang Y, Lu X, Chen LL†, Yang L†
    Cell. 2014, 159:134-147. DOI: 10.1016/j.cell.2014.09.001
    Abstract: Exon circularization has been identified from many loci in mammals, but the detailed mechanism of its biogenesis has remained elusive. By using genome-wide approaches and circular RNA recapitulation, we demonstrate that exon circularization is dependent on flanking intronic complementary sequences. Such sequences and their distribution exhibit rapid evolutionary changes, showing that exon circularization is evolutionarily dynamic. Strikingly, exon circularization efficiency can be regulated by competition between RNA pairing across flanking introns or within individual introns. Importantly, alternative formation of inverted repeated Alu pairs and the competition between them can lead to alternative circularization, resulting in multiple circular RNA transcripts produced from a single gene. Collectively, exon circularization mediated by complementary sequences in human introns and the potential to generate alternative circularization products extend the complexity of mammalian posttranscriptional regulation.
  • Human colorectal cancer-specific CCAT1-L lncRNA regulates long-range chromatin interactions at the MYC locus.
    Xiang JF, Yin QF, Chen T, Zhang Y, Zhang XO, Wu Z, Zhang S, Wang HB, Ge J, Lu X, Yang L, Chen LL†
    Cell Res. 2014, 24:513-531. DOI: 10.1038/cr.2014.35
    Abstract: The human 8q24 gene desert contains multiple enhancers that form tissue-specific long-range chromatin loops with the MYC oncogene, but how chromatin looping at the MYC locus is regulated remains poorly understood. Here we demonstrate that a long noncoding RNA (lncRNA), CCAT1-L, is transcribed specifically in human colorectal cancers from a locus 515 kb upstream of MYC. This lncRNA plays a role in MYC transcriptional regulation and promotes long-range chromatin looping. Importantly, the CCAT1-L locus is located within a strong super-enhancer and is spatially close to MYC. Knockdown of CCAT1-L reduced long-range interactions between the MYC promoter and its enhancers. In addition, CCAT1-L interacts with CTCF and modulates chromatin conformation at these loop regions. These results reveal an important role of a previously unannotated lncRNA in gene regulation at the MYC locus.
  • Circular intronic long noncoding RNAs.
    Zhang Y*, Zhang XO*, Chen T, Xiang JF, Yin QF, Xing YH, Zhu S, Yang L†, Chen LL†
    Mol Cell. 2013, 51:792-806. DOI: 10.1016/j.molcel.2013.08.017
    Abstract: We describe the identification and characterization of circular intronic long noncoding RNAs in human cells, which accumulate owing to a failure in debranching. The formation of such circular intronic RNAs (ciRNAs) can be recapitulated using expression vectors, and their processing depends on a consensus motif containing a 7 nt GU-rich element near the 5' splice site and an 11 nt C-rich element close to the branchpoint site. In addition, we show that ciRNAs are abundant in the nucleus and have little enrichment for microRNA target sites. Importantly, knockdown of ciRNAs led to the reduced expression of their parent genes. One abundant such RNA, ci-ankrd52, largely accumulates to its sites of transcription, associates with elongation Pol II machinery, and acts as a positive regulator of Pol II transcription. This study thus suggests a cis-regulatory role of noncoding intronic transcripts on their parent coding genes.
  • Panning for long noncoding RNAs.
    Zhu S, Zhang XO, Yang L†
    Biomolecules. 2013, 3:226-241. DOI: 10.3390/biom3010226
    Abstract: The recent advent of high-throughput approaches has revealed widespread transcription of the human genome, leading to a new appreciation of transcription regulation, especially from noncoding regions. Distinct from most coding and small noncoding RNAs, long noncoding RNAs (lncRNAs) are generally expressed at low levels, are less conserved and lack protein-coding capacity. These intrinsic features of lncRNAs have not only hampered their full annotation in the past several years, but have also generated controversy concerning whether many or most of these lncRNAs are simply the result of transcriptional noise. Here, we assess these intrinsic features that have challenged lncRNA discovery and further summarize recent progress in lncRNA discovery with integrated methodologies, from which new lessons and insights can be derived to achieve better characterization of lncRNA expression regulation. Full annotation of lncRNA repertoires and the implications of such annotation will provide a fundamental basis for comprehensive understanding of pervasive functions of lncRNAs in biological regulation.