Discovering novel anti-cancer therapy targets through functional characterization of cancer- and tissue-specific splicing events
Alternative splicing is a source of transcript diversity across tissues and developmental stages, and it is believed to be one of the important molecular forces driving protein evolution. Yet the functional impact of most observed splicing variations remains elusive. Although recent cancer-related analyses have proposed links between annotated transcriptomic patterns and changes in protein domain architecture, most cancer-associated and tissue-specific splicing events still lack functional characterization. For the first time, we will undertake a systematic effort to characterize the functional impact of alternative splicing on the structure and interactions of the corresponding expressed proteins. In collaboration with the group of Dr. Olga Kalinina (Max-Planck-Institut für Informatik, Saarbrücken), we approach this problem using protein structure modeling techniques. Although coverage of the human proteome by experimentally resolved structures is still low, homology modeling offers a powerful alternative that makes it possible to model most biologically relevant variants. We will study the effect of transcriptomic indels on protein stability and on interactions with other molecules, which will allow us to directly predict their functional importance and suggest molecular mechanisms. This opens a new dimension for personalized cancer therapy.
The impact of single nucleotide variants disrupting RNA structure on alternative splicing in cancer (Svetlana Kalmykova)
The way pre-mRNAs are processed in eukaryotic cells strongly depends on their secondary structure and, among other elements, on long-range intramolecular base pairings. We test the hypothesis that tumor-associated somatic mutations can alter pre-mRNA secondary structure and thereby cause abnormal RNA processing. We constructed a pipeline that integrates three data sources: alternative splicing, somatic mutations, and putative conserved long-range RNA structures. The latter are obtained from a whole-genome computational analysis of conserved complementary regions, extending pipelines such as IRBIS that were previously developed in our group. We continue our collaboration with the lab of Dr. Olga Dontsova, particularly with Marina Kalinina and Dmitry Skvortsov, to validate some of the intronic RNA structures that impact alternative splicing.
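One step of such a pipeline, checking whether a single-nucleotide variant breaks a base pair in a putative long-range structure, can be sketched as follows. This is a minimal illustration and not the group's actual pipeline: the `Structure` record, the coordinate convention (0-based positions on one pre-mRNA, two equal-length arms without bulges) and the treatment of G-U wobble pairs as valid are all assumptions made for the example.

```python
from typing import NamedTuple, Optional

# Watson-Crick pairs plus G-U wobble pairs (assumed to count as paired).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


class Structure(NamedTuple):
    """Hypothetical record: two equal-length arms of a long-range pairing."""
    left_start: int   # 0-based start of the upstream arm
    right_start: int  # 0-based start of the downstream arm
    length: int       # number of nucleotides in each arm


def paired_position(s: Structure, pos: int) -> Optional[int]:
    """Return the position that pairs with `pos`, or None if `pos` is unpaired.

    The arms pair antiparallel: the first base of the left arm pairs with
    the last base of the right arm, and so on.
    """
    if s.left_start <= pos < s.left_start + s.length:
        return s.right_start + s.length - 1 - (pos - s.left_start)
    if s.right_start <= pos < s.right_start + s.length:
        return s.left_start + s.length - 1 - (pos - s.right_start)
    return None


def disrupts_pairing(rna: str, s: Structure, pos: int, alt: str) -> bool:
    """True if substituting `alt` at `pos` turns a paired base into a mismatch."""
    partner = paired_position(s, pos)
    if partner is None:
        return False
    was_paired = (rna[pos], rna[partner]) in PAIRS
    still_paired = (alt, rna[partner]) in PAIRS
    return was_paired and not still_paired


# Toy example: arms GGGAC and GUCCC separated by a four-nucleotide spacer.
rna = "GGGACAAAAGUCCC"
stem = Structure(left_start=0, right_start=9, length=5)
candidate = disrupts_pairing(rna, stem, pos=3, alt="C")
```

In a real analysis the flagged variants would then be intersected with alternative splicing events near the affected structure; that step is omitted here.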
Novel autoregulatory cases of alternative splicing coupled with nonsense-mediated mRNA decay (Anastasia Danchurova)
Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTCs). Many RNA-binding proteins (RBPs) regulate their expression levels through a negative feedback loop, in which the RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. In a recent paper, we presented a bioinformatic framework to identify such autoregulatory feedback loops by combining eCLIP assays for a large panel of RBPs with data on shRNA inactivation of the NMD pathway and on shRNA depletion of RBPs followed by RNA-seq. We show that RBPs frequently bind their own pre-mRNAs and respond prominently to NMD pathway disruption. Poison and essential exons, i.e., exons that trigger NMD when included in the mRNA or skipped, respectively, respond oppositely to the inactivation of the NMD pathway and to the depletion of their host genes, which allows the identification of novel autoregulatory mechanisms for a number of human RBPs. For example, SRSF7 binds its own pre-mRNA and facilitates the inclusion of two poison exons; SFPQ binding promotes switching to an alternative distal 3’-UTR that is targeted by NMD; RPS3 activates a poison 5’-splice site in its pre-mRNA that leads to a frameshift; U2AF1 binding activates one of its two mutually exclusive exons, leading to NMD; TBRG4 is regulated by cluster splicing of its two essential exons. Our results indicate that the autoregulatory negative feedback loop of alternative splicing and NMD is a generic form of post-transcriptional control of gene expression. Currently, in collaboration with the lab of Dr. Zatsepin, we are performing experimental validation of these predictions, and in the future we will extend these results from autoregulatory to cross-regulatory splicing networks.
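The opposite responses of poison and essential exons can be illustrated with a small classifier over changes in exon inclusion (delta PSI). This is only a sketch under a simple negative-feedback model. The sign conventions and the 0.05 threshold are assumptions of the example, not values taken from the paper: a poison exon is expected to gain inclusion when NMD is inactivated (its isoform is no longer degraded) and to lose inclusion when the host RBP is depleted, and an essential exon is expected to do the reverse.

```python
def classify_exon(dpsi_nmd: float, dpsi_kd: float, threshold: float = 0.05) -> str:
    """Label an exon by its response to two perturbations.

    dpsi_nmd  -- PSI after NMD inactivation minus PSI in control
    dpsi_kd   -- PSI after depletion of the host RBP minus PSI in control
    threshold -- minimal absolute change treated as a response (assumed value)
    """
    if dpsi_nmd >= threshold and dpsi_kd <= -threshold:
        return "poison-like"
    if dpsi_nmd <= -threshold and dpsi_kd >= threshold:
        return "essential-like"
    return "unclassified"


# Hypothetical measurements for three exons (not real data).
examples = {
    "exon_a": classify_exon(0.20, -0.10),
    "exon_b": classify_exon(-0.20, 0.10),
    "exon_c": classify_exon(0.20, 0.10),
}
```

Exons that respond in the same direction to both perturbations, such as `exon_c` here, are left unclassified instead of being forced into either category.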
Stem cell markers frequently associated with tumors (Artyom Baranovsky)
A growing body of evidence suggests a connection between tumor progression and the reacquisition of stem cell traits. Expression of the genes related to these traits, which are dormant in normal somatic cells, promotes progression of the disease and greatly enhances the ability of cancer cells to survive and evade treatment. Recently, we assessed the expression of such genes across different tumors. Although we were unable to unravel inter-tumor heterogeneity, we observed a predominant trend toward reactivation of stem-cell genes in tumors. Moreover, stem-cell gene expression showed patterns that were highly specific to embryonic origin. Currently, we are integrating alternative splicing data into our analysis to further elucidate the effect of stem-cell genes on tumor complexity. Another direction of future research is to integrate single-cell RNA-seq data from a variety of cancers to identify the core set of stem-cell genes in heterogeneous tumors.
Evolution and function of upstream open reading frames in primate genomes (Stepan Denisov)
The classic scanning model of translation of a eukaryotic gene implies that only the first start codon of a gene is used for translation initiation, and that all downstream potential start codons are simply ignored. A paradigm shift happened after the invention of ribosome profiling technology, which showed that about 50% of human protein-coding transcripts contain upstream start codons in their 5’UTR. This discovery revealed a new class of short open reading frames located within the 5’UTR, whose functional importance and evolution are poorly understood. We study all upstream open reading frames (uORFs) whose start and stop codons are both located within the 5’UTR. First, we ask how they appear in evolution. Second, we ask how natural selection acts on newly arisen and old uORFs, and what properties affect the selection acting on them. Many uORFs represent non-functional or slightly deleterious elements. Thus, we investigate how uORFs are related to molecular processes such as alternative splicing, nonsense-mediated decay and, of course, translation of the main coding sequence. Answering these questions will help to understand the involvement of uORFs in human pathological states, e.g., cancer.
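The uORF definition used here (start and stop codon both inside the 5’UTR) can be made concrete with a short scanner. This is a minimal sketch that assumes a DNA-alphabet 5’UTR sequence and considers only canonical ATG starts; the function name `find_uorfs` is made up for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str) -> list:
    """Return (start, end) pairs, 0-based and half-open, for every ORF
    whose ATG and first in-frame stop codon both lie within `utr5`.

    An ATG with no in-frame stop before the end of the UTR is skipped,
    because such an ORF would run into the main coding sequence.
    """
    utr5 = utr5.upper()
    uorfs = []
    for start in range(len(utr5) - 2):
        if utr5[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(utr5) - 2, 3):
            if utr5[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


# Toy 5'UTR with one ATG followed by an in-frame TAG.
hits = find_uorfs("CCATGAAATAGCC")
```

Nested in-frame ATGs that share a stop codon are reported as separate entries; whether to merge them is a design choice left open in this sketch.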
RNA-binding proteins, eCLIP, and splicing code (Zoya Chervontseva)
Most eukaryotic genes are spliced, and most spliced genes produce multiple transcript isoforms. RNA-binding proteins (RBPs) are known to influence the splicing of mRNAs. Several large data sets from RBP binding assays, together with data on RBP expression in various cell lines, have been collected. Still, little is known about how the binding of a particular RBP to a transcript affects its splicing. We use the sequences of splice junction regions and data on RBP expression to predict the inclusion rate of each exon. We apply a convolutional neural network to extract features from the sequences, and we expect it to learn the motifs of RBP binding sites. Further layers of the network should capture the higher-level logic of motif combinations. Additionally, we explore current high-throughput RBP binding assays such as enhanced cross-linking and immunoprecipitation (eCLIP), as well as functional perturbations of RBP expression levels, to link the combinatorial motif analysis with protein-RNA interactions and gene expression networks. Ultimately, this line of research aims at deciphering ‘the splicing code’, a combination of RBP binding and structural cues that guides the spliceosome. Machine learning methods are a promising tool for studying alternative splicing in health and disease.
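Why a convolutional layer is expected to learn binding motifs can be seen from a single filter applied to a one-hot encoded sequence: its activation is highest where the window matches the filter weights. The sketch below is written in plain Python for readability and is not the project's model. The filter is set by hand to the example motif GCAUG purely for illustration, whereas in a trained network the weights would be learned from data.

```python
ALPHABET = "ACGU"


def one_hot(seq: str) -> list:
    """Encode an RNA sequence as a list of 4-element indicator rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def conv1d(encoded: list, kernel: list) -> list:
    """Slide `kernel` along `encoded` (no padding) and return one score per window."""
    width = len(kernel)
    scores = []
    for i in range(len(encoded) - width + 1):
        score = sum(
            encoded[i + j][k] * kernel[j][k]
            for j in range(width)
            for k in range(len(ALPHABET))
        )
        scores.append(score)
    return scores


def relu_max_pool(activations: list) -> float:
    """ReLU followed by global max pooling: the strongest motif match."""
    return max(0.0, max(activations, default=0.0))


# Hand-set filter that matches GCAUG exactly (illustrative weights only).
motif_filter = one_hot("GCAUG")
activations = conv1d(one_hot("AAGCAUGAA"), motif_filter)
best_window = activations.index(max(activations))
strength = relu_max_pool(activations)
```

Stacking further layers on top of many such pooled filter outputs is what would allow a network to represent combinations of motifs.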
Functional importance of tandem alternative splice sites (Aleksey Mironov)
It is well known that more than 90% of human genes are alternatively spliced, but the functional importance of alternative transcripts is still a matter of debate. On average, 85% of total mRNA corresponds to one major isoform of a gene, and proteomic studies have identified only a small portion of mRNA isoforms that are translated into proteins; however, the latter may be caused by the low sensitivity of mass spectrometry or by other technical issues. On the other hand, recent advances in high-throughput sequencing have uncovered a massive and heterogeneous repertoire of RNA isoforms produced by alternative splicing, which includes a large number of unannotated transcripts. Among them are splicing events characterized by a tandem arrangement of alternative splice sites (TASS). Using a large panel of human RNA-seq experiments from the Genotype-Tissue Expression (GTEx) project, we focus on TASS with moderate read support (minor splice sites) that are located in proximity (~30 nt) to well-supported (major) splice sites. Using bioinformatic analysis, we assess whether minor TASS are functionally important. We use a previously developed method to estimate the selection acting on splice sites and to compare the selection on different groups of splice sites (major and minor TASS, etc.). We plan to identify tissue-specific and individual-specific TASS, analyze the functional consequences of alternative splicing at the protein level using structural proteomic data, and reveal the usage of TASS in gene regulatory pathways. Ultimately, this project aims at clarifying the contribution of noise to alternative splicing, identifying its evolutionary mechanisms, and discovering functional, previously unannotated transcript isoforms.
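The pairing of minor and major splice sites can be sketched as a scan over read-supported sites of one type. This is an illustration under stated assumptions and not the project's method: the input is assumed to be a mapping from position to split-read count for either donors or acceptors on one strand of one chromosome, the better-supported site of a pair is called major, and the 30 nt window is taken from the approximate distance mentioned above.

```python
def find_tass(site_reads: dict, max_distance: int = 30) -> list:
    """Return (major_pos, minor_pos, minor_fraction) for nearby site pairs.

    site_reads     -- position -> number of supporting split reads
    max_distance   -- maximal distance in nt between the two sites (assumed)
    minor_fraction -- share of the pair's reads that support the minor site
    """
    positions = sorted(site_reads)
    pairs = []
    for i, a in enumerate(positions):
        for b in positions[i + 1:]:
            if b - a > max_distance:
                break  # positions are sorted, so later ones are even farther
            total = site_reads[a] + site_reads[b]
            if total == 0:
                continue
            major, minor = (a, b) if site_reads[a] >= site_reads[b] else (b, a)
            pairs.append((major, minor, site_reads[minor] / total))
    return pairs


# Hypothetical read counts: two acceptors 3 nt apart and one distant site.
tass = find_tass({100: 900, 103: 100, 500: 50})
```

A further filter on `minor_fraction` or on absolute read counts would be needed to separate moderately supported minor sites from alignment noise; the appropriate cut-offs are not specified here.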
Single-cell splicing analysis (Alexey Samosyuk)
Recently developed single-cell protocols, in combination with modern analytical approaches, can improve the resolution of bulk protocols from individual tissues to the cell subpopulations that comprise these tissues. This makes them a useful tool for studying transcriptional programs in tissue-specific and disease-associated rare cell types. Because of data sparsity, most current methods focus on the analysis of transcriptional output at the gene level. The goal of this project is to extend previously developed tools for single-cell transcriptomics from gene expression to splicing analysis. Such a technique would be a valuable addition to the toolbox of methods for the analysis of single-cell data.
Evolution of mutually exclusive splicing patterns (Timofei Ivanov)
Mutually exclusive exons (MXE) represent a particular type of alternative splicing, in which one and only one exon from an array is included in the mature RNA. In a number of genes, the choice among MXE is controlled by a mechanism that depends on RNA structure. Transcripts of these genes contain multiple sites called selector sequences that are all complementary to a regulatory element called the docking site; only one of the competing base pairings can form at a time, which exposes one exon from the cluster to the spliceosome. MXE tend to have similar lengths and sequence content, which suggests that they originate through tandem genomic duplications. In a recent paper, we showed that pre-mRNAs of this class of exons have an increased capacity to fold into competing secondary structures, and we proposed an evolutionary mechanism for the generation of such structures via duplications that affect not only exons but also their adjacent introns with stem-loop structures. If one of the two arms of a stem-loop is duplicated, two selector sequences are generated that compete for the same docking site, a pattern that is associated with MXE splicing. Currently, we study the evolution of MXE in the Drosophilidae family, using the gene MRP1 as a model. The goal is to reconstruct the evolutionary history of MXE in this gene, and to build sufficient intuition to analyze the evolutionary history of other genes with MXE clusters.
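The competition between selector sequences can be illustrated by scanning a transcript for windows that could base-pair with a given docking site. The sketch is deliberately simplified and all of its choices are assumptions: ungapped antiparallel pairing, G-U wobble pairs counted as matches, and a score equal to the fraction of paired positions. Real selector-docking interactions may contain bulges and are usually evaluated with thermodynamic folding models.

```python
# Watson-Crick pairs plus G-U wobble pairs (assumed to count as paired).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_score(docking: str, candidate: str) -> float:
    """Fraction of positions at which `candidate` pairs antiparallel with `docking`."""
    if len(docking) != len(candidate):
        raise ValueError("docking site and candidate must have equal length")
    matches = sum((d, c) in PAIRS for d, c in zip(docking, reversed(candidate)))
    return matches / len(docking)


def find_selectors(rna: str, docking: str, min_score: float = 0.8) -> list:
    """Return (start, score) for every window that pairs well with the docking site."""
    width = len(docking)
    hits = []
    for i in range(len(rna) - width + 1):
        score = pairing_score(docking, rna[i:i + width])
        if score >= min_score:
            hits.append((i, score))
    return hits


# Toy transcript with a duplicated arm: both copies can pair with the docking site.
selectors = find_selectors("AAGUCCCAAAAGUCCCAA", docking="GGGAC", min_score=1.0)
```

Two or more hits for one docking site correspond to the competing pairings described above, of which only one can form at a time.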
The role of RNA secondary structure in the expression of chimeric non-collinear transcripts (Olga Vasyutkina)
Trans-splicing is a rare and poorly understood form of RNA processing in eukaryotes, in which exons from two different RNA transcripts are ligated through splicing. In particular, trans-splicing events may happen between two transcripts from the same locus, leading to transcripts with post-transcriptional exon shuffling (PTES) or rearrangement (RREO). Our hypothesis is that trans-splicing events are more likely to occur in genomic regions where RNA polymerase II slows down, so that different copies of the same transcript have a chance to be located close to each other. The aim of this work is to assess the contribution of trans-splicing and of RNA secondary structure formation to the expression of transcripts with PTES events. To this end, we plan to find chimeric non-collinear transcripts in different human cell lines using paired-end RNA-seq analysis. We then plan to test whether these transcripts are enriched in genomic regions with strong RNA polymerase II ChIP-seq signal or with conserved RNA secondary structures.
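What makes a junction non-collinear can be stated as a simple coordinate test: in a collinear junction the donor site precedes the acceptor site in the direction of transcription, whereas in a PTES-like junction a downstream exon is joined to an upstream one. The sketch below assumes that both sites lie in the same gene and that coordinates are genomic positions on the reference strand; the function name is made up for the example, and detecting such junctions from paired-end reads is a separate problem that is not covered.

```python
def classify_junction(donor: int, acceptor: int, strand: str) -> str:
    """Classify a splice junction by the genomic order of its two sites.

    donor    -- genomic coordinate of the 5' splice site
    acceptor -- genomic coordinate of the 3' splice site
    strand   -- "+" or "-" strand of the gene
    """
    if strand == "+":
        collinear = donor < acceptor
    elif strand == "-":
        collinear = donor > acceptor
    else:
        raise ValueError("strand must be '+' or '-'")
    return "collinear" if collinear else "non-collinear"


# Hypothetical junctions in a plus-strand gene (coordinates are illustrative).
regular = classify_junction(donor=1_000, acceptor=2_000, strand="+")
shuffled = classify_junction(donor=5_000, acceptor=2_000, strand="+")
```

Junctions labelled non-collinear by this test would be the candidates to intersect with RNA polymerase II ChIP-seq peaks and conserved RNA structures.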