Pervouchine Laboratory Projects

Dmitri Pervouchine Laboratory

Skoltech

Researchers: Dmitry Svetlichnyy

PhD students: Timofei Ivanov

MSc students: Daria Romanovskaya, Alexander Tashkeev, Anton Krotov, Yaroslav Popov

My group works on RNA biosynthesis, processing, and function. We want to understand the structural determinants of RNA production and processing by integrating multiple orthogonal data sources that have become available with recent advances in next-generation sequencing (NGS) technology. The capability of NGS increases every year with the invention of novel assays. Each assay gives a snapshot of transcriptional activity from a particular angle (e.g., RAMPAGE profiles promoter activity and also captures the first intron, while BRU-seq makes it possible to uncouple RNA production from degradation). Ultimately, the goal is a mechanistic picture of RNA production for each gene that includes spatial and temporal information as well as a structural description of how and when stochastic ensembles of macromolecules interact to produce a mature transcript.

1. Nuclear RNA processing networks

We integrate different data sources, including histone modifications, eCLIP, shRNA knockdown of splicing factors followed by RNA-seq, and splicing networks inferred from large panels of RNA-seq data, in order to get a unified view of the RNA processing landscape. These sources of information are to a large extent orthogonal: eCLIP represents RBP footprints, shRNA knockdown followed by RNA-seq highlights splicing events that respond to perturbation of splicing factors, while splicing networks represent correlations. We ask how these pieces of evidence fit together, where they coincide and where they don’t, and what we can say about RNA processing using such an integrative approach. To this end, we build statistical models of co- and post-transcriptional RNA processing using chromatin states and epigenetic data to identify ribonucleoproteins at the chromatin-RNA interface. These models are approached from several directions with different machine learning methods, including convolutional neural networks and random forests.
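As a rough illustration of what "splicing networks represent correlations" can mean in practice, the sketch below links two exons when their inclusion levels (percent spliced in, PSI) are strongly correlated across samples. This is only one simple way to infer such a network, not necessarily the method used in the project; the function names, the Pearson statistic, and the 0.8 threshold are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def correlation_edges(psi, threshold=0.8):
    """Return (exon_a, exon_b, r) for every exon pair with |r| >= threshold.

    psi maps a (hypothetical) exon id to its PSI values, one per sample.
    The threshold is an arbitrary illustrative choice.
    """
    edges = []
    for a, b in combinations(sorted(psi), 2):
        r = pearson(psi[a], psi[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges
```

A correlation edge of this kind says nothing about direct binding or causality, which is why it complements rather than duplicates the eCLIP and knockdown evidence.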

2. Association study of long-range RNA structure and pre-mRNA splicing.

This project focuses on conserved long-range intramolecular RNA-RNA interactions and their relation to long-range RNA processing events such as coordinated exon inclusion and exon skipping in distant parts of the gene. Previously, we found a strong, evolutionarily conserved pattern of association between long-range intramolecular RNA structure and splicing. The continuation of this project is to test RNA-structure-based models of transcript rescue from premature cleavage and polyadenylation, as well as of capping and 5’-processing.

3. Evolution of regulatory RNA-structures that lead to mutually exclusive and mutually inclusive splicing patterns.

One class of regulatory RNA structures in eukaryotic genomes is associated with special types of alternative splicing events, e.g., mutually exclusive exons or mutually inclusive (array) exons. Mutually exclusive exons usually come as an array of two or more exons, of which one and only one is included in the mRNA. Mutually inclusive exons are either all included together or all skipped. Molecular mechanisms underlying mutually exclusive and mutually inclusive patterns often involve RNA structure. The origin of such events is likely related to genomic duplications, which copy and paste different elements of RNA structure. The goal of this project is to describe the evolutionary mechanism that leads to the formation of mutually inclusive and mutually exclusive patterns.
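The two definitions above can be stated as a small check over annotated transcripts. The sketch below is a minimal illustration, assuming each transcript is given as a set of exon identifiers; the function name and the input format are hypothetical, and real annotations would need handling of partial transcripts and alternative exon boundaries.

```python
def classify_exon_cluster(transcripts, cluster):
    """Classify a cluster of exons by its inclusion pattern across transcripts.

    transcripts: iterable of sets of exon ids (one set per transcript).
    cluster: the exon ids under consideration (at least two).
    """
    cluster = set(cluster)
    if len(cluster) < 2:
        raise ValueError("a cluster needs at least two exons")
    # Number of cluster exons present in each transcript.
    counts = {len(cluster & set(t)) for t in transcripts}
    if counts == {1}:
        # Exactly one exon of the cluster in every transcript.
        return "mutually exclusive"
    if counts <= {0, len(cluster)} and len(cluster) in counts:
        # All cluster exons together, or none of them.
        return "mutually inclusive"
    return "other"
```

For example, two transcripts `{"e1", "a", "e4"}` and `{"e1", "b", "e4"}` make the cluster `["a", "b"]` mutually exclusive, while `{"e1", "a", "b", "e4"}` and `{"e1", "e4"}` make it mutually inclusive.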

4. Are all long non-coding RNAs actually not coding for proteins?

Long non-coding RNAs have been actively studied for the past 5 years as a possible “missing link” between genomic and phenotypic complexity. Recently, a novel class of gene products called micropeptides, i.e., small proteins of <100 amino acids, has come to the fore. Micropeptides are the products of translation of short open reading frames (sORFs) that reside in genes that are either unannotated or, more likely, mis-annotated as non-coding. Many of them are harboured by genes annotated as long non-coding RNAs. Micropeptides may act as signaling molecules or bind to and modulate the function of protein complexes, and they have been demonstrated to play roles in core cellular processes. The aim of this project is to use multiple data sources to identify functional micropeptides that reside in long non-coding RNAs.
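A first step in such a search is simply to enumerate candidate sORFs in a transcript. The sketch below scans the three forward reading frames for ATG-to-stop ORFs that encode fewer than 100 amino acids, as in the definition above. It is a naive illustration, not the project's pipeline: the function name, the lower bound of 10 amino acids, and the restriction to canonical ATG starts are assumptions, and a candidate found this way is not evidence of translation or function.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, min_aa=10, max_aa=100):
    """Return sorted (start, end, n_aa) tuples for short ORFs in a transcript.

    start/end are 0-based, end-exclusive, and include the stop codon.
    n_aa counts codons from ATG up to (not including) the stop codon.
    Only ORFs with min_aa <= n_aa < max_aa are reported.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa < max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # look for the next ORF in this frame
    return sorted(hits)
```

Candidates from a scan like this would then be filtered with independent evidence, such as conservation or ribosome profiling, which is where the multiple data sources come in.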

5. Analysis of the functionality of annotated nonsense-mediated decay events.

The nonsense-mediated decay (NMD) pathway has evolved to destroy eukaryotic transcripts with premature translation termination codons. However, it is also often implicated in splicing-mediated regulation of gene expression. For instance, a splicing factor may bind its own pre-mRNA to induce inclusion of a poison exon, leading to premature translation termination and degradation by NMD. The aim of this project is to assess the regulatory potential of NMD in human genes and to determine how widespread this mechanism is.
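Annotated NMD targets are commonly predicted with the so-called 50-nucleotide rule: a stop codon located more than about 50 nt upstream of the last exon-exon junction is treated as premature. The text above does not say which criterion the project uses, so the sketch below is only an illustration of that widely used heuristic; the function name, the coordinate convention, and the default threshold are assumptions.

```python
def is_nmd_candidate(exon_lengths, stop_codon_end, min_distance=50):
    """Apply the 50-nt rule to one spliced transcript.

    exon_lengths: exon lengths in transcript (5' to 3') order.
    stop_codon_end: 0-based, end-exclusive position of the stop codon
        in spliced mRNA coordinates.
    min_distance: threshold in nucleotides (illustrative default).
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule does not apply
    last_junction = sum(exon_lengths[:-1])  # position of the last junction
    return last_junction - stop_codon_end > min_distance
```

With exons of 100, 200, and 150 nt, the last junction lies at position 300, so a stop codon ending at position 120 is flagged, while one ending at position 280 or in the last exon is not.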

6. In-house pipelines for NGS data analysis.

This is an initiative to build a set of in-house utilities (a bioinformatics toolbox) for the efficient analysis, storage, and processing of RNA-seq data. One part of it is the IPSA package (Integrative Pipeline for Splicing Analyses), which is currently under development within the framework of projects at the Center for Genomic Regulation in Barcelona. The aim is to extend it into a library of elementary operations (mapping to the reference, lift-over, data-type conversion, phasing of allelic information, etc.) on standardized data types.

7. The role of metabolic genes in creating slow dynamic oscillatory patterns in rat entorhinal cortex.

In 2006, my colleagues from Leeds University (UK) and I built a mathematical model of slow-wave oscillation in rat neocortex in response to ischemic conditions under the application of kainate. We found that the slow-wave oscillation depended on metabolic genes and reacted to the blockade of ATP-sensitive potassium channels (Kir6.2). A similar pattern of oscillatory activity is known for pancreatic beta cells, which also use the Kir6.2 protein to discharge insulin vesicles in response to glucose stimulation. Accordingly, the project is to compare the expression of gene networks in the human pancreas and in the neocortex (specifically, the entorhinal cortex) and to understand the common origin and evolution of action potential-evoking circuits in these two very different organs.