1 Preface

1.2 Acknowledgements

Wolfgang Huber and Matthias Hentze for mentoring, advice and discussion. Benjamin Lang and Gian Tartaglia for great help with functional analysis and benchmarking, as well as feedback on the vignette. Ina Huppertz for helpful feedback and language improvement on the vignette. Mike Love, Simon Anders, Bernd Klaus and Frederick Ziebell for comments and discussion.

2 Introduction

2.1 Introduction to eCLIP sequencing

RNA-binding proteins (RBPs) play a key role throughout the lifetime of RNAs. They are involved in RNA synthesis, stability, degradation, transport and translation, and add an important layer of regulation in the cell. Over 1,900 murine and over 1,400 human RBPs have been detected in different high-throughput studies, many of them without known RNA-binding function (Hentze et al. 2018).

Detecting an RBP’s binding sites is of great interest for studying the mechanism underlying its regulatory potential. Individual-nucleotide resolution crosslinking and immunoprecipitation (iCLIP) (König et al. 2010) and the further enhanced CLIP (eCLIP) protocol (Eric L. Van Nostrand et al. 2016) rely on UV crosslinking, which induces covalent bonds between RNA and proteins in close proximity. When the RNA fragment bound to the protein is reverse transcribed, the reverse transcriptase terminates at the crosslink site most of the time. Although eCLIP also introduces updates in chemistry, its essential addition to the protocol is the use of a size-matched input (SMI) control sample, which can also be adapted to iCLIP or similar protocols.

Figure 1: Crosslink site truncation at reverse transcription

In iCLIP and eCLIP, truncation events are extracted as the single nucleotide position next to the cDNA fragment (aligned read). In the classical protocols, real truncations at the crosslink site cannot be distinguished from read-through reads or from reads truncated for other reasons, such as RNA modifications or the crosslinking sites of other proteins. This might differ for each individual protein (and the remaining polypeptide of the digested protein). Other protocols like HITS-CLIP and PAR-CLIP (Hafner et al. 2010) rely exclusively on read-through events (although using other reverse transcriptases). While hybrid approaches exist, the technical difficulty of these protocols requires many optimization steps, which makes them rather hard to combine.

Figure 2: Truncation sites (referred to as crosslink sites) are the positions neighboring the aligned reads
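The extraction of crosslink sites from aligned reads can be illustrated with a minimal Python sketch. It is not part of DEWSeq or of any CLIP pipeline; the `AlignedRead` record, the function names and the example coordinates are hypothetical. The sketch assumes 1-based, inclusive coordinates and assumes that the crosslink site is the position immediately upstream of the 5' end of the aligned read, taking the strand into account.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal read record (1-based, inclusive coordinates)."""
    chrom: str
    start: int   # leftmost aligned position on the reference
    end: int     # rightmost aligned position on the reference
    strand: str  # "+" or "-"


def crosslink_site(read: AlignedRead) -> int:
    """Return the position directly upstream of the read's 5' end.

    Assumption: the truncation (crosslink) site is the single nucleotide
    neighboring the aligned read on its 5' side.
    """
    if read.strand == "+":
        return read.start - 1  # 5' end is the leftmost position
    if read.strand == "-":
        return read.end + 1    # 5' end is the rightmost position
    raise ValueError(f"unknown strand: {read.strand!r}")


def count_crosslink_sites(reads: Iterable[AlignedRead]) -> Counter:
    """Count truncation events per (chromosome, position, strand)."""
    return Counter((r.chrom, crosslink_site(r), r.strand) for r in reads)


# Made-up example reads, for illustration only.
reads = [
    AlignedRead("chr1", 101, 150, "+"),
    AlignedRead("chr1", 101, 140, "+"),
    AlignedRead("chr1", 201, 250, "-"),
]
counts = count_crosslink_sites(reads)
```

Here the two plus-strand reads share the same 5' end and therefore contribute two counts to the same crosslink site, while the minus-strand read contributes one count to the position just past its rightmost aligned base. The sketch cannot tell whether a count stems from a real truncation or from a read-through event, which is the limitation described above.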

In summary, iCLIP and eCLIP protocols provide count data for single-nucleotide positions which might be the result of many heuristic events. These are described in the next chapter.

2.2 The idea behind DEWSeq

2.2.1 Understanding the signal

2.2.1.1 Binding modes

Unlike transcription factors, RNA-binding proteins have many different binding modes (Hentze et al. 2018): some bind in a sequence-specific manner, some have a preference for structures (like stem-loops), some prefer to bind RNA modifications, and others are mostly found at UTRs. A large portion of RBPs do not have a known RNA-binding domain and bind using disordered regions with unknown target preferences.