Genomic analysis can be used to identify differences between RNA populations in two conditions, in both production and abundance. This includes identifying RNAs produced by multiple genomes within one biological system: for example, RNAs produced by pathogens within a host or, for our purposes, RNAs moving between the roots and shoots across a plant graft junction between partners with distinct genotypes. To locate and identify these RNAs, genomic pipelines typically align reads sequentially to each genome in the system. This approach has benefits and disadvantages. Here, we address its main disadvantages: high levels of data noise and false positives. The mobileRNA package provides methods to pre-process, analyse and visualise sRNA and mRNA populations based on the premise of mapping reads to all genotypes at the same time. This vignette explains how to use the package and demonstrates both quick and advanced workflows.
In plants, systemic signalling is an elaborate molecular system which coordinates plant development by integrating information perceived from the environment and transmitting it to distant organs. Small RNA molecules (sRNAs) play an important role in long-distance signalling. The nucleotide length of an sRNA helps researchers identify its class and predict its function. Micro-RNAs (miRNAs) direct translational repression and/or the cleavage of messenger RNAs (mRNAs), whereas small interfering RNAs (siRNAs) are involved in the maintenance and de novo establishment of DNA methylation and account for the majority of sRNAs in plants. These endogenous sRNAs can be produced in one tissue and then transported systemically through the vascular system into recipient organs, where they can induce a molecular response and coordinate physiological changes. Similarly, mRNAs can move over long distances, and it is thought that they may be translated into proteins which act as transcription factors in the recipient tissues.
Plant grafting can be used to create chimeric plant systems composed of two
genotypes, such as different species (for example tomato and eggplant) or
different varieties or accessions of one species. Grafting has been used as a
method to study RNA mobilomes and their impact on the phenotype. Yet there is
no standardised genomic approach for analysing sequencing data to identify an
RNA mobilome. Here we introduce the R package mobileRNA, which provides a
recommended pipeline and analysis workflow for identifying an sRNA or mRNA
mobilome. In addition, the package is flexible enough to support standard RNA
analysis between treatment and control conditions, for example to identify
sRNA population changes due to a treatment such as cold or heat stress or
exposure to a pest. mobileRNA assists in pre-processing and analysis,
including the characterization of different populations, visualization of the
results, and output to support functional analysis.
As stated, the package was developed for the analysis of plant grafting experiments. However, we believe it could have further applications, including the analysis of dual-host systems.
In grafted plants, when different genotypes are used as rootstock and scion, the sequence variation between the two genomes can be used to discriminate the origin of a sequenced RNA molecule. Therefore, if an RNA molecule sequenced from one of the grafted partners (scion or rootstock) is found to match the genome of the other partner, this could empirically demonstrate its movement across the graft junction.
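The reasoning above can be sketched as a small decision rule. This is an illustrative Python sketch and not part of the mobileRNA package: the function name `classify_reads`, the labels it returns, and the input layout (a mapping from read identifier to the set of genomes the read matched) are all assumptions made for the example.

```python
def classify_reads(matches, tissue_genome):
    """Label each read by comparing the genomes it matched with the tissue genome.

    matches: dict mapping a read ID to the set of genome labels it aligned to
             (hypothetical input layout, assumed for this sketch).
    tissue_genome: label of the genotype of the tissue that was sequenced.
    """
    calls = {}
    for read_id, genomes in matches.items():
        if not genomes:
            calls[read_id] = "unaligned"
        elif genomes == {tissue_genome}:
            # Matches only the genome of the sampled tissue.
            calls[read_id] = "native"
        elif tissue_genome not in genomes:
            # Matches only the grafted partner: candidate for movement.
            calls[read_id] = "putative_mobile"
        else:
            # Matches both genomes, so sequence variation cannot decide.
            calls[read_id] = "ambiguous"
    return calls


# Reads sequenced from a scion of genotype "A" grafted onto rootstock "B".
example = {
    "read1": {"A"},
    "read2": {"B"},
    "read3": {"A", "B"},
    "read4": set(),
}
print(classify_reads(example, tissue_genome="A"))
```

Only reads that match the partner genome and not the genome of the sampled tissue are treated as candidates. Reads that match both genomes carry no discriminating sequence variation and are set aside here.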
Most available genomics approaches to this analysis are based on RNA sequencing, followed by alignment to a single reference genotype and post-alignment screening of genetic variants to identify molecules which better match the genotype of the grafted partner. These methods have many limitations, which might include:
Here, to circumvent such problems, we propose a method inspired by the RNAseq analysis of plant hybrids (Lopez-Gomollon 2022), in which the alignment step is performed simultaneously on both genomes involved. The rationale is that alignment tools already implement an algorithm well suited to identifying the best matches (according to set parameters) in a given reference genome, but they cannot account for potential matches to DNA sequences which are not provided as part of the reference. Therefore, the genomes of all partners in the system are merged into a single FASTA file, which is used as the reference for a single alignment. The aim is to supply the algorithm with as much information as possible, so that it can make the best possible placement of sequencing reads on each genome.
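To make the merging step concrete, the following is a minimal Python sketch of combining two genome assemblies into one FASTA reference. It is not the package's own implementation: the function names, the default prefixes `A` and `B`, and the choice to prepend `prefix_` to each sequence name are assumptions made for illustration. Some form of genome-specific tag is needed so that sequence names stay unique and each alignment can later be traced back to its genome.

```python
from pathlib import Path


def prefixed_records(fasta_path, prefix):
    """Yield the lines of a FASTA file with `prefix_` added to each sequence name."""
    with open(fasta_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # ">chr1 description" becomes ">A_chr1 description"
                yield ">" + prefix + "_" + line[1:].lstrip()
            elif line:
                # Sequence lines are copied unchanged; blank lines are dropped.
                yield line


def merge_genomes(fasta_a, fasta_b, out_path, prefix_a="A", prefix_b="B"):
    """Write both genomes to one FASTA file, tagging each sequence with its genome."""
    if prefix_a == prefix_b:
        raise ValueError("the two genome prefixes must differ")
    with open(out_path, "w") as out:
        for path, prefix in ((fasta_a, prefix_a), (fasta_b, prefix_b)):
            for line in prefixed_records(path, prefix):
                out.write(line + "\n")
    return Path(out_path)
```

With hypothetical file names, `merge_genomes("tomato.fa", "eggplant.fa", "merged.fa")` would write a reference in which a chromosome named `chr1` in both assemblies appears as `A_chr1` and `B_chr1`.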
The workflow is summarised below (Figure 1) and consists of a core RNA analysis and a mobile sRNA/mRNA analysis. The core analysis is the standard workflow for identifying RNA populations which have been gained, lost or changed in abundance, for example the difference in sRNA populations between treatment and control samples. Similarly, in a chimeric system such as a plant graft, we might want to explore which native sRNAs from the sampled tissue (e.g. leaf) have been gained or lost, or have changed in abundance. The mobile analysis is the workflow for identifying putative mobile sRNAs or mRNAs in a plant graft system.
As input, the pipeline requires cleaned sRNA or mRNA sequencing reads in FASTQ
format, along with the genome assemblies which represent the genotypes in the
system. The diagram below illustrates the complete workflow using mobileRNA,
including essential, optional, and plotting functions.