RNA Integrity Number (RIN)
By Benjamin Musrie, Medical Researcher. Sydney, Australia.
Focussing on the genomics side of ALS means crunching a lot of genomic data. Gene expression analysis always includes RNA samples. Determining the quality of RNA samples is essential to ensure that the resulting gene expression analysis is both accurate and precise. As a result, assessment of RNA integrity number (RIN) has become standard practice prior to any RNA sequencing study, as it was shown to be robust and reproducible when compared to other RNA integrity calculation algorithms such as the 28S:18S ribosomal subunit ratio (Imbeaud et al., 2005).
RNA degradation, which is what RIN measures, is a major source of variance in experiments, because when transcripts degrade, their measured expression is altered. A prime example may be seen in the GLM column of Table 1 below, where RNA from identical samples at different time points is compared with the sample with the least degraded RNA (time point 0). As time progresses, more and more genes appear “differentially expressed”, even though none of them should be. After 84 hours, 10,000 genes were “differentially expressed” solely because of RNA degradation. The same analysis also shows that controlling for RIN in the linear model largely removes this effect, indicating that RIN is a good measure of RNA degradation.
Because RNA degradation is such a strong driver of variance in gene expression analysis, samples with varying levels of degradation will interfere with your results. For example, let’s assume we have two groups (ALS and CONTROL), each containing a number of samples, and each sample has a different RIN value. The degradation reflected by each RIN value alters the measured expression levels in that sample. This can have two effects:
1. It can make it harder to find genes that are truly differentially expressed between ALS and CONTROL
2. It can result in false positives (genes that are not related to ALS appear significantly differentially expressed because of degradation)
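The second effect can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis described in this article: the function names (`simulate`, `group_effect`), the RIN ranges, the noise level and the size of the RIN effect are all invented for illustration. One gene is simulated with no true ALS effect, but the ALS samples are assumed to have lower RIN than the CONTROL samples. A model that ignores RIN reports a large group difference, while a model that includes RIN as a covariate shrinks it towards zero.

```python
import numpy as np


def simulate(n_per_group=50, seed=0):
    """Simulate one gene whose expression depends on RIN only (no disease effect)."""
    rng = np.random.default_rng(seed)
    is_als = np.repeat([0.0, 1.0], n_per_group)  # 0 = CONTROL, 1 = ALS
    # Assumption for illustration: ALS samples are more degraded than controls.
    rin = np.concatenate([
        rng.uniform(7, 9, n_per_group),  # CONTROL RIN values
        rng.uniform(4, 6, n_per_group),  # ALS RIN values
    ])
    # Expression rises with RIN; there is deliberately no term for disease status.
    expression = 5.0 + 0.5 * rin + rng.normal(0.0, 0.3, 2 * n_per_group)
    return expression, is_als, rin


def group_effect(expression, is_als, rin=None):
    """Least-squares estimate of the ALS-vs-CONTROL coefficient.

    If rin is given, it is added to the design matrix as a covariate.
    """
    columns = [np.ones_like(expression), is_als]
    if rin is not None:
        columns.append(rin)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return coef[1]  # coefficient of the ALS indicator


expression, is_als, rin = simulate()
naive = group_effect(expression, is_als)
adjusted = group_effect(expression, is_als, rin)
print(f"group effect ignoring RIN:      {naive:.2f}")
print(f"group effect adjusting for RIN: {adjusted:.2f}")
```

In a real study the same idea is usually expressed by adding RIN to the design formula of a differential expression tool rather than fitting each gene by hand.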
To obtain high-quality sequencing data, we need high-quality starting material. For RNA-seq data, this means using RNA that has a high RIN value to avoid the RNA degradation problem. However, if you are working with post-mortem samples, you do not have this luxury. The tissues you receive may already have undergone some degree of RNA degradation. If we still want a large sample size, we need to account for RIN to ensure we do not obtain too many false positives or false negatives.
RIN is graded on a scale from 1 to 10 (10 = excellent). The older method, the 28S:18S ratio, relied on the assumption that the quality of ribosomal RNA (rRNA), which is a very stable molecule, linearly reflects the quality of mRNA, which in reality is far less stable and experiences higher turnover (Palmer and Prediger, 2017). Copois and colleagues compared the performance of different methods of assessing RNA quality: the 28S:18S ratio, RIN and their in-house RNA quality scale (RQS). They found that RIN and RQS have a similar capacity to detect reliable RNA samples, whereas the 28S:18S ratio leads to misleading categorisation (Copois et al., 2007). The RIN method is an improvement because it takes into account the whole RNA sample, not just the rRNA measurement.
One must remember that the sample preparation method has a significant bearing on the relationship between RIN and sequence data quality. Chen et al. showed that samples with a RIN of 9.4 still correlated >95% with the same RNA sample manually degraded to a RIN of 4.5 when samples were prepared for sequencing using Ribozero or NuGen FFPE methods. It is also important to note that RNA sample degradation manifests disproportionately in transcripts that encode pseudogenes, short non-coding RNAs and transcripts with extended 3’ untranslated regions (Reiman, Laan, Rull and Sõber, 2017).
GenieUs has recently produced a large analysis of differential gene expression and alternative splicing in the New York Genome Center’s (NYGC) post-mortem multi-tissue ALS RNA-sequencing data set. As mentioned above, in a post-mortem study it is often impossible to guarantee that all samples are harvested from the donor’s body at the same time, and it can never be guaranteed that all RNA samples taken from human research participants will be of the same quality. This can have a major impact across the experimental cohort, which in turn can make the data challenging to interpret. We decided to investigate and quantify the effect size of RIN on variability within the data set. We also quantified the effect of post-mortem interval (PMI) and sex on data variability within the cohort.
We found that many of the differences in the RNA expression profiles of the post-mortem samples were due to RIN rather than the disease condition (approximately 30% of DEGs were discovered due to RIN, PMI and sex). This meant that our interpretation of the data would have been misguided if we had not accounted for RIN, PMI and sex.
An example may be seen in the Euler diagrams (Figure 1) below. Focusing on Spinal_Cord_Cervical specifically, if we do not adjust for RIN+PMI+Sex (unadjusted), 802 differentially expressed genes (DEGs) are detected. When we do adjust for RIN+PMI+Sex, we detect 152 DEGs that were missed in the unadjusted analysis. This illustrates the presence of both false positive and false negative results that emerge due to the effects of RIN, PMI and sex, with RIN having the largest effect. With this in mind, accounting for these variables in any analysis of RNA sequencing data is strongly advised, as their effect size can be larger than you might expect!
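The regions of an Euler diagram like this one are simply set operations on the two DEG lists. The sketch below assumes nothing about how the figure itself was produced: the function name `compare_deg_sets` and the gene labels are hypothetical placeholders, not genes from the NYGC analysis.

```python
def compare_deg_sets(unadjusted, adjusted):
    """Split two DEG lists into the three regions of a two-set Euler diagram."""
    unadjusted, adjusted = set(unadjusted), set(adjusted)
    return {
        # Detected only without covariates: candidate false positives.
        "unadjusted_only": sorted(unadjusted - adjusted),
        # Detected either way.
        "shared": sorted(unadjusted & adjusted),
        # Detected only after adjustment: candidates missed by the unadjusted analysis.
        "adjusted_only": sorted(adjusted - unadjusted),
    }


# Hypothetical gene labels, for illustration only.
unadjusted_degs = ["GENE_A", "GENE_B", "GENE_C"]
adjusted_degs = ["GENE_B", "GENE_C", "GENE_D"]

regions = compare_deg_sets(unadjusted_degs, adjusted_degs)
for name, genes in regions.items():
    print(name, len(genes), genes)
```

Reporting the size of each region, rather than only the total number of DEGs per analysis, makes it clear how many genes change status once the covariates are included.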
Koppelkamm and colleagues examined RNA integrity in post-mortem samples of three tissue types (brain, skeletal muscle and cardiac muscle) to identify the parameters that influence it and the implications for reverse transcription quantitative polymerase chain reaction (RT-qPCR) assays, a molecular biology technique used to measure cellular mRNA levels. The influence of RNA integrity on the reliability of quantitative gene expression data was analysed by generating degradation profiles for three gene transcripts. They demonstrated that RT-qPCR performance is affected by impaired RNA integrity. Interestingly, they revealed significant RIN differences among the three tissue types, with brain having lower RNA integrity than skeletal and cardiac muscle. BMI was also shown to influence RNA integrity: RIN values of samples from deceased donors with a BMI > 25 were significantly lower than those from normal-weight donors. Normalising the data was found to partly lessen the effects caused by impaired RNA quality. They concluded that in post-mortem tissue with low RIN, the detection of large differences in gene expression may still be possible, whereas small expression differences are prone to misinterpretation due to the effects of degradation (Koppelkamm, Vennemann, Lutz-Bonengel, Fracasso and Vennemann).
Similarly, Opitz et al. wanted to evaluate the impact of RNA degradation on gene expression profiling. To investigate the effects, whole-genome expression arrays were used on tumor biopsies from patients diagnosed with advanced rectal cancer. The mRNA was subjected to heat-induced degradation in a time-dependent manner. Expression profiling was then performed, and the data were analysed to assess the impact. They found that the differences induced by RNA degradation were significantly outweighed by the biological differences between patients. Only a small number of mRNAs (0.6%) showed a significant effect due to degradation; these were short mRNAs with the probe binding site close to the 5’ end. They concluded that these types of mRNAs should be excluded from gene expression analysis when working with degraded mRNA, e.g., from post-mortem samples (Opitz et al., 2010).
Furthermore, Shen and colleagues assessed the effects of RIN values on RNA-sequencing results. They wanted to find the RIN threshold above which sample quality would have very little effect on the results. The RNA was extracted from blood samples. Their results showed that RIN values > 5.3 barely affected the quantitative results of RNA-seq, whereas samples with RIN < 5.3 showed more variability (Shen et al., 2018).
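A threshold like this can be used to flag samples for closer inspection before analysis. The sketch below is only an illustration: the function name `split_by_rin` and the sample names are hypothetical, the 5.3 default is taken from the blood-derived result above and may not transfer to other tissues or library preparations, and treating a RIN of exactly 5.3 as flagged is an assumption.

```python
def split_by_rin(sample_rins, threshold=5.3):
    """Split samples into those above a RIN threshold and those to flag.

    sample_rins maps a sample name to its RIN value. A sample exactly at the
    threshold is flagged (an assumption; the cited result does not specify this).
    """
    passing = {name: rin for name, rin in sample_rins.items() if rin > threshold}
    flagged = {name: rin for name, rin in sample_rins.items() if rin <= threshold}
    return passing, flagged


# Hypothetical samples, for illustration only.
samples = {"sample_1": 8.7, "sample_2": 5.3, "sample_3": 4.1, "sample_4": 6.0}
passing, flagged = split_by_rin(samples)
print("passing:", passing)
print("flagged:", flagged)
```

Flagging rather than discarding keeps the sample size intact, so low-RIN samples can still be included with RIN as a covariate.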
To sum up, accounting for RIN is critical when performing gene expression analysis. RIN can have a highly significant influence on your data and can lead to false positives (DEGs that are not involved with the condition of interest) as well as false negatives. Even though using only samples with high RIN values is an attractive solution, it is often impractical, as it is impossible to control the degree of degradation in samples that have already degraded, e.g., post-mortem samples. We encourage all researchers to account for the effects of RIN when analysing samples, as doing so will reduce the effect of RNA degradation on your analysis and the variance in your experiments. We would also like to thank institutions like the New York Genome Center that provide highly valuable samples to researchers worldwide, allowing us to accelerate research.