LNRLMI: Linear neighbour representation for predicting lncRNA‐miRNA interactions

Abstract LncRNAs and miRNAs are key molecules in the competing endogenous RNA (ceRNA) mechanism, and their interactions have been found to play important roles in gene regulation. As a supplement to the identification of lncRNA‐miRNA interactions from CLIP‐seq experiments, in silico prediction can select the most promising candidates for experimental validation. Although developing computational tools for predicting lncRNA‐miRNA interactions is of great importance for deciphering the ceRNA mechanism, little effort has been made in this direction. In this paper, we propose an approach based on linear neighbour representation to predict lncRNA‐miRNA interactions (LNRLMI). Specifically, we first constructed a bipartite network by combining the known interaction network with similarities based on the expression profiles of lncRNAs and miRNAs. Based on this data integration, the linear neighbour representation method was introduced to construct a prediction model. To evaluate the prediction performance of the proposed model, k‐fold cross validations were implemented. LNRLMI yielded average AUCs of 0.8475 ± 0.0032, 0.8960 ± 0.0015 and 0.9069 ± 0.0014 on 2‐fold, 5‐fold and 10‐fold cross validation, respectively. A series of comparison experiments with other methods was also conducted, and the results showed that our method is feasible and effective for predicting lncRNA‐miRNA interactions by combining different types of useful side information. It is anticipated that LNRLMI could be a useful tool for predicting the non‐coding RNA regulation networks in which lncRNAs and miRNAs are involved.

one kind of small ncRNA (20-25 nt), can inhibit the translation of mRNA into proteins via mRNA degradation and repression of translation initiation. 7,8 LncRNAs, a loosely classified group of RNA transcripts longer than 200 nt, can regulate gene expression and nuclear architecture by binding to protein partners via structural motifs as well as by interacting with RNA and DNA via base pairing. 9 Although more and more lncRNAs have been found through computational prediction techniques, improved epigenomic technologies and deeper, more sensitive RNA sequencing, only a small number of lncRNAs, such as HOTAIR, XIST and TERC, are well studied.
There is an urgent need to understand the functional roles and mechanisms of other types of lncRNAs. [9][10][11] It is reported that lncRNAs are associated with different kinds of biological molecules, forming a complex mechanism by which the expression of proteins is critically regulated. 12 However, the identification of lncRNA-miRNA interactions based on CLIP-seq experiments is expensive, and the data collection is time-consuming. 13 As a supplement to biological experiments, computational methods can incorporate other useful information and learn the hidden patterns underlying the known lncRNA-miRNA interaction network.
They can efficiently yield the most promising candidates for experimental validation and are therefore attracting increasing attention in the field of non-coding RNA.
To unify the ways in which different types of non-coding RNA act, Salmena et al proposed the competing endogenous RNA (ceRNA) mechanism, in which different non-coding RNAs compete for binding to miRNAs that usually repress target gene expression. 1, 14 More and more experimental and theoretical evidence supports this hypothesis. [15][16][17] To annotate the biological functions of lncRNAs, many studies have investigated the correlation of expression levels between lncRNAs and protein-coding genes, with little consideration of lncRNA-miRNA interactions. [18][19][20][21] As the crosstalk between lncRNAs and miRNAs plays a significant role in biological function, predicting lncRNA-miRNA interactions with efficient approaches can contribute to annotating biological functions. 22 Recent studies show that lncRNAs and miRNAs are involved in the pathological processes of diverse human diseases. 9,23,24 Therefore, much effort has been made to systematically investigate the impact of lncRNA-miRNA interactions. For instance, it is reported that in the vasculature, the downstream targets CERS1, NAT8L and LARP1 can be repressed by the overexpression of miRNAs (miR-4459, miR-4488 and miR-3960) that bind to the lncRNA TGFb2-OT1, which functions as a ceRNA. 25 Xia et al reported that lncRNA-FER1L4 in gastric cancer competes for miR-106a-5p through the corresponding miRNA response elements (MREs) and thereby regulates the expression of CDKN1A, E2F1, HIPK3, IL-10, PAK7, PTEN, RB1, RUNX1 and VEGFA. 26 Du et al investigated prostate cancer and revealed that the lncRNAs TUG1 and CTB-89H12.4, acting as miRNA sponges, can suppress tumours and regulate the expression of phosphatase and tensin homolog (PTEN). 27 Such understanding of the regulation network formed by lncRNAs and miRNAs in pathophysiology can pave the way for new biomarker discovery and therapeutic approaches. However, the number of lncRNA-miRNA interactions identified by biological experiments is still limited.
To accelerate the identification of lncRNA-miRNA interactions, there is an urgent need for effective computational methods that find the most promising lncRNA-miRNA pairs as candidates based on the known interactions. 22,[28][29][30] Most existing computational prediction approaches for miRNA-target interactions are developed according to common rules that focus on four aspects: conservation, seed match, free energy and site accessibility. 9 Some prediction tools for miRNA-target interactions have been proposed. Most of them are based on the observation that the miRNA seed regions of mRNA generally have higher conservation than the non-seed regions. However, this basic assumption conflicts with the fact that lncRNAs have markedly lower sequence conservation and faster evolution than mRNAs. 31,32 Some methods based on calculating the free energy of potential binding sites have been proposed to predict lncRNA-RNA interactions. 33 For instance, LncTar, a prediction tool for lncRNA-RNA interactions, evaluates the free energy of the joint structure of each RNA pair. 31 Although such sequence-based prediction approaches have been widely applied, they suffer from high false-positive rates. 28 Most existing prediction approaches for miRNA-target interactions are not effective for predicting lncRNA-miRNA interactions, because they cannot incorporate the current understanding of lncRNA-miRNA interactions.
Previous research on miRNA-target threshold effects, small RNA (sRNA) regulation and protein-protein interaction (PPI) indicates that lncRNAs and miRNAs can interact with each other according to a titration mechanism. [34][35][36] This finding suggests that the expression levels of lncRNAs and miRNAs are important for their interaction pattern. In addition, previous studies suggest that ceRNA crosstalk is closely related to indirect interactions, the number of MREs, the relative abundance of ceRNAs and miRNAs, and stoichiometry. 37,38 More and more studies on co-expressed genes indicate that associations established by multiple lncRNAs and particular miRNA clusters in a synergistic manner can regulate biological processes. 12,32 Increasing attention is drawn to predict the interactions be-
In this work, considering that all known lncRNA-miRNA interactions are positive samples, we proposed a computational method called LNRLMI to predict potential lncRNA-miRNA interactions. Specifically, it is based on a constructed lncRNA-miRNA bipartite network composed of the similarities of lncRNAs and miRNAs and the known lncRNA-miRNA interaction network. On this network, linear optimization, a semi-supervised model, was introduced to predict new links in the known lncRNA-miRNA interaction network.
To validate the effectiveness of our proposed method, 2-fold cross validation, 5-fold cross validation and 10-fold cross validation were implemented to predict lncRNA-miRNA interactions on the dataset that was collected from the lncRNASNP database. 41 LNRLMI was compared with the state-of-the-art computational approaches such as EPLMI and INLMI that were initially developed for predicting lncRNA-miRNA interactions. Some classical algorithms, such as KATZ measure 42 and LFM, 43 were also imple-

| Data processing
To investigate into potential lncRNA-miRNA interactions, the lncR- tw) 46 , and 272 records of miRNAs in our dataset were collected.
The Lnc-GFP method, based on a coding-non-coding co-expression network, was employed to predict probable biological functions, and the 10 most probable biological functions were taken as the functional annotations for lncRNAs. The third type of information is RNA sequence, collected from the miRBase database (http://www.mirbase.org/) 47 and the LNCipedia database (https://lncipedia.org/) 10 .
These three types of biological information are widely used in bioinformatics research. We consider that these three types of side information are closely related and therefore collectively describe the relations among different lncRNAs/miRNAs with regard to their roles in the regulation network.
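As an illustration of how a similarity matrix could be derived from expression profiles, the following is a minimal sketch that assumes the Pearson correlation coefficient as the similarity measure. This choice is an assumption made for illustration only, as are the function names `pearson` and `similarity_matrix` and the toy profiles; the exact measure should be taken from the original method description.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant profile carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def similarity_matrix(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise similarity of RNAs; row i and column i refer to profile i."""
    return [[pearson(p, q) for q in profiles] for p in profiles]


# Hypothetical expression profiles of three RNAs across four tissues.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
toy_similarity = similarity_matrix(toy_profiles)
```

Applied separately to lncRNA and miRNA profiles, such a routine would give one square matrix per RNA type, which is the shape needed for the similarity blocks of the bipartite network.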

| Linear neighbour representation method for predicting lncRNA-miRNA interactions
In this section, we propose a linear neighbour representation method for predicting lncRNA-miRNA interactions (see Figure 1). 49 Based on the assumption that lncRNAs with similar functions tend to interact with functionally similar miRNAs and vice versa, similarities of RNAs are helpful information for reflecting the correlation between RNAs. As it is reported that the interactions be-
where M can also be treated as a weighted graph G(V, E, W), in which V, E and W denote the vertices, edges and weights, respectively. Note that E and W are related to LMN and to the similarity matrices LSM and MSM, respectively. The link $m_{ij}$ is defined as a weighted link from node i to node j. Here, a score matrix S is defined as follows:
$$S = MC \tag{4}$$
where C is a weight matrix. Specifically, each element $s_{ij}$ of S can be unfolded as a linear summation of contributions from the neighbours of node i, as follows:
$$s_{ij} = \sum_{k} m_{ik} c_{kj} \tag{5}$$
where $c_{kj}$ is the contribution from node k to node j. In the score matrix S, the observed links are used to estimate the rationality of S, and the non-observed ones are undetermined and need to be predicted. For self-consistency, the value of $m_{ij}$ should be positively correlated with the score $s_{ij}$, so S should be close to M, and the magnitude of C should be small. Thus, to obtain the matrix S, C can be obtained by solving the following optimization problem:
$$\min_{C} \; \|M - MC\| + \alpha \|C\| \tag{6}$$
where the parameter α balances the two factors and $\|\cdot\|$ denotes a certain matrix norm. To solve Eq. (6), the squared Frobenius norm is used. That is, the following function is minimized:
$$Q = \|M - MC\|_F^2 + \alpha \|C\|_F^2 \tag{7}$$
where the squared Frobenius norm can be computed as $\|A\|_F^2 = \mathrm{Tr}(A^{T}A)$. Eq. (7) can be unfolded as follows:
$$Q = \mathrm{Tr}\left((M - MC)^{T}(M - MC)\right) + \alpha\,\mathrm{Tr}(C^{T}C) \tag{8}$$
Taking the partial derivative of Q with respect to C gives:
$$\frac{\partial Q}{\partial C} = 2\alpha C - 2M^{T}M + 2M^{T}MC \tag{9}$$
Setting Eq. (9) to 0, the optimal solution of C is obtained as follows:
$$C^{*} = (\alpha E + M^{T}M)^{-1}M^{T}M \tag{10}$$
where E is the identity matrix.
The final score matrix S for link prediction can be computed as follows:
$$S = MC^{*} \tag{11}$$
Finally, the target prediction network is obtained as the block LMN′ of S.
FIGURE 1 The flowchart of the prediction process of LNRLMI
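The closed-form solution described in this section can be sketched in a few lines. The sketch below is illustrative only: it assumes the bipartite matrix has the block layout M = [[LSM, LMN], [LMNᵀ, MSM]] (an assumption, since the definition of M is not reproduced above), uses hypothetical helper names (`build_bipartite`, `lnr_scores`, `predicted_links`), and relies on plain-Python Gauss-Jordan elimination to stay dependency-free; a linear algebra library would normally be used instead.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a X = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def build_bipartite(lsm: Matrix, msm: Matrix, lmn: Matrix) -> Matrix:
    """Assumed layout: M = [[LSM, LMN], [LMN^T, MSM]]."""
    lmn_t = transpose(lmn)
    top = [list(lsm[i]) + list(lmn[i]) for i in range(len(lsm))]
    bottom = [list(lmn_t[j]) + list(msm[j]) for j in range(len(msm))]
    return top + bottom


def lnr_scores(m: Matrix, alpha: float) -> Matrix:
    """S = M C*, where C* solves (alpha E + M^T M) C = M^T M."""
    mtm = matmul(transpose(m), m)
    n = len(mtm)
    reg = [[mtm[i][j] + (alpha if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    return matmul(m, solve(reg, mtm))


def predicted_links(m: Matrix, alpha: float, n_lnc: int) -> Matrix:
    """Read the lncRNA-by-miRNA block (LMN') out of the score matrix S."""
    s = lnr_scores(m, alpha)
    return [row[n_lnc:] for row in s[:n_lnc]]
```

Solving the linear system rather than forming the inverse explicitly gives the same C* and is usually the more stable choice. As α approaches 0 (with M invertible), S approaches M, which matches the role of α as the weight of the penalty on C.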

| Performance evaluation using k-fold cross validation
To

| Evaluation on the effectiveness of using side information
Based on the assumption that lncRNAs with similar profiles tend to interact with the same miRNAs, we implemented 5-fold cross validation 20 times each on a bipartite network combining expression-based similarities with the known interaction network, and on a single-layer network without any side information. Without side information, the highest AUC was 0.8884 and the average AUC was 0.8838 ± 0.0017, whereas the highest AUC of 0.9009 was obtained by using expression profile-based similarity. The ROC curves of the best performance are plotted in Figure 3.
These results support the assumption and show that biological similarity used as side information can improve the performance of the prediction model.
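The evaluation protocol used here, k-fold cross validation scored by AUC, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `k_fold_splits` and `auc` are hypothetical, the known links are assumed to be given as (lncRNA index, miRNA index) pairs, and the AUC is computed with the pairwise (Mann-Whitney) definition, which is quadratic in the number of samples but easy to verify.

```python
import random
from typing import List, Sequence, Tuple

Link = Tuple[int, int]


def k_fold_splits(links: Sequence[Link], k: int, seed: int = 0) -> List[List[Link]]:
    """Shuffle the known links and deal them into k folds of near-equal size."""
    if k < 2 or k > len(links):
        raise ValueError("k must be between 2 and the number of links")
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In each round, one fold of known links would be hidden from the interaction network, the model would be fitted on the remaining links, and the hidden links would be scored against unobserved pairs; repeating the whole procedure with different seeds gives the averages and standard deviations reported in this section.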

| Comparison with different kinds of side information
In this sub-section, other kinds of biological information were also investigated, namely nucleotide sequence information derived from high-throughput sequencing and biological functional information. Two types of similarity were constructed by using sequence data and biological functional data, respectively.
For comparison with the performance achieved by using expression similarity of RNAs, we similarly employed 5-fold cross validation with biological functional similarity and sequence similarity, respectively. As a result, average AUCs of 0.8940 ± 0.0019 and 0.8970 ± 0.0017 were yielded by using function-based similarity and sequence-based similarity, respectively. The best performance was achieved at AUCs of 0.8980 (function-based similarity), 0.9007 (sequence-based similarity) and 0.9009 (expression profile-based similarity). Expression profile-based similarity reached the lowest standard deviation, which demonstrates better stability. All the results are shown in Table 1. The performance of the three types of similarity was close, which reveals that the model might make full use of the side information.
The results obtained with each of the three kinds of similarity showed that side information helps to yield a better result.
FIGURE 2 Performance results of LNRLMI by using 2-fold, 5-fold and 10-fold cross validation
FIGURE 3 Performance results of LNRLMI by using bipartite network and single-layer network

| Comparison with different prediction methods
To evaluate the performance of our proposed method, we compared it with current state-of-the-art computational methods based on the same similarities of lncRNAs and miRNAs (see Table 2). The KATZ measure, a graph-based computational method, was proposed to solve the link prediction problem by computing similarities between nodes and is widely used in social and biological networks. As such a prediction task can also be tackled by matrix completion, LFM was implemented.
We also compared our proposed method with state-of-the-art methods such as EPLMI and INLMI that were previously developed for predicting lncRNA-miRNA interactions. As the first effective technique to predict potential links in the bipartite graph, EPLMI combined two outputs, one based on lncRNAs and one based on miRNAs, by using the two-way diffusion method. INLMI integrated expression profile-based similarity and sequence-based similarity and employed the NFM method and the two-way diffusion method to obtain the prediction results.
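For reference, the KATZ measure used as a baseline above scores a pair of nodes by summing over the walks that connect them, with longer walks damped by a factor β per step. The sketch below is a generic truncated version of that idea, not the configuration used in the comparison: the function name `katz_scores`, the damping factor and the maximum walk length are hypothetical choices for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def katz_scores(adj: Matrix, beta: float, max_len: int) -> Matrix:
    """Truncated Katz index: sum over l = 1..max_len of beta**l * adj**l."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # adj ** 1
    weight = beta
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adj)  # next power of the adjacency matrix
        weight *= beta
    return scores
```

The truncation keeps the sketch simple; the untruncated series converges only when β is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.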

| Sensitivity to hyper-parameter
Our proposed method has one hyper-parameter, α, which balances the two factors in the optimization problem. We studied the sensitivity to α by varying it from 0.006 to 0.04 at an interval of 0.002, repeating the experiment 20 times for each value of α. The highest AUC was yielded when α was 0.018. To refine the search, α values of 0.017 and 0.019 were also tested. As shown in Figure 4, the best performance was achieved with α of 0.018. The box plot in Figure 4 shows that the AUC values follow a bell-shaped curve over α, which demonstrates that the proposed model can be easily optimized.
In addition, the prediction performance tends to be stable, with AUCs of around 0.895, when α increases up to 0.014. Therefore, we consider the proposed model robust to the setting of the hyper-parameter, which is important for its application to various large datasets.
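The search over α described above amounts to a one-dimensional grid search with repeated evaluation. The following sketch shows one way to organise it; `alpha_grid` and `select_alpha` are hypothetical names, and `evaluate` stands for any user-supplied function that returns the AUC of one cross-validation run for a given α and repetition index.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def alpha_grid(start: float = 0.006, stop: float = 0.04,
               step: float = 0.002) -> List[float]:
    """Evenly spaced candidate values, rounded to suppress float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 6) for i in range(count)]


def select_alpha(alphas: Sequence[float],
                 evaluate: Callable[[float, int], float],
                 repeats: int = 20) -> Tuple[float, Dict[float, float]]:
    """Return the alpha with the highest mean AUC over repeated runs."""
    mean_auc = {a: mean(evaluate(a, r) for r in range(repeats)) for a in alphas}
    best = max(mean_auc, key=mean_auc.get)
    return best, mean_auc
```

With the default arguments the grid contains 18 values; the two refinement points 0.017 and 0.019 can simply be appended to the list before calling `select_alpha`.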

| DISCUSSION
In this work, we aimed to develop a robust method to investigate the potential lncRNA-miRNA interaction from the current lim-
We anticipate that LNRLMI can offer insights into the mechanism of the ceRNA regulation networks in which lncRNAs and miRNAs are involved. Different from traditional prediction tools that focus on binding sites, LNRLMI is based only on the network structure of lncRNA-miRNA interactions with node attributes. Accordingly, we defined the model as a semi-supervised one that yields prediction results for lncRNA-miRNA interactions by using the bipartite network that combines the expression similarities of lncRNAs and miRNAs with the known lncRNA-miRNA interaction network.
Even though LNRLMI is effective and reliable as demonstrated by the experimental results, some of its limitations should be noted.
Imbalanced numbers of samples for different lncRNAs/miRNAs might result in prediction bias. Moreover, as lncRNAs/miRNAs are studied further, better prediction results can be expected owing to a more complete lncRNA-miRNA interaction network.

ACKNOWLEDGEMENTS
This work is supported in part by the National Natural Science

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.

AUTHOR CONTRIBUTION
LW conceived the project, developed the prediction method, de-

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in [lncR-