HPOLDA

Recently, many long non-coding RNAs (lncRNAs) have been identified and their biological functions characterized; however, understanding of the molecular mechanisms that link them to diseases is still limited. To overcome the limitations of experimentally identifying disease-lncRNA associations, computational methods have been proposed as a powerful tool for predicting such associations. These methods are usually based on similarities between diseases or between lncRNAs, since it has been reported that similar diseases tend to be associated with functionally similar lncRNAs. Prediction performance therefore depends heavily on how well these similarities are captured. Previous studies usually compute the similarity between two diseases by mapping each disease to exactly one Disease Ontology (DO) term and then applying a semantic similarity measure to the two terms. A problem with this approach is that a disease can be described by more than one DO term, and to date DO term annotations are available only for genes, not for diseases. In contrast, the Human Phenotype Ontology (HPO), a controlled vocabulary, is designed to fully annotate human disease phenotypes. Therefore, in this study, we constructed disease similarity networks/matrices using HPO instead of DO. We then used these networks/matrices as inputs to two representative ranking algorithms, one machine learning-based and one network-based: regularized least squares and heterogeneous graph-based inference, respectively. The results showed that both algorithms achieved better prediction performance on HPO-based networks/matrices than on DO-based ones. In addition, our method predicted 21 novel cancer-associated lncRNAs that are supported by evidence in the literature.
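
The pipeline described above (annotate each disease with a set of HPO terms, derive a disease similarity matrix, then rank candidate lncRNAs with regularized least squares) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the disease names, term labels, and association matrix are toy data; Jaccard overlap of term sets stands in for the ontology-based semantic similarity measure; and the closed form F = S(S + σI)⁻¹Y is one common kernel regularized least squares formulation, not necessarily the exact variant used in this study.

```python
import numpy as np


def jaccard_similarity(annotations):
    """Pairwise Jaccard similarity between the HPO term sets of diseases.

    `annotations` maps a disease name to a set of term labels.
    Jaccard is a simple stand-in for a semantic similarity measure.
    """
    names = sorted(annotations)
    n = len(names)
    sim = np.eye(n)  # each disease is fully similar to itself
    for i in range(n):
        for j in range(i + 1, n):
            a, b = annotations[names[i]], annotations[names[j]]
            union = a | b
            value = len(a & b) / len(union) if union else 0.0
            sim[i, j] = sim[j, i] = value
    return names, sim


def rls_scores(sim, assoc, sigma=1.0):
    """Kernel regularized least squares: F = S (S + sigma*I)^-1 Y.

    `sim` is the disease similarity matrix (n x n) and `assoc` the known
    disease-by-lncRNA association matrix (n x m, entries 0 or 1).
    """
    n = sim.shape[0]
    # Solve the linear system instead of forming an explicit inverse.
    return sim @ np.linalg.solve(sim + sigma * np.eye(n), assoc)


# Hypothetical toy data: three diseases annotated with made-up term labels.
annotations = {
    "disease_A": {"term_1", "term_2", "term_3"},
    "disease_B": {"term_2", "term_3", "term_4"},
    "disease_C": {"term_5"},
}
names, sim = jaccard_similarity(annotations)

# Rows follow `names`; columns are two hypothetical lncRNAs.
# disease_A is known to be associated with lncRNA 0, disease_C with lncRNA 1,
# and disease_B has no known associations.
assoc = np.array([[1.0, 0.0],
                  [0.0, 0.0],
                  [0.0, 1.0]])

scores = rls_scores(sim, assoc, sigma=1.0)
for name, row in zip(names, scores):
    print(name, np.round(row, 3))
```

In this toy setting, disease_B shares phenotype terms with disease_A but none with disease_C, so it receives a higher score for lncRNA 0 than for lncRNA 1. This is the intuition behind similarity-based prediction: known associations propagate to phenotypically similar diseases.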