Resources

PPPredSS CodeDependencies Here is the list of packages required for running the Python code: pandas torch keras numpy tqdm gensim sklearn nltk spacy networkx en_core_web_sm How to Run Unzip sequences_labels.zip, word2vec_100_10_5.zip, propheno_scoms.zip, and propheno_masks.zip files located in the data directory. Set the data_path to the data directory in the main.py file. Run the main.py file using the following command: python -m main Jupyter Notebook The notebook for the code is also available in the root folder which can be used as an alternative to the main.py.

DeepPPPred DatasetThis dataset contains train, validation, and test sets which contains protein-phenotype relations.

Not seeing anything above? Reauthenticate

Dataset for Automated Medical TranscriptionWe generated this dataset to train a machine learning model for automatically generating psychiatric case notes from doctor-patient conversations. Since, we did have access to real doctor-patient conversations, we used transcripts from two different sources to generate audio recordings of enacted conversations between a doctor and a patient. We employed eight students who worked in pairs to generate these recordings. Six of the transcripts that we used to produce this recordings were hand-written by Cheryl Bristow and rest of the transcripts were adapted from Alexander Street which were generated from real doctor-patient conversations. Our study requires recording the doctor and the patient(s) in seperate channels which is the primary reason behind generating our own audio recordings of the conversations. We used Google Cloud Speech-To-Text API to transcribe the enacted recordings. These newly generated transcripts are auto-generated entirely using AI powered automatic speech recognition whereas the source transcripts are either hand-written or fine-tuned by human transcribers (transcripts from Alexander Street). We provided the generated transcripts back to the students and asked them to write case notes. The students worked independently using a software that we developed earlier for this purpose. The students had past experience of writing case notes and we let the students write case notes as they practiced without any training or instructions from us. NOTE: Audio recordings are not included in Zenodo due to large file size but they are available in the GitHub repository.

nalane/DeepACPpredDeep learning system for predicting anti-cancer peptides - nalane/DeepACPpred

Video: Mohammad Anani's presentation titled "BRret: Retrieval of Brain Research Related Literature" at the AMIA 2020 Virtual Informatics Summit on 03/25/2020 (starts from 3:43).

Data and code associated with RDoC Task 2019Please cite: M. Anani, N. Kazi, M. Kuntz, and I. Kahanda, RDoC Task at BioNLP-OST 2019, 2019 Empirical Methods in Natural Language Processing Conference and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP-2019), Hong Kong, 2019. Link to the paper: https://www.aclweb.org/anthology/D19-5729/

Data associated with Automated Biomedical Text Classification with Research Domain CriteriaEach file contains a set of abstracts for a given RDoC construct. Each file is named after one RDoC category. In each file, each line represents one PubMed abstract that belong to the RDoC category. Each line has a PubMeD ID and an abstract text separated by a tab. The dataset was created on August 2018.

Enhanced Kinase Dictionaries associated with KinDER: A Biocuration Tool for Extracting Kinase Knowledge from Biomedical LiteratureThis zip file contains Kinase dictionaries used for annotating documents with KinDER described in the following paper: Dopp, Daniel, Adam Morrone, and Indika Kahanda. "KinDER: A Biocuration Tool for Extracting Kinase Knowledge from Biomedical Literature." Proceedings of the BioCreative VI Workshop. 2017.

Data and software associated with PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sourcesData and software associated with the paper: PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources

Google Sites

Report abuse