AI4Research - Resources

Resources

Background Literature

Related Tutorial

Scientific Text Mining and Knowledge Graphs. Meng et al., 2020
New Frontiers of Information Extraction. Chen et al., 2022
Retrieval-based Language Models and Applications. Asai et al., 2023
Machine Learning for Theorem Proving. First et al., 2023
Language + Molecules. Edwards et al., 2024
Towards a Human-Computer Collaborative Scientific Paper Lifecycle. Wang et al., 2024

Scientific Language Model

SCIBERT: A Pretrained Language Model for Scientific Text. Beltagy et al., 2019
SciFive: a text-to-text transformer model for biomedical literature. Phan et al., 2021
BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature. Frisoni et al., 2022
LinkBERT: Pretraining Language Models with Document Links. Yasunaga et al., 2022
Galactica: A Large Language Model for Science. Taylor et al., 2022
Translation between Molecules and Natural Language. Edwards et al., 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. Huang et al., 2022
The Diminishing Returns of Masked Language Models to Science. Hong et al., 2023
GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning. Zhao et al., 2023
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Scao et al., 2023
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions. Horawalavithana et al., 2023
Unifying Molecular and Textual Representations via Multi-task Language Modelling. Christofidellis et al., 2023
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models. Chen et al., 2023
SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks. Dey et al., 2024
BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning. Pei et al., 2024

Scientific Information Extraction

Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. Luan et al., 2018
Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction. Hou et al., 2019
SCIREX: A Challenge Dataset for Document-Level Information Extraction. Jain et al., 2020
Cross-lingual Unified Medical Language System entity linking in online health communities. Bitton et al., 2020
Fine-grained Information Extraction from Biomedical Literature based on Knowledge-enriched Abstract Meaning Representation. Zhang et al., 2021
Extracting Material Property Measurement Data from Scientific Articles. Panapitiya et al., 2021
CitationIE: Leveraging the Citation Graph for Scientific Information Extraction. Viswanathan et al., 2021
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts. Cattan et al., 2021
Extracting Fine-Grained Knowledge Graphs of Scientific Claims: Dataset and Transformer-Based Results. Magnusson et al., 2021
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups. Shen et al., 2022
ReSel: N-ary Relation Extraction from Scientific Text and Tables by Learning to Retrieve and Select. Zhuang et al., 2022
MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling. Song et al., 2023
Iterative Document-level Information Extraction via Imitation Learning. Chen et al., 2023
Scim: Intelligent Skimming Support for Scientific Papers. Fok et al., 2023
S2abEL: A Dataset for Entity Linking from Scientific Tables. Lou et al., 2023
ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision. Zhong et al., 2023
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents. Lo et al., 2023
Can NLI Provide Proper Indirect Supervision for Low-resource Biomedical Relation Extraction? Xu et al., 2023
DISCOMAT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles. Gupta et al., 2023
REACTION MINER: An Integrated System for Chemical Reaction Extraction from Textual Data. Zhong et al., 2023
GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning. Zhao et al., 2023
KEBLM: Knowledge-Enhanced Biomedical Language Models. Lai et al., 2023
A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents. Meuschke et al., 2023
Controllable Contrastive Generation for Multilingual Biomedical Entity Linking. Zhu et al., 2024
ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction. Jin et al., 2024
Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction. Wang et al., 2024
Large Language Models for Scientific Information Extraction: An Empirical Study for Virology. Shamsabadi et al., 2024

Scientific Information Retrieval

LitSense: making sense of biomedical literature at sentence level. Allot et al., 2019
CORD-19: The COVID-19 Open Research Dataset. Wang et al., 2020
EVIDENCEMINER: Textual Evidence Discovery for Life Sciences. Wang et al., 2020
Scientific Discourse Tagging for Evidence Extraction. Li et al., 2021
ESRA: Explainable Scientific Research Assistant. Hongwimol et al., 2021
Explaining Relationships Between Scientific Documents. Luu et al., 2021
Neural Extractive Search. Ravfogel et al., 2021
Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries. Edwards et al., 2021
Dataset Construction for Scientific-Document Writing Support by Extracting Related Work Section and Citations from PDF Papers. Kobayashi et al., 2022
DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions. Viswanathan et al., 2023
Predictive Chemistry Augmented with Text Retrieval. Qian et al., 2023
PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge. Wei et al., 2024
PubMed and beyond: biomedical literature search in the age of artificial intelligence. Jin et al., 2024

Hypothesis Generation

Network-based prediction of protein interactions. Kovács et al., 2019
Predicting research trends with semantic and neural networks with an application in quantum physics. Krenn & Zeilinger, 2019
SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search. Hope et al., 2020
Drug repurposing for COVID-19 via knowledge graph completion. Zhang et al., 2021
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation. Wang et al., 2021
Degree-based Feature Is All You Need: Science4Cast Report. Aghajohari et al., 2021
Forecasting the future of artificial intelligence with machine learning-based link prediction in an exponentially growing knowledge network. Krenn et al., 2023
Goal Driven Discovery of Distributional Differences via Language Descriptions. Zhong et al., 2023
Large Language Models are Zero Shot Hypothesis Proposers. Qi et al., 2023
MyCrunchGPT: A chatGPT assisted framework for scientific machine learning. Kumar et al., 2023
Conversational Drug Editing Using Retrieval and Domain Feedback. Liu et al., 2023
DrugAssist: A Large Language Model for Molecule Optimization. Ye et al., 2023
SciMON: Scientific Inspiration Machines Optimized for Novelty. Wang et al., 2024
Forecasting high-impact research topics via machine learning on evolving knowledge graphs. Gu et al., 2024
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models. Baek et al., 2024
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. Si et al., 2024

LLMs as Experimental Agents

PyLabRobot: An Open-Source, Hardware Agnostic Interface for Liquid-Handling Robots and Accessories. Wierenga et al., 2023
Autonomous chemical research with large language models. Boiko et al., 2023
Scaling deep learning for materials discovery. Merchant et al., 2023
An autonomous laboratory for the accelerated synthesis of novel materials. Szymanski et al., 2023
Do Large Language Models Understand Chemistry? A Conversation with ChatGPT. Monteiro et al., 2023
Augmenting large language models with chemistry tools. Bran et al., 2024
Self-Driving Laboratories for Chemistry and Materials. Tom et al., 2024

Paper Draft Generation

Automatic Generation of Related Work Sections in Scientific Papers: An Optimization Approach. Hu et al., 2014
Text Generation from Knowledge Graphs with Graph Transformers. Koncel-Kedziorski et al., 2019
PaperRobot: Incremental Draft Generation of Scientific Ideas. Wang et al., 2019
Automatic Generation of Citation Texts in Scholarly Papers: A Pilot Study. Xing et al., 2020
Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles. Lu et al., 2020
BACO: A Background Knowledge- and Content-Based Framework for Citing Sentence Generation. Ge et al., 2021
AutoCite: Multi-Modal Representation Fusion for Contextual Citation Generation. Wang et al., 2021
SciGen: a Dataset for Reasoning-Aware Text Generation from Scientific Tables. Moosavi et al., 2021
SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation. Chen et al., 2021
SCICAP: Generating Captions for Scientific Figures. Hsu et al., 2021
Automatic Related Work Generation: A Meta Study. Li et al., 2022
Generating Scientific Definitions with Controllable Complexity. August et al., 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking. Wright et al., 2022
CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities. Lee et al., 2022
SCILIT: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation. Gu et al., 2023
CiteBench: A Benchmark for Scientific Citation Text Generation. Funkquist et al., 2023
Enabling Large Language Models to Generate Text with Citations. Gao et al., 2023
Evaluating Unsupervised Argument Aligners via Generation of Conclusions of Structured Scientific Abstracts. Gao et al., 2024

Review Generation

A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications. Kang et al., 2018
ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis. Wang et al., 2020
KID-Review: Knowledge-Guided Scientific Review Generation with Oracle Pre-training. Yuan et al., 2022
Exploiting Labeled and Unlabeled Data via Transformer Fine-tuning for Peer-Review Score Prediction. Muangkammuen et al., 2022
NLPEER: A Unified Resource for the Computational Study of Peer Review. Dycke et al., 2023
ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing. Liu et al., 2023
Scientific Opinion Summarization: Meta-review Generation with Checklist-guided Iterative Introspection. Zeng et al., 2023
When Reviewers Lock Horns: Finding Disagreements in Scientific Peer Reviews. Kumar et al., 2023
CocoSciSum: A Scientific Summarization Toolkit with Compositional Controllability. Ding et al., 2023
Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals. Purkayastha et al., 2023
MARG: Multi-Agent Review Generation for Scientific Papers. D'Arcy et al., 2024

Fact-checking

Fact or Fiction: Verifying Scientific Claims. Wadden et al., 2020
Evidence-based Fact-Checking of Health-related Claims. Sarrouti et al., 2021
Abstract, Rationale, Stance: A Joint Model for Scientific Claim Verification. Zhang et al., 2021
Extracting Fine-Grained Knowledge Graphs of Scientific Claims: Dataset and Transformer-Based Results. Magnusson et al., 2021
MultiVerS: Improving scientific claim verification with weak supervision and full-document context. Wadden et al., 2022
Missing Counter-Evidence Renders NLP Fact-Checking Unrealistic for Misinformation. Glockner et al., 2022
Human and Technological Infrastructures of Fact-checking. Juneja et al., 2022
Check-COVID: Fact-Checking COVID-19 News Claims with Scientific Evidence. Wang et al., 2023
Characterizing and Verifying Scientific Claims: Qualitative Causal Structure is All You Need. Wu et al., 2023
Detecting Contradictory COVID-19 Drug Efficacy Claims from Biomedical Literature. Sosa et al., 2023
SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables. Lu et al., 2023
The student becomes the master: Outperforming GPT3 on Scientific Factual Error Correction. Ashok et al., 2023
The Intended Uses of Automated Fact-Checking Artefacts: Why, How and Who. Schlichtkrull et al., 2023
Comparing Knowledge Sources for Open-Domain Scientific Claim Verification. Vladika et al., 2024
What Makes Medical Claims (Un)Verifiable? Analyzing Entity and Relation Properties for Fact Verification. Wührl et al., 2024

Ethical Concerns and Potential Solutions

Data-driven predictions in the science of science. Clauset et al., 2017
Scientific sleuths spot dishonest ChatGPT use in papers. Conroy et al., 2023
Using AI to write scholarly publications. Hosseini et al., 2023
AI did not write this manuscript, or did it? Can we trick the AI text detector into generated texts? The potential future of ChatGPT and AI in Sports & Exercise Medicine manuscript generation. Anderson et al., 2023
Scientists used ChatGPT to generate an entire paper from scratch — but is it any good? Conroy et al., 2023
Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. Gao et al., 2023
Machine-Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. Crothers et al., 2023
A Watermark for Large Language Models. Kirchenbauer et al., 2023
Preserving Privacy Through DeMemorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models. Kassem et al., 2023
AI vs. Human -- Differentiation Analysis of Scientific Content Generation. Ma et al., 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. Manakul et al., 2023
Foresight: Use and Impact of Artificial Intelligence in the Scientific Process. European Research Council, 2023
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. Li et al., 2023
Scalable Extraction of Training Data from (Production) Language Models. Nasr et al., 2023
FLAME: Factuality-Aware Alignment for Large Language Models. Lin et al., 2024
