Benchmarking autonomous agents for predictive maintenance, diagnosis, work orders, and root cause analysis, at scale.
The AI Challenge is organized as part of the 13th International Conference on Data Science (CODS).
This AI challenge invites participants to learn, design, develop, and evaluate autonomous AI agents that solve realistic industrial tasks across the full pipeline: Sensing → Reasoning → Actuation. You'll work with a curated set of scenarios rooted in Industry 4.0 applications such as predictive maintenance, fault diagnosis, work-order generation, and root-cause analysis. These tasks demand both strong single-agent intelligence and coordinated multi-agent behavior.
Participants will interpret heterogeneous data, including textual logs and multivariate time series, and build modular pipelines where agents take on roles such as Work-Order Agent, Time-Series Foundation Model Agent, and Supervisor Agent. The benchmark encourages innovation in the agent development lifecycle, autonomous decision-making, modular reasoning, and collaborative problem-solving under realistic constraints, advancing the next generation of intelligent industrial systems.
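To make the modular-pipeline idea concrete, here is a minimal sketch of a Supervisor Agent coordinating a time-series agent and a work-order agent. The class names, method signatures, and the simple z-score heuristic are illustrative assumptions only; they do not reflect the official starter kit or any required design.

```python
# Minimal illustrative sketch of a modular agent pipeline.
# All class names, signatures, and the z-score heuristic are hypothetical --
# they do not reflect the official starter kit or any required design.
from dataclasses import dataclass, field


@dataclass
class TimeSeriesAgent:
    """Interprets sensor readings and flags anomalous samples."""
    threshold: float = 1.5  # loose z-score cutoff, purely for demonstration

    def analyze(self, readings: list[float]) -> dict:
        mean = sum(readings) / len(readings)
        std = (sum((x - mean) ** 2 for x in readings) / len(readings)) ** 0.5 or 1.0
        anomalies = [i for i, x in enumerate(readings)
                     if abs(x - mean) / std > self.threshold]
        return {"anomalous_indices": anomalies, "mean": mean}


@dataclass
class WorkOrderAgent:
    """Turns a diagnostic finding into a structured work-order record."""

    def create(self, asset_id: str, finding: dict) -> dict:
        return {
            "asset_id": asset_id,
            "action": "inspect" if finding["anomalous_indices"] else "no_action",
            "evidence": finding,
        }


@dataclass
class SupervisorAgent:
    """Coordinates the specialist agents: sensing -> reasoning -> actuation."""
    ts_agent: TimeSeriesAgent = field(default_factory=TimeSeriesAgent)
    wo_agent: WorkOrderAgent = field(default_factory=WorkOrderAgent)

    def run(self, asset_id: str, readings: list[float]) -> dict:
        finding = self.ts_agent.analyze(readings)        # sensing + reasoning
        return self.wo_agent.create(asset_id, finding)   # actuation


if __name__ == "__main__":
    pressures = [101.2, 101.5, 101.1, 250.0, 101.3]  # toy compressor-pressure trace
    print(SupervisorAgent().run("compressor-07", pressures))
```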
Aligned with national priorities, the challenge targets operational inefficiencies and maintenance bottlenecks in the manufacturing sector. Using the provided datasets and tools, teams aim to improve predictive accuracy, increase operational uptime, and reduce costs, outcomes that can shape industry practice and set new benchmarks for broad adoption.
Gain hands-on experience tackling real Agentic AI problems in collaboration with leading researchers.
Unlock opportunities for internships with top startups.
Make real-world impact: top solutions may be considered for deployment.
Get recognized: winners will attend the CODS conference with travel fellowships.
Below are the key milestones and dates for the Agentic AI Challenge. We recommend subscribing to our GitHub repository and Codabench Challenge for submission updates and the starter kit.
Sep 01: Website and dataset release
Sep 21: Registration deadline. Note: scoring begins after the registration deadline; participants can conduct local testing until registration closes.
Nov 13: Submission deadline
Dec 05: Notification of winners
Dec 20: Award ceremony
All deadlines are 11:59 PM AoE (Anywhere on Earth).
Analyze time-series data (e.g., compressor pressure), fetch documentation, suggest failure modes and next steps.
Fuse logs + sensor patterns, identify likely faults, and recommend inspection steps.
Orchestration agent delegates RCA, repair planning, and auto-creates work items.
Guide field personnel through step-by-step procedures for a given root cause/task.
Summarize diagnosis & maintenance, update asset logs with structured records.
Challenges participants to design better prompts that transform complex multi-agent interactions into clear, structured DAG plans, ensuring effective sequencing, communication, and fallback handling.
Challenges participants to move beyond rigid sequential pipelines and design flexible, fault-tolerant workflows that enable parallelism, multi-agent collaboration, dynamic context sharing, and adaptive execution paths.
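For intuition, one possible way to picture a "structured DAG plan" for these tracks is a simple node/dependency encoding like the sketch below. The schema (node names, "agent", "depends_on", "fallback") is an assumption for illustration, not the challenge's required plan or submission format.

```python
# Illustrative only: one possible encoding of a multi-agent plan as a DAG.
# The schema (node names, "agent", "depends_on", "fallback") is an assumption,
# not the challenge's required plan or submission format.
from graphlib import TopologicalSorter  # Python 3.9+

plan = {
    "sense_pressure":   {"agent": "TimeSeriesAgent", "depends_on": []},
    "diagnose_fault":   {"agent": "DiagnosisAgent",  "depends_on": ["sense_pressure"]},
    "plan_repair":      {"agent": "SupervisorAgent", "depends_on": ["diagnose_fault"]},
    "create_workorder": {"agent": "WorkOrderAgent",  "depends_on": ["plan_repair"],
                         "fallback": "escalate_to_human"},
}

# A well-formed plan must be acyclic; TopologicalSorter raises CycleError otherwise,
# and static_order() gives one valid execution sequence.
order = list(TopologicalSorter({k: v["depends_on"] for k, v in plan.items()}).static_order())
print(order)  # ['sense_pressure', 'diagnose_fault', 'plan_repair', 'create_workorder']
```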
You can participate individually or as a team. Submissions must follow our track-specific formats. Resources, data, and examples are provided in the Starter Kit.
Includes dataset samples, baseline code, and submission guidelines. Get started with our GitHub repository.
Upload your outputs via Codabench, where they are ranked live on a leaderboard with track filtering.
Participants submit their modified template files for both tracks, ensuring that only the designated TODO sections are edited and the workflow runs end-to-end to produce valid JSON outputs.
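As a rough illustration of that pattern, a template might look like the sketch below, where only the marked block is edited and the surrounding scaffold, including the JSON serialization, is left intact. The file layout, function name, and output fields are hypothetical; the actual templates and required schema are provided in the Starter Kit.

```python
# Hypothetical illustration of the "edit only the TODO section" pattern.
# The file layout, function name, and output fields are assumptions; the
# actual templates and required JSON schema are provided in the Starter Kit.
import json


def run_workflow(scenario: dict) -> dict:
    # --- TODO: participant logic goes here ---------------------------------
    # e.g., call your agents on scenario["sensor_data"], reason about faults,
    # and assemble a structured answer.
    result = {"root_cause": "bearing wear", "recommended_action": "inspect bearing"}
    # --- end TODO -----------------------------------------------------------
    return result


if __name__ == "__main__":
    scenario = {"asset": "pump-12", "sensor_data": [0.1, 0.2, 0.9]}
    output = run_workflow(scenario)
    print(json.dumps(output))  # the workflow must emit valid, serializable JSON
```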
Participants are allowed and encouraged to form teams.
There are no strict limits on the number of members in a team, but:
Each participant can only join one team.
Each team member must register with a separate Codabench account.
Only one registration per team is required. Individual members do not need to register separately.Β
Specify:
- Team Name
- Team Leader (Contact Person)
- Email Address
- Codabench Username
The competition organizers will review and approve teams via email.
Once approved, the team will be officially registered for the competition.
A team's total number of submissions will be capped according to competition rules:
Stage 1 (Public Phase):
Teams must respect both the daily submission limit and the absolute submission limit.
Current limit: A total of 50 submissions per team. These 50 submissions will be used to compute the public leaderboard.
Based on these 50 evaluations, participants must select one best solution for consideration on the private leaderboard.
If no selection is made, the most recent submission with the highest score will be used by default.
Team's total submissions = (number of submissions per day) × (days since the competition started), capped at 50.
Note: Submissions from all team members are pooled together and counted toward this total.
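The cap formula above works out as in this small sketch; the per-day limit used here is a placeholder, since the live limits are configured on Codabench.

```python
# Worked example of the submission cap. The per-day limit here is a placeholder;
# the live per-day and total limits are configured on Codabench.
DAILY_LIMIT = 5
TOTAL_CAP = 50

def allowed_submissions(days_since_start: int) -> int:
    return min(DAILY_LIMIT * days_since_start, TOTAL_CAP)

print(allowed_submissions(7))   # 35
print(allowed_submissions(30))  # 50 -- the absolute cap applies
```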
Model: all submissions must use the fixed model, LLaMA-3-70B.
Submissions will be evaluated on the following metrics:
- Task Accomplishment (public leaderboard)
- T-match, a semantic measure (published in final testing)
Local Development (Warm-up)
Participants may run their solutions on 2β3 scenarios for local testing and debugging.
Phase 1: Leaderboard Evaluation (Agent Development)
The public leaderboard will be based on 10 selected scenarios drawn from the existing pool of 141 scenarios.
Performance on these scenarios will determine Phase 1 rankings.
Phase 2: Generalization Test (Final Testing)
After Phase 1, participants will be asked to submit a finalized solution.
This solution will then be evaluated on 10 new scenarios drawn from an entirely different set of datasets (outside the original 141) that represent different asset classes.
The final leaderboard will be determined by the weighted average of task accomplishment scores across both phases for the same solution.
Each participantβs performance in Phase 1 and Phase 2 will be measured separately.
The final score is computed as a weighted average of the two phases.
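In other words, with per-phase scores and weights w1 and w2, the final score is a weighted average, as in the sketch below. The weight values shown are placeholders; the organizers define the actual weights used for the final leaderboard.

```python
# Sketch of the weighted-average final score. The weights are placeholders;
# the organizers define the actual values used for the final leaderboard.
def final_score(phase1: float, phase2: float, w1: float = 0.5, w2: float = 0.5) -> float:
    return (w1 * phase1 + w2 * phase2) / (w1 + w2)

print(final_score(phase1=0.82, phase2=0.74))  # 0.78 with equal weights
```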
Top-performing agents may be validated in real industrial settings or high-fidelity simulations.
We plan limited-scale trials with domain experts and compare agent-generated actions (e.g., maintenance scheduling, diagnostics) against expert workflows to assess practical effectiveness and safety.
Questions? Contact us at: CODS-2025 AI-Agent Challenge Team.