MongoDB-specific roles for SRE, DBA, and Architect

MongoDB daily administration tasks when using Ops Manager

Cluster health monitoring

Check cluster, replica set, and sharded cluster status in Ops Manager
Review alerts for node down, replication lag, election events, or disk pressure
Verify that all agents (Automation, Monitoring, Backup) are running and healthy; newer Ops Manager versions combine these functions into a single MongoDB Agent

Ops Manager use: Real-time dashboards, alerting, and topology view

Performance monitoring

Review key metrics: CPU, memory, disk I/O, network, connections
Analyze slow queries and query execution plans
Watch WiredTiger cache usage and eviction rates

Ops Manager use: Performance Advisor, Query Profiler, Metrics Explorer

Backup verification

Ensure scheduled backups completed successfully
Validate snapshot retention and storage usage
Perform periodic restore tests (to a staging or test cluster)

Ops Manager use: Snapshot backups, point-in-time recovery, restore workflows

Automation & configuration management

Confirm that the actual state matches the desired state (no configuration drift)
Review recent automation changes or deployments
Safely apply config changes (storage, parameters, version upgrades)

Ops Manager use: Automation Agent, versioned configuration management

User & security management

Review database users and roles
Rotate credentials if required
Verify TLS, authentication, and authorization settings

Ops Manager use: Centralized user management, security configuration tracking

Capacity planning

Track data growth trends
Monitor disk utilization and index sizes
Plan scale-up or scale-out (add nodes or shards)

Ops Manager use: Historical metrics, capacity graphs

Alert management

Review triggered alerts and acknowledgments
Tune alert thresholds to avoid noise
Investigate recurring alerts and apply fixes

Index and schema optimization

Review index usage and unused indexes
Apply recommendations from Performance Advisor
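Coordinate index builds (since MongoDB 4.2 the foreground/background option is ignored and all builds use an optimized process, so schedule large builds for low-traffic windows or use rolling builds)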

Ops Manager use: Index suggestions, impact analysis

Log review & troubleshooting

Check MongoDB logs for warnings or errors
Correlate logs with performance spikes or failures
Investigate issues such as replication lag, step-downs, and out-of-memory (OOM) kills

Compliance & audit readiness

Review access logs and audit events
Ensure backup policies meet compliance requirements
Document operational changes

Sharding Components

Shard
A replica set that stores a subset of the data

Config servers (CSRS)
Store metadata about chunks and shard keys

mongos
Query router that directs client requests to the right shard(s)

Sharding troubleshooting scenarios

Uneven data distribution

Possible causes:

Poor shard key
Balancer disabled
Jumbo chunks

Check (connected through mongos):

db.collection.getShardDistribution()

sh.getBalancerState()
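To turn "uneven" into a number, the sketch below compares each shard's data size with the per-shard mean. It assumes you have copied the per-shard data sizes (for example from the getShardDistribution() output) into a plain object yourself; the function name and the 20% tolerance are hypothetical choices, not MongoDB defaults.

```javascript
// Sketch: flag shards whose data size deviates from the per-shard mean.
// Input: { shardName: dataSize } transcribed by hand (assumption), e.g. from
// db.collection.getShardDistribution() output. Tolerance 0.2 = 20% (hypothetical).
function shardSkewReport(sizesByShard, tolerance = 0.2) {
  const names = Object.keys(sizesByShard);
  if (names.length === 0) return { mean: 0, skewed: [] };
  const total = names.reduce((sum, n) => sum + sizesByShard[n], 0);
  const mean = total / names.length;
  const skewed = names
    .map((name) => ({
      name,
      // Relative deviation from the mean: +0.5 means 50% above average.
      deviation: mean === 0 ? 0 : (sizesByShard[name] - mean) / mean,
    }))
    .filter((s) => Math.abs(s.deviation) > tolerance);
  return { mean, skewed };
}

const report = shardSkewReport({ shard0: 120e9, shard1: 40e9, shard2: 80e9 });
console.log(report);
```

A skewed result points back at the causes listed above: check the shard key's cardinality, whether the balancer is enabled, and whether jumbo chunks are blocking migrations.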

Mapping MongoDB Ops Manager commands to SRE and DBA role expectations. This section is written the way interview panels think about responsibility: practical and production-oriented, so you can confidently explain what you ran and why.

1. Monitoring & reliability (SRE core responsibility)

Expectation:
Ensure clusters are always healthy and issues are detected early.

Commands / actions

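systemctl status mongodb-mms-automation-agent

systemctl status mongodb-mms-monitoring-agent (legacy deployments only; newer Ops Manager versions run monitoring and backup inside the unified MongoDB Agent, which uses the automation-agent service above)

rs.status()

rs.printSecondaryReplicationInfo() (replaces rs.printSlaveReplicationInfo(), deprecated since MongoDB 4.4.1)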

What this proves

You understand monitoring dependencies (agents first)
You know how to detect replication lag and node failures

Interview phrasing

“From an SRE point of view, my priority is cluster availability and replication health, which I validate through Ops Manager alerts and replica set status.”
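Replication lag can also be computed directly from rs.status(), which is handy when scripting checks. The sketch below uses the members[].stateStr and members[].optimeDate fields and measures how far each secondary's last applied operation trails the primary's; the function name and sample hosts are illustrative.

```javascript
// Sketch: per-secondary replication lag (seconds) from an rs.status() document.
// Lag = primary's optimeDate minus the secondary's optimeDate.
// Arbiters and members in other states are ignored.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY in rs.status() output");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}

// In mongosh: replicationLagSeconds(rs.status())
const sample = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:08Z") },
    { name: "db3:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
  ],
};
console.log(replicationLagSeconds(sample));
```

Throwing when no primary is found is deliberate: a replica set without a primary is itself the incident, and lag figures would be meaningless.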

2. Incident response & troubleshooting (SRE heavy, DBA involved)

Expectation:
Respond quickly, identify root cause, and restore service.

Commands

db.currentOp({ "secs_running": { $gt: 5 } })

tail -f /var/log/mongodb/mongod.log

db.serverStatus().wiredTiger.cache

What this proves

You can correlate performance spikes with queries and system resources
You understand memory pressure and cache eviction issues

Interview phrasing

“During incidents, I correlate Ops Manager metrics with logs and current operations to identify whether the issue is query-related, memory pressure, or infrastructure.”
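The raw db.serverStatus().wiredTiger.cache document is long, so a small summary helps during an incident. The sketch below uses the serverStatus fields "maximum bytes configured", "bytes currently in the cache", and "tracked dirty bytes in the cache". The 80%/95% (total) and 5%/20% (dirty) reference points are commonly cited WiredTiger eviction target/trigger defaults; treat them as assumptions and confirm them for your version.

```javascript
// Sketch: summarize WiredTiger cache pressure from db.serverStatus().wiredTiger.cache.
// Thresholds (80/95 used, 5/20 dirty) are assumed eviction target/trigger defaults.
function num(v) {
  // mongosh may return BSON Long values, which expose toNumber().
  return v && typeof v.toNumber === "function" ? v.toNumber() : Number(v);
}

function cachePressure(cache) {
  const max = num(cache["maximum bytes configured"]);
  const usedPct = (num(cache["bytes currently in the cache"]) * 100) / max;
  const dirtyPct = (num(cache["tracked dirty bytes in the cache"]) * 100) / max;
  let level = "ok";
  // Above the target, background eviction threads work harder.
  if (usedPct > 80 || dirtyPct > 5) level = "above eviction target";
  // Above the trigger, application threads may be pulled into eviction (latency).
  if (usedPct > 95 || dirtyPct > 20) level = "above eviction trigger";
  return { usedPct, dirtyPct, level };
}

// In mongosh: cachePressure(db.serverStatus().wiredTiger.cache)
console.log(
  cachePressure({
    "maximum bytes configured": 1000,
    "bytes currently in the cache": 850,
    "tracked dirty bytes in the cache": 30,
  })
);
```

An "above eviction trigger" result that lines up with a latency spike in Ops Manager is a strong hint that the incident is memory pressure rather than a single bad query.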

3. Performance tuning (DBA primary responsibility)

Expectation:
Ensure MongoDB runs efficiently at scale.

Commands

db.setProfilingLevel(1, { slowms: 100 })

db.getProfilingStatus()

db.collection.aggregate([{ $indexStats: {} }])

What this proves

You know how to identify slow queries safely
You don’t drop indexes blindly

Interview phrasing

“As a DBA, I rely on the Ops Manager Performance Advisor and $indexStats before making any schema or index changes.”
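"Not dropping indexes blindly" can be made concrete by filtering $indexStats output. The sketch below lists non-_id indexes with zero recorded accesses. Keep in mind that accesses.ops is per host and resets when mongod restarts, so a zero is a reason to investigate (check every member and the `since` date), not proof that an index is safe to drop. The "orders" collection and index names are hypothetical.

```javascript
// Sketch: candidate unused indexes from db.collection.aggregate([{ $indexStats: {} }]).
// accesses.ops is per host and resets on restart: verify on all members first.
function toNumber(v) {
  // mongosh may return BSON Long values, which expose toNumber().
  return v && typeof v.toNumber === "function" ? v.toNumber() : Number(v);
}

function unusedIndexCandidates(indexStats) {
  return indexStats
    .filter((s) => s.name !== "_id_" && toNumber(s.accesses.ops) === 0)
    .map((s) => ({ name: s.name, host: s.host, since: s.accesses.since }));
}

// In mongosh (hypothetical "orders" collection):
//   unusedIndexCandidates(db.orders.aggregate([{ $indexStats: {} }]).toArray())
const stats = [
  { name: "_id_", host: "db1:27017", accesses: { ops: 0, since: new Date("2024-01-01T00:00:00Z") } },
  { name: "status_1", host: "db1:27017", accesses: { ops: 1523, since: new Date("2024-01-01T00:00:00Z") } },
  { name: "legacyField_1", host: "db1:27017", accesses: { ops: 0, since: new Date("2024-01-01T00:00:00Z") } },
];
console.log(unusedIndexCandidates(stats));
```

On MongoDB 4.4 and later, hiding a candidate index with db.collection.hideIndex() before dropping it is a reversible way to confirm nothing depends on it.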

4. Backup & disaster recovery (SRE + DBA shared)

Expectation:
Data must be recoverable at any time.

Ops Manager API

GET /api/public/v1.0/groups/{GROUP-ID}/clusters/{CLUSTER-ID}/snapshots

POST /api/public/v1.0/groups/{GROUP-ID}/clusters/{CLUSTER-ID}/restoreJobs

What this proves

You understand RPO/RTO concepts
You test restores, not just backups

Interview phrasing

“We verify backups daily and periodically restore snapshots to staging to validate recoverability.”
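A daily backup check can be reduced to "is the newest complete snapshot inside the RPO window?". The sketch below assumes `response` is the parsed JSON body of the snapshots endpoint above and that each results[] entry carries `complete` and `created.date`; verify those field names against your Ops Manager version. The HTTP call itself (which uses digest authentication with an API key) is left out, and the function name is hypothetical.

```javascript
// Sketch: is the newest complete snapshot within the RPO window?
// Assumption: `response` = parsed JSON from
// GET /api/public/v1.0/groups/{GROUP-ID}/clusters/{CLUSTER-ID}/snapshots,
// with results[].complete and results[].created.date (verify for your version).
function checkSnapshotRpo(response, rpoHours, now = new Date()) {
  const complete = (response.results || []).filter((s) => s.complete);
  if (complete.length === 0) return { ok: false, reason: "no complete snapshots" };
  const newestMs = Math.max(...complete.map((s) => new Date(s.created.date).getTime()));
  const ageHours = (now.getTime() - newestMs) / 3600000;
  return { ok: ageHours <= rpoHours, newest: new Date(newestMs).toISOString(), ageHours };
}

const snapshots = {
  results: [
    { id: "a", complete: true, created: { date: "2024-01-01T00:00:00Z" } },
    { id: "b", complete: true, created: { date: "2024-01-01T06:00:00Z" } },
    { id: "c", complete: false, created: { date: "2024-01-01T12:00:00Z" } },
  ],
};
console.log(checkSnapshotRpo(snapshots, 24, new Date("2024-01-01T18:00:00Z")));
```

Ignoring incomplete snapshots matters: an in-progress snapshot is not restorable, so counting it would overstate recoverability.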

5. Change management & automation (SRE mindset)

Expectation:
Make changes safely with zero or minimal downtime.

Actions

Rolling restarts via Ops Manager
Version upgrades using Automation Agent

Validation

db.version()

What this proves

You follow controlled change processes
You avoid manual restarts in production

Interview phrasing

“All production changes go through Ops Manager automation to avoid configuration drift and ensure safe rollouts.”
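After a rolling upgrade, the db.version() check is only meaningful if it is run on every member. The sketch below assumes you have already collected one db.version() string per host (by connecting to each member or from the Ops Manager deployment view) and lists the stragglers; host names and version strings are hypothetical. A matching binary version does not mean the featureCompatibilityVersion has been raised; that is a separate step.

```javascript
// Sketch: list members not yet running the target version after a rolling upgrade.
// Assumption: versionsByHost maps host -> db.version() string collected per member.
function versionStragglers(versionsByHost, targetVersion) {
  return Object.entries(versionsByHost)
    .filter(([, version]) => version !== targetVersion)
    .map(([host, version]) => ({ host, version }));
}

console.log(
  versionStragglers(
    { "db1:27017": "7.0.12", "db2:27017": "7.0.12", "db3:27017": "6.0.16" },
    "7.0.12"
  )
);
```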

6. Security & access control (DBA ownership)

Expectation:
Ensure secure access without breaking applications.

Commands

db.getUsers()

db.runCommand({ connectionStatus: 1 })

What this proves

You understand authentication and authorization
You verify access impact before changes

Interview phrasing

“I regularly audit users and roles through Ops Manager and validate access at the database level.”
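A user audit usually starts by finding accounts with broad built-in roles. Depending on the shell, db.getUsers() returns either a document with a `users` array (mongosh) or a bare array (legacy mongo shell); the sketch accepts both. The set of roles treated as "broad" is a hypothetical policy choice, and custom roles that inherit these privileges are not detected here.

```javascript
// Sketch: flag users holding broad built-in roles, from db.getUsers() output.
// BROAD_ROLES is an illustrative policy list; adjust for your environment.
const BROAD_ROLES = new Set([
  "root",
  "userAdminAnyDatabase",
  "dbAdminAnyDatabase",
  "readWriteAnyDatabase",
  "clusterAdmin",
]);

function privilegedUsers(getUsersResult) {
  // Accept both { users: [...] } and a bare array.
  const users = Array.isArray(getUsersResult) ? getUsersResult : getUsersResult.users;
  return users
    .map((u) => ({
      user: `${u.user}@${u.db}`,
      broadRoles: u.roles.filter((r) => BROAD_ROLES.has(r.role)).map((r) => r.role),
    }))
    .filter((u) => u.broadRoles.length > 0);
}

// In mongosh: privilegedUsers(db.getSiblingDB("admin").getUsers())
const usersSample = {
  users: [
    { user: "admin", db: "admin", roles: [{ role: "root", db: "admin" }] },
    { user: "app", db: "orders", roles: [{ role: "readWrite", db: "orders" }] },
  ],
  ok: 1,
};
console.log(privilegedUsers(usersSample));
```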

7. Capacity planning (SRE + DBA)

Expectation:
Prevent outages caused by resource exhaustion.

Commands

db.stats(1024*1024) (scale factor 1024*1024 reports sizes in MB)

Ops Manager metrics:

Disk growth
Cache utilization
Connections

What this proves

You plan ahead, not react
You understand growth trends

Interview phrasing

“I use Ops Manager historical metrics to plan storage and scaling well before thresholds are hit.”
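"Well before thresholds are hit" can be expressed as a simple runway estimate. The sketch below is a naive linear projection: it assumes roughly linear growth, takes used and total disk in MB (for example the fsUsedSize and fsTotalSize fields of db.stats(1024*1024)), and a daily growth rate read off the Ops Manager historical disk charts. The function name and the 80% threshold are hypothetical.

```javascript
// Sketch: days until disk usage crosses a threshold, assuming linear growth.
// usedMB/totalMB e.g. from db.stats(1024*1024).fsUsedSize / fsTotalSize;
// dailyGrowthMB estimated from Ops Manager historical metrics.
function daysUntilThreshold(usedMB, totalMB, dailyGrowthMB, thresholdFraction = 0.8) {
  const limitMB = totalMB * thresholdFraction;
  if (usedMB >= limitMB) return 0; // already over the threshold
  if (dailyGrowthMB <= 0) return Infinity; // flat or shrinking usage
  return (limitMB - usedMB) / dailyGrowthMB;
}

console.log(daysUntilThreshold(600, 1000, 10)); // 200 MB of headroom at 10 MB/day
```

If the runway is shorter than the lead time for adding storage or a shard, that is the signal to start the scale-up or scale-out work.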

8. Automation & API usage (senior-level expectation)

Expectation:
Reduce manual work and support audits.

Example

GET /api/public/v1.0/groups/{GROUP-ID}/clusters

What this proves

You can integrate Ops Manager into scripts
You support compliance and reporting

Interview phrasing

“We use Ops Manager APIs for inventory, audit reports, and backup verification.”
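For an inventory or audit report, the clusters endpoint output can be grouped by deployment type. The sketch assumes `response` is the parsed JSON body of GET /api/public/v1.0/groups/{GROUP-ID}/clusters and that each results[] entry has clusterName and typeName (values such as REPLICA_SET or SHARDED_CLUSTER); verify these against your Ops Manager version. Cluster names in the sample are hypothetical.

```javascript
// Sketch: group a project's clusters by type for an inventory report.
// Assumption: results[].clusterName and results[].typeName exist in the API response.
function clusterInventory(response) {
  const byType = {};
  for (const c of response.results || []) {
    if (!byType[c.typeName]) byType[c.typeName] = [];
    byType[c.typeName].push(c.clusterName);
  }
  return byType;
}

console.log(
  clusterInventory({
    results: [
      { clusterName: "orders-rs", typeName: "REPLICA_SET" },
      { clusterName: "events", typeName: "SHARDED_CLUSTER" },
      { clusterName: "billing-rs", typeName: "REPLICA_SET" },
    ],
  })
);
```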

Quick interviewer mapping table

Skill area | SRE | DBA
Monitoring & alerts | ✅ Primary | ✅ Support
Incident response | ✅ Primary | ✅ Support
Performance tuning | ⚠️ Support | ✅ Primary
Backup & DR | ✅ Shared | ✅ Shared
Security | ⚠️ Support | ✅ Primary
Capacity planning | ✅ Shared | ✅ Shared
Automation | ✅ Primary | ⚠️ Support
