Aim oto surface problems quickly. Observability defined "how well internal states of a system can be inferred from knowledge of its external outputs"
Monitoring / telemetry: 3 pillars - metrics, logs, traces (for observability - only data not value)
"The problem today is that many teams try to monitor their systems for every possible failure. Modern distributed systems fail in innumerable and unpredictable ways. You can’t possibly monitor for every single failure mode. And you can’t predict what those failure modes might even be because many of the ones you’ve seen in the past are likely to never occur again in the future."