AwesomeEngineering:Ep2 OpenTelemetry for seamless Observability

bhanuwealthy/otel-demo

MLTP¶

Metrics¶

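What is it? A numeric measurement of system behaviour, recorded over time and identified by a name plus labels.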
Real-world analogy

Sample

metric_name{label="value", ..} measurement
...
histogram_name{label="value", ..} bucket
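
As a rough sketch of how samples like the ones above could be produced, the following Python renders a metric name, its labels, and a measurement into that line format. It assumes the sample follows the Prometheus-style text format, which the notes do not state outright. The helper names `format_sample` and `histogram_lines` are hypothetical, and label-value escaping is left out for brevity.

```python
def format_sample(name: str, labels: dict, value) -> str:
    """Render one sample as name{label="value",...} measurement."""
    if labels:
        body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        return f"{name}{{{body}}} {value}"
    return f"{name} {value}"


def histogram_lines(name: str, labels: dict, bounds: list, observations: list) -> list:
    """Render a histogram as cumulative buckets plus a sum and a count.

    Assumption: buckets are cumulative, so the bucket for bound `le`
    counts every observation less than or equal to that bound.
    """
    lines = []
    for bound in bounds:
        count = sum(1 for o in observations if o <= bound)
        lines.append(format_sample(f"{name}_bucket", {**labels, "le": str(bound)}, count))
    # The catch-all bucket always holds every observation.
    lines.append(format_sample(f"{name}_bucket", {**labels, "le": "+Inf"}, len(observations)))
    lines.append(format_sample(f"{name}_sum", labels, sum(observations)))
    lines.append(format_sample(f"{name}_count", labels, len(observations)))
    return lines


if __name__ == "__main__":
    print(format_sample("http_requests_total", {"method": "GET"}, 3))
    for line in histogram_lines("latency_seconds", {}, [0.1, 0.5], [0.05, 0.3, 0.7]):
        print(line)
```

Cumulative buckets make it cheap to ask "how many requests finished within 0.5 s?" without storing every observation.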

Logs¶

Who doesn't know it?
Collect logs in centralised storage so we can link them to a central monitoring UI such as Grafana.

Sample Data Model

log_id: "unique identifier for the log"
timestamp: "timestamp when the log was generated"
message: "content of the log"
severity: "level of severity of the log"
tags: "key-value pairs for additional information"
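
The data model above could be sketched as a small Python dataclass that serialises to JSON, one common shape for shipping logs to central storage. This is a minimal sketch and not tied to any particular log backend. The class name `LogRecord` and the choice of a UUID for `log_id` and an ISO 8601 UTC string for `timestamp` are assumptions.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class LogRecord:
    """One structured log entry with the fields from the sample data model."""
    message: str
    severity: str = "INFO"
    tags: dict = field(default_factory=dict)
    # Assumption: a random UUID is unique enough to identify a log line.
    log_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Assumption: timestamps are ISO 8601 strings in UTC.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialise the record as one JSON object per line."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    record = LogRecord("payment failed", "ERROR", {"service": "checkout"})
    print(record.to_json())
```

Keeping `tags` as key-value pairs is what lets a monitoring UI filter logs by service, host, or request.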

Traces¶

Tracing is a method used to monitor and understand the flow of a request through a distributed system.
Real-world example of a trace
Microservice Example of a Trace - @graph.wealthy

Sample Data Model

trace_id: "unique identifier for the trace"
span_id: "unique identifier for the span"
parent_span_id: "id of the parent span"
start_time: "timestamp when the span started"
end_time: "timestamp when the span ended"
operation_name: "name of the operation"
tags: "key-value pairs for additional information"
logs: "events that occurred during the span"
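
To show how `parent_span_id` ties spans into one request flow, here is a small Python sketch that stores spans using the fields above and prints them as an indented tree. It is an illustration only, not the OpenTelemetry SDK. The names `Span` and `render_trace` are hypothetical, and times are assumed to be in milliseconds.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace, mirroring the sample data model."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    operation_name: str
    start_time: float  # assumption: milliseconds
    end_time: float
    tags: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time


def render_trace(spans: list) -> list:
    """Return one indented line per span, children nested under parents."""
    children = {}
    for span in sorted(spans, key=lambda s: s.start_time):
        children.setdefault(span.parent_span_id, []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.operation_name} ({span.duration:.0f} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    trace = [
        Span("t1", "a", None, "GET /orders", 0, 120),
        Span("t1", "b", "a", "auth", 5, 25),
        Span("t1", "c", "a", "db.query", 30, 100),
    ]
    print("\n".join(render_trace(trace)))
```

All spans of one request share the same `trace_id`; only the parent links differ.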

Continuous Profiling¶

Continuous Profiling is like a doctor checking your health regularly, not just when you're sick.
It helps us understand how our system is performing over time, not just when there's a problem.

Sample Data Model

profile_id: "unique identifier for the profile"
start_time: "timestamp when the profiling started"
end_time: "timestamp when the profiling ended"
duration: "duration of the profiling"
cpu_time: "total CPU time used during the profiling"
memory_usage: "total memory used during the profiling"
disk_io: "total disk I/O during the profiling"
network_io: "total network I/O during the profiling"
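
As a rough illustration of this data model, the Python sketch below profiles a single function call using only the standard library. A continuous profiler would collect records like this repeatedly in the background. The function name `profile_call` is hypothetical. `memory_usage` here is the peak of Python-level allocations reported by `tracemalloc`, which is an assumption about what "memory used" means. `disk_io` and `network_io` are omitted because they need OS-specific sources.

```python
import time
import tracemalloc
import uuid


def profile_call(fn, *args, **kwargs) -> dict:
    """Run fn once and return a profile record for that run."""
    tracemalloc.start()
    start_time = time.time()          # wall-clock timestamp
    wall_start = time.perf_counter()  # monotonic clock for the duration
    cpu_start = time.process_time()   # CPU time of this process
    try:
        fn(*args, **kwargs)
    finally:
        cpu_time = time.process_time() - cpu_start
        duration = time.perf_counter() - wall_start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return {
        "profile_id": uuid.uuid4().hex,
        "start_time": start_time,
        "end_time": start_time + duration,
        "duration": duration,
        "cpu_time": cpu_time,
        "memory_usage": peak_bytes,  # peak traced bytes, not a running total
    }


if __name__ == "__main__":
    profile = profile_call(lambda: sum(i * i for i in range(100_000)))
    for key, value in profile.items():
        print(f"{key}: {value}")
```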

Flamegraph 🔥¶

A flamegraph is a visualization of sampled call stacks: each box is a function, and its width shows how often that function appeared in the samples.
It helps us follow the flow of execution and spot performance bottlenecks, which show up as wide boxes.
Google Chrome as example & Demo flame
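
The data behind a flamegraph can be sketched in a few lines of Python: sampled call stacks are collapsed into "folded" lines, and the share of samples containing each stack prefix gives the width of its box. This is a minimal sketch of the idea, not any specific profiler's implementation. The function names and the sample stacks are made up for illustration.

```python
from collections import Counter


def fold_stacks(samples: list) -> list:
    """Collapse identical stacks into 'frame;frame;frame count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def frame_widths(samples: list) -> dict:
    """Fraction of samples in which each stack prefix appears.

    In a flamegraph this fraction is the width of the box for that frame.
    """
    total = len(samples)
    seen = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            seen[";".join(stack[:depth])] += 1
    return {prefix: n / total for prefix, n in seen.items()}


if __name__ == "__main__":
    # Hypothetical samples: each list is one call stack, outermost frame first.
    samples = [
        ["main", "handle", "parse"],
        ["main", "handle", "parse"],
        ["main", "handle", "query"],
        ["main", "idle"],
    ]
    print("\n".join(fold_stacks(samples)))
    print(frame_widths(samples)["main;handle"])
```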

Golden Signals 🚦¶

From the Google SRE book¶

Golden Signals are like the health indicators of a system.
They help us understand if the system is working well or not.
There are four main golden signals:
- Latency: 🕒 How long the system takes to serve a request.
- Traffic: 🚦 How much demand is placed on the system, for example requests per second.
- Errors: ❌ The rate of requests that fail.
- Saturation: 🔄 How full the system is, that is, how much of its capacity is in use.
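
The four signals can be computed from a window of request records, as in the Python sketch below. This is a simplified illustration under stated assumptions: errors are responses with status 500 or above, latency is reported as a nearest-rank 95th percentile, and saturation is in-flight requests divided by a known capacity. The names `Request` and `golden_signals` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int


def golden_signals(requests: list, window_seconds: float,
                   in_flight: int, capacity: int) -> dict:
    """Summarise one time window of requests as the four golden signals."""
    latencies = sorted(r.latency_ms for r in requests)
    if latencies:
        # Nearest-rank percentile: smallest value with at least 95% at or below it.
        rank = max(1, math.ceil(0.95 * len(latencies)))
        p95 = latencies[rank - 1]
    else:
        p95 = 0.0
    # Assumption: only server-side failures (5xx) count as errors.
    failed = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": p95,
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": failed / len(requests) if requests else 0.0,
        "saturation": in_flight / capacity,
    }


if __name__ == "__main__":
    window = [Request(10.0 * i, 200) for i in range(1, 10)] + [Request(100.0, 500)]
    print(golden_signals(window, window_seconds=5, in_flight=8, capacity=10))
```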