Service Monitoring

Monitor server-side errors

High-level design
- storage - a time series database that stores metrics data
  - such as current CPU usage or number of exceptions
- data collector service - fetches the relevant data from each monitored service and saves it in the storage
- querying service - an API that queries the time series storage and returns the relevant information
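
The three components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the store is an in-memory stand-in for a real time series database, the collector pulls values from the monitored services, and every class and metric name (`TimeSeriesStore`, `DataCollector`, `QueryService`, `cpu_usage_percent`) is hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple


class TimeSeriesStore:
    """In-memory stand-in for a time series database (assumption)."""

    def __init__(self) -> None:
        # metric name -> list of (timestamp, value) points
        self._series: Dict[str, List[Tuple[float, float]]] = defaultdict(list)

    def write(self, metric: str, timestamp: float, value: float) -> None:
        self._series[metric].append((timestamp, value))

    def read(self, metric: str, start: float, end: float) -> List[Tuple[float, float]]:
        # Return the points whose timestamp lies in [start, end].
        return [(t, v) for t, v in self._series.get(metric, []) if start <= t <= end]


class DataCollector:
    """Pulls the current value of each metric and saves it in the store."""

    def __init__(self, store: TimeSeriesStore, sources: Dict[str, Callable[[], float]]) -> None:
        self._store = store
        self._sources = sources  # metric name -> function returning the current value

    def collect_once(self, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        for metric, read_value in self._sources.items():
            self._store.write(metric, now, read_value())


class QueryService:
    """Answers questions about the stored metrics."""

    def __init__(self, store: TimeSeriesStore) -> None:
        self._store = store

    def average(self, metric: str, start: float, end: float) -> Optional[float]:
        points = self._store.read(metric, start, end)
        if not points:
            return None  # no data in the requested window
        return sum(value for _, value in points) / len(points)
```

In a real deployment `collect_once` would run on a schedule, and `QueryService` would sit behind an HTTP API used by dashboards and alerting.
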
Pro
- Constant monitoring helps keep the system operating smoothly
Cons
- The monitoring service may itself be a single point of failure
- Needs to process a large amount of information constantly
  - with the associated cost and mechanisms

Metrics

A heat map can be used to troubleshoot and to check the health of clusters against certain measures
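
As a rough sketch of the data behind such a heat map, each server's measure can be mapped to a color level, one row per cluster. The measure (error rate), the thresholds, and the function names are assumptions chosen for illustration only.

```python
from typing import Dict, List


def health_level(error_rate: float, warn: float = 0.01, critical: float = 0.05) -> str:
    """Map one server's error rate to a heat map color (thresholds are assumed)."""
    if error_rate >= critical:
        return "red"
    if error_rate >= warn:
        return "yellow"
    return "green"


def build_heat_map(clusters: Dict[str, List[float]]) -> Dict[str, List[str]]:
    """Turn per-server error rates into one row of colors per cluster."""
    return {
        name: [health_level(rate) for rate in rates]
        for name, rates in clusters.items()
    }
```

A row that is mostly red points at a cluster-wide problem, while a single red cell points at one unhealthy server.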

Prober - acts like a client and performs reachability and health checks against the web server
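
A minimal sketch of such a prober, using only the Python standard library. It assumes the web server exposes an HTTP endpoint to check and treats any 2xx status as healthy; the function names and the shape of the result are hypothetical. The fetch function is injectable so the check logic can be exercised without a network.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 3.0) -> int:
    """Issue a GET request like a client would and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server was reachable but answered with an error status.
        return err.code


def probe(url: str, fetch: Callable[[str], int] = http_status) -> Dict[str, object]:
    """Run one reachability and health check against `url`."""
    try:
        status = fetch(url)
    except OSError as err:
        # URLError, timeouts, and refused connections are all OSError subclasses.
        return {"url": url, "reachable": False, "healthy": False, "detail": str(err)}
    return {
        "url": url,
        "reachable": True,
        "healthy": 200 <= status < 300,
        "detail": f"HTTP {status}",
    }
```

Running `probe` periodically from several locations outside the service's own network gives a view closer to what real clients see.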

Agent: A prober embedded in the client application that sends service reports about any failures.
Collector: A report collector that is independent of the primary service. It is kept independent so that client agents never have to report an error to the very service that has failed. Error reports from the collectors are summarized, and spikes in the error graph point to client-side issues.
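
The agent and collector can be sketched as follows. This is an illustration only: the class names, the one-minute buckets, and the spike rule (a bucket counts as a spike when it has at least `min_errors` errors and more than `factor` times the median bucket) are assumptions, not a prescribed design.

```python
from collections import Counter
from statistics import median
from typing import Callable, List, Tuple


class Collector:
    """Receives error reports; runs independently of the primary service."""

    def __init__(self, bucket_seconds: int = 60) -> None:
        self._bucket_seconds = bucket_seconds
        self.reports: List[Tuple[float, str, str]] = []
        self._errors_per_bucket: Counter = Counter()

    def report(self, timestamp: float, service: str, error: str) -> None:
        self.reports.append((timestamp, service, error))
        self._errors_per_bucket[int(timestamp // self._bucket_seconds)] += 1

    def spikes(self, factor: float = 3.0, min_errors: int = 5) -> List[int]:
        """Return the time buckets whose error count stands out from the median."""
        if not self._errors_per_bucket:
            return []
        first, last = min(self._errors_per_bucket), max(self._errors_per_bucket)
        # Buckets without any report count as zero errors.
        counts = [self._errors_per_bucket[b] for b in range(first, last + 1)]
        baseline = median(counts)
        return [
            first + i
            for i, count in enumerate(counts)
            if count >= min_errors and count > factor * baseline
        ]


class Agent:
    """Embedded in the client; reports failed calls to the collector."""

    def __init__(self, collector: Collector, service: str) -> None:
        self._collector = collector
        self._service = service

    def call(self, request: Callable[[], object], now: float) -> object:
        try:
            return request()
        except Exception as err:
            # Report to the independent collector, then let the client handle the error.
            self._collector.report(now, self._service, type(err).__name__)
            raise
```

Because the agent talks to the collector rather than to the failing service, reports still arrive when the primary service is unreachable.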