Service Monitoring
Monitor server-side errors
High-level design
- storage - a time-series database that stores metrics data, such as current CPU usage or the number of exceptions
- data collector service - fetches the relevant data from each service and saves it in the storage
- querying service - an API that queries the time-series storage and returns the relevant information
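The three components above can be sketched roughly as follows. This is a minimal in-memory illustration, not a real time-series database: the class names `TimeSeriesStore`, `DataCollector`, and `QueryService`, and the idea of registering a callable per metric, are assumptions made for the example.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (timestamp, value)


class TimeSeriesStore:
    """In-memory stand-in for the time-series database (hypothetical)."""

    def __init__(self) -> None:
        # metric name -> points appended in collection order
        self._series: Dict[str, List[Point]] = defaultdict(list)

    def write(self, metric: str, timestamp: float, value: float) -> None:
        self._series[metric].append((timestamp, value))

    def read(self, metric: str, start: float, end: float) -> List[Point]:
        # Return the points whose timestamp falls in [start, end].
        return [(t, v) for t, v in self._series.get(metric, []) if start <= t <= end]


class DataCollector:
    """Fetches a reading from each registered source and saves it in the store."""

    def __init__(self, store: TimeSeriesStore) -> None:
        self._store = store
        self._sources: Dict[str, Callable[[], float]] = {}

    def register(self, metric: str, source: Callable[[], float]) -> None:
        self._sources[metric] = source

    def collect_once(self, now: Optional[float] = None) -> None:
        timestamp = time.time() if now is None else now
        for metric, source in self._sources.items():
            self._store.write(metric, timestamp, source())


class QueryService:
    """Small query API on top of the store."""

    def __init__(self, store: TimeSeriesStore) -> None:
        self._store = store

    def average(self, metric: str, start: float, end: float) -> Optional[float]:
        points = self._store.read(metric, start, end)
        if not points:
            return None  # no data in the requested window
        return sum(value for _, value in points) / len(points)
```

In a real deployment the collector would run on a schedule and the store would be an actual time-series database; the sketch only shows how the responsibilities are split.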
Pros
- Constant monitoring helps ensure the smooth operation of the system
Cons
- May become a single point of failure
- Needs to process a large amount of information constantly
- Comes with associated costs and supporting mechanisms
Metrics
- Can use a heat map to troubleshoot and to check the health of clusters against certain measures
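As a rough illustration of the heat-map idea, the sketch below renders one row per cluster and one cell per host, using the host's error rate as the measure. The thresholds (1% and 5%), the symbols, and the function names are assumptions chosen for the example.

```python
from typing import Dict, List


def health_symbol(error_rate: float, warn: float = 0.01, critical: float = 0.05) -> str:
    """Map an error rate (fraction between 0 and 1) to a heat-map cell."""
    if error_rate >= critical:
        return "X"  # unhealthy
    if error_rate >= warn:
        return "!"  # degraded
    return "."      # healthy


def render_heat_map(clusters: Dict[str, List[float]]) -> str:
    """Render one text row per cluster, one cell per host in that cluster."""
    width = max((len(name) for name in clusters), default=0)
    rows = []
    for name in sorted(clusters):
        cells = "".join(health_symbol(rate) for rate in clusters[name])
        rows.append(f"{name.ljust(width)} {cells}")
    return "\n".join(rows)
```

A row full of `X` cells points at a cluster-wide problem, while a single `X` in an otherwise healthy row suggests one bad host.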
Monitor client-side errors
Prober - acts like a client and performs reachability and health checks against the web server
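A prober of this kind could look roughly like the sketch below, which issues an HTTP GET with the standard library and records whether the server answered at all (reachability) and whether it answered with a 2xx status (health). The `ProbeResult` type, the `probe` function, and the 2xx rule are assumptions for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    url: str
    reachable: bool         # the server returned some HTTP response
    healthy: bool           # the response status was 2xx
    status: Optional[int]   # HTTP status code, if one was received
    latency_seconds: float


def probe(url: str, timeout: float = 3.0) -> ProbeResult:
    """Act like a client: request the URL once and classify the outcome."""
    start = time.monotonic()
    status: Optional[int] = None
    reachable = False
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            reachable = True
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        status = err.code
        reachable = True
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a broken response.
        pass
    latency = time.monotonic() - start
    healthy = status is not None and 200 <= status < 300
    return ProbeResult(url, reachable, healthy, status, latency)
```

Running probes from several locations outside the data center would give a view closer to what real clients experience.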
Agent: a prober embedded in the client application that sends service reports about any failures.
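One way to picture the agent is as a thin wrapper around the client's requests that builds a failure report and hands it to a reporting function. In this sketch the `ClientAgent` class, the report fields, and the injected `send_report` callable (standing in for the network call to wherever reports are sent) are all hypothetical.

```python
import time
from typing import Any, Callable, Dict

Report = Dict[str, Any]


class ClientAgent:
    """Wraps client requests and reports failures (illustrative only)."""

    def __init__(self, send_report: Callable[[Report], None], client_id: str) -> None:
        self._send_report = send_report
        self._client_id = client_id

    def call(self, operation: str, request: Callable[[], Any]) -> Any:
        try:
            return request()
        except Exception as err:
            report: Report = {
                "client_id": self._client_id,
                "operation": operation,
                "error_type": type(err).__name__,
                "message": str(err),
                "timestamp": time.time(),
            }
            try:
                self._send_report(report)
            except Exception:
                # Reporting is best effort; never mask the original failure.
                pass
            raise
```

The original exception is re-raised so the application's own error handling is unaffected by the reporting.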
Collector: a report collector that is independent of the primary service. It is kept independent to avoid the situation where client agents try to report an error to the very service that has failed. We summarize the error reports from the collectors and look for spikes in the error graph to detect client-side issues.
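The summarizing step might be sketched like this: error reports are counted per time bucket, and a bucket is flagged as a spike when its count is well above the average of the preceding buckets. The one-minute bucket, the five-bucket baseline, the 3x factor, and the minimum of 10 errors are assumed values for the example, not figures from these notes.

```python
from collections import Counter
from typing import Dict, Iterable, List


def errors_per_bucket(timestamps: Iterable[float], bucket_seconds: int = 60) -> Dict[int, int]:
    """Count error reports per time bucket (bucket index = timestamp // size)."""
    return dict(Counter(int(ts // bucket_seconds) for ts in timestamps))


def find_spikes(
    counts: Dict[int, int],
    window: int = 5,
    factor: float = 3.0,
    min_errors: int = 10,
) -> List[int]:
    """Return bucket indices whose count far exceeds the recent baseline."""
    spikes: List[int] = []
    if not counts:
        return spikes
    first, last = min(counts), max(counts)
    for bucket in range(first + window, last + 1):
        # Baseline: mean count over the `window` buckets before this one.
        previous = range(bucket - window, bucket)
        baseline = sum(counts.get(b, 0) for b in previous) / window
        current = counts.get(bucket, 0)
        # Floor the baseline at 1 so quiet periods do not flag tiny bumps.
        if current >= min_errors and current > factor * max(baseline, 1.0):
            spikes.append(bucket)
    return spikes
```

For example, two errors per minute for six minutes followed by thirty errors in the seventh minute would flag only that seventh bucket.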