Who:
A UK-based mobile operator
What:
The operator used the ELK stack to monitor middleware service response times, but the process was tedious and error-prone and relied heavily on manual observation. Gradual performance degradation often went unnoticed for days, eventually breaching SLAs and impacting other services. This highlighted the limitations of human monitoring in a complex environment.
How:
Torry Harris Integration Solutions (THIS) introduced its 4Sight machine learning solution to detect gradual response time increases, sudden fluctuations of 10%, and SLA breaches. By identifying these patterns early, 4Sight minimized manual monitoring and helped maintain SLA compliance.
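The three patterns named above can be illustrated with a minimal rule-based sketch. This is not 4Sight's implementation, which the case study does not describe, and it uses fixed rules rather than machine learning. The function name `detect_anomalies`, its parameters, the half-window trend comparison, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def detect_anomalies(times_ms, sla_ms, jump_ratio=0.10, trend_ratio=0.10):
    """Return (index, kind) alerts for a series of response times in ms.

    Hypothetical helper: thresholds and rules are illustrative assumptions.
    """
    alerts = []
    for i, t in enumerate(times_ms):
        # SLA breach: the sample exceeds the agreed limit.
        if t > sla_ms:
            alerts.append((i, "sla_breach"))
        # Sudden fluctuation: relative change of at least jump_ratio
        # compared with the previous sample.
        if i > 0 and times_ms[i - 1] > 0:
            change = abs(t - times_ms[i - 1]) / times_ms[i - 1]
            if change >= jump_ratio:
                alerts.append((i, "sudden_change"))
    # Gradual increase: the mean of the later half of the window is at
    # least trend_ratio higher than the mean of the earlier half.
    half = len(times_ms) // 2
    if half >= 1:
        early = mean(times_ms[:half])
        late = mean(times_ms[half:])
        if early > 0 and (late - early) / early >= trend_ratio:
            alerts.append((len(times_ms) - 1, "gradual_increase"))
    return alerts


# Made-up sample: each step is only 3 ms, so no single change stands out,
# but the window as a whole drifts upward.
drifting = [100, 103, 106, 109, 112, 115, 118, 121, 124, 127]
print(detect_anomalies(drifting, sla_ms=200))
```

In this sample no individual step reaches the 10% threshold and no value breaches the SLA, yet the drift across the window is flagged. That is the kind of slow degradation that manual observation tends to miss.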
Results:
- Cost-effective and efficient solution: The Kafka cluster ran on a commodity server with 32 GB of RAM and processed about 300 transactions per second.
- The machine learning capabilities were added to the existing ELK environment without disruption, and they quickly detected anomalies in service response times and sent alerts about them.
- Automatic monitoring of the different services ran 24x7, requiring minimal human interaction and freeing up the workforce.
- Early detection of problems in any service allowed developers to implement fast fixes, reducing pressure on the server and preventing downtime.