Case Study

Not too small to fail: AIOps detects service anomalies invisible to the human eye

Who

A UK-based mobile operator wanted to implement an AI solution to monitor web services and detect and flag deviations in their performance.

The operator is part of a larger group with operations in multiple countries. Its IT environment includes several legacy integration frameworks, independent web services, and a secure access gateway, which together connect billing, customer relationship management (CRM), order management, prepaid systems, and the service delivery platform (SDP), among others.

What

The operator used an Elasticsearch, Logstash, and Kibana (ELK) stack to monitor its middleware services based on their response times. This approach was proving tedious, time-consuming, and prone to human error. The main challenge was identifying services whose response times crept up so slightly and gradually that problems only became obvious to the human eye after three to four days. Cumulatively, these increases could breach service level agreements (SLAs) over a 24-day period.

This hard-to-detect decay of service affected other services and took considerable staff time and effort to fix. It also underlined the limits of manual monitoring in an increasingly complex environment.

How

Torry Harris Integration Solutions (THIS) introduced its 4Sight machine learning solution to explore three scenarios, as shown below.

Scenario 1: Gradual increase in response time

4Sight monitors the response time of each service, looking for gradual increases across various operating windows, such as specific periods of time or numbers of transactions. When it identifies a gradual increase, the solution interprets the trend, predicts when the service will breach its SLA, and generates alerts.
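To illustrate the idea of predicting a breach from a gradual trend, the sketch below fits a straight line to recent response times and extrapolates to the SLA limit. It is a simple least-squares trend, not the neural network 4Sight uses. The function names, the sample data, and the units (days, seconds) are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_trend(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def predict_sla_breach(
    times: Sequence[float], values: Sequence[float], sla_limit: float
) -> Optional[float]:
    """Time at which the fitted trend reaches sla_limit, or None if it is not rising."""
    slope, intercept = fit_trend(times, values)
    if slope <= 0:
        return None  # flat or improving trend: no breach predicted
    return (sla_limit - intercept) / slope


# Hypothetical daily averages (seconds) creeping up by 0.1 s per day, with a 3 s SLA.
eta = predict_sla_breach([0, 1, 2, 3, 4], [1.0, 1.1, 1.2, 1.3, 1.4], sla_limit=3.0)
print(round(eta, 2))  # 20.0
```

A real monitor would refit the trend as new samples arrive and raise an alert when the predicted breach falls within some horizon. A linear fit is chosen here only because it is the simplest way to show the extrapolation step.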

Scenario 2: Sudden spike in the response time

If the response time of any service suddenly rises or falls by more than a configurable value, 4Sight sends an alert about the deviation from the service’s normal behavior.
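A minimal sketch of sudden-change detection follows. It assumes the "normal behavior" is a rolling mean of recent samples and that the configurable value is a fractional threshold. The class name, `threshold`, and `window` are hypothetical and are not 4Sight parameters.

```python
from collections import deque
from typing import Deque, Optional


class SpikeDetector:
    """Flags sudden rises or falls in response time relative to a rolling baseline.

    threshold: configurable fraction, e.g. 0.5 means a 50% change triggers an alert.
    window: number of recent samples averaged into the baseline.
    """

    def __init__(self, threshold: float = 0.5, window: int = 20) -> None:
        self.threshold = threshold
        self.history: Deque[float] = deque(maxlen=window)

    def observe(self, response_time: float) -> Optional[str]:
        alert = None
        if self.history:
            baseline = sum(self.history) / len(self.history)
            if baseline > 0:
                change = (response_time - baseline) / baseline
                if abs(change) > self.threshold:
                    direction = "increase" if change > 0 else "decrease"
                    alert = (f"sudden {direction} of {abs(change):.0%} "
                             f"vs baseline {baseline:.2f}s")
        self.history.append(response_time)  # the new sample joins the baseline window
        return alert


detector = SpikeDetector(threshold=0.5, window=5)
for rt in [1.0, 1.0, 1.0, 1.0, 2.0]:
    message = detector.observe(rt)
    if message:
        print(message)  # sudden increase of 100% vs baseline 1.00s
```

Whether a spike should itself feed into the baseline is a design choice. Here it does, so a sustained shift eventually becomes the new normal.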

Scenario 3: Sudden breach of SLA

If any service breaches its response-time SLA, the monitoring solution flags the breach. The solution also adapts its baseline dynamically. For example, if the SLA stipulates 3 seconds as the fixed base value for an “Offers” service and the actual response time is 2 seconds, the solution adopts that shorter time as the new base for processing and predictions. An alarm can then be sent, for example, when the response time rises or falls by 10% relative to the dynamically derived base, or when it breaches an SLA term.
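The “Offers” example can be sketched as follows, under stated assumptions. The dynamic base is taken as the shorter of the SLA value and the first observed response time. Alarms fire on a ±10% deviation from that base or on an outright SLA breach. The case study does not say how 4Sight refreshes the base afterwards, so this sketch keeps it fixed. `ServiceMonitor` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceMonitor:
    name: str
    sla_seconds: float            # fixed base value stipulated by the SLA
    tolerance: float = 0.10       # alarm band around the dynamic base (10%)
    base: Optional[float] = None  # dynamically derived base, set on first observation

    def observe(self, response_time: float) -> List[str]:
        alerts: List[str] = []
        if response_time > self.sla_seconds:
            alerts.append(f"{self.name}: SLA breach "
                          f"({response_time:.2f}s > {self.sla_seconds:.2f}s)")
        if self.base is None:
            # Adopt the shorter of the SLA value and the observed time as the new base.
            self.base = min(self.sla_seconds, response_time)
            return alerts
        deviation = (response_time - self.base) / self.base
        if abs(deviation) > self.tolerance:
            direction = "increase" if deviation > 0 else "decrease"
            alerts.append(f"{self.name}: {abs(deviation):.0%} {direction} "
                          f"from dynamic base {self.base:.2f}s")
        return alerts


offers = ServiceMonitor("Offers", sla_seconds=3.0)
offers.observe(2.0)          # base becomes 2.0 s instead of the 3.0 s SLA value
print(offers.observe(2.5))   # ['Offers: 25% increase from dynamic base 2.00s']
```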

Results

The machine learning capabilities were introduced into the existing ELK environment without any disruption, and now rapidly detect anomalies in service response times and send alerts about them.
Automatic monitoring of the different services runs 24x7, requiring minimal human interaction, thereby freeing up the workforce.
The solution is efficient and cost-effective: the Kafka cluster runs on a standard server with 32 gigabytes (GB) of random access memory (RAM) and can process about 300 transactions per second.
Further, early detection of problems in any service lets developers fix them quickly, relieving pressure on the server and thereby helping to prevent downtime.

Creating a solutions architecture

THIS used 4Sight to create an artificial neural network (ANN) to monitor end-to-end business services running on a Kafka system. An ANN is a collection of connected units, or nodes, called artificial neurons. They are loosely modelled on the neurons in a biological brain, in that each one can send a signal to other neurons. Once an artificial neuron receives and processes a signal, it can relay the result to the neurons connected to it.

The signal passed along each connection is a real number, that is, a value that can represent a continuous quantity such as a distance along a line. The output of each neuron is computed by applying a non-linear function to the weighted sum of its inputs.
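The computation each neuron performs can be written as y = φ(Σ wᵢxᵢ + b), where xᵢ are the incoming signals, wᵢ the connection weights, b a bias, and φ a non-linear activation. The sketch below uses a sigmoid for φ, which is an assumption, since the case study does not say which activation 4Sight uses. It also shows one layer's outputs being relayed as inputs to the next neuron. The weights are arbitrary illustrative values, not a trained model.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Weighted sum of incoming signals passed through a non-linear activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def layer_output(inputs: Sequence[float], weight_rows: Sequence[Sequence[float]],
                 biases: Sequence[float]) -> List[float]:
    """Outputs of a layer of neurons that all receive the same input signals."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two hidden neurons relay their outputs to a single output neuron.
hidden = layer_output([0.5, -1.0], weight_rows=[[2.0, 1.0], [0.3, -0.2]], biases=[0.0, 0.1])
print(hidden[0])                               # 0.5 (weighted sum is exactly 0)
print(neuron_output(hidden, [1.0, -1.0]))
```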

In the mobile operator’s deployment, the Logstash output from the existing ELK set-up is forwarded to the Kafka system, which supplies the response time and other details about each service to the neural network. Alternatively, a Kafka cluster can be deployed to process different services in parallel and save time.
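The sketch below shows, in simplified form, what per-service processing of the forwarded events might look like. It parses JSON-lines events, groups response times by service, and analyses each service concurrently. No Kafka client is used: a plain list of strings stands in for messages consumed from a topic, and a thread pool stands in for the parallelism a multi-partition Kafka cluster would provide. The field names `service` and `response_time` are assumptions, because the actual Logstash output depends on the pipeline configuration.

```python
import json
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")


def group_by_service(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Collect response times per service from JSON-lines events (field names assumed)."""
    series: Dict[str, List[float]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        series[event["service"]].append(float(event["response_time"]))
    return dict(series)


def analyse_in_parallel(series: Dict[str, List[float]],
                        analyse: Callable[[List[float]], T],
                        max_workers: int = 4) -> Dict[str, T]:
    """Run the same analysis on every service's series concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(analyse, values) for name, values in series.items()}
        return {name: future.result() for name, future in futures.items()}


events = [
    '{"service": "Offers", "response_time": 2.0}',
    '{"service": "Billing", "response_time": 1.0}',
    '{"service": "Offers", "response_time": 4.0}',
]
means = analyse_in_parallel(group_by_service(events), lambda v: sum(v) / len(v))
print(means)  # {'Offers': 3.0, 'Billing': 1.0}
```

In practice, the `analyse` callable would be one of the per-service checks described in the scenarios above.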
