Abstract

Data integration brings data together from different systems to increase its value to any business. Without data integration, there is no way to access the data gathered in one system from another, or to combine data sets so they can be processed, analysed and acted on. Data also is the foundational enabler of end-to-end automation, a hallmark of all digital companies and a huge boost to customer experience, faster time to market, working within an ecosystem and operational efficiency, among other things.

Importantly, data integration helps to clean and validate the data that businesses rely on. Companies need their data to be robust – free of errors, duplication and inconsistencies – which is a big ask when a company may be running hundreds of databases, with the data often held in silos and incompatible formats.

It becomes even more of a challenge as the extended enterprise becomes increasingly common – that is, companies using a platform-based ecosystem to provide products and solutions with partners in B2B and B2B2X business models. In this operational model, companies also need to offer their data to third parties in a secure, controlled and timely way.

Companies need a proper integration strategy to make their data more usable and relevant, avoiding pitfalls such as heavily customized integrations that fit today’s needs but ossify rapidly and become obstacles when flexibility and fluidity in data’s use are what’s required.

This white paper looks at what data integration is and the various tools and approaches that are available for future-proof data integration – such as APIs, data virtualization, containerization and microservices, cloudification, analytics and business intelligence, and working with a data integration specialist firm such as Torry Harris Integration Solutions (THIS). It concludes with critical success factors.

Introduction

One of the defining characteristics of a digital company is that it bases decisions on information and intelligence derived from data. Yet this apparently straightforward principle of using data to inform business and operational decisions is anything but simple to follow. This is because of the volume, variety, velocity and veracity aspects shown below. Also, the greatest value is derived from combining data sets to extract value and insights – hence data integration is a multi-faceted and complex discipline.

Volume

Companies generate immense amounts of data from sources including their networks, services and customers, and gather more from sources such as social media.

Variety

The data comes in many formats – structured and unstructured – and is often incomplete, siloed and incompatible.

Veracity

Data should be a ‘single source of truth’ – the cleansing needed to get it to that point is not necessarily simple.

Velocity

The speed at which data is produced rises all the time. The question is how fast a firm can act on it, in real time or close to it.

Value is derived from integrating data to gain insights, improve CX, efficiency, profitability & TTM

It is becoming more complex as the extended enterprise becomes increasingly common – that is, companies using a platform-based ecosystem to provide products and solutions with partners in B2B and B2B2X business models. They need to expose their own data and consume data from others to interoperate successfully.

It’s worth noting that when data integrations fail, it is rarely due solely to technical issues; far more often the cause is poor planning, strategy and execution – and sometimes tools can seem more of a hindrance than a help. The starting point always has to be for enterprises to have a clear vision of what they want to achieve through data integration, and to resist the temptation to let a shiny new piece of technology provide the reason to kick off a project.

After the vision comes the strategy to make it happen, along with other preparatory steps. For instance, the quality of data is critical – garbage in, garbage out applies here just as it does everywhere else. Companies need to think about data cleansing – removing records that are incomplete, duplicated, out of date and so on – before they start integration.
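As a minimal sketch of that kind of pre-integration cleansing, the snippet below drops incomplete and duplicate customer records before they reach any integration pipeline. The field names and rules are purely illustrative, not taken from any specific tool:

```python
def cleanse(records, required=("id", "email")):
    """Drop records missing required fields, then de-duplicate by id."""
    seen = set()
    clean = []
    for rec in records:
        # Remove incomplete records: any required field missing or empty
        if any(not rec.get(field) for field in required):
            continue
        # Remove duplicates (first occurrence wins)
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        clean.append(rec)
    return clean

raw = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": "a@example.com"},   # duplicate
    {"id": 2, "email": ""},                # incomplete
    {"id": 3, "email": "c@example.com"},
]
print(cleanse(raw))  # keeps records 1 and 3
```

Real cleansing tools add much more – standardizing formats, detecting stale records, fuzzy matching of near-duplicates – but the principle is the same: clean first, integrate second.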

The data lifecycle

It is also crucial to see data integration as part of the wider transformation canvas; it cannot be viewed in isolation. That custom integration of two monolithic systems might work really well now, but what happens when business needs change and the data must be exposed and combined with other systems’ or parties’ data sets? That customization could become a most inflexible barrier.

Never lose sight of the fact that data is the key enabler of automation – another hallmark of a digital company – and that the goal is end-to-end automation, not isolated automated islands, to deliver operational efficiency, the best customer experience, shorter time to market and increased profitability.

What is data integration?

Google describes data integration as the process of pulling data together from different sources to gain a unified and more valuable view of it, so that businesses can make better decisions faster. Data integration can consolidate all kinds of data – structured, unstructured, batch and streaming – for everything from basic querying of inventory databases to complex predictive analytics.

According to Gartner Research, data integration involves a whole series of practices, architectures, techniques and tools. In the first instance, this is to achieve consistent access to an enterprise’s many sources of data and then for the delivery of data to meet all the data consumption requirements of applications and processes. Naturally these applications and processes are prone to change over time as business and operational needs change.

Even accessing the data can be difficult because it is typically in many, often incompatible, formats and stored in siloes, many of which were not designed with sharing and combining data in mind.

Also, for a long time, vendors developed data integration tools for specific sectors rather than generic ones. In the recent past, most effort has gone into extract, transform, load (ETL) tools. Other sub-classes of tools include those for data replication and enterprise information integration (EII), with vendors optimizing tools for particular approaches to data integration. There are also tools for data quality and data modelling, and adapters that can be applied to data integration.
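The classic ETL pattern mentioned above can be sketched in a few lines. Here, in-memory SQLite databases stand in for both the operational source and the analytics warehouse, and the table and column names are invented for illustration:

```python
import sqlite3

# Source system: a stand-in operational database (SQLite for illustration)
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount_pence INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 3400)])

# Target: a stand-in analytics warehouse
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE fact_orders (id INTEGER, amount_gbp REAL)")

# Extract: read rows from the source
rows = src.execute("SELECT id, amount_pence FROM orders").fetchall()

# Transform: convert pence to pounds for the warehouse schema
transformed = [(oid, pence / 100.0) for oid, pence in rows]

# Load: write the transformed rows into the warehouse
wh.executemany("INSERT INTO fact_orders VALUES (?, ?)", transformed)

print(wh.execute("SELECT id, amount_gbp FROM fact_orders").fetchall())
# [(1, 12.5), (2, 34.0)]
```

Commercial ETL tools wrap this same extract–transform–load cycle in scheduling, monitoring, connectors and metadata management.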

The sector-specific approach resulted in a highly fragmented tool market for data integration, which added to the complexity of integrating data in large organizations because they were forced to buy portfolios of tools from many vendors to assemble all the capabilities they required. Different teams used different tools, with little consistency but a lot of overlap and redundancy, and no common management of metadata.

This nightmarish situation has begun to improve as enterprises become increasingly aware that they need a holistic view of data integration to gain a common set of capabilities for use across their organizations. For instance, sub-classes of tools are now consolidating as vendors expand their capabilities into adjacent areas and through mergers and acquisitions among the tooling firms.

Gartner concludes we now have complete data integration tools on the market that address a range of different data integration styles and are based on common design tooling, metadata and runtime architecture.

As the world becomes increasingly cloudified and software-driven, the impact of cloud technologies on data integration is huge and growing, as we explore in Chapter 2.

What are some approaches to integrate data?

Data integration can serve various purposes. For instance, moving data out of siloed, on-premises platforms into data lakes makes the data easier to access and process. Putting data in a lake is not a guaranteed solution – execution is all, and some applications are more suited to this approach than others – and if the data is not easily accessible and usable, it’s a swamp, not a lake.

Another approach is to construct data warehouses, which combine data from various sources for analysis for business purposes.

Database replication duplicates data from source databases like MongoDB or MySQL, say, into a cloud-based data warehouse to be used for other purposes and in combination with other sources.

Data integration can pull everything known about customers into a single location to build the vaunted 360-degree view of the customer for marketing purposes – to provide better, tailored service to individuals, create customer satisfaction, sell more to them and build customer loyalty.

IoT applications collect data from potentially vast numbers of devices, so data integration is used to gain insight from the mass of detail.

3. Having the right tools for the job

There are a number of technologies and approaches that can ease data integrations radically. A key consideration to bear in mind is that business and operational needs will change, so whatever you deploy now should be as flexible as possible to fit new purposes in future – new technology ages alarmingly quickly.

Getting a jump start with APIs to enable large datastores

APIs are a controlled, secure way of allowing one system to access certain data in another or others. They are also a very useful mechanism for pulling data from multiple sources into a large datastore (a lake or warehouse, for example as outlined above) to make it faster and easier to access, process and analyze the data as required.
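The pattern of pulling API responses into a central store can be sketched as follows. The JSON payload is hypothetical, standing in for what a partner's REST API might return through a gateway, and a simple dict stands in for the lake or warehouse:

```python
import json

# Hypothetical JSON payload, standing in for the body of an HTTP response
# fetched from a partner's REST API via an API gateway.
payload = json.loads("""
{"customers": [
    {"id": "c1", "name": "Acme Ltd"},
    {"id": "c2", "name": "Globex"}
]}
""")

# Load into a simple datastore stand-in (a dict keyed by customer id);
# a real pipeline would write to a data lake or warehouse table instead.
datastore = {c["id"]: c["name"] for c in payload["customers"]}
print(datastore)  # {'c1': 'Acme Ltd', 'c2': 'Globex'}
```

The value of the API layer is that every source is consumed the same way – authenticated, rate-limited and versioned – regardless of what sits behind it.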

APIs are the key to securely exposing data for internal applications and to your partner network. However, unless they are used consistently wherever possible throughout an organization, their impact is greatly lessened – hence governance is key.

Shuba Sridhar, VP – Strategic Initiatives at Torry Harris Integration Solutions (THIS), comments, “The role of Integration and API governance is to balance competing objectives to the benefit of all stakeholder interests. The purpose of governance is to align the interests of all stakeholders as closely as possible to the objectives of the organization’s integration-driven digital programs”.

It's important to establish tools and frameworks that use open APIs for controlled public access, as well as deploying proper API management, such as gateways, a developer portal, publisher portal and an authentication server.

If these factors seem like a lot to take on board and deploy, consider THIS’ DigitMarketTM API Manager (DM-APIM). This is a complete package designed to help companies manage secure APIs, lowering the barriers to working within ecosystems and helping to leverage APIs’ full potential. The schematic below shows how it works.

Ovum (now Omdia) described DigitMarketTM as “a single, comprehensive package providing enterprises with a ready set of tools to create and manage digital platforms and a secure channel to share and monetize data as they proceed through their digital transformation journeys.”

Its On the Radar report also said, “DigitMarketTM allows easy onboarding of partners and curation of content and can function as a marketplace for both physical and digital products (such as APIs) for the enterprise and its partners. DigitMarketTM is vertical agnostic (although vertical-specific templates are available) and can be white-labeled and configured to allow enterprises flexibility in how they market the platform.”

The report says, “DigitMarketTM provides a viable solution for enterprises to quickly build and scale a digital platform to support their customers, at less cost and risk than building one from the ground up, while providing quick returns on investment.”

Download the report from here.

Virtualizing data – simply the best

Establishing a data virtualization platform is a flexible, scalable way to homogenize data access across disparate data sources. It can be used to enable API-based access or classic data queries for analytics and business intelligence.

Virtualized data allows access across disparate systems, protocols, data formats and structures, and more, through a single layer. It provides real-time data access, which overcomes the limitations of older ETL tools, and supports centralized security and access control for data across the organization for internal applications. For external applications, virtualized data can be securely exposed through an API gateway.
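The idea of a single access layer over disparate sources can be illustrated with a toy "virtual view". Here a CSV export and a JSON API response (both inline stand-ins for live systems) are presented to callers as one uniform record stream, fetched on demand rather than copied:

```python
import csv, io, json

# Two disparate sources: a CSV export and a JSON API response
# (both inline stand-ins for real systems).
csv_source = "id,city\n1,Dublin\n2,Bangalore\n"
json_source = '[{"id": 3, "city": "Bristol"}]'

def virtual_customers():
    """A single 'virtual view': callers see one uniform record format,
    while the data stays in (and is read live from) each source."""
    for row in csv.DictReader(io.StringIO(csv_source)):
        yield {"id": int(row["id"]), "city": row["city"]}
    for row in json.loads(json_source):
        yield {"id": row["id"], "city": row["city"]}

print(list(virtual_customers()))
```

A real virtualization platform such as Denodo adds query optimization, caching, security and many more connectors, but the contract is the same: one logical layer, many physical sources, no bulk copying.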

A data virtualization platform can be a terrific asset for companies wanting to get the most from their data. THIS partners with specialist providers like Denodo to enable virtualization. The infographic below is a sneak-peek of an ongoing study by Forrester Research, The Total Economic Impact of Data Virtualization (using the Denodo platform) to be published in November, 2021.

The Total Economic ImpactTM Of Data Virtualization

Using The Denodo Platform

Through four customer interviews and data aggregation, Forrester concluded that data virtualization using the Denodo Platform has the following three-year financial impact:

Containerizing data using microservices

The first step here is to identify which data can remain within legacy applications and which would benefit from containerization. Legacy applications can be re-engineered and modernized using a cloud native model running microservices. It is essential to bear in mind that adopting a microservices architecture is a major shift in mindset, organizational culture and team structure, not just a new technology.

Certainly there are elephant traps for the unwary when deploying microservices, so some specialist guidance on this journey is a good way to ensure a company gains the intended benefits, not unexpected costs. The advantages of microservices include transforming data to scale using a cloud-native framework to support digital offers, and securing access to data using policy-driven API gateways and micro-gateways.

It’s important to apply the basics, such as putting governance principles in place regarding data sharing before adopting the microservices architecture, and check out this evaluation of tools to accelerate microservices-driven applications.

Cloud-enabling your data platform

Another way to make the most of data is to migrate legacy data integration platforms to cloud-based services by setting up a cloud-based data warehouse. A hybrid data integration solution can combine cloud-based services with traditional tools. Data architecture can be transformed using in-memory data grids (IMDGs), with proven frameworks like Gigaspaces’ SmartDIH and Smart Cache.

An IMDG is a set of networked or clustered computers that pool their random access memory (RAM) so applications can share data with other applications running in the cluster. IMDGs are designed to process data at extremely high speeds and to build and run applications that are too large for a single server. They are especially useful for applications that carry out a lot of parallel processing on large data sets. THIS partners with Gigaspaces and can show you how to get the most value out of these frameworks.
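The way an application typically uses an IMDG – serving hot data from memory and falling back to the database only on a miss (the cache-aside pattern) – can be simulated in a single process. The class and its API below are a hypothetical stand-in; a real IMDG such as the GigaSpaces platform pools RAM across a cluster and has its own client library:

```python
import time

class GridCache:
    """Single-process stand-in for a shared in-memory data grid."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get_or_load(self, key, loader):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                      # served from memory
        value = loader(key)                      # fall back to the database
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

calls = []
def slow_db_lookup(key):
    """Stand-in for an expensive database query."""
    calls.append(key)
    return key.upper()

cache = GridCache(ttl_seconds=60)
cache.get_or_load("sku-1", slow_db_lookup)
cache.get_or_load("sku-1", slow_db_lookup)  # second call hits the cache
print(len(calls))  # the backing store was queried only once
```

The payoff in a real grid is that this in-memory tier is shared across every node in the cluster, so all applications see the same hot data at RAM speed.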

Leveraging analytics and business intelligence (BI)

Analytics and business intelligence are fundamental to extracting value when integrating data, for example through data modelling for hybrid analytics solutions. Of course AI has a big role to play, most notably machine learning. There is a raft of modern ‘Elastic’ tools for analytics, including:

  • ElasticSearch – a distributed, RESTful search and analytics engine that can address “a growing number of use cases”, according to the company. It sits at the heart of the Elastic Stack, storing users’ data for fast search whose relevance can be tuned.
  • Logstash – a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to the user’s preferred “stash”.
  • Kibana – a free and open user interface that lets users visualize their Elasticsearch data and navigate the Elastic Stack, for tasks including tracking query load to understand how requests flow through apps.
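As a purely illustrative taste of working with these tools, the snippet below builds an Elasticsearch Query DSL body in Python. The index and field names (“orders”, “status”, “created_at”) are hypothetical; in a live deployment this JSON would be POSTed to the index’s _search endpoint:

```python
import json

# Illustrative Query DSL body: shipped orders from the last seven days.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"status": "shipped"}},
                {"range": {"created_at": {"gte": "now-7d/d"}}},
            ]
        }
    },
    "size": 10,
}
print(json.dumps(query, indent=2))
```

Kibana visualizations and Logstash pipelines are driven by the same underlying indices that queries like this one search.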

Then there’s Grafana Labs which offers operational dashboards for data, wherever it is located.

4. Critical success factors

Here’s a check list before you embark on your data integration journey.

Why are we doing this?

As always though, enterprises need to figure out what they are trying to achieve – their vision – and devise a strategy to reach their goals, which should be SMART – specific, measurable, attainable, relevant and time-bound.

This is not new, but it’s amazing how many transformations are sparked by shiny new technology not clearly defined desired outcomes. Companies also need to figure out how they are going to measure their progress and success before they start because constant assessment and adjustments are likely to be necessary as needs evolve and market conditions change.

Support from the top

Data integration programs need support from top management because the purpose and impact of systems integration go way beyond the IT department: data integration is all about making data accessible as, where and when it is required, for multiple purposes. Top management needs to be involved from the conception of the vision and stay involved, because data integration can have implications for just about any part of a business, and consistency and a universal comprehension of the purpose and the plan do not happen by themselves. All transformation efforts need buy-in from the workforce, which only comes from understanding what’s going on and why it’s necessary.

Find a proven, specialist data integration partner

As the two points above suggest, data integration failures are rarely caused just by technical issues. This does not mean that the technologies involved do not have issues: Some pre-configured data integration solutions require a great deal of additional work for even small customizations to fit into a company’s systems. It is always a good idea to have a data integration specialist partner to help evaluate solutions’ suitability and applicability, and their deployment.

Quality matters

Remember that without good quality data, it is impossible to properly leverage analytics and AI to gain actionable insights. And as mentioned earlier, garbage in results in garbage out, so data cleansing needs to be right at the top of the schedule, before the integration begins.

Data integration is integral to every area of digital transformation, and it is important to view it as a constantly evolving part of running a digital business. The purpose of data integration is to generate value by combining disparate data sets to gain intelligence and actionable insights. The speed and scale of cloud-based tools and technologies are bringing unprecedented opportunities for all organizations to leverage their own data and combine it with that of partners, customers and even sources like social media to make data-driven decisions.

The power of this knowledge can be used for many purposes, which are often interlinked, from providing outstanding customer experience through greater understanding of them, to better operational efficiency, higher profit margins, shorter times to market and more. Data integration provides a huge opportunity – be clear what you want to achieve from the outset and engage specialist help to reap the rewards more quickly – and sustainably as your business needs change.

Other Resources