In this blog I’ll describe an approach to an urgent question facing enterprises today.
That question is: “How do we implement a closed loop analytics framework that can scale as the number of use cases and data sources increases?”
This is not an easy problem to solve because, by its very nature, any technology implementation becomes outdated quickly. However, thinking in terms of the entire value chain of data helps us channel this inevitable entropy so that we can maximize the value of our data.
The 5 Stages of a Closed Loop Data Pipeline
Typically, when we think of developing a new application or digital capability, the focus is on doing it well. That means we place a lot of emphasis on agile methodology, our lifecycle processes, DevOps, and similar considerations.
Rarely do we think about these two questions:
- What will happen to the zero-party and first-party data that we are about to generate as part of the application?
- And (to a lesser degree) what existing insights can the new application use to improve the outcomes?
As a result, the integration of data into the analytics and AI pipeline is often accomplished as part of separate data initiatives.
In my view, answering these questions is the job of the data strategy and data operations stages of the closed loop analytics model.
From a data strategy perspective (the first stage), it is critical to understand the value of zero-party and first-party data across all parts of the enterprise, and then to create a plan for combining it with relevant third-party data.
It defines how we adopt the capabilities of the cloud and which technologies will likely be used. It also helps us create a roadmap of business insights that will be generated based on feasibility, costs, and of course benefits.
Finally, feeding these data strategy considerations into the governance of the software development lifecycle helps unlock the benefits that enterprise data can deliver for us.
The second stage, which is closely linked to the first, is data operations. This is the better-known aspect of the data management lifecycle and has been a focus of improvement for several decades.
Legacy landscapes would use ETL (extract, transform, load) jobs, which are batch programs that map and transfer data, to load different kinds of data warehouses after the data has been matched and cleaned. We then implement various kinds of business intelligence and advanced analytics on top of this golden source of data.
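To make the batch ETL pattern concrete, here is a minimal sketch in Python. It is an illustration only: the record fields, the cleaning rules, and the in-memory dictionary standing in for a warehouse are all assumptions, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    email: str


def extract() -> list[dict]:
    # Hypothetical raw rows standing in for an export from a source system.
    return [
        {"id": " C001 ", "email": "Ana@Example.com"},
        {"id": "C001", "email": "ana@example.com "},  # duplicate once cleaned
        {"id": "C002", "email": ""},                  # fails the quality check
        {"id": "C003", "email": "li@example.com"},
    ]


def transform(rows: list[dict]) -> list[Customer]:
    # Clean each row, then match on customer id so duplicates are dropped.
    seen: set[str] = set()
    cleaned: list[Customer] = []
    for row in rows:
        customer_id = row["id"].strip()
        email = row["email"].strip().lower()
        if not email or customer_id in seen:
            continue
        seen.add(customer_id)
        cleaned.append(Customer(customer_id, email))
    return cleaned


def load(records: list[Customer], warehouse: dict[str, Customer]) -> int:
    # The dictionary is an assumed stand-in for a warehouse table.
    for record in records:
        warehouse[record.customer_id] = record
    return len(records)


warehouse: dict[str, Customer] = {}
loaded = load(transform(extract()), warehouse)
print(loaded, sorted(warehouse))
```

In a real landscape each step would talk to source systems and a warehouse rather than Python lists, but the shape of the job (extract, match and clean, then load) is the same.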
As technology has progressed, we have made great strides in applying machine learning to solve the problems of data inconsistencies, especially with third-party data received into the enterprise.
And now we are moving to the concept of a data fabric, where applications are plugged straight into an enterprise-wide layer so that latencies and costs are reduced. The management of master data is also being centralized so that inconsistencies are minimized.
Stage 3 of the data management lifecycle is compliance and security. This entails a number of things, including but not limited to:
- Maintaining the lineage of each data element group as it makes its way to different applications
- Ensuring that the right data elements are accessible to applications on an auditable basis
- Ensuring that the data is masked correctly before being transmitted for business intelligence
- Ensuring that compliance with regulations such as GDPR and COPPA is managed
- Encryption of data at rest and in transit
- Access control and tracking for operational support purposes
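A few of these requirements (masking before data is shared, and an auditable lineage record of what went where) can be sketched in a handful of lines. The field classification, the salted-hash masking, and the log structure below are assumptions chosen for illustration; a production setup would rely on a dedicated catalog, key management, and policy tooling.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical classification of which fields count as sensitive.
SENSITIVE_FIELDS = {"email", "phone"}


def mask_value(value: str, salt: str) -> str:
    # Salted SHA-256 digest, truncated for readability. This is
    # pseudonymization for illustration, not a full anonymization scheme.
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_record(record: dict, salt: str) -> dict:
    # Mask only the sensitive fields; pass everything else through unchanged.
    return {
        key: mask_value(str(value), salt) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }


def share_with(app: str, record: dict, source: str, salt: str,
               lineage_log: list[dict]) -> dict:
    # Mask the record and append an audit entry describing the transfer.
    masked = mask_record(record, salt)
    lineage_log.append({
        "source": source,
        "consumer": app,
        "fields": sorted(record),
        "masked_fields": sorted(SENSITIVE_FIELDS.intersection(record)),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return masked


lineage_log: list[dict] = []
row = {"customer_id": "C001", "email": "ana@example.com", "country": "DE"}
masked = share_with("bi_dashboard", row, "crm", "demo-salt", lineage_log)
print(masked)
print(lineage_log[0]["masked_fields"])
```

The point of the sketch is that masking and lineage are recorded in the same step as the transfer, so the audit trail cannot drift from what was actually shared.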
Clearly, compliance and security needs are far from trivial. So I find that even as the need for AI and Customer Data Platforms (CDPs) has increased, this area still has a lot of room to mature.
Stage 4 is about insights generation. Of late this stage has received a lot of investment and has matured quite a bit. There is an abundance of expertise (including at Ignitho) that can create and test advanced analytics models on the data to produce exceptional insights.
In addition to advanced analytics and machine learning, the area of data visualization and reporting has also matured significantly. Through our partnerships with AWS, Microsoft, and a host of visualization providers such as Tableau, we are developing intuitive, real-time dashboards for our clients.
However, I believe that success at this stage depends heavily on the first 2 stages of data strategy and data operations.
Stage 5 is another area that is developing in maturity but is not quite there. This stage is all about taking the insights we generate in stage 4 and using them to directly improve operations in an automated fashion.
In my experience, I often see 2 challenges in this area:
- The blueprint for insights operationalization is still maturing. A logical path for a cloud native organization would be to feed these insights, as they are generated, into the applications so that they can result in assisted as well as unassisted improvements. However, because these automated integrations are lacking, the manual use of insights is more prevalent. Anything else requires large investments in multi-year tech roadmaps.
- The second challenge is due to the inherent entropy of an organization. New applications, customer interactions, and support capabilities must constantly be developed to meet various business goals. And as data strategy (stage 1) is not a key consideration (or even a feasible one during implementation), the entropy is left to be addressed later.
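As a rough illustration of what feeding insights back into applications could look like, the sketch below routes a model score either to an automated action (unassisted) or to a human review queue (assisted). The churn-risk score, the thresholds, and the action name are all hypothetical and exist only to show the routing idea.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    customer_id: str
    churn_risk: float  # hypothetical model score between 0 and 1


AUTO_THRESHOLD = 0.8    # assumed cut-off for unassisted (automated) action
REVIEW_THRESHOLD = 0.5  # assumed cut-off for assisted (human-reviewed) action


def route(insight: Insight, auto_actions: list, review_queue: list) -> str:
    # High-confidence insights trigger an action directly in the application.
    if insight.churn_risk >= AUTO_THRESHOLD:
        auto_actions.append((insight.customer_id, "send_retention_offer"))
        return "unassisted"
    # Mid-range insights are queued so a person can decide what to do.
    if insight.churn_risk >= REVIEW_THRESHOLD:
        review_queue.append(insight.customer_id)
        return "assisted"
    return "no_action"


auto_actions: list = []
review_queue: list = []
insights = [Insight("C001", 0.91), Insight("C002", 0.62), Insight("C003", 0.10)]
outcomes = [route(insight, auto_actions, review_queue) for insight in insights]
print(outcomes)
print(auto_actions, review_queue)
```

Even a simple routing rule like this needs an integration point in the target application, which is why the data strategy stage has to plan for it up front.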
The emergence of AI and analytics is a welcome and challenging trend. It promises dramatic benefits but also requires us to question existing beliefs and ways of doing things. It is also a matter of evolving our understanding of this complex space.
In my view, stage 1 (data strategy) and stage 3 (compliance and security) are key to making the other 3 stages successful. This is because stage 2 and stage 4 will see investments whether we do stages 1 and 3 or not. The more we think about stages 1 and 3, the more our business benefits will be amplified.