Ian Russell, Director of Operations, and Jon Stace, Director of Technology, discuss how to get the most from your data and examine the approach adopted at Software Solved. This was presented as a video at the TechExeter Tech and Digital Conference 2021.
Data access challenges
How we previously dealt with data requests
After some research, we decided to focus on the DataOps approach. DataOps combines Agile, DevOps and Statistical Process Control (SPC), fits well with our continuous delivery practice, and promotes improvements in the speed and accuracy of analytics. It aims to integrate real-time analytics and the Continuous Integration / Continuous Delivery (CI/CD) process into a collaborative team approach. Access to stable data in an evolving environment can be challenging.
The Value Pipeline
DataOps breaks the process down into two streams, or pipelines: Value and Innovation. Both aim to extract value from the processes as well as from the systems. The Value Pipeline focuses on the hygiene of the data passing through it, to increase confidence in the results: data enters the pipeline and moves through to production, passing several quality checks and balances along the way. Production is generally a series of stages: access, transforms, models, visualisations and reports.
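The idea of stages separated by quality checks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Software Solved's implementation: the `run_value_pipeline` helper, the sample sales records and the specific checks are all hypothetical. Each stage transforms the data, and the data only moves on to the next stage if the check attached to that stage passes.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Stage = Callable[[List[Record]], List[Record]]
Check = Callable[[List[Record]], bool]


class QualityCheckFailed(Exception):
    """Raised when data fails the check that follows a pipeline stage."""


def run_value_pipeline(data: List[Record], stages: List[Tuple[str, Stage, Check]]) -> List[Record]:
    # Run each stage in order; data only moves on if that stage's check passes.
    for name, stage, check in stages:
        data = stage(data)
        if not check(data):
            raise QualityCheckFailed(f"check failed after stage '{name}'")
    return data


# Hypothetical stages for a small sales data set.
def access(rows: List[Record]) -> List[Record]:
    return list(rows)  # stand-in for pulling from a source system


def transform(rows: List[Record]) -> List[Record]:
    # Drop rows with a missing amount and convert the rest to numbers.
    return [{"region": r["region"], "amount": float(r["amount"])}
            for r in rows if r["amount"] is not None]


def model(rows: List[Record]) -> List[Record]:
    # Total the amounts per region.
    totals: Dict[str, float] = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return [{"region": region, "total": total} for region, total in totals.items()]


stages = [
    ("access", access, lambda rows: len(rows) > 0),
    ("transform", transform, lambda rows: all(r["amount"] >= 0 for r in rows)),
    ("model", model, lambda rows: all(r["total"] >= 0 for r in rows)),
]

raw = [
    {"region": "South West", "amount": "120.5"},
    {"region": "South West", "amount": "79.5"},
    {"region": "North", "amount": None},
]
result = run_value_pipeline(raw, stages)
print(result)  # [{'region': 'South West', 'total': 200.0}]
```

Failing fast at the first broken check keeps bad data out of later stages, which is the point of the "checks and balances" between stages.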
In the Innovation Pipeline, as the diagram above illustrates, a new feature is developed before it can be deployed to production systems. This creates a feedback loop that spurs new questions and ideas for enhanced analytics. During the development of new features, the code changes but the data is kept constant.
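Keeping the data constant while the code changes means any difference in output can be attributed to the code. A minimal sketch of that idea, assuming a hypothetical frozen snapshot and two hypothetical versions of a transform, compares a fingerprint of each version's output before the new version is allowed through:

```python
import hashlib
import json
from collections import Counter

# Hypothetical frozen snapshot: in the Innovation Pipeline this data never
# changes, so any difference in output must come from the code change.
FROZEN_SNAPSHOT = (
    {"customer": "A", "spend": 10},
    {"customer": "B", "spend": 25},
    {"customer": "A", "spend": 5},
)


def spend_by_customer_v1(rows):
    # Current production logic (hypothetical).
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["spend"]
    return totals


def spend_by_customer_v2(rows):
    # New version under development: a refactor that should give identical results.
    totals = Counter()
    for row in rows:
        totals[row["customer"]] += row["spend"]
    return dict(totals)


def fingerprint(result):
    # Hash a canonical JSON form so results can be compared or stored as a baseline.
    canonical = json.dumps(result, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def safe_to_deploy(current, candidate, snapshot=FROZEN_SNAPSHOT):
    # Same data in, so the outputs should match unless the change was intended.
    return fingerprint(current(snapshot)) == fingerprint(candidate(snapshot))


print(safe_to_deploy(spend_by_customer_v1, spend_by_customer_v2))  # True
```

Storing the fingerprint as a baseline lets a CI job check a candidate without rerunning the old code; a change that is meant to alter results would update the baseline deliberately.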
The end-to-end process in practice, compared with DevOps (CI/CD):
Sandbox Environments – Isolated environments that enable testing without affecting the application, system or platform on which they run.
Orchestration – Automation of the data factory/pipeline process, e.g. container deployment, runtime processes and data transfers/integration.
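Sandboxes and orchestration can both be illustrated with Python's standard library. In this sketch SQLite stands in for the real data platform, which is an assumption: `sqlite3.Connection.backup` copies a database into an isolated in-memory sandbox, and `graphlib.TopologicalSorter` (Python 3.9+) runs hypothetical tasks in dependency order. The `create_sandbox` and `run_tasks` helpers and the task names are illustrative only.

```python
import sqlite3
from graphlib import TopologicalSorter


def create_sandbox(production: sqlite3.Connection) -> sqlite3.Connection:
    # Copy the whole database into a separate in-memory connection,
    # so experiments in the sandbox never touch the original.
    sandbox = sqlite3.connect(":memory:")
    production.backup(sandbox)
    return sandbox


def run_tasks(tasks, dependencies, conn):
    # dependencies maps each task to the set of tasks that must run first.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](conn)
        conn.commit()
    return order


# A tiny stand-in "production" database.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE sales (region TEXT, amount REAL)")
production.executemany("INSERT INTO sales VALUES (?, ?)", [("North", 10.0), ("South", 20.0)])
production.commit()


def load(conn):
    # Hypothetical ingestion step.
    conn.execute("INSERT INTO sales VALUES ('East', 5.0)")


def summarise(conn):
    # Hypothetical modelling step that depends on the load.
    conn.execute("CREATE TABLE summary AS SELECT SUM(amount) AS total FROM sales")


sandbox = create_sandbox(production)
order = run_tasks({"load": load, "summarise": summarise}, {"summarise": {"load"}}, sandbox)
print(order)  # ['load', 'summarise']
```

After the run, the sandbox holds the new row and the summary table, while the production database still has its original two rows and no summary.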
Why do it this way?
This allows you to integrate data from different sources, automating its ingestion, loading and availability. You can control how data is stored, including its different versions over time, centralise the management of metadata, manage requests, authorisation and access permissions for data, and apply analytics. On top of this, you can add reporting and dashboarding to monitor and track what is happening throughout the platform.
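Versioned storage, a central catalogue and access permissions can be pictured as one small data structure. The `DataCatalogue` class below is a hypothetical sketch, not a description of the platform Software Solved uses: it keeps a checksum-based version history per dataset and refuses reads from users who have not been granted access.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set


@dataclass
class DatasetVersion:
    version: int
    checksum: str
    created_at: datetime
    content: bytes


@dataclass
class DataCatalogue:
    # Hypothetical central catalogue: version history plus who may read what.
    history: Dict[str, List[DatasetVersion]] = field(default_factory=dict)
    readers: Dict[str, Set[str]] = field(default_factory=dict)

    def publish(self, name: str, content: bytes) -> DatasetVersion:
        versions = self.history.setdefault(name, [])
        checksum = hashlib.sha256(content).hexdigest()
        # Only record a new version when the content has actually changed.
        if versions and versions[-1].checksum == checksum:
            return versions[-1]
        entry = DatasetVersion(len(versions) + 1, checksum, datetime.now(timezone.utc), content)
        versions.append(entry)
        return entry

    def grant(self, name: str, user: str) -> None:
        self.readers.setdefault(name, set()).add(user)

    def read(self, name: str, user: str, version: Optional[int] = None) -> bytes:
        if user not in self.readers.get(name, set()):
            raise PermissionError(f"{user} may not read {name}")
        versions = self.history[name]
        if version is None:
            return versions[-1].content  # default to the latest version
        if not 1 <= version <= len(versions):
            raise ValueError(f"{name} has no version {version}")
        return versions[version - 1].content


catalogue = DataCatalogue()
catalogue.publish("sales", b"region,amount\nNorth,10\n")
catalogue.publish("sales", b"region,amount\nNorth,10\n")  # unchanged: no new version
catalogue.publish("sales", b"region,amount\nNorth,12\n")
catalogue.grant("sales", "analyst")
print(len(catalogue.history["sales"]))  # 2
```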
Tapping into The Value Pipeline
Access – We pull and ingest raw data from multiple sources via a replication engine at regular intervals, so that it stays as close to real time as possible.
Transform – Generally an ETL process that standardises, sorts and validates the data set to get it ready for analysis, e.g. filtering, aggregation and mapping values.
Model – Here we model the data and interrogate it to answer the questions we want to gain value from.
Visualise/Report – Then we can visualise and report using several different technologies.
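The Access and Transform steps can be sketched in Python. Two assumptions here are not from the talk: the regular pulls are modelled as watermark-based incremental loads (only rows modified since the last run are taken), and the status codes, labels and sample rows are hypothetical. The transform then filters, maps and aggregates, as described in the Transform step.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from source status codes to readable labels.
STATUS_LABELS = {"C": "Complete", "P": "Pending"}
CANCELLED = "X"


def incremental_pull(source_rows, last_watermark):
    # Access: take only rows modified since the previous run, then move the watermark on.
    new_rows = [r for r in source_rows if r["modified"] > last_watermark]
    new_watermark = max((r["modified"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark


def transform(rows):
    # Transform: filter out cancelled rows, map status codes, aggregate amounts by status.
    totals = defaultdict(float)
    for r in rows:
        if r["status"] == CANCELLED:
            continue
        totals[STATUS_LABELS.get(r["status"], "Unknown")] += r["amount"]
    # Sort by label so the output is stable from run to run.
    return dict(sorted(totals.items()))


source = [
    {"id": 1, "status": "C", "amount": 10.0, "modified": datetime(2021, 5, 1, 9, 0)},
    {"id": 2, "status": "P", "amount": 5.0, "modified": datetime(2021, 5, 1, 10, 0)},
    {"id": 3, "status": "X", "amount": 7.0, "modified": datetime(2021, 5, 1, 11, 0)},
    {"id": 4, "status": "C", "amount": 2.5, "modified": datetime(2021, 5, 1, 12, 0)},
]

rows, watermark = incremental_pull(source, last_watermark=datetime(2021, 5, 1, 9, 30))
print([r["id"] for r in rows])  # [2, 3, 4]
print(transform(rows))          # {'Complete': 2.5, 'Pending': 5.0}
```

Running the pull again with the returned watermark picks up nothing new, so repeated scheduled runs do not double-count rows.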
Orchestration and automation allow us to access the value of the data much faster with higher levels of quality.
Our tech approach and how it’s evolved
Originally, we used traditional tools and approaches, such as Excel and SQL Server (SSIS/SSRS). These days our toolset and approach include:
- Azure Data Factory (ADF)
- Infrastructure as Code (IaC) / sandboxes
- Statistical Process Control (SPC)
- Azure Monitor for metrics
- Outputs / logic
- Warnings of variance
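Statistical Process Control and "warnings of variance" can be illustrated with a simplified three-sigma rule: control limits are set at the baseline mean plus or minus three standard deviations, and any new value outside them raises a warning. This is a sketch under assumptions, not the team's Azure configuration: the daily row counts are hypothetical, and it uses the sample standard deviation, whereas classic individuals charts often estimate sigma from the moving range.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    # Centre line and spread estimated from a stable baseline period.
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread


def variance_warnings(baseline, new_values, sigmas=3.0):
    # Return (position, value) for every new value outside the control limits.
    lower, upper = control_limits(baseline, sigmas)
    return [(i, v) for i, v in enumerate(new_values) if not lower <= v <= upper]


# Hypothetical daily row counts from a stable week of pipeline runs.
baseline_counts = [1000, 1010, 990, 1005, 995]
print(variance_warnings(baseline_counts, [1002, 940, 1030]))  # [(1, 940), (2, 1030)]
```

Here the limits work out at roughly 976 to 1024 rows, so a run that loads 940 or 1030 rows is flagged for investigation rather than silently passed downstream.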
What does this mean for customers?
This means better access to data, plus the benefits below:
What does the future hold?
- IoT and the ever-expanding sources of data
- Devices have new ways to communicate data
- Data integration – the explosion of new types of data in great volumes has demolished the assumption that you can master big data through a single platform
- AI and machine learning to supplement human talent – with more data in a variety of formats to deal with, organisations will need to take advantage of advancements in automation (AI and machine learning) to augment human talent
- Working with SMEs and non-technical users – self-service