Data maturity: Navigating the terminology
Join Technical Lead Tom Rooks as he explains the terminology we frequently use when describing our Data Maturity Assessments. This short video will help you to understand the difference between a custom component and SQL Server, helping you to make informed decisions on your data maturity.
A data maturity assessment sees our technical experts get under the hood of your data collection systems, understanding how they are used, by whom and for what. Our assessment framework is thorough and uses a data maturity model to identify your organisation’s performance in key areas.
Depending on your current level of maturity, we will recommend key projects to help you progress further and reach your full potential.
ETL stands for Extract, Transform and Load. This phrase is most likely to be used when data needs to be prepared and moved from one system to another. Here is how each of the steps is used:

Extract – the data is extracted from its current location.

Transform – the data is changed and manipulated as needed in transit (e.g. a date format change) so that it can sit within the new location.

Load – the final step of the process, where the transformed data is loaded into its new location.
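To make that concrete, here is a rough sketch of those three steps in Python (one of the languages we mention later). The file name, table name and column names are all invented for illustration:

```python
import csv
import sqlite3
from datetime import datetime

# Extract - read the data out of its current location
# (a hypothetical CSV export with "id" and "order_date" columns).
with open("orders_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform - change the data in transit, e.g. a date format change
# from DD/MM/YYYY to ISO YYYY-MM-DD.
for row in rows:
    row["order_date"] = datetime.strptime(
        row["order_date"], "%d/%m/%Y"
    ).strftime("%Y-%m-%d")

# Load - push the transformed data into its new location.
conn = sqlite3.connect("reporting.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (:id, :order_date)", rows)
conn.commit()
conn.close()
```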
Data Flow is a phrase often used alongside ETL, as it is an important part of the planning stages. Data will always have a source and a destination; the data flow shows the stages in between, and is where actions such as cleaning the data, deriving value and making alterations can take place.
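As a loose illustration, a data flow can be thought of as an ordered chain of small stages between source and destination. The stage names and record layout below are made up:

```python
# Each stage is a small function; the data flow is the ordered chain
# of stages between the source and the destination.
def clean(record):
    # Cleaning: strip stray whitespace from text values.
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

def derive(record):
    # Deriving value: compute a field that did not exist at the source.
    record["total"] = record["quantity"] * record["unit_price"]
    return record

def alter(record):
    # Altering data: rename a field to match the destination's schema.
    record["product_code"] = record.pop("sku")
    return record

source = [{"sku": " A100 ", "quantity": 3, "unit_price": 2.50}]
destination = []

for record in source:
    for stage in (clean, derive, alter):
        record = stage(record)
    destination.append(record)

print(destination)
# [{'quantity': 3, 'unit_price': 2.5, 'total': 7.5, 'product_code': 'A100'}]
```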
SQL Server Integration Services, or SSIS for short, is Microsoft’s ETL and data flow product. The steps it covers are very similar to those I covered under ETL, in that SSIS allows you to take data out of one system, change and alter it as needed, and then push it to its new destination.
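SSIS packages are usually designed visually in Microsoft’s tooling rather than hand-coded, but as a small, hedged example, a deployed package can be kicked off from a script using SSIS’s dtexec command-line utility (the package path here is made up):

```python
import subprocess

# Run a deployed SSIS package via dtexec, which ships with SQL Server.
# The package path is a hypothetical example.
result = subprocess.run(
    ["dtexec", "/F", r"C:\packages\LoadOrders.dtsx"],
    capture_output=True,
    text=True,
)

# dtexec returns 0 when the package ran successfully.
print(result.returncode)
```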
Although ETL and SSIS help extract, guide and load data to its new destination, there is sometimes a need to use ‘scripting’ or ‘custom components’. Both are ways of creating commands that transform the data and then guide it onwards to its final destination.

Scripting would be the first port of call for this, with custom components being the next step for more complex requirements.

There are many programming languages that can be used for scripts and custom components; our standard go-to languages are C# and Python, both of which are great for use with data.
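As a rough sketch of the difference in Python, a script might be a quick, one-off function, while a custom component wraps the same idea in something reusable and configurable. All names here are illustrative:

```python
# Scripting: a quick, one-off transformation for a single job.
def uppercase_names(records):
    for record in records:
        record["name"] = record["name"].upper()
    return records

# A custom component: the same idea packaged for reuse, with the
# field and the transformation made configurable.
class FieldTransformer:
    def __init__(self, field, func):
        self.field = field
        self.func = func

    def process(self, records):
        for record in records:
            record[self.field] = self.func(record[self.field])
        return records

data = [{"name": "smith"}, {"name": "jones"}]
print(FieldTransformer("name", str.upper).process(data))
# [{'name': 'SMITH'}, {'name': 'JONES'}]
```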
If your project involves a database, you will likely come across both SQL Server and SQL Azure. The fundamental difference between the two is where the equipment physically sits: with SQL Server it sits on premises, while SQL Azure is a cloud-based solution.
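In practice, the difference often shows up simply in the connection details. Here is a hedged sketch using the pyodbc library, with invented server names and credentials:

```python
import pyodbc  # third-party driver: pip install pyodbc

# On-premises SQL Server: the server name points at kit in your building.
on_prem = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=OFFICE-DB01;DATABASE=Sales;Trusted_Connection=yes;"
)

# SQL Azure: the server name points at Microsoft's cloud, and the
# connection is encrypted in transit.
azure = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myorg.database.windows.net;DATABASE=Sales;"
    "UID=report_user;PWD=example-password;Encrypt=yes;"
)
```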
When deciding which is right for you, you will need to consider things such as whether you want a solution where the kit physically sits at your premises, who needs access to it, and what levels of support you have available in house. All of these will be key factors in choosing the right solution for your organisation.
Let’s turn our attention to data storage.

Data Warehouse – provides a central source of data from multiple systems. Data is extracted from other systems, altered as needed and then pushed to the data warehouse. This provides one central data set for analysis, reporting, insight and more.

Data Mart – a subset of a data warehouse, often used to provide subsets of data for specific business areas or users.

Data Lake – provides a central repository for organisation data to be held in its raw format, covering both its original and transformed states.
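As a loose sketch of how the three relate (invented data, ignoring the real storage technology involved):

```python
# Data lake: raw data kept as-is, in its original form.
lake = {
    "crm_export.json": '{"customer": "Acme", "region": "North", "spend": "1,200"}',
    "web_orders.csv": "customer,region,spend\nBeta Ltd,South,800\n",
}

# Data warehouse: data from multiple systems, cleaned and combined
# into one central data set for analysis and reporting.
warehouse = [
    {"customer": "Acme", "region": "North", "spend": 1200},
    {"customer": "Beta Ltd", "region": "South", "spend": 800},
]

# Data mart: a subset of the warehouse for one business area -
# here, everything relating to the North region.
north_mart = [row for row in warehouse if row["region"] == "North"]

print(north_mart)  # [{'customer': 'Acme', 'region': 'North', 'spend': 1200}]
```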
Hopefully this provides you with a bit more context and an understanding of just some of the terminology used in the world of data.
If you’re looking to get a data project underway or have some questions to help you navigate the right route for your organisation, then please get in touch.