Adriana Calomfirescu

Data mesh seems to herald a paradigm shift in data storage and processing. Instead of central data repositories such as data warehouses or data lakes, companies could in future rely on a distributed data architecture and finally exploit the full potential of their data. Let’s take a look at the principles of this new architecture concept, its advantages, and what to consider when deciding whether it is a good fit for a company.

Introduction

“Data is the new oil” – this quote by British mathematician Clive Humby is over 15 years old, and most companies have since recognised the meaning of his words: they are trying to exploit the potential of their data. To do this, they collect ever larger amounts of data in central data stores, where it is cleaned and processed so that it can then be consumed as high-quality data.

The data originates from internal operational and transactional systems and from the domains that are essential for business operations. In addition, data from external sources that provide companies with complementary information is fed into the data warehouse or data lake.

Data volumes become a problem for data repositories

However, companies are gradually reaching the limits of this monolithic data platform architecture – and often without achieving the desired results. They face the challenge of keeping their ever-growing data volumes under control and harmonising them to unlock their full potential, a process that costs money and takes time. Their ability to react flexibly and quickly to the increasing number of internal and external data sources, and to connect these to their existing data, is therefore limited.

Furthermore, the origin of the data in these repositories often cannot be fully traced: for example, which system did it originally come from? Through which other systems did it migrate? When was it changed, how, and by whom? This information is important for ensuring a high level of data quality. However, due to the sheer amount of data that ends up in the repository – and the speed at which it changes – such lineage is sometimes neglected and not fully tracked and recorded. This usually makes the subject matter experts who are supposed to work with the data reluctant to use it.

As a result, companies struggle to generate meaningful insights from their data and to identify new use cases – such as new products or services for their customers. In addition, it takes time to transform the data and make it ready for its consumers. This is especially the case if a company does not employ enough data specialists who know exactly how the data should be processed to serve its purpose.

Getting more out of data

The data mesh concept attempts to address these challenges by managing data as a product. This means the data is organised into data domains, has designated data owners and is properly catalogued, so that everyone within the company who is interested in certain data can easily access its metadata.

The team generating the data is considered the data owner and must prepare its data in such a way that other data consumers in the company can use it easily via self-service options. To do this, the team needs to satisfy several principles when building and managing its data products, such as data integrity, discoverability, self-description and interoperability. This increases consumer confidence in the products.
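
To make these principles more tangible, here is a minimal sketch in Python of what a descriptor published alongside a data product might look like. The class, field names and values are illustrative assumptions, not part of any formal data mesh specification:

```python
from dataclasses import dataclass, field

# Hypothetical descriptor a data-producing team might publish alongside
# its data product; the field names are illustrative, not a standard.
@dataclass
class DataProductDescriptor:
    name: str          # unique identifier, aids discoverability
    domain: str        # business domain the data belongs to
    owner: str         # accountable data-producing team
    description: str   # self-description for consumers
    schema: dict       # column names and types, for interoperability
    endpoint: str      # self-service access point, e.g. a view or an API
    freshness_sla: str # how up to date consumers can expect the data to be
    tags: list = field(default_factory=list)  # keywords that aid discovery

orders = DataProductDescriptor(
    name="retail.orders.daily",
    domain="retail",
    owner="order-management-team",
    description="All confirmed customer orders, deduplicated, one row per order.",
    schema={"order_id": "string", "customer_id": "string",
            "order_date": "date", "total_amount": "decimal"},
    endpoint="warehouse://retail/orders_daily",
    freshness_sla="updated daily by 06:00 UTC",
    tags=["orders", "retail", "sales"],
)
```

A descriptor along these lines makes a product discoverable (name, tags), self-describing (description, schema) and usable via self-service (endpoint), with a clearly accountable owner.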

 

The biggest advantage here is that the data-producing departments naturally know their data best. Accordingly, it is easier for them to derive value from it and to identify new potential use cases.

In this new data architecture, the role of data scientists and engineers also changes: they no longer act as go-betweens for the data-producing and data-consuming teams, but become part of the data-producing team. In this way, they acquire the domain knowledge needed to support their team colleagues as effectively as possible when preparing the data products. This simplifies and speeds up the entire process, which also lowers overall costs.

Central standards and a central catalogue

The data mesh approach is particularly suitable for larger companies that work with very large datasets and a wide variety of data sources. Smaller companies, on the other hand, can usually get by with a central data repository. When implementing a data mesh approach, companies should put two key building blocks in place to set up the necessary processes:

A central data governance model: data mesh only works if all data products in a company adhere to consistent standards and guidelines. Only then are they interoperable, allowing data consumers to combine multiple data products and work with them according to their individual needs. Companies must therefore first define standards and policies that determine how data products are categorised, managed and accessed.
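
Such standards can be enforced mechanically before a data product is accepted. The following is a minimal sketch, assuming descriptors are exchanged as plain dictionaries; the specific rules are invented examples, not a prescribed policy set:

```python
# Hypothetical central governance check: a data product descriptor must
# satisfy company-wide standards before it is listed. The rules below
# are illustrative examples only.
REQUIRED_FIELDS = ["name", "domain", "owner", "description", "schema", "endpoint"]

def validate_data_product(descriptor: dict) -> list:
    """Return a list of policy violations; an empty list means compliant."""
    violations = [f"missing required field: {f}"
                  for f in REQUIRED_FIELDS if not descriptor.get(f)]
    # Example naming standard: <domain>.<product>.<granularity>
    if descriptor.get("name", "").count(".") != 2:
        violations.append("name must follow <domain>.<product>.<granularity>")
    return violations
```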

 

A central data catalogue: for data consumers to find data products, companies need a central data catalogue. All existing data products are listed in this catalogue, along with additional information such as the origin of the data. Furthermore, data owners can add sample datasets that data consumers can use to try out a product before working with their own data.
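
Again as a rough sketch under the same assumptions, a catalogue entry could combine the product descriptor with lineage information and a small sample dataset; the structure and function names here are hypothetical:

```python
# Hypothetical in-memory catalogue; a real implementation would sit behind
# a service or an off-the-shelf catalogue tool.
catalog: dict = {}

def register(descriptor: dict, lineage: list, sample_rows: list) -> None:
    """List a data product together with its origin and a trial sample."""
    catalog[descriptor["name"]] = {
        "descriptor": descriptor,  # who owns it and how to access it
        "lineage": lineage,        # systems the data has passed through
        "sample": sample_rows,     # small example dataset for evaluation
    }

def find(keyword: str) -> list:
    """Discover data products whose name or tags match a keyword."""
    return [name for name, entry in catalog.items()
            if keyword in name or keyword in entry["descriptor"].get("tags", [])]

register(
    descriptor={"name": "retail.orders.daily", "owner": "order-management-team",
                "tags": ["orders", "retail"]},
    lineage=["pos-system", "order-service", "central-warehouse"],
    sample_rows=[{"order_id": "A-1001", "total_amount": "59.90"}],
)
print(find("orders"))  # -> ['retail.orders.daily']
```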

 

Conclusion

Data mesh is a new, decentralised approach to storing and processing data that might well see widespread adoption: the more companies realise that the central data repositories which became established in recent years are no longer sufficient for their requirements, the more they will look for alternatives.

Data mesh offers them the opportunity to get more out of their existing data while deploying their people more efficiently and making internal processes more effective and flexible.
