Data mesh: what it is and why you might care



This article was contributed by Bruno Aziza, Head of Data and Analytics at Google Cloud.

“Data mesh” is a term that vendors, educators, and data experts alike have flocked to in order to describe one of the most disruptive trends in the data, AI, and analytics world. According to Google Trends, in 2021 “data mesh” overtook “data lakehouse,” a term that had been quite popular in the industry until then.

To put it mildly, if you work in technology, you will not escape the data mesh in 2022.

Data mesh: a simple definition

The genesis of the data mesh is a paper written in May 2019 by Zhamak Dehghani. In that piece, the Thoughtworks consultant describes the limitations of centralized, monolithic, and domain-agnostic data platforms.

These platforms often take the form of proprietary enterprise data warehouses with “thousands of untenable ETL tasks, tables, and reports that only a small group of specialized people understand, resulting in under-realized business impact,” or of complex data lakes “operated by a central team of hyper-specialized data engineers” who have, at best, “enabled R&D analytics,” said Dehghani. The latter case is often referred to as a “data swamp”: a data lake where all kinds of data stagnate, go unused, and end up useless.

The data mesh aims to address these issues by focusing on domain-driven design, and it points leaders toward a “modern data stack” that balances centralization and decentralization of data and metadata management.

One of the best explanations and implementations of the data mesh concept I’ve read to date is L’Oréal CIO Francois Nguyen’s two-part series entitled “Toward a Data Mesh” (Part 1, Part 2).

If you haven’t read it yet, stop everything and do so now. There is no better guidance than that of practitioners who test theories in practice and report real-world findings from their data journey. Francois’ series is full of helpful guidelines on how a data mesh can shape the composition and organization of your data team. “Part Deux” provides real, tested, technical guidelines for implementing a data mesh successfully.

Remember that a data mesh is more than a technical architecture; it’s a way to organize yourself around data ownership and its activation. When implemented successfully, the data mesh becomes the foundation of a modern data stack based on six key principles. For your data mesh to work, data must be 1) discoverable, 2) addressable, 3) reliable, 4) self-descriptive, 5) interoperable, and 6) secure.
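
To make these six principles concrete, here is one illustrative (not canonical) way to encode them: treat each data product as a small, published contract owned by a domain team. The sketch below is in Python; every field name, the URI scheme, and the example values are hypothetical, not drawn from Dehghani’s paper.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Minimal descriptor for a data-mesh data product.

    Each field maps to one of the six principles: discoverable,
    addressable, reliable, self-descriptive, interoperable, secure.
    """
    name: str                    # discoverable: registered in a catalog under this name
    address: str                 # addressable: a stable URI consumers can resolve
    freshness_sla_hours: int     # reliable: the owning domain commits to this SLA
    schema: dict = field(default_factory=dict)  # self-descriptive: documented fields/types
    output_format: str = "parquet"              # interoperable: a shared, open format
    allowed_roles: tuple = ()                   # secure: access limited to these roles


# Example: a hypothetical "orders" domain publishes its curated orders table.
orders = DataProduct(
    name="orders.curated_orders",
    address="bq://analytics-prod/orders/curated_orders",  # illustrative URI scheme
    freshness_sla_hours=24,
    schema={"order_id": "STRING", "amount": "NUMERIC", "placed_at": "TIMESTAMP"},
    allowed_roles=("analyst", "data-scientist"),
)
```

The point of a descriptor like this is that consumers in other domains can find, address, and trust the product without talking to the team that built it.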

In my opinion, a seventh dimension needs to be added to the data mesh concept: the mesh must also be financially sound and financially accurate. One of the biggest challenges (and opportunities) of a distributed, modern data stack is the accurate allocation of resources (and costs) to the domains.

Many will interpret this comment as a “cloud costs you more” argument. That’s not what I’m aiming for. In fact, I believe that costs should not be judged in isolation; they should be correlated with business value. If your business can extract exponentially more value from data by investing in a modern (and responsible) data mesh in the cloud, then you should invest more.
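
As a minimal sketch of what “financially accurate” could look like in practice, assume every resource is labeled with its owning domain and that each domain’s data products can be assigned an (illustrative) business value; all names and numbers below are hypothetical:

```python
from collections import defaultdict

# Hypothetical billing-export rows: (domain_label, cost_usd). In practice
# these would come from your cloud billing export, assuming every resource
# is labeled with the domain team that owns it.
billing_rows = [
    ("orders", 1200.50),
    ("marketing", 430.00),
    ("orders", 310.25),
    ("logistics", 980.75),
]

# Illustrative business value attributed to each domain's data products.
value_by_domain = {"orders": 9000.0, "marketing": 1500.0, "logistics": 2000.0}

# Roll costs up to the owning domain.
cost_by_domain = defaultdict(float)
for domain, cost in billing_rows:
    cost_by_domain[domain] += cost

# Judge each domain's cost against the value it generates, not in isolation.
for domain, cost in sorted(cost_by_domain.items()):
    ratio = value_by_domain.get(domain, 0.0) / cost
    print(f"{domain}: cost=${cost:,.2f}, value/cost ratio={ratio:.1f}x")
```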

The biggest problems in this area have never been a lack of data or a lack of investment; they have been about value. According to Accenture, almost 70% of organizations are still not getting value from their data.

Don’t get distracted by the hype

If your ultimate goal is to extract “business value” from data, how does the data mesh concept help you? One of your biggest challenges this year will probably be to avoid getting caught up in the buzzword euphoria that surrounds the term. Instead, focus on using the data mesh as a way to reach your end goal.

There are two important concepts to consider:

The data mesh is not the beginning

In a recent piece, my friend Andrew Brust noted that “dissemination is the natural state of operational data” and that “the general operational data corpus is supposed to be dispersed. It has come about through optimization, not incompetence.” In other words, the data you need is supposed to live in a distributed state: it will be on-premises, it will be in the cloud, it will be in multiple clouds. Ask your team: “Have we inventoried all the data we need, and do we understand where it all lives?”
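
One lightweight way to start answering that question is to diff the datasets the business needs against what your catalog actually knows about. The sketch below assumes a catalog export mapping dataset names to physical locations; all entries are hypothetical.

```python
# Hypothetical export of a data catalog: dataset name -> physical location.
catalog = {
    "orders.curated_orders": "bq://analytics-prod/orders",
    "crm.contacts": "s3://crm-lake/contacts",
}

# Datasets the business says it needs (illustrative).
required = {"orders.curated_orders", "crm.contacts", "web.clickstream"}

# Anything required but not cataloged is neither discoverable nor addressable yet.
missing = required - catalog.keys()
if missing:
    print("Not yet inventoried:", sorted(missing))  # -> ['web.clickstream']
else:
    print("All required datasets have a known, addressable location.")
```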

Remember, according to Dehghani’s original article, for your data mesh to work, your data must be “discoverable, addressable, reliable, self-descriptive, interoperable and secure.” This assumes that there is a stage before the data mesh stage.

I have the honor of spending a lot of time with many data leaders, and the best description I’ve heard of what that stage could be is the “data ocean,” from Vodafone’s Johan Wibergh and Simon Harris. The data ocean is broader than the concept of landlocked data lakes: it aims to securely give data teams complete visibility into the entire data domain available to them, without necessarily moving the data, so they can realize its potential.

The data mesh is not the end

Now that we’ve established that the data mesh needs a data ocean to run successfully, let’s look at where the data mesh leads. If your goal is to generate value from data, how do you materialize the results of your data mesh? This is where data products come into play.

We know that the value of data comes from its use and application. I’m not talking about simple dashboards here; I’m referring to intelligent, rich data products that trigger actions to create value and protect your people and business. Think anomaly detection for your networks, fraud prediction for your bank accounts, or recommendation engines that create superior customer experiences in real time.
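
To ground the idea, here is a deliberately tiny anomaly-detection sketch; a real data product would use far richer models, streaming inputs, and actions wired to the alerts. The traffic numbers are invented.

```python
import statistics

def flag_anomalies(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean.

    A simple stand-in for the anomaly-detection data products mentioned
    above, using a basic z-score rule.
    """
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * stdev]

# Example: requests per minute on a network link, with one obvious spike.
traffic = [120, 118, 125, 122, 119, 121, 950, 124, 120]
print(flag_anomalies(traffic, threshold=2.0))  # -> [6]
```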

In other words, while the data ocean is the architectural foundation needed to make your data mesh successful, the data mesh itself is the organizational model that helps your team build data products. If every company is a “data company,” then its currency is the data products it can produce, along with their repeatability and reliability. McKinsey Analytics has a name for this: the “data factory.”

What should you be concerned about?

As you read about the data mesh concept throughout the year, you’ll probably hear from three kinds of people: the disciples, the distractors, and the distorters.

The disciples will encourage you to go back to the original paper, or even to contact Dehghani directly if you have questions. You can also order her book, which is coming out soon.

The distractors will be experts or salespeople who want to label the data mesh concept a fad or an old trend. “Look away!” they’ll say. “There’s nothing new here!” Be careful: novelty is relative to your current state. Go back to the origins and decide for yourself whether this concept is new to you, your team, and your organization.

The distorters are likely to be vendors (of software or services) who benefit directly from drawing a straight line from Dehghani’s paper to their product, solution, or services. Watch out. As my friend Eric Broda explains in his blog about data mesh architecture, “there is no single product that offers you the data mesh.”

The best approach, in my opinion, is to connect with the practitioners: the leaders who have put the theory into practice and who are willing to share their lessons.

Bruno Aziza is the head of data and analytics at Google Cloud.

