Data mesh is a new approach based on a modern, distributed architecture for analytical data management. It enables end users to easily access and query data where it lives without first transporting it to a data lake or data warehouse. The decentralized strategy of data mesh distributes data ownership to domain-specific teams that manage, own, and serve the data as a product. The main objective of data mesh is to eliminate the challenges of data availability and accessibility at scale. Data mesh allows business users and data scientists alike to access, analyze, and operationalize business insights from virtually any data source, in any location, without intervention from expert data teams. Simply put, data mesh makes data accessible, available, discoverable, secure, and interoperable. Because data can be queried in place rather than moved first, faster access translates directly into faster time to value.
Global data creation is projected to exceed 180 zettabytes in the next five years. Current data platforms have several architectural failures that hinder enterprise data processing and inhibit business growth.
Problem #1: Until now, enterprises have used a centralization strategy to process large volumes of data spanning many sources, types, and use cases. However, centralization requires users to import or transport data from edge locations to a central data lake before it can be queried for analytics, which is time-consuming and expensive.
How Data Mesh Solves It: The distributed architecture of data mesh views data as a product with separate domain ownership of each business unit. This decentralized data ownership model reduces the time-to-insights and time-to-value by empowering business units and operational teams to access and analyze “non-core” data quickly and easily.
Problem #2: As global data volumes continue to increase, a centralized management model requires changes across the entire data pipeline whenever queries or sources change, and that does not hold up at scale. Response time to new consumers and new data sources slows as the number of sources grows, which reduces the business's ability to get value from data and respond to change.
How Data Mesh Solves It: Data mesh delegates dataset ownership from the central team to the domains (individual teams or business users) to enable business agility and change at scale. Data mesh architecture steers enterprises towards real-time decision-making by closing the gap in time and space between an event happening and its consumption and processing for analysis.
Problem #3: Data transfer is often subject to data residency and privacy regulations that prohibit data migration when the data is stored in particular geographies or legal jurisdictions, such as data stored in an EU country that needs to be accessed by a user in North America. Abiding by data governance regulations is time-consuming and tedious, and can significantly delay the data processing and analysis that teams need for the critical business intelligence that helps them maintain a competitive advantage.
How Data Mesh Solves It: In decentralized data management, the domains are responsible for the quality, security, and transfer of their data products. Data mesh provides a connectivity layer that enables direct access and query capabilities by technical and non-technical users to data sets where they reside, avoiding costly data transfers and residency concerns.
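The idea of querying data where it resides can be illustrated with a minimal Python sketch. It assumes a hypothetical DataProduct class and in-memory rows standing in for regional stores; none of these names come from a real data mesh product. The aggregate function is sent to each region, and only the summarized result leaves it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class DataProduct:
    """Hypothetical domain-owned dataset that stays in its home region."""
    name: str
    owner_domain: str
    region: str
    rows: List[Row]

    def run_local(self, aggregate: Callable[[List[Row]], float]) -> float:
        # The aggregate travels to the data; raw rows are never returned.
        return aggregate(self.rows)


def total_revenue(rows: List[Row]) -> float:
    # Example aggregate: sum one numeric column.
    return sum(r["revenue"] for r in rows)


def federated_total(products: List[DataProduct]) -> Dict[str, float]:
    # Collect one aggregate per region instead of copying rows centrally.
    return {p.region: p.run_local(total_revenue) for p in products}


# Illustrative data only; regions and domain names are assumptions.
eu_sales = DataProduct("sales", "emea-sales", "eu-west",
                       [{"revenue": 120.0}, {"revenue": 80.0}])
us_sales = DataProduct("sales", "na-sales", "us-east",
                       [{"revenue": 200.0}])

result = federated_total([eu_sales, us_sales])
print(result)
```

In a real deployment the local execution step would be a query engine or connector running inside each region, but the shape is the same: the consumer receives per-region results rather than a copy of the underlying records.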
In a data mesh implementation, central IT still exists to build a self-service data platform, but it does not own the data. For instance, at a marketing company, the central IT team delivers the enabling technology and remains responsible for overarching governance and security across connected systems, while individual functional teams are responsible for the data itself.
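This split of responsibilities can be sketched in a few lines of Python. The SelfServePlatform class and the REQUIRED_TAGS policy below are hypothetical names chosen for illustration: the platform holds only catalog metadata and enforces a governance rule at registration time, while the data stays with the owning domain.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical governance policy set by central IT.
REQUIRED_TAGS = {"owner", "classification"}


@dataclass
class SelfServePlatform:
    """Central platform: stores metadata about data products, not the data."""
    catalog: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register(self, product_name: str, metadata: Dict[str, str]) -> None:
        # Governance check: every product must declare the required tags.
        missing = REQUIRED_TAGS - metadata.keys()
        if missing:
            raise ValueError(f"missing governance tags: {sorted(missing)}")
        self.catalog[product_name] = dict(metadata)

    def discover(self, owner: str) -> List[str]:
        # Discovery by owning domain, using catalog metadata only.
        return sorted(n for n, m in self.catalog.items() if m["owner"] == owner)


platform = SelfServePlatform()
platform.register("campaign_results",
                  {"owner": "marketing", "classification": "internal"})
platform.register("lead_scores",
                  {"owner": "marketing", "classification": "confidential"})

print(platform.discover("marketing"))
```

A registration that omits a required tag raises ValueError, which is one simple way a platform team could enforce shared rules without taking ownership of any domain's data.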
Data mesh offers a modern development approach to data analytics and software teams. It reduces data latency by providing instant access to query data from proximate geographies without access limitations.
Distributed data enables sales and marketing teams to build a 360-degree view of consumer behaviors and profiles from various systems and platforms, so they can create more targeted campaigns, increase lead scoring accuracy, and project customer lifetime value (CLV), churn, and other essential performance metrics.
Data mesh enables development and intelligence teams to create virtual data warehouses and data catalogs from different sources to feed machine learning (ML) and artificial intelligence (AI) models to help them learn, without having to consolidate data in a central location.
Data mesh implementation in the financial sector delivers faster time-to-insight at lower operating cost and operational risk. Distributed data analytics supports fraud behavior modeling to detect and prevent fraud in real time. It allows international financial bodies to analyze data locally, within any particular country or region, to identify fraud threats without replicating data sets and transporting them to a central database.
A decentralized data platform makes it easy to comply with worldwide data governance rules to provide global analytics across multiple regions with end-to-end data sovereignty and data residency compliance.
Want to read more? You can check out the Data Mesh O'Reilly book right here!
Here are some amazing companies in the data mesh space.
Starburst is a full-featured data lake analytics platform that activat ...