Let’s face it — traditional data management doesn’t work. Today, 75% of executives don’t trust their own data, and only 27% of data projects are successful. Those are dismal numbers in what has been called the “golden age of data”.
As data just keeps growing in size and complexity, we’re struggling to keep it under control. To make matters worse, data teams and their members, tools, infrastructure, and use cases are becoming more diverse at the same time. The result is data chaos like we’ve never seen before.
DataOps has been around for several years, but right now it’s on fire because it promises to solve this problem. Just a week apart, Forrester and Gartner recently made major shifts toward recognizing the importance of DataOps.
On June 23 of this year, Forrester launched the latest version of its Wave report about data catalogs. But instead of covering “Machine Learning Data Catalogs” as in previous years, it renamed the category to “Enterprise Data Catalogs for DataOps”. A week later, on the 30th, Gartner released its 2022 Hype Cycle, predicting that DataOps will fully penetrate the market in 2-5 years and moving it from the far left side of the curve to its “Peak of Inflated Expectations”.
But the rise of DataOps isn’t just coming from analysts. At Atlan, we work with modern data teams around the world. I’ve personally seen DataOps go from an unknown to a must-have, and some companies have built entire strategies, functions, and even roles around it. While the results vary, I’ve seen incredible improvements in data teams’ agility, speed, and outcomes.
In this blog, I’ll break down everything you should know about DataOps — what it is, why you should care about it, where it came from, and how to implement it.
The first, and perhaps most important, thing to know about DataOps is that it’s not a product. It’s not a tool. In fact, it’s not anything you can buy, and anyone trying to tell you otherwise is trying to trick you.
Instead, DataOps is a mindset or a culture — a way to help data teams and people work together better.
DataOps can be a bit hard to grasp, so let’s start with a few well-known definitions.
DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.
Gartner
DataOps is the ability to enable solutions, develop data products, and activate data for business value across all technology tiers from infrastructure to experience.
Forrester
DataOps is a data management method that emphasizes communication, collaboration, integration, automation and measurement of cooperation between data engineers, data scientists and other data professionals.
Andy Palmer
As you can tell, there’s no standard definition for DataOps. However, you’ll see that everyone talks about DataOps in terms of being beyond tech or tools. Instead, they focus on terms like communication, collaboration, integration, experience, and cooperation.
In our mind, DataOps is really about bringing today’s increasingly diverse data teams together and helping them work across equally diverse tools and processes. Its principles and processes help teams drive better data management, save time, and reduce wasted effort.
The short answer: It helps you tame the data chaos that every data person knows all too well.
Now for the longer, more personal answer…
At Atlan, we started as a data team ourselves, solving social good problems with large-scale data projects. The work was really cool: we got to partner with organizations like the UN and the Gates Foundation on projects affecting millions of people.
But internally, life was chaos. We dealt with every fire drill that could possibly exist, leading to long chains of frustrating phone calls and hours spent trying to figure out what went wrong. As a data leader myself, this was a personally vulnerable time, and I knew it couldn’t continue.
We put our minds to solving this problem, did a bunch of research, and came across the idea of “data governance”. We were an agile, fast-paced team, and traditional data governance didn’t seem like a fit for us. So we came together, reframed our problems as “How Might We” questions, and started an internal project to solve them with new tooling and practices. By bringing inspiration from diverse industries back to the data world, we stumbled upon what we now know as DataOps.
It was during this time that we saw what the right tooling and culture can do for a data team. The chaos decreased, the same massive data projects became exponentially faster and easier, and the late-night calls became wonderfully rare. As a result, we were able to accomplish far more with far less. Our favorite example: an eight-member team, many of whom had never pushed a line of code to production before, built India’s national data platform in just 12 months.
We later wrote down our learnings in our DataOps Culture Code, a set of principles to help a data team work together, build trust, and collaborate better.
That’s ultimately what DataOps does, and why it’s all the rage today — it helps data teams stop wasting time on the endless interpersonal and technical speed bumps that stand between them and the work they love to do. And in today’s economy, anything that saves time is priceless.
Some people like to say that data teams are just like software teams, and they try to apply software principles directly to data work. But the reality is that they couldn’t be more different.
In software, you have some level of control over the code you work with. After all, a human somewhere is writing it. But in a data team, you often can’t control your data, because it comes from diverse source systems in a variety of constantly changing formats. If anything, a data team is more like a manufacturing team, transforming a heap of unruly raw material into a finished product. Or perhaps a data team is more like a product team, taking that product to a wide variety of internal and external end consumers.
The way we like to think about DataOps is to ask: how can we take the best learnings from other teams and apply them to help data teams work together better? DataOps combines the best parts of Lean, Product Thinking, Agile, and DevOps and applies them to the field of data management.
Key idea: Reduce waste with Value Stream Mappings.
Though its roots go back to Benjamin Franklin’s writings from the 1730s, Lean comes from Toyota’s work in the 1950s. In the shadow of World War II, the auto industry — and the world as a whole — was getting back on its feet. For car manufacturers everywhere, employees were overworked, orders delayed, costs high, and customers unhappy.
To solve this, Toyota created the Toyota Production System, a framework for conserving resources by eliminating waste. It tried to answer the question: how can you deliver the highest-quality good at the lowest cost in the shortest time? One of its key ideas is to eliminate the eight types of waste in manufacturing wherever possible, such as overproduction, waiting time, transportation, and underutilized workers, without sacrificing quality.
The TPS was the precursor to Lean, coined in 1988 by businessman John Krafcik and popularized in 1996 by researchers James Womack and Daniel Jones. Lean focused on the idea of Value Stream Mapping. Just like you would map a manufacturing line with the TPS, you map out a business activity in excruciating detail, identify waste, and optimize the process to maintain quality while eliminating waste. If a part of the process doesn’t add value to the customer, it is waste — and all waste should be eliminated.
What does a Value Stream Mapping actually look like? Let’s start with an example in the real world.
Say that you own a cafe, and you want to improve how your customers order a cup of coffee. The first step is to map out everything that happens when a customer orders a coffee: taking the order, accepting payment, making the coffee, handing it to the customer, etc. For each of these steps, you then note what can go wrong and how long the step can take. For example, a customer might have trouble locating where they should order, then spend up to 7 minutes waiting in line once they get there.
How does this idea apply to data teams? Data teams are similar to manufacturing teams. They both work with raw material (i.e. source data) until it becomes a product (i.e. the “data product”) and reaches customers (i.e. data consumers or end users).
So if a supply chain has its own value streams, what would data value streams look like? How can we apply these same principles to a Data Value Stream Mapping? And how can we optimize them to eliminate waste and make data teams more efficient?
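To make this concrete, here is a minimal Python sketch of what the output of a Data Value Stream Mapping exercise might look like. The stream (“answer an ad-hoc metrics question”), the steps, and the timings are all hypothetical examples, not a standard template:

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One step in a hypothetical data value stream."""
    name: str
    touch_time_min: float  # time actively spent working on the step
    wait_time_min: float   # time the request sits idle around the step
    adds_value: bool       # does the data consumer care about this step?

# Hypothetical value stream: answering an ad-hoc metrics question.
stream = [
    Step("Clarify the question with the stakeholder", 15, 120, True),
    Step("Search for the right table and its owner",  45, 240, False),
    Step("Write and validate the SQL",                60,  30, True),
    Step("Wait for a review from a senior analyst",   15, 480, False),
    Step("Share and explain the result",              20,  60, True),
]

lead_time   = sum(s.touch_time_min + s.wait_time_min for s in stream)
value_time  = sum(s.touch_time_min for s in stream if s.adds_value)
waste_steps = [s.name for s in stream if not s.adds_value]

print(f"Total lead time: {lead_time / 60:.1f} hours")
# Share of total time spent on work the consumer actually values.
print(f"Flow efficiency: {value_time / lead_time:.0%}")
print("Waste to target:", waste_steps)
```

Even in this toy example, most of the elapsed time is waiting and searching rather than value-adding work, which is exactly the kind of waste a mapping exercise is meant to expose.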
Key idea: Ask what job your product is really accomplishing with the Jobs To Be Done framework.
The core concept in product thinking is the Jobs To Be Done (JTBD) framework, popularized by Anthony Ulwick in 2005.
The easiest way to understand this idea is through the Milkshake Theory, a story from Clayton Christensen. A fast food restaurant wanted to increase the sales of their milkshakes, so they tried a lot of different changes, such as making them more chocolatey, chewier, and cheaper than competitors. However, nothing worked and sales stayed the same.
Next, they sent people to stand in the restaurant for hours, collecting data on customers who bought milkshakes. This led them to realize that nearly half of their milkshakes were sold to customers who came in alone before 8 am. But why? When they came back the next morning and talked to these early-morning customers, they learned that they had a long, boring drive to work and needed a breakfast they could eat in the car while driving. Bagels were too dry, doughnuts too messy, bananas too quick to eat… but a milkshake was just right, since it takes a while to drink and keeps people full all morning.
Once they realized that, for these customers, a milkshake’s purpose or “job” was to provide a satisfying, convenient breakfast during their commute, they knew they needed to make their milkshakes more convenient and filling — and sales increased.
The JTBD framework helps you build products that people love, whether it’s a milkshake or a dashboard. For example, a product manager’s JTBD might be to prioritize different product features to achieve business outcomes.
How does this idea apply to data teams? In the data world, there are two main types of customers: “internal” data team members who need to work more effectively with data, and “external” data consumers from the larger organization who use products created by the data team.
We can use the JTBD framework to understand these customers’ jobs. For example, an analyst’s JTBD might be to provide the analytics and insights for these product prioritization decisions. Then, once you create a JTBD, you can create a list of the tasks it takes to achieve it — each of which is a Data Value Stream, and can be mapped out and optimized using the Value Stream Mapping process above.
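As a toy illustration, here’s one way a team might write this down in Python. The job and task names below are invented for illustration, not a standard schema:

```python
# Hypothetical mapping from a customer's JTBD to the data value streams
# (tasks) that fulfill it.
jobs_to_be_done = {
    "Analyst: provide insights for product prioritization": [
        "Pull feature usage metrics",
        "Segment users by plan and cohort",
        "Estimate the revenue impact of each feature",
    ],
}

for job, value_streams in jobs_to_be_done.items():
    print(f"JTBD: {job}")
    for stream in value_streams:
        # Each task is a Data Value Stream, a candidate for the
        # mapping and waste analysis sketched earlier.
        print(f"  - value stream: {stream}")
```

Each entry then becomes an input to the Value Stream Mapping exercise above, so the team knows exactly which processes to map and optimize first.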
Key idea: Increase velocity with Scrum and prioritize MVPs over finished products.
If you’ve worked in tech or at any “modern” company, you’ve probably used Agile. Created in 2001 with the Manifesto for Agile Software Development, Agile is a set of principles that software teams use to plan and track their work.
The best-known Agile framework is Scrum, an iterative product management framework built around the idea of creating an MVP, or minimum viable product.
Here’s an example: if you wanted to build a car, where would you start? You could start by conducting interviews, finding suppliers, building and testing prototypes, and so on… but that would take a long time, during which the market and the world may change, and you could end up creating something that people don’t actually like.
An MVP is about shortening the development process. To create an MVP, you ask what the JTBD is — is it really about creating a car, or is it about providing transportation? The first, fastest product to solve this job could be a bike rather than a car.
The goal of Scrum is to create something as quickly as possible that can be taken to market and used to gather feedback from users. If you focus on finding the minimum solution, rather than creating the ideal or dream solution, you can learn what users actually want when they test your MVP, because they usually can’t express what they want in interviews.
How does this idea apply to data teams? Many data teams work in a silo, cut off from the rest of the organization. When they are assigned a project, they’ll often work for months on a solution and roll it out to the company, only to learn that their solution was wrong. Maybe the problem statement they were given was incorrect, or they didn’t have the context they needed to design the right solution, or the organization’s needs changed while they were building it.
How can data teams use the MVP approach to reduce this time and come to an answer quicker? How can they build a shipping mindset and get early, frequent feedback from stakeholders?
Agile can be used to open up siloed data teams and improve how they work with end data consumers. It can help data teams find the right data, bring data models into production and release data products faster, allowing them to get feedback from business users and iteratively improve and adapt their work as business needs change.
Key idea: Improve collaboration with release management, CI/CD, and monitoring.
DevOps was born in 2009 at the Velocity Conference, where engineers John Allspaw and Paul Hammond gave a now-famous talk about improving “dev & ops cooperation” at Flickr.
The traditional thinking at the time was that software moved in a linear flow: the development team’s job was to add new features, and then the operations team’s job was to keep those features and the software stable. This talk introduced a new idea: both dev and ops’ job is to enable the business.
DevOps turned the linear development flow into a circular, interconnected one that breaks down the silos between these two teams, helping them work together across two diverse functions via a set process. Ideas like release management (enforcing set “shipping standards” to ensure quality), operations and monitoring (creating monitoring systems that alert when things break), and CI/CD (continuous integration and continuous delivery) make this possible.
How does this idea apply to data teams? In the data world, it’s easy for data engineers and analysts to function independently (e.g. engineers manage data pipelines, while analysts build models) and blame each other when things inevitably break. This leads to bickering and resentment rather than solutions. Instead, it’s important to bring them together under a common goal: making the business more data-driven.
For example, your data scientists may currently depend on engineering or IT for everything from exploratory data analysis to deploying machine learning models. With DataOps, they can deploy their models themselves and perform analysis quickly, with no more dependencies.
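To give a flavor of what CI/CD means for a data team, here is a minimal sketch of a pytest-style quality check that could run automatically on every change to a pipeline. The file path and column names are hypothetical, and real teams often use dedicated tools like dbt tests or Great Expectations for this:

```python
import pandas as pd

def load_orders() -> pd.DataFrame:
    # Hypothetical staging output produced by the pipeline under test;
    # in CI this might be a fixture or a small sample extract.
    return pd.read_parquet("staging/orders.parquet")

def test_order_ids_are_unique_and_non_null():
    orders = load_orders()
    assert orders["order_id"].notna().all(), "order_id must never be null"
    assert orders["order_id"].is_unique, "order_id must be unique"

def test_order_amounts_are_positive():
    orders = load_orders()
    assert (orders["amount"] > 0).all(), "order amounts must be positive"
```

Running checks like these on every pull request, and blocking the merge when one fails, is release management in practice: a set “shipping standard” that data changes must pass before they reach consumers.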
Note: I cannot emphasize this enough: DataOps isn’t just DevOps with data pipelines. The problem that DevOps solves sits between two highly technical teams, software development and IT. DataOps helps an increasingly diverse set of technical and business teams create complex data products, everything from a pipeline to a dashboard or documentation.
Every other domain today has a focused enablement function. For example, SalesOps and Sales Enablement focus on improving productivity, ramp time, and success for a sales team. DevOps and Developer Productivity Engineering teams focus on improving collaboration between software teams and productivity for developers.
Why don’t we have a similar function for data teams? DataOps is the answer.
Rather than executing data projects, the DataOps team or function helps the rest of the organization achieve value from data. It focuses on creating the right tools, processes, and culture to help other people be successful at their work.
A DataOps strategy is most effective when it has a dedicated team or function behind it.
At the beginning of a company’s DataOps journey, DataOps leaders can use the JTBD framework to identify common data “jobs” or tasks, also known as Data Value Streams. Then, with Lean, they can run a Value Stream Mapping exercise to identify and eliminate wasted time and effort in these processes.
Meanwhile, the Scrum ideology from Agile helps data teams understand how to build data products more efficiently and effectively, while ideas from DevOps show how they can collaborate better with the rest of the organization on these data products.
Creating a dedicated DataOps strategy and function is far from easy. But if you do it right, DataOps has the potential to solve some of today’s biggest data challenges, save time and resources across the organization, and increase the value you get from data.
In our next blogs, we’ll dive deeper into the “how” of implementing a DataOps strategy, based on best practices we’ve seen from the teams we’ve worked with — how to identify data value streams, how to build a shipping mindset, how to create a better data culture, and more. Stay tuned, and let me know if you have any burning questions I should cover!