DataOps (data operations) is an agile, process-oriented methodology for developing and delivering analytics. It brings together DevOps teams with data engineers and data scientists to provide the tools, processes, and organizational structures to support the data-focused enterprise. Research firm Gartner further describes the methodology as one focused on “improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.”
According to Dataversity, the goal of DataOps is to streamline the design, development, and maintenance of applications based on data and data analytics. It seeks to improve the way data are managed and products are created, and to coordinate these improvements with the goals of the business. According to Gartner, DataOps also aims “to deliver value faster by creating predictable delivery and change management of data, data models, and related artifacts.”
DevOps is a software development methodology that brings continuous delivery to the systems development lifecycle by combining development teams and operations teams into a single unit responsible for a product or service. DataOps builds on that concept by adding data specialists — data analysts, data developers, data engineers, and/or data scientists — to focus on the collaborative development of data flows and the continuous use of data across the organization.
DataKitchen, which specializes in DataOps observability and automation software, maintains that DataOps is not simply “DevOps for data.” While both practices aim to accelerate the development of software (software that leverages analytics in the case of DataOps), DataOps has to simultaneously manage data operations.
Like DevOps, DataOps takes its cues from the agile methodology. The approach values continuous delivery of analytic insights with the primary goal of satisfying the customer.
According to the DataOps Manifesto, DataOps teams value analytics that work, measuring the performance of data analytics by the insights they deliver. DataOps teams also embrace change and seek to constantly understand evolving customer needs. They self-organize around goals and seek to reduce “heroism” in favor of sustainable and scalable teams and processes.
DataOps teams also seek to orchestrate data, tools, code, and environments from beginning to end, with the aim of providing reproducible results. Such teams tend to view analytic pipelines as analogous to lean manufacturing lines and regularly reflect on feedback provided by customers, team members, and operational statistics.
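To make the lean-manufacturing analogy concrete, here is a minimal Python sketch of a pipeline that checks quality after every stage and records simple operational statistics. It is an illustration only, not a method prescribed by the DataOps Manifesto or any vendor. The sample orders, the step names, and the helpers `ingest`, `clean`, `summarize`, `fingerprint`, and `run_pipeline` are all hypothetical.

```python
import hashlib
import json
from typing import Callable

Rows = list[dict]
# A step is (name, transform, quality check). All names here are hypothetical.
Step = tuple[str, Callable[[Rows], Rows], Callable[[Rows], bool]]


def ingest() -> Rows:
    # In-memory sample data standing in for a real extract step.
    return [
        {"order_id": 1, "amount": 120.0},
        {"order_id": 2, "amount": 80.0},
        {"order_id": 3, "amount": None},  # a defect the pipeline should handle
    ]


def clean(rows: Rows) -> Rows:
    # Drop rows whose amount is missing.
    return [r for r in rows if r["amount"] is not None]


def summarize(rows: Rows) -> Rows:
    # Collapse the cleaned rows into a single summary record.
    return [{"orders": len(rows), "total": sum(r["amount"] for r in rows)}]


def fingerprint(rows: Rows) -> str:
    # Deterministic hash of a stage's output, so two runs can be compared.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_pipeline(rows: Rows, steps: list[Step]) -> tuple[Rows, list[dict]]:
    stats = []
    for name, transform, check in steps:
        rows = transform(rows)
        # Stop the line as soon as a stage produces bad output.
        if not check(rows):
            raise ValueError(f"quality check failed after step {name!r}")
        stats.append(
            {"step": name, "rows": len(rows), "fingerprint": fingerprint(rows)}
        )
    return rows, stats


STEPS: list[Step] = [
    ("clean", clean, lambda rows: all(r["amount"] is not None for r in rows)),
    ("summarize", summarize, lambda rows: len(rows) == 1 and rows[0]["total"] >= 0),
]

if __name__ == "__main__":
    result, stats = run_pipeline(ingest(), STEPS)
    print(result)
    for entry in stats:
        print(entry)
```

Running the pipeline twice on the same input yields identical fingerprints at every stage, which is one simple way to confirm that results are reproducible. The per-stage row counts are the kind of operational statistic a team might review when reflecting on how the pipeline is performing.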
Enterprises are increasingly building machine learning into a wide range of products and services, and DataOps is an approach geared toward supporting the end-to-end needs of machine learning.
“For example, this style makes it more feasible for data scientists to have the support of software engineering to provide what is needed when models are handed over to operations during deployment,” Ted Dunning and Ellen Friedman write in their book, Machine Learning Logistics.
“The DataOps approach is not limited to machine learning,” they add. “This style of organization is useful for any data-oriented work, making it easier to take advantage of the benefits offered by building a global data fabric.”
They also note that DataOps fits well with microservices architectures.
To make the most of DataOps, enterprises must evolve their data management strategies so they can handle data at scale and respond to real-world events as they happen, according to Dunning and Friedman.
Because DataOps builds on DevOps, it depends on cross-functional teams that cut across “skill guilds” such as operations, software engineering, architecture and planning, product management, data analysis, data development, and data engineering. DataOps teams should be managed in ways that increase collaboration and communication among developers, operations professionals, and data experts.
Data scientists may also be included as key members of DataOps teams, according to Dunning. “I think the most important thing to do here is to not stick with the more traditional Ivory Tower organization where data scientists live apart from dev teams,” he says. “The most important step you can take is to actually embed data scientists in a DevOps team. When they live in the same room, eat the same meals, hear the same complaints, they will naturally gain alignment.”
But Dunning also notes that data scientists may not need to be permanently embedded in a DataOps team.
“Typically, there’s a data scientist embedded in the team for a time,” Dunning says. “Their capabilities and sensibilities begin to rub off. Someone on the team then takes on the role of data engineer and kind of a low-budget data scientist. The actual data scientist embedded in the team then moves along. It’s a fluid situation.”
Most DevOps-based enterprises already have the nucleus of a DataOps team on hand. Once they have identified projects that need data-intensive development, they need only add someone with data training to the team. Often that person is a data engineer rather than a data scientist. DataKitchen suggests organizations seek out DataOps engineers who specialize in creating and implementing the processes that enable teamwork within data organizations. These individuals design the orchestrations that allow work to flow from development to production and ensure that hardware, software, data, and other resources are available on demand.
Many teams are made up of individuals with overlapping skill sets, and individuals may take on multiple roles within a DataOps team, depending on their expertise.
According to Michele Goetz, vice president and principal analyst at Forrester, some of the key areas of expertise on DataOps teams include:
Regardless of makeup, DataOps teams must share a common goal: the data-driven needs of the services they support.
According to Goetz, DataOps team members include:
Here are some of the most popular job titles related to DataOps and the average salary for each position, according to data from PayScale:
The following are some of the most popular DataOps tools: