“Breaking Changes” with Sam Ramji of DataStax: The New API Stack for Modern Data Apps
I recently sat down with Sam Ramji, chief strategy officer at DataStax, for Breaking Changes episode 18, “The New API Stack for Modern Data Apps.” Sam shared his views on how data and APIs have evolved in recent years, how the makeup of our teams is changing, and how machine learning is supporting those teams. Today, teams can work with fail-safe data infrastructure and the API-powered intelligence it has long promised but is only now able to deliver.
Watch episode 18 and read more of what I learned below:
As my conversation with Sam unfolded, I learned that DataStax offers a hybrid database as a service built on Apache Cassandra, delivering highly reliable key/value data stores. But we quickly dove into how data is shifting the API landscape: changing team and organizational structures, and pushing enterprise operations to new levels with machine learning. I walked away from our chat confident that everything we’ve learned on our API journey over the last decade has laid the groundwork, and trained us, for what we will need to do in the coming years, at the scale we will need to do it.
While I already knew about Cassandra and had a light understanding of how it fits into the overall open-source data stack, Sam enlightened me on the need to scale, partition, and federate data infrastructure to power application development and satisfy our analytical data needs. We discussed the obvious scalability and reliability that Cassandra provides, but he kind of blew my mind with what is possible when you combine Cassandra and Kubernetes to scale your data infrastructure up or down. That combination makes it easier for anyone to stand up a minimum viable three-node Cassandra cluster and then scale it to as many nodes as operations require.
I began working with databases back in the 1980s and 1990s, and the approach Sam shared with me enables really dreamy levels of scale and elasticity that have always been just out of reach. So it makes me happy to see the future finally arrive for the availability and reliability of our data stores, and to see APIs become the default way of accessing them.
The role of APIs in data evolution
When I asked Sam about the role of APIs in this data evolution, he shared that high-quality REST and GraphQL APIs are the default across DataStax’s highly available and scalable data infrastructure. He also shared a fascinating view of a growing world of what he called “version 2.0 microservices,” which builds on what we’ve learned from microservices by adding “data products.” We now have API-accessible data services that do one thing and do it well, meeting the growing needs of application development and providing the analytics we need to make sense of the massive amount of data we work with each day. Expanding our microservices toolbox has helped us go from APIs powering a handful of applications to APIs powering hundreds or thousands of applications, dashboards, integrations, and business automation workflows. Sam and I talked a lot about how the success of microservices 1.0 has influenced how we will do microservices 2.0, but more importantly, about how our teams and organizational structures continue to evolve.
The role of people in data evolution
One of the things I really appreciate about how Sam sees the world is his strong understanding of the role people play in all of this. He spoke about how this microservices evolution is changing roles, and how those roles are escalating and evolving with the shifts occurring in the data landscape inside enterprise organizations.
He acknowledged that, along with the increasing importance of data, the importance of the data scientist is at an all-time high. Noting that it will take years to expand the pipeline of qualified data scientists to meet growing demand, he shared ways we can further embrace DevOps and machine learning to give developers the safety net, exoskeleton, and other resources they will need to be successful. We need to equip the next generation of data engineers to deliver more intelligent applications and provide the rich sources of data our data scientists will need. All of this forward motion also elevates the role of the chief data officer (CDO), bringing a new balance to the C-suite within the enterprise, with effects that reach all the way down to the ground floor.
I learned a lot from Sam about the shifting roles that come with the evolution of data and APIs across the enterprise, and about his view that these changes in the human landscape are being enhanced by a much more precise, real-time form of machine learning. The momentum of DevOps is being applied to the rapidly evolving world of DataOps, and the same practices are being used to optimize how machine learning models are developed and iterated on through MLOps. But wait, there’s more!
Sam also walked me through the concept of ModelOps, which introduces observability and traceability at the machine learning layer and establishes a data feedback loop around how machine learning is being applied, all of which feeds back into the data layer underneath. DevOps has enabled us to shorten the cycles involved in delivering high-quality software and machine learning, which in turn shortens the cycle involved in delivering high-quality data. When all three of these dimensions are observable and traceable, and each has an automated feedback loop that runs right back into the data pipeline to develop new data products and evolve machine learning models, we can establish a virtuous cycle across operations.
The data API landscape
My discussion with Sam gave me a pretty sophisticated view of the data API landscape. It is a far more mature look at what is happening than many of the “create, read, update, and delete” (CRUD) API discussions we’ve been having for the last decade. A relational database behind a CRUD API feels pretty old school compared to the high-velocity, data-driven, API-enabled pipeline Sam described.
We ended our discussion agreeing that the last decade feels like training for what is to come in the next one, and that the databases, containers, CI/CD, gateways, and other essential infrastructure behind our APIs were just the groundwork. People and repeatable processes, augmented with more practical machine learning, are what will allow us to scale the business side of the equation, while proven open-source solutions like Cassandra and Kubernetes handle the technical scaling.
Ultimately, my discussion with Sam sparked ideas for future Breaking Changes episodes that just might be dedicated to DataOps, MLOps, and ModelOps. I am extremely curious about how APIs are enabling each of these dimensions of operations, just as they have enabled DevOps. And I’m excited to learn more about how they’re making our teams more productive while also ensuring the quality of the services that power our applications and the dashboards we depend on to answer short- and long-term questions about our business operations.
What else have I discussed with other stellar guests from the API universe? Check out the key takeaways and full videos of previous Breaking Changes episodes here.