Data Chatbots: what people are really doing
The AI craze has caused many to imagine the potential for a data chatbot - allowing business stakeholders to chat with their data without needing an analyst. Let's evaluate its success so far.
Humanity has been talking about self-driving cars for decades. Over time, it just made sense to us - cars should be able to drive themselves and allow us to read a book, do our makeup, or just sleep. The reality, however, is that apart from limited successes (such as Waymo in SF and LA), and a global human trial (led by Tesla’s FSD), we haven’t reached the dream yet. And it’s 2025!
However, that doesn’t mean that the effort to make self-driving possible hasn’t resulted in massive benefits for humanity. Driver-assist technologies have been helping people avoid accidents, showing that the technology saves lives. So while the goal is quite aspirational, and some may give up on it along the way, chasing it has real benefits.
Data Chatbots: self-driving dream, or reality?
Over the past year, we spoke with over 170 data and analytics leaders and practitioners. We were interested in understanding their challenges, and how they’re looking to solve them. As part of those conversations, we also asked - what do you think of data chatbots?
Some of those companies have already been working on building their own data chatbots. Smart people are wrestling with ChatGPT, Claude and other LLMs to get them to understand their organization’s data stack, write legible SQL queries, and run them. It just makes sense, doesn’t it? “If LLMs can write Python code better than most programmers, they can write SQL code too.”
The reality is that treating SQL as just another language is an oversimplification. Each organization has its own data and business language, sometimes several, and SQL is just a way to interact with it. Within each organization’s SQL code lie thousands of concepts about the company’s business lines, how the data is organized, and what data is usable.
To make matters worse, business stakeholders don’t even know what they’re asking for. Across my conversations, I heard data analysts complaining that the requests they are getting are poorly structured, and oftentimes aren’t even what the stakeholder needs. So, for every analytics request, there’s a need for some back and forth between the stakeholder making the request, and the analyst who will be working on it, to better define the task at hand.
So, if a business stakeholder asks a chatbot for data, and the chatbot takes the request literally, it will quite probably provide the wrong answer.
Not to mention the complexity of data access in larger organizations: each person should only get answers they are authorized to see.
Still, companies like Databricks are trying to build this.
So should we just give up on data chatbots?
Not yet. Just like with self-driving cars, chatbots can operate well within certain constraints. The most successful approach to this that I’ve witnessed is one where you work with your data analysts and data scientists to curate a list of 100-150 questions that the chatbot should be able to answer and focus on those. It’s a small drop in the bucket of data analysis requests, but it’s a start. WHOOP did it.
Or consider aCommerce’s approach - focusing on the set of questions their customers use most often and making it available to them in a chatbot.
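To make the curated-list idea concrete, here is a minimal sketch in Python. It is not how WHOOP or aCommerce built theirs; the catalog entries, table names, SQL, and the `answer` helper are all hypothetical, and the fuzzy matching uses the standard library’s `difflib` purely for illustration. The point is that the chatbot only ever returns SQL that an analyst has already vetted, and declines everything else.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Optional


@dataclass(frozen=True)
class CuratedQuestion:
    text: str  # the question as curated by the analysts
    sql: str   # the vetted query that answers it (hypothetical tables)


# Hypothetical catalog; a real one would hold the 100-150 curated questions.
CATALOG = [
    CuratedQuestion(
        "how many active users did we have last week",
        "SELECT COUNT(DISTINCT user_id) FROM events "
        "WHERE event_date >= CURRENT_DATE - 7",
    ),
    CuratedQuestion(
        "what was total revenue last month",
        "SELECT SUM(amount) FROM orders "
        "WHERE order_month = 'last_month'",
    ),
]


def normalize(question: str) -> str:
    """Lowercase, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(question.lower().strip(" ?!.").split())


def answer(question: str, cutoff: float = 0.8) -> Optional[CuratedQuestion]:
    """Return the closest curated question, or None if nothing is close enough."""
    by_text = {normalize(c.text): c for c in CATALOG}
    matches = get_close_matches(normalize(question), list(by_text), n=1, cutoff=cutoff)
    return by_text[matches[0]] if matches else None


if __name__ == "__main__":
    hit = answer("How many active users did we have last week?")
    print(hit.sql if hit else "Sorry, that question is not in my catalog yet.")
```

Returning `None` for anything outside the catalog is the design choice that matters here: a refusal is cheaper than a confident wrong answer, and the unanswered questions tell you what to curate next.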
I’ve also started to see a different approach that is actually working better, because it changes the target user persona of the chatbot. You see, people were thinking that “AI chatbots” should focus on the business stakeholders and just give them answers. But if that’s way too hard right now, you can instead focus on data analysts.
Keep the human in the loop. Simply help the analysts do their job better/faster, which will result in improved output back to the business while reducing the chances for errors introduced by putting the AI directly in front of the business stakeholders.
LinkedIn built it. Grab built it. I also know of several other companies that have built it. The idea in this case is to focus your user persona on the people who are able to understand and run SQL, and help them find the right SQL for each use case, faster.
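As a rough illustration of analyst-assist (and not a description of what LinkedIn or Grab actually built), here is a small Python sketch. The `LIBRARY` of vetted queries, the table names, and the `suggest` helper are hypothetical, and plain word overlap stands in for whatever retrieval a real system would use. Note that it only suggests queries; the analyst still reviews, adapts, and runs the SQL.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class VettedQuery:
    description: str  # what the query answers, in the analysts' own words
    sql: str          # SQL already reviewed by the data team (hypothetical)


# Hypothetical library of queries the data team already trusts.
LIBRARY = [
    VettedQuery(
        "weekly active users by country",
        "SELECT country, COUNT(DISTINCT user_id) AS wau FROM events "
        "WHERE event_date >= CURRENT_DATE - 7 GROUP BY country",
    ),
    VettedQuery(
        "monthly revenue by product line",
        "SELECT product_line, SUM(amount) AS revenue FROM orders "
        "GROUP BY product_line",
    ),
    VettedQuery(
        "churned users last quarter",
        "SELECT user_id FROM subscriptions "
        "WHERE cancelled_at >= CURRENT_DATE - 90",
    ),
]


def tokens(text: str) -> Set[str]:
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def suggest(request: str, k: int = 2) -> List[VettedQuery]:
    """Rank vetted queries by Jaccard word overlap with the request."""
    req = tokens(request)
    scored = []
    for query in LIBRARY:
        desc = tokens(query.description)
        union = req | desc
        score = len(req & desc) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, query))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [query for _, query in scored[:k]]


if __name__ == "__main__":
    for candidate in suggest("revenue by product line"):
        print(candidate.description, "->", candidate.sql)
```

Because the output is a shortlist for a person who can read SQL, a mediocre ranking is tolerable: the analyst is the safety net that a stakeholder-facing chatbot does not have.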
Our nearest stepping-stone: analyst-assist
Just like driver-assist technologies have been improving safety and ride experience all over the world, I truly believe that analyst-assist technologies will bring the dream of broader data access for all closer. I’ve seen it work.
Is it easy to build? No. Not at all. It’s super hard.
We are seeing Hex make a dent in this effort by making notebook-style analysis easier and faster. We are seeing home-grown solutions making progress towards helping analysts use the right queries each time (like the LinkedIn example above).
But at the same time, and I can say this from first-hand experience, building a solution that can answer a variety of questions asked by 200+ analysts in a 3000-person organization is really, really, really hard.
Isn’t that what the CTO of Microsoft meant when he said that you should now pursue problems that are just merely hard?