Semantic layer for AI: let's not make the same mistakes we did with data catalogs
Most data catalog projects fail. One of the main causes is the need for Data Stewards (humans) to continuously catalog the data. Semantic layers are falling into the same trap.
Speak with someone in the data world about data catalogs, and you’ll get an eye-roll. While there are some fantastic data catalog companies out there, the reality is that the majority of data catalog implementations fail.
“We implemented a data catalog on top of DataHub and no one uses it,” a data leader told me recently. Another said, “I’m at a point where I have a genetic predisposition to hate catalogs.”
Data catalogs are well-intentioned. They exist to solve governance issues and to help with data democratization. But they keep failing to achieve the latter.
Why? There are multiple reasons - the top one being the fact that data catalogs aren’t well suited for the actual users who need them for data democratization: the business users and business analysts. But today, I won’t discuss that subject (future post to follow).
Today - we’ll talk about the human cost.
“It’ll all be perfect if we can just invest X man-years in documenting the data”
Humans hate to document.
It’s boring.
It doesn’t serve us (the person who knows the information doesn’t need the documentation); it serves others (who don’t know it), so it only works if we’re major altruists.
It’s a Sisyphean task, a never-ending toil.
It doesn’t get us promoted. It doesn’t get us hired (usually).
So… we suck at it.
Still, despite all that failure, dozens of data catalog companies told their thousands of prospective customers: “All you need to do is hire a bunch of data stewards to document the data… and boom! Data democratization nirvana.”
And now… Semantic Layer companies (or products that have semantic layers within them) are saying the same thing. Snowflake, dbt, even AWS.
It goes something like: “Before GenAI… a semantic layer’s value was kinda limited. But now with AI… it’ll be amazing. Just get a robust semantic layer and your AI can use it to replace your BI platform.”
And there’s actual proof it works!
Except… humans documenting data again? Haven’t we learned our lesson?
One of the 250+ companies we’ve spoken with over the past year told us, with great excitement, that they had solved the Text2SQL challenge on their own. They even posted articles about it. Except… we now hear from them that the solution is starting to die off because no one wants to maintain the documentation it relies on.
Sound familiar?
There is a solution to this pain: AI that generates the semantic layer, so that another AI can use it to do Text2SQL / AI for BI / etc. This is a really, really, really hard task. But not impossible anymore.
That’s exactly what we’re doing at Solid - we start by documenting all of the assets within the BI platform and the data warehouse, at a much higher level of accuracy than you’d get elsewhere. We make that documentation accessible to users (via the web browser, a Slack bot, and a Chrome extension) as well as to machines (API, MCP, documentation exports).
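To make the idea concrete, here’s a minimal sketch (in Python, with made-up table names and a hypothetical entry format - not Solid’s actual schema or API) of what a machine-generated semantic-layer entry might look like, and how a Text2SQL agent could fold it into its prompt:

```python
# Illustrative sketch only: a hypothetical auto-generated semantic-layer entry
# and one way a Text2SQL agent might consume it. The field names and helper
# below are assumptions for the example, not a real product schema or API.

# A generated entry for one warehouse table (hypothetical shape).
orders_entry = {
    "table": "analytics.orders",
    "description": "One row per customer order, loaded nightly from the OLTP system.",
    "columns": {
        "order_id": "Primary key for the order.",
        "customer_id": "Foreign key to analytics.customers.",
        "order_total": "Order value in USD, after discounts, before tax.",
        "created_at": "UTC timestamp when the order was placed.",
    },
    "metrics": {
        "revenue": "SUM(order_total)",
        "order_count": "COUNT(DISTINCT order_id)",
    },
}

def build_text2sql_context(entry: dict) -> str:
    """Flatten a semantic-layer entry into plain-text context for an LLM prompt."""
    lines = [f"Table {entry['table']}: {entry['description']}", "Columns:"]
    lines += [f"  - {name}: {desc}" for name, desc in entry["columns"].items()]
    lines.append("Metrics:")
    lines += [f"  - {name} = {expr}" for name, expr in entry["metrics"].items()]
    return "\n".join(lines)

# The resulting context would be prepended to the user's question
# (e.g. "What was revenue last month?") before asking the model to write SQL.
print(build_text2sql_context(orders_entry))
```

The exact format matters less than the fact that the entry is generated and refreshed automatically, so no human has to keep it alive.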
Interested in learning about how we’re building this at Solid? Hit us up: https://www.getsolid.ai/contact