Layers of Data Structure: Usage is the New Schema
In this post, Tal Segalov, Solid's CTO & Co-Founder, takes us through a core challenge of modern data - the lack of a rigid schema for AI to learn from.
Remember the days of the classic relational DBs? For decades, the schema was the law. It was the blueprint, the single source of truth that told you exactly what was where. If you wanted to know how ORDERS related to CUSTOMERS, you just looked at the diagram.
But in the modern data stack, that’s just not true anymore. The real, living, breathing logic of our business isn’t in a static schema diagram. It’s scattered across the entire stack, a kind of “ghost in the machine” that you can only see by observing its behavior.
From Blueprint to Reality: The Layers of Data Truth
To paraphrase Shrek: “data stacks are like onions”. They have layers. If the schema is just the blueprint, the real “tribal knowledge” is built in layers on top of it. Each layer adds crucial context, moving us further from raw data and closer to real business logic.
Layer 1: The Traditional Schema. This is your foundation. It tells you the column names and data types (e.g., ORDERS.AMOUNT is a NUMBER). It’s the “what,” but it tells you nothing about the “how” or the “why.” And in modern data warehouses, key pieces like foreign keys are often missing entirely - they’re no longer needed for warehouse (DWH) operations.
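To make that concrete, here’s a hedged sketch of what Layer 1 gives you on its own. Only ORDERS.AMOUNT being a NUMBER comes from the post; every other table and column name is invented for illustration:

```sql
-- Layer 1: the schema by itself. Illustrative DDL; only ORDERS.AMOUNT as a NUMBER
-- comes from the post, the rest is assumed for the example.
CREATE TABLE CUSTOMERS (
    CUSTOMER_ID NUMBER,
    NAME        VARCHAR,
    CREATED_AT  TIMESTAMP
);

CREATE TABLE ORDERS (
    ORDER_ID    NUMBER,
    CUSTOMER_ID NUMBER,    -- no FOREIGN KEY constraint: the link to CUSTOMERS is implied, never declared
    AMOUNT      NUMBER,
    STATUS      VARCHAR,
    CREATED_AT  TIMESTAMP
);
```

The DDL answers the “what” (names and types) and nothing else; the join path between ORDERS and CUSTOMERS lives only in the heads, and queries, of the people who use it.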
Layer 2: Query Logs & SQL Constructs. This is the first and most important layer of behavior. It’s the “desire path” showing how people and systems actually use the data. The query logs are a living history of every business question ever asked, revealing what’s joined, what’s filtered, and what’s aggregated. This is also where you find “local slang”—complex WITH clauses (CTEs) or subqueries used over and over, representing common knowledge so established no one wrote it down.
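As an illustration of that “local slang”, here’s a hedged sketch of the kind of CTE a query log surfaces over and over. The table names and both filters are assumptions for the example, not definitions from the post:

```sql
-- Layer 2: a CTE that shows up in hundreds of logged queries. Nobody wrote down
-- that "real revenue" excludes test accounts and refunded orders; the query log did.
WITH real_revenue AS (
    SELECT o.customer_id,
           SUM(o.amount) AS revenue
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id   -- the join the schema never declared
    WHERE LOWER(c.name) NOT LIKE '%test%'                -- tribal knowledge: exclude test accounts
      AND o.status <> 'refunded'                         -- tribal knowledge: refunds don't count
    GROUP BY o.customer_id
)
SELECT customer_id, revenue
FROM real_revenue
ORDER BY revenue DESC;
```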
Unfortunately, there’s a ton of garbage here too: test queries, bad queries, people using the data incorrectly. There’s a lot of noise mixed in with the signal.

Layer 3: Modeling Layers (dbt). This is the explicit manufacturing layer. Your dbt models are a goldmine, showing the formal, multi-step transformations that turn raw “bronze” data into trusted “silver” and analytics-ready “gold” tables.
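For a sense of what that manufacturing layer encodes, here’s a minimal dbt-style model. The model, source, and column names are invented for illustration:

```sql
-- models/silver/stg_orders.sql: an illustrative dbt model that promotes raw
-- "bronze" orders into a cleaned "silver" staging table.
SELECT
    order_id,
    customer_id,
    amount,
    LOWER(status)    AS status,
    created_at::date AS order_date
FROM {{ source('shop', 'raw_orders') }}   -- source() points at a raw table declared in a sources .yml file
WHERE amount IS NOT NULL                  -- the cleanup rules live here, not in the schema
```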
Layer 4: Lineage. This layer provides the data’s supply chain. It answers, “Where did this number come from?” Understanding its origin is essential for trusting it.
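In practice, much of that supply chain can be reconstructed from the warehouse itself. A hedged sketch, assuming a Snowflake warehouse and an invented target table name:

```sql
-- Layer 4: lineage recovered from Snowflake's ACCESS_HISTORY view.
-- Which upstream tables were read by the queries that wrote GOLD.REVENUE_DAILY?
SELECT DISTINCT
    src.value:objectName::string AS upstream_table,
    tgt.value:objectName::string AS downstream_table
FROM snowflake.account_usage.access_history,
     LATERAL FLATTEN(input => base_objects_accessed) AS src,
     LATERAL FLATTEN(input => objects_modified)      AS tgt
WHERE tgt.value:objectName::string = 'ANALYTICS.GOLD.REVENUE_DAILY';
```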
Layer 5: BI Models. This is the “last mile” of analytics. The semantic models in your BI tool (Looker, Power BI, etc.) represent the final, polished set of definitions and metrics that the business actually consumes.
Why This Is the Bottleneck for AI
This stack of fragmented layers is, in my opinion, the single biggest barrier to successful enterprise AI.
We’re all under pressure to “leverage AI”. We have access to incredibly powerful LLMs, the “charismatic storytellers” that can supposedly answer any business question in natural language. But we’re handing these engines an outdated map (Layer 1) and wondering why they get lost…
When a marketing manager asks, “Which campaigns had the best ROI?”, the AI can’t just guess. It needs to know precisely what you mean by “ROI.” Is that logic hidden in a dbt model (Layer 3), a common query (Layer 2), or a BI tool (Layer 5)? Without that context, the AI “hallucinates”, giving answers that look right but are fundamentally wrong. This is why so many AI projects get “stuck at the POC stage”.
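To see why, consider just two of the many reasonable ways “campaign ROI” could be computed. The tables, columns, and both definitions are invented for illustration; neither comes from the post:

```sql
-- Definition A: attributed revenue divided by ad spend.
WITH campaign_revenue AS (
    SELECT campaign_id, SUM(amount) AS revenue
    FROM orders
    GROUP BY campaign_id
)
SELECT c.campaign_id,
       r.revenue / NULLIF(c.spend, 0) AS roi
FROM campaigns c
LEFT JOIN campaign_revenue r USING (campaign_id);

-- Definition B: net return ((revenue - spend) / spend), refunded orders excluded.
WITH campaign_revenue AS (
    SELECT campaign_id, SUM(amount) AS revenue
    FROM orders
    WHERE status <> 'refunded'
    GROUP BY campaign_id
)
SELECT c.campaign_id,
       (COALESCE(r.revenue, 0) - c.spend) / NULLIF(c.spend, 0) AS roi
FROM campaigns c
LEFT JOIN campaign_revenue r USING (campaign_id);
```

Both queries run, both return a column called roi, and they can rank the same campaigns in completely different orders. An AI that picks one at random looks confident and is still wrong.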
At Solid, we believe the solution isn’t to try and manually update the blueprint. It’s to build a new, living map based on all the layers of behavior. Our platform is built to be a “digital archaeologist”, automatically sifting through all these layers—the queries, the models, the lineage, the BI tools, and the data itself.
By doing this, we can auto-generate the rich documentation and semantic models that make your data AI-ready. The real value isn’t just in the AI; it’s in making your data trustworthy enough for the AI to use in the first place.

