Free, AI-generated, high-accuracy documentation and semantic model of your data assets now available
For a limited time, Solid is offering free, full documentation of your data warehouse and BI assets. The output can be used by both humans and machines (AI). Yoni, our CEO, shares the details.
At this point, we all know that full documentation of your tables, views, columns, primary/foreign keys, best/verified queries, and BI dashboards/reports is crucial for:
Helping data engineers know what exists, and how it’s being used.
Aiding analysts in using the right assets, in the right way.
Furthering data democratization, by making data assets more accessible to business stakeholders.
In recent months, another important use case showed up:
Generating a semantic model for your AI initiative. AI can’t make sense of data on its own, but with a detailed semantic model, it can.
Unfortunately, producing this documentation is a lot of manual work. It doesn’t have to be.
The key is to autogenerate it. In this blog post, I’ll share how Solid can do exactly that for you. You can skip all this and fill out a form for us to show you a demo.
How does the auto-documentation work?
Over the past 18 months, Solid has built a robust, AI-based engine that is really good at documenting the data assets in BigQuery, Databricks, and Snowflake, as well as the dashboards and reports in Looker, PowerBI, and Tableau.
It works by pulling in metadata from those systems, plus other sources of information:
Schema (tables/views/columns/keys) from the data warehouse.
SQL query log, from the same data warehouse.
Metadata from the BI platform (dashboard/report definitions, codified semantic model, usage information, etc.)
dbt code repo (if you use dbt, not a must)
JIRA tickets associated with analytics tasks (helps the engine understand the language of the business, as well as what people ask for)
Shared analytics Slack channels (where users ask questions about data, same reason as JIRA ticket data)
Any information you’d like to share with the platform about your business (glossaries, terminology, documentation you may have, etc.)
Then, it gets to work:
It starts by establishing a glossary and an understanding of the business. This is crucial for the AI to be able to comprehend your internal acronyms and lingo, which it couldn’t otherwise.
It builds a graph map of the relationships of the assets, such as what columns are used by each BI dashboard, and what tables get used together in SQL queries. This helps the AI understand the role certain columns and tables have that wouldn’t be obvious otherwise (like that COL_157 you have in your main dataset there).
We then cluster the data assets, based on their semantic information (what they’re used for) and usage information (who is using them). This helps identify things like Product Analytics, Operational datasets, etc.
Quality is automatically assessed for each asset based on various algorithms - one for each type of asset. For example, for tables in your DWH, it will use information about how often the table is used, when the data was last updated, whether the table is “just a step in a pipeline”, vs being used for analytics, etc.
Lastly, the hard work of documentation begins. The Solid engine iterates through all assets, in a cyclical process, and documents each of them. Assets aren’t documented in isolation; each one is documented with the context of the other assets that use it, who uses it and for what, etc. This approach moved the accuracy from around 50% (if you just drop your CREATE TABLE statement into ChatGPT, for example) to 95%.
Users are able to influence the algorithm to drive the quality even higher if they’d like.
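To make the relationship-mapping step above a bit more concrete, here is a minimal sketch of how "which tables get used together" could be derived from a SQL query log. This is not Solid’s actual implementation: the sample queries, the table names, and the helper names (tables_in, co_usage) are all made up for illustration, and the regex is a deliberately naive stand-in for a real SQL parser.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical query log entries; a real log would come from the warehouse.
QUERY_LOG = [
    "SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id",
    "SELECT * FROM orders JOIN order_items ON orders.id = order_items.order_id",
    "SELECT COUNT(*) FROM customers",
    "SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id",
]

# Naive assumption: a table name is the identifier right after FROM or JOIN.
# Subqueries, CTEs, and quoted identifiers would need a proper SQL parser.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(query: str) -> set[str]:
    """Return the lower-cased table names referenced by one query."""
    return {name.lower() for name in TABLE_REF.findall(query)}


def co_usage(queries: list[str]) -> Counter:
    """Count how often each pair of tables appears in the same query."""
    edges: Counter = Counter()
    for query in queries:
        # Sort so that (a, b) and (b, a) are counted as the same edge.
        for pair in combinations(sorted(tables_in(query)), 2):
            edges[pair] += 1
    return edges


edges = co_usage(QUERY_LOG)
for (left, right), weight in edges.most_common():
    print(f"{left} <-> {right}: used together in {weight} queries")
```

The resulting weighted edges are the kind of signal that can hint at what an obscurely named table or column is for: if it only ever shows up next to the orders table, that says something about its role.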
This process can take up to two weeks, depending on the number of assets involved. We’ve worked with organizations of various sizes, from hundreds of tables to nearly 100,000.
IMPORTANT: The entire process runs in our Azure environment in the US. Our software and environment, as well as our company’s related operations, are SOC 2 audited annually. We take security extremely seriously. Report available upon request under NDA.
What is the output that you get from this?
Once the documentation process is done, you can access it in a variety of ways:
Through our own web user interface, using your browser.
Using our API, which you can use to search/query through the documentation and retrieve any information you’d like. You can also do this via MCP (with Solid being an MCP Server).
As a file export, in multiple formats:
Straight up JSON (same format we utilize for the API)
BigQuery schema JSON, so you can use Gemini with BigQuery
As a series of COMMENT ON SQL commands (for both tables/views and columns), which can be used with Databricks Genie
A Snowflake semantic model YAML, to be used with Snowflake Cortex Analyst
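As a rough illustration of how the JSON export relates to the COMMENT ON export, the sketch below turns a small documentation object into COMMENT ON statements. The JSON shape shown here (tables, columns, name, description) is an assumption made for the example, not Solid’s actual export schema, and the quote-doubling escape follows standard SQL, so check it against your target engine before running the output.

```python
# Hypothetical documentation export; the real export format may differ.
DOC = {
    "tables": [
        {
            "name": "sales.orders",
            "description": "One row per customer order.",
            "columns": [
                {"name": "order_id", "description": "Unique order identifier."},
                {"name": "col_157", "description": "Customer's loyalty tier at order time."},
            ],
        }
    ]
}


def sql_literal(text: str) -> str:
    """Quote a string for SQL by doubling single quotes (standard SQL rule;
    assumed here, verify the escaping rules of your warehouse)."""
    return "'" + text.replace("'", "''") + "'"


def comment_statements(doc: dict) -> list[str]:
    """Build one COMMENT ON statement per table and per column."""
    statements = []
    for table in doc["tables"]:
        statements.append(
            f"COMMENT ON TABLE {table['name']} IS {sql_literal(table['description'])};"
        )
        for column in table.get("columns", []):
            statements.append(
                f"COMMENT ON COLUMN {table['name']}.{column['name']} "
                f"IS {sql_literal(column['description'])};"
            )
    return statements


statements = comment_statements(DOC)
print("\n".join(statements))
```

Keeping the descriptions in one neutral structure and rendering each target format from it is what makes it cheap to serve several AI tools from the same documentation.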
Why is Solid doing this for free?
Solid is really good at documenting assets. We built this documentation engine because the customers we were working with on our core product had neither robust documentation nor an accessible semantic layer. So instead of making it a requirement to have one for our product to work, we decided to auto-generate it for those who don’t have it.
“By accident”, we ended up with a really strong engine that we think can benefit many organizations out there, and we’re offering access to it for free.
If you decide that you like our documentation and want to graduate to our platform’s full features - helping business users and analysts get to insights faster and more consistently - we’d be even happier, of course.
To get started with your free asset documentation and semantic model generation, fill out this form. The offer is available until August 1, 2025.