Joey on SQL Server

SQL Server 2025 Brings AI-Powered Semantic Search to Local and Cloud Data

A new SQL Server 2025 feature lets organizations run vector-based semantic searches on their own data, connecting to local or cloud-hosted AI models without relying on massive general-purpose LLMs.

I have mixed feelings about large, language model (LLM)-based AI like ChatGPT and Copilot. There are productivity benefits for using LLMs for some software development tasks. However, those LLMs aren't nearly good enough (and likely never will be) to replace human coders, much less understand all the various business rules and conditions required to build applications. There are also economic and environmental concerns. With that information in mind, in my opinion, the best business case for using those LLMs is building on your firm's data, without the massive footprint of general models to enable functionality like semantic search within your data -- even within your database. Note: There are several other scenarios for other uses of AI in databases. for the sake of brevity, we're going to focus on semantic search.

One of the hero features (that's a marketing term for features that get all the press) of SQL Server 2025 is its integration with AI. I talked about it during Microsoft Build. Still, Microsoft has included functionality to connect the data in your database to AI models, running either on your hardware or a cloud platform. In this article, we're going to walk through that configuration and how you build queries to leverage those models.

You might ask, how could I use this functionality in an application? Or, we have queries with LIKE clauses or full-text indexes, how does this help us? Well, let's walk through this demo using the AdventureWorks2025 database and then come back to those questions. You can run this on your laptop, courtesy of my MVP colleague Anthony Nocentino. You will need to have Docker Desktop running on a non-ARM chip, along with a SQL Server client. I'm using SQL Server Management Studio 21. Apologies to Mac users (myself included), the SQL Server implementation of vectors does not support Rosetta, which Apple Silicon uses to process x86 instructions.

The script uses Docker Compose to run several containers:

  • SQL Server 2025
  • SQL Tools
  • Ollama
  • Nginx with SSL (Required to act as a secure proxy between SQL Server and Ollama)
  • Automation containers

This implementation of LLMs in SQL Server depends on the sp_invoke_external_rest_endpoint functionality that has been in Azure SQL Database for several years and is part of SQL Server 2025; however, there are dedicated commands around the AI models, including CREATE EXTERNAL MODEL and AI_GENERATE_EMBEDDINGS. In this demo, we will use the AdventureWorksLT database. An AI model is also required to create the embeddings. As mentioned, we are using Ollama locally and I have also tested with Azure AI Foundry to demonstrate how a cloud model can work. One last note, if you're trying to build this locally from scratch yourself, SSL to the model server is required. You'll need to ensure that your certificate is properly trusted by adding it to the local computer certificate store in Windows.

Let's get started here -- in this demo gif, I'm creating a model object in SQL Server, pointing to my local Ollama model. Note: this is a database-scoped model, despite the catalog view sys.external_models being instance-scoped.

Figure 1.

The second block of code that I executed here was to vectorize a sample text string called "test string". This execution results in an external call to my Ollama service (which in this case is running in a Docker container on my laptop). Embedding data can be a resource-intensive process, as you'll see in the next GIF.

Figure 2.

In this video, I'm adding two columns: one for our "chunks" of text, that the model will use to vectorize the data, and then an embeddings column using the VECTOR data type. We then run an update statement to generate the embeddings for the Product table in the AdventureWorks database. I've edited the video, but the process takes approximately 30 seconds for 295 rows.  I have heard of several days of processing required for much larger datasets. You can then see the embeddings as I execute a SELECT query across the table.

Vector embeddings are representations of text, generated by a large language model. You can think of these embeddings like lossy compression of data. When we specify the size for the VECTOR data type (768 in this example), that represents the number of dimensions the vector will represent and not the bytes like SQL Server data types typically use (each dimension of the vector is stored internally as 4-byte float), which will depend on the model generating your embeddings. Generating vectors does not represent a full copy of your data, but  aims to provide a representative sample using a dense numerical representation.  The way the SQL Server engine compares these values is by calculating their mathematical distance, typically using cosine similarity or Euclidean distance. You should note that different language models produce different, and potentially more accurate vectors for a given set of chunks. You may want to experiment with this in testing.

Now, that we've vectorized our data, let's explore how semantic search works in the following demo.

Figure 3.

The first thing we're doing here is declaring a variable for a natural language query, and then vectorizing that string of text so we can compare it to the embeddings in our table. Then we're running that query using the VECTOR_DISTANCE function in SQL Server 2025. Our initial query scans the whole table because it's looking for the exact distance between our input vector (the query string) and the embeddings column in the table, returning the top four values.

In the next bit of code, we are creating a vector index, which uses DiskANN, which is Microsoft's algorithm to support approximate nearest neighbor (the ANN part of that acronym). By using a DiskANN index, we get an approximation based on the graph distance. If you don't have a Ph.D. in graph theory, you can learn more about this in the link, but basically, this is a faster search that requires less data. We are using that index in conjunction with the VECTOR_SEARCH function which uses an approximate search approach and can leverage that DIskANN index. There are still some significant limitations on DiskANN indexes in SQL Server, the biggest being that they are non-updatable. I expect that to be fixed in time, but I can offer no guarantees of when that might happen.

Aside from catalog search, what are some other scenarios  where you might use vector-based searches? The project I'm currently working on has a somewhat incomplete set of data around vendors. While we are in the process of sourcing more data, we can gather free text, such as descriptions or about us pages, from vendor Web sites and generate embeddings on that text. Having this vectorized data will allow us to add functionality to our application, like "would you like to see more vendors like this one?" (I promise we won't try to sell you a toilet seat after you just bought one). The other advantage of vector search in your database is that the corpus tends to be your data, and the total set is narrower, significantly reducing the hallucinations that plague LLMs. It's exciting technology, and I look forward to seeing where it goes in time, especially once SQL Server 2025 goes GA.

About the Author

Joseph D'Antoni is an Architect and SQL Server MVP with over two decades of experience working in both Fortune 500 and smaller firms. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. He is a Microsoft Data Platform MVP and VMware vExpert. He is a frequent speaker at PASS Summit, Ignite, Code Camps, and SQL Saturday events around the world.

Featured

comments powered by Disqus

Subscribe on YouTube