Some of you may have heard of RAG, or Retrieval-Augmented Generation?
If you want to use an LLM to answer questions about data it wasn’t trained on, you can use the RAG pattern to supplement it with extra data.

But before we get into RAG, I wanted to touch on Vector Databases a little, as they have become popular in the world of AI.
TL;DR: A Vector Database is fantastic at cataloging how different pieces of data are related to each other.
What is a Vector?
Vectors are arrays of numbers, and when those arrays represent something we call them embeddings. The term vector really just refers to the mathematical concept, whereas an embedding is kind of like an applied vector, if you will. So what do these embeddings represent? Well, technically anything you want, but it is very common to use vector databases for natural language processing and semantic search.
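To make that a bit more concrete, here is a minimal Go sketch with completely made-up toy vectors (real embeddings come from a model and have hundreds or thousands of dimensions). Cosine similarity is a common way to measure how closely two embeddings point in the same direction:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity between two vectors:
// close to 1.0 means they point the same way, close to 0 means unrelated.
func cosine(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Toy 4-dimensional "embeddings" for illustration only.
	dog := []float64{0.9, 0.1, 0.2, 0.7}
	puppy := []float64{0.8, 0.2, 0.1, 0.6}
	invoice := []float64{0.1, 0.9, 0.8, 0.1}

	fmt.Printf("dog vs puppy:   %.2f\n", cosine(dog, puppy))   // ~0.99
	fmt.Printf("dog vs invoice: %.2f\n", cosine(dog, invoice)) // ~0.29
}
```

Similar meanings end up as similar vectors, and that is the property everything below relies on.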
Want to learn more about Vector Databases? Take on this book! I have not braved it myself, but it gets mentioned a lot in the content I have been reading and watching.

Vector databases are just collections of embeddings, and these are organised into indexes. An index is kind of like a table: a collection of rows of embeddings, and we call those rows records.
RAG
OK, this then brings us back to one of the initial things we said:
If you want to use an LLM to answer questions about data it wasn’t trained on, you can use the RAG pattern to supplement it with extra data.
Let’s say you have a bunch of support docs.
These would get turned into embeddings and stored in a vector database. Then, when the user types in a prompt, that prompt gets turned into an embedding, which is used to search the vector database for similar information.
What you’re doing here is a similarity search. Basically, you’re just looking for the nearest neighbours to the embedding that you give the database.
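As a rough sketch of that in Go, with made-up records and embeddings (a real vector database uses clever indexing such as HNSW rather than brute force, but the idea is the same):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// record is one row in our toy index: the original text plus its embedding.
type record struct {
	text      string
	embedding []float64
}

// cosine similarity, as in the earlier sketch.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearest brute-forces a similarity search: score every record against
// the query embedding and return the k most similar.
func nearest(query []float64, records []record, k int) []record {
	sort.Slice(records, func(i, j int) bool {
		return cosine(query, records[i].embedding) > cosine(query, records[j].embedding)
	})
	if k > len(records) {
		k = len(records)
	}
	return records[:k]
}

func main() {
	records := []record{
		{"how to reset your password", []float64{0.9, 0.1, 0.3}},
		{"billing and invoices", []float64{0.1, 0.9, 0.2}},
		{"account login problems", []float64{0.8, 0.2, 0.4}},
	}
	// Pretend this is the embedded version of "I can't log in".
	query := []float64{0.85, 0.15, 0.35}
	for _, r := range nearest(query, records, 2) {
		fmt.Println(r.text)
	}
}
```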
An example
Obviously, I wanted to get hands-on and start playing with some of this stuff in the world of AI, but also, as a Data Technologist, I wanted to see what was possible with some of this data and how it would fare when put in front of a powerful LLM.
That then led me down a rabbit hole: how important do these Vector Databases become once your own data is embedded, and how much CPU and GPU time and effort would it cost to re-embed everything if something were to go wrong? Anyway, that might be another post shortly.
Above, we mentioned:
Let’s say you have a bunch of support docs.
Now, instead of docs, let’s pretend that we have an amazing community repository called 90DaysOfDevOps, full of data and learning information. Kind of similar to support docs! We could probably ask an LLM about 90DaysOfDevOps and get some info back… but it’s going to be vast and wide, and the LLM probably was not trained on this repository.
I am using Ollama with Mistral here… the other model will become clear later.
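If you want to follow along and already have Ollama installed, pulling both models locally is just:

```
ollama pull mistral
ollama pull mxbai-embed-large
```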

And if we then ask Mistral a question about 90DaysOfDevOps, what do we get?

For some, this might be the way we have been interacting with LLMs so far, but what if we were able to take that personal data, or data that we specifically want to embed, and use it alongside an LLM? Surely we would get a richer response overall?
I have my dataset in the 90DaysOfDevOps repository, git cloned locally to my machine. I then have that mxbai-embed-large model you saw above, and a trusty friend of mine in the form of a Postgres database instance, running on a VM but it could be anywhere, with the pgvector extension enabled for knowledge storage. (Maybe another post; let’s see how this one goes first.)
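For reference, the Postgres side of the setup is mostly enabling the extension and creating a table with a vector column. A rough sketch in Go; the connection string, table, and column names are mine, not necessarily what the demo app uses (mxbai-embed-large produces 1024-dimensional embeddings, hence vector(1024)):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	// Assumed connection string; point this at your own instance.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost:5432/rag?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Enable pgvector and create a table to hold our chunks.
	stmts := []string{
		`CREATE EXTENSION IF NOT EXISTS vector`,
		`CREATE TABLE IF NOT EXISTS documents (
			id        SERIAL PRIMARY KEY,
			content   TEXT NOT NULL,
			embedding vector(1024)
		)`,
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			log.Fatal(err)
		}
	}
	log.Println("pgvector ready")
}
```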

I wrote a little app to deal with that embed process, which in turn is the same app that will allow me to interact with that RAG + LLM via a chat/API interface.
https://github.com/MichaelCade/vector-demo
Again, maybe we need to go into more detail about this app another time, but for now: we have our knowledge from the 90DaysOfDevOps repository. Each of its markdown files is basically a blog post about a topic related to the world of DevOps.

We have our Golang code to embed our data.
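The real code is in the repo above, but the heart of the embed step boils down to something like this sketch: read each markdown file, ask Ollama’s /api/embeddings endpoint for an embedding using mxbai-embed-large, and insert the text plus vector into Postgres. Helper and table names are mine, and error handling is simplified:

```go
package main

import (
	"bytes"
	"database/sql"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"strconv"
	"strings"

	_ "github.com/lib/pq"
)

// embed asks a local Ollama instance for an embedding of text.
func embed(text string) ([]float64, error) {
	body, _ := json.Marshal(map[string]string{
		"model":  "mxbai-embed-large",
		"prompt": text,
	})
	resp, err := http.Post("http://localhost:11434/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Embedding []float64 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

// toVector renders a slice as a pgvector literal like [0.1,0.2,...].
func toVector(v []float64) string {
	parts := make([]string, len(v))
	for i, f := range v {
		parts[i] = strconv.FormatFloat(f, 'f', -1, 64)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost:5432/rag?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One file per chunk keeps the sketch simple; in practice you
	// would split large files into smaller chunks before embedding.
	for _, path := range os.Args[1:] {
		content, err := os.ReadFile(path)
		if err != nil {
			log.Fatal(err)
		}
		vec, err := embed(string(content))
		if err != nil {
			log.Fatal(err)
		}
		if _, err := db.Exec(
			`INSERT INTO documents (content, embedding) VALUES ($1, $2::vector)`,
			string(content), toVector(vec),
		); err != nil {
			log.Fatal(err)
		}
		fmt.Println("embedded", path)
	}
}
```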

When the worlds align and we run our binary against our data, with access to our (likely hard-coded) Postgres database instance… we should start the embedding process into our vector database.

NOTE: if you made it this far and want to see how to spike your GPU… change the code to use mistral for the embedding process, a model that has not been trained for embedding the way a dedicated embed model has. Another rabbit hole I found: there are all sorts of models trained for different scenarios.
Here is what things look like within our super secure vector database, for which we leaked connection info and all sorts via GitHub.

Using the same Golang binary we ran, we can now interact with that API and chat with the vector database plus the Mistral model.

I wanted to be sure that we were indeed getting something from the vector database when we did this, so I added some additional code to show me the chunks it was using to respond.
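The retrieval side is roughly the mirror image. Here is a sketch of it that reuses the embed and toVector helpers (and imports) from the embed sketch above; <=> is pgvector’s cosine distance operator, so ordering ascending returns the closest chunks first:

```go
// answer embeds the question, grabs the nearest chunks from Postgres,
// and hands them to mistral as context via Ollama's /api/generate.
func answer(db *sql.DB, question string) (string, error) {
	// 1. Embed the question the same way we embedded the docs.
	qvec, err := embed(question)
	if err != nil {
		return "", err
	}

	// 2. Similarity search for the three nearest chunks.
	rows, err := db.Query(
		`SELECT content FROM documents ORDER BY embedding <=> $1::vector LIMIT 3`,
		toVector(qvec),
	)
	if err != nil {
		return "", err
	}
	defer rows.Close()

	var context strings.Builder
	for rows.Next() {
		var chunk string
		if err := rows.Scan(&chunk); err != nil {
			return "", err
		}
		log.Printf("using chunk of %d chars", len(chunk)) // show what we retrieved
		context.WriteString(chunk + "\n---\n")
	}

	// 3. Ask mistral, with the retrieved chunks stuffed into the prompt.
	body, _ := json.Marshal(map[string]any{
		"model":  "mistral",
		"prompt": "Answer using this context:\n" + context.String() + "\nQuestion: " + question,
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}
```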

Now our whole app covers the embed part above, but we have also added a backend API to the same code base. In the GitHub repository linked above, you will see vector-demo-ui; this is the React frontend… no shame in saying I used vibe coding for this… who likes frontend stuff anyway?
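Interacting with that backend API directly would look something along these lines; the endpoint and payload here are purely hypothetical, so check the repo for the real shape:

```
# hypothetical endpoint; see the vector-demo repo for the real one
curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What is 90DaysOfDevOps?"}'
```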

And to top things off, if you don’t want to interact with your AI chat assistant via curl, then the frontend almost looks pretty…

Before we wrap things up, we should ask it something specific to the vector embeddings we have provided. First, if we ask Mistral directly about Day 49 of 90DaysOfDevOps, we get:

Then with our RAG + LLM we get:

If you made it this far, I am impressed! I think we have seen a decline in blog views over the last few years, so when I jot something down it is mostly for future me, looking for something I have done before. But hopefully this helps spur on someone else to unlock some of their data, and if it is useful, let me know… Also, if you would like to see some content about protecting vector databases, or a deeper dive into the terrible coding I am doing with Golang, let me know.