Supabase as VectorStore
We will introduce Supabase as a vector store and walk through the process of storing various embeddings in Supabase Vector, a combination of PostgreSQL and pgVector
, aimed at facilitating semantic search.
Supabase, an open-source alternative to Firebase, is constructed atop PostgreSQL, a robust SQL database suitable for production environments. Given that Supabase Vector is built on pgVector
, it allows for the storage of embeddings alongside other application data within the same database. This integration with pgvector
’s indexing algorithms ensures that vector searches remain efficient even at large scales.
Supabase further enhances app development by offering a range of services and tools, including an auto-generated REST API, to streamline the storage and querying of embeddings within PostgreSQL. Additionally, Supabase provides a user-friendly interface that simplifies the interaction with SQL editors and database components, making it easier to manage and manipulate data.
Our goal in this business case is to utilize Supabase as a vector store, focusing on the following objectives:
- Leverage the open-source nature of PostgreSQL with the
vector
extension for efficient vector storage. - Manage both data and extensions seamlessly within the Supabase UI, ensuring ease of use and control.
- Ensure vector data is stored persistently, maintaining integrity and availability for future use.
- Integrate with LangChain.
Installing Self-Hosting with Docker
There are multiple methods to install Supabase, and we will focus on self-hosting with Docker. Alternatively, you can opt for the Supabase online service, eliminating the need to manage its infrastructure yourself.
One of the advantages of self-hosting Supabase is the assurance of persistent local storage for all data, ensuring that queries and responses remain within your private network for enhanced security.
Let’s assume that you already have had git
and docker
installed and configured properly.
# Get the code
git clone --depth 1 https://github.com/supabase/supabase
# Go to the docker folder
cd supabase/docker
# Copy the fake env vars
cp .env.example .env
# Pull the latest images
docker compose pull
# Start the services (in detached mode)
docker compose up -d
It may take a while to download the images within your Docker environment. When it’s done, the output looks like this.
[+] Running 12/13
⠴ Network supabase_default Created 19.6s
✔ Container supabase-imgproxy Started 1.1s
✔ Container supabase-vector Healthy 6.4s
✔ Container supabase-db Healthy 12.1s
✔ Container supabase-analytics Healthy 17.8s
✔ Container supabase-auth Started 18.7s
✔ Container supabase-meta Started 18.5s
✔ Container supabase-edge-functions Started 18.7s
✔ Container realtime-dev.supabase-realtime Started 18.5s
✔ Container supabase-studio Started 18.9s
✔ Container supabase-kong Started 18.5s
✔ Container supabase-rest Started 18.1s
✔ Container supabase-storage Started 18.9s
For more information about self-hosting install, here’s the official documentation
Reference > https://supabase.com/docs/guides/self-hosting/docker#accessing-supabase-dashboard
Configuring Vector for Embedding
It’s important to note that when using the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
embedding model, the dimensionality of the embedding vector is 384
. This detail must be considered when changing the embedding model, as it affects the configuration of the vector database with Supabase.
After all Docker containers have been fully created and launched, you can access the Supabase portal at http://127.0.0.1:8000/. The default user ID and password can be found in supabase/docker/.env
.
DASHBOARD_USERNAME=supabase
DASHBOARD_PASSWORD=this_password_is_insecure_and_should_be_updated
It is strongly recommended to change these default values for security reasons, especially if running in a production environment.
Please be aware that if you modify these values in the supabase/docker/.env
file, you must restart all the containers to apply the updates. This can be done using the following commands:
cd supabase/docker
docker compose down
docker compose up -d
Security Warning: The IP address on port 8000 is intended to be accessible from anywhere on the network. Therefore, it is not advisable to expose this directly to the internet without first setting up a proper firewall configuration.
After you successfully log into the portal, you can see
Figure 6.2 Supabase Dashboard
Select SQL Editor
from navigation panel, copy and paste the following SQL into SQL Query
, then click Run
. The SQL code creates an extension of vector
and a table called documents
which will be used to store the embedded documents.
-- Enable the pgvector extension to work with embedding vectors
create extension if not exists vector;
-- Create a table to store your documents
create table
documents (
id uuid primary key,
content text, -- corresponds to Document.pageContent
metadata jsonb, -- corresponds to Document.metadata
embedding vector (384) -- 384 works for Sentence-Transformers embeddings, change if needed
);
-- Create a function to search for documents
create function match_documents (
query_embedding vector (384),
filter jsonb default '{}'
) returns table (
id uuid,
content text,
metadata jsonb,
similarity float
) language plpgsql as $$
#variable_conflict use_column
begin
return query
select
id,
content,
metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where metadata @> filter
order by documents.embedding <=> query_embedding;
end;
$$;
Note: Since we are utilizing the
sentence-transformers/paraphrase-multilingual
model for embeddings, it is necessary to set the “embedding vector” value to384
.
If you need to switch to OpenAI’s embedding model, you’ll have to specify the vector value. Here’s an example:
ALTER TABLE documents
ALTER COLUMN embedding TYPE vector(1536);
Upon completion of the execution, you will discover a database named “documents” under “Database” and an extension named “vector” located under “Extensions.”
For detailed documentation of LangChain with Supabase, you may want to check
Reference > https://python.langchain.com/docs/integrations/vectorstores/supabase/
On macOS, you may experience an issue where the containers supabase-meta
and supabase-studio
continuously restart. This problem can be resolved by disabling the “Use Virtualization Framework” option and changing the file sharing implementation from “VirtioFS” to “gRPC FUSE”. This solution applies to Docker Desktop version v4.29.0
.
Settings > General > Use Virtualization framework > select “gRPC FUSE” then un-check “Use Virtualization framework”
Figure 6.3 Change Virtualization framework for Supabase on macOS
Reference > https://github.com/supabase/cli/issues/1724
Accessing the APIs
Upon completing the Supabase configuration, it’s necessary to identify the API service key for LangChain, which will be responsible for managing Supabase as the vector store. This key can be found in the section of ANON_KEY
.
cd supabase/docker
cat .env
The ANON_KEY
is API service key.
############
# Secrets
# YOU MUST CHANGE THESE BEFORE GOING INTO PRODUCTION
############
POSTGRES_PASSWORD=your-super-secret-and-long-postgres-password
JWT_SECRET=your-super-secret-jwt-token-with-at-least-32-characters-long
ANON_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJhbm9uIiwKICAgICJpc3MiOiAic3VwYWJhc2UtZGVtbyIsCiAgICAiaWF0IjogMTY0MTc2OTIwMCwKICAgICJleHAiOiAxNzk5NTM1NjAwCn0.dc_X5iR_VP_qT0zsiyj_I_OZ2T9FtRU2BBNWN8Bu4GE
SERVICE_ROLE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJzZXJ2aWNlX3JvbGUiLAogICAgImlzcyI6ICJzdXBhYmFzZS1kZW1vIiwKICAgICJpYXQiOiAxNjQxNzY5MjAwLAogICAgImV4cCI6IDE3OTk1MzU2MDAKfQ.DaYlNEoUrrEn2Ig7tqibS-PHK5vgusbcbo7X36XVt4Q
DASHBOARD_USERNAME=supabase
DASHBOARD_PASSWORD=this_password_is_insecure_and_should_be_updated
Security Warning: the API key and password used here are default values without any modifications. For production environments, it’s crucial to update all passwords and API keys to enhance security. You can visit the following website to update and modify your keys.
https://supabase.com/docs/guides/self-hosting/docker#securing-your-services
Inserting the Embedded Data into VectorStore with SupabaseVectorStore.from_documents
After establishing the documents table in the database, we proceed to insert the embedded data into it using SupabaseVectorStore.from_documents
and define the vector store.
import os
import getpass
os.environ["SUPABASE_URL"] = getpass.getpass("Supabase URL: ")
os.environ["SUPABASE_SERVICE_KEY"] = getpass.getpass("Supabase Svc Key: ")
from langchain_community.vectorstores import SupabaseVectorStore
from supabase.client import Client, create_client
supabase_url = os.environ.get("SUPABASE_URL")
supabase_key = os.environ.get("SUPABASE_SERVICE_KEY")
supabase_client = create_client(supabase_url, supabase_key)
# create a new collection if it doesn't exist
vectorstore = SupabaseVectorStore.from_documents(
chunks,
embedding = embedding,
client=supabase_client,
table_name="documents",
query_name="match_documents",
chunk_size=500,
)
retriever = vectorstore.as_retriever()
Once the embedded data has been loaded, you can execute queries against the vector store using the SupabaseVectorStore
function, instead of SupabaseVectorStore.from_documents
, without the need to reload the data into the vector store. We’ll see the code below.
# query an existing collection
vectorstore = SupabaseVectorStore(
embedding = embedding,
client = supabase_client,
table_name = "documents",
query_name = "match_documents",
)
retriever = vectorstore.as_retriever()