Defining Embedding Model and VectorStore with FAISS
It’s a little surprising to me that Facebook AI Similarity Search (FAISS) was released back in 2017. Here is an explanation from its official documentation:
… FAISS, a library that allows us to quickly search for multimedia documents that are similar to each other - a challenge where traditional query search engines fall short. We’ve built nearest-neighbor search implementations for billion-scale data sets that are some 8.5x faster than the previous reported state-of-the-art, along with the fastest k-selection algorithm on the GPU known in the literature. This lets us break some records, including the first k-nearest-neighbor graph constructed on 1 billion high-dimensional vectors.
Traditional databases consist of structured tables filled with symbolic data. For instance, an image collection would be organized into a table with each photo represented by a row, containing details like an image ID and descriptive text. These rows can also connect to entries from other tables, such as linking an image with people to a table of names.
AI tools, including text embedding methods like word2vec or convolutional neural network (CNN) descriptors trained with deep learning, generate high-dimensional vectors. These vectors offer a more potent and adaptable representation compared to fixed symbolic representations. However, traditional databases designed for SQL queries are not equipped to handle these new vector representations. The sheer volume of new multimedia content generates billions of vectors, and more critically, identifying similar entries involves finding similar high-dimensional vectors, a task that is inefficient and often impossible with conventional query languages.
Let’s install FAISS and its dependencies, including langchain-huggingface, which provides the embedding wrapper used in the next snippet.
pip install -U langchain-community langchain-huggingface sentence-transformers faiss-cpu langchain-openai tiktoken
Then define an embedding model using paraphrase-multilingual-MiniLM-L12-v2:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# Multilingual sentence-embedding model from Sentence-Transformers
embedding = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
Use FAISS.from_documents to insert the embedded documents into the FAISS vectorstore, then define a retriever:
from langchain_community.vectorstores import FAISS

# `data` is the list of Document objects loaded earlier
vectorstore = FAISS.from_documents(data, embedding)
retriever = vectorstore.as_retriever()
The last line of the code snippet converts the vectorstore into a retriever class. This allows us to easily use it in other LangChain methods, which largely work with retrievers.
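As a quick illustration, the retriever can be called directly; the query below is made up and simply returns the most similar Document objects from the vector store:
# Hypothetical query: fetch the most similar documents from the FAISS index.
docs = retriever.invoke("How do I reset my password?")
for doc in docs:
    print(doc.page_content[:100])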
Compressed Retriever
In our project, we found two problems: the LLM generates a predicted (hallucinated) result when the RAG (Retrieval-Augmented Generation) pipeline cannot find an existing answer via similarity_search, or the retrieved documents contain so much irrelevant information that they distract the LLM. This is unacceptable when proposing such a solution to our customers. Technically, if the vector database has no relevant output, the system should indicate that it does not know, rather than blurring the line with predictions made by the LLM.
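One mitigation we can sketch is to check the similarity scores before handing documents to the LLM and refuse to answer when nothing is close enough. FAISS returns an L2 distance by default (smaller means closer), and the 0.8 cutoff below is purely illustrative:
# Sketch only: reject weak matches instead of letting the LLM guess.
query = "How do I reset my password?"  # hypothetical query
docs_and_scores = vectorstore.similarity_search_with_score(query, k=4)
relevant = [doc for doc, score in docs_and_scores if score < 0.8]  # illustrative cutoff
if not relevant:
    print("I don't know: no sufficiently similar documents were found.")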
Often, the most relevant information for a query is buried within documents containing a significant amount of irrelevant text. Passing the full document (or spreadsheet, or CSV file) through your application can lead to more expensive LLM calls and poorer responses.
Contextual compression is a solution to this problem. The idea is simple: instead of immediately returning the retrieved documents as-is, you can compress them using the context of the given query, ensuring that only the relevant information is returned. “Compressing” here refers to both reducing the contents of individual documents and filtering out documents entirely.
To use the Contextual Compression Retriever, you’ll need:
- A Base Retriever
- A Context Compressor
The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents, and then passes them through the Context Compressor. The Context Compressor then shortens the list of documents by reducing their contents or dropping them altogether.
Reference > https://python.langchain.com/docs/modules/data_connection/retrievers/contextual_compression/
Here’s a diagram of how the pieces fit together (in PlantUML notation):
rectangle ContextualCompressionRetriever {
  file base_retriever
  file base_compressor
}
file compression_retriever
note bottom of compression_retriever: new retriever, used in chain
base_retriever -> base_compressor
ContextualCompressionRetriever -> compression_retriever
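In code, the same structure might look like the following sketch. LLMChainExtractor is used here as the compressor, and the ChatOpenAI model name is an assumption; any LLM supported by LangChain can fill that role:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI

# The base retriever is the FAISS retriever defined earlier.
base_retriever = vectorstore.as_retriever()

# LLMChainExtractor asks an LLM to keep only the query-relevant passages.
# The model name is an assumption; substitute whatever LLM you use.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
base_compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=base_compressor,
    base_retriever=base_retriever,
)

# Hypothetical query: only the compressed, relevant passages come back.
compressed_docs = compression_retriever.invoke("How do I reset my password?")
The compression_retriever can then be plugged into a chain anywhere a normal retriever is expected.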