Defining Embedding Model and VectorStore with FAISS
It’s a little surprising to me that Facebook AI Similarity Search (FAISS) was released back in 2017. Here is an explanation from its official documentation:
… FAISS, a library that allows us to quickly search for multimedia documents that are similar to each other - a challenge where traditional query search engines fall short. We’ve built nearest-neighbor search implementations for billion-scale data sets that are some 8.5x faster than the previous reported state-of-the-art, along with the fastest k-selection algorithm on the GPU known in the literature. This lets us break some records, including the first k-nearest-neighbor graph constructed on 1 billion high-dimensional vectors.
Traditional databases consist of structured tables filled with symbolic data. For instance, an image collection would be organized into a table with each photo represented by a row, containing details like an image ID and descriptive text. These rows can also connect to entries from other tables, such as linking an image with people to a table of names.
AI tools, including text embedding methods like word2vec or convolutional neural network (CNN) descriptors trained with deep learning, generate high-dimensional vectors. These vectors offer a more potent and adaptable representation compared to fixed symbolic representations. However, traditional databases designed for SQL queries are not equipped to handle these new vector representations. The sheer volume of new multimedia content generates billions of vectors, and more critically, identifying similar entries involves finding similar high-dimensional vectors, a task that is inefficient and often impossible with conventional query languages.
Let’s install FAISS and its dependencies, including langchain-huggingface, which provides the embedding wrapper used in the next snippet.
pip install -U langchain-community langchain-huggingface sentence-transformers faiss-cpu langchain-openai tiktoken
Then define an embedding model using paraphrase-multilingual-MiniLM-L12-v2:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# Multilingual sentence-embedding model from Sentence-Transformers
embedding = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
Use FAISS.from_documents to insert the embedded documents into the FAISS vectorstore, then define a retriever:
from langchain_community.vectorstores import FAISS

# `data` is the list of Document objects loaded earlier
vectorstore = FAISS.from_documents(data, embedding)
retriever = vectorstore.as_retriever()
The last line of the code snippet converts the vectorstore into a retriever class. This allows us to easily use it in other LangChain methods, which largely work with retrievers.
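As a quick illustration, the retriever can be called directly; the query below is made up and simply returns the most similar Document objects from the vector store:
# Hypothetical query: fetch the most similar documents from the FAISS index.
docs = retriever.invoke("How do I reset my password?")
for doc in docs:
    print(doc.page_content[:100])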
Compressed Retriever
In our project, we found two problems: the LLM generates a predicted (hallucinated) result when the RAG (Retrieval-Augmented Generation) pipeline cannot find an existing answer via similarity_search, or the retrieved documents contain so much irrelevant information that they distract the LLM. This is unacceptable when proposing such a solution to our customers. Technically, if the vector database has no relevant output, the system should indicate that it does not know, rather than blurring the line with predictions made by the LLM.
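One mitigation we can sketch is to check the similarity scores before handing documents to the LLM and refuse to answer when nothing is close enough. FAISS returns an L2 distance by default (smaller means closer), and the 0.8 cutoff below is purely illustrative:
# Sketch only: reject weak matches instead of letting the LLM guess.
query = "How do I reset my password?"  # hypothetical query
docs_and_scores = vectorstore.similarity_search_with_score(query, k=4)
relevant = [doc for doc, score in docs_and_scores if score < 0.8]  # illustrative cutoff
if not relevant:
    print("I don't know: no sufficiently similar documents were found.")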
Often, the most relevant information for a query is buried within documents containing a significant amount of irrelevant text. Passing the full document (or spreadsheet, or CSV file) through your application can lead to more expensive LLM calls and poorer responses.
Contextual compression is a solution to this problem. The idea is simple: instead of immediately returning the retrieved documents as-is, you can compress them using the context of the given query, ensuring that only the relevant information is returned. “Compressing” here refers to both reducing the contents of individual documents and filtering out documents entirely.
To use the Contextual Compression Retriever, you’ll need:
- A Base Retriever
- A Context Compressor
The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents, and then passes them through the Context Compressor. The Context Compressor then shortens the list of documents by reducing their contents or dropping them altogether.
Reference > https://python.langchain.com/docs/modules/data_connection/retrievers/contextual_compression/
Here’s a diagram of how the pieces fit together (in PlantUML notation):
rectangle ContextualCompressionRetriever {
  file base_retriever
  file base_compressor
}
file compression_retriever
note bottom of compression_retriever: new retriever, used in chain
base_retriever -> base_compressor
ContextualCompressionRetriever -> compression_retriever
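In code, the same structure might look like the following sketch. LLMChainExtractor is used here as the compressor, and the ChatOpenAI model name is an assumption; any LLM supported by LangChain can fill that role:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI

# The base retriever is the FAISS retriever defined earlier.
base_retriever = vectorstore.as_retriever()

# LLMChainExtractor asks an LLM to keep only the query-relevant passages.
# The model name is an assumption; substitute whatever LLM you use.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
base_compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=base_compressor,
    base_retriever=base_retriever,
)

# Hypothetical query: only the compressed, relevant passages come back.
compressed_docs = compression_retriever.invoke("How do I reset my password?")
The compression_retriever can then be plugged into a chain anywhere a normal retriever is expected.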