Enabling Multi-Round Conversations with Chat History

In this scenario, we’re implementing a system that supports multi-round conversations while maintaining context throughout the interaction. To achieve this, we’ll utilize LangChain’s built-in chain constructors: create_history_aware_retriever, create_stuff_documents_chain, and create_retrieval_chain.

The core logic involves adding the chat history as an input, creating a history-aware retriever, and combining these elements into a robust question-answering pipeline. Let’s break down the process:

Contextualizing the Question

We start by defining a sub-chain that processes historical messages and the latest user question. This sub-chain reformulates the question if it references any information from the chat history. We use a MessagesPlaceholder variable named chat_history to insert the conversation history into the prompt.

Key components:

  • create_history_aware_retriever: A helper function that passes the question straight to the retriever when the chat history is empty, and otherwise runs the prompt | LLM | retriever sequence to rephrase the question first.
  • contextualize_q_system_prompt: A system prompt instructing the model to rewrite the latest question as a standalone question, without answering it.
  • contextualize_q_prompt: A custom prompt template that includes the system message, chat history placeholder, and user input.

Example implementation:

from langchain_core.prompts.chat import (
    ChatPromptTemplate,
    MessagesPlaceholder,
)
from langchain.chains.history_aware_retriever import create_history_aware_retriever

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
# `llm` (a chat model) and `retriever` are assumed to be initialized earlier.
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
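
Before wiring the retriever into the full chain, you can sanity-check the contextualization step by invoking it directly. The question and history below are fabricated purely for illustration:

from langchain_core.messages import AIMessage, HumanMessage

# The follow-up question is ambiguous on its own; the history-aware retriever
# rewrites it into a standalone query before fetching documents.
docs = history_aware_retriever.invoke(
    {
        "input": "What are its main drawbacks?",
        "chat_history": [
            HumanMessage(content="What is task decomposition?"),
            AIMessage(content="It breaks a large task into smaller steps."),
        ],
    }
)
print(docs)  # the list of retrieved Document objects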

Building the QA Chain

We then construct our full QA chain, swapping the plain retriever for the history_aware_retriever. This substitution is what carries context from earlier turns of the conversation into the current query.

Key components:

  • create_stuff_documents_chain: Generates a question-answer chain that accepts input keys context, chat_history, and input.
  • create_retrieval_chain: Applies the history_aware_retriever and question_answer_chain in sequence, retaining intermediate outputs like retrieved context.

Example implementation:

from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
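
The resulting rag_chain expects a dict with input and chat_history keys, and its output dict retains intermediate results alongside the answer:

response = rag_chain.invoke({"input": "What is task decomposition?", "chat_history": []})

response["answer"]   # the generated answer
response["context"]  # the retrieved Document objects that were stuffed into the prompt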

Enabling Debug Mode (optional)

Enabling debug mode causes LangChain to print the inputs and outputs of every step in the chain, which makes it much easier to pinpoint where an error or unexpected result originates.

from langchain.globals import set_debug, set_verbose
set_debug(True)
set_verbose(True)
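
Both flags are global, so remember to switch them back off once you have finished troubleshooting:

set_debug(False)
set_verbose(False)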

Managing Chat History

To maintain the conversation state across multiple turns, we need to manage the chat history. Here’s an example of how to implement this:

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

def process_question(question):
    # Run the full RAG pipeline with the accumulated history.
    ai_msg = rag_chain.invoke({"input": question, "chat_history": chat_history})
    # Record both sides of the exchange so later questions can reference them.
    chat_history.extend(
        [
            HumanMessage(content=question),
            AIMessage(content=ai_msg["answer"]),
        ]
    )
    return ai_msg["answer"]


from pprint import pprint

while True:
    user_input = input("Enter your question: ")
    if user_input.strip().lower() == "exit":
        break
    pprint(process_question(user_input))
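
Note that chat_history grows without bound, so very long conversations will eventually exceed the model’s context window. A minimal mitigation, sketched below with a hypothetical cap you should tune for your model, is to keep only the most recent turns:

MAX_TURNS = 10  # hypothetical cap; adjust for your model's context window

def trim_history(history, max_turns=MAX_TURNS):
    # Each turn adds two messages (human + AI), so keep the last 2 * max_turns.
    return history[-2 * max_turns:]

Calling chat_history[:] = trim_history(chat_history) after each turn keeps the list bounded in place.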

This implementation allows for multi-turn conversations where each subsequent question can reference information from previous exchanges.

Best Practices

  • Use LangChain’s built-in functions (create_history_aware_retriever, create_stuff_documents_chain, create_retrieval_chain) to streamline the process.
  • Implement proper error handling and edge cases (e.g., empty chat history).
  • Consider using a database or persistent storage for managing chat history in production environments (see the sketch after this list).
  • Test the system with various scenarios, including long conversations and complex queries.
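
For the persistence point above, LangChain’s RunnableWithMessageHistory wrapper lets the chain look up and update history by session ID. The in-memory dict below is only a stand-in for illustration; in production, get_session_history would return a database-backed chat history instead:

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}  # session_id -> history; swap for persistent storage in production

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

# The wrapper injects the stored history and appends each new exchange itself.
answer = conversational_rag_chain.invoke(
    {"input": "What is task decomposition?"},
    config={"configurable": {"session_id": "demo-session"}},
)["answer"]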

By following these steps and best practices, you can create a robust system that supports multi-round conversations while maintaining context throughout the interaction.