Agentic RAG with LlamaIndex


Introduction:

In this blog, we will explore the concept of Agentic Retrieval-Augmented Generation (RAG) using LlamaIndex. We will delve into the following topics:

  1. What is a Router Engine?

  2. Steps to create a QA system and a summarization system using the Router Engine

  3. What is tool calling, and the steps to perform it

  4. Combining tool calling with the Router Engine

  5. Understanding agents in LlamaIndex

Router Engine

A Router Engine is a component that dynamically selects and executes the most suitable query engine from several available query engines based on the given query. By defining multiple pipelines, you let the Router Engine choose the most appropriate one for each incoming query, improving both the efficiency and the accuracy of query execution.

Steps to Create a QA System and Summarization System using Router Engine

To create a QA system and a summarization system using the Router Engine, follow these steps (a code sketch follows the list):

  1. Load Data: Begin by loading the necessary data into your system.

  2. Split the Document and Create Nodes: Split the document into smaller chunks and create nodes from these chunks.

  3. Create Summary Index and Vector Index: Generate a summary index and a vector index from the created nodes.

  4. Define Query Engines: Establish query engines for both the summarization index and the vector index.

  5. Create Query Tools: A query tool is a query engine bundled with metadata that tells the selector which tasks the engine is best suited for. Create a summary tool and a vector tool for the respective query engines.

  6. Define Router Query Engine: Use the RouterQueryEngine class to define the router query engine, passing it a selector and the list of query engine tools created in the previous step. LlamaIndex supports LLM selectors, which prompt the LLM to produce JSON that is parsed to decide which index to query, as well as Pydantic selectors, which leverage function-calling capabilities instead.

  7. Test the Pipelines: Test the summarization pipeline and the QA pipeline to ensure they function correctly.
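
Below is a minimal sketch of these seven steps. It assumes the llama-index package is installed, an OpenAI API key is set in the environment, and a hypothetical local file named paper.pdf as the source document; swap in your own paths and models.

```python
from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

# 1. Load data ("paper.pdf" is a placeholder for your own document)
documents = SimpleDirectoryReader(input_files=["paper.pdf"]).load_data()

# 2. Split the document and create nodes
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# 3. Create the summary index and the vector index
summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

# 4. Define a query engine for each index
summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize")
vector_query_engine = vector_index.as_query_engine()

# 5. Wrap each query engine as a tool; the metadata (description)
#    tells the selector what each engine is good for
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions about the document.",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific facts from the document.",
)

# 6. Define the router query engine with an LLM selector
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
)

# 7. Test both pipelines
print(query_engine.query("Give me a summary of the document."))
print(query_engine.query("What method does the document describe?"))
```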

Tool Calling

In a basic RAG pipeline, Large Language Models (LLMs) are primarily used for information synthesis. Tool calling extends this capability by allowing LLMs to interact with external environments through a dynamic interface. Tool calling not only aids in selecting the appropriate tool but also infers the necessary arguments for execution. It enables LLMs to understand how to query a vector database instead of merely consuming its output.

Steps to Perform Tool Calling

  1. Import Function Tool: Import FunctionTool from LlamaIndex and write plain Python functions that encapsulate the logic you want to expose.

  2. Wrap Functions as Tools: Use the FunctionTool class to wrap these Python functions, generating the respective tools.

  3. Declare the LLM Model: Declare the LLM model and use its predict_and_call method, passing the tools along with the chat message, as shown in the sketch below.
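
Here is a minimal sketch of tool calling. The add and mystery functions are toy examples (not part of LlamaIndex), the model name is just an assumption, and an OpenAI API key is expected in the environment.

```python
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

# 1. Plain Python functions encapsulating the logic;
#    type hints and docstrings become the tool schema
def add(x: int, y: int) -> int:
    """Add two integers and return the result."""
    return x + y

def mystery(x: int, y: int) -> int:
    """A toy function that operates on two integers."""
    return (x + y) * (x + y)

# 2. Wrap the functions as tools
add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

# 3. Declare the LLM; predict_and_call selects a tool and infers its arguments
llm = OpenAI(model="gpt-3.5-turbo")
response = llm.predict_and_call(
    [add_tool, mystery_tool],
    "Tell me the output of the mystery function on 2 and 9",
    verbose=True,
)
print(str(response))
```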

Combining Tool Calling with Router Engine

To integrate tool calling with the Router Engine, follow these steps (a sketch follows the list):

  1. Load Data: Load the required data.

  2. Split the Document and Create Nodes: Split the document into smaller chunks and create nodes from these chunks.

  3. Create Summary Index and Vector Index: Generate a summary index and a vector index from the created nodes.

  4. Define Query Engines: Establish query engines for both the summarization index and the vector index.

  5. Create Function Tool for Vector Query: Develop a function tool specifically for vector queries.

  6. Declare the LLM Model: Use the LLM model's predict_and_call method, passing the tools and the chat message for execution.
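
A minimal sketch of the vector-query function tool is below. It reuses the vector_index and summary_tool from the earlier router example, and the page_numbers filter assumes the loader stored a page_label field in each node's metadata (true for the default PDF reader, but worth verifying for your data).

```python
from typing import List

from llama_index.core.tools import FunctionTool
from llama_index.core.vector_stores import FilterCondition, MetadataFilters
from llama_index.llms.openai import OpenAI

def vector_query(query: str, page_numbers: List[str]) -> str:
    """Answer a question over the document. Filter by the given page
    numbers, or pass an empty list to search all pages."""
    page_filters = [{"key": "page_label", "value": p} for p in page_numbers]
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(page_filters, condition=FilterCondition.OR),
    )
    return str(query_engine.query(query))

# Wrap the function as a tool; the docstring becomes its description
vector_query_tool = FunctionTool.from_defaults(fn=vector_query)

# Let the LLM choose between the vector query tool and the summary tool
llm = OpenAI(model="gpt-3.5-turbo")
response = llm.predict_and_call(
    [vector_query_tool, summary_tool],
    "What results are described on page 2?",
    verbose=True,
)
print(str(response))
```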

Agents in LlamaIndex

Agents in LlamaIndex consist of two main components:

  1. Agent Worker: This component is responsible for executing the actual work.

  2. Agent Runner: Also known as the orchestrator, the Agent Runner serves as the overall task dispatcher.

To implement an agent in LlamaIndex, instantiate a FunctionCallingAgentWorker by passing it a list of tools and the LLM model. Pass this worker to the AgentRunner class to create an agent object, which is then used for querying.

Internally, this process uses chain-of-thought reasoning. A significant drawback of plain querying, however, is that no record of past conversation turns is kept. To overcome this, use the chat function of the agent object, which maintains a memory of past conversations, as in the sketch below.
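
Here is a minimal sketch of the agent setup, reusing the vector_query_tool and summary_tool from the previous examples and again assuming an OpenAI API key is configured.

```python
from llama_index.core.agent import AgentRunner, FunctionCallingAgentWorker
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

# Agent Worker: executes the actual reasoning and tool-calling steps
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_query_tool, summary_tool],
    llm=llm,
    verbose=True,
)

# Agent Runner: the orchestrator that dispatches tasks to the worker
agent = AgentRunner(agent_worker)

# query() treats each call as an independent task...
response = agent.query("Summarize the document.")

# ...while chat() keeps a memory of past turns, so follow-ups work
response = agent.chat("What evaluation results are reported?")
response = agent.chat("Tell me more about the first one.")
print(str(response))
```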

Conclusion:

By understanding and implementing these components, you can build robust QA and summarization systems using Agentic RAG with LlamaIndex, enhancing the capabilities and efficiency of your LLM applications.
