RAG Apps

The following section describes how Arch can help you build faster, smarter and more accurate Retrieval-Augmented Generation (RAG) applications, including fast and accurate RAG in multi-turn converational scenarios.

What is Retrieval-Augmented Generation (RAG)?

RAG applications combine retrieval-based methods with generative AI models to provide more accurate, contextually relevant, and reliable outputs. These applications leverage external data sources to augment the capabilities of Large Language Models (LLMs), enabling them to retrieve and integrate specific information rather than relying solely on the LLM’s internal knowledge.

Parameter Extraction for RAG

To build RAG (Retrieval Augmented Generation) applications, you can configure prompt targets with parameters, enabling Arch to retrieve critical information in a structured way for processing. This approach improves the retrieval quality and speed of your application. By extracting parameters from the conversation, you can pull the appropriate chunks from a vector database or SQL-like data store to enhance accuracy. With Arch, you can streamline data retrieval and processing to build more efficient and precise RAG applications.

Step 1: Define Prompt Targets

Prompt Targets

prompt_targets:
  - name: get_device_statistics
    description: Retrieve and present the relevant data based on the specified devices and time range

    path: /agent/device_summary
    parameters:
      - name: device_ids
        type: list
        description: A list of device identifiers (IDs) to reboot.
        required: true
      - name: time_range
        type: int
        description: The number of days in the past over which to retrieve device statistics
        required: false
        default: 7

Step 2: Process Request Parameters in Flask

Once the prompt targets are configured as above, handling those parameters is

Parameter handling with Flask

from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route("/agent/device_summary", methods=["POST"])
def get_device_summary():
    """
    Endpoint to retrieve device statistics based on device IDs and an optional time range.
    """
    data = request.get_json()

    # Validate 'device_ids' parameter
    device_ids = data.get("device_ids")
    if not device_ids or not isinstance(device_ids, list):
        return (
            jsonify({"error": "'device_ids' parameter is required and must be a list"}),
            400,
        )

    # Validate 'time_range' parameter (optional, defaults to 7)
    time_range = data.get("time_range", 7)
    if not isinstance(time_range, int):
        return jsonify({"error": "'time_range' must be an integer"}), 400

    # Simulate retrieving statistics for the given device IDs and time range
    # In a real application, you would query your database or external service here
    statistics = []
    for device_id in device_ids:
        # Placeholder for actual data retrieval
        stats = {
            "device_id": device_id,
            "time_range": f"Last {time_range} days",
            "data": f"Statistics data for device {device_id} over the last {time_range} days.",
        }
        statistics.append(stats)

    response = {"statistics": statistics}

    return jsonify(response), 200


if __name__ == "__main__":
    app.run(debug=True)

Multi-Turn RAG (Follow-up Questions)

Developers often struggle to efficiently handle follow-up or clarification questions. Specifically, when users ask for changes or additions to previous responses, it requires developers to re-write prompts using LLMs with precise prompt engineering techniques. This process is slow, manual, error prone and adds signifcant latency to the user experience.

Arch is highly capable of accurately detecting and processing prompts in a multi-turn scenarios so that you can buil fast and accurate RAG apps in minutes. For additional details on how to build multi-turn RAG applications please refer to our multi-turn docs.