RAG Apps

The following section describes how Arch can help you build faster, smarter, and more accurate Retrieval-Augmented Generation (RAG) applications, including fast and accurate RAG in multi-turn conversational scenarios.

What is Retrieval-Augmented Generation (RAG)?

RAG applications combine retrieval-based methods with generative AI models to provide more accurate, contextually relevant, and reliable outputs. These applications leverage external data sources to augment the capabilities of Large Language Models (LLMs), enabling them to retrieve and integrate specific information rather than relying solely on the LLM’s internal knowledge.
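At its core, a RAG application follows a retrieve-then-generate pattern: fetch the most relevant pieces of external data, then pass them to the LLM as grounding context. The sketch below illustrates this flow; the vector_store client and its search method are hypothetical placeholders for whichever retrieval layer you use, and the chat call assumes an OpenAI-compatible client.

Retrieve-then-generate sketch

from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment


def answer_with_rag(question: str, vector_store) -> str:
    # 1. Retrieve: pull the chunks most relevant to the question.
    #    `vector_store.search` is a hypothetical retrieval API.
    chunks = vector_store.search(question, top_k=5)
    context = "\n\n".join(chunk.text for chunk in chunks)

    # 2. Generate: ground the model's answer in the retrieved context.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content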

Parameter Extraction for RAG

To build RAG applications, you can configure prompt targets with parameters, enabling Arch to extract critical information from the conversation in a structured way for processing. This improves both the retrieval quality and the speed of your application: with the extracted parameters, you can pull the appropriate chunks from a vector database or a SQL-like data store instead of searching over raw conversation text. With Arch, you can streamline data retrieval and processing to build more efficient and precise RAG applications.
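For example, once Arch has extracted structured parameters such as device_ids and time_range, your endpoint can use them as metadata filters on a vector-store query. The snippet below is a hypothetical sketch; the vector_store client and its Pinecone-style filter operators ($in, $lte) are assumptions for illustration, not part of Arch.

Filtering retrieval with extracted parameters

def retrieve_device_chunks(query: str, device_ids: list, time_range: int, vector_store):
    # Restrict the semantic search to documents tagged with the requested
    # devices and recorded within the requested time window.
    # `vector_store.search` and the filter syntax are illustrative placeholders.
    return vector_store.search(
        query,
        top_k=10,
        filter={
            "device_id": {"$in": device_ids},   # only the requested devices
            "age_days": {"$lte": time_range},   # only records within the window
        },
    )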

Step 1: Define Prompt Targets

Prompt Targets
prompt_targets:
  - name: get_device_statistics
    description: Retrieve and present the relevant data based on the specified devices and time range
    path: /agent/device_summary
    parameters:
      - name: device_ids
        type: list
        description: A list of device identifiers (IDs) to retrieve statistics for.
        required: true
      - name: time_range
        type: int
        description: The number of days in the past over which to retrieve device statistics
        required: false
        default: 7

Step 2: Process Request Parameters in Flask

Once the prompt targets are configured as above, handling those parameters in your application is straightforward. The Flask application below validates the parameters Arch forwards to the endpoint and returns the requested device statistics.

Parameter handling with Flask
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route("/agent/device_summary", methods=["POST"])
def get_device_summary():
    """
    Endpoint to retrieve device statistics based on device IDs and an optional time range.
    """
    data = request.get_json()

    # Validate 'device_ids' parameter
    device_ids = data.get("device_ids")
    if not device_ids or not isinstance(device_ids, list):
        return (
            jsonify({"error": "'device_ids' parameter is required and must be a list"}),
            400,
        )

    # Validate 'time_range' parameter (optional, defaults to 7)
    time_range = data.get("time_range", 7)
    if not isinstance(time_range, int):
        return jsonify({"error": "'time_range' must be an integer"}), 400

    # Simulate retrieving statistics for the given device IDs and time range
    # In a real application, you would query your database or external service here
    statistics = []
    for device_id in device_ids:
        # Placeholder for actual data retrieval
        stats = {
            "device_id": device_id,
            "time_range": f"Last {time_range} days",
            "data": f"Statistics data for device {device_id} over the last {time_range} days.",
        }
        statistics.append(stats)

    response = {"statistics": statistics}

    return jsonify(response), 200


if __name__ == "__main__":
    app.run(debug=True)
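To exercise the endpoint without running Arch, you can post a payload shaped like the one Arch forwards after parameter extraction. The values below are illustrative; only the parameter names come from the prompt target defined in Step 1.

Testing the endpoint locally

# Quick local check using Flask's built-in test client.
with app.test_client() as test_client:
    resp = test_client.post(
        "/agent/device_summary",
        json={"device_ids": ["dev-001", "dev-002"], "time_range": 30},
    )
    print(resp.status_code)  # 200
    print(resp.get_json())   # {"statistics": [...]}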

Multi-Turn RAG (Follow-up Questions)

Developers often struggle to handle follow-up or clarification questions efficiently. When users ask for changes or additions to a previous response, developers typically have to rewrite prompts for the LLM using precise prompt-engineering techniques. This process is slow, manual, error-prone, and adds significant latency to the user experience.

Arch accurately detects and processes prompts in multi-turn scenarios, so you can build fast and accurate RAG apps in minutes. For additional details on how to build multi-turn RAG applications, please refer to our multi-turn docs.
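As a hypothetical illustration of a multi-turn exchange, the payloads below show how a follow-up question can change only one parameter while the rest carry over from the earlier turn, so your endpoint logic stays unchanged. The exact payload shapes are assumptions for illustration, not a specification of Arch's behavior.

Illustrative multi-turn parameter payloads

# Turn 1: "Show me stats for devices dev-001 and dev-002 for the last week."
turn_1_params = {"device_ids": ["dev-001", "dev-002"], "time_range": 7}

# Turn 2 (follow-up): "What about the last 30 days?"
# The device list carries over from turn 1; only the time range changes.
turn_2_params = {"device_ids": ["dev-001", "dev-002"], "time_range": 30}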