Splunk Lantern

Monitoring applications using OpenAI API and GPT models with OpenTelemetry and Splunk APM

 

If you're developing an AI assistant application with OpenAI APIs and GPT models, you'll quickly realize how essential monitoring is to ensure smooth performance and reliability. You'll also need to determine which GPT model is the best fit for your application.

By leveraging OpenTelemetry and Splunk Application Performance Monitoring (APM), you can gain valuable insights into your application's performance and the effectiveness of different GPT models. The integration provides a comprehensive monitoring solution that ensures your application's reliability and responsiveness. Splunk Application Performance Monitoring plays a crucial role in identifying performance bottlenecks, understanding user interactions, and maintaining overall system health.

The following is a step-by-step guide on how to use Splunk Application Performance Monitoring to monitor your OpenAI applications.

How to use Splunk software for this use case

Setting up the environment

This example uses the Instrumented Python frameworks for Splunk Observability Cloud to build the application.

The code snippets below are part of the entire example application code and configurations, which are available on GitHub.

Building the application with Flask

from flask import (
   Flask,
   render_template,
   request,
   Response,
   stream_with_context,
   jsonify,
)

import openai

client = openai.OpenAI()

app = Flask(__name__)

chat_history = [
   {"role": "system", "content": "Hello, I'm Shelly's Assistant; I (actually) run The Splunk T-Shirt Company. AMA"},
]


@app.route("/", methods=["GET"])
def index():
   return render_template("index.html", chat_history=chat_history)


@app.route("/chat", methods=["POST"])
def chat():
   content = request.json["message"]
   chat_history.append({"role": "user", "content": content})
   return jsonify(success=True)


@app.route("/stream", methods=["GET"])
def stream():...


@app.route("/reset", methods=["POST"])
def reset_chat():
   global chat_history
   chat_history = [
       {"role": "system",
        "content": "Hello, I'm Shelly's Assistant; I (actually) run The Splunk T-Shirt Company. AMA"},
   ]
   return jsonify(success=True)

Instrumenting with OpenTelemetry

Next, integrate OpenTelemetry into the Flask app to capture traces and spans.

The Splunk Distribution of OpenTelemetry Python instrumentation provides a Python agent that automatically instruments your Python application to capture and report distributed traces to Splunk Application Performance Monitoring.

pip install "splunk-opentelemetry[all]"

# otel imports
from opentelemetry import trace
from opentelemetry.metrics import MeterProvider
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# setup otel tracing
resource = Resource(attributes={
   SERVICE_NAME: "splunk-shelly-AI-Assistant"
})
provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
span_processor = BatchSpanProcessor(otlp_exporter)
provider.add_span_processor(span_processor)
# Adding custom attributes to the span in the stream() function
@app.route("/stream", methods=["GET"])
def stream():
  def generate():
    # ... (code elided in this excerpt)
    with tracer.start_as_current_span("call_gpt_model") as span:
      with client.chat.completions.create(
          messages=chat_history,
          stream=True,
          model=GPT_model,
          temperature=GPT_temperature,
          top_p=GPT_top_p,
      ) as stream:
        ...  # chunk handling elided in this excerpt

      # chunk, prompt, result, duration, and the token counts are set in the elided code
      span.set_attribute("id", chunk.id)
      span.set_attribute("GPT-model", chunk.model)
      span.set_attribute("GPT-temperature", GPT_temperature)
      span.set_attribute("GPT-top_p", GPT_top_p)
      span.set_attribute("prompt", prompt)
      span.set_attribute("response", result)
      span.set_attribute("latency", duration)
      span.set_attribute("tokens_used", tokens_used)
      span.set_attribute("completion_tokens", completion_tokens)
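
The values recorded on the span above, such as result and duration, come from code that is elided in this excerpt. As a rough sketch of how they could be gathered while iterating over a streamed response, the following uses a hypothetical FakeChunk stand-in rather than real OpenAI chunk objects, and a hypothetical helper named collect_stream_attributes. Token counts are left out because they depend on usage data returned by the API.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeChunk:
    """Hypothetical stand-in for one streamed chunk (not the OpenAI type)."""
    id: str
    model: str
    text: Optional[str]  # None mimics a chunk that carries no content


def collect_stream_attributes(chunks: Iterable[FakeChunk]) -> dict:
    """Consume a stream and return values suitable for span attributes."""
    start = time.perf_counter()
    parts = []
    attrs = {}
    for chunk in chunks:
        # Every chunk repeats the response id and model, so the last one wins.
        attrs["id"] = chunk.id
        attrs["GPT-model"] = chunk.model
        if chunk.text:
            parts.append(chunk.text)
    attrs["response"] = "".join(parts)
    # Wall-clock time spent consuming the stream, in seconds.
    attrs["latency"] = time.perf_counter() - start
    return attrs


demo = [
    FakeChunk("chatcmpl-demo", "demo-model", "Hello"),
    FakeChunk("chatcmpl-demo", "demo-model", ", world"),
    FakeChunk("chatcmpl-demo", "demo-model", None),
]
attrs = collect_stream_attributes(demo)
print(attrs["response"])
```

Each key and value in the returned dictionary could then be passed to span.set_attribute(key, value) inside the call_gpt_model span.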

Configuring the Splunk Distribution of the OpenTelemetry Collector

Configure the Splunk Distribution of the OpenTelemetry Collector to receive and export metrics and traces to Splunk Observability Cloud.

First, install the collector with one of the following methods:

  • If you plan to run the collector on Linux, Windows, or Kubernetes, installing the collector is straightforward and all instructions are available here.
  • If you’re developing on a Mac, you have to build the custom binary. All steps to build and execute are available here. The agent and gateway configuration can be found in this GitHub repo. Modify the parameters as you see fit.

Next, set the following environment variables:

OPENAI_API_KEY=<OPENAI-API-KEY>
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=dev,service.version=0.0.1
OTEL_SERVICE_NAME=Splunk-Shelly-AI-Assistant
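
OTEL_RESOURCE_ATTRIBUTES packs several attributes into one comma-separated string of key=value pairs. The OpenTelemetry SDK reads this variable itself, so the sketch below is only an illustration of how the format breaks down. It can be handy for sanity-checking a value before you start the application. parse_resource_attributes is a hypothetical helper, not part of OpenTelemetry.

```python
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict:
    """Split a 'k1=v1,k2=v2' string into a dictionary of attributes."""
    attrs = {}
    for item in raw.split(","):
        item = item.strip()
        # Skip empty or malformed entries instead of raising.
        if not item or "=" not in item:
            continue
        key, value = item.split("=", 1)
        # Values may be percent-encoded, so decode them.
        attrs[key.strip()] = unquote(value.strip())
    return attrs


parsed = parse_resource_attributes(
    "deployment.environment=dev,service.version=0.0.1"
)
print(parsed)
```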

After you have configured the collector and specified the correct Splunk Observability Cloud endpoint URL and the API token, you’re ready to start your collector. Start the collector with the proper configuration:

sudo SPLUNK_ACCESS_TOKEN=<TOKEN> SPLUNK_REALM=<REALM> ./otelcol --config=/etc/otel/collector/agent_config.yaml

Prefix your service run command with splunk-py-trace to enable instrumentation:

splunk-py-trace python -m flask run --host=0.0.0.0 --port=5000


Analyzing data in Splunk Application Performance Monitoring

By adding all the data as attributes within spans, you can send it to your Splunk Observability Cloud OpenTelemetry endpoint. The benefit of doing this is that you can easily use the data in searches, build dashboards, and create visualizations to monitor performance. To determine the best GPT model for your application, you can implement custom spans and attributes that provide deeper insights into model performance and accuracy. These attributes might include response latency, temperature, top_p, and completion tokens. Along with API performance and GPT model-related attributes, you can also capture user satisfaction scores for every response from each model.
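
To make the model comparison concrete, here is a minimal sketch of the kind of per-model aggregation these attributes allow. It runs on hypothetical, hand-written records rather than data exported from Splunk Observability Cloud. The model names, the values, and the satisfaction key are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical span attribute records; real values would come from your traces.
spans = [
    {"GPT-model": "model-a", "latency": 1.0, "completion_tokens": 80, "satisfaction": 4},
    {"GPT-model": "model-a", "latency": 2.0, "completion_tokens": 60, "satisfaction": 5},
    {"GPT-model": "model-b", "latency": 3.0, "completion_tokens": 120, "satisfaction": 3},
]


def summarize_by_model(records):
    """Group records by model and average the numeric attributes."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["GPT-model"]].append(record)

    summary = {}
    for model, rows in grouped.items():
        summary[model] = {
            "requests": len(rows),
            "avg_latency": mean(r["latency"] for r in rows),
            "avg_completion_tokens": mean(r["completion_tokens"] for r in rows),
            "avg_satisfaction": mean(r["satisfaction"] for r in rows),
        }
    return summary


summary = summarize_by_model(spans)
for model, stats in sorted(summary.items()):
    print(model, stats)
```

In practice, Tag Spotlight and dashboards perform this grouping for you. The sketch only shows why recording the model name on every span makes such comparisons possible.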

The screenshot below shows the Service Map and workflows showing the stream() function invoking the OpenAI API completions call.

[Screenshot: Service Map and workflows]

The screenshot below shows the Tag Spotlight view of the service. Alongside the standard tags, it shows custom tags such as GPT model, temperature, and top_p, which are set on each request sent to the respective GPT models. These custom tags help you filter and analyze data by model.

[Screenshot: Tag Spotlight with custom GPT model, temperature, and top_p tags]

The screenshot below shows the Tag Spotlight of the service displaying latency for each GPT-model. These metrics and visualizations can help you to make decisions on which model is best suited for the type of application you want to create.

[Screenshot: Tag Spotlight latency by GPT-model]

You can also dive into a single trace to examine span performance and tie it to specific model attributes such as prompt tokens and completion tokens, along with user satisfaction scores:

[Screenshot: single trace view with model attributes]

Next steps

This process is only one example you can use to understand how to integrate your AI applications with Splunk Observability Cloud or Splunk Application Performance Monitoring. You might also want to explore other models and monitoring frameworks like LangChain using Splunk.