
Monitoring LangChain LLM applications with Splunk Observability Cloud

 

In my previous article, Instrumenting LLM applications with OpenLLMetry and Splunk, I showed how to instrument a simple Python application that uses GPT-3.5 Turbo via OpenAI’s API with OpenTelemetry, and how OpenLLMetry further enhances the instrumentation by capturing additional spans, span attributes, and multi-dimensional metrics.

I also demonstrated how powerful features in Splunk Observability Cloud can use this data to help understand exactly how the large language model (LLM) application is performing across different segments of user traffic and resolve issues quickly when something goes wrong.

In this article, we’ll use the same application as a starting point and walk through the following improvements:

  1. Update the application to use the LangChain framework, which provides many features and benefits that we’ll use throughout the article.
  2. Use LangChain’s capabilities to ensure context is retained in subsequent requests to OpenAI.
  3. Modify the application to answer questions from a custom set of data that we provide. To accomplish this, we’ll introduce the concepts of Retrieval-Augmented Generation (RAG), embeddings, and a vector database named Chroma.
  4. Demonstrate how LangChain makes it easy to switch to another LLM provider such as Google’s Gemini.

Along the way, I’ll show how the updated application, with its new dependencies and more advanced logic, can be monitored using Splunk Observability Cloud. Let’s get started!

Process

Revisiting the sample app
We'll start by revisiting our sample application and confirming it runs as expected before implementing the new improvements. The source code is defined in a file named app.py.
from openai import OpenAI
from flask import Flask, request
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

app = Flask(__name__)
OpenAIInstrumentor().instrument()
client = OpenAI()

@app.route("/askquestion", methods=['POST'])
def ask_question():

   data = request.json
   question = data.get('question')

   completion = client.chat.completions.create(
       model="gpt-3.5-turbo",
       messages=[
           {"role": "user", "content": question}
       ]
   )

   return completion.choices[0].message.content

To run the application, we need to first create a virtual environment.

python3 -m venv openai-env
source openai-env/bin/activate

Next, use pip to install the required modules as follows.

pip3 install --upgrade openai==1.43.0
pip3 install flask==3.0.3
pip3 install "splunk-opentelemetry[all]"
pip3 install opentelemetry-instrumentation-openai==0.28.2

Then we’ll run the bootstrap script to install instrumentation for supported packages in our environment.

splunk-py-trace-bootstrap

Next, we’ll provide a name for our service, as well as the deployment environment. This is a best practice, as it makes it easy to find our service in Splunk Observability Cloud.

export OTEL_SERVICE_NAME=my-llm-app
export OTEL_RESOURCE_ATTRIBUTES='deployment.environment=test'

Our application will generate metrics and traces, so we’ll also want to ensure an OpenTelemetry Collector is running on our machine. Refer to Install and configure the Splunk Distribution of the OpenTelemetry Collector for installation details.

Now we can run our application as follows.

splunk-py-trace flask run -p 8080

Define a file named question.json with the following content.

{
  "question":"Hello, World!"
}

Then exercise the application using the following curl command.

curl -d "@question.json"  -H "Content-Type: application/json" -X POST http://localhost:8080/askquestion

It will respond with something like:

Hello! How can I assist you today?

Introducing LangChain

Our application is working fine, but it’s inherently tied to OpenAI. While OpenAI provides a few different LLM models that we can use, such as GPT-3.5 Turbo, GPT-4, and the lightweight GPT-4o mini, new LLM providers are constantly popping up. What if we want to switch our application to use a different LLM provider in the future?

We could just update the code to use the API for the new LLM provider of interest. However, as the application grows in complexity, this approach becomes less manageable.

Let’s solve this problem by using LangChain, a framework that simplifies the creation of LLM applications. LangChain provides lots of useful features, which we’ll explore throughout this article. But for now, let’s focus on the initial issue of making it easier to switch between LLM providers.

To add LangChain to our application, let’s first install the required modules and remove the ones we no longer need.

pip3 uninstall openai
pip3 uninstall opentelemetry-instrumentation-openai
pip3 install langchain==0.2.15
pip3 install -qU langchain-openai==0.1.23
pip3 install opentelemetry-instrumentation-langchain==0.28.2

Let’s update our application code to utilize OpenAI’s GPT-3.5 Turbo model via LangChain.

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from flask import Flask, request
from opentelemetry.instrumentation.langchain import LangchainInstrumentor

app = Flask(__name__)
LangchainInstrumentor().instrument()
model = ChatOpenAI(model="gpt-3.5-turbo")

@app.route("/askquestion", methods=['POST'])
def ask_question():

   data = request.json
   question = data.get('question')

   messages = [
       SystemMessage(content="You are a helpful assistant!"),
       HumanMessage(content=question),
   ]

   return model.invoke(messages).content

The implementation of our application is quite different from before, with changes to almost every line of code in the app.py file. But the good news is that, now that we’re using LangChain, changing to a different LLM provider will only require changing two lines of code. We’ll see how this works later in the article.

Note that in addition to updating our app to use LangChain instead of OpenAI directly, we also had to update the OpenLLMetry instrumentation to use LangChainInstrumentor() instead of OpenAIInstrumentor().

In Splunk Observability Cloud, we can see that the instrumentation is still working as expected, and it has automatically incorporated the appropriate span attributes that are associated with LangChain.

1 - LangChain Trace.png

Testing the sample app

Now that our app is using LangChain, let’s run a few tests to ensure it still works as expected. Our application expects a JSON document to be posted to the /askquestion endpoint such as the following.

{
  "question":"What is the capital of Canada?"
}

If we send this question to our application, we’ll receive a response such as the following.

The capital of Canada is Ottawa.

Which makes sense. But what if we want to ask a follow-up question such as:

{
  "question":"And England?"
}

In this case, we’ll get an answer that doesn’t give us what we’re looking for, such as:

England is a country that is part of the United Kingdom, located in the southern part of the island of Great Britain...

This might be useful information about England, but it doesn’t answer our question about what the capital of England is.

The reason we didn’t get the expected answer is that GPT-3.5 Turbo, and LLMs in general, are stateless: they process each input independently and do not retain conversational context unless explicitly programmed to do so. So when we posted a follow-up question to GPT-3.5 Turbo via the OpenAI API, it wasn’t aware of our original question and did its best to answer based on the text of the second question alone.
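
To make "explicitly programmed to do so" concrete, here is a minimal illustrative sketch (not code from the sample app): with a stateless API, the caller has to resend the earlier turns with every request.

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# Illustration only: context is retained only because we resend the earlier
# turns ourselves on every call.
messages = [
    SystemMessage(content="You are a helpful assistant!"),
    HumanMessage(content="What is the capital of Canada?"),
    AIMessage(content="The capital of Canada is Ottawa."),  # the earlier answer
    HumanMessage(content="And England?"),  # the follow-up now has context
]
# model.invoke(messages)  # the model can now resolve "And England?" correctly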

How do we resolve this issue and ensure that ChatGPT retains the context needed to answer follow-up questions correctly?

Using message history

It turns out that LangChain can help us with this problem as well. Specifically, LangChain includes message history capabilities that we can use to make our application stateful.

To store message history, let’s install an additional LangChain package.

pip3 install langchain_community==0.2.15

Then, we’ll update our source code to implement message history by adding the following import statements.

from langchain_core.chat_history import (
   BaseChatMessageHistory,
   InMemoryChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory

Then, add the following code to wrap our existing model and add in message history.

store = {}
config = {"configurable": {"session_id": "test"}}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
   if session_id not in store:
       store[session_id] = InMemoryChatMessageHistory()
   return store[session_id]

with_message_history = RunnableWithMessageHistory(model, get_session_history)

To keep the example simple, we’ve hard-coded the session_id. In a real-world application, a new session_id would be assigned for each client and used in this config object.
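
For illustration only, a per-client version might look something like the following sketch, where the X-Session-Id header name is an assumption rather than part of the sample app; we’ll keep the hard-coded session_id for the rest of this article.

import uuid
from flask import request

# Hypothetical sketch: build the config per request from a client-supplied
# header, falling back to a random id if the header is missing.
def config_for_request():
    session_id = request.headers.get("X-Session-Id") or str(uuid.uuid4())
    return {"configurable": {"session_id": session_id}}

# Inside ask_question(), this would be passed as config=config_for_request().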

Finally, we’ll update the code used to invoke the LLM as follows.

response = with_message_history.invoke(
   [
       SystemMessage(content="You are a helpful assistant"),
       HumanMessage(content=question)
   ],
   config=config
)

return response.content

The final source code looks like this.

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.chat_history import (
   BaseChatMessageHistory,
   InMemoryChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory
from flask import Flask, request
from opentelemetry.instrumentation.langchain import LangchainInstrumentor

app = Flask(__name__)
LangchainInstrumentor().instrument()
model = ChatOpenAI(model="gpt-3.5-turbo")

store = {}
config = {"configurable": {"session_id": "test"}}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
   if session_id not in store:
       store[session_id] = InMemoryChatMessageHistory()
   return store[session_id]

with_message_history = RunnableWithMessageHistory(model, get_session_history)

@app.route("/askquestion", methods=['POST'])
def ask_question():

   data = request.json
   question = data.get('question')

   response = with_message_history.invoke(
       [
           SystemMessage(content="You are a helpful assistant"),
           HumanMessage(content=question)
       ],
       config=config
   )

   return response.content

If we return to our original line of questioning:

{
  "question":"What is the capital of Canada?"
}

We’ll receive the same answer as usual:

The capital of Canada is Ottawa.

But now the follow-up question:

{
  "question":"And England?"
}

Results in:

The capital of England is London

It’s clear that our application is saving the message history successfully, as now we get the correct answer. This is also reflected in the trace that we captured for the latest request.
2 - Trace with Message History.png

Specifically, we can see that our application sent both the original question (“What is the capital of Canada?”) and the follow-up question (“And England?”) to OpenAI, which allowed for the question to be answered correctly.

The trace also tracks new activity happening in our application, such as loading the message history. This demonstrates the power of OpenLLMetry, as it was able to instrument all of the LangChain-related activity with just a single line of code. Combining OpenLLMetry with Splunk Observability Cloud allows engineering teams to focus on what they do best - adding new features and delivering value - rather than having to spend time on manual instrumentation.

Prompts and responses might include information that you don’t want to collect as part of traces. The TRACELOOP_TRACE_CONTENT environment variable can be set to false to avoid collecting this data.
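
For example, you could set it before starting the application:

export TRACELOOP_TRACE_CONTENT=false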

Adding custom data to our application

We’ve made a number of improvements to our app to make it more usable and maintainable. But taking a step back, we might wonder: what value does our app provide above and beyond just using https://chatgpt.com? It doesn’t really provide much value in its current state, other than serving as an example of how to use LangChain to interact with an LLM.

To deliver more value from our application, it would be helpful if our app could utilize data that goes beyond what the LLM already has. This would provide functionality that’s not possible by simply using chatgpt.com.

Let’s modify our application so that it’s able to use the LLM to answer questions using data from our organization’s customer database.

To keep our example simple, we’ll use the customers-1000.csv file from the Sample CSV Files GitHub repo. But in a real application, the customer information would be stored in a database instead.

We’ll use LangChain’s CSVLoader class to load our CSV file and parse it into a separate document for each row.

To do this, let’s create a new file named customer_data.py and add the following import statement.

from langchain_community.document_loaders.csv_loader import CSVLoader

Then we can load and parse the file with:

file_path = (
   "./customers-1000.csv"
)

loader = CSVLoader(file_path=file_path)
customer_data = loader.load()

Our document is loaded, but how do we get the LLM to use it when answering questions?

One approach could be to send the entire document to the LLM as part of our question. But an organization could have hundreds of thousands of customers, and the more data we send to an LLM as part of the prompt, the more our query will cost. Also, LLMs have a prompt length limit, so this approach is not feasible.

To solve this problem, we’re going to use something called “embeddings”. But before talking about embeddings, let’s take a moment to talk about a more general concept named RAG.

Introducing Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is commonly used in large language model applications as a way to enrich LLM responses with custom data. For example, when constructing a prompt for an LLM, we can tell the LLM that it has access to a particular set of tools that can help answer the question, and then explain what each tool does. Alternatively, we could provide the additional data directly in the prompt itself, and instruct the LLM to use the provided data to answer the question. These options can give us more accurate and context-specific answers.

We'll use RAG to ensure the LLM is able to use data from our customer database to answer questions from the end-user. But the problem still remains: how do we pass just the set of data that the LLM needs to answer the question, thus minimizing the cost for each query and staying within prompt length limits?

Introducing embeddings

Embeddings are commonly used in LLM applications to measure the semantic similarity between two or more snippets of text. An embedding is a numerical representation of text that captures its semantic meaning. It’s made up of a vector, or array, with many dimensions. For example, the embedding calculated by OpenAI for “The rabbit jumped into the stream” is:

[-0.003961991518735886, -0.01459040492773056, 0.013468066230416298, 
-0.005304065067321062, -0.028477657586336136, 0.008268797770142555, 
-0.020364364609122276, -0.016510551795363426, -0.0057435352355241776, 
-0.01803855411708355, 0.014347006566822529, 0.029424209147691727, 
0.02301470749080181, -0.009059843607246876, -0.01572626642882824, 
-0.0016589993610978127, 0.048328179866075516, 0.006155960727483034, 
0.01771402359008789, -0.031236177310347557, …]

The full embedding contains many more dimensions; I’ve only shown the first 20 for brevity.

So what we’ll do is calculate an embedding for each line of our CSV file. Then, when a user asks a question, we’ll calculate an embedding for the question as well and determine which two or three stored embeddings match it most closely. Once we know this, we can send the LLM the user’s question along with the corresponding snippets of text that most likely contain the answer.

This can be confusing, so let’s illustrate the concept with a diagram.
LangChain and Embeddings.png
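
Under the hood, "matching most closely" means comparing vectors. Here's a minimal sketch of cosine similarity, a measure commonly used to compare embeddings; the variable names in the commented usage are placeholders rather than part of the application.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cosine similarity = dot product / (magnitude of a * magnitude of b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical usage: rank stored row embeddings against the question embedding
# and keep the closest two or three matches.
# ranked = sorted(row_embeddings, key=lambda e: cosine_similarity(question_embedding, e), reverse=True)
# top_matches = ranked[:3]

In practice, the vector database handles this search for us, as we'll see shortly.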

Calculate embeddings for custom data

LangChain provides support for embeddings which we can use for this task. We’ll start by adding the following import statement to the customer_data.py file.

from langchain_openai import OpenAIEmbeddings

Then we’ll install another module that’s required to use OpenAI Embeddings.

pip3 install tiktoken==0.7.0

Now we can create an embeddings model.

embeddings_model = OpenAIEmbeddings()

Let’s test it by printing out the embeddings for the first customer in the CSV file.

embeddings = embeddings_model.embed_documents([customer_data[0].page_content])
print(embeddings[0][:20])

To run the new file, we’ll use the following command.

python customer_data.py

The first 20 numbers in this embedding are:

[-0.013050409965217113, -0.033757805824279785, 0.007747807539999485, 
-0.01461534108966589, -0.020847121253609657, 0.019352054223418236, 
-0.014189177192747593, -0.021210409700870514, 0.013008492067456245, 
-0.00559602677822113, 0.01722821779549122, 0.017870957031846046, 
0.015830958262085915, 0.0053375340066850185, -0.010528355836868286, 
-0.0007388013182207942, 0.02897917665541172, 0.010863698087632656, 
0.012882738374173641, -0.012952601537108421]

Excellent. Now we can calculate embeddings for every customer in the CSV file.

However, calculating embeddings requires a call to OpenAI’s embeddings API every time, which will significantly increase the cost of running this application. To use the API efficiently and keep costs in check, let’s store the embeddings in a vector database so we can reuse them as needed.

Using the Chroma vector database

A vector database is optimized for storing and analyzing vectors such as the ones we calculated above with embeddings. Some of the more popular vector databases are Chroma, Pinecone, and Faiss. In this example, we’ll use Chroma.

As before, we’ll start by adding the following import statement to the customer_data.py file.

from langchain.vectorstores.chroma import Chroma

Then we’ll need to install the following Python module.

pip3 install chromadb==0.5.5

Now we can use the following code to calculate embeddings from our customer data and store them in Chroma.

db = Chroma.from_documents(
   customer_data,
   embedding=embeddings_model,
   persist_directory="my_embeddings"
)

In this case, we’re instructing Chroma to persist the embeddings to a local folder named my_embeddings.

Let’s test it out by asking it to find the entries in the vector database that are the most similar to our sample prompt.

results = db.similarity_search(
   "Which customers are associated with the company Cherry and Sons?"
)

for result in results:
   print("\n")
   print(result.page_content)
   

Run the test with the following command.

python customer_data.py

Our test produces the following output:

Index: 47
Customer Id: EE9381bAEbac1eA
First Name: Levi
Last Name: Grimes
Company: Carpenter, Chang and Bass
City: Frederickfurt
Country: Heard Island and McDonald Islands
Phone 1: +1-325-527-6948
Phone 2: 001-221-413-5502x8170
Email: robertmarks@willis.com
Subscription Date: 2021-06-12
Website: https://cherry.com/

Index: 47
Customer Id: EE9381bAEbac1eA
First Name: Levi
Last Name: Grimes
Company: Carpenter, Chang and Bass
City: Frederickfurt
Country: Heard Island and McDonald Islands
Phone 1: +1-325-527-6948
Phone 2: 001-221-413-5502x8170
Email: robertmarks@willis.com
Subscription Date: 2021-06-12
Website: https://cherry.com/

Index: 382
Customer Id: 591CE8Bb3aB2D87
First Name: Christian
Last Name: Moore
Company: Cherry and Sons
City: South Anne
Country: Gambia
Phone 1: 6085361723
Phone 2: 388-121-8428x069
Email: moralesleslie@scott.com
Subscription Date: 2020-06-14
Website: https://stevens-crane.com/

Index: 382
Customer Id: 591CE8Bb3aB2D87
First Name: Christian
Last Name: Moore
Company: Cherry and Sons
City: South Anne
Country: Gambia
Phone 1: 6085361723
Phone 2: 388-121-8428x069
Email: moralesleslie@scott.com
Subscription Date: 2020-06-14
Website: https://stevens-crane.com/

As we look at the results of this test, notice that two of the results are associated with “Cherry and Sons”, while the other two belong to a different company whose website is cherry.com. This is to be expected: the intent of the current logic is not to answer the question itself, but rather to find the embeddings that are most similar to the question being asked.

Running the customer_data.py program multiple times will result in duplicate embeddings stored in the vector database. If needed, you can delete the my_embeddings folder and run this program once to start fresh.
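
One way to avoid the duplicates, sketched here as an assumption rather than part of the original walkthrough, is to build the store only when the my_embeddings folder doesn't exist yet and otherwise reuse what's already on disk. This builds on the imports and variables already defined in customer_data.py.

import os

persist_dir = "my_embeddings"

if os.path.isdir(persist_dir):
    # Reuse the embeddings persisted by a previous run.
    db = Chroma(persist_directory=persist_dir, embedding_function=embeddings_model)
else:
    # First run: calculate the embeddings and persist them to disk.
    db = Chroma.from_documents(
        customer_data,
        embedding=embeddings_model,
        persist_directory=persist_dir,
    )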

Modifying the application to use embeddings

As a final step, we need to modify our application to utilize the embeddings when answering the question. In this section, we’ll be working in the app.py file again instead of customer_data.py.

We’ll need to calculate an embedding whenever a question is asked, so let’s add the following import statement.

from langchain_openai import OpenAIEmbeddings

And we’ll use our Chroma vector database to look for similar documents.

from langchain.vectorstores.chroma import Chroma

Next, we’ll create an embeddings model (as we did in customer_data.py) and get a reference to the vector database that we populated earlier by adding the following code.

embeddings_model = OpenAIEmbeddings()

db = Chroma(
   persist_directory="my_embeddings",
   embedding_function=embeddings_model
)

Then, as part of the ask_question() function, we’ll use our vector database to find documents that are the most similar to the question being asked.

# find the documents most similar to the question that we can pass as context
context = db.similarity_search(question)

And finally, we can add this context to the system message to ensure the LLM takes it into account when answering the question.

response = with_message_history.invoke(
   [
       SystemMessage(
           content=f'Use the following pieces of context to answer the question: {context}'
       ),
       HumanMessage(
           content=question
       )
   ],
   config=config
)

The resulting code in the app.py file looks like this.

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.chat_history import (
   BaseChatMessageHistory,
   InMemoryChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores.chroma import Chroma
from flask import Flask, request
from opentelemetry.instrumentation.langchain import LangchainInstrumentor

app = Flask(__name__)
LangchainInstrumentor().instrument()
model = ChatOpenAI(model="gpt-3.5-turbo")

embeddings_model = OpenAIEmbeddings()

db = Chroma(
   persist_directory="my_embeddings",
   embedding_function=embeddings_model
)

store = {}
config = {"configurable": {"session_id": "test"}}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
   if session_id not in store:
       store[session_id] = InMemoryChatMessageHistory()
   return store[session_id]

with_message_history = RunnableWithMessageHistory(model, get_session_history)

@app.route("/askquestion", methods=['POST'])
def ask_question():

   data = request.json
   question = data.get('question')

   # find the documents most similar to the question that we can pass as context
   context = db.similarity_search(question)

   response = with_message_history.invoke(
       [
           SystemMessage(
               content=f'Use the following pieces of context to answer the question: {context}'
           ),
           HumanMessage(
               content=question
           )
       ],
       config=config
   )

   return response.content

We could have used a RetrievalQA chain to handle this task for us more elegantly. But for the sake of clarity, I’ve chosen to implement the logic explicitly.
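
For reference, here is a rough sketch of what the RetrievalQA approach might look like, assuming the model and Chroma db already defined in app.py; this is an alternative, not the approach used in this article, and it doesn't wire in the message history wrapper shown earlier.

from langchain.chains import RetrievalQA

# "stuff" simply packs the retrieved documents into the prompt as context.
qa_chain = RetrievalQA.from_chain_type(
    llm=model,
    chain_type="stuff",
    retriever=db.as_retriever(),
)

# result = qa_chain.invoke({"query": question})
# return result["result"]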

Let’s test our implementation by asking the following question.

{
  "question":"Which customers are associated with the company Cherry and Sons?"
}

We get the following response, which matches the content of the CSV file:

The customers associated with the company Cherry and Sons are:

1. Customer ID: 591CE8Bb3aB2D87
- First Name: Christian
- Last Name: Moore
- City: South Anne
- Country: Gambia
- Phone 1: 6085361723
- Phone 2: 388-121-8428x069
- Email: moralesleslie@scott.com
- Subscription Date: 2020-06-14
- Website: https://stevens-crane.com/
2. Customer ID: 8aaa5d0CE9ee311
- First Name: Marissa
- Last Name: Gamble
- City: Webertown
- Country: Sudan
- Phone 1: 001-645-334-5514x0786
- Phone 2: (751)980-3163
- Email: katieallison@leonard.com
- Subscription Date: 2021-11-17
- Website: http://www.kaufman.org/

The customer records that matched only because of the cherry.com website are not included in the response, so it looks like our application is working correctly.

Let’s ask a follow-up question to ensure that message history is retained.

{
  "question":"And Giles LLC?"
}

And the answer again matches the content of the CSV file:

The customer associated with the company Giles LLC is:
Customer ID: c5CB6C5bFB91fdC
- First Name: Darius
- Last Name: Benitez
- City: Mejiashire
- Country: Jersey
- Phone 1: +1-797-864-3151x25142
- Phone 2: 139-216-5379x6030
- Email: garrettdurham@olsen.com
- Subscription Date: 2022-02-28
- Website: https://washington.com/

It’s great to see that our application is working as expected. But is it observable? Let’s have a look at the trace that was captured for our most recent test.

3 - Trace with Embeddings.png

We can see that the latest trace has captured all of the activities in our new and improved application:

  • Calculating an embedding for the user’s question using OpenAI
  • Running a SELECT query against the vector database to find similar embeddings
  • Loading the message history
  • Sending the prompt with the context and message history to OpenAI

Not surprisingly, the slowest part of our application is the call to the OpenAI API. We can also see this particular prompt consumed a total of 1,465 tokens, the bulk of which were used by the prompt itself.
4 - Trace with Large Token Usage.png

This is to be expected, as the prompt includes not only the user’s question, but also the context from the vector database (that is, the documents that match the question most closely) as well as the message history.

This level of detail in the trace allows us not only to understand where our application’s performance bottlenecks are and where errors are occurring, but also to see which requests are consuming the most tokens and why.

Switching to another LLM provider

Earlier in the article I mentioned that one of the benefits of LangChain is the ability to quickly switch to another LLM provider. Now that we’ve added more logic to our application, let’s go ahead and make the switch and see how much effort is involved.

One of the more popular LLMs is Gemini from Google. Let’s update our application to use Gemini instead of OpenAI.

First, we’ll need to sign up for an account and get an API key. After we have a key, we’ll need to create an environment variable for it.

export GOOGLE_API_KEY=your-api-key

You might also need to enable the Generative Language API for your project using the Google Cloud Platform console.

Then we’ll install a Python package to use LangChain with Gemini models.

pip3 install -U langchain-google-genai==1.0.10

To use Gemini in our application, let’s modify the app.py file to include the following import statement.

from langchain_google_genai import ChatGoogleGenerativeAI

And instead of ChatOpenAI, we’ll create our model using ChatGoogleGenerativeAI.

model = ChatGoogleGenerativeAI(model="gemini-1.5-pro-latest")

Let’s run our application to see how the results change with Gemini. We’ll start with the following question.

{
  "question":"Which customers are associated with the company Cherry and Sons?"
}

We get the following response, which is formatted differently than OpenAI’s response, but is effectively the same answer.

Based on the provided context, the customers associated with Cherry and Sons are:
* **Christian Moore** (Customer Id: 591CE8Bb3aB2D87)
* **Marissa Gamble** (Customer Id: 8aaa5d0CE9ee311)

Without any additional code changes, we can see that OpenLLMetry has captured all of the details of the calls to Gemini.

5 - Trace with Gemini LLM.png

Summary

We covered a lot of ground in this article. We started by revisiting a sample application that I created in an earlier article, which makes calls to OpenAI’s API directly using the GPT-3.5 Turbo model to answer questions. We then introduced LangChain as a framework for building LLM applications, and updated our application to invoke OpenAI via LangChain instead of calling it directly.

Then, after realizing that our application wasn’t responding correctly to follow-up questions, we used LangChain to implement message history.

Following this, we decided to add more real-world functionality to our application by giving it the ability to answer questions based on data from a CSV file. We used OpenAI to calculate embeddings for each row in the CSV file, and then we stored those embeddings in a vector database named Chroma to avoid having to calculate them repeatedly.

We then modified our application to calculate an embedding for each question, find the most similar embeddings in the vector database, and pass the matching documents along as context for the LLM to use when answering.

Finally, we updated our application to swap out OpenAI’s LLM for Google Gemini, and confirmed that our application is still working as expected, with slightly different but still correct results.

And along the way, we ensured our application was observable using OpenTelemetry, OpenLLMetry, and Splunk Observability Cloud.

Next steps

To get started instrumenting your own LLM application with OpenTelemetry and Splunk Observability Cloud today, see Instrument back-end applications to send spans to Splunk APM and select the appropriate language guide (Python, Node.js, etc.).

For more help, ask a Splunk Expert.