Automatically find the bad LLM responses in your LLM Evals with Cleanlab
This guide will walk you through the process of evaluating LLM responses captured in MLflow with Cleanlab's Trustworthy Language Models (TLM).
TLM boosts the reliability of any LLM application by indicating when the model's response is untrustworthy. It works by analyzing the prompt and the generated response to calculate a trustworthiness_score, which helps automatically flag potentially incorrect or hallucinated outputs without requiring ground-truth labels. TLM can also provide explanations for its assessment.
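For example, here is a minimal sketch of scoring a single prompt/response pair with the cleanlab-tlm client (the question and answer below are just illustrative placeholders, and it assumes CLEANLAB_TLM_API_KEY is already set in your environment):
from cleanlab_tlm import TLM

# Request an explanation alongside the trustworthiness score
tlm = TLM(options={"log": ["explanation"]})

result = tlm.get_trustworthiness_score(
    "What is the capital of Australia?",    # prompt sent to the LLM
    "The capital of Australia is Sydney.",  # response produced by the LLM
)
print(result["trustworthiness_score"])  # low scores flag likely incorrect or hallucinated answers
print(result["log"]["explanation"])     # TLM's reasoning for the score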
MLflow provides tracing and evaluation capabilities for monitoring, reviewing, and debugging AI applications. This guide shows how to apply Cleanlab's TLM to LLM responses recorded with MLflow tracing, so you can systematically log, track, and analyze TLM's trustworthiness evaluations for your LLM interactions.
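At a high level, the workflow is: capture LLM calls as MLflow traces, pull those traces back out, and score each prompt/response pair with TLM. As a rough sketch (this assumes an experiment_id for an experiment that already contains traces; the exact column names returned by mlflow.search_traces may vary by MLflow version):
import mlflow

# Fetch the traces logged to an experiment as a pandas DataFrame
traces_df = mlflow.search_traces(experiment_ids=[experiment_id])

# Each row holds the captured request and response payloads,
# which can then be passed to TLM for trustworthiness scoring
print(traces_df[["request", "response"]].head())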
You can find a notebook version of this guide here.
This guide requires a Cleanlab TLM API key. If you don't have one, you can sign up for a free trial here.
Install dependencies & Set environment variables
To work through this guide, you'll need to install the MLflow, OpenAI, and Cleanlab TLM Python packages:
pip install -q mlflow openai cleanlab-tlm --upgrade
Next, import the dependencies:
import mlflow
import os
import json
import pandas as pd
from rich import print
from openai import OpenAI
from getpass import getpass
API Keys
This guide requires two API keys: an OpenAI API key and a Cleanlab TLM API key. If they are not already set as environment variables, you can set them manually as follows:
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("Enter your OpenAI API key: ")
if not (cleanlab_tlm_api_key := os.getenv("CLEANLAB_TLM_API_KEY")):
    cleanlab_tlm_api_key = getpass("Enter your Cleanlab TLM API key: ")

os.environ["OPENAI_API_KEY"] = openai_api_key
os.environ["CLEANLAB_TLM_API_KEY"] = cleanlab_tlm_api_key
Set Up MLflow Tracking Server and Logging
To manage our experiments, parameters, and results, we'll start a local MLflow Tracking Server, which provides a dedicated UI for monitoring and managing experiments. We'll configure MLflow to connect to this server and then enable autologging for OpenAI so that relevant information from our API calls is captured automatically.
%%bash --bg
# This will start an MLflow server on port 8080, in the background
# Navigate to http://localhost:8080 to see the MLflow UI
mlflow server --host 127.0.0.1 --port 8080
# Set up MLflow tracking server
mlflow.set_tracking_uri("http://localhost:8080")
# Enable logging for OpenAI SDK
mlflow.openai.autolog()
# Set experiment name
mlflow.set_experiment("Eval OpenAI Traces with TLM")
# Get experiment ID
experiment_id = mlflow.get_experiment_by_name("Eval OpenAI Traces with TLM").experiment_id
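With autologging enabled, any call made through the OpenAI client is captured as a trace in this experiment. For example (the model name and prompt here are placeholders, not part of the guide's dataset):
client = OpenAI()

# This call is traced automatically thanks to mlflow.openai.autolog()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of Australia?"}],
)
print(completion.choices[0].message.content)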
