Python SDK
The Python client lets you evaluate test cases, stream evaluation events, and
gate builds from Python-based workflows.
Install
Quick start
from quorum_eval import QuorumClient

client = QuorumClient(api_key="qrm_...")

result = client.evaluate(
    test_cases=[{
        "input": "What is the boiling point of water?",
        "actualOutput": "Water boils at 100 C at sea level.",
        "retrievalContext": [
            "The boiling point of water at standard pressure is 100 C (212 F)."
        ],
    }],
    strategy="auto",
)

print(f"Pass rate: {result.pass_rate:.1%}")
print(f"Cost: ${result.summary.total_cost:.4f}")
Streaming
for event in client.evaluate_stream(test_cases):
    print(event["event"], event["data"])
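The loop above prints each event as it arrives. As a minimal sketch of doing something with the stream, the helper below tallies events by type. It assumes only what the loop shows, namely that each event is a dict with `event` and `data` keys. The `tally_events` helper and the event names in the sample data are hypothetical and not part of the SDK.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def tally_events(events: Iterable[Dict[str, Any]]) -> Counter:
    """Count streamed events by the value of their "event" field."""
    counts: Counter = Counter()
    for event in events:
        counts[event["event"]] += 1
    return counts


# Hypothetical sample events; real event names come from the service.
sample = [
    {"event": "progress", "data": {"done": 1}},
    {"event": "progress", "data": {"done": 2}},
    {"event": "complete", "data": {}},
]

counts = tally_events(sample)
print(counts["progress"], counts["complete"])
```

Because the helper accepts any iterable, it could presumably be passed `client.evaluate_stream(test_cases)` directly in place of `sample`.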
CI gate
import os
import sys

from quorum_eval import QuorumClient

client = QuorumClient(api_key=os.environ["QUORUM_API_KEY"])
result = client.evaluate(test_cases, strategy="auto")

if result.pass_rate < 0.80:
    print(f"FAIL: {result.pass_rate:.1%} below threshold")
    sys.exit(1)
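The gate above hard-codes the 0.80 threshold. A minimal sketch of the same decision as a small, testable function follows. The `gate_exit_code` helper is hypothetical and not part of the SDK; it only mirrors the comparison shown above, where a pass rate strictly below the threshold fails the build.

```python
def gate_exit_code(pass_rate: float, threshold: float = 0.80) -> int:
    """Return 0 when pass_rate meets the threshold, 1 when it falls below it."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return 1 if pass_rate < threshold else 0


# Example values only; in CI the first argument would be result.pass_rate.
print(gate_exit_code(0.75))        # below the default threshold
print(gate_exit_code(0.92, 0.90))  # meets a stricter threshold
```

In a CI script, the return value could be passed to `sys.exit()` so that the threshold is set in one place.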
Calibration benchmark
benchmark = client.run_benchmark()
stats = benchmark.statistics

print(f"Council accuracy: {stats.council.accuracy:.0%}")
print(f"Single judge accuracy: {stats.single_openai.accuracy:.0%}")
# The "+" format flag prints the sign for both positive and negative deltas.
print(f"Delta: {stats.council_vs_single_openai_delta:+.0%}")
Strategies
| Strategy | When | Cost per case |
|---|---|---|
| council | High-stakes cases | ~$0.0035 |
| hybrid | Medium-risk cases | ~$0.0002 |
| single | Low-risk factoids | ~$0.00005 |
| auto | Adaptive routing | variable |
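
The per-case figures in the table can be turned into a rough budget estimate. The sketch below multiplies the approximate cost per case by the number of test cases. The `estimate_cost` helper is hypothetical and not part of the SDK, the figures are the approximate values from the table, and `auto` is left out because its cost is variable.

```python
# Approximate cost per case in USD, taken from the strategy table.
COST_PER_CASE = {
    "council": 0.0035,
    "hybrid": 0.0002,
    "single": 0.00005,
}


def estimate_cost(strategy: str, num_cases: int) -> float:
    """Estimate the total cost in USD of running num_cases with one strategy."""
    if strategy not in COST_PER_CASE:
        raise ValueError(f"no fixed per-case cost for strategy {strategy!r}")
    if num_cases < 0:
        raise ValueError("num_cases must be non-negative")
    return COST_PER_CASE[strategy] * num_cases


for name in COST_PER_CASE:
    print(f"{name}: ~${estimate_cost(name, 1000):.2f} per 1,000 cases")
```

At these rates, 1,000 cases cost roughly $3.50 with `council`, $0.20 with `hybrid`, and $0.05 with `single`.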