Improving LLM Accuracy using Reranking

Last Updated : 14 Feb, 2026

Reranking is a post-processing technique that improves the accuracy and relevance of outputs generated by Large Language Models (LLMs). Instead of relying on a single model’s output, multiple candidate responses are generated from different models (like OpenAI, Gemini or Mistral) and a reranker model such as Cohere Rerank reorders them based on semantic alignment with the query. This helps ensure the final response is the most contextually accurate and meaningful one.

  • Improve LLM output accuracy by ranking multiple candidate responses.
  • Generate responses from multiple LLMs and rerank them using semantic relevance.
  • Tools used include OpenAI GPT-4o-mini, Google Gemini 2.0 Flash and Cohere Rerank.
  • The process involves candidate generation, filtering invalid responses, reranking via Cohere and displaying ranked outputs.
  • The outcome is a structured evaluation that selects the most relevant and high-quality answer among all candidates.

Implementation

Workflow: user_query → generate candidate responses from multiple LLMs → filter out failed responses → rerank with Cohere → display ranked outputs.

Let's walk through the implementation step by step.

Step 1: Install Dependencies

We need to install the required packages:

Python
!pip install -q --upgrade openai google-generativeai cohere

Step 2: Import Libraries

We will import the necessary libraries: the OpenAI client, the Gemini SDK, Cohere, textwrap.shorten for truncating previews and os for environment variables.

Python
from openai import OpenAI
import google.generativeai as genai
import cohere
from textwrap import shorten
import os

Step 3: Initialize API Keys and Clients

We need to set the API keys. Here we are using OpenAI, Gemini and Cohere; we can add more models as well to get broader and more refined results.

Python
import os
import cohere
import google.generativeai as genai
from openai import OpenAI

# Paste your keys here or set them in your environment beforehand
os.environ["COHERE_API_KEY"] = ""
os.environ["OPENAI_API_KEY"] = ""
os.environ["GEMINI_API_KEY"] = ""

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
co = cohere.Client(api_key=os.getenv("COHERE_API_KEY"))

Step 4: Define Query Prompt

  • The query asks all models to analyze the societal, economic and philosophical impact of AGI development under different paradigms.
  • This complex and open-ended question helps test reasoning and coherence.
Python
query = """Compare and contrast the long-term societal, economic, 
and philosophical consequences of Artificial General Intelligence (AGI) 
emerging under three paradigms:
1. Open-source decentralized AGI driven by global collaboration.
2. Corporate-controlled proprietary AGI systems optimized for profit.
3. Government-regulated AGI programs under national sovereignty.
Discuss governance, inequality, innovation speed, and global cooperation.
Conclude which path is most sustainable for humanity and why."""

Step 5: Define Model Response Functions

  • safe_generate_openai(): Fetches a response from OpenAI GPT-4o-mini.
  • safe_generate_gemini(): Generates a response using Google Gemini 2.0 Flash.
  • dummy_response(): Simulated fallback for demonstration or testing.
Python
def safe_generate_openai(prompt):
    try:
        res = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return res.choices[0].message.content.strip()
    except Exception as e:
        return f"Error fetching from OpenAI: {e}"


def safe_generate_gemini(prompt):
    try:
        model = genai.GenerativeModel("gemini-2.0-flash")
        res = model.generate_content(prompt)
        return res.text.strip()
    except Exception as e:
        return f"Error fetching from Gemini: {e}"


def dummy_response(prompt):
    return f"Simulated response for: {prompt[:300]}..."

Step 6: Collect Responses from All Models

  • Runs all three model functions for the same prompt.
  • Stores their responses as (model_name, output) pairs in a list for reranking.
Python
candidates = [
    ("OpenAI GPT-4o-mini", safe_generate_openai(query)),
    ("Google Gemini 2.5 Flash", safe_generate_gemini(query)),
    ("Dummy Baseline", dummy_response(query))
]

Step 7: Debugging and Validating Candidates

  • Prints a short preview of each model’s response.
  • Skips any invalid or failed model outputs (e.g., API errors).
  • Prepares a list of valid documents for reranking.
Python
print("=== Debug: Raw candidates ===\n")

valid_docs, model_names = [], []

for idx, (model, text) in enumerate(candidates):
    print(f"{idx}: <class 'str'> -> ({model}, {shorten(text, width=120)})\n")
    if text and not text.lower().startswith("error"):
        valid_docs.append(text)
        model_names.append(model)
    else:
        print(f"Skipping invalid or empty document from {model}\n")

print(f"Valid documents for reranking: {len(valid_docs)}\n")

if not valid_docs:
    raise ValueError("No valid text found for reranking.")

Output:

[Screenshot: raw candidate previews and the count of valid documents for reranking]

Step 8: Apply Cohere Reranking

  • Uses Cohere’s rerank-english-v3.0 model.
  • Assigns a relevance score to each document based on how well it answers the query.
  • Orders them by descending relevance.
Python
rerank_results = co.rerank(
    model="rerank-english-v3.0",
    query=query,
    documents=valid_docs,
    top_n=len(valid_docs)
)
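
Before formatting the output in the next step, it can help to inspect the raw structure the reranker returns. A minimal sketch, assuming the rerank_results object from above; each result carries the index of the input document and a normalized relevance score:

Python
# Each entry maps back to an input document via .index and carries a
# relevance score (higher means a better match for the query).
for r in rerank_results.results:
    print(r.index, round(r.relevance_score, 3))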

Step 9: Display Final Ranked Results

  • Prints model names, scores and text snippets in order of relevance.
  • Displays only the top portion of each response for clarity.
Python
print("=== Final Ranked Results by Cohere Reranker ===\n")

scores, labels = [], []  # collected for optional downstream use (e.g., plotting)

for rank, result in enumerate(rerank_results.results, start=1):
    score = round(result.relevance_score, 3)
    model_name = model_names[result.index]
    doc_text = getattr(result.document, "text",
                       None) or valid_docs[result.index]
    snippet = shorten(doc_text, width=400)

    print(f"Rank {rank} | {model_name} | Score: {score}\n{snippet}\n")

    scores.append(score)
    labels.append(model_name)

Output:

[Screenshot: final ranked results showing model names, relevance scores and response snippets]
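
Because the results are already ordered by descending relevance, selecting the final answer programmatically just means reading the first entry. A minimal sketch, reusing the rerank_results, model_names and valid_docs variables from the steps above:

Python
# The reranker sorts results by descending relevance_score,
# so the first entry is the best candidate overall.
best = rerank_results.results[0]
best_model = model_names[best.index]
best_answer = valid_docs[best.index]

print(f"Selected model: {best_model} (score: {best.relevance_score:.3f})")
print(best_answer)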

Advantages

  • Improved Accuracy: Selects the most contextually relevant and coherent answer among multiple LLMs.
  • Cross-Model Validation: Ensures consistency and reduces hallucinations through comparative ranking.
  • Scalable Approach: Easily expandable to include other models such as Claude or Mistral (a sketch of adding a Mistral candidate follows this list).
  • Automation Friendly: Enables automated quality filtering in production pipelines.
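
As a minimal sketch of that extensibility, the snippet below adds a Mistral candidate alongside the existing ones. It assumes the mistralai v1 SDK and a MISTRAL_API_KEY environment variable; the model name mistral-small-latest is an illustrative choice, not part of the pipeline above.

Python
import os
from mistralai import Mistral  # assumes the v1 mistralai SDK is installed

def safe_generate_mistral(prompt):
    # Mirrors the safe_generate_* helpers from Step 5.
    try:
        client = Mistral(api_key=os.getenv("MISTRAL_API_KEY"))  # assumed key setup
        res = client.chat.complete(
            model="mistral-small-latest",  # illustrative model choice
            messages=[{"role": "user", "content": prompt}]
        )
        return res.choices[0].message.content.strip()
    except Exception as e:
        return f"Error fetching from Mistral: {e}"

# The new model simply becomes one more (model_name, output) pair:
candidates.append(("Mistral Small", safe_generate_mistral(query)))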

Limitations

  • Dependency on API Availability: Requires active API keys and stable connectivity.
  • Latency: Multiple model calls increase response time (a parallel-generation sketch follows this list).
  • Cost: Running several LLMs per query can raise token usage and cost.
  • Subjective Relevance: Reranker’s score may not always align with human judgment in creative tasks.
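
One common mitigation for the latency point above is to issue the candidate-generation calls concurrently rather than sequentially. The sketch below uses Python's standard concurrent.futures module and reuses the safe_generate_* helpers from Step 5; it is illustrative, not part of the original pipeline.

Python
from concurrent.futures import ThreadPoolExecutor

generators = [
    ("OpenAI GPT-4o-mini", safe_generate_openai),
    ("Google Gemini 2.0 Flash", safe_generate_gemini),
    ("Dummy Baseline", dummy_response),
]

# Run all model calls in parallel threads; wall time is roughly
# the slowest single call instead of the sum of all calls.
with ThreadPoolExecutor(max_workers=len(generators)) as pool:
    futures = [(name, pool.submit(fn, query)) for name, fn in generators]
    candidates = [(name, fut.result()) for name, fut in futures]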

