Improving LLM Accuracy using Reranking

Last Updated : 14 Feb, 2026

Reranking is a post-processing technique that improves the accuracy and relevance of outputs generated by Large Language Models (LLMs). Instead of relying on a single model’s output, multiple candidate responses are generated from different models (like OpenAI, Gemini or Mistral) and a reranker model such as Cohere Rerank reorders them based on semantic alignment with the query. This helps ensure the final response is the most contextually accurate and meaningful one.

  • Improve LLM output accuracy by ranking multiple candidate responses.
  • Generate responses from multiple LLMs and rerank them using semantic relevance.
  • Tools used include OpenAI GPT-4o-mini, Google Gemini 2.0 Flash and Cohere Rerank.
  • The process involves candidate generation, filtering invalid responses, reranking via Cohere and displaying ranked outputs.
  • The outcome is a structured evaluation that selects the most relevant and high-quality answer among all candidates.

Implementation

Workflow: user_query → generate candidate responses from multiple LLMs → filter out failed responses → rerank with Cohere → display ranked outputs.

Let's walk through the implementation step by step.

Step 1: Install Dependencies

We need to install the required packages:

Python
!pip install -q --upgrade openai google-generativeai cohere

Step 2: Import Libraries

We will import the necessary libraries: the OpenAI client, the Gemini SDK, Cohere, textwrap.shorten for truncating previews and os for environment variables.

Python
from openai import OpenAI
import google.generativeai as genai
import cohere
from textwrap import shorten
import os

Step 3: Initialize API Keys and Clients

We need to set the API keys. Here we are using OpenAI, Gemini and Cohere; we can add more models as well to get broader and more refined results.

Python
import os
import cohere
import google.generativeai as genai
from openai import OpenAI

# Paste your keys here or set them in your environment beforehand
os.environ["COHERE_API_KEY"] = ""
os.environ["OPENAI_API_KEY"] = ""
os.environ["GEMINI_API_KEY"] = ""

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
co = cohere.Client(api_key=os.getenv("COHERE_API_KEY"))

Step 4: Define Query Prompt

  • The query asks all models to analyze the societal, economic and philosophical impact of AGI development under different paradigms.
  • This complex and open-ended question helps test reasoning and coherence.
Python
query = """Compare and contrast the long-term societal, economic, 
and philosophical consequences of Artificial General Intelligence (AGI) 
emerging under three paradigms:
1. Open-source decentralized AGI driven by global collaboration.
2. Corporate-controlled proprietary AGI systems optimized for profit.
3. Government-regulated AGI programs under national sovereignty.
Discuss governance, inequality, innovation speed, and global cooperation.
Conclude which path is most sustainable for humanity and why."""

Step 5: Define Model Response Functions

  • safe_generate_openai(): Fetches a response from OpenAI GPT-4o-mini.
  • safe_generate_gemini(): Generates a response using Google Gemini 2.0 Flash.
  • dummy_response(): Simulated fallback for demonstration or testing.
Python
def safe_generate_openai(prompt):
    try:
        res = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return res.choices[0].message.content.strip()
    except Exception as e:
        return f"Error fetching from OpenAI: {e}"


def safe_generate_gemini(prompt):
    try:
        model = genai.GenerativeModel("gemini-2.0-flash")
        res = model.generate_content(prompt)
        return res.text.strip()
    except Exception as e:
        return f"Error fetching from Gemini: {e}"


def dummy_response(prompt):
    return f"Simulated response for: {prompt[:300]}..."

Step 6: Collect Responses from All Models

  • Runs all three model functions for the same prompt.
  • Stores their responses as (model_name, output) pairs in a list for reranking.
Python
candidates = [
    ("OpenAI GPT-4o-mini", safe_generate_openai(query)),
    ("Google Gemini 2.5 Flash", safe_generate_gemini(query)),
    ("Dummy Baseline", dummy_response(query))
]

Step 7: Debugging and Validating Candidates

  • Prints a short preview of each model’s response.
  • Skips any invalid or failed model outputs (e.g., API errors).
  • Prepares a list of valid documents for reranking.
Python
print("=== Debug: Raw candidates ===\n")

valid_docs, model_names = [], []

for idx, (model, text) in enumerate(candidates):
    print(f"{idx}: <class 'str'> -> ({model}, {shorten(text, width=120)})\n")
    if text and not text.lower().startswith("error"):
        valid_docs.append(text)
        model_names.append(model)
    else:
        print(f"Skipping invalid or empty document from {model}\n")

print(f"Valid documents for reranking: {len(valid_docs)}\n")

if not valid_docs:
    raise ValueError("No valid text found for reranking.")

Output:

[Screenshot: raw candidate previews and the count of valid documents for reranking]

Step 8: Apply Cohere Reranking

  • Uses Cohere’s rerank-english-v3.0 model.
  • Assigns a relevance score to each document based on how well it answers the query.
  • Orders them by descending relevance.
Python
rerank_results = co.rerank(
    model="rerank-english-v3.0",
    query=query,
    documents=valid_docs,
    top_n=len(valid_docs)
)
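
Before formatting the output in the next step, it can help to inspect the raw structure the reranker returns. A minimal sketch, assuming the rerank_results object from above; each result carries the index of the input document and a normalized relevance score:

Python
# Each entry maps back to an input document via .index and carries a
# relevance score (higher means a better match for the query).
for r in rerank_results.results:
    print(r.index, round(r.relevance_score, 3))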

Step 9: Display Final Ranked Results

  • Prints model names, scores and text snippets in order of relevance.
  • Displays only the top portion of each response for clarity.
Python
print("=== Final Ranked Results by Cohere Reranker ===\n")

scores, labels = [], []  # collected for optional downstream use (e.g., plotting)

for rank, result in enumerate(rerank_results.results, start=1):
    score = round(result.relevance_score, 3)
    model_name = model_names[result.index]
    doc_text = getattr(result.document, "text",
                       None) or valid_docs[result.index]
    snippet = shorten(doc_text, width=400)

    print(f"Rank {rank} | {model_name} | Score: {score}\n{snippet}\n")

    scores.append(score)
    labels.append(model_name)

Output:

[Screenshot: final ranked results showing model names, relevance scores and response snippets]
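
Because the results are already ordered by descending relevance, selecting the final answer programmatically just means reading the first entry. A minimal sketch, reusing the rerank_results, model_names and valid_docs variables from the steps above:

Python
# The reranker sorts results by descending relevance_score,
# so the first entry is the best candidate overall.
best = rerank_results.results[0]
best_model = model_names[best.index]
best_answer = valid_docs[best.index]

print(f"Selected model: {best_model} (score: {best.relevance_score:.3f})")
print(best_answer)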

Advantages

  • Improved Accuracy: Selects the most contextually relevant and coherent answer among multiple LLMs.
  • Cross-Model Validation: Ensures consistency and reduces hallucinations through comparative ranking.
  • Scalable Approach: Easily expandable to include other models such as Claude or Mistral (a sketch of adding a Mistral candidate follows this list).
  • Automation Friendly: Enables automated quality filtering in production pipelines.
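
As a minimal sketch of that extensibility, the snippet below adds a Mistral candidate alongside the existing ones. It assumes the mistralai v1 SDK and a MISTRAL_API_KEY environment variable; the model name mistral-small-latest is an illustrative choice, not part of the pipeline above.

Python
import os
from mistralai import Mistral  # assumes the v1 mistralai SDK is installed

def safe_generate_mistral(prompt):
    # Mirrors the safe_generate_* helpers from Step 5.
    try:
        client = Mistral(api_key=os.getenv("MISTRAL_API_KEY"))  # assumed key setup
        res = client.chat.complete(
            model="mistral-small-latest",  # illustrative model choice
            messages=[{"role": "user", "content": prompt}]
        )
        return res.choices[0].message.content.strip()
    except Exception as e:
        return f"Error fetching from Mistral: {e}"

# The new model simply becomes one more (model_name, output) pair:
candidates.append(("Mistral Small", safe_generate_mistral(query)))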

Limitations

  • Dependency on API Availability: Requires active API keys and stable connectivity.
  • Latency: Multiple model calls increase response time (a parallel-generation sketch follows this list).
  • Cost: Running several LLMs per query can raise token usage and cost.
  • Subjective Relevance: Reranker’s score may not always align with human judgment in creative tasks.
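
One common mitigation for the latency point above is to issue the candidate-generation calls concurrently rather than sequentially. The sketch below uses Python's standard concurrent.futures module and reuses the safe_generate_* helpers from Step 5; it is illustrative, not part of the original pipeline.

Python
from concurrent.futures import ThreadPoolExecutor

generators = [
    ("OpenAI GPT-4o-mini", safe_generate_openai),
    ("Google Gemini 2.0 Flash", safe_generate_gemini),
    ("Dummy Baseline", dummy_response),
]

# Run all model calls in parallel threads; wall time is roughly
# the slowest single call instead of the sum of all calls.
with ThreadPoolExecutor(max_workers=len(generators)) as pool:
    futures = [(name, pool.submit(fn, query)) for name, fn in generators]
    candidates = [(name, fut.result()) for name, fut in futures]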

