feat(guardrails): add before_final_action hook for workflow and chat final commits #534

Open
nitinawari wants to merge 1 commit into GenAI-Security-Project:main from nitinawari:feat/Guardrail-framework

nitinawari (Contributor) commented Jun 26, 2026

Summary

Adds the third guardrail hook point for FinBot Labs — before_final_action — so defenders can inspect and score agent outcomes before they are committed. This completes the hook trio required for Deliverable A: before tool, after tool, and before agent final actions.
The hook fires when:

  • Workflow agents call complete_task (including forced-failure paths: stall, iteration error, exhausted iterations)
  • Chat assistants save the final assistant reply

Guardrails remain passive: a block verdict is logged and can be scored via GuardrailPreventionDetector, but it does not stop execution.
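
The passive behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `commit_with_passive_guardrail`, the `check` and `commit` callables, the `"allow"`/`"block"` verdict strings, and the event dict shape are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


def commit_with_passive_guardrail(
    payload: Dict[str, str],
    check: Callable[[Dict[str, str]], str],   # hypothetical: returns "allow" or "block"
    commit: Callable[[Dict[str, str]], str],  # hypothetical: performs the final commit
    events: List[Dict[str, str]],
) -> str:
    """Record the guardrail verdict, then commit regardless of it."""
    verdict = check(payload)
    # The verdict is recorded so that a detector can score it later.
    events.append({"hook_kind": "before_final_action", "verdict": verdict})
    # Passive mode: a "block" verdict does not short-circuit the commit.
    return commit(payload)
```

In this sketch the only observable effect of a block verdict is the recorded event; an enforcing variant would return early instead of calling `commit`.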

What changed

Guardrail core

  • finbot/guardrails/schemas.py — Added HookKind.before_final_action; extended HookEnvelope with agent_name, task_status, task_summary
  • finbot/guardrails/service.py — invoke() accepts and emits final-action fields on webhook payloads and agent.guardrail.* events
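
To make the schema change above concrete, here is a rough sketch of what the extended envelope might look like. It is an assumption-heavy illustration: the real `schemas.py` may use a different base class (for example Pydantic) and different field types, and `to_payload` is a hypothetical helper that is not part of this PR. Only the names `HookKind`, `HookEnvelope`, `agent_name`, `task_status`, `task_summary`, `hook_kind`, and `tool_name` come from the description.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class HookKind(str, Enum):
    # The three hook points named in this PR.
    before_tool = "before_tool"
    after_tool = "after_tool"
    before_final_action = "before_final_action"


@dataclass
class HookEnvelope:
    hook_kind: HookKind
    tool_name: str
    # Final-action fields; assumed optional here so that
    # before_tool / after_tool envelopes can leave them unset.
    agent_name: Optional[str] = None
    task_status: Optional[str] = None
    task_summary: Optional[str] = None


def to_payload(envelope: HookEnvelope) -> Dict[str, Any]:
    """Hypothetical helper: JSON-friendly dict without unset fields."""
    raw = asdict(envelope)
    raw["hook_kind"] = envelope.hook_kind.value
    return {key: value for key, value in raw.items() if value is not None}
```

Keeping the new fields optional is one way to avoid changing the payloads of the two existing hooks; whether the PR does this is not stated in the description.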

Agent integration

  • finbot/agents/base.py

    • _invoke_before_final_action_guardrail() helper for complete_task
    • Tool loop routes complete_task → before_final_action (not before_tool)
    • after_tool fires on complete_task before log_task_completion
    • Forced completion paths (stall / error / max iterations) invoke the hook before complete_task
  • finbot/agents/chat.py

    • _invoke_before_final_action_guardrail() for final chat replies (tool_name: chat_response)
    • Hook runs before _save_message("assistant", ...)
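
The ordering described in the two files above can be illustrated with a small trace-based sketch. Everything here is a stand-in: `pre_hook_for`, `run_tool_call`, `stream_chat_reply`, and the `trace` list are hypothetical, and the strings appended to the trace only mark where the real hook calls, `log_task_completion`, and `_save_message` would sit.

```python
from typing import Callable, Iterable, List

FINAL_TOOL = "complete_task"


def pre_hook_for(tool_name: str) -> str:
    """complete_task is routed to before_final_action, other tools to before_tool."""
    return "before_final_action" if tool_name == FINAL_TOOL else "before_tool"


def run_tool_call(tool_name: str, execute: Callable[[], str], trace: List[str]) -> str:
    """Workflow side: pre-hook, tool, after_tool, then completion logging."""
    trace.append(f"hook:{pre_hook_for(tool_name)}")
    result = execute()
    trace.append(f"tool:{tool_name}")
    trace.append("hook:after_tool")
    if tool_name == FINAL_TOOL:
        # after_tool has already fired by the time completion is logged.
        trace.append("log_task_completion")
    return result


def stream_chat_reply(tokens: Iterable[str], trace: List[str]) -> str:
    """Chat side: tokens stream first, the hook runs before the save."""
    reply = ""
    for token in tokens:
        trace.append("stream")
        reply += token
    trace.append("hook:before_final_action")  # tool_name: chat_response
    trace.append("_save_message")
    return reply
```

Running `run_tool_call("complete_task", ...)` against an empty trace yields the hook, the tool, `after_tool`, and the completion log in that order, which is the sequence the integration tests are described as checking.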

Labs configuration & UI

  • finbot/core/data/models.py — Default hooks include before_final_action: true
  • finbot/core/data/repositories.py — VALID_HOOK_KINDS includes before_final_action
  • finbot/apps/labs/templates/pages/guardrails.html — Checkbox for "Before Final Action"; "Test before_final_action" button
  • finbot/apps/labs/routes/guardrails.py — POST /api/v1/guardrails/test/before-final-action

CTF detection

  • finbot/ctf/detectors/implementations/guardrail_prevention.py
    • Supports required_hook_kind: before_final_action
    • Optional required_task_status filter
    • Final-action evidence: agent_name, task_status, task_summary
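
As a rough illustration of the detector options listed above, the matching logic might look something like this. The function `find_prevention_evidence` and the flat event dict with a `verdict` key are assumptions made for the example; the actual `GuardrailPreventionDetector` interface is not shown in this description.

```python
from typing import Any, Dict, Iterable, Optional

EVIDENCE_FIELDS = ("agent_name", "task_status", "task_summary")


def find_prevention_evidence(
    events: Iterable[Dict[str, Any]],
    required_hook_kind: str,
    required_task_status: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Return evidence from the first matching block event, or None."""
    for event in events:
        if event.get("verdict") != "block":
            continue
        if event.get("hook_kind") != required_hook_kind:
            continue
        # The task-status filter is optional and only applied when set.
        if (
            required_task_status is not None
            and event.get("task_status") != required_task_status
        ):
            continue
        return {field: event.get(field) for field in EVIDENCE_FIELDS}
    return None
```

With `required_task_status` left unset, any block verdict on the required hook kind counts; setting it narrows the match to, for example, only blocked successful completions.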

Tests

  • tests/unit/labs/test_guardrail_final_action.py — Integration tests for base agent tool loop + chat stream_response ordering
  • tests/unit/labs/test_guardrail_service.py — Webhook payload test for before_final_action
  • tests/unit/labs/test_guardrail_detector.py — Detector tests for final-action block + required_task_status
  • tests/unit/labs/test_guardrail_config.py — Default hooks assertion updated

Test plan

  • uv run pytest tests/unit/labs/test_guardrail_final_action.py -v (7 tests)
  • uv run pytest tests/unit/labs/ -v
  • Labs → configure webhook → enable Before Final Action → Test before_final_action → webhook receives payload
  • Run a workflow until an agent calls complete_task → Guardrail Activity shows hook_kind: before_final_action
  • Send a chat message → Activity shows before_final_action with tool_name: chat_response
  • Existing Guardrail 101 / Carte Noire (before_tool) still work unchanged

Notes

  • Passive only — enforcement (actually blocking on block verdict) is out of scope for this PR; discuss with mentor separately
  • Existing Labs configs — saved hooks_json without before_final_action will have the hook disabled until users re-save config in Labs UI
  • Chat streaming — tokens may reach the client before the final-action hook; hook gates DB commit, not first streamed token
  • No new Labs challenge YAML in this PR — paired blue-path challenge can follow in a separate PR
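
The note about existing Labs configs can be illustrated with a small lookup sketch. It assumes `hooks_json` is a JSON object mapping hook kinds to booleans and that a missing key is treated as disabled; `is_hook_enabled` is a hypothetical helper, and the tuple below only mirrors the hook kinds named in this PR.

```python
import json

# Assumed contents, based on the three hook kinds named in this PR.
VALID_HOOK_KINDS = ("before_tool", "after_tool", "before_final_action")


def is_hook_enabled(hooks_json: str, hook_kind: str) -> bool:
    """Hypothetical lookup: a key missing from a saved config reads as disabled."""
    if hook_kind not in VALID_HOOK_KINDS:
        raise ValueError(f"unknown hook kind: {hook_kind}")
    hooks = json.loads(hooks_json) if hooks_json else {}
    return bool(hooks.get(hook_kind, False))
```

Under these assumptions, a config saved before this PR, such as `{"before_tool": true, "after_tool": true}`, reports `before_final_action` as disabled until it is saved again with the new key.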

GSoC mapping

Week 1-2 (phase 1)

  • Deliverable A: Third hook point — before agent final actions — for workflow agents and chat