feat(async-jobs): async execution with job queue backends#3134

Merged: waleedlatif1 merged 11 commits into staging from fix/async on Feb 4, 2026

waleedlatif1 (Collaborator) commented Feb 4, 2026

Summary

  • Add async job queue system with Redis, database, and trigger.dev backends
  • Jobs auto-cleanup via Redis TTL (24h) and cron retention cleanup
  • Optimized DB indexes for cleanup queries
  • Add createMockRedis() to @sim/testing

Type of Change

  • New feature

Testing

  • Unit tests for Redis backend and shared constants (14 tests)
  • Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)
Cursor Bugbot found 2 potential issues for commit fe401f2
greptile-apps bot (Contributor) commented Feb 4, 2026

Greptile Overview

Greptile Summary

This PR implements a robust async job queue system with three backend options (Trigger.dev, Redis, Database) using an automatic fallback chain. The implementation follows a clean adapter pattern with a unified JobQueueBackend interface.

Key Changes:

  • Added async job queue infrastructure with Redis, Database, and Trigger.dev backends
  • Redis backend uses TTL-based auto-cleanup (48h max lifetime, 24h retention for completed/failed)
  • Database backend properly increments attempts using SQL expressions
  • Unified job status API (/api/jobs/[jobId]) with access control
  • Extended cron cleanup to handle stale async jobs and enforce retention policy
  • Added dedicated async timeout environment variables (90 min default vs 5-50 min sync)
  • Workflow execution routes now support async mode via X-Execution-Mode: async header
  • Fire-and-forget execution for Redis/DB backends; Trigger.dev handles its own workers
  • Comprehensive unit tests (14 tests) and new createMockRedis() test utility
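
The TTL policy in the list above (48h maximum lifetime, 24h retention once a job has completed or failed) can be modelled without a Redis connection. The following is a minimal sketch, not the PR's code: `ttlOnTransition` and both constants are hypothetical names, and the choice to leave the TTL untouched while a job is processing is an assumption.

```typescript
// Hypothetical names and constants; the PR's real identifiers may differ.
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed'

const MAX_LIFETIME_SECONDS = 48 * 60 * 60 // hard cap applied when a job is enqueued
const RETENTION_SECONDS = 24 * 60 * 60 // how long finished jobs stay readable

/**
 * TTL (in seconds) to set on the job key when it enters `status`,
 * or null to leave the current TTL untouched.
 */
function ttlOnTransition(status: JobStatus): number | null {
  switch (status) {
    case 'pending':
      return MAX_LIFETIME_SECONDS
    case 'processing':
      return null // assumption: the 48h cap set at enqueue keeps counting down
    case 'completed':
    case 'failed':
      return RETENTION_SECONDS
  }
}
```

Keeping the policy in one pure function makes it easy to unit test separately from the Redis client calls that would apply it.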

Architecture:
The system uses a singleton pattern for backend selection with fallback logic: Trigger.dev → Redis → Database. For Redis/DB backends, jobs execute inline (fire-and-forget) after being queued. The cleanup cron handles both stale job detection and retention enforcement.
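
As a rough illustration of the fallback chain and the inline-execution rule described above, the selection logic might look like the sketch below. The names `BackendAvailability`, `selectBackendType`, `getBackendType`, and `executesInline` are hypothetical, and how the PR actually detects whether Trigger.dev or Redis is available is not shown in this page, so the two boolean flags are assumptions.

```typescript
// Hypothetical sketch of the Trigger.dev -> Redis -> Database fallback.
type BackendType = 'trigger-dev' | 'redis' | 'database'

interface BackendAvailability {
  triggerDevEnabled: boolean // assumed flag, e.g. derived from environment config
  redisConfigured: boolean // assumed flag, e.g. a Redis URL is present
}

/** Picks the first available backend in priority order. */
function selectBackendType(availability: BackendAvailability): BackendType {
  if (availability.triggerDevEnabled) return 'trigger-dev'
  if (availability.redisConfigured) return 'redis'
  return 'database' // always-available last resort
}

/** Redis and database backends run the job in-process; Trigger.dev uses its own workers. */
function executesInline(type: BackendType): boolean {
  return type !== 'trigger-dev'
}

// Singleton-style cache so the decision is made once per process.
let cachedType: BackendType | null = null

function getBackendType(availability: BackendAvailability): BackendType {
  if (cachedType === null) cachedType = selectBackendType(availability)
  return cachedType
}
```

Because the result is cached, a later change in availability is not picked up until the process restarts, which is one trade-off of the singleton approach.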

Database Changes:
New async_jobs table with composite indexes on (status, started_at) and (status, completed_at) optimized for cleanup queries.
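
To see why those two composite indexes suit the cleanup cron, it helps to write out the two predicates it needs: one filters on status plus `started_at` (stale jobs), the other on status plus `completed_at` (retention). The sketch below expresses them over in-memory rows rather than SQL. The row shape, function names, and both thresholds are assumptions for illustration; the PR's actual cutoffs for the database backend are not stated here.

```typescript
// Hypothetical row shape mirroring the columns named in the index definitions.
interface AsyncJobRow {
  id: string
  status: 'pending' | 'processing' | 'completed' | 'failed'
  startedAt: Date | null
  completedAt: Date | null
}

const STALE_AFTER_MS = 90 * 60 * 1000 // assumed: matches the 90 min async timeout
const RETENTION_MS = 24 * 60 * 60 * 1000 // assumed: same 24h window as the Redis backend

/** Jobs stuck in 'processing' past the stale cutoff: filters on (status, started_at). */
function findStaleJobs(rows: AsyncJobRow[], now: Date): AsyncJobRow[] {
  const cutoff = now.getTime() - STALE_AFTER_MS
  return rows.filter(
    (r) => r.status === 'processing' && r.startedAt !== null && r.startedAt.getTime() < cutoff
  )
}

/** Finished jobs older than the retention window: filters on (status, completed_at). */
function findExpiredJobs(rows: AsyncJobRow[], now: Date): AsyncJobRow[] {
  const cutoff = now.getTime() - RETENTION_MS
  return rows.filter(
    (r) =>
      (r.status === 'completed' || r.status === 'failed') &&
      r.completedAt !== null &&
      r.completedAt.getTime() < cutoff
  )
}
```

Each predicate is an equality on `status` followed by a range on one timestamp, which is the access pattern a composite index with `status` as the leading column serves well.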

Confidence Score: 4.5/5

  • This PR is safe to merge with high confidence - well-architected async job system with proper cleanup
  • Score reflects solid architecture, comprehensive tests, and proper retry handling. Previous concerns about Redis attempts counter were addressed (uses hincrby). Minor deduction for complexity of inline execution pattern and potential race conditions.
  • Pay attention to apps/sim/app/api/workflows/[id]/execute/route.ts (complex async integration) and the inline fire-and-forget execution pattern in Redis/DB backends
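
The inline fire-and-forget pattern flagged above can be sketched as follows. This is an in-memory stand-in, not the PR's Redis or database backend: the method names `startJob`, `completeJob`, and `markJobFailed` come from the sequence diagram, while their signatures, the `InMemoryBackend` class, and `executeInline` are assumptions.

```typescript
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed'

interface Job {
  id: string
  status: JobStatus
  attempts: number
  output?: unknown
  error?: string
}

// Hypothetical in-memory backend used only to illustrate the job lifecycle.
class InMemoryBackend {
  private jobs = new Map<string, Job>()

  enqueue(id: string): void {
    this.jobs.set(id, { id, status: 'pending', attempts: 0 })
  }

  getJob(id: string): Job | undefined {
    return this.jobs.get(id)
  }

  startJob(id: string): void {
    const job = this.mustGet(id)
    job.status = 'processing'
    job.attempts += 1 // the real backends increment atomically (HINCRBY / SQL expression)
  }

  completeJob(id: string, output: unknown): void {
    const job = this.mustGet(id)
    job.status = 'completed'
    job.output = output
  }

  markJobFailed(id: string, error: string): void {
    const job = this.mustGet(id)
    job.status = 'failed'
    job.error = error
  }

  private mustGet(id: string): Job {
    const job = this.jobs.get(id)
    if (!job) throw new Error(`unknown job ${id}`)
    return job
  }
}

/**
 * Runs `work` and records the outcome on the backend. The returned promise
 * never rejects, so a caller can safely discard it with `void`.
 */
function executeInline(
  backend: InMemoryBackend,
  jobId: string,
  work: () => Promise<unknown>
): Promise<void> {
  const run = async (): Promise<void> => {
    try {
      backend.startJob(jobId)
      backend.completeJob(jobId, await work())
    } catch (err) {
      backend.markJobFailed(jobId, err instanceof Error ? err.message : String(err))
    }
  }
  // Last-resort guard: bookkeeping itself failed (for example, unknown job id).
  return run().catch((err) => console.error('job bookkeeping failed', err))
}
```

A route handler would call `void executeInline(backend, jobId, work)` after sending the 202 response. The risk the review points to is that such work lives only in the request-handling process, so a crash or restart mid-job leaves it in `processing` until the cleanup cron marks it stale.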

Important Files Changed

Filename Overview
apps/sim/lib/core/async-jobs/config.ts Implements backend selection logic with fallback chain: trigger.dev → redis → database
apps/sim/lib/core/async-jobs/backends/redis.ts Redis backend implementation with TTL-based cleanup and job state management
apps/sim/lib/core/async-jobs/backends/database.ts Database backend using Drizzle ORM, properly increments attempts using SQL expression
packages/db/schema.ts Added async_jobs table with composite indexes for efficient cleanup queries
apps/sim/app/api/cron/cleanup-stale-executions/route.ts Extended cron to cleanup stale async jobs and enforce retention policy
apps/sim/app/api/workflows/[id]/execute/route.ts Integrated async execution mode with job queue and inline fire-and-forget execution

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as API Route<br/>/api/workflows/[id]/execute
    participant Config as Job Queue Config
    participant Backend as Job Backend<br/>(Redis/DB/Trigger.dev)
    participant Worker as Background Worker<br/>(executeWorkflowJob)
    participant Executor as Workflow Executor

    Client->>API: POST with X-Execution-Mode: async
    API->>Config: getJobQueue()
    Config->>Config: Determine backend type<br/>(trigger.dev → redis → database)
    Config-->>API: Return backend instance
    
    API->>Backend: enqueue('workflow-execution', payload)
    Backend->>Backend: Generate jobId (run_xxxx)
    Backend->>Backend: Store job with status=pending
    alt Redis Backend
        Backend->>Backend: Set TTL=48h (max lifetime)
    end
    Backend-->>API: Return jobId
    
    API->>Client: 202 Accepted<br/>{jobId, statusUrl, async: true}
    
    alt shouldExecuteInline() == true (Redis/DB)
        API->>Worker: Fire-and-forget execution
        Worker->>Backend: startJob(jobId)
        Backend->>Backend: Set status=processing<br/>Increment attempts
        Worker->>Executor: executeWorkflowCore()
        Executor-->>Worker: Execution result
        
        alt Success
            Worker->>Backend: completeJob(jobId, output)
            Backend->>Backend: Set status=completed<br/>Store output
            alt Redis Backend
                Backend->>Backend: Set TTL=24h (retention)
            end
        else Failure
            Worker->>Backend: markJobFailed(jobId, error)
            Backend->>Backend: Set status=failed<br/>Store error
            alt Redis Backend
                Backend->>Backend: Set TTL=24h (retention)
            end
        end
    else Trigger.dev Backend
        Note over Backend: Trigger.dev handles execution<br/>via its own worker system
    end
    
    Client->>API: GET /api/jobs/{jobId}
    API->>Backend: getJob(jobId)
    Backend-->>API: Job with status & output
    API-->>Client: Job status response
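
From the client's side, the flow in the diagram reduces to one POST followed by polling. The sketch below injects the HTTP function so it can run against a fake. The header name, the 202 status, and the `jobId`/`statusUrl` fields come from the diagram; the `HttpClient` type, `runWorkflowAsync`, the shape of the job status body, and the polling parameters are assumptions, and authentication is omitted.

```typescript
// Minimal HTTP abstraction so the sketch does not depend on a specific fetch implementation.
interface HttpResponse {
  status: number
  json(): Promise<any>
}

type HttpClient = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<HttpResponse>

interface JobResult {
  status: 'pending' | 'processing' | 'completed' | 'failed'
  output?: unknown
  error?: string
}

/** Submits a workflow in async mode, then polls the status URL until the job is terminal. */
async function runWorkflowAsync(
  http: HttpClient,
  workflowId: string,
  input: unknown,
  pollIntervalMs = 1000,
  maxPolls = 60
): Promise<JobResult> {
  const res = await http(`/api/workflows/${workflowId}/execute`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Execution-Mode': 'async' },
    body: JSON.stringify(input),
  })
  if (res.status !== 202) throw new Error(`expected 202 Accepted, got ${res.status}`)

  const { statusUrl } = await res.json()
  for (let i = 0; i < maxPolls; i++) {
    const job: JobResult = await (await http(statusUrl)).json()
    if (job.status === 'completed' || job.status === 'failed') return job
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs))
  }
  throw new Error(`job did not finish after ${maxPolls} polls`)
}
```

In a browser or recent Node.js runtime, the global `fetch` can be passed as `http` with a thin wrapper that prepends the base URL.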
greptile-apps bot (Contributor) left a comment

9 files reviewed, 1 comment

waleedlatif1 (Collaborator, Author) commented:

@cursor review

waleedlatif1 (Collaborator, Author) commented:

@greptile

greptile-apps bot (Contributor) left a comment

6 files reviewed, 1 comment

waleedlatif1 and others added 3 commits February 4, 2026 13:37
Resolve conflict in env.ts - keep staging's sync timeout values (3000)
while adding async timeout variants (5400)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

waleedlatif1 (Collaborator, Author) commented:

@cursor review

waleedlatif1 (Collaborator, Author) commented:

@greptile

greptile-apps bot (Contributor) left a comment

8 files reviewed, no comments

waleedlatif1 (Collaborator, Author) commented:

@cursor review

waleedlatif1 (Collaborator, Author) commented:

@greptile

waleedlatif1 (Collaborator, Author) commented:

@cursor review

greptile-apps bot (Contributor) left a comment

6 files reviewed, no comments

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

waleedlatif1 merged commit 8d846c5 into staging Feb 4, 2026
11 checks passed
waleedlatif1 deleted the fix/async branch February 4, 2026 22:52