Skip to content

PYTHON-5903 : Retry attempts should share a single stable operation_id#2901

Open
shrey-mongo wants to merge 1 commit into
mongodb:masterfrom
shrey-mongo:shrey-mongo/PYTHON-5903
Open

PYTHON-5903 : Retry attempts should share a single stable operation_id#2901
shrey-mongo wants to merge 1 commit into
mongodb:masterfrom
shrey-mongo:shrey-mongo/PYTHON-5903

Conversation

@shrey-mongo

@shrey-mongo shrey-mongo commented Jun 29, 2026

Copy link
Copy Markdown

PYTHON-5903: Retry attempts should share a single stable operation_id

Changes in this PR

Command-monitoring events carry an operation_id to correlate the events of one logical operation. For single-document CRUD and cursor operations, each retry attempt got a different operation_id (it defaulted to the per-attempt request_id), so consumers couldn't group a retried operation's events. Bulk writes already shared one across batches.

  • _ClientConnectionRetryable mints the operation_id once and stamps it on the checked-out connection (conn.op_id) on every attempt
  • Connection gains an op_id attribute, reset on check-in.
  • pool.command and server.run_operation source op_id from the connection and pass it to command_runner.run_command / run_cursor_command, which thread it through to the publish_command_* events.

Test Plan

  • New test/asynchronous/test_operation_id_retry.py forces multiple retries via a failCommand failpoint across a read/write matrix and asserts all command events share one operation_id; verifies non-retryable multi-writes don't retry.
  • Ran the new tests plus the existing command-monitoring and retryable reads/writes suites against a single-node replica set which passed.

Checklist

Checklist for Author

  • Did you update the changelog (if necessary)?
  • Is there test coverage?
  • Is any followup work tracked in a JIRA ticket? If so, add link(s).

Checklist for Reviewer

  • Does the title of the PR reference a JIRA Ticket?
  • Do you fully understand the implementation? (Would you be comfortable explaining how this code works to someone else?)
  • Is all relevant documentation (README or docstring) updated?
@shrey-mongo shrey-mongo requested a review from a team as a code owner June 29, 2026 15:26
@shrey-mongo shrey-mongo requested a review from aclark4life June 29, 2026 15:26
@shrey-mongo shrey-mongo force-pushed the shrey-mongo/PYTHON-5903 branch 2 times, most recently from 247b26f to 682dbac Compare June 29, 2026 15:56
@NoahStapp NoahStapp requested review from NoahStapp and Copilot and removed request for aclark4life June 29, 2026 16:44
@NoahStapp

Copy link
Copy Markdown
Contributor

Please fill out the pull request template completely.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ensures command-monitoring operation_id remains stable across retry attempts for retryable reads/writes and relevant cursor operations, improving APM correlation for a single logical operation.

Changes:

  • Mint a single per-operation operation_id in the retryable-operation wrapper and stamp it onto the checked-out connection for each attempt.
  • Thread the connection’s op_id through pool/server command execution into command_runner so published command-monitoring events use it.
  • Add sync + async integration tests that force multiple retries via failCommand and assert all related command events share one operation_id.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tools/synchro.py Registers the new test file in the synchro conversion list.
test/test_operation_id_retry.py Sync integration coverage asserting operation_id stability across retries (and no retry for non-retryable multi-writes).
test/asynchronous/test_operation_id_retry.py Async equivalent of the retry operation_id stability tests.
pymongo/synchronous/server.py Passes conn.op_id into cursor command execution for APM publishing.
pymongo/synchronous/pool.py Adds Connection.op_id, forwards it into run_command, and clears it on check-in.
pymongo/synchronous/mongo_client.py Mints/stabilizes per-operation ids across retries and stamps them on checked-out connections.
pymongo/synchronous/command_runner.py Accepts/threads op_id through run_command / run_cursor_command into _run_command.
pymongo/asynchronous/server.py Async counterpart: passes conn.op_id for cursor command execution.
pymongo/asynchronous/pool.py Async counterpart: adds/forwards/clears op_id on connections.
pymongo/asynchronous/mongo_client.py Async counterpart: stable per-operation id and per-attempt stamping onto connections.
pymongo/asynchronous/command_runner.py Async counterpart: threads op_id through command execution to APM event publishing.
@shrey-mongo

Copy link
Copy Markdown
Author

Updated description to include full template

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

"data": {
"failCommands": [command_name],
"closeConnection": True,
"appName": _APP_NAME,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class needs to add the @async_client_context.require_failCommand_appName annotation to safely use appName.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread pymongo/asynchronous/command_runner.py Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated to use the new operation_id, if present.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@shrey-mongo shrey-mongo requested a review from NoahStapp June 30, 2026 15:04
@NoahStapp

Copy link
Copy Markdown
Contributor

There are merge conflicts, please resolve.

@shrey-mongo shrey-mongo force-pushed the shrey-mongo/PYTHON-5903 branch from 5546477 to 75f4e21 Compare June 30, 2026 15:46
@shrey-mongo

Copy link
Copy Markdown
Author

I fixed the merge conflict and the typing error that was causing the workflow to fail

func=func,
)

def require_failCommand_appName(self, func):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this change?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was needed to fix an untyped-decorator mypy error that was occurring in my test. This matches the signature of require_failCommand_fail_point.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need both the require_failCommand_fail_point and require_failCommand_appName annotations, the latter is a strict subset of the former.

# For gossiping $clusterTime from the connection handshake to the client.
self._cluster_time = None
# Stable operation id for the operation currently using this connection.
self.op_id: Optional[int] = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to track op_id on connections. It introduces more risk of accidental inclusion or staleness on operations like auth compared to explicitly passing it through each operation. What about passing op_id as a kwarg on Connection.command() and on _Query/_GetMore where operations are actually initiated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants