Clear runner state of previous executions before warm starts #1905

nicktrn · 2025-04-09T14:40:34Z

The main change here is clearing prior state, i.e. the previous run execution and its completion, before proceeding to the warm start phase. Also adds excessive debug logs (we'll have to disable these at some point).

Summary by CodeRabbit

New Features
- Introduced a configurable wait period before process suspension.
- Enabled explicit process suspension with refined error reporting.

Refactor
- Enhanced logging and diagnostic messaging for clearer tracking of process status and execution states.
- Updated execution controls to support flexible handling of run attempts, including warm start behavior.

changeset-bot · 2025-04-09T14:40:38Z

⚠️ No Changeset found

Latest commit: 74cca5a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai · 2025-04-09T14:40:42Z

Walkthrough

This pull request introduces several enhancements across multiple modules. The changes improve logging in the run controller by incorporating environment variables and detailed contextual data. In addition, the task execution flow now supports process suspension with refined error handling, including a new SuspendedProcessError. The updates also allow for a warm start option during task execution through modified method signatures in both the task run process and task executor.

Changes

| File(s) | Change Summary |
| --- | --- |
| `packages/cli-v3/src/entryPoints/managed-run-controller.ts` | Added new environment variable `TRIGGER_PRE_SUSPEND_WAIT_MS` (default: 200ms). Enhanced logging: replaced console logs with `sendDebugLog` including additional context. Refined snapshot handling and updated method signatures for better run attempt management. |
| `packages/cli-v3/src/executions/taskRunProcess.ts` | Introduced a new private property `_isBeingSuspended` to track suspension state. Updated `execute` method to accept an optional `isWarmStart` flag. Added a public `suspend` method that triggers process suspension using a SIGKILL signal and adjusts error handling in the exit flow. |
| `packages/core/src/v3/errors.ts` | Added a new error class `SuspendedProcessError` extending the native Error class, designed to represent a suspended process state. |
| `packages/core/src/v3/workers/taskExecutor.ts` | Modified the `execute` method signature to take an optional `isWarmStart` parameter, updating the logic that sets the warm start state during task execution. |
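For readers who want a feel for the new `TRIGGER_PRE_SUSPEND_WAIT_MS` setting, here is a minimal sketch of reading it with the 200ms default from the summary above. The helper name `preSuspendWaitMs` and the fallback rules for invalid values are assumptions for illustration; the diff itself is not quoted in this thread, so the PR's actual parsing may differ.

```typescript
// Hypothetical helper (not from the PR): resolves the pre-suspend wait in milliseconds.
// Takes the environment as a plain record so the sketch has no Node-specific dependency.
function preSuspendWaitMs(env: Record<string, string | undefined>): number {
  const DEFAULT_MS = 200; // default stated in the change summary
  const raw = env["TRIGGER_PRE_SUSPEND_WAIT_MS"];

  // Unset or blank: use the default.
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_MS;
  }

  // Assumption: anything that is not a finite, non-negative number falls back to the default.
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_MS;
}
```

In a Node process this could be called as `preSuspendWaitMs(process.env)`.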

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Controller as ManagedRunController
  participant Executor as TaskExecutor
  participant Process as TaskRunProcess

  Client->>Controller: startAndExecuteRunAttempt(...)
  Controller->>Controller: Log run start details (env & context)
  Controller->>Executor: execute(..., isWarmStart?)
  Executor->>Process: execute(..., isWarmStart?)
  alt Process triggers suspension
      Process->>Process: Set _isBeingSuspended = true
      Process->>Process: kill("SIGKILL")
      Process->>Controller: Return SuspendedProcessError
      Controller->>Client: Log suspension event with detailed context
  else Normal execution
      Process->>Controller: Return successful execution outcome
      Controller->>Client: Log execution events and status updates
  end

Possibly related PRs

Misc v4 fixes and improved logging #1831: Addresses enhancements in the ManagedRunController logging and execution status handling, aligning with the updates in this PR.
Fix v4 restore race condition #1870: Modifies the handling of run statuses in the controller, particularly for "FINISHED" state, directly relating to the refined snapshot handling.
Add v4 timeline metrics to prod runs #1884: Involves adjustments to the run attempt method parameters for tracking execution timings, which parallels the method signature updates in this PR.

Suggested reviewers

ericallam

Poem

I'm a little rabbit, hopping on code paths so bright,
With logs that sing and variables that shine in the night.
I suspend and I execute with a twitch of my ear,
My code runs smooth, never a bug to fear.
Hoppy changes abound—let's celebrate with delight!
🥕🐇

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8382274 and 74cca5a.

📒 Files selected for processing (4)

packages/cli-v3/src/entryPoints/managed-run-controller.ts (35 hunks)
packages/cli-v3/src/executions/taskRunProcess.ts (6 hunks)
packages/core/src/v3/errors.ts (1 hunks)
packages/core/src/v3/workers/taskExecutor.ts (2 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (2)

packages/core/src/v3/workers/taskExecutor.ts (3)

packages/core/src/v3/timeout/api.ts (1)

signal (27-29)

packages/core/src/v3/timeout/usageTimeoutManager.ts (1)

signal (12-14)

packages/core/src/v3/taskContext/index.ts (1)

isWarmStart (34-36)

packages/cli-v3/src/entryPoints/managed-run-controller.ts (2)

packages/core/src/v3/workers/taskExecutor.ts (3)

execution (997-1120)

execution (1247-1263)

result (1123-1170)

packages/core/src/v3/errors.ts (1)

SuspendedProcessError (500-506)

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: units / 🧪 Unit Tests
GitHub Check: typecheck / typecheck

🔇 Additional comments (39)

packages/core/src/v3/errors.ts (1)

500-506: Add new SuspendedProcessError class
This explicit error type helps represent a suspended process state more clearly.
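
As a rough illustration of why a dedicated error type helps, the sketch below defines an error class with the same name and a caller that branches on it. The constructor arguments, the message text, and the `describeExit` helper are assumptions; only the class name and the fact that it extends `Error` come from this review.

```typescript
// Sketch only: the real class lives in packages/core/src/v3/errors.ts and may differ.
class SuspendedProcessError extends Error {
  constructor() {
    super("Process was suspended"); // message text is an assumption
    this.name = "SuspendedProcessError";
    // Keeps instanceof working even when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical caller: distinguishes a deliberate suspension from any other failure.
function describeExit(err: unknown): "suspended" | "failed" {
  return err instanceof SuspendedProcessError ? "suspended" : "failed";
}
```

With a distinct type, the controller can treat suspension as an expected outcome instead of string-matching on error messages.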

packages/core/src/v3/workers/taskExecutor.ts (2)

92-93: Extended method signature to accept isWarmStart
No issues found. This optional parameter cleanly integrates into the existing function signature.

106-106: Propagate isWarmStart in task context
Good fallback logic with isWarmStart ?? this._isWarmStart; no concerns here.
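
The point of using `??` rather than `||` here is that an explicit `false` from the caller is respected, and only `undefined` (or `null`) falls back to the instance value. A small sketch, where `ExecutorSketch` and `resolveWarmStart` are made-up names and the field type is an assumption:

```typescript
// Sketch of the fallback expression quoted above; not the real TaskExecutor.
class ExecutorSketch {
  private readonly _isWarmStart: boolean | undefined;

  constructor(isWarmStart: boolean | undefined) {
    this._isWarmStart = isWarmStart;
  }

  // A per-call argument wins whenever it is provided, including an explicit false.
  resolveWarmStart(isWarmStart?: boolean): boolean | undefined {
    return isWarmStart ?? this._isWarmStart;
  }
}
```

With `||`, `new ExecutorSketch(true).resolveWarmStart(false)` would wrongly evaluate to `true`; with `??` it stays `false`.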

packages/cli-v3/src/executions/taskRunProcess.ts (5)

32-32: Import SuspendedProcessError
No issues here; consistent with updated error handling.

77-77: Introduce _isBeingSuspended field
Sets default to false and enables clear tracking of the suspension state.

218-221: Extend execute(...) signature with isWarmStart
This addition is consistent with the warm start logic across the codebase.

355-359: Reject with SuspendedProcessError if _isBeingSuspended
Ensures the exit flow rejects with the correct error type when the process is being suspended.

440-443: New suspend() method
This sets _isBeingSuspended and kills the process with SIGKILL to trigger suspension handling.
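
Putting the two comments above together, the flow appears to be: set a flag, send SIGKILL, and let the exit handler pick the error type based on that flag. The sketch below models this with a stand-in child interface so it runs without spawning anything. `KillableChild`, `TaskRunProcessSketch`, and `errorForExit` are hypothetical names, and the real exit handling in `taskRunProcess.ts` is likely more involved.

```typescript
class SuspendedProcessError extends Error {
  constructor() {
    super("Process was suspended"); // message text is an assumption
    this.name = "SuspendedProcessError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Minimal stand-in for the part of a child process this sketch needs.
interface KillableChild {
  kill(signal: string): boolean;
}

class TaskRunProcessSketch {
  private _isBeingSuspended = false;
  private readonly child: KillableChild;

  constructor(child: KillableChild) {
    this.child = child;
  }

  // Mark the process as being suspended, then kill it.
  suspend(): void {
    this._isBeingSuspended = true;
    this.child.kill("SIGKILL");
  }

  // Would be called from the exit handler to choose the rejection error.
  errorForExit(code: number | null, signal: string | null): Error {
    if (this._isBeingSuspended) {
      return new SuspendedProcessError();
    }
    return new Error(`Process exited with code ${code} and signal ${signal}`);
  }
}
```

Setting the flag before calling `kill` matters: the exit event can only fire after the signal is sent, so the handler always sees the flag.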

packages/cli-v3/src/entryPoints/managed-run-controller.ts (31)

11-11: Import SuspendedProcessError
No issues; aligns with the new error handling strategy.

59-59: Add TRIGGER_PRE_SUSPEND_WAIT_MS
Provides a configurable wait time before suspending. This is a useful addition.

174-185: Initialize controller log properties
Populating these properties for debug logs is a good practice.

186-188: Set heartbeat/snapshot intervals
No issues found; it clearly sets default intervals.

190-191: Initialize metadataClient
Allows dynamic environment overrides. Looks good.

208-211: Log snapshot poll skipping
Helps clarify execution flow when no run is present.

231-236: Log snapshot poll failure
Useful for diagnosing poll issues.

256-260: Log polling error
No concerns here; improves visibility into failures.

267-269: Log skipping heartbeat
Makes sense when run or snapshot ID is missing.

274-276: Log heartbeat started
Further clarity on lifecycle events, no issues.

305-309: Log SIGTERM
Provides helpful debugging information for graceful shutdown.

358-361: Log snapshot not changed
No concerns; ensures we skip redundant updates.

401-405: Log skipping exit from run phase
No issues; good for clarifying logic branches.

411-415: Log same run
Prevents unnecessary transition if the run ID matches.

420-423: Log exiting run phase
Clearly indicates the transition out of a run phase.

482-484: Log handleSnapshotChange locked
No issues; short-circuits concurrent snapshot updates.

492-499: Log missing snapshot
Straightforward debug log for clarity.

515-518: Log snapshot not changed
Good for diagnosing whether updates to snapshots are needed.

563-569: Log failed to cancel attempt
No concerns; helps diagnose cancellation issues.

578-581: Log run finished
Indicates the run's terminal state.

584-587: Suspend run if there's an active execution
Ensures prior state is cleared before finishing.

624-624: Wait using TRIGGER_PRE_SUSPEND_WAIT_MS
Implements a pause before suspension; this is a sensible approach.

628-634: Log snapshot changed after threshold
No issues; clarifies concurrency timing.
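
Read together, the 624 and 628-634 comments suggest a wait-then-recheck pattern: pause for the configured duration, then only continue with suspension if the snapshot is still the same one. A hedged sketch of that idea follows; `shouldStillSuspend` and the `getSnapshotId` callback are invented for illustration, and the controller's real checks are not shown in this thread.

```typescript
// Sketch only: resolves to true when it still looks safe to suspend after the wait.
async function shouldStillSuspend(
  getSnapshotId: () => string | undefined,
  waitMs: number
): Promise<boolean> {
  const before = getSnapshotId();

  // Give in-flight snapshot changes a chance to arrive before committing to suspension.
  await new Promise<void>((resolve) => setTimeout(resolve, waitMs));

  const after = getSnapshotId();

  // Missing IDs or a changed snapshot both mean: do not suspend.
  if (before === undefined || after === undefined) {
    return false;
  }
  return before === after;
}
```

The wait duration would presumably come from `TRIGGER_PRE_SUSPEND_WAIT_MS`.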

640-647: Log missing run ID or snapshot ID after suspension
No concerns; handles edge case.

656-663: Log failed to suspend run
Helps track suspension failures.

690-694: Log run is suspending
Acknowledges the suspension request.

705-706: Suspend run
Triggers the new SuspendedProcessError approach.

711-714: Log run pending execution
No issues; a minor but clear logging addition.

887-888: Add isWarmStart param
No concerns; consistent with PR objective of clearing state between warm starts.

897-898: Skip lock check for immediate retry
Helps concurrency by bypassing the lock under certain conditions.

1003-1004: Suspend run logic on SuspendedProcessError
Allows the system to handle suspended runs gracefully.


nicktrn added 9 commits April 8, 2025 22:04

attach all run controller logs to the run

eae8de6

make controller-level pre-suspend wait duration configurable

ef3b388

snapshot status should remain EXECUTING for short retry delays

3c57c26

add suspended process error

c787086

ensure clean slate before waiting for next run

1429ecc

Merge remote-tracking branch 'origin/main' into fix/runner-clean-slate

d2a0177

treat immediate retries as warm starts

7f62aee

fix for finished runs waiting forever

e424c44

Merge remote-tracking branch 'origin/main' into fix/runner-clean-slate

73e07d8

Merge remote-tracking branch 'origin/main' into fix/runner-clean-slate

74cca5a

matt-aitken approved these changes Apr 9, 2025

nicktrn merged commit 98a3cfb into main Apr 9, 2025
12 checks passed

matt-aitken deleted the fix/runner-clean-slate branch April 9, 2025 14:53

coderabbitai bot mentioned this pull request Apr 28, 2025

Fix managed run controller edge cases #1987

Merged
