kaovilai
Collaborator

@kaovilai kaovilai commented Sep 9, 2025

Summary

This PR fixes issue #8279 where BackupRepositories become stale when BackupStorageLocation (BSL) configuration is updated or created while Velero is not running.

Problem

When a BSL is updated/created while Velero is not running, existing BackupRepositories that reference the BSL become stale and continue using the old configuration. This prevents successful backups/restores until the repositories are manually deleted.

Solution

The implementation validates BackupRepository configurations against their associated BackupStorageLocation on controller startup. If BSL configuration (bucket, prefix, CACert, or config) has changed while Velero was not running, the affected repositories are invalidated and will be re-established.
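As a rough illustration of the comparison described above, the following sketch records a BSL snapshot as annotations and reports which fields differ from the current BSL. The annotation keys are the ones listed in this PR; the `bslSnapshot` type and the `toAnnotations` and `changedFields` helpers are hypothetical stand-ins, not the PR's actual `compareBSLConfigs` code, and the SHA-256 hash and JSON encoding are assumptions about how the values might be serialized.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Annotation keys as listed in this PR description.
const (
	annBucket     = "velero.io/bsl-bucket"
	annPrefix     = "velero.io/bsl-prefix"
	annCACertHash = "velero.io/bsl-cacert-hash"
	annConfig     = "velero.io/bsl-config"
)

// bslSnapshot is a hypothetical, simplified stand-in for the BSL fields
// the PR tracks (bucket, prefix, CACert, config).
type bslSnapshot struct {
	Bucket string
	Prefix string
	CACert []byte
	Config map[string]string
}

// toAnnotations serializes a snapshot into annotation values.
// Assumption: the CACert is stored as a SHA-256 hex digest and the config
// as JSON (encoding/json sorts map keys, so the output is deterministic).
func toAnnotations(b bslSnapshot) (map[string]string, error) {
	cfg, err := json.Marshal(b.Config)
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(b.CACert)
	return map[string]string{
		annBucket:     b.Bucket,
		annPrefix:     b.Prefix,
		annCACertHash: hex.EncodeToString(sum[:]),
		annConfig:     string(cfg),
	}, nil
}

// changedFields returns the annotation keys whose stored value differs
// from the value derived from the current BSL.
func changedFields(stored map[string]string, current bslSnapshot) ([]string, error) {
	want, err := toAnnotations(current)
	if err != nil {
		return nil, err
	}
	var changed []string
	for _, k := range []string{annBucket, annPrefix, annCACertHash, annConfig} {
		if stored[k] != want[k] {
			changed = append(changed, k)
		}
	}
	return changed, nil
}

func main() {
	bsl := bslSnapshot{Bucket: "backups", Prefix: "prod", Config: map[string]string{"region": "us-east-1"}}
	stored, _ := toAnnotations(bsl) // recorded while Velero was running

	bsl.Prefix = "prod-v2" // BSL edited while Velero was stopped
	changed, _ := changedFields(stored, bsl)
	fmt.Println(changed) // prints [velero.io/bsl-prefix]
}
```

A non-empty result would be the signal to invalidate the repository; an empty one means the stored configuration still matches.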

Implementation Details

Core Changes:

  1. Startup Validation (validateBackupRepositoriesOnStartup):

    • Runs once on first reconciliation after Velero starts
    • Compares stored BSL configuration in repository annotations with current BSL configuration
    • Invalidates repositories where configuration has changed
  2. Configuration Tracking:

    • Stores BSL configuration (bucket, prefix, CACert hash, config) as annotations on BackupRepository resources
    • Uses annotations: velero.io/bsl-bucket, velero.io/bsl-prefix, velero.io/bsl-cacert-hash, velero.io/bsl-config
  3. Shared Comparison Logic (compareBSLConfigs):

    • Centralized function to compare BSL configurations
    • Used by both startup validation and runtime update detection
    • Eliminates code duplication between needInvalidBackupRepoOnStartup and needInvalidBackupRepo
  4. Thread Safety:

    • Uses mutex to ensure startup validation runs only once even with concurrent reconciliations
    • Validation runs asynchronously to avoid blocking reconciliation

Testing

Unit Tests Added:

  • TestValidateBackupRepositoriesOnStartup: Tests the startup validation logic with various scenarios
  • TestNeedInvalidBackupRepoOnStartup: Tests the comparison logic for startup validation

E2E Test Added:

  • New test file: test/e2e/bsl-mgmt/startup_validation.go
  • Test scenario simulates BSL configuration change while Velero is stopped:
    1. Creates a backup to establish a BackupRepository
    2. Scales down Velero deployment (simulating shutdown)
    3. Modifies BSL configuration (changes prefix)
    4. Scales up Velero deployment (simulating startup)
    5. Verifies repository is invalidated with correct error message
    6. Restores original BSL configuration
    7. Verifies repository recovers to Ready state

Files Changed:

  • pkg/controller/backup_repository_controller.go: Core implementation
  • pkg/controller/backup_repository_controller_test.go: Unit tests
  • pkg/apis/velero/v1/labels_annotations.go: BSL annotation constants
  • test/e2e/bsl-mgmt/startup_validation.go: E2E test
  • test/e2e/e2e_suite_test.go: Test registration
  • .github/workflows/e2e-test-kind.yaml: CI test matrix
  • changelogs/unreleased/8279-kaovilai: Changelog entry

Fixes

Fixes #8279

Test plan

  • Unit tests added and passing
  • E2E test added for startup validation scenario
  • Verified repositories are invalidated on startup when configs have changed
  • Verified repositories remain valid when configs haven't changed
  • All existing tests pass

Note

Responses generated with Claude

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 81.37931% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.75%. Comparing base (3be76da) to head (26f3864).

Files with missing lines | Patch % | Lines
pkg/controller/backup_repository_controller.go | 81.37% | 22 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9236      +/-   ##
==========================================
+ Coverage   59.64%   59.75%   +0.11%     
==========================================
  Files         382      382              
  Lines       43960    44091     +131     
==========================================
+ Hits        26218    26346     +128     
  Misses      16195    16195              
- Partials     1547     1550       +3     

Collaborator Author

@kaovilai kaovilai left a comment

Community Meeting: check whether the existing invalidation logic can be reused instead of adding more code.

@kaovilai kaovilai force-pushed the 8279 branch 3 times, most recently from a6b170b to 5a40e17 Compare September 10, 2025 01:22
@kaovilai kaovilai force-pushed the 8279 branch 6 times, most recently from 29fe78d to 196bc7b Compare September 10, 2025 20:47
kaovilai and others added 3 commits September 10, 2025 15:55
…elero is not running

This change validates BackupRepository configurations against their associated
BackupStorageLocation on controller startup. If BSL configuration (bucket, prefix,
CACert, or config) has changed while Velero was not running, the affected repositories
are invalidated and will be re-established.

Key changes:
- Add startup validation that checks all BackupRepositories against current BSL configs
- Store BSL configuration in BackupRepository annotations for comparison on startup
- Add shared compareBSLConfigs function to eliminate code duplication
- Move BSL annotation constants to labels_annotations.go for consistency
- Add comprehensive test coverage for startup validation logic

Fixes vmware-tanzu#8279

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
This commit adds an E2E test to verify that backup repositories are
properly validated against BSL configuration changes when Velero
restarts. The test simulates a scenario where BSL configuration changes
while Velero is not running and verifies that repositories are
invalidated on startup with the correct error message.

Test scenario:
1. Creates a backup to establish a BackupRepository
2. Scales down Velero deployment (simulating shutdown)
3. Modifies BSL configuration (changes prefix)
4. Scales up Velero deployment (simulating startup)
5. Verifies repository is invalidated with correct message
6. Restores original BSL configuration
7. Verifies repository recovers to Ready state

Changes:
- Added new E2E test file: test/e2e/bsl-mgmt/startup_validation.go
- Registered test in test/e2e/e2e_suite_test.go
- Added test label to GitHub workflow matrix for CI execution

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Use pod status checks instead of fixed delays
Development

Successfully merging this pull request may close these issues.

backupRepository can become stale if velero deployment is not running to observe bsl update/create
1 participant