hongkailiu
Member

@hongkailiu hongkailiu commented Sep 23, 2025

This is to cover the node rebooting case from the rule [1] that was introduced recently:

Operators should not report Progressing only because DaemonSets
owned by them are adjusting to a new node from cluster scaleup or
a node rebooting from cluster upgrade.

The test fails if

  • co/machine-config never became Progressing=True during a
    cluster upgrade, or
  • some CO left Progressing=False during the upgrade after
    machine-config became Progressing=True. This should not
    happen, because machine-config rebooting the nodes was the
    only thing going on in the cluster during that time.
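
The two failure conditions above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `transition` type and `checkProgressing` function are hypothetical names, the input is assumed to be a time-ordered list of Progressing condition changes, and the second rule is interpreted as "while machine-config is Progressing=True".

```go
package main

import "fmt"

// transition is a hypothetical, simplified record of one ClusterOperator's
// Progressing condition changing. Records are assumed to be in time order.
type transition struct {
	Operator string
	Status   string // "True" or "False"
}

// checkProgressing returns one message per violated rule:
//   - machine-config never became Progressing=True, or
//   - another operator went Progressing=True while machine-config was
//     Progressing=True (each operator is reported once).
func checkProgressing(transitions []transition) []string {
	mcoProgressing := false
	mcoEverProgressing := false
	reported := map[string]bool{}
	var failures []string

	for _, tr := range transitions {
		if tr.Operator == "machine-config" {
			mcoProgressing = tr.Status == "True"
			if mcoProgressing {
				mcoEverProgressing = true
			}
			continue
		}
		if mcoProgressing && tr.Status == "True" && !reported[tr.Operator] {
			reported[tr.Operator] = true
			failures = append(failures, fmt.Sprintf(
				"clusteroperator/%s left Progressing=False while machine-config was Progressing=True",
				tr.Operator))
		}
	}
	if !mcoEverProgressing {
		failures = append(failures,
			"clusteroperator/machine-config never became Progressing=True during the upgrade")
	}
	return failures
}

func main() {
	// Made-up sample data for illustration only.
	sample := []transition{
		{"machine-config", "True"},
		{"dns", "True"},
		{"dns", "False"},
		{"machine-config", "False"},
		{"etcd", "True"}, // after MCO finished, so not counted by this sketch
	}
	for _, f := range checkProgressing(sample) {
		fmt.Println(f)
	}
}
```

In this sketch an operator that goes Progressing=True only after machine-config has returned to Progressing=False is not flagged; whether the real monitor test draws the window the same way is an assumption.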

[1]. https://github.com/openshift/api/blob/61248d910ff74aef020492922d14e6dadaba598b/config/v1/types_cluster_operator.go#L163-L164

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 23, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 23, 2025

@hongkailiu: This pull request references OTA-1637 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@hongkailiu hongkailiu force-pushed the OTA-1637-reboot branch 2 times, most recently from 22125c5 to dbb8c76 Compare September 23, 2025 18:29
@openshift-ci-robot

openshift-ci-robot commented Sep 23, 2025

@hongkailiu: This pull request references OTA-1637 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

@hongkailiu
Member Author

/test e2e-aws-ovn-upgrade


openshift-trt bot commented Sep 24, 2025

Job Failure Risk Analysis for sha: dbb8c76

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi Medium
Job run should complete before timeout
This test has passed 83.33% of 6 runs on release 4.21 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:standard Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws Procedure:none SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (2140) are below the historical average (4244): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: dbb8c76

  • "[Monitor:legacy-cvo-invariants][bz-Bare Metal Hardware Provisioning] clusteroperator/baremetal should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/cloud-controller-manager should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/cluster-autoscaler should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/control-plane-machine-set should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/machine-api should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/machine-approver should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Credential Operator] clusteroperator/cloud-credential should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-DNS] clusteroperator/dns should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Etcd] clusteroperator/etcd should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Image Registry] clusteroperator/image-registry should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Insights Operator] clusteroperator/insights should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Machine Config Operator] clusteroperator/machine-config must go Progressing=True during an upgrade test" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Management Console] clusteroperator/console should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Monitoring] clusteroperator/monitoring should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Networking] clusteroperator/network should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Node Tuning Operator] clusteroperator/node-tuning should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/marketplace should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager-catalog should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager-packageserver should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • (...showing 20 of 33 tests)

@hongkailiu
Member Author

/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296

Contributor

openshift-ci bot commented Sep 24, 2025

@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1adde290-995e-11f0-9d57-8ba68c7fd024-0

Contributor

openshift-ci bot commented Sep 24, 2025

@hongkailiu: This PR was included in a payload test run from #30296
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1adde290-995e-11f0-9d57-8ba68c7fd024-0

@hongkailiu
Member Author

hongkailiu commented Sep 24, 2025

The payload command above is working. The triggered job shows that the code spotted some violations but also missed a few (compared to the COs with Progressing=True in Spyglass).

Screenshot 2025-09-24 at 17 29 55
  • caught
    • dns
    • image-registry
    • network
    • node-tuning
    • storage
  • missed
    • kube-storage-version-migrator
    • csi-snapshot-controller
    • ingress
    • service-ca

Refining the code to see why they were missed.
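
The comparison described above amounts to a set difference between the operators observed as Progressing=True and the operators the test flagged. A minimal sketch, where the `missed` helper is a hypothetical name and the "observed" list is assumed to be simply the union of the caught and missed operators from this comment:

```go
package main

import (
	"fmt"
	"sort"
)

// missed returns the operators that were observed Progressing=True
// but were not flagged by the test, in sorted order.
func missed(observed, caught []string) []string {
	flagged := make(map[string]bool, len(caught))
	for _, co := range caught {
		flagged[co] = true
	}
	var out []string
	for _, co := range observed {
		if !flagged[co] {
			out = append(out, co)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Operators the test reported as violations.
	caught := []string{"dns", "image-registry", "network", "node-tuning", "storage"}
	// Assumed to be what was seen Progressing=True in the job's results view.
	observed := []string{
		"dns", "image-registry", "network", "node-tuning", "storage",
		"kube-storage-version-migrator", "csi-snapshot-controller", "ingress", "service-ca",
	}
	fmt.Println(missed(observed, caught))
}
```

With these inputs the result is the four operators listed under "missed" above.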

@hongkailiu
Member Author

/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296

Contributor

openshift-ci bot commented Sep 24, 2025

@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/05382750-998d-11f0-963e-c3b711331506-0

Contributor

openshift-ci bot commented Sep 24, 2025

@hongkailiu: This PR was included in a payload test run from #30296
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/05382750-998d-11f0-963e-c3b711331506-0

Contributor

openshift-ci bot commented Sep 25, 2025

@hongkailiu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway | 5feff1f | link | false | /test e2e-metal-ipi-ovn-dualstack-local-gateway |
| ci/prow/e2e-aws-disruptive | 5feff1f | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-ovn | 5feff1f | link | false | /test e2e-aws-ovn |
| ci/prow/e2e-metal-ipi-virtualmedia | 5feff1f | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | 5feff1f | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
| ci/prow/e2e-aws-ovn-upgrade | dbb8c76 | link | false | /test e2e-aws-ovn-upgrade |
| ci/prow/e2e-metal-ipi-ovn-dualstack | 5feff1f | link | false | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/e2e-aws-ovn-single-node | bad744e | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-aws-ovn-single-node-upgrade | bad744e | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-openstack-ovn | bad744e | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-serial-2of2 | bad744e | link | true | /test e2e-aws-ovn-serial-2of2 |
| ci/prow/e2e-aws-ovn-fips | bad744e | link | true | /test e2e-aws-ovn-fips |
| ci/prow/e2e-aws-ovn-single-node-serial | bad744e | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-aws-ovn-serial-1of2 | bad744e | link | true | /test e2e-aws-ovn-serial-1of2 |
| ci/prow/e2e-aws-ovn-edge-zones | bad744e | link | false | /test e2e-aws-ovn-edge-zones |
| ci/prow/e2e-aws-ovn-microshift-serial | bad744e | link | true | /test e2e-aws-ovn-microshift-serial |
| ci/prow/okd-scos-e2e-aws-ovn | bad744e | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-gcp-csi | bad744e | link | false | /test e2e-gcp-csi |
| ci/prow/e2e-aws-ovn-cgroupsv2 | bad744e | link | false | /test e2e-aws-ovn-cgroupsv2 |
| ci/prow/e2e-aws-csi | bad744e | link | false | /test e2e-aws-csi |
| ci/prow/e2e-agnostic-ovn-cmd | bad744e | link | false | /test e2e-agnostic-ovn-cmd |
| ci/prow/e2e-aws-ovn-microshift | bad744e | link | true | /test e2e-aws-ovn-microshift |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


openshift-trt bot commented Sep 25, 2025

Job Failure Risk Analysis for sha: bad744e

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones Medium
Job run should complete before timeout
This test has passed 82.34% of 4739 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (2159) are below the historical average (4037): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: bad744e

  • "[Monitor:legacy-cvo-invariants][bz-Bare Metal Hardware Provisioning] clusteroperator/baremetal should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/cloud-controller-manager should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/cluster-autoscaler should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/control-plane-machine-set should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/machine-api should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/machine-approver should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Credential Operator] clusteroperator/cloud-credential should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-DNS] clusteroperator/dns should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Etcd] clusteroperator/etcd should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Image Registry] clusteroperator/image-registry should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Insights Operator] clusteroperator/insights should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Machine Config Operator] clusteroperator/machine-config must go Progressing=True during an upgrade test" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Management Console] clusteroperator/console should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Monitoring] clusteroperator/monitoring should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Networking] clusteroperator/network should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Node Tuning Operator] clusteroperator/node-tuning should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/marketplace should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager-catalog should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager-packageserver should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • (...showing 20 of 33 tests)

@hongkailiu
Member Author

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-30296-openshift-origin-30296-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade/1970962887476252672

Same caught list as before; still missing something.

Let us try again with the total number of events:

/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296

Contributor

openshift-ci bot commented Sep 25, 2025

@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5e26f440-9a14-11f0-8659-71352787cde2-0

Contributor

openshift-ci bot commented Sep 25, 2025

@hongkailiu: This PR was included in a payload test run from #30296
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5e26f440-9a14-11f0-8659-71352787cde2-0


openshift-trt bot commented Sep 25, 2025

Job Failure Risk Analysis for sha: d415550

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift IncompleteTests
Tests for this run (22) are below the historical average (1298): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (22) are below the historical average (662): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: d415550

  • "[Monitor:legacy-cvo-invariants][bz-Bare Metal Hardware Provisioning] clusteroperator/baremetal should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/cloud-controller-manager should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/cluster-autoscaler should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/control-plane-machine-set should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/machine-api should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/machine-approver should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Credential Operator] clusteroperator/cloud-credential should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-DNS] clusteroperator/dns should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Etcd] clusteroperator/etcd should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Image Registry] clusteroperator/image-registry should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Insights Operator] clusteroperator/insights should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Machine Config Operator] clusteroperator/machine-config must go Progressing=True during an upgrade test" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Management Console] clusteroperator/console should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Monitoring] clusteroperator/monitoring should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Networking] clusteroperator/network should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Node Tuning Operator] clusteroperator/node-tuning should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/marketplace should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager-catalog should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager-packageserver should stay Progressing=False while MCO is Progressing=True" [Total: 3, Pass: 3, Fail: 0, Flake: 0]
  • (...showing 20 of 33 tests)

@hongkailiu
Member Author

Let us see if https://github.com/openshift/origin/compare/d41555013e72e50e055e6fc1ecf38229560c5b35..969fcc5a0bf55a5242c3e57a302f9e2fd2a04370 helps.

/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296

Contributor

openshift-ci bot commented Sep 25, 2025

@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7f025ef0-9a48-11f0-971e-3b18f6cdf426-0

Contributor

openshift-ci bot commented Sep 25, 2025

@hongkailiu: This PR was included in a payload test run from #30296
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7f025ef0-9a48-11f0-971e-3b18f6cdf426-0

@hongkailiu
Member Author

The last job failed too early for an unrelated reason.
Let us rerun:

/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296

Contributor

openshift-ci bot commented Sep 25, 2025

@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/74f1e9a0-9a69-11f0-92f8-cb7320e8c78f-0

Contributor

openshift-ci bot commented Sep 25, 2025

@hongkailiu: This PR was included in a payload test run from #30296
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/74f1e9a0-9a69-11f0-92f8-cb7320e8c78f-0


openshift-trt bot commented Sep 26, 2025

Job Failure Risk Analysis for sha: 969fcc5

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (22) are below the historical average (609): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 969fcc5

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift Medium - "install should succeed: MicroShift rebase" is a new test, and was only seen in one job.

New tests seen in this PR at sha: 969fcc5

  • "[Monitor:legacy-cvo-invariants][bz-Bare Metal Hardware Provisioning] clusteroperator/baremetal should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/cloud-controller-manager should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/cluster-autoscaler should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/control-plane-machine-set should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/machine-api should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Compute] clusteroperator/machine-approver should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Cloud Credential Operator] clusteroperator/cloud-credential should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-DNS] clusteroperator/dns should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Etcd] clusteroperator/etcd should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Image Registry] clusteroperator/image-registry should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Insights Operator] clusteroperator/insights should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Machine Config Operator] clusteroperator/machine-config must go Progressing=True during an upgrade test" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Management Console] clusteroperator/console should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Monitoring] clusteroperator/monitoring should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Networking] clusteroperator/network should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-Node Tuning Operator] clusteroperator/node-tuning should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/marketplace should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager-catalog should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/operator-lifecycle-manager-packageserver should stay Progressing=False while MCO is Progressing=True" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • (...showing 20 of 34 tests)

@hongkailiu
Member Author

hongkailiu commented Sep 26, 2025

@petr-muller
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller September 26, 2025 15:40
@hongkailiu
Member Author

#30308 got in.

/payload-job-with-prs periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade #30296

Contributor

openshift-ci bot commented Sep 29, 2025

@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a7e9d410-9d27-11f0-82cf-a7612e5a4c23-0

Contributor

openshift-ci bot commented Sep 29, 2025

@hongkailiu: This PR was included in a payload test run from #30296
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a7e9d410-9d27-11f0-82cf-a7612e5a4c23-0

@DavidHurta

/cc

@openshift-ci openshift-ci bot requested a review from DavidHurta September 29, 2025 12:22
Member

@petr-muller petr-muller left a comment


Some comments inline, probably nothing showstopping so LGTM with a /hold

/lgtm
/approve
/hold

Comment on lines +665 to +667
except := func(co string, condition *configv1.ClusterOperatorStatusCondition) string {
	return ""
}
Member


This is confusing for a reader while it is not used yet. Can we add a comment or a bit of (commented-out) boilerplate that helps whoever reads this to 1) understand why this exists and 2) see how to actually add an exception (like, what should the function return? Some kind of string, but what should it look like?)

Member Author

@hongkailiu hongkailiu Sep 29, 2025


My original plan is to file OCPBUGS, after going through code review, for the violating components we have found so far, and then use those bugs as exceptions here:

  • dns
  • image-registry
  • network
  • node-tuning
  • storage
  • kube-storage-version-migrator
  • csi-snapshot-controller
  • ingress
  • service-ca
  • olm

The exception function will look like:

except := func(co string, condition *configv1.ClusterOperatorStatusCondition) string {
	switch co {
	case "dns":
		if condition.Reason == "DNSReportsProgressingIsTrue" {
			// Known violation: return the bug that tracks it.
			return "https://issues.redhat.com/browse/OCPBUGS-xxx"
		}
	}
	// No exception applies to this operator and condition.
	return ""
}

Otherwise, it would cause payload failures like https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-30296-openshift-origin-30296-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade/1972624949952647168 after merge.

Do I understand it correctly?
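
To make the return contract discussed in this thread concrete, here is a self-contained sketch. It is illustrative only: `statusCondition` is a local stand-in for `configv1.ClusterOperatorStatusCondition` so the example compiles without the openshift/api module, `classify` is a hypothetical caller (how the real test consumes the return value is an assumption), and the `OCPBUGS-xxx` URL is the placeholder from the comment above.

```go
package main

import "fmt"

// statusCondition is a local stand-in for
// configv1.ClusterOperatorStatusCondition, reduced to the fields used here.
type statusCondition struct {
	Type   string
	Status string
	Reason string
}

// exampleExcept follows the shape sketched in the thread: it returns the
// URL of the tracking bug for a known violation, or "" if none applies.
func exampleExcept(co string, condition *statusCondition) string {
	switch co {
	case "dns":
		if condition.Reason == "DNSReportsProgressingIsTrue" {
			// Placeholder ticket, copied as-is from the discussion.
			return "https://issues.redhat.com/browse/OCPBUGS-xxx"
		}
	}
	return ""
}

// classify is a hypothetical consumer: a non-empty exception downgrades the
// violation to a known issue, anything else is treated as a failure.
func classify(co string, condition *statusCondition,
	except func(string, *statusCondition) string) string {
	if bug := except(co, condition); bug != "" {
		return "known issue: " + bug
	}
	return "failure"
}

func main() {
	cond := &statusCondition{
		Type:   "Progressing",
		Status: "True",
		Reason: "DNSReportsProgressingIsTrue",
	}
	fmt.Println("dns:", classify("dns", cond, exampleExcept))
	fmt.Println("etcd:", classify("etcd", cond, exampleExcept))
}
```

Keying the exception on both the operator name and the condition's Reason keeps it narrow, so a different Progressing=True reason from the same operator would still be reported.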

Member


Ah okay, so this PR is essentially still WIP and we need to fill in these exceptions? Then I guess it is fine. My worry was that we'd merge this empty and then an uninformed reader would need to read the whole function to understand why except is there.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 29, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 29, 2025
Contributor

openshift-ci bot commented Sep 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu, petr-muller
Once this PR has been reviewed and has the lgtm label, please assign bertinatto for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
4 participants