Conversation

@asomers (Contributor) commented Jul 1, 2025

Specifying the verbose flag twice will display a list of all corrupt sectors within each corrupt file, as opposed to just the name of the file.

Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored by: ConnectWise

Motivation and Context

Displays the record number of every corrupt record in every corrupt file. I find this very useful when cleaning up the fallout from #16626.

Description

The kernel already tracks the blkid of every corrupt record, and already transmits that information to userland. But libzfs has always thrown it away, until now. This PR adds a -vv option to zpool status. When used, it will print the level and blkid of every corrupt record. It works in combination with -j, too.
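
For illustration, here is a rough sketch of the consumer side. This is not the PR's code; the ZPOOL_ERR_LEVEL/ZPOOL_ERR_BLKID names are taken from the diff excerpt quoted later in this thread, and the rest follows the existing libzfs error-log conventions.

#include <sys/param.h>
#include <stdio.h>
#include <libzfs.h>

/*
 * Sketch only: walk the per-error nvlists returned by zpool_get_errlog()
 * and print the path, indirection level, and block id of each corrupt
 * record.  ZPOOL_ERR_DATASET/ZPOOL_ERR_OBJECT already exist today;
 * ZPOOL_ERR_LEVEL/ZPOOL_ERR_BLKID are the pairs this PR adds when the
 * verbose error log is requested.
 */
static void
print_verbose_errors(zpool_handle_t *zhp, nvlist_t *nverrlist)
{
	char path[MAXPATHLEN * 2];
	nvpair_t *elem = NULL;

	while ((elem = nvlist_next_nvpair(nverrlist, elem)) != NULL) {
		nvlist_t *nv = fnvpair_value_nvlist(elem);
		uint64_t dsobj = fnvlist_lookup_uint64(nv, ZPOOL_ERR_DATASET);
		uint64_t obj = fnvlist_lookup_uint64(nv, ZPOOL_ERR_OBJECT);
		uint64_t lvl = fnvlist_lookup_uint64(nv, ZPOOL_ERR_LEVEL);
		uint64_t blkid = fnvlist_lookup_uint64(nv, ZPOOL_ERR_BLKID);

		/* Resolve the dataset/object pair to a file path. */
		zpool_obj_to_path(zhp, dsobj, obj, path, sizeof (path));
		(void) printf("\t%s L%llu record %llu\n", path,
		    (unsigned long long)lvl, (unsigned long long)blkid);
	}
}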

How Has This Been Tested?

  • Manually tested on about half a dozen production datasets that had on-disk corruption as a result of #16626, in both L0 and L1 blocks.
  • Manually tested on a test dataset that I intentionally corrupted. That one had multiple corrupted records in multiple files.

Example output, in human-readable mode:

...
errors: Permanent errors have been detected in the following files:

        /testpool/randfile.7 L0 record 3
        /testpool/randfile.7 L0 record 9
        /testpool/randfile.7 L0 record 16
        /testpool/randfile.9 L0 record 8
        /testpool/randfile.9 L0 record 15
        /testpool/randfile.10 L0 record 3
        /testpool/randfile.10 L0 record 11
        /testpool/randfile.5 L0 record 17
        /testpool/randfile.8 L0 record 11
        /testpool/randfile.8 L0 record 19
        /testpool/randfile.6 L0 record 3
        /testpool/randfile.6 L0 record 12
Example output, in JSON mode:
{
  "output_version": {
    "command": "zpool status",
    "vers_major": 0,
    "vers_minor": 1
  },
  "pools": {
    "testpool": {
      "name": "testpool",
      "state": "ONLINE",
      "pool_guid": "10305967396160717712",
      "txg": "1523",
      "spa_version": "5000",
      "zpl_version": "5",
      "status": "One or more devices has experienced an error resulting in data\n\tcorruption.  Applications may be affected.\n",
      "action": "Restore the file in question if possible.  Otherwise restore the\n\tentire pool from backup.\n",
      "msgid": "ZFS-8000-8A",
      "moreinfo": "https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A",
      "scan_stats": {
        "function": "SCRUB",
        "state": "FINISHED",
        "start_time": "Tue Jul  1 12:11:14 2025",
        "end_time": "Tue Jul  1 12:11:14 2025",
        "to_examine": "56.0M",
        "examined": "56.0M",
        "skipped": "92K",
        "processed": "0B",
        "errors": "12",
        "bytes_per_scan": "0B",
        "pass_start": "1751393474",
        "scrub_pause": "-",
        "scrub_spent_paused": "0",
        "issued_bytes_per_scan": "55.9M",
        "issued": "55.9M"
      },
      "vdevs": {
        "testpool": {
          "name": "testpool",
          "vdev_type": "root",
          "guid": "10305967396160717712",
          "class": "normal",
          "state": "ONLINE",
          "alloc_space": "56.0M",
          "total_space": "112M",
          "def_space": "112M",
          "read_errors": "0",
          "write_errors": "0",
          "checksum_errors": "0",
          "vdevs": {
            "/tmp/zfs.img": {
              "name": "/tmp/zfs.img",
              "vdev_type": "file",
              "guid": "1719526601577822810",
              "path": "/tmp/zfs.img",
              "class": "normal",
              "state": "ONLINE",
              "alloc_space": "56.0M",
              "total_space": "112M",
              "def_space": "112M",
              "rep_dev_size": "116M",
              "self_healed": "1.50K",
              "phys_space": "128M",
              "read_errors": "0",
              "write_errors": "0",
              "checksum_errors": "27",
              "slow_ios": "0"
            }
          }
        }
      },
      "error_count": "12",
      "errlist": [
        {
          "path": "/testpool/randfile.7",
          "level": 0,
          "record": 3
        },
        {
          "path": "/testpool/randfile.7",
          "level": 0,
          "record": 9
        },
        {
          "path": "/testpool/randfile.7",
          "level": 0,
          "record": 16
        },
        {
          "path": "/testpool/randfile.9",
          "level": 0,
          "record": 8
        },
        {
          "path": "/testpool/randfile.9",
          "level": 0,
          "record": 15
        },
        {
          "path": "/testpool/randfile.10",
          "level": 0,
          "record": 3
        },
        {
          "path": "/testpool/randfile.10",
          "level": 0,
          "record": 11
        },
        {
          "path": "/testpool/randfile.5",
          "level": 0,
          "record": 17
        },
        {
          "path": "/testpool/randfile.8",
          "level": 0,
          "record": 11
        },
        {
          "path": "/testpool/randfile.8",
          "level": 0,
          "record": 19
        },
        {
          "path": "/testpool/randfile.6",
          "level": 0,
          "record": 3
        },
        {
          "path": "/testpool/randfile.6",
          "level": 0,
          "record": 12
        }
      ]
    }
  }
}

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


@gamanakis (Contributor) commented:

On a first pass it looks good to me, thanks! Though I'm not really sure why the checks are failing. Could you squash and re-push?

@asomers (Contributor Author) commented Jul 20, 2025

Well, I think the "checkstyle" check is failing because I didn't update libzfs.abi. But I can find no instructions for how to do that. @ixhamza you were the last to do it. Could you please tell me how to update libzfs.abi due to a function prototype change?

@gmelikov (Member) commented Jul 20, 2025

In addition to the ABI issue, I see:

cmd/zpool/zpool_main.c: In function ‘errors_nvlist’:
cmd/zpool/zpool_main.c:9590:41: error: ‘errnvl’ may be used uninitialized [-Werror=maybe-uninitialized]
 9590 |                                         fnvlist_add_nvlist_array(item,
      |                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 9591 |                                             "errlist",
      |                                             ~~~~~~~~~~
 9592 |                                             (const nvlist_t **)errnvl,
      |                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~
 9593 |                                             count);
      |                                             ~~~~~~
cmd/zpool/zpool_main.c:9532:44: note: ‘errnvl’ was declared here
 9532 |                                 nvlist_t **errnvl;
      |                                            ^~~~~~
cc1: all warnings being treated as errors

You can get the new ABI here: https://github.com/openzfs/zfs/actions/runs/16008660282 (see the artifact; direct link: https://github.com/openzfs/zfs/actions/runs/16008660282/artifacts/3443942581r ).
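
For reference, this class of warning is usually silenced by giving the pointer a definite value on every path. A generic reduction, with hypothetical names and not the PR's actual fix, looks like:

#include <stdlib.h>

/*
 * Hypothetical reduction of the -Wmaybe-uninitialized warning above:
 * 'arr' is only assigned on one branch, so GCC cannot prove it is
 * initialized at the later use.  Initializing it to NULL and guarding
 * the use silences the warning without changing behavior.
 */
void
build_list(int verbose, size_t count)
{
	char **arr = NULL;	/* was: char **arr; (set on one branch only) */

	if (verbose)
		arr = calloc(count, sizeof (char *));

	if (arr != NULL) {
		/* ... populate and consume arr ... */
		free(arr);
	}
}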

@amotin (Member) left a comment:

I have no critical objections, but there are a few things I would do differently. Also, please rebase it onto the latest master and clean up the commit history.

Comment on lines 9559 to 9653
if (cb->cb_verbosity < 2) {
	errl[i] = safe_malloc(len);
	zpool_obj_to_path(zhp, dsobj,
	    obj, errl[i++], len);
} else {
	uint64_t lvl, blkid;

	errnvl[i] = fnvlist_alloc();
	lvl = fnvlist_lookup_uint64(nv,
	    ZPOOL_ERR_LEVEL);
	blkid = fnvlist_lookup_uint64(
	    nv, ZPOOL_ERR_BLKID);
	zpool_obj_to_path(zhp, dsobj,
	    obj, pathbuf, len);
	fnvlist_add_string(errnvl[i],
	    "path", pathbuf);
	fnvlist_add_uint64(errnvl[i],
	    "level", lvl);
	fnvlist_add_uint64(errnvl[i++],
	    "record", blkid);
}
@amotin (Member):

Couldn't we always do it the verbose way here to simplify the code, and just not print the additional information later? It does not look like a very performance-sensitive code path.

@asomers (Contributor Author):

Not really, because if verbosity is < 2, then nverrlist won't contain the "level" and "blockid" fields. And I don't want zpool_get_errlog to always supply those fields, because it could result in enormous nvlists if a single file has many corrupt records.

@behlendorf (Contributor) commented Aug 13, 2025:

enormous nvlists if a single file has many corrupt records.

Along these same lines I'm a bit concerned with logging a line per block. That could be overwhelming.

Thinking about this from a user perspective, I really don't care about how ZFS decided to internally lay out the file (objid, level, blkid). What is useful to me are the file offsets which are corrupt. That's a little more work to generate but shouldn't be too bad. Maybe something like:

errors: Permanent errors have been detected in the following files:

        /testpool/randfile.7 393216-524288,1048576-1179647,1966080-2097151
        /testpool/randfile.5 917504-1048575
        ...

@amotin (Member) commented Aug 14, 2025:

And I don't want zpool_get_errlog to always supply those fields

I didn't mean to always supply all the ranges, merely to use the same nvlist-based data structure, just with one vs. many entries per file, if that would make the code cleaner. But if not, I won't insist.

@behlendorf added the "Status: Code Review Needed" and "Status: Revision Needed" labels on Aug 6, 2025.
@github-actions bot removed the "Status: Revision Needed" label on Aug 12, 2025.
@asomers requested a review from @amotin on August 12, 2025, 19:46.
@asomers (Contributor Author) commented Aug 12, 2025

I applied your suggestions, rebased, and squashed, @amotin.

@amotin (Member) commented Aug 12, 2025

@asomers You seem to have addressed 1.5 of my comments. What about the other 1.5?

@asomers (Contributor Author) commented Aug 13, 2025

@asomers You seem to have addressed 1.5 of my comments. What about the other 1.5?

Do you mean the comment about "Couldn't we always do it the verbose way here to simplify the code"? I explained why I thought that would be a bad idea. Or do you mean that I didn't replace one safe_malloc call with calloc? I thought it better to stick with safe_malloc so I wouldn't need to add extra error handling.

@amotin (Member) commented Aug 13, 2025

Do you mean the comment about "Couldn't we always do it the verbose way here to simplify the code"? I explained why I thought that would be a bad idea.

I'm sorry if I lost it in context switches, but I don't remember. Could you point out where?

Or do you mean that I didn't replace one safe_malloc call with calloc? I thought it better to stick with safe_malloc so I wouldn't need to add extra error handling.

That one too. You've added error checks for the two calloc() cases, while relying on safe_malloc() in the third case, which seems to do exactly the same thing. Why not use safe_malloc() in all three places and let it handle things? Sure, calloc() might look nicer for arrays, but do we really care?

@asomers (Contributor Author) commented Aug 13, 2025

Do you mean the comment about "Couldn't we always do it the verbose way here to simplify the code"? I explained why I thought that would be a bad idea.

I'm sorry if I lost it in context switches, but I don't remember. Could you point out where?

I was referring to #17502 (comment).

Or do you mean that I didn't replace one safe_malloc call with calloc? I thought it better to stick with safe_malloc so I wouldn't need to add extra error handling.

That one too. You've added error checks for the two calloc() cases, while relying on safe_malloc() in the third case, which seems to do exactly the same thing. Why not use safe_malloc() in all three places and let it handle things? Sure, calloc() might look nicer for arrays, but do we really care?

Yes, exactly. I chose to use calloc precisely because I was allocating an array. I think it's best to always use calloc for arrays, not just because it looks nicer, but because it protects against overflow in the size multiplication. Do you really want me to change it?
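
For what it's worth, the overflow argument is easy to show in isolation (generic C, not code from this PR):

#include <stdint.h>
#include <stdlib.h>

/*
 * malloc(nelem * elsize) silently wraps if the product overflows SIZE_MAX
 * and returns a too-small buffer; calloc() is required to detect the
 * overflow and fail instead.  The explicit check below is what calloc()
 * effectively does for you.
 */
void *
alloc_array(size_t nelem, size_t elsize)
{
	if (nelem != 0 && elsize > SIZE_MAX / nelem)
		return (NULL);			/* nelem * elsize would overflow */
	return (calloc(nelem, elsize));		/* zeroed and overflow-checked */
}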

@amotin (Member) commented Aug 13, 2025

I was referring to #17502 (comment).

Right. I see my question, but not your response. Did you send it? ;)

Do you really want me to change it?

No. I won't fight over it.

@asomers (Contributor Author) commented Aug 13, 2025

I was referring to #17502 (comment).

Right. I see my question, but not your response. Did you send it? ;)

Ahh, that's it. It was stuck in the "Pending" state. It's posted now.

errors_nvlist(zpool_handle_t *zhp, status_cbdata_t *cb, nvlist_t *item)
{
	uint64_t nerr;
	int verbosity = cb->cb_verbosity;
Contributor:

We should use cb->cb_verbosity throughout and remove the local variable.

@asomers (Contributor Author):

The only reason I created the local variable is so I didn't have to split a long line at 80 columns. IMHO it looks better this way. But I'll change it if you want me to.

Contributor:

I'd just like to make sure we're using either the local variable or cb->cb_verbosity consistently in this function. I don't feel strongly about which one, so if you want to stick with the local variable we should update the other places it's used.

@asomers (Contributor Author) commented Aug 18, 2025

Along these same lines I'm a bit concerned with logging a line per block. That could be overwhelming.

Me too. That's why I don't want it to be the default, but only opted into with "-vv".

Thinking about this from a user perspective, I really don't care about how ZFS decided to internally lay out the file (objid, level, blkid). What is useful to me are the file offsets which are corrupt. That's a little more work to generate but shouldn't be too bad.

I have two thoughts about this:

  • While it may not be for everyone, I actually do find the level and blkid to be useful when I'm trying to recover from the damage caused by #16626 ("Occasional panics with 'blkptr at XXX has invalid YYY'").
  • In order to switch the display to byte offsets I would need to know the recsize of the object, not just the dataset. There's no ioctl to get that for a given object id. I could use st_blksize, if zpool_obj_to_path succeeds (see the sketch below). But if it does not, then I don't know of any way to get the object's recsize. Do you?
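
A hypothetical illustration of that second point (invented helper name, not code from the PR): when path resolution succeeds, st_blksize gives the record size and an L0 blkid maps directly to a byte range.

#include <sys/stat.h>
#include <stdint.h>

/*
 * Hypothetical helper: if zpool_obj_to_path() produced a usable path,
 * stat() the file and use st_blksize as the record size.  An L0 block id
 * then covers bytes [blkid * blksize, (blkid + 1) * blksize - 1].
 */
int
l0_blkid_to_byte_range(const char *path, uint64_t blkid,
    uint64_t *start, uint64_t *end)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return (-1);	/* no path, no record size */

	*start = blkid * (uint64_t)st.st_blksize;
	*end = *start + (uint64_t)st.st_blksize - 1;
	return (0);
}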

@behlendorf (Contributor) commented:

That's fair. Yeah, now that you point it out I don't see a great solution for getting the record size. You could imagine either extending or adding a new ioctl interface, but that's more complexity and compatibility code I'd really prefer to avoid. Reporting block IDs it is. Perhaps then something just a little more concise?

        /testpool/randfile.7 L0=3-4,7,10 L1=1

* Use a local variable more consistently
* Condense error reports into runs of contiguous blocks
@asomers (Contributor Author) commented Aug 20, 2025

@behlendorf with the latest push, error reports look like this:

errors: Permanent errors have been detected in the following files:

        /testpool/tmp/randfile.6 L0=0-2,L0=4
        /testpool/tmp/randfile.5 L0=0-7,L0=22-62
        ...
        /testpool2/randfile L1=5

Combining runs of contiguous blocks is probably good. But I'm not sure that I like combining discontiguous runs onto a single line. That means a file with many discontiguous errors could end up being printed as an extremely long line.
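
For reference, collapsing a sorted list of block IDs into contiguous runs is a small amount of code. A generic sketch (not the PR's implementation) follows.

#include <stdint.h>
#include <stdio.h>

/*
 * Generic sketch: print a sorted array of block ids as comma-separated
 * runs, e.g. {0, 1, 2, 4} -> "0-2,4".  Per-level prefixes such as "L0="
 * would be layered on top of the same idea.
 */
void
print_runs(const uint64_t *blkids, size_t n)
{
	for (size_t i = 0; i < n; ) {
		size_t j = i;

		/* Extend the run while the next id is consecutive. */
		while (j + 1 < n && blkids[j + 1] == blkids[j] + 1)
			j++;
		if (i > 0)
			(void) printf(",");
		if (j == i)
			(void) printf("%llu", (unsigned long long)blkids[i]);
		else
			(void) printf("%llu-%llu",
			    (unsigned long long)blkids[i],
			    (unsigned long long)blkids[j]);
		i = j + 1;
	}
	(void) printf("\n");
}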

@asomers requested a review from @behlendorf on August 20, 2025, 23:29.
@behlendorf (Contributor) commented Aug 22, 2025

@asomers I finally realized why this felt familiar. PR #9781 was working on adding exactly this same functionality, but it unfortunately ended up stalling out. #9781 is unsurprisingly very similar to yours, but it has a few additions we should incorporate.

For example, I think the output format they settled on is quite nice. It collapses contiguous ranges, prints the byte offsets of each range, and even gives a nice summary of the damaged blocks. Here's the example output:

errors: Permanent errors have been detected in the following files:

    /var/tmp/testdir/10m_file: found 9 corrupted 128K blocks
       [0x0-0x1ffff] (128K)
       [0x100000-0x1fffff] (1M)

    /var/tmp/testdir/1m_file: found 1 corrupted 128K block
       [0x0-0x1ffff] (128K)

The original PR extends the ZFS_IOC_OBJ_TO_STATS ioctl to accomplish this. We can't do exactly that because it's one of the legacy ioctl interfaces, and we don't want to break the user/kernel ioctl ABI by adding fields to zfs_stat_t, which is embedded in zfs_cmd_t. But we could register a new ioctl that uses the modern in/out nvlists and use that.

@tonyhutter (Contributor) commented:

@behlendorf whoa, I totally forgot about that old PR (which, ironically, is itself an updated version of an even older PR, #8902).

@asomers feel free to revive whatever you want from that PR. I remember the range tree being a nice way to collapse error ranges. If you do use bits from the old PR, please credit the original author: TulsiJain <tulsi.jain@delphix.com>.

Along with that, since we now support JSON output (zpool status --json), you'll want to add the JSON-ified versions of the error ranges.

@asomers (Contributor Author) commented Sep 24, 2025

@tonyhutter I forgot to mention that the PR as-is already works with JSON output. It looks like this:

      "error_count": "2",
      "errlist": [
        {
          "path": "POOL/DATASET@SNAPSHOT1:/FILE",
          "level": 0,
          "record": 867511
        },
        {
          "path": "POOL/DATASET@SNAPSHOT2:/FILE",
          "level": 0,
          "record": 867511
        },
