[CPU] Add Float8OpaqueTensor for dynamic float8 act float8 weight #2505
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2505
Note: links to docs will display an error until the docs builds have completed. ✅ No failures as of commit 5e75764 with merge base a951643. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Hi @chunyuan-w @mingfeima Could you please review this PR? Thanks.
Should we move the conversion vec code to this file? https://github.com/pytorch/pytorch/blob/cd995bfb2aac8891465809be3ce29543bd524287/aten/src/ATen/cpu/vec/vec512/vec512_float8.h Similar to this PR: pytorch/pytorch#152417
Thanks for the comment. If we move it to PyTorch, a problem might be that we need to check if the function is available at compile time. We may do it step by step, and for now it might be better to keep it here.
Pull Request Overview
Adds Float8OpaqueTensor for dynamic float8 activation and weight quantization on X86 CPU. This introduces a CPU-optimized tensor subclass that uses opaque memory layout for better performance on supported CPU ISAs.
- Adds Float8OpaqueTensor subclass with reordered memory layout for CPU optimization
- Implements two new CPU operators: float8_linear_prepack_cpu and float8_linear_cpu
- Extends Float8DynamicActivationFloat8WeightConfig to support opaque packing format
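For orientation, here is a minimal usage sketch of the new config surface. The import path of the enum and the `float8_packing_format` field name are assumptions based on the file summary and the review discussion below, not verbatim from the PR:

```python
import torch
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)
# Assumed import path, mirroring the new file added in this PR:
from torchao.quantization.quantize_.workflows.float8.float8_packing_format import (
    Float8PackingFormat,
)

model = torch.nn.Sequential(torch.nn.Linear(128, 256)).eval()
quantize_(
    model,
    Float8DynamicActivationFloat8WeightConfig(
        granularity=PerRow(),
        # OPAQUE selects the CPU-optimized reordered weight layout
        float8_packing_format=Float8PackingFormat.OPAQUE,
    ),
)
```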
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
float8_packing_format.py | Defines Float8PackingFormat enum with PLAIN and OPAQUE options |
float8_opaque_tensor.py | New Float8OpaqueTensor subclass implementation with CPU optimizations |
quant_api.py | Extends config to support opaque packing format and CPU device checks |
ops.py | Adds float8_linear_prepack_cpu and float8_linear_cpu operator definitions |
float8_linear.cpp | CPU kernel implementation for float8 linear operations |
observer.py | Adds PerGroup support to get_block_size function |
`__init__.py` files | Updates module exports |
test_float8_opaque_tensor.py | Comprehensive test suite for new functionality |
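For reference, a hedged sketch of what the `Float8PackingFormat` enum in float8_packing_format.py likely looks like; the actual definition may differ in detail:

```python
from enum import Enum

class Float8PackingFormat(str, Enum):
    # plain: the default layout used by the existing plain float8 tensor path
    PLAIN = "plain"
    # opaque: a CPU-specific reordered layout; the exact format is an internal
    # detail of the CPU kernels and not meant to be inspected by users
    OPAQUE = "opaque"
```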
Resolved review threads (outdated):
- torchao/quantization/quantize_/workflows/float8/float8_packing_format.py
- torchao/quantization/quantize_/workflows/float8/float8_opaque_tensor.py
…tensor.py (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
LGTM now, just some minor places to address.
Hi @jerryzh168 Could you please review this PR again? Thanks.
```diff
         block_size[granularity.axis] = 1
         return tuple(block_size)
-    elif isinstance(granularity, PerRow):
+    elif isinstance(granularity, (PerRow, PerToken)):
```
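For context, a simplified sketch of the block-size computation being discussed. This mirrors the behavior shown in the diff above, not the exact torchao implementation, which also handles other granularities such as PerAxis:

```python
from typing import Tuple

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor, PerToken

def get_block_size(input_shape: Tuple[int, ...], granularity) -> Tuple[int, ...]:
    if isinstance(granularity, PerTensor):
        # one scale for the whole tensor
        return input_shape
    elif isinstance(granularity, (PerRow, PerToken)):
        # one scale per row/token: the block spans only the last dimension
        return (1,) * (len(input_shape) - 1) + (input_shape[-1],)
    elif isinstance(granularity, PerGroup):
        # one scale per group of `group_size` elements along the last dimension
        return (1,) * (len(input_shape) - 1) + (granularity.group_size,)
    raise ValueError(f"Unsupported granularity: {granularity}")
```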
maybe create a separate PR to merge these two: `def get_block_size(` at ao/torchao/quantization/pt2e/observer.py (line 1783 in 18dbe87) and this one, and move it to torchao/quantization/utils?
seems like the pt2e one is not used that much, so should be relatively easy to remove
Sure. I will have another PR to do it.
```python
(act_granularity, weight_granularity) = _normalize_granularity(
    base_config.granularity
)
assert act_granularity == weight_granularity and isinstance(
```
split fake quant changes to a separate file?
Hi @jerryzh168 This file is about fake quant. Do you mean a separate file or a separate PR?
And this PR needs this change for fake quant because it moves the checks from inside `_normalize_granularity` out to the call sites. The same applies to similar changes elsewhere.
sorry, separate PR
torchao/quantization/quant_api.py (outdated):
```python
kernel_preference: KernelPreference = KernelPreference.AUTO
set_inductor_config: bool = True
version: int = 2
packing_format: Float8PackingFormat = Float8PackingFormat.PLAIN
```
nit: `packing_format` --> `float8_packing_format`
Thanks. Updated.
```python
    return x


class TestDynamicFloat8Linear(TestCase):
```
nit: `TestFloat8OpaqueTensor`?
@common_utils.parametrize("x_dim", [2, 3]) | ||
@common_utils.parametrize("bias", [True, False]) | ||
@common_utils.parametrize("bs", [1, 128]) | ||
def test_dynamic_float8_linear_per_tensor_cpu( |
this is per tensor activation? might be good to clarify
```python
with torch.no_grad():
    quantize_(
        m,
        get_config([PerRow(), PerGroup(group_size)]),
```
why are these tests not combined into the same one? seems all of them are very similar
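One possible consolidation, sketched under the assumption that the tests differ only in the granularity passed to `get_config`; `ToyLinearModel` and the other helper names are illustrative, not from the PR:

```python
@common_utils.parametrize(
    "granularity",
    [PerTensor(), PerRow(), [PerRow(), PerGroup(128)]],
)
@common_utils.parametrize("x_dim", [2, 3])
@common_utils.parametrize("bias", [True, False])
def test_dynamic_float8_linear_cpu(self, granularity, x_dim, bias):
    # ToyLinearModel and get_config are assumed helpers from this test suite
    m = ToyLinearModel(bias=bias).eval()
    with torch.no_grad():
        quantize_(m, get_config(granularity))
        # ... run the quantized model and compare against a reference
```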
```python
        ]
    ],
) -> Tuple[FP8Granularity, FP8Granularity]:
    supported_granularities = (PerTensor, PerRow, PerGroup)
```
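For readers following the thread, a hedged sketch of the normalization step under discussion. The behavior is inferred from the diff; the real function's signature and checks differ, and per-backend validation is what the reviewer is asking about:

```python
from typing import Tuple

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

def _normalize_granularity(granularity) -> Tuple:
    # None defaults to per-tensor; a single granularity is shared by
    # activation and weight; a pair is returned as-is. Per-backend checks
    # (e.g. what the CUDA vs. CPU kernels accept) live at the call sites.
    if granularity is None:
        granularity = PerTensor()
    if isinstance(granularity, (PerTensor, PerRow, PerGroup)):
        return (granularity, granularity)
    act_granularity, weight_granularity = granularity
    return (act_granularity, weight_granularity)
```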
is this only supported for CPU? I think we also rely on this in the cuda path, and it will be surprising if it says supported here but error out somewhere else
looks mostly fine, I'd suggest to split the PR into:
- the `get_block_size` changes
- adding a `normalize_granularity` for CPU in quantization/quantize_/workflows/float8/? (since CUDA doesn't support PerBlock yet)
- a PR to add the float8 linear op? (probably also needs some tests for the op itself)
- then the PR for Float8Tensor
Thanks. I will split it into multiple PRs.
We split this PR into the following smaller ones:
Closing this one.
Summary
This PR adds Float8OpaqueTensor for dynamic float8 activation and float8 weight quantization on X86 CPU. It adds:
- the Float8OpaqueTensor subclass
- the float8_linear_prepack_cpu and float8_linear_cpu ops
The kernel computes FP8 GEMM with the supported CPU ISAs.
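At the op level, the flow is roughly as follows. This is a sketch: the argument lists are assumptions inferred from the PR description, and the authoritative definitions live in ops.py and float8_linear.cpp in this PR:

```python
import torch

# Illustrative inputs (shapes are arbitrary); per-row weight scales
weight = torch.randn(256, 128)
weight_scale = weight.abs().amax(dim=1, keepdim=True) / torch.finfo(torch.float8_e4m3fn).max
fp8_weight = (weight / weight_scale).to(torch.float8_e4m3fn)
activation = torch.randn(4, 128)
bias = torch.randn(256)

# One-time prepack: reorder the float8 weight into the opaque CPU layout.
# (Assumed signature.)
packed_weight = torch.ops.torchao.float8_linear_prepack_cpu(fp8_weight, weight_scale)

# Per-call compute: dynamically quantize the activation to float8, then run
# the FP8 GEMM with the best kernel for the detected CPU ISA. (Assumed signature.)
out = torch.ops.torchao.float8_linear_cpu(activation, packed_weight, weight_scale, bias)
```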
Test plan