
What kernels should we integrate in Diffusers? #12375

@sayakpaul


Now that we have an integration with the kernels library to use Flash Attention 3 (FA3), which delivers a significant speedup on Hopper GPUs, it'd be nice to gauge community interest in which other kernels we should try to incorporate into the library through the kernels library.

I have done some work in the kernelize branch to see whether replacing GELU, SiLU, and RMSNorm with their optimized kernels yields any speedup on Flux. So far, it hasn't. Benchmarking script: https://gist.github.com/sayakpaul/35236dd96e15d9f7d658a7ad11918411. One can compare the changes here: https://github.com/huggingface/diffusers/compare/kernelize?expand=1.

Note

The changes in the kernelize branch are quite hacky as we're still evaluating things.

Please use this issue to let us know which kernels we should try to support in Diffusers. Some notes to keep in mind:

Cc: @MekkCyber

Labels: performance (anything related to performance improvements, profiling and benchmarking)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions