
Add WanImage2ImagePipeline #12368

@vladmandic

Description


Wan-AI/Wan2.2-T2V-A14B, supported via WanPipeline, is not only one of the best video models, it is also widely popular as a text-to-image model
(essentially used to produce a single-frame video) thanks to its very photo-realistic output.

The ask is to extend support by adding a WanImage2ImagePipeline to enable image-to-image workflows.
The work should be limited to setting the timesteps and modifying the prepare_latents method to take input image and strength parameters, encode the input image, and add noise to it, the same as in pretty much any other image-to-image pipeline.
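To illustrate the requested behavior, here is a minimal, hedged sketch of how strength would select the timestep subset, mirroring the pattern used by existing image-to-image pipelines in diffusers (e.g. StableDiffusionImg2ImgPipeline.get_timesteps). The function name and signature are illustrative, not an actual WanPipeline API:

```python
# Hypothetical helper (not existing Wan API): given a full scheduler
# timestep list, keep only the final `strength` fraction of the schedule.
# strength=1.0 denoises from pure noise (ignores the input image);
# strength=0.3 starts from a lightly noised version of the input image.
def get_timesteps(timesteps, num_inference_steps, strength):
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    # Return the truncated schedule and the effective number of steps.
    return timesteps[t_start:], num_inference_steps - t_start
```

prepare_latents would then encode the input image with the VAE and call scheduler.add_noise with the first timestep of the truncated schedule, as in the other image-to-image pipelines.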

Note: this request is different from the input-image support in WanImageToVideoPipeline and WanVACEPipeline since

  • a) those pipelines work with different model weights, so the user cannot switch from text-to-image to image-to-image without a full model reload
  • b) they do not support denoising strength; the image is taken as-is, so in an image-to-image workflow the output would be identical to the input.

Bonus: also create a WanInpaintPipeline to allow image/mask combinations as input.
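For the inpainting bonus, the core addition over image-to-image would be per-step latent blending, as done in the existing inpaint pipelines: keep the (re-noised) original latents outside the mask and the model's prediction inside it. A minimal sketch with an assumed mask convention (1 = regenerate, 0 = keep):

```python
import numpy as np

# Illustrative blending step (not an existing Wan function): combine the
# denoised latents with the original image latents re-noised to the
# current timestep, using the inpaint mask.
def blend_latents(denoised, noised_original, mask):
    # Inside the mask take the model output; outside it, preserve the input.
    return mask * denoised + (1.0 - mask) * noised_original
```

This blend would run once per denoising step inside the pipeline loop, which is the same structure the image/mask pipelines for other model families use.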

cc @yiyixuxu @sayakpaul @a-r-r-o-w @DN6
