Add Wan2.2-Animate: Unified Character Animation and Replacement with Holistic Replication #12442
Conversation
- Introduced `WanAnimateTransformer3DModel` and `WanAnimatePipeline`.
- Updated `get_transformer_config` to handle the new model type.
- Modified `convert_transformer` to instantiate the correct transformer based on the model type (a rough sketch follows below).
- Adjusted the main execution logic to accommodate the new Animate model type.
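For context, a minimal sketch of what this kind of model-type dispatch can look like; the `"animate"` key and the function signature here are assumptions, not the conversion script's actual code:

```python
from diffusers import WanTransformer3DModel


def convert_transformer(model_type: str, config: dict):
    # Dispatch on the model type string; "animate" is a placeholder key.
    if model_type == "animate":
        # New class introduced by this PR.
        from diffusers import WanAnimateTransformer3DModel

        return WanAnimateTransformer3DModel(**config)
    return WanTransformer3DModel(**config)
```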
…prove error handling for undefined parameters
…work for character animation and replacement
- Added the Wan 2.2 Animate 14B model to the documentation.
- Introduced the Wan-Animate framework, detailing its capabilities for character animation and replacement.
- Included example usage for the `WanAnimatePipeline` with preprocessing steps and guidance on input requirements.
- Introduced `WanAnimateGGUFSingleFileTests` to validate functionality.
- Added dummy input generation for testing model behavior (illustrated below).
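As a rough illustration of the dummy-input idea (shapes and argument names below are placeholders, not the values actually used by `WanAnimateGGUFSingleFileTests`):

```python
import torch


def get_dummy_inputs(device, dtype=torch.float16):
    # Tiny tensors keep the test fast; real shapes depend on the model config.
    return {
        "hidden_states": torch.randn(1, 16, 5, 8, 8, device=device, dtype=dtype),
        "encoder_hidden_states": torch.randn(1, 77, 4096, device=device, dtype=dtype),
        "timestep": torch.tensor([500], device=device),
    }
```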
- Introduced `EncoderApp`, `Encoder`, `Direction`, `Synthesis`, and `Generator` classes for enhanced motion and appearance encoding.
- Added `FaceEncoder`, `FaceBlock`, and `FaceAdapter` classes to integrate facial motion processing (sketched below).
- Updated `WanTimeTextImageMotionEmbedding` to utilize the new `Generator` for motion embedding.
- Enhanced `WanAnimateTransformer3DModel` with an additional face adapter and pose patch embedding for improved model functionality.
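To illustrate the face-injection idea only (this is a generic sketch, not the PR's `FaceBlock`; layer choices and naming are assumptions):

```python
import torch
import torch.nn as nn


class FaceBlockSketch(nn.Module):
    """Fuse face motion tokens into video hidden states via cross-attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, hidden_states: torch.Tensor, face_embeds: torch.Tensor) -> torch.Tensor:
        # Query: video tokens; key/value: face motion tokens; residual add.
        attn_out, _ = self.cross_attn(self.norm(hidden_states), face_embeds, face_embeds)
        return hidden_states + attn_out
```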
- Introduced a `pad_video` method to handle padding of video frames to a target length (see the sketch below).
- Updated the video processing logic to utilize the new padding method for `pose_video` and `face_video`, and conditionally for `background_video` and `mask_video`.
- Ensured compatibility with existing preprocessing steps for video inputs.
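A minimal sketch of the padding idea, assuming a `(frames, channels, height, width)` layout and last-frame repetition (whether the actual method repeats frames or zero-pads is an assumption here):

```python
import torch


def pad_video(video: torch.Tensor, target_length: int) -> torch.Tensor:
    # Truncate if too long; otherwise repeat the last frame until the
    # video reaches `target_length` frames.
    num_frames = video.shape[0]
    if num_frames >= target_length:
        return video[:target_length]
    pad = video[-1:].repeat(target_length - num_frames, 1, 1, 1)
    return torch.cat([video, pad], dim=0)
```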
…roved video processing
- Added optional parameters to the `prepare_latents` method: `conditioning_pixel_values`, `refer_pixel_values`, `refer_t_pixel_values`, `bg_pixel_values`, and `mask_pixel_values` (signature sketched below).
- Updated the logic in the denoising loop to accommodate the new parameters, enhancing the flexibility and functionality of the pipeline.
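The extended signature might look roughly like this; the argument order, defaults, and the surrounding arguments are assumptions based on other Wan pipelines:

```python
from typing import Optional

import torch


def prepare_latents(
    self,
    batch_size: int,
    num_channels_latents: int,
    height: int,
    width: int,
    num_frames: int,
    dtype: torch.dtype,
    device: torch.device,
    generator: Optional[torch.Generator] = None,
    latents: Optional[torch.Tensor] = None,
    # New optional conditioning inputs described in this commit:
    conditioning_pixel_values: Optional[torch.Tensor] = None,
    refer_pixel_values: Optional[torch.Tensor] = None,
    refer_t_pixel_values: Optional[torch.Tensor] = None,
    bg_pixel_values: Optional[torch.Tensor] = None,
    mask_pixel_values: Optional[torch.Tensor] = None,
):
    ...
```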
…eneration
- Updated the calculation of `num_latent_frames` and adjusted the shape of latent tensors to accommodate changes in frame processing (see the bookkeeping sketch below).
- Enhanced the `get_i2v_mask` method for better mask generation, ensuring compatibility with the new tensor shapes.
- Improved handling of pixel values and device management for better performance and clarity in the video processing pipeline.
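For reference, the standard temporal bookkeeping in Wan-style pipelines; assuming this PR follows the same 4x temporal VAE compression (the constant is an assumption):

```python
# The Wan VAE compresses time by a factor of 4 while keeping the first
# frame, so e.g. 81 input frames map to 21 latent frames.
vae_scale_factor_temporal = 4
num_frames = 81
num_latent_frames = (num_frames - 1) // vae_scale_factor_temporal + 1
assert num_latent_frames == 21
```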
…and mask generation
- Consolidated the handling of `pose_latents_no_ref` to improve clarity and efficiency in latent tensor calculations.
- Updated the `get_i2v_mask` method to accept a batch size and adjusted tensor shapes accordingly for better compatibility (illustrated below).
- Enhanced the logic for mask pixel values in replacement mode, ensuring consistent processing across different scenarios.
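An illustrative i2v-style mask builder, now taking a batch size as this commit describes; the exact shape convention and which frames are marked are assumptions:

```python
import torch


def get_i2v_mask(batch_size, num_frames, height, width, cond_frames=1, device="cpu"):
    # Ones over conditioned frames (e.g. the reference frame), zeros elsewhere.
    mask = torch.zeros(batch_size, 1, num_frames, height, width, device=device)
    mask[:, :, :cond_frames] = 1.0
    return mask
```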
…nced processing
- Introduced custom QR decomposition and fused leaky ReLU functions for improved tensor operations (native fallback sketched below).
- Implemented upsampling and downsampling functions with native support for better performance.
- Added new classes for advanced neural network layers: `FusedLeakyReLU`, `Blur`, `ScaledLeakyReLU`, `EqualConv2d`, `EqualLinear`, and `RMSNorm`.
- Refactored the `EncoderApp`, `Generator`, and `FaceBlock` classes to integrate the new functionality and improve modularity.
- Updated the attention mechanism to utilize `dispatch_attention_fn` for enhanced flexibility in processing.
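These layers originate from StyleGAN2, whose fused bias + leaky ReLU has a well-known native fallback; a sketch follows (treating the `sqrt(2)` rescale as this PR's exact choice is an assumption):

```python
import math

import torch
import torch.nn.functional as F


def fused_leaky_relu(
    x: torch.Tensor,
    bias: torch.Tensor,
    negative_slope: float = 0.2,
    scale: float = math.sqrt(2),
) -> torch.Tensor:
    # Add the per-channel bias, apply leaky ReLU, then rescale to preserve
    # activation variance (the StyleGAN2 convention).
    bias = bias.view(1, -1, *([1] * (x.ndim - 2)))
    return F.leaky_relu(x + bias, negative_slope) * scale
```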
Hello, thank you for your contribution! I'm very interested in this project and would like to ask how it's progressing so far. I'm trying it out myself as well and would love to keep up with your progress.
Hi @a-free-a. This PR will be my top priority in 1/1.5/2 days. I estimate this PR will be completed in about 10 days (or less).
@tolgacangoz Wow, this is such exciting news — I'm really looking forward to it! I'm still a beginner and trying to get a better understanding of your work. Would you happen to have any tutorials or resources you could recommend to help me improve?
I think you can examine/study previously merged PRs, either about pipelines or any other kind. This site seems really good in terms of theory and its implementations: https://nn.labml.ai. After studying, you could try to work on
@tolgacangoz Thanks a lot for the tips and the link! Really appreciate it. Hope everything goes great with your project!
This PR fixes #12441.
Project Page: https://humanaigc.github.io/wan-animate/
TODOs:
- `WanAnimatePipeline`
- `WanAnimateTransformer3DModel`: Since `diffusers` doesn't like too many abstractions, I am removing and merging several small classes and functions, etc.
- Comparison with the original repo

Try `WanAnimatePipeline`!
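For anyone who wants to experiment early, here is a rough usage sketch modeled on the existing Wan pipelines; the checkpoint id and the call signature (`image`, `pose_video`, `face_video`, etc.) are assumptions until the PR is finalized:

```python
import torch
from diffusers import WanAnimatePipeline
from diffusers.utils import export_to_video, load_image, load_video

# Model id and argument names below are placeholders based on other Wan
# pipelines; the final API in this PR may differ.
pipe = WanAnimatePipeline.from_pretrained(
    "Wan-AI/Wan2.2-Animate-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("reference_character.png")
pose_video = load_video("preprocessed_pose.mp4")  # output of the preprocessing step
face_video = load_video("preprocessed_face.mp4")  # output of the preprocessing step

frames = pipe(
    image=image,
    pose_video=pose_video,
    face_video=face_video,
    prompt="a person dancing in a studio",
    num_frames=77,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "animate.mp4", fps=16)
```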
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.