SALMONN family: A suite of advanced multi-modal LLMs
Updated Sep 28, 2025
video-SALMONN 2 is an audio-visual large language model (LLM) that generates high-quality audio-visual captions for videos. It was developed by the Department of Electronic Engineering at Tsinghua University and ByteDance.