Labels: feature request, keep-open, multi-modality (#4194)
🚀 The feature, motivation and pitch
This issue keeps track of recurring Whisper feature requests, together with any linked ongoing efforts to support them.
When a feature request has no linked PR, feel free to claim the work here if you want to help!
- Support different `response_format` values (https://platform.openai.com/docs/api-reference/audio/createTranscription); see the client-side sketch after this list.
  - Related issues: [Usage]: How to let Whisper return timestamps in transcript? #19556, [Usage]: Vllm whisper model response_format verbose_json not working #14818, [Feature]: Implement SRT generation for audio transcription in vLLM #24302
  - PR(s): [Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation #24209 (`verbose_json`, help needed with other formats)
- Support timestamp granularities:
  - Context: Very much related to the above. Unfortunately, outputting by `word` requires aligning encoder latents (usually extrapolated from the cross-attention layers) with decoder ones. I feel a lot of these Whisper-specific techniques bring added complexity to vLLM. However, I think we're open to exploring this direction if we can come up with a less invasive solution. Some references to get started: https://github.com/m-bain/whisperX (WhisperX: word-level timestamps, diarization, batched inference within a file) and openai/whisper#684.
- Automatic language detection (see the masking sketch after this list):
  - Context: one should be able to let the decoder predict part of its "preamble" prompt, including the language and task tokens, conditioned on the encoder output. This is effectively using Whisper's built-in automatic language detection feature. Note that it would be ideal to constrain the output tokens to valid language tokens. Accuracy remains to be evaluated.
  - Related issues: [Feature]: will whisper add language detection? #14174
- Beam search:
  - PR(s): [WIP][Whisper] beam search for whisper #13758; this one needs reviving, so feel free to claim the work here.
- Feed previous chunk context to improve accuracy (see the chunk-chaining sketch after this list):
  - PR(s): [Frontend] add previous context to whisper transcription over 30s audio #20249 (first attempt, needs to be redone)
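
For reference, here is a minimal client-side sketch of what the `response_format` / `timestamp_granularities` requests would look like once supported, using the official `openai` Python SDK pointed at a vLLM OpenAI-compatible server. The base URL, API key, model name, and audio file below are placeholders; `timestamp_granularities` follows the OpenAI spec and only applies together with `response_format="verbose_json"`.

```python
# Client-side sketch (not yet fully supported server-side; see the PRs above).
# base_url, api_key, model name, and audio path are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM OpenAI-compatible server
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

with open("sample.wav", "rb") as audio:
    transcription = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=audio,
        response_format="verbose_json",               # also: json, text, srt, vtt
        timestamp_granularities=["segment", "word"],  # verbose_json only, per the OpenAI spec
    )

# verbose_json responses carry per-segment (and, if requested, per-word) timestamps.
print(transcription.segments)
```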
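
On the language-detection item, here is a minimal sketch of the "constrain the output tokens to valid language tokens" idea, written in plain PyTorch with stand-in tensors (no vLLM internals assumed): the decoder logits at the step right after `<|startoftranscript|>` are masked down to the language tokens, and the softmax over that restricted set doubles as a confidence estimate. The vocabulary size and token IDs below are illustrative, not tied to a specific tokenizer version.

```python
# Conceptual sketch of guided language detection: restrict the first decoded
# token to Whisper's language tokens. All tensors here are dummies; in a real
# integration the logits would come from one decoder forward pass conditioned
# on the encoder output.
import torch

vocab_size = 51866                                        # illustrative Whisper vocab size
language_token_ids = torch.tensor([50259, 50260, 50261])  # e.g. <|en|>, <|zh|>, <|de|> (illustrative IDs)

logits = torch.randn(vocab_size)  # stand-in for decoder logits after <|startoftranscript|>

# Mask everything except the language tokens, then renormalize.
masked = torch.full_like(logits, float("-inf"))
masked[language_token_ids] = logits[language_token_ids]
language_probs = torch.softmax(masked, dim=-1)[language_token_ids]

best = language_token_ids[language_probs.argmax()]
print(f"predicted language token id: {best.item()}, confidence: {language_probs.max().item():.2f}")
```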
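
For the previous-context item, the OpenAI transcription API already defines a `prompt` field for exactly this purpose, so the client side could look like the sketch below once the server honors it. The base URL, model name, and chunk file names are placeholders, and the audio splitting itself is left out.

```python
# Sketch: chain >30s audio by passing each chunk's transcript as the prompt
# for the next chunk, mirroring Whisper's condition-on-previous-text behaviour.
# base_url, model name, and chunk file names are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chunk_paths = ["chunk_000.wav", "chunk_001.wav", "chunk_002.wav"]
previous_text = ""
full_transcript = []

for path in chunk_paths:
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="openai/whisper-large-v3",
            file=audio,
            prompt=previous_text,  # previous chunk's text conditions this one
        )
    # The default json response carries the transcript in `.text`.
    previous_text = result.text
    full_transcript.append(previous_text)

print(" ".join(full_transcript))
```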
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.