Speech Recognition
Models for transcribing, translating, and understanding spoken audio.
Speech Synthesis · Independent Researcher
A Broad Benchmark for Long-Form Speech Generation turns long-form speech generation into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Brain Decoding · Meta AI
Brain2Qwerty decodes typed sentences from non-invasive brain recordings: MEG reaches a 32% character error rate (CER) on average, EEG trails at 67%, and the best participants reach 19%.
Multimodal Models · Skywork AI
The Audio Interaction Model runs a perceive-decide-respond loop so an audio LLM listens, decides if and when to reply, and answers on the fly. It is trained on StreamAudio-2M and is competitive across 8 benchmarks.
Speech Recognition · Shanghai AI Laboratory
Mega-ASR targets ASR's noise-robustness gap by synthesizing 2.4M clips across 54 compound acoustic scenarios, then training Qwen3-ASR-1.7B in two stages, cutting word error rate (WER) on VOiCES R4-B-F to 45.69% versus 54.01%.
Speech Recognition · OpenAI
OpenAI's Whisper trains a single sequence-to-sequence model on 680,000 hours of web audio. It matches fully supervised systems zero-shot, with no fine-tuning, and adds translation and language identification.