AssemblyAI's Universal-2 is, according to Artificial Analysis, the best speech-to-text model available by Word Error Rate. From my experiments with it, it also seems to hallucinate way less. It would be awesome if we could add it as a voice model using our own API key.
Edit: link to the post about Universal-2: https://www.assemblyai.com/research/universal-2