jsm on Nostr:
With the rise of really good transcription models, TTS that’s actually enjoyable to listen to, and LLMs that can carry a conversation and understand complex commands, why haven’t we seen an explosion of really good voice interfaces?
It seems obvious to me, but I’ve only seen Apple make a serious attempt with the latest Siri update. There are so many times when I’m doing something with my hands, driving, etc., and wish I could give commands to my RSS reader or just chat with an LLM that has arXiv and Wikipedia connected via RAG.
Published at
2024-09-22 13:43:14
Event JSON
{
"id": "a12ffe9735d73a10ee6480318e9541ed4dcb734b730cb0c4bcb3230a2b4a532e",
"pubkey": "e0339348ca6cac9708cd98e631e2f4baad534dfce870881b65aa57d30ff7253e",
"created_at": 1727012594,
"kind": 1,
"tags": [],
"content": "With the rise of really good transcription models, TTS that’s actually enjoyable to listen to, and LLMs that can carry a conversation and understand complex commands, why haven’t we seen an explosion of really good voice interfaces?\n\nIt seems obvious to me but I’ve only seen Apple making a serious attempt with the latest Siri update. There are so many times that I’m doing something with my hands, driving, etc. and wish I could give commands to my RSS reader or just chat with an LLM that has the Arxiv and Wikipedia connected with RAG.",
"sig": "4534bb1767505b491079e64badece34a5bfbb7477b9fa49f22ff34d6efb4dff7881b7db12d57f3cb6bb7e7e909ddccd57dc74ad28951d5dabdf9be20c394b29e"
}