I am excited about Mozilla Common Voice
Jan. 3rd, 2021 02:15 pm
So this is not new, I simply remembered that it exists a few days ago. Used to read a lot of sentences for it two-or-so years ago!
Mozilla Common Voice is, uh, an attempt to get lots of public domain speech data in as many languages as possible to make speech recognition stuff for cool things, hopefully.
I like its very practical approach. As with many language-related things, practicality makes for a refreshing and surprising amount of inclusivity and non-shittiness. Record in a language that you learned later in life? Definitely, speech recognition needs to be able to understand you! Record with terrible audio quality? Go for it, a voice assistant is no use if you need a studio setup because it was trained on folks who want audio to sound pretty.
And what about the corpus? Won't it be terribly boring? Usually not, thanks to the sentence collector! Anyone can add sentences to be read, and these can contain as many queer words as you want, hint hint.
Well, to be honest, a lot of the boring and overcomplicated sentences come from the collector too, at least in German. But yeah. I can and will throw my old diary entries in there too, albeit heavily edited to make the sentences 14 words long at most. (You won't believe how long my diary sentences often are.)
And, what I only fully realised today: All the sentences are here on GitHub, freely available for any kind of, uh, art or mocking analysis or whatever a cat might feel like on a given day.