maunzikation: Blue-haired person in front of a wall with colorful glow in the dark galaxy-things and sheep. (Default)
So this is not new, I just remembered a few days ago that it exists. I used to read a lot of sentences for it two-or-so years ago!

Mozilla Common Voice is, uh, an attempt to get lots of public domain speech data in as many languages as possible to make speech recognition stuff for cool things, hopefully.

I like its very practical approach. As with many language-related things, practicality makes for a refreshing and surprising amount of inclusivity and non-shittiness. Record in a language that you learned later in life? Definitely, speech recognition needs to be able to understand you! Record with terrible audio quality? Go for it, a voice assistant is no use if you need a studio setup because it was trained on folks who want audio to sound pretty.

And what about the corpus? Won't it be terribly boring? Usually not, thanks to the sentence collector! Anyone can add sentences to be read, and these can contain as many queer words as you want, hint hint.

Well, to be honest, a lot of the boring and overcomplicated sentences come from the collector too, at least in German. But yeah. I can and will throw my old diary entries in there too, albeit heavily edited to make the sentences 14 words long at most. (You won't believe how long my diary sentences often are.)
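For the diary-editing part, a tiny helper can at least tell me which sentences are still too long. This is only a sketch: the 14-word limit is the one mentioned above, but counting words by splitting on whitespace is my assumption, and the sentence collector may count differently or apply other rules. The names `MAX_WORDS`, `word_count` and `split_by_length` are made up for this example.

```python
MAX_WORDS = 14  # limit mentioned in the post; the collector's real rules may differ


def word_count(sentence: str) -> int:
    """Count words by splitting on whitespace (an assumption, not the collector's rule)."""
    return len(sentence.split())


def split_by_length(sentences, max_words=MAX_WORDS):
    """Return (ok, too_long): sentences within the limit and sentences over it."""
    ok, too_long = [], []
    for sentence in sentences:
        if word_count(sentence) <= max_words:
            ok.append(sentence)
        else:
            too_long.append(sentence)
    return ok, too_long


# Made-up diary sentences, just for illustration.
diary = [
    "Today the cat sat on my keyboard again.",
    "I wanted to write something short but then I kept going and going and going "
    "until the sentence was far too long.",
]

ok, too_long = split_by_length(diary)
for sentence in too_long:
    print(f"Needs trimming ({word_count(sentence)} words): {sentence}")
```

Anything that lands in `too_long` still needs editing by hand; the script only does the counting.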

And, what I only fully realised today: All the sentences are here on GitHub, freely available for any kind of, uh, art or mocking analysis or whatever a cat might feel like on a given day.

catship
