How we’re improving learning with anonymized speech data
https://blog.duolingo.com/how-were-improving-learning-with-anonymized-speech-data/
Last month, we began asking a subset of learners whether they are willing to share their recorded speech with us so that we can better understand their learning process. We only collect speech data from learners who have given their permission, and we anonymize that data to protect their privacy. Collecting and analyzing speech data will help us develop new features to help you improve your speaking skills, such as:
- Giving tips on pronunciation, word by word, sound by sound
- Picking speaking exercises that focus on areas where you need the most practice
- Grading beginners’ speech more leniently, to reduce frustration
- Improving how the app understands speech
Protecting our learners’ privacy is a top priority for Duolingo, so we’ve taken many steps to ensure the data we collect can never be tied to an individual learner.
As our first line of defense, we:
- Do not collect speech data with any uniquely identifying information (e.g. name, ID) or any information about when the data was received
- Do not store speech data from child users (see our privacy policy)
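To make this first line of defense concrete, here is a minimal sketch in Python. It is an illustration under our own assumptions, not Duolingo's actual pipeline: the `RawSubmission` and `AnonymousRecording` types, their field names, and the `anonymize` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RawSubmission:
    """Hypothetical shape of a speech submission before anonymization."""
    user_id: str
    user_name: str
    is_child: bool
    consented: bool
    received_at: float  # arrival timestamp (seconds since epoch)
    exercise_id: str
    audio: bytes


@dataclass(frozen=True)
class AnonymousRecording:
    """Hypothetical stored record: no name, no ID, no timestamp."""
    exercise_id: str
    audio: bytes


def anonymize(sub: RawSubmission) -> Optional[AnonymousRecording]:
    """Return a storable record, or None if nothing may be stored."""
    # Only learners who gave permission contribute data,
    # and speech from child users is never stored.
    if not sub.consented or sub.is_child:
        return None
    # Keep only the exercise and the audio; the identifying fields
    # and the arrival time are dropped.
    return AnonymousRecording(exercise_id=sub.exercise_id, audio=sub.audio)
```

The design choice in this sketch is that the stored type has no fields for identity or timing at all, so those values cannot be saved by accident.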
We also analyze speech data only in aggregate, never at an individual level. As our second line of defense, we:
- Only collect data from frequently used exercises, so that many learners generate speech for each one and no individual can be identified from a particular exercise
- Only access the data after enough has been collected that none can be tied back to any particular learning moment
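The two rules above amount to an allowlist plus a minimum-count gate. The Python sketch below shows one way such a gate could work. It is illustrative only: the `AggregateStore` class, its method names, and the `MIN_RECORDINGS` value are assumptions of ours, and the post does not state what threshold is actually used.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical threshold; the real value is not stated in this post.
MIN_RECORDINGS = 1000


class AggregateStore:
    """Holds recordings per exercise and releases them only in bulk."""

    def __init__(self, frequent_exercises: Iterable[str],
                 min_recordings: int = MIN_RECORDINGS) -> None:
        self._allowed = set(frequent_exercises)
        self._min = min_recordings
        self._by_exercise: Dict[str, List[bytes]] = defaultdict(list)

    def add(self, exercise_id: str, audio: bytes) -> bool:
        """Store a recording only if it comes from a frequently used exercise."""
        if exercise_id not in self._allowed:
            return False
        self._by_exercise[exercise_id].append(audio)
        return True

    def read(self, exercise_id: str) -> List[bytes]:
        """Return recordings only once enough have accumulated."""
        recordings = self._by_exercise.get(exercise_id, [])
        if len(recordings) < self._min:
            return []
        # Shuffle a copy so arrival order does not hint at a learning moment.
        return random.sample(recordings, len(recordings))
```

With `min_recordings=3`, for example, `read` returns an empty list after two recordings and the full shuffled batch after the third. Recordings from exercises outside the allowlist are never stored.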
By following these precautions, we ensure that no learner can ever be identified by their speech data.