How we’re improving learning with anonymized speech data

Original author: Gretchen Swecker et al. (Please visit the linked url for the original post.)

https://blog.duolingo.com/how-were-improving-learning-with-anonymized-speech-data/

Last month, we began asking a subset of learners whether they are willing to share their recorded speech with us so that we can better understand their learning process. We only collect speech data from learners who have given their permission, and we anonymize that data to protect their privacy. Collecting and analyzing speech data will help us develop new features to help you improve your speaking skills, such as:

Giving tips on pronunciation, word by word, sound by sound

Picking speaking exercises that focus on areas where you need the most practice

Grading beginners’ speech more leniently, to reduce frustration

Improving how the app understands speech

Protecting our learners’ privacy is a top priority for Duolingo, so we’ve taken many steps to ensure the data we collect can never be tied to an individual learner.

As our first line of defense, we:

Do not collect any uniquely identifying information (e.g. name, ID) with the speech data, or any information about when the data was received

Do not store speech data from child users (see our privacy policy)

We also treat all speech data in aggregate, never at an individual level. So as our second line of defense, we:

Only collect data from frequently used exercises, so that many learners generate speech for each one and no individual can be identified from a particular exercise

Only access the data after enough has been collected that none can be tied back to any particular learning moment
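As a rough illustration of the second line of defense, here is a minimal sketch of a threshold gate: recordings are grouped by exercise, and a group becomes accessible only once it is large enough. The function name, the record layout, and the threshold value are all assumptions made for this example; the post does not describe the actual implementation or state a number.

```python
from collections import defaultdict

# Hypothetical threshold; the post does not give an actual value.
MIN_RECORDINGS_PER_EXERCISE = 1000


def releasable_batches(recordings, min_count=MIN_RECORDINGS_PER_EXERCISE):
    """Group anonymized recordings by exercise and keep only large groups.

    `recordings` is an iterable of (exercise_id, audio) pairs. By assumption,
    each pair carries no learner identifier and no timestamp.
    """
    by_exercise = defaultdict(list)
    for exercise_id, audio in recordings:
        by_exercise[exercise_id].append(audio)
    # Release an exercise's recordings only when enough have accumulated.
    return {
        exercise_id: clips
        for exercise_id, clips in by_exercise.items()
        if len(clips) >= min_count
    }


sample = [("ex_hello", b"a"), ("ex_hello", b"b"), ("ex_rare", b"c")]
print(sorted(releasable_batches(sample, min_count=2)))  # ['ex_hello']
```

In this sketch the rarely used exercise is withheld entirely, which mirrors the idea that data is only looked at in aggregate.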

By following these precautions, we ensure that no learner can ever be identified by their speech data.
