HOW ACCURATE IS VOICE TO TEXT TRANSCRIPTION ON THE SOTERIA PLATFORM?

insightfultechnology.com

Speech-to-text Word Error Rate (WER) is affected by several factors. Background noise, overlapping speech, voice signals captured and stored in mono rather than stereo, and compression of voice recordings all reduce accuracy and increase WER.

Using the SOTERIA platform’s Automated Speech Recognition (ASR) engine, accuracy/confidence ratings of better than 75% (WER below 25%) can be achieved in noisy environments with compressed, mono feeds. For broadcast-quality recordings (stereo, uncompressed feeds with little background noise), ratings of 95% or better (WER of 5% or less) can be achieved.

As part of the new client on-boarding process, a manual voice modelling process is carried out: samples of the client’s existing voice recordings are matched to their corresponding transcriptions, and any transcription errors are corrected. This effectively teaches the platform to recognise and better understand the speech patterns and common terminology used in the client’s environment. Additional refinements can be carried out on an ongoing basis where necessary (e.g. when new terminology such as “MiFID II” or “Brexit” comes into use), further optimising transcription accuracy.
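
To make the WER figures quoted above concrete, the sketch below shows how WER is conventionally calculated: the number of word substitutions, deletions and insertions needed to turn the engine's output into a human-verified reference transcript, divided by the number of words in the reference. This is the standard textbook definition, not the SOTERIA platform's internal code; the function name `word_error_rate` and the sample sentences are hypothetical and chosen purely for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative only: this is the conventional definition of WER, not the
    SOTERIA platform's implementation. Text is lower-cased and split on
    whitespace; real evaluations usually normalise punctuation as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word was dropped
                curr[j - 1] + 1,                  # extra word was inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word in a four-word reference.
wer = word_error_rate("buy ten thousand shares", "buy two thousand shares")
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")  # WER: 25%, accuracy: 75%
```

On this definition, the 75% figure corresponds to roughly one word in four being wrong, missing or spurious, and the 95% figure to roughly one word in twenty. Because inserted words count as errors, WER can exceed 100% on very poor output.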