ASR Integrations
Learn more about automatic speech recognition (ASR) services that can be integrated with Jovo.
Introduction
Automatic speech recognition (ASR for short) is the process of turning raw speech input (recorded audio) into transcribed text. It is part of the interpretation step of the RIDR lifecycle.
Jovo offers integrations with a variety of ASR services. You can find all current integrations in the Integrations section below.
ASR integrations are helpful for platforms that deal with raw speech input. The integration writes its results into an asr object that is part of the $input property:
// Before ASR step
{
  type: 'SPEECH',
  audio: { /* ... */ }, // recorded speech
}

// After ASR step
{
  type: 'SPEECH',
  audio: { /* ... */ },
  asr: {
    text: 'the transcribed text',
  },
}
The text then needs to be turned into structured meaning by using an NLU integration. Some services, like Amazon Lex, are also called spoken language understanding (SLU) services because they take care of both the ASR and NLU steps.
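The before and after shapes of the input can be sketched in TypeScript as follows. This is only an illustration: the SpeechInput and AsrData types, the Transcriber function type, and applyAsr are hypothetical names invented for this example and are not part of the Jovo API. The transcriber is a synchronous stub, whereas a call to a real ASR service would typically be asynchronous.

```typescript
// Hypothetical types mirroring the input shapes shown above; not Jovo's own type definitions.
interface AsrData {
  text: string;
}

interface SpeechInput {
  type: 'SPEECH';
  audio: Record<string, unknown>; // recorded speech payload, contents left open here
  asr?: AsrData; // only present after the ASR step
}

// Hypothetical stand-in for a call to an ASR service.
type Transcriber = (audio: Record<string, unknown>) => string;

// Returns a copy of the input with the transcription stored in `asr.text`.
function applyAsr(input: SpeechInput, transcribe: Transcriber): SpeechInput {
  return { ...input, asr: { text: transcribe(input.audio) } };
}

const before: SpeechInput = { type: 'SPEECH', audio: {} };
const after = applyAsr(before, () => 'the transcribed text');

console.log(after.asr?.text);
```

In this sketch the original input is left unchanged and a new object is returned, which makes the difference between the two states easy to inspect. An NLU step would then read asr.text from the result.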
Learn more about Jovo ASR integrations in the following sections: Integrations and Configuration.
Integrations
Currently, the following integrations are available with Jovo v4:
Configuration
An ASR integration needs to be added as a platform plugin in the app configuration. Here is an example of how this could look in the app.ts file, using the Core Platform with Lex SLU:
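import { App } from '@jovotech/framework';
import { CorePlatform } from '@jovotech/platform-core';
import { LexSlu } from '@jovotech/slu-lex';
// ...

const app = new App({
  plugins: [
    // The ASR integration is registered as a plugin of the platform, not of the app itself
    new CorePlatform({
      plugins: [
        new LexSlu({
          // ASR plugin configuration
        }),
      ],
    }),
    // ...
  ],
});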
Along with integration-specific options (which can be found in each integration's documentation), there are also features that are configured the same way across all ASR integrations.
The default configuration for each ASR integration is:
new LexSlu({
  // ...
  input: {
    supportedTypes: ['SPEECH'], // Use ASR for 'SPEECH' input types
  },
}),
The input config property refers to the Jovo $input property. The supportedTypes array includes all input types for which the ASR integration should run, the default being ['SPEECH'].
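The effect of supportedTypes can be pictured as a simple membership check on the input type. The sketch below is an assumption about the general idea, not Jovo's actual implementation: AsrInputConfig, IncomingInput, and shouldRunAsr are hypothetical names, and 'TEXT' is used only as a contrasting example value.

```typescript
// Hypothetical, simplified shapes; not the actual Jovo plugin types.
interface AsrInputConfig {
  supportedTypes: string[];
}

interface IncomingInput {
  type: string;
}

// Default taken from the configuration shown above.
const defaultInputConfig: AsrInputConfig = { supportedTypes: ['SPEECH'] };

// Returns true if the ASR step should run for the given input type.
function shouldRunAsr(
  input: IncomingInput,
  config: AsrInputConfig = defaultInputConfig,
): boolean {
  return config.supportedTypes.some((supported) => supported === input.type);
}

console.log(shouldRunAsr({ type: 'SPEECH' })); // true
console.log(shouldRunAsr({ type: 'TEXT' })); // false
```

With the default configuration, only inputs of type 'SPEECH' would pass this check. Adding further types to supportedTypes would extend the check accordingly.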