ASR Integrations

Learn more about automatic speech recognition (ASR) services that can be integrated with Jovo.

Introduction

Automatic speech recognition (in short, ASR) is the process of turning raw speech input (recorded audio) into transcribed text. It is part of the interpretation step of the RIDR lifecycle.

Jovo offers integrations with a variety of ASR services. You can find all the current integrations here.

ASR integrations are helpful for platforms that deal with raw speech input. The integration transcribes the recorded audio and writes the result into an asr object that is part of the $input property:

// Before ASR step
{
  type: 'SPEECH',
  audio: { /* ... */ }, // recorded speech
}

// After ASR step
{
  type: 'SPEECH',
  audio: { /* ... */ },
  asr: {
    text: 'the transcribed text',
  },
}
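The two states above can be sketched as a small TypeScript example. This is only an illustration under stated assumptions: the `SpeechInput` interface, the `applyAsr` function, the `fakeTranscribe` helper, and the shape of the `audio` field are hypothetical names made up for this sketch and are not the Jovo type definitions. A real integration would call its speech service where the stand-in helper is used.

```typescript
// Hypothetical shapes for illustration only; not the Jovo type definitions.
interface AsrData {
  text: string;
}

interface SpeechInput {
  type: string;
  audio?: { data: string }; // assumed placeholder for the recorded speech
  asr?: AsrData;
}

// Stand-in for a real ASR service call (assumption: returns a fixed transcript).
function fakeTranscribe(_audio: { data: string }): string {
  return 'the transcribed text';
}

// Returns a copy of the input with the `asr` result attached.
// Inputs that are not speech, or that carry no audio, are returned unchanged.
function applyAsr(input: SpeechInput): SpeechInput {
  if (input.type !== 'SPEECH' || !input.audio) {
    return input;
  }
  return { ...input, asr: { text: fakeTranscribe(input.audio) } };
}

const before: SpeechInput = { type: 'SPEECH', audio: { data: '...' } };
const after = applyAsr(before);
console.log(after.asr?.text);
```

The sketch returns a new object rather than mutating the input, which keeps the "before" and "after" states easy to compare.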

The text then needs to be turned into structured meaning by an NLU integration. Some services, such as Amazon Lex, are also called spoken language understanding (SLU) services because they handle both the ASR and NLU steps.

Learn more about Jovo ASR integrations in the following sections:

Integrations

Currently, the following integrations are available with Jovo v4:

Configuration

An ASR integration needs to be added as a platform plugin in the app configuration. Here is an example of how this could look in the app.ts file, using Core Platform with Lex SLU:

import { App } from '@jovotech/framework';
import { CorePlatform } from '@jovotech/platform-core';
import { LexSlu } from '@jovotech/slu-lex';
// ...

const app = new App({
  plugins: [
    new CorePlatform({
      plugins: [new LexSlu(
        // ASR Plugin Configuration
      )],
    }),
    // ...
  ],
});

Along with integration-specific options (which can be found in each integration's documentation), there are also features that are configured the same way across all ASR integrations.

The default configuration for each ASR integration is:

new LexSlu({
  // ...
  input: {
    supportedTypes: ['SPEECH'], // Use ASR for 'SPEECH' input types
  }
}),

The input config property refers to the Jovo $input property. The supportedTypes array lists all input types for which the ASR integration should run; the default is ['SPEECH'].
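To make the effect of supportedTypes more concrete, here is a minimal sketch of the kind of check such an option implies. The `AsrInputConfig` interface and the `shouldRunAsr` function are hypothetical names for this example; how Jovo performs this check internally is not shown in this document, so treat this as an assumption about the behavior rather than the framework's actual code.

```typescript
// Hypothetical config shape mirroring the `input.supportedTypes` option.
interface AsrInputConfig {
  supportedTypes: string[];
}

// Illustrative check: ASR runs only if the input type is listed in the config.
function shouldRunAsr(inputType: string, config: AsrInputConfig): boolean {
  return config.supportedTypes.some((supported) => supported === inputType);
}

// Mirrors the default described in the text: only 'SPEECH' inputs are handled.
const defaultConfig: AsrInputConfig = { supportedTypes: ['SPEECH'] };

console.log(shouldRunAsr('SPEECH', defaultConfig)); // true
console.log(shouldRunAsr('TEXT', defaultConfig)); // false
```

With the default configuration, an input of type 'TEXT' would skip the ASR step under this assumption, since it already contains text and has no audio to transcribe.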