Output Templates

Learn more about the structure of Jovo output templates, which offer the ability to create cross-platform output for voice and chat experiences.

Introduction

The Jovo output template engine takes structured output and translates it into a native platform response. This makes it possible to have a common format for multimodal output that works across platforms and devices.

For example, here is a simple output that just returns a "Hello World!":

{
  message: 'Hello World!',
}

And here is an example that asks the user a question and offers guidance on how to answer:

{
  message: 'Do you like pizza?',
  quickReplies: [ 'yes', 'no' ],
}
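These output templates are usually returned from a handler. In Jovo v4, for instance, this typically happens via this.$send() (a minimal sketch; the component name is hypothetical):

import { Component, BaseComponent } from '@jovotech/framework';

@Component()
export class PizzaComponent extends BaseComponent {
  START() {
    // Pass the output template to $send(), which hands it to the output template engine
    return this.$send({
      message: 'Do you like pizza?',
      quickReplies: ['yes', 'no'],
    });
  }
}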

Jovo output templates offer both generic and platform-specific output elements.

Generic Output Elements

Jovo output templates come with a selection of generic elements that are supported by most platforms, including:

  • message
  • reprompt
  • card
  • carousel
  • quickReplies
  • listen

Not all platforms support all of these elements. In such a case, the platform just ignores that element and still successfully builds the rest of the output template.

message

The message is usually either what you hear (speech output) or what you see in the form of a chat bubble.

{
  message: 'Hello world!',
}

A message can either be a string or have the following properties:

{
  message: {
    speech: 'Hello listener!', // For voice platforms
    text: 'Hello reader!', // For chat platforms and voice platforms that support display text
  }
}

message also supports randomization. If you use an array, one of the elements will be picked randomly. This works both for plain strings and for the object structure shown above.

{
  message: [
    'Hi!',
    'Hello!',
    { speech: 'Hello listener.', text: 'Hello reader.' },
  ],
}

reprompt

The reprompt is typically only relevant for voice interfaces. It represents the output that is presented to the user if they haven't responded to a previous question.

{
  message: `Hello world! What's your name?`,
  reprompt: 'Could you tell me your name?',
}

A reprompt can have the same values (speech, text) as a message and also supports randomization if you use an array.

{
  reprompt: [
    'Could you tell me your name?',
    'I missed your name, could you please repeat it?',
  ],
}
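A reprompt with separate speech and text could look like this (a sketch that mirrors the message structure above):

{
  reprompt: {
    speech: 'Could you tell me your name?', // For voice platforms
    text: 'What is your name?', // For platforms that support display text
  },
}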

card

Cards are often used to display content in a visual and structured way.

{
  card: {
    title: 'Hello world!',
    content: 'Welcome to this new app built with Jovo.',
  },
}

A card consists of the following properties:

  • title: A string that is usually displayed at the top of the card. Required.
  • subtitle: An optional string that is displayed below the title.
  • content: An optional string that contains the body text of the card.
  • imageUrl: An optional string with a URL to an image to be displayed as part of the card.
  • imageAlt: An optional string that contains an alt text for the image.
  • key: An optional string that is used by some platforms if the card is selected.
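For example, a card that uses all of these properties could look like this (a sketch; the image URL, alt text, and key are placeholders):

{
  card: {
    title: 'Hello world!',
    subtitle: 'Built with Jovo',
    content: 'Welcome to this new app built with Jovo.',
    imageUrl: 'https://example.com/image.jpg',
    imageAlt: 'A sample image',
    key: 'welcome-card',
  },
}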

carousel

A carousel is a (usually horizontally scrollable) collection of cards:

{
  carousel: {
    items: [
      {
        title: 'Element 1',
        content: 'To my right, you will see element 2.'
      },
      {
        title: 'Element 2',
        content: 'Hi there!'
      },
    ],
  },
}

A carousel consists of the following properties:

  • title: A string that is usually displayed at the top of the carousel.
  • items: An array of card items. The minimum number of items is 1; the maximum depends on the platform. Some platforms like Google Assistant and Google Business Messages require at least 2 items. In that case, single-item carousels are turned into a card, as shown in the sketch below.
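A carousel with a title and selectable items could look like this (a sketch; the key values are placeholders):

{
  carousel: {
    title: 'Cities',
    items: [
      {
        title: 'Berlin',
        key: 'berlin',
      },
      {
        title: 'NYC',
        key: 'nyc',
      },
    ],
  },
}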

quickReplies

Quick replies (sometimes called suggestion chips) are little buttons that provide suggestions for the user to tap instead of having to type out a response to a question.

{
  quickReplies: [
    'Berlin',
    'NYC',
  ],
}

Quick replies can also contain a text and a value:

{
  quickReplies: [
    {
      text: 'Berlin',
      value: 'berlin',
    },
    // ...
  ],
}

Usually, the value of the quick reply gets passed to a platform's natural language understanding service. For some platforms, you can also add intent (and entity) information so that the button click can be directly mapped to structured user input:

{
  quickReplies: [
    {
      text: 'Berlin',
      intent: 'CityIntent',
      entities: {
        city: {
          value: 'berlin',
        },
      },
    },
    // ...
  ],
}

listen

Especially for voice platforms, it is important to indicate whether you are expecting the user to answer (keep the microphone open) or want the session to close. The listen property is used for this. By default, it is set to true, even if you don't specify it in your output template.

If you want the session to close, you need to set it to false:

{
  message: `Goodbye!`,
  listen: false,
}

It's also possible to turn listen into an object to tell the platform to listen for specific user input:

{
  message: `Which city do you want to visit?`,
  listen: {
    intents: [ 'CityIntent' ],
    entities: { /* ... */ },
  },
}

Learn more in the sections below:

intents

Some NLU services offer intent scoping, which means the ability to tell the model to prioritize certain intents. Learn more in the NLU documentation.

{
  message: `Which city do you want to visit?`,
  listen: {
    intents: [ 'CityIntent' ],
  },
}

entities

By adding an entities object to the listen property, you can dynamically add values to the NLU model. This is also called "dynamic entities". Learn more in the entities documentation.

{
  // ...
  listen: {
    entities: {
      CityType: {
        values: [
          {
            value: 'berlin',
          },
          // ...
        ],
      },
    },
  },
}

If your output is an array of objects with differing listen values, the value of the last array element takes priority. The only exception is that a listen: true value does not override dynamic entities, because setting dynamic entities implicitly sets listen to true. If the last item in the array has listen: false, the session closes and previous dynamic entities are removed.
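Here is a sketch of that merging behavior: the listen: true of the last element does not remove the dynamic entities set by the first element.

// Before merging
[
  {
    listen: {
      entities: {
        CityType: {
          values: [ { value: 'berlin' } ],
        },
      },
    },
  },
  {
    listen: true,
  },
]

// After merging: the dynamic entities are kept
{
  listen: {
    entities: {
      CityType: {
        values: [ { value: 'berlin' } ],
      },
    },
  },
}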

Platform-Specific Output Elements

Each output object can contain a platforms element for platform-specific content:

{
  message: 'Hello world!',
  platforms: {
    // ...
  },
}

You can reference each platform by using their name, for example alexa or googleAssistant:

{
  message: 'Hello world!',
  platforms: {
    alexa: {
      // ...
    },
  },
}

There are two ways this can be used:

  • Add content types that are only available on one platform (for example an account linking card on Alexa)
  • Override generic output elements for specific platforms

For example, the message can be overridden for Alexa users:

{
  message: 'Hello world!',
  platforms: {
    alexa: {
      message: 'Hello Alexa!',
    },
  },
}

A platform can also remove generic output by setting it to null:

{
  message: 'Hello world!',
  platforms: {
    alexa: {
      message: null,
    },
  },
}

Native Response

For each platform, you can add a nativeResponse object that is directly translated into the native platform JSON.

{
  message: 'Hello world!',
  platforms: {
    alexa: {
      nativeResponse: {
        // Add elements in the same way they show up in the response JSON
      },
    },
  },
}

If you want to explicitly remove a property, you can set it to undefined. In the following example, the shouldEndSession property is removed from the Alexa response:

{
  message: 'Hello world!',
  platforms: {
    alexa: {
      nativeResponse: {
        response: {
          shouldEndSession: undefined,
        },
      },
    },
  },
}

Array of Output Templates

You can also have an array of output objects:

[
  {
    message: 'Hello world!',
  },
  {
    message: 'This is a second chat bubble.',
  },
];

Platforms that support multiple responses will display the example above as two chat bubbles. Synchronous platforms like Alexa will concatenate the messages into a single response:

{
  message: 'Hello world! This is a second chat bubble.',
}

If the message of one output object is an object, the plain message strings of the other output objects are treated as if their speech and text were identical. Here is an example:

// Before merging
[
  {
    message: 'Hello world!',
  },
  {
    message: {
      speech: 'This is spoken text.',
      text: 'This is display text.',
    },
  },
]

// After merging
{
  message: {
    speech: 'Hello world! This is spoken text.',
    text: 'Hello world! This is display text.',
  },
}

For elements that are only allowed once per response (like card or carousel), the last object of the array will be prioritized, overriding previous elements.
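For example (a sketch of that behavior):

// Before merging
[
  {
    card: {
      title: 'First card',
    },
  },
  {
    card: {
      title: 'Second card',
    },
  },
]

// After merging: only the card of the last object is kept
{
  card: {
    title: 'Second card',
  },
}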