Sunday, April 15, 2018

Voice Interactions on AWS Lex + Google Dialogflow


Summary

In this post I'll discuss the audio capabilities of the bot frameworks in AWS and Google.  They currently take different approaches, though I think that's changing.  AWS Lex can process voice/audio in a single API call.  Google Dialogflow currently separates the concerns:  it takes three API calls to process a voice input and provide a voice response.  Interestingly enough, execution time on both platforms is roughly the same.

Voice Interaction Flow - AWS Lex

Below is a diagram of what it takes to process a voice interaction on Lex.  It's really simple.  A single API call (PostContent) can take audio as input and provide an audio bot response.  Lex buries the speech-to-text and text-to-speech details so the developer doesn't have to deal with them.  It's nice.


Code Snippet - AWS Lex

Below is a simple function for submitting audio in and receiving audio out.  The PostContent API call can process text or audio.

 send(userId, request) {
  let params = {
    botAlias: '$LATEST',
    botName: BOT_NAME,
    userId: userId,
    inputStream: request
  };
  
  switch (typeof request) {
   case 'string':
    params.contentType = 'text/plain; charset=utf-8';
    params.accept = 'text/plain; charset=utf-8';
    break;   
   case 'object':
    params.contentType = 'audio/x-l16; sample-rate=16000';
    params.accept = 'audio/mpeg';
    break;
  }
  return new Promise((resolve, reject) => {
   this.runtime.postContent(params, (err, data) => {
    if (err) {
     reject(err);
    }
    else if (data) {
     let response = {'text' : data.message};
     switch (typeof request) {
      case 'string':
       response.audio = '';
       break;
      case 'object':
       response.audio = Buffer.from(data.audioStream).toString('base64');
       break;
     }
     resolve(response);
    }
   });
  });
 }
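
The base64 string in response.audio is convenient for carrying audio inside a JSON payload.  As a rough sketch of the receiving side, assuming a browser client that plays the MP3 through a data URI (the helper name toAudioDataUri is hypothetical and not part of the code above):

```javascript
// Hypothetical helper: turn the {text, audio} object returned by send()
// into something an HTML <audio> element can play.
function toAudioDataUri(response) {
  // Text-only exchanges come back with an empty audio string.
  if (!response || !response.audio) {
    return null;
  }
  // send() asked Lex for 'audio/mpeg', so label the payload as MP3.
  return 'data:audio/mpeg;base64,' + response.audio;
}

// Example with a fabricated three-byte payload standing in for real MP3 data.
const sample = { text: 'Hello', audio: Buffer.from([1, 2, 3]).toString('base64') };
console.log(toAudioDataUri(sample));                       // data:audio/mpeg;base64,AQID
console.log(toAudioDataUri({ text: 'Hello', audio: '' })); // null
```

In a page, the returned string could be assigned to the src of an audio element; a null result means there is only text to display.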

Voice Interaction Flow - Google Dialogflow

Below is a diagram of the current state of affairs with Dialogflow and voice processing.  Each function (speech-to-text, bot, text-to-speech) requires a separate API call.  At least that's the way it is in the V1 Dialogflow API.  From what I can tell, V2 (beta) will allow audio inputs.


Code Snippet - Google Dialogflow

Coding this up is more complicated than Lex, but nothing cosmic.  I wrote some wrapper functions around JavaScript fetch calls and then cascaded them via Promises as you see below.
 send(request) {
  let result;
  switch (typeof request) {
   case 'string':
    // Text in, text out:  a single call to the bot
    result = this._sendText(request)
    .then(text => ({text: text, audio: ''}));
    break;
   case 'object':
    // Audio in, audio out:  speech-to-text, then the bot, then text-to-speech
    result = this._stt(request)
    .then(text => this._sendText(text))
    .then(text => this._tts(text).then(audio => ({text: text, audio: audio})));
    break;
   default:
    // Reject rather than hand the caller a Promise that never settles
    result = Promise.reject(new TypeError('Unsupported request type: ' + typeof request));
  }
  return result.catch(err => {
   console.error(err.message);
   throw err;
  });
 }

Results

I didn't expect this, but both platforms performed about equally even though multiple calls are necessary on Dialogflow.  For my simple bot example, I saw roughly 2-second execution times for audio in/out from both Lex and Dialogflow.
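
If you want to repeat the comparison, one way to take the measurement is sketched below.  This is an illustration rather than the harness behind the numbers above:  timed is a hypothetical helper, and the three stages are stubs that use setTimeout in place of real network calls.

```javascript
// Hypothetical helper: run a Promise-returning function and report elapsed milliseconds.
function timed(fn) {
  const start = Date.now();
  return fn().then(result => ({ result: result, ms: Date.now() - start }));
}

// Stand-ins for the three Dialogflow-side calls; each just waits, then answers.
const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));
const stt = audio => delay(20, 'two cords of oak');
const bot = text => delay(20, 'What is your street address?');
const tts = text => delay(20, Buffer.from(text).toString('base64'));

// Time the whole speech-to-text -> bot -> text-to-speech cascade.
const run = timed(() => stt(Buffer.alloc(0)).then(bot).then(tts));
run.then(outcome => console.log('cascade took ' + outcome.ms + ' ms'));
```

Swapping the stubs for the real wrapper calls (and wrapping the single Lex call the same way) would give comparable end-to-end figures for both platforms.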

Copyright ©1993-2024 Joey E Whelan, All rights reserved.

Saturday, April 7, 2018

Dialogflow & InContact Chat Integration


Summary

In this post I'll discuss how to hand off a chat session that starts with a bot in Google Dialogflow to a human agent.  In this scenario, the user isn't able to complete the transaction with the bot and requests a human agent for assistance.  The application then connects the user with an agent on InContact's cloud platform.  The bot and web interfaces I built here are crude and not production quality.  The emphasis is on API usage and integration.

This is the third post of three discussing chat with InContact and Dialogflow.


Architecture

Below is a diagram of the overall architecture for the scenario discussed above.


Application Architecture

The application layer is a simple HTML page with the interface driven by a single JavaScript file, chat.js.  I built wrapper classes for the Dialogflow and InContact REST APIs:  dflow.js and incontactchat.js respectively.  The chat.js code invokes API calls via those classes.





Application Flow

The diagram below depicts the steps in this example scenario.  




Steps 5, 6 Screen-shots




Tuesday, April 3, 2018

Google Dialogflow - Input Validation


Summary

This post concerns the task of validating user-entered input for a Google Dialogflow-driven chat agent.  My particular scenario is a fairly simple, crude transactional flow, but I found input (slot) validation to be particularly cumbersome in Dialogflow.  Based on what I've seen in various forums, I'm not alone in that opinion.  Below are my thoughts on one way to handle input validation in Dialogflow.


Architecture

Below is a high-level depiction of the Dialogflow architecture I utilized for my simple agent.  This particular agent is a repeat of something I did with AWS Lex (explanation here).  It's a firewood ordering agent.  The bot prompts for various items (number of cords, delivery address, etc.) necessary to fulfill an order for firewood.  Really simple.



Below is my interpretation of the agent bot model in Dialogflow.

Validation Steps

For this simple, transactional agent I had various input items (slots) that needed to be provided by the end-user.  To validate those slots, I used two intents per item.  The main intent gathers the user's input and uses an input context to restrict access per the transactional flow.  The input to that intent is then sent to a Google Cloud Function (GCF) for validation.  If it's valid, a prompt is sent back to the user for the next input slot.  If it's invalid, the GCF triggers a follow-up intent to requery for that particular input item.  The user is trapped in that loop until they provide valid input.

Below is a diagram of the overall validation flow.


Below are screenshots of the Intent and requery-Intent for the 'number of cords' input item.  That item must be an integer between 1 and 3 for this simple scenario.



Code

Below is a depiction of the overall app architecture I used here.  All of the input validation happens in a Node.js function on GCF.


Validation function (firewoodWebhook.js)

The meaty parts of that function are below:
function validate(data) { 
 console.log('validate: data.intentName - ' + data.metadata.intentName);
 switch (data.metadata.intentName) {
  case '3.0_getNumberCords':
   const cords = data.parameters.numberCords;
   if (cords && cords > 0 && cords < 4) {
    return new Promise((resolve, reject) => {
     const msg = 'We deliver within the 80863 zip code.  What is your street address?';
     const output = JSON.stringify({"speech": msg, "displayText": msg});
     resolve(output);
    });
   }
   else {
    return new Promise((resolve, reject) => {
     const output = JSON.stringify ({"followupEvent" : {"name":"requerynumbercords", "data":{}}});
     resolve(output);
    });
   }
   break;
  case '4.0_getStreet':
   const street = data.parameters.deliveryStreet;
   if (street) {
    return callStreetApi(street);
   }
   else {
    return new Promise((resolve, reject) => {
     const output = JSON.stringify ({"followupEvent" : {"name":"requerystreet", "data":{}}});
     resolve(output);
    });
   }
   break;
  case '5.0_getDeliveryTime':
   const dt = new Date(Date.parse(data.parameters.deliveryTime));
   const now = new Date();
   const tomorrow = new Date(now.getFullYear(), now.getMonth(), now.getDate()+1);
   const monthFromNow = new Date(now.getFullYear(), now.getMonth()+1, now.getDate());
   if (dt && dt.getUTCHours() >= 9 && dt.getUTCHours() <= 17 && dt >= tomorrow && dt <= monthFromNow) {
    return new Promise((resolve, reject) => {
     const contexts = data.contexts;
     let context = {};
     for (let i=0; i < contexts.length; i++){
      if (contexts[i].name === 'ordercontext') {
       context = contexts[i];
       break;
      }
     }
     const price = '$' + PRICE_PER_CORD[context.parameters.firewoodType] * context.parameters.numberCords;
     const msg = 'Thanks, your order for ' + context.parameters.numberCords + ' cords of ' + context.parameters.firewoodType + ' firewood ' + 
        'has been placed and will be delivered to ' + context.parameters.deliveryStreet + ' at ' + context.parameters.deliveryTime + '.  ' + 
        'We will need to collect a payment of ' + price + ' upon arrival.';
     const output = JSON.stringify({"speech": msg, "displayText": msg});
     resolve(output);
    });
   }
   else {
    return new Promise((resolve, reject) => {
     const output = JSON.stringify ({"followupEvent" : {"name":"requerydeliverytime", "data":{}}});
     resolve(output);   
    });
   }
   break;
  default:  //should never get here
   return new Promise((resolve, reject) => {
    const output = JSON.stringify ({"followupEvent" : {"name":"requestagent", "data":{}}});
    resolve(output);  
   });
 }
}
Focusing only on the number-of-cords validation:
Lines 6-11:  Check whether the user input is between 1 and 3 cords.  If so, return a Promise that resolves with the next prompt for input.
Lines 13-17:  The input is invalid.  Return a Promise that resolves with a followupEvent to trigger the requery intent for this input item.
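
One caveat:  depending on the entity type behind the parameter, the range check above would also accept a fractional value such as 2.5, while the requirement is an integer between 1 and 3.  A stricter variant might look like the sketch below.  The function name checkCords is hypothetical; the response fields (speech, displayText, followupEvent) are the same ones the webhook above returns.

```javascript
// Hypothetical stricter check for the 'number of cords' slot.
function checkCords(raw) {
  // Accept a number or a numeric string; anything else is treated as invalid.
  const cords = (typeof raw === 'number' || typeof raw === 'string') ? Number(raw) : NaN;
  if (Number.isInteger(cords) && cords >= 1 && cords <= 3) {
    const msg = 'We deliver within the 80863 zip code.  What is your street address?';
    return { speech: msg, displayText: msg };
  }
  // Anything else sends the user back through the requery intent.
  return { followupEvent: { name: 'requerynumbercords', data: {} } };
}

console.log(checkCords(2).speech !== undefined);     // true
console.log(checkCords('2.5').followupEvent.name);   // requerynumbercords
```

The function returns a plain object, so it can be wrapped with JSON.stringify and a resolved Promise in the same way as the other branches.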

Client-side.  Dialogflow wrapper (dflow.js)

The meaty section of that is below.  This is the 'send' function that submits user input to Dialogflow for analysis and response.
 send(text) {
  const body = {'contexts': this.contexts,
      'query': text,
      'lang': 'en',
      'sessionId': this.sessionId
  };
  
  return fetch(this.url, {
   method: 'POST',
   body: JSON.stringify(body),
   headers: {'Content-Type' : 'application/json','Authorization' : 'Bearer ' + this.token},
   cache: 'no-store',
   mode: 'cors'
  })
  .then(response => response.json())
  .then(data => {
   console.log(data);
   if (data.status && data.status.code == 200) {
    this.contexts = data.result.contexts;
    return data.result.fulfillment.speech;
   }
   else {
    throw data.status.errorDetails;
   }
  })
  .catch(err => { 
   console.error(err);
   return 'We are experiencing technical difficulties.  Please contact an agent.';
  }) 
 }

Lines 8-29:  The main code here consists of a REST API call to Dialogflow with the user input.  If the call succeeds, the returned Promise resolves with the next prompt.  Otherwise, it resolves with a generic error message.
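
The success/failure branch inside send can be pulled out into a small pure function, which makes it testable without a network call.  A minimal sketch, assuming the V1 response shape used above (status.code, result.contexts, result.fulfillment.speech); parseReply is a hypothetical name:

```javascript
// Hypothetical helper: extract the bot's reply and contexts from a Dialogflow V1 response body.
function parseReply(data) {
  const status = data && data.status;
  if (status && status.code === 200) {
    return { contexts: data.result.contexts, speech: data.result.fulfillment.speech };
  }
  // Covers both an error status and a body with no status at all.
  throw new Error((status && status.errorDetails) || 'Unexpected response from Dialogflow');
}

// Fabricated response bodies for illustration.
const ok = {
  status: { code: 200 },
  result: { contexts: [{ name: 'ordercontext' }], fulfillment: { speech: 'How many cords?' } }
};
console.log(parseReply(ok).speech);   // How many cords?

try {
  parseReply({ status: { code: 401, errorDetails: 'Authorization failed' } });
} catch (err) {
  console.log(err.message);           // Authorization failed
}
```

Unlike the inline version, this also produces a readable error when the body has no status field at all.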

Client-side.  User interface.

    function Chat(mode) {
        var _mode = mode;
     var _self = this;
        var _firstName;
        var _lastName;
        var _dflow; 
   
        this.start = function(firstName, lastName) {
            _firstName = firstName;
            _lastName = lastName;
            if (!_firstName || !_lastName) {
                alert('Please enter a first and last name');
                return;
            }
            
            _dflow = new DFlow("yourid");
            hide(getId('start'));
            show(getId('started'));
            getId('sendButton').disabled = false;
            getId('phrase').focus();
        };

        this.leave = function() {
         switch (_mode) {
          case 'dflow':       
           break;
         }
         getId('chat').innerHTML = '';
         show(getId('start'));
            hide(getId('started'));
            getId('firstName').focus();
        };
                       
        this.send = function() {
            var phrase = getId('phrase');
            var text = phrase.value.trim();
            phrase.value = '';

            if (text && text.length > 0) {
             var fromUser = _firstName + ' ' + _lastName + ':';
             displayText(fromUser, text);
            
             switch (_mode) {
              case 'dflow':
               _dflow.send(text).then(resp => displayText('Bot:', resp));
               break;
             }
            }
        };         
Line 16:  Instantiate the Dialogflow wrapper object with your API token.
Line 45:  Call the 'send' function of the wrapper object and then display the returned text of the Promise.

Source Code

