Building Authentication Around Alexa

October 11, 2016

Alexa, huh?

Alexa is a smart voice assistant that initially shipped with its flagship device, the Echo. If you have heard of Siri and Google Now, you can think of Alexa as another addition to the family of digital voice assistants. However, Alexa offers a noticeably different experience from its competitors when it comes to transcribing speech to text and holding natural conversations. With the help of the Alexa Skills Kit and the Alexa Voice Service, you can tinker around and develop apps controlled by voice. By January there were 1000+ skills on the Amazon Alexa platform, and now that Amazon Echo is available in the UK and Germany, more skills are expected soon.

Banking Using Voice?

One skill I would like to talk about here is the Capital One skill. It is a well-developed application capable of assisting you with basic banking activities, like “What is my card balance” or “Make a minimum card payment”. It makes an effort to make banking seamless, and it is a great solution for consumers like my Mom, who struggles with UIs and web portals. The Capital One skill lets you check account balances and make payments using voice commands like “Alexa ask Capital One When is my payment due”.

What about authentication?

Unlike most other skills available on Alexa, banking apps are personal and are mostly used by one person at a time. Yet the step of authenticating a user before they use a personal skill is missing from Alexa. Because Echo devices ship without any built-in speaker recognition, Alexa processes voice commands without caring who is uttering them. Anyone can issue a voice command, and Alexa will simply transcribe it, find the matching intent and perform the related task.

During my first hackathon, BigRed // Hacks 2016, I tried to solve this very problem by adding a speaker recognition module to a sample banking skill called ‘reImagine’. This was my first experience developing a voice-related application and creating an Alexa skill. The image below shows the 1000 ft view of the basic components used to design the reImagine application.

I have since refined my implementation to share a much better version of the app, called ‘Ikran’, which has authentication built around a fully functional banking application.
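To make the gap concrete, here is a minimal sketch of what gating a skill on speaker identity could look like once an intent has been matched. It is not the actual reImagine or Ikran code: the intent names, the `SkillRequest` type and the `dispatch` function are all hypothetical, and the `speaker_verified` flag stands in for whatever a real speaker recognition module would return.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical intent names; a real skill defines its own in its interaction model.
SENSITIVE_INTENTS = {"GetBalanceIntent", "TransferIntent"}


@dataclass
class SkillRequest:
    intent: str             # intent that the utterance was matched to
    speaker_verified: bool  # outcome of a separate speaker recognition step (assumed)


def dispatch(request: SkillRequest, handlers: Dict[str, Callable[[], str]]) -> str:
    """Run the handler for an intent, refusing sensitive ones for unverified speakers."""
    if request.intent in SENSITIVE_INTENTS and not request.speaker_verified:
        return "Sorry, I could not verify your voice."
    handler = handlers.get(request.intent)
    if handler is None:
        return "Sorry, I did not understand that."
    return handler()


# Placeholder handlers with made-up replies, for illustration only.
handlers = {
    "GetBalanceIntent": lambda: "Your balance is 100 dollars.",
    "HelpIntent": lambda: "You can ask for your balance.",
}
```

Without the check at the top of `dispatch`, every request falls straight through to its handler, which is the behaviour described above: whoever speaks gets served.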

Is voice-based authentication secure enough?

While presenting my idea at the hackathon, I was told (not so politely) by fellow hackers that using voice as a means of authentication is a lame idea, as anyone can impersonate anyone’s voice. And yes, imitating or replaying a fixed phrase is possible, but we are talking about a ‘text-independent’ voice print, analyzed and stored for verification. There is therefore no single pass phrase that someone can sneakily record (thanks, sci-fi Hollywood movies) and then replay to trigger authentication. So is anyone actually using voice as a blueprint? Yes. Barclays, one of the most popular investment banks, has implemented it. They use voice prints from customer service calls to generate a unique print for every customer. The next call from that customer gets authenticated automatically, letting you finally get rid of PINs and passwords. Another organization, PinDrop, has patented a similar phoneprinting technology and is planning to bring secure voice authentication to Amazon Echo and Google Now devices.
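One common way to think about a text-independent voice print is as a fixed-length vector of numbers derived from how a person sounds rather than from what they say. The sketch below illustrates that idea only: it assumes such vectors (often called embeddings) are already available from some feature extractor, averages a few of them into an enrolled print, and accepts a new sample if its cosine similarity to the print clears a threshold. The function names and the 0.8 threshold are assumptions for illustration, and this is not a description of how Barclays, PinDrop or any particular speaker recognition API works.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def enroll(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average several embeddings of the same speaker into one stored voice print."""
    if not samples:
        raise ValueError("at least one enrollment sample is required")
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def verify(voice_print: Sequence[float], sample: Sequence[float], threshold: float = 0.8) -> bool:
    """Accept the sample if it is close enough to the enrolled print (threshold is arbitrary)."""
    return cosine_similarity(voice_print, sample) >= threshold
```

Because the comparison is between vectors and not between transcripts, the enrolled speaker can say anything at verification time, which is what makes a recorded pass phrase much less useful to an attacker.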

Will Alexa eventually authenticate me?

I believe Alexa, as a product, is still young, and speaker recognition might be just ‘in the next sprint’ of its development cycle. We don’t know whether Amazon is using our queries to generate a profile that is unique to each of us, and whether it will later roll out its very own voice-based authentication service. It will be interesting to see how well speaker recognition works for its services, and whether it can be provisioned without adding noticeable latency to the existing services.

Talk is Cheap, show me the code.

Let's talk about the implementation. Ikran is a banking application that can help you with basic banking needs using just voice commands. The technologies below were used to build the application.

I used Docker to create containers with the base images of Ubuntu OS.


The application contains many modules that needed to be kept separate to avoid complexity. I tried my best to push them to the cloud and have them talk to each other as microservices, but the lag during the round trip was horrible, so I had to keep everything local.

  1. Clone this project.
  2. Make sure you have bower, docker and pip installed globally.
  3. On the command prompt, run the following commands:
cd project-directory
cd alexaweb

  • Follow the instructions in its README. This runs your Alexa Web app in the front end.
  • You will need an Alexa developer account to access Alexa features.
  • Pull the image from Docker and start the speaker recognition server using:
docker pull rajeshetty/microsoftcognitiveservices
docker run -it --name <nameofcontainer> -p 8888:8888 -v /local/volume:/containervolume rajeshetty/microsoftcognitiveservices
cd Cognitive-SpeakerRecognition-Python/server
node app.js

  • This runs a local version of the API that performs the speaker recognition for you.
  • All banking services are written as a Node.js app available in a Docker container. Pull the image from Docker using:
docker pull rajeshetty/microsoftcognitiveservices

Voice Interaction

Below are the voice commands available for interacting with the skill:

  • “Alexa start Eekran”
  • “Alexa ask Eekran What is my balance”
  • “Alexa ask Eekran Transfer thousand dollars to Ash”

A couple of other commands are also available, as per its instructions.
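On a real device, Alexa itself resolves a spoken request to an intent and hands the skill the intent name plus any slot values. Purely as an illustration of that mapping, the sketch below matches the part of each command that follows the invocation name against a small set of patterns. The intent names, slot names and patterns are hypothetical and are not taken from the Ikran interaction model.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical intents and patterns, loosely modelled on the commands listed above.
PATTERNS = [
    ("GetBalanceIntent", re.compile(r"what is my balance", re.IGNORECASE)),
    (
        "TransferIntent",
        re.compile(r"transfer (?P<amount>.+?) dollars to (?P<payee>\w+)", re.IGNORECASE),
    ),
]


def match_intent(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent name, slot values) for the first matching pattern, else None."""
    text = utterance.strip().rstrip(".?!")
    for name, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return name, match.groupdict()
    return None
```

For example, `match_intent("Transfer thousand dollars to Ash")` yields the transfer intent with `amount` set to `thousand` and `payee` set to `Ash`, while an unrelated sentence yields `None`.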

Talk Docker to me

I already did.


  • Thank you to Sam for the hack, which made my life easier.
  • The web version of the Microsoft Cognitive APIs was not happy with me, so I had to use this tweak for hacking.
  • Cycle, for being so easy to use.