1. Why?
Category: meta
A couple of months ago I discovered a wonderful piece of software: QNAMaker.AI. It lets you build a chatbot from “Questions and Answers”: a two-column table in which one column contains a question and the other contains the corresponding answer. This table is often referred to as a Knowledge Base (KB). The service provides a web-based editor for the Knowledge Base, a test page for playing with the “QnA” chatbot produced from the KB, import/export between the Knowledge Base and an Excel spreadsheet, and the ability to start the engine with the customer's knowledge base as a separate Azure web application. Once such an engine is started, the customer can use it from other applications by calling the REST API that the engine provides.
So, the engine uses the KB to answer users' questions. A question can be asked using different words or a different word order. The answer to some user questions can be spread across several answers in the KB. To process such free-form questions, the engine uses several Natural Language Processing methods to determine which answers are closest to the user's request.
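To make the idea of matching a free-form question against a KB concrete, here is a toy sketch in Python. It is not how QnA Maker works internally: it replaces the engine's NLP methods with plain word-overlap (Jaccard) scoring, and the KB entries as well as the names `KB`, `tokens`, `jaccard` and `rank_answers` are made up for illustration.

```python
import re

# Hypothetical two-column knowledge base: (question, answer) pairs.
KB = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("What are your support hours?", "Support is available 9am to 5pm on weekdays."),
    ("How can I cancel my subscription?", "Open Account settings and choose 'Cancel subscription'."),
]


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_answers(user_question, kb=KB, top_k=2):
    """Return up to top_k (score, answer) pairs, best first, dropping zero scores."""
    q = tokens(user_question)
    scored = [(jaccard(q, tokens(question)), answer) for question, answer in kb]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Same question as the first KB row, but with a different word order.
    for score, answer in rank_answers("password reset how do I"):
        print(f"{score:.2f}  {answer}")
```

Because the scoring ignores word order, the reordered question still ranks the password answer first. Returning several scored candidates instead of a single one mirrors the case where the answer is spread across several KB rows.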
First, it is a nice service with a great user interface. The second thing I noticed, though, was its cost under any real-time load. To support a load of > 100 requests per second, the customer needs to deploy an App Service (Basic, Standard or above) and an Azure Cognitive Search cluster (Standard S1 or above). For the production tier it is highly recommended to deploy Azure Cognitive Search clusters in 2 availability zones and to put a load balancer (another machine) in front of them. If the customer deploys in only one availability zone, then under some (unknown) conditions the Azure Cognitive Search cluster has reply times of more than 20 seconds. We witnessed these conditions twice during one unfortunate month.
So, before a customer, attracted by ads promising to go “From data to bot in minutes” with “No code experience”, understands what is going on, he ends up paying $15,000 / month just for Azure Cognitive Search (a Standard S2 unit in one availability zone + a Standard S3 unit in another). This configuration was originally recommended by a Microsoft consultant as a sort of “go to” standard configuration. It is hard to anticipate such a bill just by looking at the QNAMaker pricing page, which mentions only $10 / month for the engine image. After internal arguments (“It costs what it costs, right?”), careful investigation of logs, multiple consultations with Microsoft personnel, and 2 months of $15,000 bills, the company mentioned above was able to decrease the price significantly by switching to 2 Basic Azure Cognitive Search clusters in 2 availability zones ($145 / month) + a Basic App Service for the load balancer ($55 / month) + a Standard machine for the qnamaker image ($72.5 / month). So, it is $282.5 / month in total (including $10 for the qnamaker image) for performance slightly exceeding 100 requests per minute, with peaks of 3 requests/second.
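The arithmetic behind these numbers can be checked with a few lines of Python. This is only a sketch: the prices are the ones quoted above, the item labels and variable names are mine, and the comparison uses the $15,000 Azure Cognitive Search bill alone, since the rest of the original bill is not given.

```python
# Monthly prices in USD, as quoted in the text above.
OPTIMIZED = {
    "2 Basic Azure Cognitive Search clusters": 145.0,
    "Basic App Service (load balancer)": 55.0,
    "Standard machine for the qnamaker image": 72.5,
    "qnamaker image": 10.0,
}

# Azure Cognitive Search alone in the original configuration, per month.
ORIGINAL_SEARCH_BILL = 15_000.0


def monthly_total(items):
    """Sum the monthly prices of all items in a configuration."""
    return sum(items.values())


total = monthly_total(OPTIMIZED)
savings = ORIGINAL_SEARCH_BILL - total
ratio = ORIGINAL_SEARCH_BILL / total

if __name__ == "__main__":
    print(f"Optimized total: ${total:,.2f} / month")
    print(f"Saved vs. original search bill: ${savings:,.2f} / month")
    print(f"Original search bill is about {ratio:.0f}x the optimized total")
```

The four items add up to the $282.5 quoted above, which confirms that the $10 for the qnamaker image is already included in that total.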
Results of experiments and measurements for this article
And here we were careful to spend only the necessary amount of money: we do not have load balancing or a second cluster, both of which are highly recommended for production.
So, the question is - can we do better?