Bangladesh is a developing country with considerable potential. Although it faces challenges such as overpopulation, the government is working hard to resolve them and to continue the development process. By integrating technology into different sectors, the government has achieved considerable success. To contribute to the digitization effort of the Government of Bangladesh, we propose an artificial intelligence based system. The overall objective is to design and develop a system that can interact with its target audience in banks, universities and hospitals and answer their queries, with Bengali as the main medium of communication. For effective human-machine interaction, the system will be presented in a controlled environment, with an interface intended to put the user at ease and facilitate better service.


The deliverables of this project are the following modules:

1. Face Detection and Recognition.

2. Speaker Recognition.

3. Speech Recognition.

4. Automatic Question Answering System.

5. Speech Synthesis.

6. User Kiosk – Enclosure with a central processing system, multiple cameras, microphones, and speakers.

7. Integrated System with all Modules and Interfaces.

8. Data Set (Voice-to-Text, Question-Answer, Video).

Considerable effort has gone into digitizing Bangladesh in recent years, from electronic voting machines to machine-readable passports. Mobile applications such as bKash, Pathao and Uber have also played a large role in delivering services digitally. Digitization is not limited to the government sector: private organizations such as garment factories have adopted it as well, automating different processes with complex machinery and logging worker entry with RFID or fingerprint scanners. Since Bangladesh is still a developing country, many sectors have yet to be digitized. One such area is reception services, where a robot could replace a human receptionist. The robot would perform the same role as the receptionist, which is to provide useful information to the target audience.


The proposed system spans computer vision and natural language processing. It draws on three fields: face detection and recognition, speech recognition, and speech synthesis. The system will not only detect and recognise distinct faces but also store an individual's facial features for future reference. Furthermore, a speech recognition system will be designed and implemented to interpret the user's commands or queries in Bengali, much as a human receptionist would. The system then replies through a text-to-speech (TTS) module, which renders the response as machine-generated speech, also in Bengali; in computer science this task is known as speech synthesis. Natural language is easy for humans to interpret but very difficult for machines, because the data are unstructured. Building such a system is therefore a non-trivial task.
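As a minimal sketch of how the modules described above might be chained for a single interaction, the following Python outline wires face recognition, speech recognition, question answering and speech synthesis together. It is an illustration only, not the project's implementation: the class name, the function names and the stand-in stages are all hypothetical, speaker recognition is left out for brevity, and each real stage would be replaced by a trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical demo strings; a real deployment would draw on the QA data set.
DEMO_QUESTION = "ব্যাংক কখন খোলে?"          # "When does the bank open?"
DEMO_ANSWER = "ব্যাংক সকাল দশটায় খোলে।"     # "The bank opens at ten in the morning."
FALLBACK = "দুঃখিত, আমি বুঝতে পারিনি।"       # "Sorry, I did not understand."


@dataclass
class KioskPipeline:
    """Hypothetical wiring of the modules; every stage is a pluggable callable."""
    recognise_face: Callable[[bytes], Optional[str]]  # image -> visitor id, or None
    transcribe: Callable[[bytes], str]                # Bengali audio -> Bengali text
    answer: Callable[[str], str]                      # question text -> answer text
    synthesise: Callable[[str], bytes]                # Bengali text -> audio
    visits: Dict[str, int] = field(default_factory=dict)

    def handle(self, image: bytes, audio: bytes) -> dict:
        """Run one visitor interaction through all stages in order."""
        visitor = self.recognise_face(image)
        if visitor is not None:
            # Remember recognised visitors so that later visits can be flagged.
            self.visits[visitor] = self.visits.get(visitor, 0) + 1
        question = self.transcribe(audio)
        reply = self.answer(question)
        return {
            "visitor": visitor,
            "returning": visitor is not None and self.visits[visitor] > 1,
            "question": question,
            "reply_text": reply,
            "reply_audio": self.synthesise(reply),
        }


def demo_pipeline() -> KioskPipeline:
    """Build a pipeline from trivial stand-ins (assumptions, not real models)."""
    faq = {DEMO_QUESTION: DEMO_ANSWER}
    return KioskPipeline(
        recognise_face=lambda image: "visitor-1" if image else None,
        transcribe=lambda audio: audio.decode("utf-8", errors="replace"),
        answer=lambda question: faq.get(question, FALLBACK),
        synthesise=lambda text: text.encode("utf-8"),
    )


if __name__ == "__main__":
    kiosk = demo_pipeline()
    result = kiosk.handle(b"camera-frame", DEMO_QUESTION.encode("utf-8"))
    print(result["reply_text"])
```

Keeping each stage behind a plain callable means that any one module, for example the speech recogniser, could be developed and evaluated separately before the integrated system is assembled.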


Furthermore, two data sets will be crucial to the overall functionality of the system: (i) a fully annotated face dataset of people in Bangladesh, and (ii) an annotated speech dataset centered on human receptionist interactions. These are discussed in a later section.


To the best of our knowledge, no existing system delivers this service in the Bengali language. The proposed system would therefore be the first artificial intelligence based system to provide such assistance to its target audience (mainly schools, universities and banks) in Bangla.