My Research Activities

My research interests include Machine Learning, Natural Language Processing, Bioinformatics, and Parallel and Distributed Computing. I use different learning techniques to solve various problems in NLP and Bioinformatics. To meet the computational needs of the drug molecular docking problem, I led a research team that created a cluster-computer-based docking procedure. So far I have published 14 research articles in international conferences and journals.

Recently I have been working on "Information Dynamics", a term I coined a couple of months ago (I have not seen it used anywhere; please correct me if I am wrong). The most important point is that when data is shared, it does not become information right away on its own. That depends on the importance of the content, its accessibility, and the time and place of its creation. So every piece of data has at least these four dimensions, and based on where the data sits in this multidimensional continuum we can analyze how and when it becomes information. The creation of data at a source and its dissipation through the multidimensional continuum can be plotted and then analyzed using machine learning methods. This could give us a way to understand the dynamics with which information moves and the path it follows in a region. From this I would like to predict how far a given piece of information will dissipate tomorrow, based on the data I have in hand today. Hence the term Information Dynamics.

My best work is in the fields of NLP and ML. I developed the Bengali Sorting Algorithm with my thesis students, and I hope that it will be accepted as the standard procedure at the NLP Conference to be held at SUST in March. I have also led research teams to develop keyword suggestion, title extraction and summarization procedures for Bengali documents. My research team also developed an ML-based stemming algorithm for stemming the words of a document. All of this work is being used in the Bengali Search Engine on which I am working now.

Among the deployed systems that use my research work, one of my best was mapping two pairs of unrelated databases from two different educational boards for the SMS-based secured registration system for university admission. These databases contained the SSC and HSC data of students (each with more than 4 million rows), and they did not share any common primary key. I could not use straight string matching on the student name, father's name and mother's name because of spelling errors. Fuzzy string matching was also not feasible because of its time complexity. I came up with an O(n log n) procedure for mapping the erroneous names using Double Metaphone, Pattern Matching, Combinatorics and Hashing, and could map 97% of students uniquely. For the extreme level of errors in the remaining 3% of students, I developed an O(n^2) solution. Because of this procedure, the SUST application procedure through SMS requires only 3 keywords from the HSC results, whereas all other universities use a total of 6 keywords from the SSC and HSC results. This is why SUST SMS registration receives nearly 97% correct HSC roll numbers, whereas others receive close to 15% wrong HSC/SSC digits. In a survey it was found that the data processing of the SUST admission procedure is more than 30% efficient and more than 99% correct.
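
The general idea of matching misspelled names through a phonetic key and a hash table, instead of comparing every pair of rows, can be sketched roughly as follows. This is only an illustration and not the deployed procedure: `phonetic_key` is a crude hypothetical stand-in for Double Metaphone, the row fields (`id`, `name`, `father`, `mother`) and the function names are assumptions, and the pattern matching and combinatorics steps are left out.

```python
from collections import defaultdict

# Hypothetical sound classes; a crude stand-in for Double Metaphone.
_SOUND = str.maketrans({"B": "P", "V": "F", "C": "K", "Q": "K",
                        "G": "K", "Z": "S", "D": "T"})
_SILENT = set("AEIOUHWY")


def phonetic_key(name):
    """Reduce a name to a rough consonant skeleton that tolerates spelling variants."""
    out = []
    for c in name.upper():
        if not ("A" <= c <= "Z") or c in _SILENT:
            continue  # drop spaces, punctuation, vowels and weak consonants
        c = c.translate(_SOUND)  # merge similar-sounding consonants
        if not out or out[-1] != c:  # collapse repeats left after dropping letters
            out.append(c)
    return "".join(out)


def record_key(row):
    """Combine the keys of student, father and mother names into one hashable key."""
    return tuple(phonetic_key(row[f]) for f in ("name", "father", "mother"))


def link_records(ssc_rows, hsc_rows):
    """Pair rows whose phonetic keys agree and are unique; return the rest unresolved."""
    index = defaultdict(list)
    for row in ssc_rows:
        index[record_key(row)].append(row)

    matched, unresolved = [], []
    for row in hsc_rows:
        candidates = index.get(record_key(row), [])
        if len(candidates) == 1:
            matched.append((candidates[0]["id"], row["id"]))
        else:
            # No candidate or several: left for a slower fallback procedure.
            unresolved.append(row["id"])
    return matched, unresolved


# Made-up sample rows, for illustration only.
ssc = [
    {"id": "S1", "name": "Karim Rahman", "father": "Abdul Rahman", "mother": "Fatema Begum"},
    {"id": "S2", "name": "Nasrin Akter", "father": "Mohammad Ali", "mother": "Rokeya Khatun"},
]
hsc = [
    {"id": "H1", "name": "Kareem Rahaman", "father": "Abdul Rahman", "mother": "Fatima Begum"},
    {"id": "H2", "name": "Nasreen Akhter", "father": "Muhammed Ali", "mother": "Rokeya Khatun"},
    {"id": "H3", "name": "Unknown Person", "father": "X", "mother": "Y"},
]
matched, unresolved = link_records(ssc, hsc)
```

In this sketch each row is hashed once, so no pairwise comparison of the two tables is needed; rows that find no unique partner are simply collected for a separate, more expensive pass.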