Accepted Papers

7th International Conference on Natural Language Computing (NATL 2021)

November 27 ~ 28, 2021, London, United Kingdom

An OCR-based Voice Assisted Reading System for Sinhala and Tamil Languages

Priyanthini Sivasubramaniyam, Himasha Wijesinghe, Dr. Randil Pushpananda and Dr. Ruvan Weerasinghe, University of Colombo School of Computing, University of Colombo, Sri Lanka

ABSTRACT

Knowledge is an important asset for leading a successful life, and people normally gain it through reading. For the visually impaired community, however, reading is not easy. Although we live in a modern, technically advanced world, visually impaired people still struggle to lead even a simple life. According to the World Health Organization, an estimated 285 million people worldwide were visually impaired in 2010 [1], and visual impairment remains a major health issue all over the world. Visually impaired people are still cut off from a huge amount of knowledge because most resources are in printed or handwritten form. It is therefore important to make knowledge resources, including books, papers and other documents, accessible to visually impaired people in order to help them improve their knowledge and their living standards. For this purpose, we propose an OCR-based voice-assisted reading system for Sinhala and Tamil, the main languages used in Sri Lanka. We use both Optical Character Recognition (OCR) and Text-To-Speech (TTS) technologies to help the visually impaired by providing voice output for printed documents. The Tesseract toolkit [2] is used to develop the OCR module and the Festival framework [3] is used for text-to-speech synthesis. A visually impaired person can use the application by capturing an image of the document that he/she wants to read and then listening to the speech output via headphones or speakers. For now we focus solely on the Sri Lankan context by developing the application for Sinhala and Tamil only; the system can be extended to other languages in future. Additionally, a web application that provides voice output for a given text image was also developed in this research.

KEYWORDS

Festival framework, Optical Character Recognition (OCR), Tesseract toolkit, Text-To-Speech (TTS), University of Colombo School of Computing (UCSC).

Multi-language Information Extraction with Text Pattern Recognition

Johannes Lindén, Tingting Zhang, Stefan Forsström and Patrik Österberg, Department of Information Systems and Technology, Mid Sweden University, Sundsvall, Sweden

ABSTRACT

Information extraction is the task of extracting meta-data from text. The research in this article proposes a new information extraction algorithm called GenerateIE. The proposed algorithm identifies pairs of entities and relations described in a piece of text. The extracted meta-data is useful in many areas, but this research focuses on using it in news media contexts to provide the gist of written articles for analytics and paraphrasing of news information. Compared with existing state-of-the-art algorithms, GenerateIE offers two benefits. Firstly, GenerateIE provides the co-referenced word as the entity instead of pronouns such as he, she or it, which is more beneficial for knowledge graphs. Secondly, GenerateIE can be applied to multiple languages without changing the algorithm itself, apart from the underlying natural language text parsing. The performance of GenerateIE is not significantly better than that of state-of-the-art algorithms, but it offers competitive results.

KEYWORDS

Information Extraction, IE, Information representation, Knowledge Graph, Natural Language Processing, NLP, Pattern Recognition, Entity Recognition.

Twitter Sentiment Analysis of Covid Vaccines

Wenbo Zhu and Tiechuan Hu, Courant Institute of Mathematical Sciences, New York University, New York, US

ABSTRACT

In this paper, we look at a database of tweets sorted by various keywords that could indicate users’ sentiment towards covid vaccines. With social media becoming such a prevalent source of opinion, sorting and ranking tweets that hold important information, such as opinions on covid vaccines, is of utmost importance. Two different ranking scales were used. Ranking a tweet in this way can make the difference between an opinion being lost and an opinion being featured on the site, which affects people’s decisions and behavior and is why researchers are interested in it. Using natural language processing techniques, our aim is to determine and categorize opinions about covid vaccines with the highest accuracy possible.

KEYWORDS

Sentiment Analysis, Ranking algorithm, Machine learning classification.

An Evaluation of State-of-the-Art Approaches to Relation Extraction for usage on Domain-Specific Corpora

Christoph Brandl¹, Jens Albrecht¹ and Renato Budinich², ¹Nuremberg Institute of Technology Georg Simon Ohm, Department of Computer Science, Keßlerplatz 12, 90489 Nuremberg, Germany, ²Fraunhofer Supply Chain Services, Research Group Future Engineering, Nordostpark 93, 90411 Nuremberg, Germany

ABSTRACT

The task of relation extraction aims at classifying the semantic relations between entities in a text. When coupled with named-entity recognition these can be used as the building blocks for an information extraction procedure that results in the construction of a Knowledge Graph. While many NLP libraries support named-entity recognition, there is no off-the-shelf solution for relation extraction. In this paper, we evaluate and compare several state-of-the-art approaches on a subset of the FewRel data set as well as a manually annotated corpus. The custom corpus contains six relations from the area of market research and is available for public use. Our approach provides guidance for the selection of models and training data for relation extraction in real-world projects.

KEYWORDS

Relation Extraction, Knowledge Graph, Market Research.

What Makes Us Curious? Analysis of a Corpus of Open-Domain Questions

Zhaozhen Xu¹, Amelia Howarth², Nicole Briggs² and Nello Cristianini¹, ¹Intelligent Systems Laboratory, University of Bristol, Bristol, UK, ²We the Curious, Bristol, UK

ABSTRACT

Every day people ask short questions through smart devices or online forums to seek answers to all kinds of queries. With the increasing number of questions collected it becomes difficult to provide answers to each of them, which is one of the reasons behind the growing interest in automated question answering. Some questions are similar to existing ones that have already been answered, while others could be answered by an external knowledge source such as Wikipedia. An important question is what can be revealed by analysing a large set of questions. In 2017, the “We the Curious” science centre in Bristol started a project to capture the curiosity of Bristolians: the project collected more than 10,000 questions on various topics. As no rules were given during collection, the questions are truly open-domain and range across a variety of topics. One important aim for the science centre was to understand what concerns its visitors had beyond science, particularly on societal and cultural issues. We addressed this question by developing an Artificial Intelligence tool that can be used to perform various processing tasks: detection of equivalence between questions; detection of topic and type; and answering of the question. As we focused on the creation of a “generalist” tool, we trained it with labelled data from different datasets. We called the resulting model QBERT. This paper describes what information we extracted from the automated analysis of the WTC corpus of open-domain questions.

KEYWORDS

Deep Learning, Natural Language Processing, Question Answering, BERT.

W&G-ERNIE: A Concept for a Pre-Trained Automotive Warranty and Goodwill Language Representation Model for Warranty and Goodwill Text Mining

Lukas Jonathan Weber¹, Alice Kirchheim² and Axel Zimmermann³, ¹Department Management and Business Science, Aalen University, Aalen, Germany, ²Department Mechanical Engineering, Helmut-Schmidt-University, Hamburg, Germany, ³Department Management and Business Science, Aalen University, Aalen, Germany

ABSTRACT

The demand for accurate text mining tools to extract information from company-based automotive warranty and goodwill (W&G) data is steadily increasing. Progress in the analytical capability of text mining methods for information extraction is based, among other things, on developments and insights from deep learning techniques applied in natural language processing (NLP). Directly applying NLP-based architectures to automotive W&G text mining would lead to a significant performance loss due to the different word distributions of general-domain and W&G-specific corpora. Therefore, labelled W&G training datasets are necessary to transform a general-domain language model into a domain-specific one and so increase performance in W&G text mining tasks. In this article, we describe a concept for adapting the generally pre-trained language model ERNIE 2.0 [1] with the popular two-stage language model training approach in the automotive W&G context. For performance evaluation, we plan to use the common metrics recall, precision and F1-score.

KEYWORDS

Natural language processing, Domain-specific language models, ERNIE, Labelled automotive warranty.

Warrant Generation through Deep Learning

Fatima T. Alkhawaldeh, Tommy Yuan, Dimitar Kazakov, Department of Computer Science, University of York, Deramore Lane, Heslington, York, YO10 5GH, UK

ABSTRACT

The warrant element of the Toulmin model is critical for fact checking and assessing the strength of an argument. As implicit information, warrants justify arguments and explain why the evidence supports the claim. Despite the critical role warrants play in facilitating argument comprehension, most works aim to select the best warrant from existing structured data, and labelled data is scarce; this presents a fact-checking challenge, particularly when the evidence is insufficient or the conclusion is not inferred or generated well from the evidence. Additionally, deep learning methods for false information detection face a significant bottleneck due to their training requirement of a large amount of labelled data. Manually annotating data, on the other hand, is a time-consuming and laborious process. Thus, we examine the extent to which warrants can be retrieved or reconfigured using unstructured data obtained from their premises.

KEYWORDS

Toulmin model, fact checking, and deep learning.

Advanced Skills Mapping and Career Development using AI

Yew Kee Wong, School of Information Engineering, HuangHuai University, Henan, China

ABSTRACT

Artificial intelligence has been an eye-catching term that is impacting every industry in the world. With the rise of such advanced technology, there will always be a question regarding its impact on our social life, environment and economy, and thus on all efforts exerted towards continuous development. By definition, the welfare of human beings is the core of continuous development. Continuous development is useful only when ordinary people’s lives are improved, whether in health, education, employment, environment, equality or justice. Securing decent jobs is a key enabler for promoting the components of continuous development: economic growth, social welfare and environmental sustainability. Human resources are a precious resource for nations. High unemployment and underemployment rates, especially among youth, are a great threat to the continuous economic development of many countries and are influenced by investment in education and quality of living.

KEYWORDS

Artificial Intelligence, Human Resources, Conceptual Blueprint, Continuous Development, Learning and Employability Blueprint.

Encoder Decoder Approach To Automated Essay Scoring For Deeper Semantic Analysis

Priyatam Naravajhula, Sreedeep Rayavarapu and Srujana Inturi, Chaitanya Bharathi Institute of Technology, Gandipet Hyderabad 500075, India

ABSTRACT

Descriptive or essay-type answers have always played a major role in education. They clearly capture the student’s grasp of knowledge and presentation skills. Manual essay scoring can be a daunting process for human evaluators; assessing descriptive answers can present a huge overhead owing to the limited number of evaluators and a disproportionate number of essays to be graded, leading to inefficient or inaccurate scoring. There has been a major paradigm shift from traditional classroom education to online education engendered by the COVID-19 pandemic; it seems plausible to infer that future educational assessment will be online, making an automatic essay scorer not only relevant but of paramount importance. We explore several neural architectures for the task of automated essay scoring. Results and experimental analysis show that our model, based on a recurrent encoder-decoder, provides a deeper semantic analysis and hence outperforms a strong baseline in terms of quadratic weighted kappa score.

KEYWORDS

Encoder-Decoder, Automated Essay Scoring, Semantic analysis, Natural Language Processing.
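
The abstract above reports results in terms of quadratic weighted kappa. As a rough illustration of that metric only (not the authors' code), the Python sketch below computes it for two lists of integer scores. The function name, argument names and sample scores are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two integer score lists, penalising large gaps quadratically."""
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two possible ratings")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")

    # Observed confusion matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2          # quadratic disagreement weight
            expected = hist_a[i] * hist_b[j] / total      # count expected by chance
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters used a single identical score everywhere; treat as full agreement.
        return 1.0
    return 1.0 - numerator / denominator


# Hypothetical human and model scores on a 1 to 4 scale.
human_scores = [1, 2, 3, 4, 4, 2]
model_scores = [1, 2, 3, 4, 3, 2]
kappa = quadratic_weighted_kappa(human_scores, model_scores, 1, 4)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.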

Common Ground, Frames and Slots: Understanding Doctors Interacting with a Virtual Patient

Philippe Blache, Matthis Houlès, LPL-CNRS, Aix-en-Provence, France

ABSTRACT

This paper presents a dialogue system for training doctors to break bad news. The originality of this work lies in its knowledge representation. All information known before the dialogue (the universe of discourse, the context, the scenario of the dialogue) as well as the knowledge transferred from the doctor to the patient during the conversation is represented in a shared knowledge structure called common ground, which constitutes the core of the system. The Natural Language Understanding and Natural Language Generation modules of the system take advantage of this structure, and we present in this paper different original techniques that make it possible to implement them efficiently.

Intelligent Question Answering Module for Product Manuals

Abinaya Govindan, Gyan Ranjan, and Amit Verma, Neuron7.ai, USA

ABSTRACT

Question Answering (QA) has been a well-researched problem in the field of NLP over the past few years. The ability of users to query information content available in a variety of formats, structured and unstructured, has become a necessity. This paper proposes to untangle factoid question answering, specifically targeting the Hi-Tech domain. This task of document question answering aims to address the challenges of document parsing, indexing and retrieval (identifying the relevant documents) along with those of machine comprehension (extracting spans of correct answers from the context). Our proposed approach introduces a comprehensive pipeline consisting of document ingestion modules that handle a wide variety of unstructured data spanning different sections of the document, such as textual content, images and tabular content. Our experiments on several “real-world” and domain-specific datasets indicate the insufficiency of current fine-tuned models and show that our proposed pipeline is an effective solution for this complex task.

KEYWORDS

machine comprehension, document parser, question answering, information retrieval.

Product Market Demand Analysis Using NLP in Banglish Text with Sentiment Analysis and Named Entity Recognition

Md Sabbir Hossain, Brac University, Bangladesh

ABSTRACT

Product market demand analysis plays a significant role in shaping business strategies because of its noticeable impact on the competitive business field. There are roughly 228 million native Bengali speakers, the majority of whom use Banglish text to interact with one another on social media. As social media emerges as an online marketplace for entrepreneurs, consumers are buying and evaluating items there using Banglish text. People use social media to find preferred smartphone brands and models by sharing their good and bad experiences with them. Our goal is therefore to gather Banglish text data and use sentiment analysis and named entity recognition to assess Bangladeshi market demand for smartphones and to determine the most popular smartphones by gender. We scraped product-related data from social media with instant data scrapers and crawled product information from Wikipedia and other sites with Python web scrapers. The raw data is filtered using NLP methods with Python’s Pandas and Seaborn libraries. For named entity recognition, we trained spaCy’s custom NER model and Amazon Comprehend Custom NER on our datasets. A TensorFlow sequential model was deployed with parameter tweaking for sentiment analysis. Meanwhile, we used the Google Cloud Translation API to estimate the gender of the reviewers using the BanglaLinga library. In this article, we use natural language processing (NLP) approaches and several machine learning models to identify the most in-demand items and services in the Bangladeshi market. Our model has an accuracy of 87.99% in spaCy custom named entity recognition, 95.51% in Amazon Comprehend Custom NER, and 87.02% in the sequential model for demand analysis. After the spaCy analysis, we were able to manage 80% of mistakes related to misspelled words using a mix of Levenshtein distance and ratio algorithms.

KEYWORDS

Market Demand Analysis, Sentiment Analysis, Natural Language Processing, Named Entity Recognition, TensorFlow, Gender Prediction, Banglish Text.
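
The abstract above mentions handling misspelled words with Levenshtein distance and a ratio. The Python sketch below shows one way such matching could look. It is an illustration under stated assumptions, not the authors' implementation: the ratio here is a simple length-normalised similarity, and the helper names, the 0.7 threshold and the brand vocabulary are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def closest_match(word, vocabulary, min_ratio=0.7):
    """Return the most similar vocabulary entry, or None if nothing is close enough."""
    best = max(vocabulary, key=lambda entry: similarity_ratio(word, entry))
    return best if similarity_ratio(word, best) >= min_ratio else None


# Hypothetical brand vocabulary and a misspelled mention.
brands = ["samsung", "xiaomi", "nokia"]
corrected = closest_match("samsang", brands)
```

The threshold trades recall for precision: a lower value corrects more misspellings but risks mapping unrelated words onto a brand name.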

Emotions Are Subtle: Learning Sentiment Based Text Representation using Contrastive Learning

Ipsita Mohanty¹, Ankit Goyal¹, Alex Dotterweich², ¹Carnegie Mellon University, Pittsburgh, USA, ²University of California, Berkeley, California, USA

ABSTRACT

Contrastive learning techniques have been widely used in the field of computer vision as a means of augmenting datasets. In this paper, we extend the use of these contrastive learning embeddings to sentiment analysis tasks and demonstrate that fine-tuning on these embeddings provides an improvement over fine-tuning on BERT-based embeddings, achieving higher benchmarks on the task of sentiment analysis when evaluated on the DynaSent dataset. Additionally, we explore upsampling techniques to achieve a more balanced class distribution so that we can make further improvements on our benchmark tasks.

KEYWORDS

BERT, RoBERTa, Transformers, SIMCSE, Contextual Embeddings, Fine-Tuning, Sentiment Analysis, Focal loss, Contrastive Learning, Transfer Learning.

Annotated Lexicon for Sentiment Analysis in Bosnian Language

Sead Jahic¹ and Jernej Vicic², ¹Faculty of Mathematics, Natural Science and Information Technologies, University of Primorska, Koper, Slovenia, ²Faculty of Mathematics, Natural Science and Information Technologies, University of Primorska, Koper, Slovenia

ABSTRACT

The paper presents the first sentiment-annotated lexicon of the Bosnian language. The language coverage of the lexicon was evaluated using two reference corpora. The usability of the lexicon was already proven on a Twitter-based comparison. Two approaches were observed in this experiment: the first method used a frequency list of all lemmas extracted from two relevant Bosnian language corpora, while the second method used all lemma occurrences without using frequency as the main factor in counting. The results of the study suggest usable language coverage. The computed coverage for the first corpus was 27.25%, while the second corpus yields 24.34%. The second method yields 1.899% coverage for the first corpus and 6.05% for the second corpus. Two methods were used to identify the 500 most relevant lemmas, log-likelihood and Sketch Engine’s term extraction function, and the lexicon covered 13.8% and 13.4% of the resulting lists respectively.

KEYWORDS

Bosnian lexicon, corpus, sentiment analysis, AnAwords, stopwords.

Diaspora Database Analytics: Database Requirements for a Diaspora Analytics Application

Dr. Tamaro Green, D.S. and Dr. Woineshet Meaza, D.S., II Data School, International CARDS, Texas, USA

ABSTRACT

Reviewing distributed database requirements of the Diaspora Analytics Application was an opportunity to explore the latest applications of big data analytics, database technologies, and data management systems. This paper reviews distributed database specifications to support a big data analytics application for the visualization of diaspora migration data. Meaza [1] provided a design science study receiving input from subject matter experts on the design. Dobre and Xhafa [2] described parallel programming and distributed frameworks for big data and how their efficiency improves scalability and performance. They also evaluated various frameworks including MapReduce, Hadoop, Hive, and Spark, and suggested methods and techniques for sharing data and online processing. The Diaspora Analytics Application implemented parallel programming, distributed big data frameworks, and distributed databases.

KEYWORDS

Distributed Databases, Database Design, Big Data Analytics, Diaspora Analytics.

Data Ingestion from a Data Lake to a NoSQL Data Warehouse: The Case of Relational Databases

Fatma Abdelhedi¹, Rym Jemmali¹ and Gilles Zurfluh², ¹CBI, Trimane, Paris, France, ²Toulouse Institute of Computer Science Research (IRIT), Toulouse Capitole University, Toulouse, France

ABSTRACT

Nowadays, the digital transformation of companies and the wider society has driven the evolution of databases towards big data. Our work is set against this background, focusing more specifically on the mechanisms for extracting datasets stored in a Data Lake and storing the data in a data warehouse. The latter will in turn allow decision analysis. In this article, we introduce an extraction mechanism limited to relational databases. To automate this process, we used the MDA architecture, which provides a formal environment for schema transformation. Based on the physical model describing the Data Lake, we propose a set of conversion rules that allow the creation of a Data Warehouse stored on document-oriented NoSQL systems. The transformation process has been tested on a medical application.

KEYWORDS

Data Lake, Data Warehouse, NoSQL, Big data, Relational DB, MDA, QVT.

Classification of Water Supply and Sanitation Technology Options using Machine Learning Methods

Hala AlNuaimi, Dr. Ali Bouabid, and Dr. Maher Maalouf, Department of Industrial and System Engineering, Khalifa University, Abu Dhabi

ABSTRACT

The implemented water and sanitation (watsan) technologies need to be accessible to all people living in the area and, more importantly, to be sustainable. Based on a study done by the Rural Water Supply Network, 15% to 30% of installed watsan technologies in developing countries are currently not operating due to inappropriate implementation. The available decision support systems (DSS) are mostly guidance documents or technical sheets, which makes it important to have a smart DSS. This paper presents a decision framework which is made up of three modules for the appropriate selection of watsan technologies. The paper focuses on the second module, which concentrates on the classification of the watsan technologies using capacity requirement level. The classification methods proposed are based on Machine Learning algorithms. A large set of water supply technologies is used to illustrate the application of the proposed methods with three classification algorithms.

KEYWORDS

classification algorithms, decision support system, watsan, machine learning, sustainability.

Role of Data in War Against Covid-19: Data Mining Techniques Framework

Faluyi Samuel Gbenga, Balogun Temitayo Elijah, and Ogunsan Matthew Adebayo, Department of Computer Science, Ekiti State College of Agriculture and Technology, Isan Ekiti, Nigeria

ABSTRACT

Coronavirus disease, COVID-19, has brought many challenges to the entire world across various sectors. The way things are handled has literally changed. The purpose of this study is to review related literature on data mining techniques and to reveal how they can be incorporated into the health and disease control sector. The study draws on recent research on disease prediction models. Various techniques were identified and found reliable for prediction, classification, clustering and regression. The study concluded that data mining techniques and machine learning algorithms are valuable in handling the threat posed by the COVID-19 pandemic as well as in providing relevant insight.

KEYWORDS

COVID-19, Data mining techniques, prediction model, classification.

Fuzzy Control of a Servomechanism: a Practical Approach using Mamdani and Takagi-Sugeno

Renato Aguiar and Izabella Sirqueira, Department of Electrical Engineering, Centro Universitario FEI, S.B.C, Brazil

ABSTRACT

The main objective of this work is to propose two fuzzy controllers, one based on the Mamdani inference method and another based on the Takagi-Sugeno inference method, both designed for application in a position control system of a servomechanism. Some comparisons between the two methods will be made with regard to system performance in order to identify the advantages of the Takagi-Sugeno method over the Mamdani method in the presence of disturbances and nonlinearities of the system. Some results of simulation and practical application are presented.

KEYWORDS

Fuzzy Logic, Fuzzy Controller, Mamdani Inference Method, Takagi-Sugeno Inference Method.

Assessment of Water Availabilities in the Tancítaro Area Through the Fuzzy Willingness to Pay

José M. Brotons¹, Gerardo Ruiz Sevilla², Ruben Chavez³, ¹Department of Economic and Financial Studies, Miguel Hernández University, Elche, 03202 Alicante, Spain, ²Escuela Nacional de Estudios Superiores. Universidad Nacional Autónoma de México, Campus Morelia, México, ³Facultad de Químico Farmacobiología, Universidad Michoacana de San Nicolás de Hidalgo, Morelia, Michoacán. México

ABSTRACT

The Tancítaro peak is located in the State of Michoacán in Mexico. The current unsustainable consumption of water resources can lead the region to a critical situation if adequate measures are not taken. An improvement in water management involving payment for the use of these resources could improve the situation. This work aims to propose a model for obtaining an equilibrium price for the use of water in the Tancítaro area. For this, experts will be consulted both among the users of the water and among those who currently hold the right to use it, that is, inhabitants of the reserve area. The use of fuzzy logic will allow them to express their willingness to pay and will allow data to be collected not in a dichotomous way but by grading their opinions. The use of the Ordered Weighted Average (OWA) will allow the aggregation of these opinions bearing in mind different degrees of optimism or pessimism. The results obtained show an equilibrium price of $0.49 per m³. It should be noted that these are preliminary results and that the main objective of the work is the presentation of a methodological proposal.

KEYWORDS

OWA, Water demand function, Water supply function, Willingness to accept, Willingness to pay.
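
The abstract above relies on the Ordered Weighted Average (OWA) to aggregate expert opinions under different degrees of optimism or pessimism. The Python sketch below illustrates the standard OWA operator and its orness measure. It is not the authors' model: the function names, the weight vectors and the sample willingness-to-pay grades are hypothetical.

```python
def owa(values, weights):
    """Ordered Weighted Average: weights apply to values sorted from largest to smallest."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def orness(weights):
    """Degree of optimism of a weight vector: 1.0 behaves like max, 0.0 like min."""
    n = len(weights)
    if n < 2:
        raise ValueError("orness needs at least two weights")
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)


# Hypothetical graded opinions from three experts (e.g. willingness to pay on a 0-1 scale).
opinions = [0.2, 0.8, 0.5]

optimistic = owa(opinions, [1.0, 0.0, 0.0])   # all weight on the largest value
pessimistic = owa(opinions, [0.0, 0.0, 1.0])  # all weight on the smallest value
balanced = owa(opinions, [1 / 3, 1 / 3, 1 / 3])  # plain arithmetic mean
```

Shifting weight towards the first positions raises the orness and produces a more optimistic aggregate; shifting it towards the last positions does the opposite.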

Human Resource, Retention, Motivation and Job Satisfaction in the Information Technology Industry: Basis for Employee Retention Program

Maureen Prongo-Eufan and Edgar B. Manubag, Business College, Notre Dame of Dadiangas University, General Santos City, Philippines & College of Engineering, Architecture, and Technology, Notre Dame of Dadiangas University, General Santos City, Philippines

ABSTRACT

Employee retention within information technology has remained challenging due to the high rates of employee turnover. Numerous studies have explored the retention and turnover phenomenon to understand each factor and determine an appropriate human resource management approach. This study examined employee retention in the IT industry in General Santos City in the Philippines to determine the relationship between employee retention and three identified factors: human resource practices, employee motivation, and job satisfaction. Thus, it provides a recommendation for developing an employee retention program. In this study, a quantitative research design was adopted using a survey questionnaire administered to a sample of 384 IT employees from across different IT enterprises and other related business organizations. To substantiate the quantitative research design results, a qualitative research tool was also adopted using a focus group discussion with human resource practitioners and managers or owners of IT enterprises. Pooled results showed that human resource practices are the primary factor that influences the high retention rate of IT employees. In other words, there is a perfect positive correlation between HR practices and employee retention. Employee motivation and job satisfaction were secondary factors, thereby suggesting that these two are only consequences of HR practices. Employee retention programs aimed at increasing employee retention or reducing employee turnover rates should be built around HR practices fitted to the needs of IT professionals.

KEYWORDS

Information Technology Industry, IT Professionals, Human Resource Practices, Retention, Motivation, Job Satisfaction, Employee Retention Program, IT Industry in General Santos City, Information and Communication Technology, Human Resource Management.

Navigational Tools for Multimedia and Internet Resources

Sue Fenley, Emeritus University of Oxford, UK

ABSTRACT

This paper investigates observational research on users navigating through multimedia and internet resources. The audit trails of these users have been used to investigate a series of navigational tools and graphs of tool usage. From this research the need for a set of navigational tools has emerged. Further study in this area has highlighted the need for more specific navigational tools to allow the user to situate themselves within the resource, to be able to backtrack and retrace previous searches and to properly orientate themselves within large resources using tools usually found in mapping and orienteering contexts. More detailed examples of these tools will be included in the final paper and presentation.

KEYWORDS

Multimedia, Internet resources, geographical tools, landmarks, trails, patterns and diagrammatic tools.

Detection of Ransomware Attacks using Machine Learning

Anusha Vajha, Department of Information Technology, Institute of Aeronautical Engineering, Hyderabad, India

ABSTRACT

Ransomware is one of the most prevalent types of malicious software in 2021. It encrypts the files on the victim’s device and then demands money, i.e., a ransom, for decrypting them. The financial losses and global damage costs to individuals and organizations due to ransomware are increasing year by year. As a result, combatting ransomware is a critical concern. In this paper, the proposed ransomware detection approach is based on machine learning. The primary goal was to examine the changes that are visible to the human eye. Encrypted text files are compared to unencrypted text files; with the use of customized features and ML training, the program can identify text files that have been attacked. Four different datasets are used, each containing thousands of text files. Each text file has 50% unencrypted text and 50% text encrypted by six types of encryption techniques: Atbash, Autokey, Caesar, Gronsfeld, Playfair, and RSA. Two different machine learning classifiers are trained: a support vector machine and the k-nearest neighbors (KNN) algorithm. In the experiments, a classification accuracy of 91% was achieved with both classifiers.

KEYWORDS

Ransomware, Machine learning, Malware detection, Security.
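
As an illustration of the kind of data described in the abstract above, the following minimal Python sketch implements two of the six named ciphers (Caesar and Atbash) and one simple text feature. The function names and the letter-histogram feature are assumptions made for illustration only; the abstract does not specify the customized features the authors used.

```python
def caesar(text, shift):
    """Caesar cipher: rotate each ASCII letter by `shift` positions."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # leave digits, spaces and punctuation unchanged
    return "".join(out)


def atbash(text):
    """Atbash cipher: map a<->z, b<->y, ... for ASCII letters."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr(ord("z") - (ord(ch) - ord("a"))))
        elif "A" <= ch <= "Z":
            out.append(chr(ord("Z") - (ord(ch) - ord("A"))))
        else:
            out.append(ch)
    return "".join(out)


def letter_histogram(text):
    """Hypothetical feature: relative frequency of each letter a..z."""
    counts = [0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 26
```

A substitution cipher such as Caesar or Atbash only permutes the entries of such a histogram, so a classifier trained on it would be learning which letters are frequent rather than how frequent the most common letter is.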

Time Series Prediction of Temperature in Pune using Seasonal Arima Model

Aarati Gangshetty and Gurpreet Kaur, JRF, R&DE(E), Pune

ABSTRACT

In this paper, an attempt has been made to develop a Seasonal Autoregressive Integrated Moving Average (SARIMA) model to predict temperature using past data for Pune, Maharashtra. The dataset from 2009 to 2020 has been taken for analysis. When trend and seasonality are present in a time series, instead of decomposing it manually to fit an ARIMA model, another very popular method is to use the SARIMA model, which is a generalization of the ARIMA model. The goodness of fit of the model was tested against standardized residuals, the autocorrelation function, and the partial autocorrelation function. We find that SARIMA(1,1,1)(1,1,1)₁₂ represents the behaviour of the data very well. According to the model diagnostics, the model was reliable for predicting temperature.

KEYWORDS

SARIMA, Prediction, ARIMA, temperature.
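
For reference, a SARIMA model of order (1,1,1)(1,1,1) with seasonal period 12, as named in the abstract above, can be written in backshift notation as follows. The symbols are the usual textbook ones and are not taken from the paper: B is the backshift operator, y_t the temperature series, ε_t white noise, φ₁ and θ₁ the non-seasonal AR and MA coefficients, and Φ₁ and Θ₁ their seasonal counterparts. Sign conventions for the moving-average terms vary between references.

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
```

The factors (1 - B) and (1 - B^{12}) are the first-order non-seasonal and seasonal differences, which remove the trend and the 12-month seasonality before the ARMA terms are fitted.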

Credit Scoring System using Machine Learning

Sowmya Meka, S. Ravi Kishan, JNTUK, India

ABSTRACT

One activity within the banking industry is to extend credit to customers; hence, credit risk analysis is critical for financial risk management. Various new methods are used to perform credit risk analysis, and the development of the credit scoring model has been regarded as a critical topic. In this research paper we present a detailed comparison between the Random Forest and K Nearest Neighbours algorithms. We explain the algorithms and the mathematical framework behind developing the machine learning models. We discuss the speed and accuracy of the two machine learning algorithms when we test them on the UCI Credit Card database. After comparison and finding the gender with maximum debt, both methods are refined and tuned to obtain better precision. We conclude with a discussion and comparison summarizing the best approach to classifying these datasets.

KEYWORDS

Machine Learning, Random Forest, KNN, Credit Card, Risk Management.
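
To make one of the two compared algorithms concrete, the following is a minimal Python sketch of k-nearest-neighbours classification by majority vote. The function name, the toy feature vectors and the labels are hypothetical and are not drawn from the UCI Credit Card data or from the authors' implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most common
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups (hypothetical, scaled features)
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
train_y = ["good", "good", "good", "default", "default", "default"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))
```

Because every prediction scans the whole training set, KNN is cheap to train but slow to query, which is one reason a speed comparison against Random Forest is informative.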

A Decision Support System for Recommending Movies in an E-Booking and Social-Distancing Environment using an Ontology-based Approach

Balogun Temitayo Elijah, Faluyi Samuel Gbenga and Oyesanmi Fiyinfoluwa Gboluwaga, Department of Computer Science, Ekiti State College of Agriculture and Technology, Isan Ekiti, Nigeria

ABSTRACT

One of the difficult places to observe social distancing in line with COVID-19 protocols might be a cinema, due to the queue for booking movies; the seating arrangements will also need to be booked to help observe social distancing. Most cinemas in Nigeria already offer booking services online. Most of them, however, do not offer seat booking, give adequate recommendations on their websites, or make their systems customer-centric. This paper is aimed at developing a decision support system that can be used along with online movie and seat booking. The model employed was the ontology model, which is used to tailor the description of the user's preferences to help the user decide what movie to watch. The system was developed with HTML, CSS, and JavaScript for the front-end design, while PHP was used for communicating with the backend, which was built with the MySQL server.

KEYWORDS

Recommender Systems, Ontology, e-booking, seat booking, movie recommendation.

Application of Control System and Digital Techniques in Agricultural Operations: An Approach of Achieving Smart Agriculture

Alare Kehinde P¹, Alare Taiwo², and Beemnet Mengesha Kassahun³,⁴, ¹Department of Medicine, Ladoke Akintola University, Ogbomoso Nigeria, ²Department of Mechanical Engineering, Federal University of Technology, Akure Nigeria, ³Department of Horticulture, Kyungpook National University, Daegu 41566, Republic of Korea, ⁴Department of Horticulture, Ethiopian Institute of Agricultural Research Institute, Wondo Genet Agricultural Research Center, P.O.Box 198, Addis Ababa, Ethiopia

ABSTRACT

Achieving food and environmental security under future climate change is a great challenge for agricultural society. Food security is an essential precursor to environmental protection, and food production is likely to maintain priority over environmental protection. This article reviews the potential of applying control systems and digital techniques in agricultural operations for food and environmental security. The likely impacts of control systems and digital techniques for food and environmental security on the other important dimensions of food security are discussed qualitatively. Finally, the current assessment studies are discussed, suggesting improvements and proposing techniques for new approaches. In modern agriculture, the application of smart control systems and digital techniques is crucial for sustaining future food and environmental security. Such a system makes it possible to integrate and manage natural resources, human resources, pests, diseases, climatic conditions, nutrients, and other resources efficiently and sustainably. This publication therefore provides extensive information by analyzing 50 years of data from 108 countries, and aims to summarize and discuss past and current evidence, suggest improvements, and propose control systems and digital techniques for achieving smarter agriculture for environmental and food security.

KEYWORDS

Control System, Smart Agriculture, Closed Loop System, Feedback Signals, Mathematical Modelling, Efficiency Optimization, Predictive Analysis, Agricultural-Farming planning, Agricultural risk management.

Three Dimensional Denoising Filter for Effective Source Smartphone Video Identification and Verification

Ashref Lawgaly¹, Fouad Khelifi¹, Ahmed Bouridane¹, Somaya Al-Maaddeed² and Younes Akbari², ¹Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, UK, ²Department of Computer Science and Engineering, Qatar University, Doha, Qatar

ABSTRACT

The field of digital image and video forensics has recently seen significant advances and has attracted attention from a growing number of researchers, given the availability of imaging functionalities at no cost in most current multimedia devices, including smartphones and tablets. Photo response non-uniformity (PRNU) noise is a sensor pattern noise characterizing the imaging device. However, estimating the PRNU from smartphone videos can be a challenging process because of the lossy compression that digital videos normally undergo for various reasons, in addition to other non-unique noise components that interfere with the video data. This paper presents a new filtering technique for PRNU estimation based on the three-dimensional discrete wavelet transform followed by a 3D Wiener filter. The rationale is that the 3D filter can filter out the compression artifacts along the temporal dimension more effectively than simple averaging. Experimental results on a public video dataset captured by various smartphone devices have shown a significant gain obtained with the proposed approach over the well-known two-dimensional wavelet-based Wiener approach.

KEYWORDS

Photo Response Non-Uniformity Noise, Source Smartphone Identification, Digital Image Forensics, 3D Denoising, 3D Wavelets & Wiener Filter.

Brain Tumor Detection and Segmentation Methods A Review

Gursangeet Kaur and Navdeep Kaur, Department of Computer Applications, Gulzar Group of Institutes, Khanna, Punjab, India

ABSTRACT

A brain tumor is an abnormal growth of cells inside the brain that can be benign or malignant, and its detection has become a challenging task for radiologists. Magnetic Resonance Imaging (MRI) is a convenient tool for detecting tumors, and segmentation is accomplished by extracting useful fragments from an image. The threshold method gives proper detection of the region of interest, while k-means clustering and fuzzy c-means clustering show the exact location of the tumor. This survey paper reviews the segmentation methods currently used for tumor detection and segmentation, as well as tumor classification methods such as different wavelet transforms, support vector machines, the finite difference method, and integrating the symmetric property, using MR images. Results vary by tumor type, area of the region, etc.

KEYWORDS

Brain tumor, MRI, K-means clustering, different wavelet transforms, segmentation.

Computational Predictions of the Two Dimensional Compressible Turbulent Flow through A Rocket Nozzle

Carlos Eduardo Américo¹, Carlos Henrique Marchi² and Guilherme Bertoldo³, ¹Laboratory of Numerical Experimentation (LENA), Department of Mechanical Engineering (DEMEC), Universidade Federal do Paraná (UFPR), Curitiba, Paraná, Zip Code 81531-980, Brazil, ²Federal University of Paraná (UFPR), Department of Mechanical Engineering (DEMEC), Universidade Federal do Paraná (UFPR), Curitiba, Paraná, Zip Code 81531-980, Brazil, ³Department of Physics, Statistics and Mathematics (DAFEM), Universidade Tecnológica Federal do Paraná (UTFPR), Campus Francisco Beltrão, Francisco Beltrão, Paraná, Zip Code 85601-970, Brazil

ABSTRACT

The discharge coefficient (𝐶𝑑), thrust coefficient at vacuum (𝐶𝑓), and temperature and pressure at the wall (𝑇𝑤𝑎𝑙𝑙 and 𝑝𝑤𝑎𝑙𝑙) were obtained by computational simulations for two-dimensional rocket nozzle flows with two CFD codes. The models applied were Euler's model; the explicit model (described in commercial codes as the Laminar model); and the explicit model with the Baldwin-Lomax, Spalart-Allmaras, and negative Spalart-Allmaras closure models. The computational results of the Mach2D code, discretized by the finite volume method, were compared with analytical predictions, experimental results, and CFD++ results. The computational errors of 𝐶𝑑 and 𝐶𝑓 were evaluated. Results suggest that for 𝐶𝑑, simulations performed with the Mach2D code were in better agreement with experimental and quasi-one-dimensional results than CFD++. For 𝐶𝑓, CFD++ results showed the best fit with quasi-one-dimensional results. The pressure distribution at the wall, for both CFD codes, was considered sufficiently representative of experimental measurements. The same holds for 𝑇𝑤𝑎𝑙𝑙 obtained with Euler's model.

KEYWORDS

Baldwin-Lomax model, Spalart-Allmaras model, Rocket nozzle, Mach2D, CFD++.

Comparative Study of Justification Methods in Recommender Systems: Example of Information Access Assistance Service (IAAS)

Kyelem Yacouba¹, Kabore Kiswendsida Kisito¹ and Ouedraogo Tounwendyam Frédéric², ¹Department of Informatique, University of Joseph Ki-Zerbo, Ouagadougou, Burkina Faso, ²Department of Informatique, University of Norbert Zongo, Koudougou, Burkina Faso

ABSTRACT

We conducted a study of recommendation justification for IAAS. Our comparative study shows that IAAS, which currently does not offer the opportunity to justify recommendations, needs to be improved. From the analysis of existing justification methods, it appears that none of these methods can be used effectively in IAAS. We therefore proposed a new IAAS architecture that deals separately with item classification and with the extraction of the rationale for adding the item during recommendation generation. The item selection method remains unchanged, as we plan to implement a new strategy to filter users' reviews. An opinion should now be extended to four elements: the documentary unit, the group of users, the justification and the weight, i.e., Opinion A = (UD, G, J, a).

KEYWORDS

IAAS, Justification in Recommender Systems, users reviews, weight of reviews.

Thermal Entropy based Hesitant Fuzzy Linguistic Term Set Analysis in Energy Efficient Opportunistic Clustering

Junaid Anees and Hao-Chun Zhang, School of Energy Science & Engineering, Harbin Institute of Technology, Harbin, China

ABSTRACT

Limited energy resources and sensor nodes' adaptability to the surrounding environment play a significant role in sustainable Wireless Sensor Networks. This paper proposes a novel, dynamic, self-organizing opportunistic clustering scheme using a Hesitant Fuzzy Linguistic Term Analysis-based Multi-Criteria Decision Modeling methodology in order to overcome the CH decision-making problems and network lifetime bottlenecks. The asynchronous sleep/awake cycle strategy can be exploited to make an opportunistic connection between sensor nodes using an opportunistic connection random graph. Every node in the network observes the node gain degree, energy welfare, relative thermal entropy, link connectivity, expected optimal hop, link quality factor, etc. to form the criteria for the Hesitant Fuzzy Linguistic Term Set. This allows the node to evaluate its current state and decide on the required action ('CH', 'CM' or 'relay'). Our proposed scheme leads to an improvement in network lifetime, packet delivery ratio and overall energy consumption against existing benchmarks.

KEYWORDS

Graph Theory, Wireless Sensor Networks, Hesitant Fuzzy Linguistic Term Set, Opportunistic Routing and RF Energy Transfer.

Software Defined Network (SDN): A Survey of Security Threats and their Mitigation Techniques

Zulkarnain Zainal¹ and Azizol Abdullah², ¹Faculty of Computer Science and Information Technology, University Putra Malaysia, Selangor, Malaysia, ²Faculty of Computer Science and Information Technology, University Putra Malaysia, Selangor, Malaysia

ABSTRACT

The emergence of the Software-Defined Network (SDN) has raised the bar significantly in terms of network management complexity and programmability. However, its full potential is still being exploited. The increasing number of SDNs and their various architectural components have raised concerns about their security. Security is not part of the initial design of a network; therefore, it must be considered as part of the overall strategy. Through the northbound interface, users can easily deploy various applications to access the network's resources. However, exploitation of this interface can lead to the exploitation of the network's openness. This paper presents a broad overview of the various security solutions available in the market. It discusses various types of attacks and threats that can affect the operation of SDN. It also presents various security considerations that can be taken into account while implementing an SDN security strategy.

KEYWORDS

Application, Authentication and authorization, Network security, Software defined network.

Gesture based Notification System for Paralytics

Sean Ernest Clyde Nodado, Alaric Justin Gallego, Vince Marvin Pama and Aniel James Villamor, Department of Electronics Engineering, University of St. La Salle, Bacolod City, Philippines

ABSTRACT

The communication barrier between paralytic/disabled people and the people around them is a major concern in society and has greatly affected the way they live and interact. To address this specific problem, a gesture-based notification system was designed and developed, mainly to assist paralytic and/or disabled people in the community, even when power interruptions occur. This low-cost, reliable project provides an effective and simple, yet important, solution to various issues faced by caretakers in traditionally communicating with disabled/paralyzed patients.

KEYWORDS

Paralytic, Caretaker, WiFi, 3G, Mobile Application, Web-based & Feedback.

Automated Training Techniques and Electronics Sensors Role in Cricket: A Review

Pravin Balbudhe¹, Dr. Brijesh Khandelwal², Dr. Sachin Solanki³, ¹Research Scholar, Department Of Computer Science & Engineering, Amity School Of Engineering and Technology, Amity University Raipur, Chhattisgarh, India, ²Associate Professor, Department Of Computer Science & Engineering, Amity School Of Engineering and Technology, Amity University Raipur, Chhattisgarh, India, ³Assistant Director(T), Directorate Of Technical Education, Government Polytechnic Campus, Sadar Nagpur, Maharashtra, India

ABSTRACT

This paper presents a study of technological involvement in game coaching. Attending to multiple players and checking their performance and accuracy levels is not always feasible for coaches. Self-paced training sessions and self-learning methods have been devised by different researchers, who identify multiple games or gaming apparatuses for different levels of automation. The paper covers the methods used for analysis and describes the smart cricket ball & its circuit diagram; the tracking technology that is used in cricket, tennis, Gaelic football, badminton, hurling, rugby union, association football & volleyball to visually track the trajectory of the ball; the Centre of Percussion (COP) in cricket; and the accelerometer & swing angle model. It provides a systematic literature review of smart sport & various methods, i.e., SVM, CART, ML, AI, CNN, ORB, SIFT & SURF. Lastly, future directions of research are proposed in the emerging field of SST.

KEYWORDS

BI, NVR, COP, ML, CNN, IoT.

WPA2 based Wireless Enterprise Configuration

Dr. Seema B. Hegde, Aditya Ranjan, Aman Raj, Krishanu Paul and Smritimay Santra, Electronics and Communication, Siddaganga Institute Of Technology, Tumkur, India

ABSTRACT

In this pandemic era, wireless enterprise configuration has gained more significance as the globe is moving towards a work-from-home culture, and robust, reliable, secure wireless communication is the need of the day. This work discusses Wi-Fi Protected Access-2 Enterprise, which is a fundamental technology for secure communication in enterprise wireless networks. A key requirement of Wi-Fi Protected Access-2 is that Wi-Fi enabled devices, or supplicants, should be correctly configured before connecting to the enterprise wireless network. Hackers may attack incorrectly configured supplicants and steal the network credentials very easily. As wireless technology is developing at a rapid pace, the feature of remote access in wireless networks is also expected to develop with safer and more secure connectivity. With more people accessing the enterprise network using the remote access feature, it becomes even more essential to secure the network in order to prevent attacks aimed at stealing the network credentials. The network credentials are of enormous value because they usually unlock access to all enterprise services. Hence, there is a need for secure connection of only authorized devices and users. This work discovers and mitigates the widespread risks in wireless enterprise networks through proper enterprise network design and by implementing the standard Wi-Fi Protected Access-2 802.1x authentication protocol over the network. Remote user access for work-from-home scenarios is also implemented using IPSec for high security over the tunnel.

KEYWORDS

WPA2, WLAN, IPsec, Network Architecture, Network Security, Enterprise Network, WAN.

Measurement of Public Opinion based on Social Media Big Data (Comparative Analysis of Public Opinion Lockdown Policy in Indonesia and Malaysia)

Catur Suratnoaji, Nurhadi and Irwan Dwi Arianto, ¹Communication Department, ²Business Administration Department, Social and Politic Sciences Faculty, Universitas Pembangunan Nasional “Veteran” Jawa Timur, Indonesia

ABSTRACT

The focus of this study is to compare public opinion about the lockdown policy based on data stored on the Twitter social media platform. The measurement of public opinion on policies tends to be done using a traditional survey method, but in this research it is done by measuring public opinion based on social media big data. Big data-based public opinion research methods are still relatively new and exploratory. The measurement of public opinion covers not only counts of Twitter users, top tweets and top influencers, but also the communication network between Twitter users discussing the lockdown policy. The results of the study show that the lockdown policy in Malaysia and Indonesia raises various kinds of public opinion. Sentiment analysis shows that most tweets in Indonesia and Malaysia fall into the positive category, but the negative opinion category is almost the same size as the positive one. In an uncertain situation during the COVID-19 pandemic, Indonesians are more confident in personal sources of information than in the mass media or government departments. Personal sources of information are more dominant in disseminating information on COVID-19 to the wider community. The big data-based public opinion research method has several opportunities and challenges. 1) Public opinion research provides new research opportunities in the fields of politics, communication, and public opinion. 2) The challenge in this research is the selection of keywords for downloading the data. 3) Determining the sample is also a challenge because the amount of data on Twitter is very large.

KEYWORDS

public opinion, lockdown, social media, PSBB, big data, Covid-19, PPKM, PKP.

Low-Complexity Receiver for Massive MIMO-GFDM Communications

Feng-Cheng Tsai, Fang-Biau Ueng and Ding-Ching Lin, Department of Electrical Engineering, National Chung Hsing University, Taiwan

ABSTRACT

OFDM has two disadvantages. The first is a high peak-to-average power ratio (PAPR), and the second is high out-of-band (OOB) radiated power. In future communication applications, diversified scenarios such as the Internet of Things, inter-machine communication and telemedicine make fourth-generation mobile communication no longer applicable. Generalized frequency division multiplexing (GFDM) has a pulse-shaping filter, and it has lower out-of-band radiated power, a lower peak-to-average power ratio and fewer cyclic prefixes (CP) than OFDM. To meet the demand for high data transmission rates, installing massive multi-input multi-output (massive MIMO) antennas is an inevitable trend. As the number of antennas increases, so does the complexity. This paper employs time reversal (TR) technology to reduce the computational complexity. Although the number of base station (BS) antennas has increased to eliminate interference, there is still residual interference. To eliminate the interference further, we deploy zero-forcing (ZF) equalization after the time reversal combination.

KEYWORDS

5G, GFDM, MIMO.
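
The zero-forcing step mentioned in the abstract above can be sketched in a few lines of Python. This is a generic illustration of ZF equalization, x_hat = (H^H H)^{-1} H^H y, for two data streams, and not the receiver proposed in the paper: the function name is hypothetical, the channel matrix H is assumed to be known, and the time reversal combining and GFDM demodulation stages are omitted.

```python
def zero_forcing_2stream(H, y):
    """Zero-forcing estimate x_hat = (H^H H)^{-1} H^H y for an N x 2 channel.

    H is a list of N rows [h_n0, h_n1]; y is a list of N received samples.
    """
    # Gram matrix G = H^H H (2 x 2, Hermitian)
    g00 = sum(abs(row[0]) ** 2 for row in H)
    g11 = sum(abs(row[1]) ** 2 for row in H)
    g01 = sum(row[0].conjugate() * row[1] for row in H)
    g10 = g01.conjugate()
    # Matched-filter output z = H^H y
    z0 = sum(row[0].conjugate() * yi for row, yi in zip(H, y))
    z1 = sum(row[1].conjugate() * yi for row, yi in zip(H, y))
    det = g00 * g11 - g01 * g10
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is rank deficient")
    # Apply the closed-form inverse of the 2 x 2 Gram matrix to z
    x0 = (g11 * z0 - g01 * z1) / det
    x1 = (g00 * z1 - g10 * z0) / det
    return [x0, x1]


# Noise-free example with 3 receive antennas and 2 transmitted symbols
H = [[1 + 1j, 0.5 + 0j], [0.2j, 1 - 1j], [0.3 + 0j, 0.1j]]
x = [1 + 0j, -1 + 0j]
y = [row[0] * x[0] + row[1] * x[1] for row in H]
x_hat = zero_forcing_2stream(H, y)
```

Without noise, ZF recovers the transmitted symbols exactly because it inverts the channel. With noise, the same inversion amplifies the noise when H is poorly conditioned, which is the usual trade-off of ZF against its low complexity.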

Enhanced Compact Hairpin Bandpass Filter with Cross Shaped and Spiral Shaped DGS using High Dielectric Constant Laminate Substrate and Superstrate for 6G Communication

Anoop Kumar Bundela¹ and Uma Shankar Kurmi², ¹Govt. Women’s polytechnic College Bhopal, Madhya Pradesh, 462016, ²LNCT University Bhopal, Madhya Pradesh

ABSTRACT

Hairpin bandpass filters are one of the most in-demand types of bandpass filters used in RF/microwave applications for regulating frequency responses. In this research, to remove the unwanted harmonics that disturb the filter operation and to increase the performance of the filter, a novel hairpin bandpass filter with cross shaped and spiral shaped DGS based optimization is proposed. The cross shaped and spiral shaped DGS structures are placed in the input and output feedlines of the hairpin filter to provide smooth passband characteristics and wide stopband characteristics, and the placement of a Double Square Split Ring Resonator in both the input and output feedlines of the hairpin filter increases the quality factor. In order to increase the performance of the hairpin filter while maintaining the compact size, a high dielectric constant laminate substrate and superstrate are used, which reduces the losses in humid environments. Arlon AD1000 material has been used in this design. The simulation in this paper is carried out using the High-Frequency Structure Simulator (HFSS) software. The proposed filter design improves the performance in a compact size with low insertion loss and provides excellent spurious passband suppression.

KEYWORDS

Hairpin bandpass filter, DGS based Optimization, Cross shaped DGS, Spiral shaped DGS, Double Square Split Ring Resonator, High dielectric constant laminate substrate and superstrate.

Convolutional Neural Network for Offline Signature Verification via Multiple Classifiers

Fadi Mohammad Alsuhimat and Fatma Susilawati Mohamad, Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Terengganu, Malaysia

ABSTRACT

The signing process is one of the most important processes used by organizations to ensure the confidentiality of information and to protect it against any unauthorized penetration or access. As organizations and individuals enter the digital world, there is an urgent need for a digital system capable of distinguishing between original and fraudulent signatures, in order to ensure individuals' authorization and determine the powers allowed to them. In this paper, we used a pre-trained CNN to extract features from genuine and forged signatures, and three widely used classification algorithms: SVM (Support Vector Machine), NB (Naive Bayes) and KNN (k-nearest neighbors). These algorithms are compared in terms of run time, classification error, classification loss and accuracy on a test set consisting of signature images (genuine and forged). The three classifiers were applied to the UTSig dataset, and run time, classification error, classification loss and accuracy were calculated for each classifier in the verification phase. The results showed that SVM and KNN achieved the best accuracy (76.21), while SVM achieved the best run time (0.13) among the classifiers. Therefore, the SVM classifier achieved the best result among the classifiers in terms of our measures.

KEYWORDS

CNN, Signature verification, SVM, KNN, NB.

Contact Us

natlconfer@yahoo.com
