Welcome to NLPCL 2021

2nd International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2021)

May 29-30, 2021, Vancouver, Canada


Accepted Papers

FaceAtlasAR: Atlas of Facial Acupuncture Points in Augmented Reality

Menghe Zhang1, Jürgen P. Schulze1, and Dong Zhang2, 1Department of Computer Science, University of California, San Diego, USA, 2Qilu University of Technology, Shandong, China

ABSTRACT

Acupuncture is a technique in which practitioners stimulate specific points on the body. These points, called acupuncture points (or acupoints), anatomically define areas on the skin relative to specific landmarks on the body. However, mapping the acupoints to individuals can be challenging for inexperienced acupuncturists. In this project, we propose a system to localize and visualize facial acupoints for individuals in an augmented reality (AR) context. The system combines a face alignment model and a hair segmentation model to provide dense reference points for acupoint localization in real time (60 FPS). The localization process uses the proportional bone (B-cun, or skeletal) measurement method commonly employed by specialists; in practice, however, operators sometimes find it inaccurate due to skill-related error. With this system, users, even without any training, can locate the facial acupoints as part of a self-training or self-treatment process.

KEYWORDS

Augmented reality, Acupuncture point, Face alignment, Hair segmentation.
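As an illustrative aside, the proportional (B-cun) measurement described in the abstract above amounts to interpolating between detected facial landmarks. The sketch below is not the paper's implementation; the landmark names, pixel coordinates, and ratio are hypothetical examples:

```python
# Hypothetical sketch of proportional (B-cun) acupoint localization:
# an acupoint is placed a fixed fraction of the way between two
# facial landmarks produced by a face alignment model.

def locate_proportional(p_start, p_end, ratio):
    """Return the point lying `ratio` of the way from p_start to p_end."""
    return tuple(s + ratio * (e - s) for s, e in zip(p_start, p_end))

# Example with made-up pixel coordinates: a point one third of the way
# from the brow midpoint down to the chin along the facial midline.
brow_mid = (120.0, 80.0)
chin = (120.0, 260.0)
point = locate_proportional(brow_mid, chin, 1.0 / 3.0)
```

In the paper's system, the endpoint coordinates would come from the face alignment and hair segmentation models rather than fixed values.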

Automatic Extraction of Threat Actions using Word Vector and Information Retrieval

Chia-Mei Chen1, Jing-Yun Kan1, Ya-Hui Ou2 and Zheng-Xun Cai1, 1Department of Information Management, National Sun Yat-sen University, Taiwan, 2National Penghu University of Science and Technology, Taiwan

ABSTRACT

To adapt to rapidly evolving cyberattacks, awareness of an adversary's threat actions is essential for organizations to gain visibility into the fast-evolving threat landscape and to identify, in a timely manner, early signs of an attack and the adversary's strategies, tactics, and techniques. To gain insight into potential cyber threats, this research proposes a novel automatic threat action retrieval system called "TAminer" (Threat Action Miner). TAminer collects and analyzes various data sources, including security news, incident analysis reports, and darknet hacker forums. It develops an improved data preprocessing method to reduce feature dimensionality and a novel query-matching algorithm to capture effective threat actions automatically, without the manually predefined ontology applied by past research. The experimental results illustrate that TAminer achieves an accuracy of 94.7% and a recall rate of 95.8%, outperforming previous research. The proposed solution can extract effective threat actions automatically and efficiently.

KEYWORDS

Cyber Threat Intelligence, Threat Action, Natural Language Processing.

Research on a Deep-Level Kernel Hook Mining Algorithm and its Application in Software Security

Wenjian Yu and Yongbin Yu, School of Information and Software Engineering, University of Electronic Science and Technology, Chengdu, China

ABSTRACT

This paper studies the protection principle of kernel hooks in the Windows operating system and proposes a deep-level kernel hook mining algorithm to address the shortcomings of the cross-reference function in IDA (Interactive Disassembler Professional). The algorithm can dig out the internal calls of a specified kernel function and all call sites of kernel functions containing hooks. The mining algorithm is implemented in Python based on the principle of function calls, and the driver program for the protection-bypass experiment is written in C++. The research results show that the protection-bypass experiment succeeds, demonstrating the effectiveness of the mining algorithm and the comprehensiveness of the mining results. This paper provides a practical method for the study of software security and has practical significance.

KEYWORDS

Kernel Hook, Mining Algorithm, Kernel Security, Software Security.

Risk Analysis of Setting up a Restaurant in NYC

Santoshi Laxmi Reddy Ellanki1 and John Jenq2, 1Medical College of Wisconsin, Houston, Texas, USA, 2Department of Computer Science, Montclair State University, NJ, USA

ABSTRACT

In this report, a system was developed that can predict the outcome of opening a restaurant in NYC based on various NYC open data sets, such as 311 calls, New York Police crime records, and restaurant rating data. The data sets were preprocessed and cleaned before analysis to improve the quality of the results.

KEYWORDS

Big Data, Risk Analysis, PySpark, Decision Tree.

Information Technology Governance of Japanese Companies: An Empirical Study

Michiko Miyamoto, Department of Management Science and Engineering, Akita Prefectural University, Yurihonjo City, Akita, Japan

ABSTRACT

IT has become an essential part of the organization. IT governance specifies the decision rights and accountability framework to encourage desirable behavior in the use of IT. The concept of IT governance has expanded to improve IT-business alignment under today's business environment and prospects. This paper contributes empirical knowledge of IT governance practices in Japanese organizations based on survey data gathered from 101 corporations, including large, medium, and small companies. The findings of the ordinal regression analyses in this study indicate that IT governance is associated with Strategic Alignment, Performance Measurement, and Value Delivery, while Risk Management and Resource Management have positive but not significant associations with IT governance.

KEYWORDS

IT Governance, IT-business alignment, Strategic alignment maturity, Regression Analysis.

Low-Cost Autonomous UAV Swarm Application in Wildfire Surveillance and Suppression

Xiaoyu Mo2, Doney Peters1 and Chengwei Lei1, 1Department of Computer and Electrical Engineering and Computer Science, California State University, Bakersfield, USA, 2San Domenico School, San Anselmo, United States

ABSTRACT

With the growing severity of wildfires and fast-adapting drone technology, drones can potentially be used to detect fire centers in wildfires for firefighting and to evaluate the factors responsible for starting the fire. The purpose of this research is to develop a more efficient, portable, and disposable system for gathering precise temperature readings, and possibly other readings, in certain regions of a wildfire. The system comprises a swarm of low-cost, self-guided UAVs built from consumer-grade quadcopters, each carrying a single-board computer and related sensors. The entire system is managed by a Particle Swarm Optimization algorithm; the vision for this strategy is for the group of quadcopters to gather complete readings of a land region autonomously, which would significantly reduce the cost of rescue operations and the risk to firefighters.

KEYWORDS

Particle Swarm Optimization, UAV, wildfire.
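For readers unfamiliar with the optimization method named above, a minimal particle swarm optimization loop looks like the following. This is a generic textbook sketch on a toy objective, not the paper's flight controller; the coefficients are conventional defaults rather than the authors' tuning:

```python
import random

def pso(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over [lo, hi]^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective: sphere function, minimum 0 at the origin.
best, best_val = pso(lambda p: sum(x * x for x in p), dim=2)
```

Each particle is pulled toward its own best-known position and the swarm's best, the same mechanism the paper applies to coordinate quadcopters over a survey region.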

Cross Languages One-Versus-All Speech Emotion Classifier

Xiangrui Liu1, Junchi Bin2 and Huakang Li3,4, 1Suzhou Privacy Technology Co. Ltd, 2School of Engineering, University of British Columbia, Okanagan Campus, 3School of Artificial Intelligence and Advanced Computing, Xi’an Jiaotong-Liverpool University, 4Key Laboratory of Urban Land Resources Monitoring and Simulation

ABSTRACT

Speech emotion recognition (SER) is a task that cannot be accomplished solely with linguistic models due to the presence of figures of speech. For more accurate prediction of emotions, researchers have adopted acoustic modelling. The complexity of SER can be attributed to the variety of acoustic features, the similarities among certain emotions, and other factors. In this paper, we propose a framework named Cross Languages One-Versus-All Speech Emotion Classifier (CLOVASEC) that identifies the emotions of speech in both Chinese and English. Acoustic features are preprocessed by the Synthetic Minority Over-sampling Technique (SMOTE) to diminish the impact of an imbalanced dataset, and then by Principal Component Analysis (PCA) to reduce their dimensionality. The features are fed into a classifier made up of eight sub-classifiers, each tasked with differentiating one class from the other seven classes. The framework outperformed regular classifiers significantly on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and an English dataset from Deng.

KEYWORDS

Speech Emotion Recognition, Multi-languages, Acoustic Modelling, Deep Learning & Multiplicative Attention.
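The one-versus-all decomposition used by CLOVASEC can be sketched generically: one binary classifier per class, each trained to separate its class from the rest, with the final label taken from the highest-scoring sub-classifier. The toy sketch below uses a plain perceptron and three made-up classes instead of the paper's eight emotion classes, and omits the SMOTE/PCA preprocessing:

```python
def train_binary(X, y_bin, epochs=1200):
    """Perceptron for +1/-1 labels; stops after a clean pass over the data."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y_bin):
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def train_ova(X, y):
    """One binary sub-classifier per class: this class vs. all the rest."""
    return {c: train_binary(X, [1 if t == c else -1 for t in y])
            for c in set(y)}

def predict_ova(models, x):
    """Pick the class whose sub-classifier scores x highest."""
    def score(c):
        w, b = models[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)

# Made-up 2D "features" with three separable classes.
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6), (0, 5), (1, 5), (0, 6)]
y = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']
models = train_ova(X, y)
```

The paper's sub-classifiers are stronger learners over acoustic features, but the class-vs-rest training and argmax decision are the same scheme.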

An Energy-Efficient Bioinspired Scheduling Resource Model for Emergency Scenario

Janine Kniess, Marcelo Petri and Rafael Stubs Parpinelli, Santa Catarina State University, Graduate Program in Applied Computing, Brazil

ABSTRACT

The efficient management of resources after a disaster must take place within a short time and efficiently. Therefore, a resource scheduling protocol that shares resources among victims while respecting time constraints is decisive. In disaster scenarios, communication infrastructure is usually damaged, and a commonly used solution for ensuring connectivity between victims and providers is a mobile ad hoc network (MANET) composed of battery-operated mobile nodes. Fire brigades and ambulances might be insufficient if the number of victims is high. Hence, in order to provide an efficient resource scheduling approach that accounts for energy consumption, the number of requesters attended, and the processing time, we present the resource scheduling protocol ΔAGschedule. To reduce provider dislocation time and increase the number of victims attended, the protocol was modelled using Genetic Algorithms. Results show that this approach maintains the trade-off between the number of victims attended and the energy consumed, thereby minimizing energy consumption.

KEYWORDS

Scheduling Resources, Genetic Algorithm, Energy-Efficient, Mobile Networks.

A New Hashing based Nearest Neighbors Selection Technique for Big Datasets

Jude Tchaye-Kondi, Yanlong Zhai and Liehuang Zhu, School of Computer Science, Beijing Institute of Technology, Beijing, China

ABSTRACT

KNN has the reputation of being a simple and powerful supervised learning algorithm used for either classification or regression. Although KNN prediction performance highly depends on the size of the training dataset, when the dataset is large KNN suffers from slow decision making, because each decision requires the algorithm to look for nearest neighbors within the entire dataset. To overcome this slowness, we propose a new technique that enables the selection of nearest neighbors directly in the neighborhood of a given data point. The proposed approach consists of dividing the data space into sub-cells of a virtual grid built on top of the dataset. The mapping between data points and sub-cells is achieved using hashing. When selecting the nearest neighbors of a new observation, we first identify the central cell that contains the observation. Once that central cell is known, we start looking for the nearest neighbors in it and in the cells around it. In our experimental performance analysis on publicly available datasets, our algorithm outperforms the original KNN with predictive quality as good, and offers competitive performance against solutions such as the KD-tree.

KEYWORDS

Machine learning, Nearest neighbors, Hashing, Big data.
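The cell-hashing idea in the abstract above can be sketched as follows. For simplicity, this 2D toy indexes points into fixed-size cells and searches only the central cell and its immediate ring; the real algorithm would keep expanding the search outward until k neighbors are guaranteed:

```python
import math
from collections import defaultdict

class GridIndex:
    """Toy virtual grid: hash 2D points into cells, search near the query."""

    def __init__(self, cell_size):
        self.cell = cell_size
        self.buckets = defaultdict(list)   # cell key -> points in that cell

    def _key(self, p):
        return tuple(int(math.floor(c / self.cell)) for c in p)

    def add(self, p):
        self.buckets[self._key(p)].append(p)

    def knn(self, q, k):
        cx, cy = self._key(q)
        candidates = []
        for dx in (-1, 0, 1):              # central cell plus surrounding ring
            for dy in (-1, 0, 1):
                candidates.extend(self.buckets.get((cx + dx, cy + dy), ()))
        candidates.sort(key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
        return candidates[:k]

idx = GridIndex(cell_size=1.0)
for p in [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (5.0, 5.0)]:
    idx.add(p)
neighbors = idx.knn((0.0, 0.0), k=2)
```

Only the handful of points hashed near the query are examined, which is what avoids scanning the entire dataset.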

Hierarchical Virtual Bitmaps for Spread Estimation in Traffic Measurement

Olufemi Odegbile, Chaoyi Ma, Shigang Chen, Dimitrios Melissourgos and Haibo Wang, Department of Computer and Information Science and Engineering University of Florida, Gainesville, Florida, USA

ABSTRACT

This paper introduces a hierarchical traffic model for spread measurement of network traffic flows. The hierarchical model, which aggregates lower-level flows into higher-level flows in a hierarchical structure, allows us to measure network traffic at different granularities at once, supporting diverse traffic analysis from a grand view down to fine-grained details. The spread of a flow is the number of distinct elements (under measurement) in the flow, where the flow label (which identifies packets belonging to the flow) and the elements (which are defined based on application need) can be found in packet headers or payload. Traditional flow spread estimators are designed without hierarchical traffic modeling in mind, and incur high overhead when applied to each level of the traffic hierarchy. In this paper, we propose a new Hierarchical Virtual bitmap Estimator (HVE) that performs simultaneous multi-level traffic measurement at the same cost as a traditional estimator, without degrading measurement accuracy. We implement the proposed solution and perform experiments based on real traffic traces. The experimental results demonstrate that HVE improves measurement throughput by 43% to 155%, thanks to the reduction of per-packet processing overhead. For small to medium flows, its measurement accuracy is largely similar to traditional estimators that work at one level at a time. For large aggregate and base flows, its accuracy is better, with up to 97% smaller error in our experiments.
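As background, a single (non-hierarchical) bitmap spread estimator works as sketched below: each element of a flow sets one hashed bit, and the spread is recovered from the fraction of zero bits (the linear-counting estimate). HVE's contribution, layering virtual bitmaps across hierarchy levels, is not shown here:

```python
import hashlib
import math

class Bitmap:
    """Toy flat-bitmap spread (distinct element count) estimator."""

    def __init__(self, m):
        self.m = m
        self.bits = [0] * m

    def record(self, flow, element):
        # Hash (flow, element) to one bit; duplicates hit the same bit.
        h = hashlib.sha256(f"{flow}:{element}".encode()).digest()
        self.bits[int.from_bytes(h[:8], "big") % self.m] = 1

    def spread(self):
        zeros = self.bits.count(0)
        if zeros == 0:
            return float("inf")            # bitmap saturated
        # Linear counting: n ~= -m * ln(fraction of zero bits)
        return -self.m * math.log(zeros / self.m)

bm = Bitmap(1024)
for e in range(300):                       # 300 distinct elements in one flow
    bm.record("flowA", e)
est = bm.spread()
```

Recording a duplicate element changes nothing, which is exactly why the structure counts distinct elements rather than packets.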

Comparative Analysis of Quality of Service Scheduling Classes in Mobile Ad-Hoc Networks

Thulani Phakathi, Bukohwo Michael Esiefarienrhe and Francis Lugayizi, Department of Computer Science, North-West University, Mafikeng, South Africa

ABSTRACT

Quality of Service (QoS) is now regarded as a requirement for all networks in managing resources like bandwidth and in avoiding network impairments like packet loss, jitter, and delay. Media transfer or streaming would be virtually impossible if QoS parameters were not used, even if the streaming protocols were perfectly designed. QoS scheduling classes help in network traffic optimization and the priority management of packets. This paper presents an analysis of QoS scheduling classes using video traffic in a MANET. The main objective was to identify a scheduling class that provides better QoS for video streaming. A simulation was conducted using NetSim, and the results were analyzed according to throughput, jitter, and delay. The overall results showed that extended real-time Polling Service (ertPS) outperformed the other classes. ertPS has hybrid features of both real-time Polling Service (rtPS) and Unsolicited Grant Service (UGS), hence the enhanced performance. It is recommended that the ertPS scheduling class be used in MANETs where QoS consideration is paramount, particularly in multimedia streaming applications.

KEYWORDS

Routing protocols, MANETs, Scheduling, QoS in MANET, rtPS protocol.

Malicious Node Detection in Smart Grid Networks

Faisal Y Al Yahmadi and Muhammad R Ahmed, Marine Engineering Department, Military Technological College, Muscat, Sultanate of Oman

ABSTRACT

Many countries around the world are implementing smart grids and smart meters. Malicious users with a moderate level of computer knowledge can manipulate smart meters and launch cyber-attacks. This poses cyber threats to network operators and government security. In order to reduce the number of electricity theft cases, companies need to develop preventive and protective methods to minimize the losses from this issue. In this paper, we propose an algorithm that detects malicious nodes in a smart grid network. The algorithm collects data (electricity consumption/electric bill) from the nodes and compares it with previously obtained data. A Support Vector Machine (SVM) model working in a high-dimensional feature space is implemented to classify nodes, assigning a status of 1 to good nodes and a status of -1 to malicious (abnormal) nodes. The algorithm also displays the network graphically, along with the data table, and reports the detection error in each cycle. The algorithm has a very low false alarm rate (2%) and a detection rate as high as 98%. Future developments could trace the attack origin to eliminate or block the attack source, minimizing losses before human control arrives.

KEYWORDS

Smart Grid Networks, Security, Malicious, Attacks, Support Vector Machine.
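The +1/-1 SVM labeling convention used above can be illustrated with a minimal linear SVM trained by batch sub-gradient descent on the hinge loss. The single "consumption drop" feature and its values below are hypothetical; a real deployment would use a library SVM with richer features:

```python
def train_svm(X, y, lam=0.001, lr=0.05, epochs=2000):
    """Batch (sub)gradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin contribute to the gradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b

def classify(w, b, x):
    """Paper's convention: +1 for good nodes, -1 for malicious nodes."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical feature: each node's consumption drop versus its own history.
# Good nodes (+1) show small deviations; theft (-1) shows large drops.
X = [[0.1], [0.2], [0.3], [2.5], [2.8], [3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_svm(X, y)
```
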

Behavior-Based Multi-labeling of Malware Samples

Pedro García-Teodoro and José Antonio Gómez-Hernández, School of Computer Science and Telecommunication Engineering University of Granada, Spain

ABSTRACT

The use of malware datasets is usually required to test cybersecurity solutions. For that, correct labeling of the samples is generally of interest in order to properly estimate the performance of the solutions under study. Based on the varied classifications generally provided by automatic detection engines, we introduce a two-step multi-labeling procedure to automatically tag the usually complex, multiple behaviors of each of the samples that compose a given malware dataset. Due to the current relevance of mobile environments, the automatic multi-labeling approach is executed over four well-known Android malware datasets. The results obtained are dissected to show the real composition of the datasets studied, which evidences the usefulness of our labeling approach for assessment purposes.

KEYWORDS

Android security, Malware, Dataset, Labeling.

Improving Immigrant Integration through the Design of a Digital Learning Game

Heidi Katz1, Emmanuel Acquah1, Anette Bengs2 and Fredrik Sten2, 1Faculty of Education and Welfare Studies, Department of Education, Åbo Akademi University, Vaasa, Finland, 2Faculty of Education and Welfare Studies, Experience Lab, Åbo Akademi University, Vaasa, Finland

ABSTRACT

In order for immigrants to achieve academically and successfully integrate into society, they must receive adequate second language education. Currently, immigrant students tend to perform worse on international tests compared to their native peers due to a range of reasons, including challenges acquiring the host country language, lack of teacher training, separation of language and content learning, low parental involvement, and more. Technology has garnered attention as a successful tool for language learning, which could help improve immigrant student outcomes and integration. More specifically, digital learning games have been used to enhance a variety of outcomes, including language acquisition, motivation, and student confidence. Digital learning games differentiate instruction, provide a safe environment for students to practice the target language, and give students immediate feedback. However, it is important that digital learning games are designed with the end-users in mind. For that reason, we outline how researchers and game developers can utilize user-centered design to develop a game that is context-specific. As an example, we present the four-step process of an ongoing game design project in Finland, including general findings from interviews with teachers.

KEYWORDS

Second language learning, Immigrant education, Immigrant integration, Digital learning game, User-centered design.

Electromechanical Platform with Removable Overlay for Exploring, Tuning and Evaluating Search, Machine Learning and Feedback Control Algorithms

Kelvin Tan Thye Lye, Singapore

ABSTRACT

Disclosed is a system comprising a motorized movable stage capable of detecting the presence of objects on the stage. Removable overlay labyrinths, obstacle courses, and containers can be added to the stage, and the system accepts virtually unlimited permutations of overlays. The system is designed for exploring, learning, tuning, and evaluating machine learning and feedback control methodologies and algorithms. Other uses of the system include teaching and entertainment.

KEYWORDS

Reinforcement Learning, Electromechanical Maze Platform, Physical Environment for Reinforcement Learning.

The Case for Error-bounded Lossy Floating-point Data Compression on Interconnection Networks

Yao Hu and Michihiro Koibuchi, National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, Japan

ABSTRACT

Data compression virtually increases the effective network bandwidth on an interconnection network of parallel computers. Although floating-point datasets are frequently exchanged between compute nodes in parallel applications, their compression ratio often becomes low when using simple lossless compression algorithms. In this study, we aggressively introduce a lossy compression algorithm for floating-point values on interconnection networks. We take an application-level compression approach for high portability: a source process compresses communication datasets in an MPI parallel program, and a destination process decompresses them. Since recent interconnection networks are latency-sensitive, sophisticated lossy compression techniques that introduce large compression overhead are not suitable for compressing communication data. In this context, we apply a linear predictor with a user-defined error bound to the compression of communication datasets. We design, implement, and evaluate the compression technique for the floating-point communication datasets generated in MPI parallel programs, i.e., Ping Pong, Himeno, K-means Clustering, and Fast Fourier Transform (FFT). The proposed compression technique achieves 2.4x, 6.6x, 4.3x and 2.7x compression ratios for Ping Pong, Himeno, K-means and FFT, respectively, at the cost of a moderate decrease in quality of results (error bound is 10^-4), thus achieving 2.1x, 1.7x, 2.0x and 2.4x speedups of the execution time. More generally, our cycle-accurate network simulation shows that a high compression ratio provides comparably low communication latency, and significantly improves effective network throughput on typical synthetic traffic patterns when compared to no data compression on a conventional interconnection network.

KEYWORDS

Interconnection Network, Lossy Compression, Floating-point Number, Linear Predictor, High-performance Computing (HPC).
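A linear predictor with a user-defined error bound, as described above, can be sketched in a few lines: predict each value by extrapolating the last two reconstructed values, quantize the residual to a multiple of twice the error bound, and transmit only the small integer codes. This is a generic sketch of the idea, not the authors' MPI implementation:

```python
import math

def compress(values, eb):
    """Return integer residual codes; reconstruction error is bounded by eb."""
    codes, recon = [], []
    for i, v in enumerate(values):
        # Linear extrapolation from the decoder-visible reconstruction,
        # so encoder and decoder predictions always match.
        pred = (2 * recon[-1] - recon[-2]) if i >= 2 else \
               (recon[-1] if i == 1 else 0.0)
        code = round((v - pred) / (2 * eb))    # quantize the residual
        codes.append(code)
        recon.append(pred + code * 2 * eb)
    return codes

def decompress(codes, eb):
    recon = []
    for i, code in enumerate(codes):
        pred = (2 * recon[-1] - recon[-2]) if i >= 2 else \
               (recon[-1] if i == 1 else 0.0)
        recon.append(pred + code * 2 * eb)
    return recon

# Example: a smooth signal yields small, highly compressible codes.
signal = [math.sin(i / 10.0) for i in range(200)]
codes = compress(signal, 1e-4)
recovered = decompress(codes, 1e-4)
```

Because each residual is rounded to the nearest multiple of 2·eb, every reconstructed value stays within eb of the original, which is the error-bounded guarantee.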

WLNI-LPA: Detecting Overlapping Communities in Attributed Networks based on a Label Propagation Process

Imen Ben El Kouni1,2, Wafa Karoui1,3 and Lot Ben Romdhane1,2, 1Universite de Sousse, Laboratoire MARS LR17ES05, ISITCom, 4011, Sousse, Tunisie, 2Universite de Sousse, ISITCom, 4011, Sousse, Tunisie, 3Universite de Tunis El Manar, Institut Superieur d'Informatique, 2080, Tunis, Tunisie

ABSTRACT

Several networks are enriched by two types of information: the network topology and attribute information about each node. Such graphs are typically called attributed networks, where the attributes are as important as the topological structure. In these attributed networks, community detection is a critical task that aims to discover groups of similar users. However, the majority of existing community detection methods for attributed networks were created to identify separate groups. Therefore, detecting overlapping communities using a combination of node attributes and topological structure is challenging. In this paper, we propose an accurate algorithm called WLNI-LPA, based on label propagation, to discover overlapping communities in attributed networks. WLNI-LPA is an extension of NI-LPA [1] that combines node importance, attribute information, and topological structure to improve the quality of the graph partition. In the experiments, the performance of our method is validated on synthetic weighted networks. Also, our experiments on a recommender system demonstrate that our method can effectively detect overlapping communities that improve the quality of recommendation.

KEYWORDS

Attributed networks, overlapping community detection, node similarity, weighted graph.
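For orientation, the plain label propagation process that WLNI-LPA extends can be sketched as below. This toy version is unweighted and non-overlapping, and uses a deterministic tie-break for reproducibility (standard LPA breaks ties randomly); WLNI-LPA additionally weights propagation by node importance and attribute similarity:

```python
from collections import Counter

def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation on an adjacency-list graph."""
    labels = {v: v for v in adj}           # every node starts with its own label
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):              # deterministic sweep for this sketch
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            # Deterministic tie-break (real LPA picks randomly among ties).
            choice = max(l for l, c in counts.items() if c == top)
            if choice != labels[v]:
                labels[v], changed = choice, True
        if not changed:                    # converged: no label moved
            break
    return labels

# Two 4-cliques joined by a single bridge edge (3-4).
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
communities = label_propagation(adj)
```

Each clique collapses onto one label, while the single bridge edge is outvoted on both sides, so the two communities stay distinct.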

Fusion of Medical Images based on Salient Features Extraction by Fuzzy Logic and NSML in NSST Domain

Yuan Gao and Shiwei Ma, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, China

ABSTRACT

Medical image fusion can be divided into single-mode fusion and multimodal fusion. Their common purpose is to combine information from different images (taken by the same or different sensors) into a single one and capture the tissue characteristics of the source images. Different images often present different visual morphology; however, the salient features of tissues are basically the same from the perspective of the human eye. Based on this characteristic, an improved image fusion algorithm based on visual salience detection is proposed in this paper. First, the GBVS algorithm is introduced to calculate the visual salience of the registered source images, which are then decomposed in the NSST domain to obtain their low-frequency and high-frequency sub-bands. For the low-frequency sub-bands, local energy and the GBVS map are input into a fuzzy logic system to obtain the respective weights for the fused low-frequency sub-band. For the high-frequency sub-bands, the NSML values of each sub-band are calculated and compared to obtain the fused high-frequency sub-band. The final fused image is obtained using the inverse NSST transform. Applying this method to single-mode and multimodal medical image fusion, the visual quality of the image can be enhanced effectively and the salient features of tissues preserved well. Experiments on single-mode ultrasound and multimodal gray-scale medical images show that the proposed method has advantages in the retention of salient image features and overall image contrast, and achieves better objective indices than the comparison models.

KEYWORDS

Medical Image Fusion, GBVS, Visual Salience, Fuzzy Logic, corner-NSML, NSST.

Intelligent Computational Model for the Classification of COVID-19 with Chest Radiography Compared to Other Respiratory Diseases

Paula Santos, Department of Head and Neck Surgery, Ophthalmology and Otorhinolaryngology, University of São Paulo, Ribeirão Preto, Brazil & Department of Psychology, University of São Paulo, Brazil

ABSTRACT

Chest radiography images, if processed using statistical and computational methods, can distinguish pneumonia from COVID-19. The present work shows that it is possible to extract characteristics from chest X-ray images to improve the methods of screening and diagnosing patients with suspected COVID-19, relative to pneumonia, malaria, dengue, H1N1, tuberculosis, and Streptococcus pneumoniae infection. More precisely, an intelligent computational model was developed to process chest X-ray images and classify whether an image is of a patient with COVID-19. The images were processed and their features extracted. These features were the input for unsupervised statistical learning techniques, PCA and clustering, which identified specific characteristics of COVID-19 X-ray images. The introduction of statistical models allowed a fast algorithm, which used the X-means clustering technique associated with the Bayesian Information Criterion (BIC). The developed algorithm efficiently distinguished each lung disease from the chest X-ray images. Screening by chest X-ray images showed excellent sensitivity, specificity, positive predictive value, negative predictive value, and accuracy in the diagnosis of COVID-19 (0.93 ± 0.051).

KEYWORDS

Probabilistic Models, Machine Learning and Computer Vision.

Automatic Detection and Extraction of Lung Cancer Nodules Using Connected Component Labeling and Distance Measure based Classification

Mamdouh Monif1, Kinan Mansour2, Waad Ammar2, and Maan Ammar1, 1AL Andalus University for Medical Sciences, Faculty of Biomed. Eng., Al Qudmos, Syria, 2Al Andalus University Hospital, Al Qudmos, Syria

ABSTRACT

We introduce in this paper a method for reliable automatic extraction of the lung area from CT chest images, covering a wide variety of lung shapes, by using the Connected Components Labeling (CCL) technique with some morphological operations. The paper also introduces a method using the CCL technique with distance-measure-based classification for the efficient detection of lung nodules in the extracted lung area. We further tested our complete detection and extraction approach using a performance consistency check, applying it to lung CT images of healthy persons (containing no nodules). The experimental results show that the performance of the method in all stages is high.

KEYWORDS

Lung cancer, lung area extraction, nodule detection, distance measure, performance consistency check.
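The connected component labeling step at the core of the method above can be sketched on a small binary mask with a breadth-first flood fill; production code would use an optimized two-pass CCL or a library routine, and the mask here is a made-up example rather than a CT slice:

```python
from collections import deque

def label_components(mask):
    """4-connected component labeling; returns the label grid and count."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                current += 1                   # found an unlabeled foreground pixel
                q = deque([(r, c)])
                labels[r][c] = current
                while q:                       # flood-fill this component
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            q.append((ny, nx))
    return labels, current

# Toy binary mask with two separate blobs.
mask = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 1]]
labels, n_components = label_components(mask)
```

In the paper's pipeline, component properties (size, shape, position) computed from such labels drive both lung-area extraction and nodule candidate selection.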

Fuzzy Rate Control Algorithm for Real-Time Applications of the Multiview Extension of H.264/AVC Video Coding Standard

Hanieh Hosseini and Mehdi Rezaei, University of Sistan and Baluchestan, Zahedan, Iran

ABSTRACT

In this work, we propose a rate control algorithm that takes into account the characteristics of multiview video coding. The proposed algorithm is designed for real-time multiview video coding applications and is optimized to provide high-quality compressed video bit streams with optimal utilization of channel bandwidth and buffering delay. The algorithm uses a fuzzy rate controller and a deterministic quality controller to define a quantization parameter for each Group of Pictures based on given target rate, buffer, and quality constraints. The key point is to provide a variable-bit-rate multiview video bit stream with minimum fluctuations in the quantization parameter, and thereby in quality, while the buffer constraints are satisfied. The experimental results show that the algorithm can control the bit rate of all views according to the specified target bit rates for each view while the buffering constraints are completely obeyed, and that it provides compressed video bit streams with high visual quality.

KEYWORDS

Fuzzy algorithm, multiview video coding, rate controller.

Developing Multi-Dimensional Machine Learning Models: Predicting College Basketball Player Success in the NBA

Andrei J. Rosu, Department of Systems Engineering, United States Military Academy, West Point, New York, USA

ABSTRACT

Each year, all 30 National Basketball Association (NBA) teams attempt to select the best available players in a two-round linear draft. Various selection methodologies, performance metrics, and personnel factors are considered by team executives when creating a draft board, or selection order of merit. Yet a historical review of NBA drafts reveals many early-round selections that failed to live up to expectations and late-round sleepers who dominated the league. The purpose of this research effort was to develop a multi-dimensional machine learning model to predict college basketball players' success in the NBA. Using various college statistics and previous NBA draft results, a predictive model was developed to improve the draft selection process. The multi-dimensional machine learning models drastically improve accuracy in selecting college players who went on to perform successfully in the NBA, in comparison to observed results. Overall, the model increased draft accuracy by an average of 104% from the 2003 NBA Draft through the 2018 NBA Draft.

KEYWORDS

Machine learning, deep learning, predictive modeling, value-based decision making.

Basketball-51: A Video Dataset for Activity Recognition in the Basketball Game

Sarbagya Ratna Shakya, Chaoyang Zhang and Zhaoxian Zhou, School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, USA

ABSTRACT

In recent years, there has been an increase in the use of technology in sports and live sports broadcasting networks. From score updates and broadcast commercials to assisting referees in decision making and minimizing errors, technology has been adopted to support fair play and improve results. This has been made possible by advancements in video analysis and classification techniques and by the availability of resources. This paper introduces a new labelled video dataset, collected from a live basketball game broadcast on TV, for determining the type of basket scored. The points a player can score are basically of three types: 3 points or 2 points, depending on the range of the shot, and 1 point for free throws taken after a foul. The dataset consists of labelled video clips collected from the live broadcast to classify the different scoring activities. This paper also gives a preliminary analysis of the dataset for the different class labels, using 3D ConvNet and two-stream 3D ConvNet methods, to show the complexity of the dataset.

KEYWORDS

Basketball dataset, 3D convnet, two-stream 3D convnet.

Recognition and Classification of Hand Gestures using Hybrid Optimized Feature Selection

Manisha R. Kowdiki, Arti Khaparde and Vaidehi Deshmukh, School of Electronics & Communication, MIT World Peace University, Pune, India

ABSTRACT

Gesture recognition is one of the most common techniques used to interact with computers. This paper implements an efficient hand gesture recognition model for Indian Sign Language (ISL) datasets. The proposed recognition model uses an artificial neural network that recognizes gestures with the help of selected features, which are extracted from the datasets using active contours and edge detection. A new hybrid algorithm, Deer Hunting-based Grey Wolf Optimization (DH-GWO), selects the features that are then used to train the neural network. The model proves highly efficient in recognizing the characters in images from the static dataset, with high recognition accuracy.

KEYWORDS

Hand Gesture Recognition, Optimal Feature Selection, Deer Hunting-based Grey Wolf Optimization.

Effects of Nonlinear Functions on Knowledge Graph Convolutional Networks For Recommender Systems with Yelp Knowledge Graph

Xing Wei and Jiangjiang Liu, Department of Computer Science, Lamar University, Beaumont, USA

ABSTRACT

Knowledge Graph (KG)-based recommendation methods are effective at dealing with cold-start problems and sparse data. The Knowledge Graph Convolutional Network (KGCN) is an end-to-end framework that has been shown to capture latent item-entity features by mining their associated attributes on the KG. In KGCN, the aggregator plays a key role in extracting information from the high-order structure. In this work, we investigate the impact of several nonlinear aggregator functions on KGCN. In addition, to support data pre-processing for KG-related experiments, we propose a tool, the Knowledge Graph Processor (KGP), and use it to build a knowledge graph for the Yelp Open dataset.

KEYWORDS

Recommender systems, Knowledge Graph, Activation Function.
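As an illustration of the aggregator comparison described in the abstract above, the sketch below applies different nonlinear functions to a single KGCN-style aggregation step. This is not the authors' code: the toy dimensions, random embeddings, and the sum-style aggregator are all assumptions made purely for illustration.

```python
import numpy as np

# Toy KGCN-style aggregation step: mix an entity vector with the mean of
# its KG neighbors' vectors, apply a shared linear map, then a nonlinearity.
rng = np.random.default_rng(0)
dim = 8
entity = rng.normal(size=dim)            # embedding of the target item
neighbors = rng.normal(size=(5, dim))    # embeddings of linked KG entities
W = rng.normal(size=(dim, dim)) * 0.1    # shared weight matrix
b = np.zeros(dim)

def aggregate(entity, neighbors, act):
    """Sum-style aggregator: combine self and neighborhood, then apply act."""
    mixed = entity + neighbors.mean(axis=0)
    return act(W @ mixed + b)

relu = lambda x: np.maximum(x, 0.0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Swapping the activation is the experimental knob the paper studies.
for name, act in [("relu", relu), ("tanh", np.tanh), ("sigmoid", sigmoid)]:
    out = aggregate(entity, neighbors, act)
    print(name, out.shape)
```

In a real KGCN this step is stacked over multiple hops and trained end-to-end; the point here is only that the aggregator's nonlinearity is an isolated, swappable component.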

Novel Machine Learning Algorithm for Prevalent Gene Biomarkers for Effective Cancer Treatment by Detecting its pH

Sahil Sudhakar Patil1, Darshit Shetty2, Vaibhav S. Pawar3,4, 1Masters Student, Hof University of Applied Sciences, 2MBA Student, Mumbai University, JIBMS, 3Associate Professor, Mechanical Engineering, Annasaheb Dange College of Engineering & Technology (ADCET), Ashta, Sangli, Maharashtra, India, 4PhD (Structures, IIT Bombay) (2013-2019), Graduated in August 2019

ABSTRACT

Personalized treatments for cancer patients with similar molecular subtypes can be achieved through patterns discovered from systematically collected molecular profiles of patient tumour samples, together with clinical metadata. There is an unmet need for computational algorithms for cancer diagnosis, prognosis, and therapeutics that can recognize complex patterns and aid classification based on the plethora of publicly available cancer research outcomes. According to a recent literature review, machine learning, a branch of artificial intelligence, has great potential for pattern recognition in cryptic cancer datasets. In this review, we focus on the current state of machine learning applications in cancer research, highlighting trends and analysing major achievements, roadblocks, and challenges on the way to clinical implementation. We propose a novel machine learning algorithm in the context of non-invasive cancer treatment using diet-based biomarkers.

KEYWORDS

Biomarkers, Machine learning, Statistical Models, sequencing, pH sensing.

Deriving Autism Spectrum Disorder Functional Networks from rs-fMRI Data using Group ICA and Dictionary Learning

Xin Yang1, Ning Zhang2 and Donglin Wang3, 1Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN, USA, 2Department of Computer Information Sciences, St. Ambrose University, Davenport, USA, 3Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, TN, USA

ABSTRACT

The objective of this study is to derive functional networks for the autism spectrum disorder (ASD) population using group ICA and a dictionary learning model together, and to classify ASD and typically developing (TD) participants using the functional connectivity calculated from the derived functional networks. In our experiments, the ASD functional networks were derived from resting-state functional magnetic resonance imaging (rs-fMRI) data. We downloaded a total of 120 training samples, including 58 ASD and 62 TD participants, obtained from the public repository Autism Brain Imaging Data Exchange I (ABIDE I). Our methodology and results have five main parts. First, we utilize a group ICA model to extract functional networks from the ASD group and rank the top 20 regions of interest (ROIs). Second, we utilize a dictionary learning model to extract functional networks from the ASD group and rank the top 20 ROIs. Third, we merge the 40 selected ROIs from the two models together as the ASD functional networks. Fourth, we generate three corresponding masks, based on the 20 ROIs selected from group ICA, the 20 ROIs selected from dictionary learning, and the 40 combined ROIs selected from both. Finally, we extract the ROIs for all training samples using these three masks and use the calculated functional connectivity as features for ASD and TD classification. The classification results show that the functional networks derived from ICA and dictionary learning together outperform those derived from a single ICA model or a single dictionary learning model.

KEYWORDS

Functional connectivity, rs-fMRI, autism spectrum disorder (ASD), group ICA, Dictionary Learning.
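The functional connectivity features described in the abstract above are commonly computed as pairwise Pearson correlations between ROI time series, with the upper triangle of the correlation matrix flattened into one feature vector per subject. The sketch below illustrates that common construction only; the random signals and dimensions are stand-ins, not the paper's actual data or pipeline.

```python
import numpy as np

# Stand-in for ROI mean time series extracted with a mask:
# 120 timepoints x 40 ROIs (e.g. the 40 combined ROIs).
rng = np.random.default_rng(42)
n_timepoints, n_rois = 120, 40
ts = rng.normal(size=(n_timepoints, n_rois))

# Functional connectivity = ROI-by-ROI Pearson correlation matrix.
fc = np.corrcoef(ts.T)                  # shape (40, 40), symmetric

# Flatten the strict upper triangle into a per-subject feature vector.
iu = np.triu_indices(n_rois, k=1)
features = fc[iu]                       # 40*39/2 = 780 features

print(fc.shape, features.shape)
```

Stacking one such vector per participant yields the feature matrix a classifier would train on to separate ASD from TD samples.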

Automatic Offensive or Cyberbullying Language Detection

Roya Salek Shahrezaie, University of Nevada, Reno, Mohsen Ahmadi, Arizona State University, Parisa Hajibabaee, University of Massachusetts Lowell, Saeedeh Shekarpour, University of Dayton

ABSTRACT

There is a concerning rise of offensive language in the content generated by the crowd on various social platforms. Such language might bully or hurt the feelings of an individual or a community. Recently, the research community has investigated and developed various semi-supervised and supervised approaches and training datasets to detect or prevent offensive monologues or dialogues automatically. However, these implementations are ad hoc and limited to a specific context, platform, or domain. Moreover, there is no exhaustive benchmarking study providing insight into the pitfalls and strengths of state-of-the-art solutions. Thus, in this paper, we introduce a pipeline that can integrate various datasets and easily benchmark them over different approaches. We present the results of our initial benchmarking for automatic detection of offensive language. Our experiment comprises four datasets, a modular cleaning phase and tokenizer, four different embedding methods, and eight classifiers. The experiments show promising results for the detection of harassing language. With hyper-parameter optimization, SVM and MLP achieved the highest average F-scores using the popular TF-IDF and FastText embedding methods. The results and the project's ongoing development are available as an open-source platform on GitHub.

KEYWORDS

Cyberbullying, Machine Learning.
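One of the embedding methods named in the benchmark above, TF-IDF, can be sketched from scratch in a few lines. The tiny corpus and tokenizer below are invented for illustration; a real run would use the four datasets, the modular cleaning phase, and a library implementation before feeding classifiers such as SVM or MLP.

```python
import math
from collections import Counter

# Invented three-document toy corpus (whitespace tokenization).
docs = [
    "you are awful and stupid",
    "have a great day friend",
    "stupid awful troll go away",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each term occur?
df = Counter(term for doc in tokenized for term in set(doc))
vocab = sorted(df)

def tfidf(doc):
    """Term frequency times smoothed inverse document frequency."""
    tf = Counter(doc)
    return [tf[t] / len(doc) * math.log((1 + n_docs) / (1 + df[t]) + 1)
            for t in vocab]

vectors = [tfidf(doc) for doc in tokenized]
print(len(vocab), len(vectors[0]))
```

Each document becomes a fixed-length vector over the shared vocabulary, which is exactly the representation a downstream classifier consumes.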

A Natural Logic for Artificial Intelligence, and its Risks and Benefits

Gyula Klima, Department of Philosophy, Fordham University, New York, USA

ABSTRACT

This paper is a multidisciplinary project proposal, submitted in the hopes that it may garner enough interest to launch it with members of the AI research community along with linguists and philosophers of mind and language interested in constructing a semantics for a natural logic for AI.

KEYWORDS

Natural logic, natural vs. artificial intelligence, semantics-driven language processing.

Experiments on NL2SQL using SQLOVA, TaBERT and Lookahead Optimizer

Shubham V Chaudhari and Kameshwar Rao JV, HCL Technologies LTD, India

ABSTRACT

With the advancement of deep learning in NLP, there has been keen interest across academia and industry in converting natural language to SQL. Various models have been developed to address this problem, employing techniques such as reinforcement learning, seq-to-seq, and seq-to-set. We present an approach in which TaBERT and SQLOVA [2] are combined. TaBERT [1], trained on structured text, improves over traditional BERT [3], better enhancing the features of the input query and headers. The NL2SQL layer of SQLOVA is connected on top of TaBERT and further encodes the query and headers, further enhancing the features. The choice of optimizer plays a key role in improving the model's results. The proposed architecture with the lookahead optimizer [4] surpasses the accuracy on where-num, where-col and where-cond by 0.2%, 0.5%, and 0.4%, respectively.

KEYWORDS

nl2sql, deep neural networks, NLP.

Double Multi-head Attention-based Capsule Network for Relation Classification

Hongjun Heng1 and Renjie Li2, 1Department of Computer Science and Technology, Civil Aviation University of China, Tianjin, China, 2Sino-European Institute of Aviation Engineering, Civil Aviation University of China, Tianjin, China

ABSTRACT

Semantic relation classification is an important task in the field of natural language processing. Existing neural network relation classification models introduce attention mechanisms to increase the importance of significant features, but some of these attention models have only one head, which is not enough to capture more distinctive fine-grained features. Models based on RNNs (Recurrent Neural Networks) usually use a single-layer structure and have limited feature extraction capability, and current RNN-based capsule networks handle noise improperly, which increases the complexity of the network. We therefore propose a capsule network relation classification model based on double multi-head attention. In this model, we introduce an auxiliary BiGRU (Bidirectional Gated Recurrent Unit) to make up for the limited feature extraction performance of a single BiGRU, improve bilinear attention through a double multi-head mechanism so that the model can obtain more sentence information from different representation subspaces, and instantiate capsules with sentence-level features to alleviate the impact of noise. Experiments on the SemEval-2010 Task 8 benchmark dataset show that our model outperforms most previous state-of-the-art neural network models and achieves performance comparable to other capsule network models.

KEYWORDS

Relation Classification, Double Multi-head Attention, Auxiliary BiGRU, Capsule Network.

Fenix: A Semantic Search Engine based on an Ontology and a Model Trained with Machine Learning to Support Research

Felipe Cujar Rosero, David Santiago Pinchao Ortiz, Silvio Ricardo Timaran Pereira and Jimmy Mateo Guerrero Restrepo, Systems Department, University of Nariño, Pasto, Colombia

ABSTRACT

This paper presents the final results of a research project that aimed to build a Semantic Search Engine that uses an ontology and a model trained with machine learning to support the semantic search of research projects of the Research System of the University of Nariño. For the construction of FENIX, as this engine is called, a methodology was used that includes the following stages: appropriation of knowledge; installation and configuration of tools, libraries and technologies; collection, extraction and preparation of research projects; and design and development of the Semantic Search Engine. The work produced three main results: a) the complete construction of the ontology, with classes, object properties (predicates), data properties (attributes) and individuals (instances) in Protégé, SPARQL queries with Apache Jena Fuseki, and the corresponding coding with Owlready2 in a Jupyter Notebook with Python inside an Anaconda virtual environment; b) the successful training of the model, for which machine learning and, specifically, natural language processing tools such as spaCy, NLTK, Word2vec and Doc2vec were used, again in a Jupyter Notebook with Python inside an Anaconda virtual environment and with Elasticsearch; and c) the creation of FENIX, managing and unifying the queries for the ontology and for the machine learning model. The tests showed that FENIX returned satisfactory results for all the searches that were carried out.

KEYWORDS

Search Engine, Semantic Web, Ontology, Machine Learning, Research Projects.

Interconnection of the Sapir-Whorf Theory and the Sociolinguistic Relevance of African-American Vernacular English

Angelina Barbasheva, The English Language Department, St. Petersburg University of the Humanities and Social Sciences, Saint Petersburg, Russia

ABSTRACT

The history of the emergence, development and formation of the African American language, and its role in contemporary American society, is a vast topic in which linguistics, cultural history, politics, sociology, psychology, educational philosophy and other disciplines intersect. In this regard, researchers must move away from the structuralist perception of language as a given, fixed and self-sufficient system and take an interdisciplinary approach to its study. The analysis of the lexical composition of any language (or language variety) allows us not only to study the peculiarities of how language units function, but also to describe the facts of their social and ethnic specificity, taking into account the system-forming cultural and national relations between language units and culturally significant concepts, which is the goal of the present study.

KEYWORDS

Sapir-Whorf theory, AAVE (African-American Vernacular English), linguistic relativity, ethnolinguistics, sociolinguistics, cultural model, discourse, linguistic status, dialect, code-switching.

Efficient GAN-based Method for Extractive Summarization

Seyed Vahid Moravvej1, Mohammad Javad Maleki Kahaki2, Moein Salimi Sartakhti3, 1Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran, 2Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran, 3Department of Electrical and Computer Engineering, Amirkabir University of Technology, Tehran, Iran

ABSTRACT

Text summarization plays an essential role in reducing time and cost in many domains such as medicine and engineering. Summarization techniques introduced in recent years are generally greedy in choosing sentences. This paper presents a method for extractive summarization based on a generative adversarial network and an attention mechanism, called AM-GAN. By identifying important sentences, the generator network produces summaries that are related to the main subject. Unlike recent works, in our method the generator selects sentences non-greedily. Another advantage of the proposed model is a loss function for the discriminator that improves its performance. We use hand-engineered and embedding features for summarization. In addition, because multiple summaries are produced for each document, we employ a voting system to generate a single summary. The summaries produced by the generator show that our model performs better than the compared methods on the ROUGE metric.

KEYWORDS

Text summarization, generative adversarial network, non-greedily, attention mechanism, extractive summarization.
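The voting step described in the abstract above can be pictured as a simple majority vote over candidate extractive summaries, each a set of selected sentence indices. This is a hypothetical illustration, not the paper's implementation: the candidates, the summary length k, and the tie-breaking rule are all assumptions.

```python
from collections import Counter

# Three made-up candidate summaries for one document, each a set of
# sentence indices chosen by the generator.
candidates = [
    {0, 2, 5},
    {0, 2, 7},
    {0, 3, 5},
]

def vote(candidates, k=3):
    """Keep the k sentence indices picked by the most candidates,
    breaking ties by earlier position in the document."""
    counts = Counter(i for cand in candidates for i in cand)
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return sorted(ranked[:k])

print(vote(candidates))   # -> [0, 2, 5]
```

Sentence 0 appears in all three candidates and sentences 2 and 5 in two each, so the merged summary keeps those three and drops the one-off picks.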

Using Transformers for Political Text Mining

Tu My Doan, Nils Barlaug, Jon Atle Gulla, Department of Computer Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway

ABSTRACT

Political text mining is a new discipline that combines ideology classification and model explanation, and has not gotten much attention yet. In this paper we propose a framework for this novel task using a state-of-the-art natural language processing model and recent explainability techniques. We apply our framework to the ParlSpeech dataset, and show that popular explainability techniques tend to agree on which part of the text is the most influential. Furthermore, through experimentation, we highlight the advantages of generating sentence-level explanations instead of token-level for political text mining. We conclude that, even though our framework has potential, current natural language processing techniques can only solve part of the problem.

KEYWORDS

Political text, data mining, ParlSpeech, UK politics, transformers, LIME, SHAP, Integrated Gradients.

Mining evolving spatial co-location patterns from spatio-temporal databases

Yunqiang Ma1, Junli Lu2* and Dazhi Yang2, 1Department of Biodiversity Conservation, Southwest Forestry University, Kunming, Yunnan, China, 2Department of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan, China

ABSTRACT

Spatial co-location patterns (SCPs) represent subsets of spatial features that are frequently located together in geographic space. SCP mining has been a research hotspot in recent years. However, many application domains, including location-based services, rare animal and plant protection, transportation, and environmental monitoring, collect their data periodically or continuously, so SCPs evolve as the databases change. Therefore, evolving spatial co-location patterns (ESCs) widely exist in spatio-temporal databases, and discovering them is highly actionable: it can help us better understand the relationships between SCPs over time, find the variation trends of SCPs and the dataset, and track the diversity of SCPs. This paper defines ESCs and proposes a two-step framework to discover them. An extend-and-evaluate scheme is proposed to form ESCs by selecting appropriate evolvers from the top-k spatial prevalent co-location patterns at each time slot. A top-k spatial prevalent co-location pattern at one time slot may get longer, stay the same, get shorter, be split, or become extinct at the next time slot. When it is split, many sub-co-locations need to be searched to generate its pattern partitions, which is computationally expensive. Two kinds of pattern division and storage are proposed to speed up the search. Furthermore, a thread-like speeder is used to improve the mining process. The experiments evaluate the effectiveness and efficiency of the proposed algorithms on real and synthetic datasets. Our important findings include the identification of all the ESCs of mixed forests in the Shilin nature preservation zone of Yunnan Province over ten years.
For example, the ESC {A, B, D, E, G} → {A, B, D, E, G, H} → {B, E}, {A, H}, {D, G} (A: Pinus yunnanensis, B: Eucalyptus, D: Budding Oak, E: Mount Hua Pine, G: Miscellaneous irrigation, H: Cypress Wood) identifies variation trends of SCPs and the dataset (e.g., the prevalent SCP {A, B, D, E, G} gets longer and is then split) and tracks the diversity of SCPs on some features (the diversity strengthens and then weakens on these features).

KEYWORDS

Evolving spatial co-location patterns, Spatio-temporal databases, Top-k spatial prevalent co-location patterns, Extend-and-evaluate scheme.

Deep Learning for Medical Diagnosis of Lung CT Images Caused by SARS-CoV-2 Diseases

Mehmet Akif Cifci1, 1Istanbul Aydin University, Computer Engineering, Florya, 34295, Istanbul, Turkey

ABSTRACT

SARS-CoV-2, which causes a severe acute respiratory syndrome, has infected around 4.5 million people worldwide. In this study, a dataset of chest CT images covering common bacterial pneumonia and confirmed SARS-CoV-2 disease was used for the automatic detection of Coronavirus disease. The aim of this study is to evaluate the performance of state-of-the-art convolutional neural network architectures for medical image classification. In particular, transfer learning was adopted for its unprecedented achievements in detecting various abnormalities in small medical image datasets. To test for the probability of the disease, 5,800 chest CT images were taken from various reliable sources. First, GitHub repositories were analyzed for related datasets; second, the Kaggle website was scanned. From these sources, 5,800 images with confirmed Coronavirus cases were selected. The collected data include 3,654 images of confirmed Coronavirus cases and 2,146 images of healthy conditions. 70% of the chest CT images were used for training, while the remainder were used for testing. The results suggest that Deep Learning (DL) with chest CT imaging may extract significant biomarkers related to Coronavirus disease, obtaining a best accuracy, sensitivity, and specificity of 98.41%, 98.1%, and 98.02%, respectively. These results show the high value of using deep learning for the early diagnosis of Coronavirus: it served as a beneficial tool for fast Coronavirus screening and for finding potential high-risk patients.

KEYWORDS

Coronavirus, deep learning, machine learning, CT chest Images, transfer learning

Data Warehousing of Solid Minerals in Nigeria: A Study of South South Region, Nigeria

Usanga Udeme J1, Essien Nse U2, Umanah Ifiok S3, Akomaye Azauka G2, 1Entrepreneurship Development and Vocational Studies, 2Geology Department, University of Calabar, Calabar, Cross River State, Nigeria, 3Computer Science Technology Department, NRCRI- Federal College of Agriculture, Ishiagu, Ebonyi State, Nigeria

ABSTRACT

Data warehouses have become an integral part of organizations' decision-making strategies due to competition and globalization. For any organization to gain competitive advantage and make better decisions, data warehousing and data mining now play significant roles nationally and globally: they support better decisions, streamline workflows, and enable better customer service. This paper reports on the development of a data warehouse for solid minerals, using a management-system model to capture solid-mineral data in the South-South region of Nigeria. It describes the process of data warehouse design and development using Microsoft SQL Server Analysis Services, and outlines the development of a data cube and the application of Online Analytical Processing (OLAP) and data mining tools. It is concluded that the effective use of data warehousing and data mining in solid-mineral resource management will promote the rapid growth of the Nigerian economy and offer information on past, present and future prospects for mineral deposit identification, exploration, exploitation and utilization for economic growth.

KEYWORDS

Data warehousing, Solid minerals, South South Nigeria.