Welcome to MLTEC 2020

International Conference on Machine Learning Techniques (MLTEC 2020)

November 21~22, 2020, Zurich, Switzerland

Network Defense in an End-to-end Paradigm

William R Simpson and Kevin E. Foltz, The Institute for Defense Analyses (IDA), Alexandria, Virginia, USA

ABSTRACT

Network Defense is built upon a clear concept of the fortress approach. Many of the requirements are based on inspection and reporting prior to delivery. These inspections require decryption of packets when they are encrypted. In an end-to-end paradigm, confidentiality is maintained through unbroken end-to-end encryption. How does one take advantage of the functionality of end-to-end security while maintaining network defense inspection and reporting requirements? The approach presented is based on a distributed computation of inspection and reporting. This paper examines a formulation that is pertinent to the Enterprise Level Security (ELS) model.

KEYWORDS

Appliance, end-to-end security model, ELS, network defenses, web server handlers


A Systematic Review on Performance and Usability Trends of Content Management Systems – WordPress

Suzan Ejura Ojonuba, Department of Software Engineering, Atilim University, Incek, Ankara, Turkey

ABSTRACT

Content management systems (CMS) have made a drastic change in the evolution of the internet, with a significantly increased number of developers, businesses, and individuals now hosting their content using the WordPress CMS. This paper intends to raise awareness of the need for more research on the performance and usability of the WordPress content management system by investigating and analyzing research trends. To do this it addresses three research questions (RQs): (1) What are the most investigated research topics on WordPress? (2) How many of these studies focused on the performance and usability of WordPress? (3) What are the contributions of the papers on WordPress content management systems? In answering these RQs, a systematic literature review (SLR) method was used, and a digital database search of three popular resource sites covering 2017 to 2019 was carried out. To keep efforts manageable, representative literature related to WordPress was selected from the pool of data, and 50 papers were chosen based on the SLR analysis to identify the trends in WordPress research. To classify the research areas, I separated the selected papers into four categories based on their research topics. The results showed that more research was done in other areas compared to performance and usability.

KEYWORDS

WordPress, WordPress performance, WordPress usability, research trends, content management system


A Systematic Literature Review Study on the Understandability and Quality of UML Models

Sina Alizadeh Tabrizi1,*, Nergiz Ercil Cagiltay2 and Damla Topalli3, 1MSc Student, Department of Software Engineering, Faculty of Engineering, Atilim University, 06830 Ankara, Turkey, 2Department of Software Engineering, Atilim University, 06830 Ankara, Turkey, 3Department of Information Systems Engineering, Atilim University, 06830 Ankara, Turkey

ABSTRACT

Nowadays, there is a great effort in various industries to enhance the quality of the objects around us rather than their quantity, and this is also the case for the final products yielded by the Model-Driven Development approach. In the present study, a set of inter-related questions regarding the quality and understandability aspects of UML representations was addressed by conducting a systematic literature review of papers published during 2000-2020. From two indexing databases, namely Science Direct and Google Scholar, 38 and 26 journal and conference papers (64 papers in total) were retrieved, respectively. The review results demonstrated that a distinction should be made between the quality and the understandability of UML diagrams, implying that numerous factors, including layout, aesthetics, application domain, modelers' background, developers' experience level, adopted modelling guidelines and CASE tools, affect the quality of UML representations.

KEYWORDS

UML Diagrams, Understandability, Quality, Modelling Notations, Systematic Literature Review


Unique Software Engineering Techniques: Panacea for Threat Complexities in Secure Multiparty Computation (MPC) with Big Data

Uchechukwu Emejeamara1, Udochukwu Nwoduh2 and Andrew Madu2, 1IEEE Computer Society, Connecticut Section, USA, 2Department of Computer Science, Federal Polytechnic Nekede, Nigeria

ABSTRACT

Most large corporations with big data have adopted more privacy measures in handling their sensitive/private data and, as a result, employing analytic tools that run across multiple sources has become ineffective. Joint computation across multiple parties is allowed through the use of secure multi-party computation (MPC). The practicality of MPC is impaired when dealing with large datasets, as most of its algorithms scale poorly with data size. Despite its limitations, MPC continues to attract increasing attention from industry players who view it as a better approach to exploiting big data. Secure MPC is, however, faced with complexities that often overwhelm its handlers, hence the need for special software engineering techniques for resolving these threat complexities. This research presents cryptographic data security measures, the garbled circuits protocol, circuit optimization, and protocol execution techniques as some of the special techniques for resolving the threat complexities associated with MPC. Honest majority, asymmetric trust, covert security, and trading off leakage are some of the experimental outcomes of implementing these special techniques. This paper also reveals that an essential approach in developing suitable mitigation strategies is having knowledge of the adversary type.

KEYWORDS

Cryptographic Data Security, Garbled Circuits, Optimizing Circuits, Protocol Execution, Honest Majority, Asymmetric Trust, Covert Security, Trading Off Leakage


Moderation Effect of Software Engineers’ Emotional Intelligence (EQ) Between their Work Ethics and their Work Performance

Shafia Khatun and Norsaremah Salleh, Department of Computer Science, Kulliyah of Information and Communication Technology (KICT), International Islamic University Malaysia (IIUM), Kuala Lumpur, Malaysia

ABSTRACT

In today's world, software is used in every sector, be it education, healthcare, security, transportation, finance and so on. As software engineers greatly affect society, if they do not behave ethically, they could cause widespread damage, as in the Facebook-Cambridge Analytica scandal in 2018. So, investigating the ethics of software engineers and the relationships they have with other variables is important for understanding what could be done to improve the situation. Software engineers work in rapidly-changing business environments, which leads to a lot of stress. Their emotions are important for dealing with this and can impact their ethical decision-making. In this quantitative study, the researcher aims to investigate whether Emotional Intelligence (EQ) moderates the relationship between the work ethics of software engineers and their work performance, using hierarchical multiple regression analysis in SPSS. The findings are expected to give valuable information for improving the ethical behaviour of software engineers.
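
The analysis method named above (hierarchical multiple regression with an interaction term) can be sketched as follows. The paper uses SPSS, so this Python/statsmodels version with synthetic data and placeholder variable names (ethics, eq, performance) only illustrates the moderation test; it is not the authors' analysis.

# Illustrative sketch of a moderation test via hierarchical regression
# (the paper uses SPSS; this is NOT the authors' code). Variable names
# are hypothetical placeholders and the data is synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "ethics": rng.normal(size=n),   # work-ethics score
    "eq": rng.normal(size=n),       # emotional intelligence score
})
df["performance"] = 0.5 * df["ethics"] + 0.3 * df["eq"] + rng.normal(size=n)

# Step 1: main effects only
X1 = sm.add_constant(df[["ethics", "eq"]])
m1 = sm.OLS(df["performance"], X1).fit()

# Step 2: add the ethics x EQ interaction; a significant interaction term
# and an R^2 increase over step 1 would indicate moderation by EQ.
df["ethics_x_eq"] = df["ethics"] * df["eq"]
X2 = sm.add_constant(df[["ethics", "eq", "ethics_x_eq"]])
m2 = sm.OLS(df["performance"], X2).fit()

print("R2 step 1:", round(m1.rsquared, 3), "R2 step 2:", round(m2.rsquared, 3))
print(m2.summary().tables[1])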

KEYWORDS

Software Engineers, Emotional Intelligence, Work Ethics, Work Performance, Quantitative Study


Optical Character Recognition for Hindi Using a Convolutional Approach

Sonal Sannigrahi, École Polytechnique, Palaiseau, France

ABSTRACT

As Hindi is one of the most widely spoken languages, an accurate system for text recognition and image translation for it is needed. Due to the script's difficult characteristic features, OCR systems that are typically very accurate for Roman scripts fail to provide the same accuracy here. For any general OCR system, the major steps include preprocessing, character segmentation, feature extraction, and lastly, classification and recognition. In this paper, the approaches taken in the preprocessing stage include conversion of grey-scale images to binary images by normalisation and rounding, image rectification, and segmentation of the document itself into lines, words, and characters (basic symbols). To recognise these characters, a Neural Network based classifier is used, which is the main contribution. Lastly, this paper uses convolutional layers for feature extraction, with three other feature extraction methods considered: histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing.
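
A minimal sketch of two of the preprocessing steps described above (grey-scale normalisation plus rounding to a binary image, and line segmentation by horizontal projection). Thresholds and data are illustrative, not the paper's.

# Sketch of binarisation by normalisation + rounding and line segmentation
# by horizontal projection; not the authors' implementation.
import numpy as np

def binarize(gray: np.ndarray) -> np.ndarray:
    """Normalise a grey-scale image to [0, 1] and round to {0, 1}."""
    g = (gray - gray.min()) / max(gray.max() - gray.min(), 1e-9)
    return np.round(1.0 - g).astype(np.uint8)  # 1 = ink, 0 = background

def segment_lines(binary: np.ndarray, min_ink: int = 1):
    """Split a binary page into text lines using the horizontal projection."""
    rows_with_ink = binary.sum(axis=1) >= min_ink
    lines, start = [], None
    for i, has_ink in enumerate(rows_with_ink):
        if has_ink and start is None:
            start = i
        elif not has_ink and start is not None:
            lines.append(binary[start:i]); start = None
    if start is not None:
        lines.append(binary[start:])
    return lines

page = np.random.randint(0, 256, size=(64, 128)).astype(float)  # placeholder scan
print(len(segment_lines(binarize(page))), "candidate line(s)")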

KEYWORDS

Pattern Recognition, Language Modelling, Optical Character Recognition, Convolutional Neural Network, Segmentation


Adversarial Training for Few-shot Event Detection

Xiaoxiang Zhu, Mengshu Hou and Xiaoyang Zeng, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China

ABSTRACT

Most supervised systems for the event detection (ED) task rely heavily on manual annotations and suffer from high-cost human effort when applied to new event types. To tackle this general problem, we turn our attention to few-shot learning (FSL). As a typical solution to FSL, cross-modal feature generation based frameworks achieve promising performance on image classification, which inspires us to advance this approach to the ED task. In this work, we propose a model which extracts latent semantic features from event mentions, type structures and type names; these three modalities are then mapped into a shared low-dimensional latent space by a modality-specific aligned variational autoencoder enhanced by adversarial training. We evaluate the quality of our latent representations by training a CNN classifier to perform the ED task. Experiments conducted on the ACE2005 dataset demonstrate the effectiveness of our method.

KEYWORDS

Event Detection, Few-Shot Learning, Cross-modal generation, Variational autoencoder, GAN


Arabic Location Name Annotations and Applications

Omar ASBAYOU, Department of LEA, Lumière University, CRTT, Lyon 2, France

ABSTRACT

LNE (location named entity) extraction and annotation is part of NER systems, which aim at managing the great amount of data by classifying information for research purposes and text mining. This task is limited in Arabic compared with many languages like English and French. In this paper we try to explain our linguistic approach to LNE recognition and classification based on syntactico-semantic patterns. To reach good results we have taken into account, in our syntactico-semantic rule construction, morpho-syntactic information, the syntactico-semantic classification of TW (trigger words) and extensions. Formally, different TW senses imply different syntactic structures (argument/attribute). We also show the semantic data that our LNE recognition system can provide by storing LNE relations with different NE classes in an IE (information extraction) system, and the role LNEs play in IR.

KEYWORDS

Location name annotations, Location named entities, Information retrieval, Information extraction


Injecting Event Knowledge into Pre-Trained Language Models for Event Extraction

Zining Yang, Siyu Zhan, Xiaoyang Zeng and Mengshu Hou, School of Computer Science and Engineering, University of Electronic Science & Technology of China, Chengdu, China

ABSTRACT

Pre-trained language models have recently achieved great success in many NLP tasks. In this paper, we propose an event extraction system based on the pre-trained language model BERT to extract both event triggers and arguments. As a deep-learning-based method, the size of the training dataset has a crucial impact on performance. To address the lack of training data for event extraction, we further train the pre-trained language model with a carefully constructed in-domain corpus to inject event knowledge into our event extraction system with minimal effort. Empirical evaluation on the ACE2005 dataset [1] shows that injecting event knowledge can significantly improve the performance of event extraction.

KEYWORDS

Natural Language Processing, Event Extraction, BERT, Lacking Training Data Problem.


Semantic Management of Enterprise Information Systems through Ontologies

Valentina Casola and Rosario Catelli, Department of Electrical Engineering and Information Technologies (DIETI), University of Naples Federico II, Naples, Italy

ABSTRACT

This article introduces a model for cloud-aware enterprise governance with a focus on its semantic aspects. It considers the need for Business-IT/OT and Governance-Security alignments. The model proposes the use of ontologies as specific tools to address the governance of each IT/OT environment in a holistic manner. The semantic support within the model suggests further possible applications in different company departments, with the aim of evaluating and managing projects in an optimal way, integrating several different but important points of view from the stakeholders.

KEYWORDS

Cloud, Enterprise, Governance, Information management, Ontology, Semantic systems.


Joint Entity and Relation Extraction for Information Redundancy Elimination

Yuanhao Shen and Jungang Han, School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, China

ABSTRACT

To solve the problem of redundant information and overlapping relations in entity and relation extraction models, we propose a joint extraction model. This model can directly extract multiple pairs of related entities without generating unrelated redundant information. We also propose a recurrent neural network named Encoder-LSTM that enhances the ability of recurrent units to model sentences. Specifically, the joint model includes three sub-modules: the Named Entity Recognition sub-module consisting of a pre-trained language model and an LSTM decoder layer, the Entity Pair Extraction sub-module which uses the Encoder-LSTM network to model the order relationship between related entity pairs, and the Relation Classification sub-module which includes an Attention mechanism. We conducted experiments on the public datasets ADE and CoNLL04 to evaluate the effectiveness of our model. The results show that the proposed model achieves good performance in the task of entity and relation extraction and can greatly reduce the amount of redundant information.

KEYWORDS

Joint Model, Entity Pair Extraction, Named Entity Recognition, Relation Classification, Information Redundancy Elimination.


Paraphrase Quality Assessment: Intersecting Topic Similarity

Viacheslav Shalamov, Valeria Efimova, Kirill Vakhrushev and Andrey Filchenkov, Department of Information Technologies and Programming, ITMO University, St. Petersburg, Russia

ABSTRACT

Paraphrase generation is becoming increasingly popular for many languages. It is useful for creating varied content for websites and social media. More importantly, it is also frequently used to improve the performance of machine learning models via data augmentation and enrichment. The existing metrics for natural language processing (NLP) problems are not entirely suitable for quality assessment of paraphrases generated for long texts. In this work, we propose a new metric for quality assessment of such paraphrases based on topic modelling, which we call Intersecting Topic Similarity (ITS). We collected a dataset of strong paraphrases, the first of its kind for the Russian language; the dataset is openly available. We compared the new metric with the existing metric families BLEU, ROUGE, and METEOR on the collected dataset. The ITS metric showed better ability to distinguish paraphrases and is much more reliable in assessing paraphrase quality.
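
The exact definition of ITS is given in the paper; purely as a rough illustration of comparing two texts through their topic distributions, one could fit a topic model and measure the intersection (histogram overlap) of the per-document topic mixtures, as in the sketch below. All parameters and texts are illustrative, and this is not the authors' metric.

# Rough sketch of comparing two texts via their topic distributions.
# NOT the paper's ITS definition; only the general idea is illustrated.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np

original = "the cat sat on the mat and watched the birds outside"
paraphrase = "a cat rested on a rug while it observed birds through the window"

vec = CountVectorizer()
X = vec.fit_transform([original, paraphrase])
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(X)          # per-document topic distributions

# Intersection (histogram overlap) of the two topic distributions, in [0, 1].
overlap = np.minimum(theta[0], theta[1]).sum()
print(f"topic-distribution overlap: {overlap:.3f}")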

KEYWORDS

Natural Language Processing, Paraphrasing, Russian language, Text corpus, Quality Assessment.


Multi-layer Attention Approach for Aspect based Sentiment Analysis

Xinzhi Ai1, Xiaoge Li1*, Jiangyan Sun2, Shuting Zhi1 and Dayi Lin3, 1School of Computing, Xi'an University of Posts and Telecommunications, Xi'an, China, 710121, 2Xi'an International University, Xi'an, China, 710077, 3School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada

ABSTRACT

Aspect-level sentiment analysis is a typical fine-grained emotional classification task that assigns a sentiment polarity to each of the aspects in a review. To better handle this classification task, this paper puts forward a new model which combines a Long Short-Term Memory network with multiple attention mechanisms and aspect context. The multiple attention mechanisms (i.e., location attention, content attention and class attention) take the factors of context location, content semantics and class balancing into consideration. Therefore, the proposed model can adaptively integrate location and semantic information between the aspect targets and their contexts into sentiment features, and overcome the model data variance introduced by an imbalanced training dataset. In addition, the aspect context is encoded on both sides of the aspect target, so as to enhance the ability of the model to capture semantic information. The Multi-Attention mechanism (MATT) and Aspect Context (AC) allow our model to perform better when facing reviews with more complicated structures. The experimental results indicate that the accuracy of the new model reaches 80.6% and 75.1% on the two SemEval-2014 Task 4 datasets respectively, 71.1% on the Twitter dataset, and 81.6% on a Chinese automotive-domain dataset. Compared with some previous models for sentiment analysis, our model shows higher accuracy.

KEYWORDS

Aspect-level sentiment analysis, Multiple attention mechanism, LSTM neural network.


Domain-transferable Method for Named Entity Recognition Task

Vladislav Mikhailov1,2 and Tatiana Shavrina1,2, 1Sberbank, Moscow, Russia, 2Higher School of Economics, Moscow, Russia

ABSTRACT

Named Entity Recognition (NER) is a fundamental task in the fields of natural language processing and information extraction. NER has been widely used as a standalone tool or an essential component in a variety of applications such as question answering, dialogue assistants and knowledge graphs development. However, training reliable NER models requires a large amount of labelled data which is expensive to obtain, particularly in specialized domains. This paper describes a method to learn a domain-specific NER model for an arbitrary set of named entities when domain-specific supervision is not available. We assume that the supervision can be obtained with no human effort, and neural models can learn from each other. The code, data and models are publicly available.

KEYWORDS

Named Entity Recognition, BERT-based Models, Russian Language.


News Article Text Classification and Summary for Authors and Topics

Aviel J. Stein1, Janith Weerasinghe2, Spiros Mancoridis1, Rachel Greenstadt2, 1College of Computing and Informatics, Drexel University, Philadelphia, Pennsylvania, USA, 2Tandon School of Engineering, New York University, New York, USA

ABSTRACT

News articles are important for providing timely, historic information. However, the Internet is replete with text that may contain irrelevant or unhelpful information; therefore, means of processing it and distilling content are important and useful to human readers as well as to information extraction tools. In this work we compare machine learning models for evaluating two common NLP tasks, topic and authorship attribution, on the 2017 Vox Media dataset. Additionally, we use the models to classify on extractive summaries, which are more apt for the task than the provided blurbs. Because of the large number of topics, we take topic overlap into account and address it via top-n accuracy and hierarchical groupings of topics. We also consider edge cases in authorship by classifying on inter-topic and intra-topic author distributions. Our results show that both topics and authors are readily identifiable and are consistently classified best using neural networks rather than support vector machines, random forests, or naive Bayes classifiers, although the latter methods perform acceptably.

KEYWORDS

Natural Language Processing, Topic Classification, Author Attribution, Summarization, Machine Learning.


Chinese Medical Question Answer Matching Based on Interactive Sentence Representation Learning

Xiongtao Cui and Jungang Han, College of Computer and Engineering, Xi’an University of Posts and Telecommunications, Xi’an, China

ABSTRACT

Chinese medical question-answer matching is more challenging than open-domain question-answer matching in English. Even though deep learning methods have performed well in improving the performance of question-answer matching, these methods only focus on the semantic information inside sentences, while ignoring the semantic association between questions and answers, thus resulting in performance deficits. In this paper, we design a series of interactive sentence representation learning models to tackle this problem. To better adapt to Chinese medical question-answer matching and take advantage of different neural network structures, we propose the Crossed BERT network to extract the deep semantic information inside a sentence and the semantic association between question and answer, and then combine it with a multi-scale CNN network or BiGRU network to learn more semantic features into the sentence representation. The experiments on the cMedQA V2.0 and cMedQA V1.0 datasets show that our model significantly outperforms all the existing state-of-the-art models for Chinese medical question answer matching.

KEYWORDS

Question answer matching, Chinese medical field, interactive sentence representation, deep learning.


A Pattern-mining Driven Study on Differences of Newspapers in Expressing Temporal Information

Yingxue Fu1, 2 and Elaine Uí Dhonnchadha2, 1School of Computer Science, University of St Andrews, Scotland, UK, 2Center for Language and Communication Sciences, Trinity College Dublin, Dublin 2, Ireland

ABSTRACT

This paper studies the differences between different types of newspapers in expressing temporal information, which is a topic that has not received much attention. Techniques from the fields of temporal processing and pattern mining are employed to investigate this topic. First, a corpus annotated with temporal information is created by the author. Then, sequences of temporal information tags mixed with part-of-speech tags are extracted from the corpus. The TKS algorithm is used to mine skip-gram patterns from the sequences. With these patterns, the signatures of the four newspapers are obtained. The parameter setting of the pattern mining algorithm and the steps of obtaining the initial signatures and the revised signatures of the four newspapers are implemented based on previous findings [1]. It is shown that newspapers differ in ways of expressing temporal information.

KEYWORDS

Pattern Mining, TKS algorithm, Temporal Annotation, Tabloids and Broadsheets.


Arabic Speech Classification for Contextual Voice Pathology Correction

Naim Terbeh, Mohsen Maraoui and Mounir Zrigui, Faculty of Sciences, University of Monastir, Tunisia

ABSTRACT

In this paper, we present a novel method based on speech recognition and on the Support Vector Machine (SVM) to classify Arabic speech. The classification pipeline consists of first recognizing the produced speech using a speech recognition system. The second step is to adopt the SVM technique to classify the recognized speech. Each recorded speech is associated with a topic. The last task consists in comparing the classification obtained by the proposed system with the label associated with the input speech. The proposed method presents an efficient classifier of Arabic speech; indeed, we have obtained a classification rate of 89.27%, which allows extended systems, such as voice pathology correction, to benefit from our system to correct pathological speech by considering the context of the conversation.

KEYWORDS

Speech classification, Arabic speech, SVM, speech recognition


A Topological Method for Comparing Document Semantics

Yuqi Kong1, Fanchao Meng1 and Ben Carterette2, 1Department of Computer & Information Sciences, University of Delaware, Newark, USA, 2Spotify, Greenwich Street, New York, USA

ABSTRACT

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, on one hand, the tools for this task are still rare (N.B. topic extraction and sentiment analysis ones should not be counted); on the other hand, most relevant methods are devised from the statistical or the vector space model perspectives, but nearly none from a topological perspective. In this paper, we hope to make a different sound. A novel algorithm based on topological persistence for comparing semantic similarity between two documents is proposed. Our experiments are conducted on a document dataset with human judges' results. A collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm can produce highly human-consistent results, and it also beats most state-of-the-art methods, though it ties with NLTK.

KEYWORDS

Topological Graph, Document Semantics Comparison, Natural Language Processing, Information Retrieval, Topological Persistence


Hybridization of Genetic Algorithms and Neural Networks by Data Mining

Djamila Benhaddouche, University of Science and Technology of Oran "Mohammed Boudiaf" (USTO), BP 1505, El M'naouer, Oran, Algeria

ABSTRACT

We propose, in our study, a hybridization approach (genetic algorithms/neural networks) in order to apply it to our problem, namely the optimization of the budget estimates of the downstream activity of Sonatrach. Indeed, within the downstream activity, the budget estimates of the expenditure are made every 5 years. In theory, those must approach reality as much as possible; therefore, we took as the training base the table which contains the real expenditure (inputs) of the 3 previous years. While studying genetic algorithms and neural networks separately, the idea came to us to use genetic algorithms to adjust the synaptic weights during the training phase of our neural network, and to use the prediction capacity of the neural network to predict our budgetary expenditure. One starts from a population of more than 100 individuals (chromosomes), each of which represents a weight configuration for the neural network, drawn at random in [-1, 1]. For each individual, the average error between the computed outputs and the desired outputs is calculated over all the examples of our training base. The fitness function is nothing other than this average error. The individuals selected for reproduction are those with the smallest error. Crossover and mutation operators are used to generate a new generation. This process is repeated until an individual whose error is minimal is obtained. The weight configuration represented by this individual is then used for our prediction.
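
The following is a minimal sketch, not the authors' implementation, of the scheme described above: chromosomes are weight vectors in [-1, 1], fitness is the average error over the training base, and selection, crossover and mutation produce new generations. The tiny one-neuron network and the training data are placeholders.

# Sketch of a genetic algorithm tuning neural-network weights;
# the network, data and GA parameters are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 3))          # placeholder training inputs
y = X @ np.array([0.4, -0.2, 0.7])            # placeholder target outputs

def predict(weights, X):
    return np.tanh(X @ weights)               # one-neuron "network"

def fitness(weights):
    return np.mean((predict(weights, X) - y) ** 2)   # average error

pop = rng.uniform(-1, 1, size=(100, 3))       # 100 chromosomes in [-1, 1]
for generation in range(200):
    errors = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(errors)[:20]]    # selection: smallest error
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(0, len(parents), 2)]
        cut = rng.integers(1, 3)
        child = np.concatenate([a[:cut], b[cut:]])       # crossover
        child += rng.normal(0, 0.05, size=child.shape)   # mutation
        children.append(np.clip(child, -1, 1))
    pop = np.vstack([parents, children])

best = pop[np.argmin([fitness(w) for w in pop])]
print("best average error:", round(fitness(best), 4))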

KEYWORDS

Data mining, Genetic algorithms, Neural networks, Learning, Optimization, Prediction.


Database Management Systems – An Efficient, Effective and Augmented Approach for Organizations

Anushka Sharma, Aman Karamchandani, Devam Dave, Arush Patel, Nishant Doshi, Computer Engineering, School of Technology, Pandit Deen Dayal Petroleum University, Gandhinagar, India

ABSTRACT

Big and small firms, companies, hospitals, schools, and other commercial offices generate moderate or huge amounts of data on a regular basis and need to regularly update and manage this data. This data is not only used at that instance; generally, the post-analysis of data helps tremendously to improve business strategies and marketing trends. With time, this data may grow and become unmanageable if handled in a conventional manner, like a file-based system. This introduces the terms database and database management system (DBMS). Here, four types of DBMS approaches - Hierarchical, Network, Relational & Object-Oriented - are discussed. A highlight of the new-generation database approach called NoSQL is also included in this paper, along with an insight into Augmented Data Management. An example is based on the database design for the Study in India Program, which is an integral part of Pandit Deendayal Petroleum University (PDPU). A Graphical User Interface for the same has been developed using Java Swing, which makes access to the database easier. A list of present-day applications is also included.

KEYWORDS

Database Management System, Relational DBMS, Hierarchical DBMS, Network DBMS, Object Oriented DBMS, ER diagram, NoSQL, Graphical User Interface, Applications, Augmented Data Management, Database Software.


Inverse Space Filling Curve Partitioning Applied to Wide Area Graphs

Cyprien Gottstein1, Philippe Raipin Parvedy1, Michel Hurfin2, Thomas Hassan1 and Thierry Coupaye1, 1TGI-OLS-DIESE-LCP-DDSD, Orange Labs, Cesson-Sevigné, France, 2Univ Rennes, INRIA, CNRS, IRISA, 35000 RENNES, France

ABSTRACT

The most recent developments in graph partitioning research often consider scale-free graphs. Instead we focus on partitioning geometric graphs using a less usual strategy: Inverse Space-filling Partitioning (ISP). ISP relies on a space filling curve to partition a graph and was previously applied to graphs essentially generated from meshes. We extend ISP to apply it to a new context where the targets are now Wide Area Graphs. We provide an extended comparison with two state-of-the-art graph partitioning streaming strategies, namely LDG and FENNEL. We also propose customized metrics to better understand and identify which use cases the ISP partitioning solution is best for. Experiments show that in favourable contexts, edge-cuts can be drastically reduced, going from more than 34% using FENNEL to less than 1% using ISP.

KEYWORDS

Graph, Partitioning, Graph Partitioning, Geometric partitioning, Spatial, Geography, Geometric, Space Filling Curve, SFC, ISP.


Multidimensional Data Structure for Bigdata

Aridj Mohamed1 and Zegour Djamel Eddine2, 1Department of Computer Science, Chlef University, Algeria, 2Ecole Supérieure en Informatique, Oued Smar, Algeria

ABSTRACT

The Multidimensional Trie Hashing (MTH) access method is an extension of trie hashing to dynamic multi-key files (or databases). Its formulation consists in maintaining in main memory d separate tries, each of which indexes one attribute. The data file is a d-dimensional array stored in an ordered, linear way on the disk. The correspondence between the physical addresses and the indexes resulting from the application of the tries is achieved through a mapping function. On average, a record can be found in one disk access, which places the method among the most efficient known. Yet MTH has the double disadvantage of a low occupancy of file buckets (40-50%) and a large memory requirement relative to the file size (tries in memory). We propose a refinement of MTH on two levels: first, by using the compact representations of tries suggested in [13], then by applying delayed splitting (partial expansion) as introduced in the first dynamic hashing methods and as used in [15]. The analysis of the performance of this new scheme, mainly by simulation, shows on the one hand a high load factor (70-80%) with an access time practically equal to one disk access, and on the other hand an increase in the file size by a factor of two with the same memory space used by MTH.

KEYWORDS

Data structure, BigData, hashing, Multidimensional data, data storage.


An Automated Data-driven Prediction of Product Pricing based on Covid-19 Case Number using Data Mining and Machine Learning

Zhuoyang Han1, Ang Li2 and Yu Sun3, 1University of California, Irvine, California, USA, 2California State University, Long Beach, USA, 3California State Polytechnic University, Pomona, USA

ABSTRACT

In early 2020, a global outbreak of Coronavirus Disease 2019 (COVID-19) emerged as an acute respiratory infectious disease with high infectivity and incidence. China imposed a lockdown on the worst-affected city of Wuhan at the end of January 2020; over time, COVID-19 spread rapidly around the world and was declared a pandemic by the World Health Organization on March 11. As the epidemic spread, the number of confirmed cases and the number of deaths in countries around the world changed day by day. Correspondingly, the price of face masks, as important epidemic prevention materials, also changed with each passing day in international trade. Under such circumstances, the demand for face masks in the international trade market is enormous, and because the epidemic changes from day to day, the prices of face masks fluctuate daily and are very unstable. In this project, we used machine learning to address this problem. The project used Python to find algorithms that fit the daily confirmed cases in China, daily deaths in China, daily confirmed cases in the world, and daily deaths in the world, and the recorded mask prices were used to model the effect of the number of cases on the mask price. We would like to provide guidance to traders and the general public on the purchase of face masks by forecasting face mask prices.
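
An illustrative sketch (synthetic numbers, not the project's data) of the modelling step described above: fitting linear and polynomial regressions of mask price against daily confirmed cases with scikit-learn.

# Sketch of linear vs. polynomial regression of mask price on case counts;
# the data is synthetic and the coefficients are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
cases = np.linspace(100, 80000, 60).reshape(-1, 1)           # daily confirmed cases
price = 0.2 + 1e-5 * cases.ravel() + 2e-10 * cases.ravel()**2 \
        + rng.normal(0, 0.05, 60)                             # synthetic mask price

linear = LinearRegression().fit(cases, price)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(cases, price)

new_cases = np.array([[90000.0]])
print("linear prediction:", linear.predict(new_cases)[0])
print("degree-2 prediction:", poly.predict(new_cases)[0])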

KEYWORDS

Coronavirus, Machine Learning, Price Prediction, Linear Regression, Polynomial Regression, Data Cleaning.


Frustration Intensity Prediction in Customer Support Dialog Texts

Janis Zuters and Viktorija Leonova, Department of Computer Science, University of Latvia, Riga, Latvia

ABSTRACT

This paper examines the evolution of emotion intensity in dialogs occurring on Twitter between customer support representatives and clients ("users"). We focus on a single emotion type, frustration, modelling the user's level of frustration (on a scale of 0 to 4) for each dialog turn and attempting to predict the change of intensity from turn to turn, based on the text of turns from both dialog participants. As the modelling data, we used a subset of the Kaggle Customer Support on Twitter dataset annotated with per-turn frustration intensity ratings. For the modelling, we used a machine learning classifier for which dialog turns were represented by specifically selected bags of words. Since in our experimental setup the prediction classes (i.e., ratings) are not independent, to assess the classification quality we examined different levels of accuracy imprecision tolerance. We showed that for frustration intensity prediction of actual dialog turns we can achieve a level of accuracy significantly higher than a statistical baseline. However, we found that, as the intensity of a user's frustration tends to be stable across the turns of a dialog, customer support turns have only a very limited immediate effect on the customer's level of frustration, so using the additional information from customer support turns does not help to predict the future frustration level.
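
A rough sketch of the setup described above, not the authors' code: bag-of-words features per dialog turn, a standard classifier for the 0-4 rating, and accuracy computed with an imprecision tolerance. The example turns and labels are hypothetical.

# Sketch of per-turn frustration rating with bag-of-words features and
# tolerance-based accuracy; texts and labels are invented placeholders.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

turns = ["thanks, that fixed it", "still broken, this is ridiculous",
         "why is nobody answering me", "ok, I will try that",
         "I have waited three days", "great, works now"]
ratings = [0, 3, 4, 1, 3, 0]                      # hypothetical 0-4 labels

vec = CountVectorizer()
X = vec.fit_transform(turns)
clf = LogisticRegression(max_iter=1000).fit(X, ratings)

def tolerant_accuracy(y_true, y_pred, tol=1):
    """A prediction within +/- tol of the true rating counts as correct."""
    return np.mean(np.abs(np.array(y_true) - np.array(y_pred)) <= tol)

pred = clf.predict(X)
print("exact accuracy:", tolerant_accuracy(ratings, pred, tol=0))
print("accuracy within +/-1:", tolerant_accuracy(ratings, pred, tol=1))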

KEYWORDS

Neural Networks, Emotion Annotation, Emotion Recognition, Emotion Intensity, Frustration.


A More Abstractive Text Summarization Model

Sayak Chakraborty1, Xinya Li2 and Satyaki Chakraborty2, 1Calcutta Institute of Engineering & Management, Department of Computer Science & Engineering, Kolkata, WB, India, 2Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, USA

ABSTRACT

The pointer-generator network is an extremely popular method for text summarization. Recent contributions in this domain still build on top of the baseline pointer-generator by augmenting a content selection phase, or by decomposing the decoder into a contextual network and a language model. However, all such models that are based on the pointer-generator base architecture cannot generate novel words in the summary and mostly copy words from the source text. In our work, we first thoroughly investigate why the pointer-generator network is unable to generate novel words, and then address that by adding an out-of-vocabulary (OOV) penalty loss function. This enables us to improve the amount of novelty/abstraction significantly. We use normalized n-gram novelty scores as a metric for determining the level of abstraction. Moreover, we also report ROUGE scores of our model, since most existing summarization models are evaluated with R-1, R-2, and R-L scores.
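
The abstract does not spell out the exact form of the OOV penalty, so the following is only a generic sketch, under assumed tensor names, of how an auxiliary penalty term can be combined with the standard negative log-likelihood in PyTorch; it is not the authors' loss function.

# Generic sketch of adding an auxiliary OOV-related penalty to a
# summarisation loss; the paper's exact formulation is not reproduced.
import torch

def loss_with_oov_penalty(token_nll, p_gen, copied_is_oov, lam=0.1):
    """
    token_nll     : (batch, steps) per-token negative log-likelihood
    p_gen         : (batch, steps) probability of generating (vs copying)
    copied_is_oov : (batch, steps) 1.0 where the copied source token is OOV
    lam           : weight of the penalty term (hyperparameter)
    """
    base = token_nll.mean()
    # Penalise relying on copying for tokens the decoder could not generate,
    # which pushes the model towards producing novel, in-vocabulary words.
    penalty = ((1.0 - p_gen) * copied_is_oov).mean()
    return base + lam * penalty

nll = torch.rand(2, 5)                       # placeholder values
p_gen = torch.rand(2, 5)
oov = torch.randint(0, 2, (2, 5)).float()
print(loss_with_oov_penalty(nll, p_gen, oov))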

KEYWORDS

Natural Language Processing, Text Summarization, Pointer Generator Networks, Auxiliary Losses.


Resolving Code Smells in Software Product Line using Refactoring and Reverse Engineering

Sami Ouali, College of Applied Sciences Ibri, Oman

ABSTRACT

Software Product Lines (SPL) are recognized as a successful approach to reuse in software development. Their purpose is to reduce production costs. This approach allows products to differ with respect to particular characteristics and constraints in order to cover different markets. Software Product Line engineering is the production process in product lines. It exploits the commonalities between software products while preserving the ability to vary the functionality between these products. Sometimes, an inappropriate implementation of an SPL during this process can lead to code smells or code anomalies. Code smells are problems in source code which can have an impact on the quality of the derived products of an SPL. The same problem can be present in many products derived from an SPL due to reuse. A possible solution to this problem is refactoring, which can improve the internal structure of source code without altering its external behavior. This paper proposes an approach for building an SPL from source code. Its purpose is to reduce code smells in the obtained SPL by refactoring the source code. Another part of the approach consists in obtaining the SPL's design through reverse engineering.

KEYWORDS

Software Product Line, Code smells, Refactoring, Reverse Engineering.


Indian Commodity Market Price Comparative Study of Forecasting Methods - A Case Study on Onion, Potato and Tomato

Suresha HP1, Mutturaj IU2 and Krishna Kumar Tiwari3, 1REVA Academy for Corporate Excellence, 2REVA Academy for Corporate Excellence, 3MLAI Community, India

ABSTRACT

Forecasting agricultural commodity futures prices is a crucial subject in the agricultural domain, not only in providing price information about agricultural commodities in advance that decision makers can rely on, but also in reducing the uncertainty and risks of agricultural markets (Madaan et al., 2019). Fluctuations in commodity prices for onion, tomato and potato can cause distress among both consumers and producers, and are often exacerbated by trading networks, especially in developing economies where marketplaces might not be operating under conditions of perfect competition for various contextual reasons (Subhasree et al., 2016). India is mainly an agricultural country, and the farmer is an important part of agriculture. Even so, farmers cannot predict prices for their commodities, because price prediction remains a major challenge. Several characteristics are taken into account so that the crop price forecast is accurate. Forecasting the price of agricultural commodities based on arrival volume and diesel price helps agriculturists and also the agricultural mandis in India (Varun et al., 2019). We look at onion, tomato and potato trading in India, present the evaluation of a price forecasting model and an anomaly detection approach, and compare different supervised, unsupervised and forecasting prediction models. Our dataset consists of time series of wholesale prices, retail prices, arrival volumes and diesel prices of the agricultural commodities at several mandis in India (Shome et al., 2018). We also provide an in-depth forecasting analysis of the effect on these retail prices. Our results are encouraging and point towards the likelihood of building pricing models for agricultural commodities and of detecting anomalies. These data can then be stored and analysed, and the power of using historical data regarding agricultural commodities like onion, tomato and potato for forecasting retail prices is demonstrated (Onion, 2016). We propose a comparative study of various forecasting strategies that can be used to this aim. The empirical comparison of the chosen methods on the data showed that some methods are more suitable than others for this type of problem; in particular, we show that strategies based on machine learning approaches seem to be more suitable for this task (Balaji Prabhu et al., 2018). We compared Auto ARIMA (Autoregressive Integrated Moving Average), RNN (Recurrent Neural Network), LSTM, VAR (vector autoregressive model), Random Forest Regression and XGBoost (Madaan et al., 2019).
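
As a minimal illustration of one of the compared methods (ARIMA), the following sketch fits statsmodels' ARIMA on a synthetic weekly price series; the actual data, model orders and tuning used in the study are not reproduced here.

# Sketch of ARIMA forecasting on a synthetic weekly commodity-price series.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-01", periods=104, freq="W")
price = pd.Series(20 + np.cumsum(rng.normal(0, 0.8, 104)), index=dates)  # placeholder prices

model = ARIMA(price, order=(1, 1, 1)).fit()   # illustrative order, not the study's
forecast = model.forecast(steps=8)            # next 8 weeks of price
print(forecast.round(2))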

KEYWORDS

Anomaly, commodity, Forecasting, Machine learning, Timeseries, Auto ARIMA, RNN, LSTM, VAR, Random forest Regression and XGBoost Regression.


Multi-format Document Verification System

Madura Rajapakshe, Muammar Arif, Dasith Gunaratne, Ashen Shaluka, Kavinga Yapa Abeywardana, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka

ABSTRACT

The spread of fake documents claiming to be from official sources on social media has led to increased levels of skepticism and uncertainty in modern society. Currently, there is no easily accessible method of verification for documents that can be adopted by the public. This paper proposes a multi-format document verification scheme using digital signatures and blockchain. We employ digital signature algorithms to sign document contents extracted using Optical Character Recognition (OCR) methods and attach this signature to the document by converting it into a 2D barcode format. On a shared document, this code can then be used to retrieve the document's digital signature, and OCR can be used to verify the signature. In addition to this, we also provide an alternative method of verification in the form of forgery detection techniques. These signed documents can be stored in a decentralized storage solution backed by blockchain technology, increasing the overall reliability and security of the solution.

KEYWORDS

Digital Signatures, Content Extraction, Image Processing, Blockchain, Decentralized Storage, Forgery Detection, 2D Barcodes.


Arcsecure: Centralized Hub for Securing a Network of IoT Devices

Kavinga Yapa Abeywardena, A.M. Isuru Srimal Abeykoon, A.M.S.P.B. Atapattu, H.N. Jayawardhane, C.N. Samarasekara, Information System Engineering, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka

ABSTRACT

With respect to current trends in information technology, the Internet of Things (IoT) has played a prominent role in the technological advancements of the last few years. In the current context, the major issue that users face is the threat to the information stored in these devices. Modern-day attackers are aware of vulnerabilities in the current IoT environment. Therefore, preventing information from falling into the hands of unauthorized parties is of the highest priority for IoT users. With the need to secure information came the need to protect the devices on which the data is stored. Small Office/Home Office (SOHO) environments working with IoT devices are particularly in need of such mechanisms to protect the data and information that they hold in order to sustain their operations. Hence, to come up with a well-rounded security mechanism covering every possible aspect, this research proposes a plug-and-play device called "ARCSECURE".

KEYWORDS

Internet of Things, Information Security, Machine Learning, DoS, DDoS, Botnet, Authentication, Authorization, Detection, Mitigation, Malware.


Energy Aware Routing with Computational Offloading for Wireless Sensor Networks

Adam Barker and Martin Swany, Department of Intelligent Systems Engineering, Indiana University, Bloomington, Indiana, USA

ABSTRACT

Wireless sensor networks (WSN) are characterized by a network of small, battery-powered devices operating remotely with no pre-existing infrastructure. The unique structure of WSNs allows for novel approaches to data reduction and energy preservation. This paper presents a concept to assist in the management of these factors by re-examining existing routing concepts and improving upon them for the unique use case of peer-to-peer mesh WSNs. This paper presents a modification to an existing routing protocol which provides an alternate action of performing sensor data reduction in place. When network congestion increases, data reduction becomes more favorable, thus reducing the bandwidth used while still optimizing the throughput time. The algorithm is further modified to include an energy factor which increases the cost of forwarding as energy reserves deplete. Experimental results show that this approach can, in periods of high network traffic, both reduce the bandwidth used and maintain low data transition times.
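
A sketch, under assumptions rather than the paper's exact formulation, of a forwarding cost that grows as a node's energy reserve depletes, so that reducing data in place becomes relatively cheaper under congestion and low energy.

# Illustrative energy-aware forwarding cost; weights and inputs are hypothetical.
def forwarding_cost(base_delay, queue_length, energy_remaining,
                    congestion_weight=0.5, energy_weight=2.0):
    """queue_length and energy_remaining are normalised to [0, 1]; base_delay in seconds."""
    energy_penalty = energy_weight * (1.0 - energy_remaining)
    return base_delay * (1.0 + congestion_weight * queue_length + energy_penalty)

def choose_action(forward_cost, reduce_cost):
    """Forward raw data, or reduce it in place when forwarding is dearer."""
    return "forward" if forward_cost <= reduce_cost else "reduce_in_place"

cost = forwarding_cost(base_delay=0.02, queue_length=0.8, energy_remaining=0.3)
print(cost, choose_action(cost, reduce_cost=0.04))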

KEYWORDS

Q-routing, Wireless Sensor Network, Computational Offloading, Energy Aware.


AI-based Prediction for Early Detection of Tuberculosis in India Based on Environmental Factors

Dr. Mrs. Nupur Giri, Mr. Richard Joseph, Ms. Sanika Chavan, Mr. Raghav Heda, Ms. Reema Israni, Ms. Ritika Sethiya, Dept. Of Computer Technology, V.E.S.I.T, Mumbai, India

ABSTRACT

Machine Learning and Deep Learning can play an essential role in determining the spread of diseases. The proposed system aims at predicting the spread of Tuberculosis by understanding the impact of various climatic and pollution parameters on the disease. The proposed solution takes into consideration the information related to Tuberculosis in different districts of India, and the climatic and pollution parameters for those regions. This information is then used to understand the sustainability conditions of Tuberculosis and the correlation of different environmental factors with the number of cases of Tuberculosis. This can then help in the prediction of the spread of the disease. The system will also provide visualizations depicting the spread pattern of Tuberculosis in the different regions affected in the past and the regions which may get affected in the near future.

KEYWORDS

Air quality, Climate change, Geospatial Visualisations, Supervised models, TB incidence.


Deep Learning Roles Based Approach to Link Prediction in Networks

Aman Gupta and Yadul Raghav, Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India

ABSTRACT

The problem of predicting links has gained a lot of attention in recent years due to its vast application in various domains such as sociology, network analysis, information science, etc. Many methods have been proposed for link prediction, such as RA, AA, CCLP, and so on. All these methods require hand-crafted structural features to calculate the similarity scores between a pair of nodes in a network. Some methods use local structural information while others use global information of a graph. These methods do not tell which properties are better than others. From a deep analysis of these methods, we understand that one way to overcome this problem is to consider both the network structure and node attribute information to capture the discriminative features for the link prediction task. We propose a deep learning architecture (an autoencoder) for the latent representation of a graph, unified with non-negative matrix factorization to automatically determine the underlying roles in a network, then assign a mixed membership of these roles to each node in the network. The idea is to use these roles as a feature vector for the link prediction task in the network. Further, cosine similarity is applied to the obtained features to compute a pairwise similarity score between nodes. We present the performance of the algorithm on real-world datasets, where it gives competitive results compared to other algorithms.
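
A rough sketch of the final scoring step: given per-node role vectors (obtained here with plain NMF on the adjacency matrix rather than the paper's autoencoder), a candidate link is scored by the cosine similarity of the two nodes' role memberships. The graph is random and purely illustrative.

# Sketch of role-based link scoring; the role extraction here (plain NMF)
# stands in for the paper's autoencoder + NMF pipeline.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
A = (rng.random((30, 30)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T                      # symmetric adjacency matrix

roles = NMF(n_components=4, init="nndsvda", max_iter=500,
            random_state=0).fit_transform(A)        # mixed role memberships per node

def link_score(u, v):
    """Cosine similarity of the two nodes' role vectors."""
    return cosine_similarity(roles[u:u + 1], roles[v:v + 1])[0, 0]

print("score(0, 5):", round(link_score(0, 5), 3))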

KEYWORDS

Link Prediction, Deep Learning, Autoencoder, Latent Representation, Non-Negative Matrix Factorization.


Time Series Classification with Meta Learning

Aman Gupta and Yadul Raghav, Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India 221-005

ABSTRACT

Meta-learning, the ability of learning to learn, helps train a model to learn very quickly on a variety of learning tasks, adapting to any new environment with a minimal number of examples, which allows us to speed up the performance and training of the model. It addresses the problem of the traditional machine learning paradigm, where a vast dataset is needed to train a model from scratch for any task. Much work has already been done on meta-learning in a variety of learning environments, including reinforcement learning, regression tasks, and classification tasks with image and other datasets, but it is yet to be explored in the time-series domain. In this work, we aim to understand the effectiveness of meta-learning algorithms in the time series classification task with multivariate time-series datasets. We present the performance of the algorithm on the UCR time series archive, where the results show that using meta-learning algorithms leads to faster convergence with fewer iterations than the non-meta-learning equivalent.

KEYWORDS

Time Series, Classification, Meta Learning, Few Shot Learning, Convolutional Neural Network.


Data Driven Soft Sensor for Condition Monitoring of Sample Handling System (SHS)

Abhilash Pani, Jinendra Gugaliya and Mekapati Srinivas, Industrial Automation Technology Centre, ABB, Bangalore, India

ABSTRACT

A gas sample is conditioned using the sample handling system (SHS) to remove particulate matter and moisture content before it is sent through Continuous Emission Monitoring (CEM) devices. The performance of the SHS plays a crucial role in the reliable operation of CEMs, and therefore sensor-based condition monitoring systems (CMSs) have been developed for SHSs. As sensor failures impact the performance of CMSs, a data-driven soft-sensor approach is proposed to improve the robustness of CMSs in the presence of a single sensor failure. The proposed approach uses the data of the available sensors to estimate the true value of a faulty sensor, which can be further utilized by CMSs. The proposed approach compares multiple methods and uses support vector regression for the development of soft sensors. The paper also considers practical challenges in building those models. Further, the proposed approach is tested on industrial data and the results show that the soft sensor values closely match the actual ones.
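
An illustrative sketch of the soft-sensor idea with synthetic data: when one SHS sensor fails, its value is estimated from the remaining sensors with support vector regression. Sensor names and numbers are placeholders, not the industrial data used in the paper.

# Sketch of a support-vector-regression soft sensor on synthetic signals.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 500
flow = rng.normal(10, 1, n)                    # placeholder healthy sensor 1
temperature = rng.normal(40, 2, n)             # placeholder healthy sensor 2
pressure = 0.8 * flow + 0.1 * temperature + rng.normal(0, 0.2, n)  # "faulty" sensor

X = np.column_stack([flow, temperature])       # inputs: the available sensors
soft_sensor = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
soft_sensor.fit(X[:400], pressure[:400])       # train on historical healthy data

estimate = soft_sensor.predict(X[400:])        # estimate the faulty sensor's value
print("mean absolute error:", round(np.mean(np.abs(estimate - pressure[400:])), 3))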

KEYWORDS

Sample Handling System, Soft-Sensor, Variance Inflation Factor (VIF), Local Outlier Factor (LOF), Support Vector Regression.


Multi-hybrid method for music genre classification via textural, acoustic and topological measures of complex networks

Andrés Eduardo Coca Salazar, Federal Technological University of Paraná (UTFPR), Brazil

ABSTRACT

One of the main collective characteristics that identify a musical work is the genre, so this attribute is the most used to organize musical databases. The musical genre is determined by the author at the time of composition; however, this label is not always available, and, in addition, its identification is not a simple and direct task. Different researchers have therefore approached this problem from different perspectives. In this paper we propose a method for musical genre classification using a multi-hybrid feature strategy, GLCM networks and two levels of hierarchical mining (macro- and micro-mining). The multi-hybrid features are formed by acoustic characteristics (MFCC, NPCC and log energy) calculated from the mel-spectrogram, texture features of its visual representation (the mel-spectrogram) and topological measures of complex networks. The spectrogram was segmented into superpixels and the image texture was represented with the GLCM descriptor. From the two previous representations three types of complex networks were generated: 1) the GLCM network Gg, 2) the superpixels network Gs, and 3) the GLCM network of each node of Gs (denoted as the Gg^si network). The measures of the GLCM descriptor were adapted to be calculated directly from the networks Gg and Gg^si; in addition, conventional topological measures were also calculated for these networks. In the classification step, the Bagging ensemble approach using the Random Forests algorithm was used. Several experiments were performed using combinations of features and macro-mining (global features of Gg and Gs) and micro-mining (local features of Gg^si). The accuracy obtained using macro- and micro-mining together was > 90%, which reveals an excellent performance. The results obtained in all experiments indicate that the proposed method has a satisfactory performance.

KEYWORDS

Pattern Recognition, Complex Network Applications, Image and Signal Processing.


Face Recognition using PCA Integrated with Delaunay Triangulation

Kavan Adeshara and Vinayak Elangovan, Division of Science and Engineering, Penn State Abington, PA, USA

ABSTRACT

Face recognition is a widely used form of biometric user authentication that identifies a user based on his or her facial features. The system is in high demand, as it is used by many businesses and employed in many devices such as smartphones and surveillance cameras. However, one frequent problem still observed in this user-verification method is its accuracy rate. Numerous approaches and algorithms have been tried to improve this flaw of the system. This research develops one such algorithm that utilizes a combination of two different approaches. Using concepts from linear algebra and computational geometry, the research examines the integration of Principal Component Analysis with Delaunay Triangulation; the method triangulates a set of face landmark points and obtains eigenfaces of the provided images. It compares the algorithm with traditional PCA and discusses the inclusion of different face landmark points to deliver an effective recognition rate.
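
A sketch of the two building blocks named above, Delaunay triangulation of landmark points (SciPy) and eigenfaces via PCA (scikit-learn); how the paper fuses them is not reproduced, and the landmark and image data are random placeholders.

# Sketch of Delaunay triangulation of face landmarks and PCA eigenfaces.
import numpy as np
from scipy.spatial import Delaunay
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

landmarks = rng.uniform(0, 100, size=(68, 2))   # placeholder 68 face landmark points
tri = Delaunay(landmarks)
print("triangles:", len(tri.simplices))

faces = rng.random((40, 64 * 64))               # placeholder flattened 64x64 face images
pca = PCA(n_components=10).fit(faces)
eigenfaces = pca.components_.reshape(10, 64, 64)
projections = pca.transform(faces)              # features that could be used for matching
print("eigenfaces:", eigenfaces.shape, "projections:", projections.shape)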

KEYWORDS

Delaunay Triangulation, PCA, Face Recognition.


Obstacle Avoidance and Path Finding for Mobile Robot Navigation

Poojith Kotikalapudi and Vinayak Elangovan, Division of Science and Engineering, Penn State Abington, PA, USA

ABSTRACT

This paper investigates different methods to detect obstacles ahead of a robot using a camera on the robot, an aerial camera, and an ultrasound sensor. We also explored various efficient path finding methods for the robot to navigate to the target source. Single and multi-iteration angle-based navigation algorithms were developed. The theta-based path finding algorithms were compared with Dijkstra's algorithm and their performance was analyzed.
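
A minimal sketch of the baseline used for comparison, Dijkstra's algorithm, on a small occupancy grid; the angle-based planners developed in the paper are not reproduced here.

# Dijkstra's algorithm on a grid with obstacles (illustrative baseline only).
import heapq

def dijkstra(grid, start, goal):
    """grid[r][c] == 1 marks an obstacle; returns shortest path length or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    pq = [(0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float("inf")):
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return None

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(dijkstra(grid, start=(0, 0), goal=(3, 3)))   # prints 6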

KEYWORDS

Image Processing, Path Finding, Obstacle Avoidance, Machine Learning, Robot Navigation.


Generating Adversarial Example based on Subarea Noise Texture for Efficient Black-box Attacks

Zhijian Chen and Jing Liu, National Pilot School of Software, Yunnan University, Yunnan, China

ABSTRACT

Nowadays, machine learning algorithms play a vital role in the field of artificial intelligence. However, it has been proved that deep convolutional networks (DCNs) are vulnerable to interference from adversarial examples. In this paper, we used continuous noise to simulate natural texture for generating adversarial examples, which are used to carry out different forms of attacks against mainstream target detection tasks. Experimental results show that this method can achieve a deception rate of up to 90% on the currently prevalent target detection models (YOLOv3/Inception-v3). This confirms that DCNs trained on the ImageNet dataset rely too much on the feature aggregation of low-level regions with low robustness in the classification task. It suggests that we need to consider not only the pursuit of accuracy but also the nature of the model's feature learning while using DCNs.

KEYWORDS

Adversarial examples, Computer Vision noise, black-box attacks, deep neural networks, Subarea noise texture.


Phone Clustering Methods for Multilingual Language Identification

Ronny Mabokela, Technopreneurship Centre, School of Consumer Intelligence and Information Systems, Department of Applied Information Systems, University of Johannesburg, Johannesburg, South Africa

ABSTRACT

This paper proposes phoneme clustering methods for multilingual language identification (LID) on a mixed-language corpus. A one-pass multilingual ASR converts the spoken utterances into occurrences of phone sequences. We employ hidden Markov models to train multilingual acoustic models that handle multiple languages within an utterance. We explore two phoneme clustering methods to derive the most appropriate phoneme similarities among the target languages. We ultimately employ a supervised machine learning technique to learn the language transitions of the phonotactic information and engage support vector machine (SVM) models to classify phoneme occurrences. The system performance was evaluated on a mixed-language speech corpus for two South African languages (i.e. Sepedi and English). We evaluated the system performance using the phone error rate (PER) and the LID classification accuracy separately. We also show that the multilingual ASR which feeds directly into the LID system has a direct impact on the LID accuracy. Our proposed system has achieved acceptable phone recognition and classification accuracy on mixed-language speech and monolingual speech (i.e. either Sepedi or English). Lastly, data-driven and knowledge-driven phoneme clustering methods improve ASR and LID for code-switched speech.

KEYWORDS

Code-switching, Phone Clustering, Multilingual Speech Recognition, Mixed-Language, Language Identification.


Multi-Core Aware Virtual Machine Placement for Cloud Data Centers with Constraint Programming

Nagadevi and Kasmir Raja, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India

ABSTRACT

Creation of a Virtual Machine (VM) on a suitable Physical Machine (PM) is a critical requirement for a Cloud Service Provider in a cloud data center. Mapping a VM to an appropriate PM is called VM Placement (VMP). The VMP decision is required at different stages of a cloud data center. Finding the best PM on which to place/create a VM is popularly known as the Virtual Machine Placement Problem (VMPP) [5]. Efficient VMP techniques improve resource utilization, power consumption and the number of VM migrations [6] in a cloud data center. Due to the significant and intrinsic difficulty of the problem, many models and algorithms have been proposed for the VMPP. A VMP algorithm is executed by considering resources such as the CPU, RAM and disk of VMs and PMs. Many researchers have designed VMP algorithms [8-11] by considering the CPU capacity of a PM without considering the number of physical cores (pCPUs) available in the PM and the capacity of each physical core, i.e. VMs are mapped onto PMs if and only if the sum of their CPU capacities does not exceed the CPU capacity of the PM. Such allocation results in pCPU overload, which leads to performance degradation and violation of the Service Level Agreement. In the real scenario, however, PMs and VMs consist of multiple cores, so to place a VM on a PM, the CPU capacity of the VM must be mapped to the pCPU capacity of the PM, i.e. each virtual core (vCPU) of a VM should be mapped to a pCPU of the PM. Thus, VM-to-PM mapping must be done based on core-to-core mapping and not on machine-to-machine mapping. Also, VMP using only the total computational capacity of the PM increases the number of core overloads and thus host overloads, and more host overloads lead to more VM migrations. We have therefore designed a VMP algorithm which eradicates all of the above-mentioned problems of core overload, low resource utilization, frequent host overloads and frequent VM migrations. In our proposed work, we consider a multi-core aware VMP by taking into account the core capacities of the VM and the PM. In multi-core aware VMP, the CPU capacity of a VM is checked against the pCPU capacity of the PM, so that the pCPUs of the PM do not get overloaded. Thus, the overall performance is improved compared to non-core-aware virtual machine placement algorithms.
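
As a simplified illustration of the core-to-core argument above (not the paper's constraint-programming formulation), the following sketch contrasts a total-capacity check with a greedy core-aware feasibility check; all capacity numbers are hypothetical.

# Core-aware vs. total-capacity placement check; numbers are illustrative.
def fits_core_aware(vcpu_demands, pcore_free):
    """Greedy first-fit-decreasing mapping of vCPU demands onto free pCPU capacity."""
    free = sorted(pcore_free, reverse=True)
    for demand in sorted(vcpu_demands, reverse=True):
        for i, cap in enumerate(free):
            if demand <= cap:
                free[i] -= demand
                break
        else:
            return False      # some vCPU cannot be placed without overloading a core
    return True

vm = [0.9, 0.9]               # two vCPUs, each needing 0.9 of a core
pm = [1.0, 0.5, 0.5]          # three physical cores with this much free capacity

print("total-capacity check:", sum(vm) <= sum(pm))    # True, but misleading
print("core-aware check:", fits_core_aware(vm, pm))   # False: a core would be overloaded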

KEYWORDS

Multi-core, Virtual Machine Placement, Data Center, Constraint Programming.

