6th International Conference on Data Mining and Applications (DMA 2020)

January 25 ~ 26, 2020, Zurich, Switzerland

Accepted Papers


Building the First Arabic Dataset for Sentiment Analysis in the Syrian Dialect from the Facebook Platform

Nasser Nasser and Ali Arous, Department of Software Engineering, Tishreen University, Latakia, Syria

ABSTRACT

Despite not being as competitive as its English counterpart, Sentiment Analysis in Arabic has witnessed a surge of progress in the past few years. However, most of the resources in this area are still either limited in size, domain-specific, or not publicly available. In this paper, we address the sparsity of available resources for different dialects of Arabic by generating a multi-domain dataset for Sentiment Analysis dedicated to the Syrian Levantine dialect. The dataset was gathered from public Facebook content and consists of 10,000 annotated comments collected from posts across different domains, including Education, Sport, Services, Technology and Culture. We have carried out a set of experiments to validate the usefulness of our dataset, in addition to performing feature engineering for the top classifiers. From the experimental results, we highlight useful insights regarding the best performing classifiers and the most viable features.
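
To illustrate the kind of validation experiment described above, here is a minimal sketch (not the authors' code): TF-IDF character n-grams, which tend to work well for dialectal Arabic, feeding two baseline classifiers. The comments and labels are hypothetical placeholders for the real dataset.

```python
# A minimal sketch, assuming a list of (comment, label) pairs like the
# dataset described above; the examples below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

comments = ["الخدمة ممتازة", "تجربة سيئة جدا"]   # placeholder Syrian-dialect comments
labels = ["positive", "negative"]                 # placeholder annotations

for clf in (LogisticRegression(max_iter=1000), LinearSVC()):
    # Character n-grams capture subword patterns across spelling variants.
    pipe = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        clf,
    )
    pipe.fit(comments, labels)
    print(type(clf).__name__, pipe.predict(["خدمة رائعة"]))
```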

KEYWORDS

Arabic Text Mining, Sentiment Analysis, Opinion Mining, Modern Standard Arabic, Dialectal Arabic


Accounting narrative obfuscation in financial statements

Jörg Hering, Jens Hölscher and Phyllis Alexander, Department of Accounting, Finance and Economics, Bournemouth University, 89 Holdenhurst Road, Bournemouth BH8 8EB, United Kingdom

ABSTRACT

The study examines the presence and success of accounting narrative obfuscation in financial statements filed with the United States Securities and Exchange Commission (SEC). Based on more than 50,000 "Footnotes" sections in annual reports on Form 10-K submitted between 1993 and 2016, the study finds that company officials are not able to "bury" negative corporate information in financial statements. Using textual sentiment analysis, the study provides evidence that capital market participants are well aware of the information content disclosed in the "Footnotes" sections of annual reports. Measuring "Key Word Density" (disclosure tone) in the notes to the financial statements ("Item 8"), the study reveals that investors react to changes in textual characteristics and adjust their market expectations accordingly. In addition, it is shown that investors react to changes in this subsection of the annual report much more strongly and in a timelier fashion than to changes in the entire Form 10-K filing. Furthermore, the results indicate that company officials report truthful information in the "Footnotes" sections of annual reports, representing accurate corporate disclosures.
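
As an illustration of the "Key Word Density" idea, a minimal sketch (not the study's code) of a tone-word ratio over a footnotes text; the tiny word list is a hypothetical stand-in for the financial sentiment lexica such a study would draw on.

```python
# A minimal sketch: share of tone words among all words in a text.
import re

NEGATIVE_WORDS = {"loss", "impairment", "litigation", "default"}  # hypothetical subset

def keyword_density(text: str, word_list: set) -> float:
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    # Density = tone words / total words, i.e. the disclosure-tone ratio.
    return sum(t in word_list for t in tokens) / len(tokens)

footnotes = "The company recognized an impairment loss related to pending litigation."
print(f"negative-word density: {keyword_density(footnotes, NEGATIVE_WORDS):.3f}")
```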


FASE-AL: Adaptation of Fast Adaptive Stacking of Ensembles for Supporting Active Learning

Agustín Alejandro Ortiz-Díaz1, Fabiano Baldo1, Laura María Palomino Mariño2 and Alberto Verdecia Cabrera3, 1Santa Catarina State University, Joinville, Santa Catarina, Brazil, 2Pernambuco Federal University, Recife, Pernambuco, Brazil and 3Granma University, Manzanillo, Granma, Cuba

ABSTRACT

Classification algorithms for mining data streams have been extensively studied in recent years. However, many of these algorithms are designed for supervised learning, which requires labeled instances, and labeling data is costly and time-consuming. Because of this, alternative learning paradigms have been proposed to reduce the cost of the labeling process without significant loss of model performance. Active learning is one of these paradigms; its main objective is to build classification models that request the lowest possible number of labeled examples while achieving adequate levels of accuracy. This work therefore presents the FASE-AL algorithm, which induces classification models from streams with unlabeled instances using Active Learning. FASE-AL is based on the Fast Adaptive Stacking of Ensembles (FASE) algorithm. FASE is an ensemble algorithm that detects concept drift in the input data stream and adapts the model accordingly. FASE-AL was compared with four different active learning strategies found in the literature. Real and synthetic databases were used in the experiments. The algorithm achieves promising results in terms of the percentage of correctly classified instances.
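
For context, a minimal sketch (not FASE-AL itself) of the generic uncertainty-based active learning loop that stream strategies like those compared in the paper build on: the model asks for labels only on low-confidence instances. The data, threshold and budget are hypothetical.

```python
# A minimal sketch of uncertainty sampling on a synthetic stream.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)            # synthetic stream labels

model = SGDClassifier(loss="log_loss")             # log loss enables predict_proba
model.partial_fit(X[:10], y[:10], classes=[0, 1])  # small labeled seed

budget, threshold = 50, 0.65
for i in range(10, len(X)):
    proba = model.predict_proba(X[i:i + 1])[0]
    if proba.max() < threshold and budget > 0:     # uncertain: query the oracle
        model.partial_fit(X[i:i + 1], y[i:i + 1])  # oracle provides y[i]
        budget -= 1
print("remaining label budget:", budget)
```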

KEYWORDS

Ensemble, active learning, data stream, concept drift


Sentiment Classification for Under-Resourced Language Using Word2Vec Neural Network: Amharic Language Social Media Text

1Zewdie Mossie and 2Jenq-Haur Wang, 1Department of Computer Science and Information Engineering, National Taipei University of Technology, Taiwan and 2Department of International Graduate Program in Electrical Engineering and Computer Science, National Taipei University of Technology, Taipei, Taiwan

ABSTRACT

Sentiment classification has become a popular task for social network texts, which express opinions on different issues, in order to analyze them and produce useful knowledge. However, many linguistic computational resources are available only for the English language. In recent years, due to the emergence of social media platforms, opinion-rich resources have become abundant for under-resourced languages, with a corresponding need to perform Sentiment Analysis. On the other hand, most existing research focuses on how to extract effective features, such as lexical and syntactic features, while limited work has been done on semantic features, which can make greater contributions to both under-resourced and resourceful languages. In this paper, we propose sentiment classification based on Word2Vec for Amharic-language text in the political domain. Word2Vec builds neural network models to learn vector representations of words that capture deep semantic relationships. Firstly, we cluster similar features together and apply n-gram language modeling to check sentiment-bearing Co-occurring Terms (COT). Word2Vec and TF-IDF were used to learn the word representations as candidate feature vectors. Secondly, Gradient-Boosting Tree (GBT) and Random Forest machine learning classifiers were used for training and testing on the Apache Spark platform. In our experiments, we use the Amharic language of Ethiopia and adopt standard natural language pre-processing techniques on the crawled Facebook datasets to categorize posts into positive and negative opinions. Experimental results show that feature extraction using the Word2Vec technique performs better with the GBT classifier, achieving an average accuracy of 82.29%. Therefore, our proposed approach can successfully discriminate between posts and comments expressing positive and negative opinions.
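
A minimal sketch (not the paper's pipeline) of learning word vectors with gensim's Word2Vec and inspecting nearest neighbours; the tokenized corpus below is a hypothetical stand-in for the crawled Amharic Facebook posts.

```python
# A minimal sketch of skip-gram Word2Vec training on placeholder tokens.
from gensim.models import Word2Vec

corpus = [
    ["good", "government", "policy"],      # tokenized posts (placeholder tokens)
    ["bad", "government", "decision"],
]
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv.most_similar("government", topn=2))  # semantically close terms
```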

KEYWORDS

Amharic text Sentiment, Word2Vec Semantic, Social Media, Under-resourced Language


Ontological Approach for Knowledge Extraction from Clinical Documents

Raxit Goswami and Vatsal Shah, Research Department, ezDI Inc., Kentucky, USA

ABSTRACT

In clinical NLP (Natural Language Processing), knowledge extraction is a very important task for developing highly accurate information retrieval systems. The approaches used to develop such systems include the rule-based approach, the statistical approach, shortest path algorithms, or hybrids of these. Accuracy and coverage are the most important parameters when comparing different approaches: some methodologies have good accuracy but low coverage, and vice versa. In this paper, our focus is on extracting domain relationships, for example the relationship between ‘Disease’ and ‘Procedure’ or ‘Symptom’ and ‘Disease’, from clinical documents using three different approaches: i) Statistical, ii) Shortest Path, and iii) Shortest Path Using Body System. All three approaches use our existing NLP system to extract entities from unstructured documents. The Statistical approach applies a probabilistic algorithm to clinical documents, whereas the Shortest Path algorithm uses an ontological knowledge base for the hierarchical relationships between entities. This ontological knowledge base is built upon the curated Unified Medical Language System (UMLS). For the Shortest Path Using Body System approach, we use domain relationships as well as hierarchical relationships. The output of these approaches is further validated by a domain expert, and the validated relationships are used to enrich our ontological knowledge base. We present the details of these approaches one by one, along with their comparative results. We finally analyze the results and conclude with directions for further work.
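
In the spirit of the Shortest Path approach, a minimal sketch (not ezDI's system): scoring a candidate relationship between two extracted entities by their shortest path in a small hypothetical is-a hierarchy.

```python
# A minimal sketch: graph distance over a toy ontology as relatedness.
import networkx as nx

onto = nx.Graph()
onto.add_edges_from([
    ("pneumonia", "lung disease"), ("lung disease", "disease"),
    ("chest x-ray", "imaging procedure"), ("imaging procedure", "procedure"),
    ("disease", "clinical finding"), ("procedure", "clinical finding"),
])

path = nx.shortest_path(onto, "pneumonia", "chest x-ray")
print(path, "distance:", len(path) - 1)  # a shorter path suggests a stronger link
```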

KEYWORDS

Knowledge Extraction, Clinical Information Retrieval, Relationship Extraction, Clinical Document, Medical Knowledge Base, Ontology, Clinical NLP (Natural Language Processing)


Transfer Learning for Recognition of Surgical Workflow

Baolian Qi1,3, Kunhua Zhong1,2,3 and Yuwen Chen1,2,3, 1Chengdu Computing Institute of the Chinese Academy of Sciences, Chengdu, China, 2Chongqing Institute of Green and Intelligent Technology, Chongqing, China and 3University of Chinese Academy of Sciences, Beijing, China

ABSTRACT

Computer-assisted surgery occupies an important position in modern surgery, further stimulating the progress of methodology and technology. In recent years, a large number of computer vision-based methods have been widely used in surgical workflow recognition tasks. Training such methods requires a lot of annotated data. However, the annotation of surgical data requires expert knowledge and is therefore difficult and time-consuming. In this paper, we focus on the problem of data deficiency and propose a knowledge transfer learning method to compensate for the small amount of labeled training data. We propose an unsupervised method for pre-training a Convolutional De-Convolutional (CDC) network for sequencing surgical workflow frames, which performs convolution in space (for semantic abstraction) and de-convolution in time (for frame-level resolution) simultaneously. Specifically, through transfer learning, we only fine-tuned the Convolutional De-Convolutional network to classify the surgical phase. We performed experiments to validate the model, showing that it can effectively extract surgical features and determine the surgical phase. The accuracy, recall, and precision of our model reach 91.4%, 78.9%, and 82.5%, respectively.
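
A minimal sketch (not the CDC network itself) of the generic transfer learning pattern the paper relies on: freeze a pre-trained backbone and fine-tune only a new classification head for the surgical phases. The backbone choice and phase count are hypothetical.

```python
# A minimal sketch of fine-tuning a frozen backbone for phase classification.
import tensorflow as tf

N_PHASES = 7  # hypothetical number of surgical phases

backbone = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                          input_shape=(224, 224, 3))
backbone.trainable = False                       # keep pre-trained features fixed

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(N_PHASES, activation="softmax"),  # new phase classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```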

KEYWORDS

Convolutional De-Convolutional (CDC), transfer learning, surgical phase


Deep Image Compositing

Shivangi Aneja and Soham Mazumder, Technical University of Munich, Germany

ABSTRACT

In image editing, the most common task is pasting an object from one image onto another and then adjusting the appearance of the foreground object to match the background. This task is called image compositing. Image compositing is a challenging problem that requires professional editing skills and a considerable amount of time. Not only are these professionals expensive to hire, but the tools used for such tasks (like Adobe Photoshop [1]) are also expensive to purchase, making image compositing difficult for people without this skillset. In this work we aim to address this problem by making composite images look realistic. To achieve this, we use GANs [3]. By training the network with a diverse range of filters applied to the images and special loss functions, the model is able to decode the color histograms of the foreground and background parts of the image and learns to blend the foreground object with the background. The hue and saturation values of the image play an important role, as discussed in this paper. To the best of our knowledge, this is the first work that uses GANs for the task of image compositing. Currently, there is no benchmark dataset available for image compositing, so we created one and will make it publicly available for benchmarking. Experimental results on this dataset show that our method outperforms all current state-of-the-art methods.


A Comprehensive Survey on TCP Congestion and INCAST Solutions in Data Center Networks

Houda Amari1, Lyes Khoukhi2 and Lamia Hadrich Belguith3, 1Department of Computing Science, University of Sfax, Sfax, Tunisia, 2ERA Lab, University of Technology of Troyes, Paris, France and 3MIRACL Lab, University of Sfax, Sfax, Tunisia

ABSTRACT

In recent years, Data Centers have been considered the backbone of Cloud Computing systems and are meant to deliver large-scale cloud-based services. The Transmission Control Protocol (TCP) is the most widely used protocol in DCNs, where congestion occurs when multiple senders send data to one receiver at the same time; this phenomenon is known as TCP incast. In this paper, we survey the most recent solutions to the TCP incast problem in DCNs and provide a comprehensive comparison between them according to a set of criteria, discussing their strengths and weaknesses. Finally, we outline some challenges in mitigating TCP incast in DCNs.

KEYWORDS

Data Center Networks (DCNs), TCP Congestion, TCP incast in DCNs


Lekana - Blockchain Based Archive Storage

Eranga Bandara1, Wee Keong Ng2, Nalin Ranasinghe3, Kasun De Zoysa3, Bard Langoy1 and David Larsson1, 1Pagero AB, Gothenburg, Sweden, 2School of Computer Science and Engineering, Nanyang Technological University, Singapore and 3University of Colombo School of Computing, Sri Lanka

ABSTRACT

Blockchain is a form of distributed storage system that stores a chronological sequence of transactions in a tamper-evident manner. Due to the decentralized trust ecosystem of blockchain, various industries have adopted it to build their applications. This paper presents a novel approach to building a blockchain-based document archive storage platform, "Lekana". The Lekana platform was built for "Pageroonline", a cloud-based e-invoicing provider in Europe. Pageroonline's archive document information, archive document payloads and their hash-chain information are stored in the blockchain-based Lekana platform. The Lekana platform is built on top of Mystiko, a highly scalable blockchain storage system targeted at big data. Mystiko comes with Aplos, a concurrent smart contract platform based on Scala functional programming and Akka actors. All document archiving business logic is implemented with Aplos smart contracts on the Mystiko platform. By integrating the Lekana platform with blockchain we address the major issues of cloud-based centralized storage platforms (e.g. centralized control, lack of immutability, lack of traceability, and lack of data provenance). Since the Mystiko blockchain is targeted at high transaction throughput and big data environments, we were able to align the Lekana platform with the high transaction load in Pageroonline. The Mystiko blockchain comes with Mystiko-Ml, an Apache Spark-based machine learning service. We have integrated real-time data analytics and machine learning into the Lekana platform by using Mystiko-Ml.
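
A minimal sketch (not Lekana's implementation) of the hash-chain idea behind tamper-evident archive storage: each record embeds the hash of its predecessor, so any later modification breaks the chain. The payloads are hypothetical.

```python
# A minimal sketch of a tamper-evident hash chain over archive records.
import hashlib
import json

def add_record(chain, payload):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"payload": payload, "prev": prev_hash}
    # Hash the record contents together with the previous record's hash.
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)

chain = []
add_record(chain, "invoice-001.pdf metadata")   # hypothetical archive entries
add_record(chain, "invoice-002.pdf metadata")
print(chain[1]["prev"] == chain[0]["hash"])     # True: records are linked
```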


Innocrowd, A Contribution to an IoT Based Engineering Product Development

Camille Salinesi1, Clotilde Rohleder2, Asmaa Achtaich1,3, Indra Kusumah1,2, 1CRI, Paris 1 Sorbonne University, Paris, France, 2University of Applied Sciences HTWG Konstanz, Germany and 3Siweb, Université Mohammed 5, Rabat, Morocco

ABSTRACT

System engineering is an approach that focuses on the realization of complex systems, from design all the way to management. Meanwhile, in the era of Industry 4.0 and the Internet of Things, systems are getting more and more complex. This complexity is, among other things, related to the usage of smart subsystems (e.g. smart objects, new communication protocols, etc.) and new engineering product development processes (e.g. through Open Innovation). These two aspects, namely IoT-related subsystems and the product development process, are the main topics of our research work. The creation of smart objects such as innovative fleets of connected devices is a compelling case. Fleets of devices in smart buildings, smart cars or smart consumer products (e.g. cameras, sensors, etc.) are confronted with complex, dynamic, rapidly changing and resource-constrained environments. In order to align with these context fluctuations, we develop a framework representing the dimensions for building self-adaptive fleets for IoT applications. The emerging product development process Open Innovation is proven to be three times faster and ten times cheaper than the conventional one. However, it is relatively new to industry, and therefore many aspects are not clearly known, from the definition of specific product requirements, through the design and engineering process (task assignment), to quality assurance, time and cost. Consequently, the acceptance of this new approach in industry is still limited. Research activities mainly deal with high-level, qualitative aspects, whereas methods that supply more transparent numbers remain scarce. The project-related risks are therefore unclear, and Go/No-Go decisions become inconclusive. The paper contributes ideas for handling the issues mentioned above by proposing a new integrated method we call InnoCrowd. This approach, presented in this paper from the perspective of IoT, can be used as a basis for the establishment of a related decision support system.

KEYWORDS

Industry 4.0, Internet of Things, Crowdsourcing, Neural Network, Decision Support System


Internet Research Agency’s Campaign to Influence the U.S. 2016 Elections: Assessing Linguistic Profiles Via Statistical Analysis

YuLin Bingle, William Burke, Micheline Al Harrack, Larry Blankenship, Khoanam Nguyen, Christopher Sokol, Sara Sadat Tabatabaei, Department of Cybersecurity, Marymount University, Arlington, Virginia, USA

ABSTRACT

We document the linguistic structure of Russia's Internet Research Agency's social media and disinformation campaign to influence the 2016 U.S. elections. Using the Discover Linguistic Inquiry and Word Count 2015 computerized text analysis tool, we researched the linguistic profiles of the Clemson University-collected Internet Research Agency tweets and retweets on a word-by-word basis. In our research, we selected a 95% confidence level model and a word count ratio analysis at a 99% confidence level. Our analysis indicates that the Internet Research Agency executed a persistent and synchronized strategy during the pre-election, election-year, and post-election periods. We offer that our study will show: a) policy leaders, an example of a sophisticated adversary's social media disinformation campaign; b) cybersecurity planners and defenders, the strategy and tactics used in this campaign, from which mitigation plans and actions can be developed; and c) researchers, future opportunities for study and analysis.
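
For the flavor of a word count ratio at a given confidence level, a minimal sketch (not the study's analysis): a category-word proportion with a normal-approximation confidence interval. The counts are hypothetical.

```python
# A minimal sketch of a word-category ratio with a 95% confidence interval.
import math

def proportion_ci(hits: int, total: int, z: float = 1.96):
    p = hits / total
    se = math.sqrt(p * (1 - p) / total)     # standard error of a proportion
    return p, (p - z * se, p + z * se)

# Hypothetical counts: category words among all words in a tweet sample.
p, (lo, hi) = proportion_ci(hits=420, total=10_000)
print(f"ratio={p:.4f}, 95% CI=({lo:.4f}, {hi:.4f})")
```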

KEYWORDS

Election Security, linguistic profiling, data analysis


Revisiting Dialogflow in an English Teaching Virtual Assistant Use Case

M.S. Tran1, T.H. Tran2, Q.D. Tran3, 1AI Lab, Topica Holding, Hanoi, Vietnam, 2University of Technology and Education, Ho Chi Minh City, Vietnam and 3Hanoi University of Science and Technology, Hanoi, Vietnam

ABSTRACT

We deployed a conversational chatbot as a virtual assistant teaching English via the Internet. The system was developed on the basis of Moodle as the Learning Management System and DialogFlow as the Dialogue Management System. Interestingly, the crucial problem we had to face was the lack of an efficient authoring tool for mass-generating the dialogue scenarios fed into Moodle and DialogFlow. In our concrete case, teaching English to beginners, the Dialogflow platform proved to be a cumbersome tool. In particular, with a bad Internet connection, sending messages back and forth to Dialogflow may degrade the smoothness of the conversation experience. We therefore built an authoring tool to speed up the generation of conversation rules. We also replaced Dialogflow with a local browser-based dialogue management engine. The lessons taught with our system, an English teaching virtual assistant, seem interesting to students and have received encouraging feedback.

KEYWORDS

Chatbot, Dialogflow, Artificial Intelligence, Natural Language Processing


Designing A Knowledge-Based Smart Assistant To Detect Crohn’s Disease By Processing Colonoscopy Images Of The Internal Gastrointestinal Wall

Hamidreza Rokhsati1 and Zohre Fasihfar2, 1Khaje Nasir Toosi University of Technology, Tehran, Iran and 2Hakim Sabzevari University, Sabzevar, Iran

ABSTRACT

Crohn’s disease is the inflammation and laceration of the deep layers on the right side of the gastrointestinal wall and the colon. The most commonly affected areas include the lower parts of the small intestine and the first sections of the large intestine. This disease can involve every part of the upper digestive system, from the mouth to the stomach and intestine. This paper presents an expert knowledge-based system for detecting atrophy in the gastrointestinal layers based on image processing of colonoscopy and sigmoidoscopy images, clinical examination results, and external symptoms. The inference engine of the system is rule-based and is designed to be expandable without causing problems in its search engine. The system uses both deductive and inductive reasoning. Image processing in the suggested system includes pre-processing images of the small intestine wall, filtering the image, morphological operations, and indicating the level of villous atrophy.
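
A minimal sketch (not the system's inference engine) of the forward-chaining pattern a rule-based expert system uses; the rules and facts are hypothetical simplifications of the clinical criteria described above.

```python
# A minimal sketch of a forward-chaining rule engine over symptom facts.
RULES = [
    ({"deep_layer_inflammation", "right_side_gi_wall"}, "suspect_crohn"),
    ({"suspect_crohn", "atrophic_villi_on_image"}, "crohn_likely"),
]

def forward_chain(facts: set) -> set:
    changed = True
    while changed:                      # fire rules until no new fact appears
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"deep_layer_inflammation", "right_side_gi_wall",
                     "atrophic_villi_on_image"}))
```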

KEYWORDS

Crohn, Colonoscopy, Inflammatory bowel, Morphology structures, Knowledge-based expert system, Intelligent assistant


TensorFlow 2.0 and Kubeflow for Scalable and Reproducible Enterprise AI

Romeo Kienzler1,2, Holger Kyas2,3, 1IBM Center for Open Source Data and AI Technologies, San Francisco, CA, USA, 2Berne University of Applied Sciences, Berne, Switzerland and 3Helvetia Insurance Switzerland, Basel, Switzerland

ABSTRACT

Towards the end of 2015, Google released TensorFlow, which started out as just another numerical library but has grown to become a de facto standard in AI technologies. TensorFlow received a lot of hype as part of its initial release, in no small part because it was released by Google. Despite the hype, there have been complaints about usability as well: for example, debugging was only possible after construction of a static execution graph. In addition, neural networks needed to be expressed as a set of linear algebra operations, which was considered too low-level by many practitioners. PyTorch and Keras addressed many of these flaws and gained a lot of ground. TensorFlow 2.0 successfully addresses these complaints and promises to become the go-to framework for many AI problems. This paper introduces the most prominent changes in TensorFlow 2.0 targeted towards ease of use, followed by an introduction to TensorFlow Extended (TFX) Pipelines and KubeFlow, in order to illustrate the latest movements in the TensorFlow and Kubernetes ecosystems towards simplification for large-scale Enterprise AI adoption.
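
A minimal sketch of the usability changes discussed above: TensorFlow 2.0 executes eagerly (no static graph to build first) and ships Keras as its high-level API, so models no longer need to be written as raw linear algebra.

```python
# A minimal sketch of TF2 eager execution plus the Keras high-level API.
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
print(x * 3)                      # evaluated immediately; debuggable line by line

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```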

KEYWORDS

Artificial Intelligence, TensorFlow, Keras, Kubernetes, KubeFlow, TFX, TFX Pipelines


CBS: A Crypto-bio System for Information Security

THOMPSON A.F, OWOLAFE O., Department of Computer Science, Federal University of Technology, Akure, Nigeria

ABSTRACT

Crypto-biometric systems have recently emerged as an effective approach to key management, addressing the security weaknesses of conventional key release systems based on passcodes, tokens, or pattern recognition-based biometrics. This project presents a lattice mapping-based fuzzy commitment method for cryptographic key generation from biometric data (fingerprints) using the DES (Data Encryption Standard) algorithm. The process is composed of three modules: fingerprint enrollment, feature extraction, and cryptographic key generation. Minutiae points and texture properties are extracted from the fingerprint images. Since the codes obtained from the minutiae points are not exactly the same across readings, it is difficult to bind a key with biometrics due to the exactness requirement of cryptographic keys; an error-tolerance technique therefore has to be applied to process the biometric information. The proposed method not only outputs high-entropy keys, but also conceals the original biometric data such that it is impossible to recover the biometric data even when the stored information in the system is open to an attacker. The simulation results show that the proposed method maintains good template discriminability, resulting in good recognition performance, authentication accuracy and security.

KEYWORDS

Crypto-biometrics, Lattice mapping, Information Security, DES


Experiments with a Neuro-Symbolic Deep Learning System with Incomplete Data

Jihane Boulahia1,2, 1College of Computer Science, Umm Al-Qura University, Makkah Al-Mukarramah, KSA and 2Laboratory Innov’Com, Higher School of Telecommunication of Tunis, Tunisia

ABSTRACT

In this paper, we discuss the properties of a deep hybrid learning system called DHLS (Deep Hybrid Learning System) proposed by J. Boulahia Smirani. The DHLS system has modules capable of performing a bidirectional transfer of information between a symbolic module and a deep learning module. We present various experiments that demonstrate the strengths of the DHLS system: the ability to integrate theoretical knowledge (rules) and empirical knowledge (examples); the ability to convert an initial knowledge base (rules) into a connectionist network; the use of empirical knowledge, through learning, to help revise existing knowledge, acquire new knowledge, and explain this new knowledge; and finally the ability to improve the performance of purely symbolic or connectionist systems.

KEYWORDS

Deep learning hybrid system, neural networks, integration of symbolic rules, extraction of symbolic rules


Analysis of Echo Characteristics for Time-Varying Scatterers

Junjie Wang1, Weidong Hu2 and Dejun Feng3, 1State Key Laboratory of Complex Electromagnetic Environmental Effects on Electronics and Information System, National University of Defense Technology, Changsha, China, 2College of Electrical Science, National University of Defense Technology, Changsha, China

ABSTRACT

Phase modulation is a technique in which the phase of a signal varies proportionally with a modulating signal; it is commonly applied in the field of communications. Current processing methods mainly use active devices to intercept, modulate and repeat the signal, but such devices are complicated and require a certain processing time. In this paper, a phase modulation method based on the phase-switched screen (PSS) is studied and its echo characteristics are analyzed. Meanwhile, the realization of PSS time-varying modulation is discussed. Simulation results demonstrate the effectiveness of the proposed method.
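
A minimal sketch (not the paper's model) of binary 0/pi phase switching applied to an LFM pulse: multiplying the echo by a +/-1 square wave shifts energy into sidebands at odd multiples of the switching frequency, the spectrum-shifting effect named in the keywords. All parameter values are hypothetical.

```python
# A minimal sketch of phase-switched (0/pi) modulation of an LFM echo.
import numpy as np

fs, T = 1e6, 1e-3                      # sample rate (Hz), pulse width (s)
t = np.arange(0, T, 1 / fs)
k = 2e8                                # chirp rate (Hz/s)
lfm = np.exp(1j * np.pi * k * t**2)    # baseband LFM pulse

f_switch = 50e3                        # PSS switching frequency
pss = np.sign(np.cos(2 * np.pi * f_switch * t))  # +/-1 square wave == 0/pi states
echo = lfm * pss                       # phase-switched echo

spectrum = np.fft.fftshift(np.abs(np.fft.fft(echo)))
freqs = np.fft.fftshift(np.fft.fftfreq(len(t), 1 / fs))
# Energy appears in sidebands offset by odd multiples of f_switch.
print("spectral peak near (Hz):", freqs[np.argmax(spectrum)])
```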

KEYWORDS

Linear frequency modulation (LFM), frequency spectrum shifting, phase-switched screen (PSS)


Detecting and Mapping Ancient Buildings by using GPR

Muhammet Cihat MUMCU, Faculty of Engineering, Marmara University, Istanbul, Turkey and Department of Electrical & Electronics Engineering, Maltepe University, Istanbul, Turkey

ABSTRACT

Ground penetrating radar (GPR) is an ultra-wideband electromagnetic sensor used not only for subsurface sensing but also for the detection of objects that may be hidden behind a wall or inserted within it. Such applications of GPR technology are used in military and civilian operations such as mine detection, rescue missions after earthquakes, and the investigation of archaeological sites; searching for designated targets hidden within walls, such as air pockets, is helpful to archaeologists. A two-dimensional (2-D) time-domain numerical scheme for the simulation of ground penetrating radar on dispersive and homogeneous soil is described. The finite-difference time-domain (FDTD) method is used to discretize the partial differential equations for time stepping of the electromagnetic fields. The soil dispersion is modeled by a Lorentz model, whose parameters are obtained by fitting the model to reported experimental data. The perfectly matched layer (PML) is extended to match dispersive media and used as an absorbing boundary condition to simulate open space.
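
A minimal sketch (not the paper's full 2-D dispersive scheme) of the leapfrog FDTD time stepping such a simulator builds on, reduced to one dimension in vacuum; soil dispersion and the PML are omitted.

```python
# A minimal 1-D FDTD sketch: staggered E/H updates with a Gaussian source.
import numpy as np

c, nx, nt = 3e8, 400, 600
dx = 0.01                          # 1 cm cells
dt = dx / (2 * c)                  # satisfies the Courant stability limit

eps0, mu0 = 8.854e-12, 4e-7 * np.pi
ez = np.zeros(nx)                  # electric field
hy = np.zeros(nx - 1)              # magnetic field, staggered half a cell

for n in range(nt):
    hy += dt / (mu0 * dx) * (ez[1:] - ez[:-1])           # H update (half step)
    ez[1:-1] += dt / (eps0 * dx) * (hy[1:] - hy[:-1])    # E update (half step)
    ez[50] += np.exp(-((n - 60) / 20.0) ** 2)            # Gaussian pulse source

print("field energy proxy:", float(np.sum(ez**2)))
```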

KEYWORDS

Ground penetrating radar (GPR), finite-difference time-domain (FDTD), perfectly matched layer (PML), buried objects


Event-Based Real-Time Hand Gesture Recognition Using Spiking Neural Networks

Van Khoa LE and Sylvain Bougnoux, IMRA Europe S.A.S., 220 Rue Albert Caquot, 06904 Sophia-Antipolis

ABSTRACT

Deep learning represents the state of the art in many machine learning and computer vision problems. The core of this technology is the analog neural network (ANN), composed of multiple convolution and pooling layers. Unfortunately, such systems demand massive computational power, consuming a lot of energy and therefore having a negative effect on the environment. The human brain, by contrast, is known to be much more energy efficient, and the spiking neural network (SNN) was created to replicate brain activity in order to improve the energy efficiency of current deep learning models. Meanwhile, the event-based domain built on neuromorphic sensors such as event cameras has made huge progress in the last few years and has become more and more popular. The data signal flow in a spiking neural network is a perfect fit for the output of an event camera. Therefore, in this article we build a system based on the combination of an event camera and an SNN for real-time hand gesture recognition. We also provide an analysis of the energy efficiency of this technology compared to its ANN counterpart.
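
A minimal sketch (not the paper's network) of a leaky integrate-and-fire neuron, the basic SNN unit: it integrates incoming event spikes, leaks between them, and emits a spike when a threshold is crossed. All constants are hypothetical.

```python
# A minimal sketch of a leaky integrate-and-fire (LIF) neuron.
import numpy as np

def lif(spike_train, tau=20.0, threshold=1.0, dt=1.0):
    v, out = 0.0, []
    for s in spike_train:
        v = v * np.exp(-dt / tau) + s      # leak, then integrate the input
        if v >= threshold:                 # fire and reset
            out.append(1)
            v = 0.0
        else:
            out.append(0)
    return out

events = np.random.default_rng(1).random(100) < 0.3   # toy event-camera stream
print("output spikes:", sum(lif(events.astype(float) * 0.4)))
```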

KEYWORDS

Event camera, Spiking Neural Network, Neuromorphic engineering


RGBA Based Generative Adversarial Network for 3D Semantic Scene Completion

Jiahao Wang, Ling Pei*, Danping Zou, Yifan Zhu, Tao Li and Ruochen Wang, Shanghai Key Laboratory of Location-based Navigation and Services, SJTU-ParisTech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China

ABSTRACT

3D scene understanding is important since it reflects the real-world scenario. The goal of our work is to complete the 3D semantic scene from an RGB-D image. State-of-the-art methods have poor accuracy in the face of complex scenes. In addition, other existing 3D reconstruction methods use depth as the sole input, which causes performance bottlenecks. We introduce a two-stream approach that uses RGB and depth as input channels to a novel GAN architecture to solve this problem. Our method demonstrates excellent performance on both the synthetic SUNCG and the real NYU datasets. Compared with the latest method, SSCNet, we achieve 4.3% gains in Scene Completion (SC) and 2.5% gains in Semantic Scene Completion (SSC) on the NYU dataset.

KEYWORDS

Scene Completion, Semantic Segmentation, Generative Adversarial Network, RGB-D


Improvement of Intrusion Detection System’s Accuracy Using Gradient Boosting Trees on Kyoto 2016 Dataset

Ryosuke Terado1 and Morihiro Hayashida2, 1Electrical Group, WORKS Co., Ltd., Masuda, Shimane, Japan and 2National Institute of Technology, Matsue College, Matsue, Shimane, Japan

ABSTRACT

As computers become more widespread, they are exposed to threats such as cyber-attacks. In recent years, attacks have gradually changed, and security software must be frequently updated. Network-based intrusion detection systems (NIDSs) have been developed for detecting such attacks. It is, however, difficult to detect unknown attacks with a signature-based NIDS, which decides whether or not an access is abnormal based on known attacks. Hence, the Kyoto 2016 dataset was constructed for evaluation, and machine learning methods including support vector machines and random forests have been applied to it. In this paper, we additionally examine a deep neural network and gradient boosting tree methods, and perform computational experiments on the Kyoto 2016 dataset. The results suggest that the gradient boosting tree method XGBoost outperforms the other machine learning classifiers, and its elapsed time for classification is significantly shorter.
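
A minimal sketch (not the paper's experiments) of fitting and timing an XGBoost classifier as in the NIDS comparison; the features are a synthetic stand-in for the Kyoto 2016 connection records.

```python
# A minimal sketch of training/timing XGBoost on stand-in NIDS features.
import time
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                   # stand-in session features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)     # stand-in normal/attack label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)

start = time.perf_counter()
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te),
      "| train time (s):", round(time.perf_counter() - start, 2))
```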

KEYWORDS

Network-based Intrusion Detection System, Gradient Boosting Tree, Neural Network


Privacy Preserving Data Aggregation and Dynamic Billing System in Smart Grid Using Permissioned Blockchain

Ozgur Oksuz, Faculty of Engineering, Adiyaman University, Adiyaman, Turkey

ABSTRACT

We propose an efficient data aggregation and dynamic billing system built on a permissioned blockchain. This blockchain uses a ledger that provides user anonymity and records users' electricity consumption for predefined time ranges. Using the consumption data, a billing mechanism is able to bill the users accordingly. In our construction, since all parties in the system hold the ledger, every party can obtain the aggregated electricity usage without resorting to very heavy cryptographic operations. Thanks to the ledgers in our model, the aggregation of the users' electricity consumption can be computed by anyone in the system. Moreover, users are able to verify their bills. As a result, the integrity of all data is preserved. In our solution we mainly use hash functions to provide this functionality while preserving the data privacy of the users.


Design of Blockchain Ledger Compression Algorithm

Zhijun Wu, Yiming Yang and Xin Lu, Civil Aviation University of China, Tianjin, China

ABSTRACT

Blockchain has received widespread attention due to its decentralization, openness, autonomy, tamper-proof information, and anonymity, and is currently on the rise. However, some fatal weaknesses have been widely criticized during the evolution and development of blockchain technology: for example, blockchain networks have a large amount of data, large communication overhead, large storage overhead, and poor timeliness. The causes of these defects are complex, and the growth of data is the most important among them. To solve the above problems, we study a self-designed alliance chain that supports a deletable ledger, implementing the ability to search for and delete ledger transactions. After a deletion request and its execution are verified and agreed upon by the consensus nodes, the blockchain ledger can be compressed or deleted. At the same time, the integrity of the forward and backward verification of the corresponding high-level blocks can be guaranteed without affecting the storage and usage of other blocks.

KEYWORDS

Blockchain, Deletable Ledger, Simulation Alliance Chain, Forward and Backward Verification


Introducing Deep Learning for Anomaly Detection in Urban Road Traffic Networks

Jamal Raiyn, Computer Science Department, Al Qasemi Academic College, Baqa Al Garbiah, Israel

ABSTRACT

This paper offers an overview of essential concepts in deep learning, one of the state-of-the-art approaches in machine learning, as they relate to cyber security in autonomous vehicle networks, covering its history and current applications as a brief introduction to the subject. Deep learning has shown great success in many domains such as image recognition and object detection. Various forecasting schemes have been proposed to manage urban road traffic data, which is collected from different sources such as video cameras, sensors, LiDAR, GNSS, and mobile phone services. However, these are not sufficient for the purpose because of their limited coverage and high costs of installation and maintenance, which makes it difficult to detect anomalies in the collected big data. The multiple layers of a deep learning model extract features from the input data; these features include both normal and abnormal data. Abnormal data is caused by threats and attacks such as denial of service, malicious behavior, and wrong setup, which can cause anomalies and system failure. Anomalies in autonomous vehicle networks cause traffic congestion and accidents that carry economic, environmental, human, and time costs. Deep learning is introduced here with the aim of detecting anomalies in autonomous vehicle networks.

KEYWORDS

deep learning, cyber-attack, autonomous vehicle, anomaly detection


Evaluating Verbal Production Levels

Fabio Fassetti1 and Ilaria Fassetti2, 1DIMES Dept., University of Calabria, Italy and 2Therapeia, Rehabilitation Center, Italy

ABSTRACT

The paper presents a framework to evaluate the adequateness of a written text with respect to age or in the presence of pathologies like deafness. This work aims at providing insights into the verbal production level of an individual so that a therapist can evaluate the adequateness of that level. The verbal production is analyzed from several points of view, categorized into six families: orthography, syntax, lexicon, lemmata, morphology, and discourse. The proposed approach extracts several features belonging to these categories through ad-hoc algorithms and exploits them to train a learner able to classify verbal production into levels. This study is conducted in conjunction with a speech rehabilitation center. The technique is designed specifically for the Italian language; however, the methodology is more widely applicable. The proposed technique has a twofold aim: other than the main goal of providing the therapist with an evaluation of the submitted essay, the framework could shed light on the relationship between capabilities and age. To the best of our knowledge, this is the first attempt to perform these evaluations with an automatic system.
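
A minimal sketch (not the framework's extractors) of two simple lexical features of the kind fed to the learner: the type/token ratio and the mean sentence length. The real system covers six feature families via ad-hoc algorithms.

```python
# A minimal sketch of lexical feature extraction from an essay.
import re

def lexical_features(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    return {
        # Vocabulary richness: distinct words over total words.
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        # Syntactic complexity proxy: words per sentence.
        "mean_sentence_len": len(tokens) / len(sentences) if sentences else 0.0,
    }

essay = "Il gatto dorme. Il gatto mangia e poi dorme ancora."
print(lexical_features(essay))
```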

KEYWORDS

Verbal production, Feature Extraction, Deep Learning.


A Computational Phonetic Comparison Algorithms Approach for Low-Resource Languages (Extinct Meroitic)

Ali Raham

ABSTRACT

The objective of this paper is to investigate the correlation between Meroitic and local languages in Sudan by applying Natural Language Processing, using phonetic comparison algorithms. Our model aims to produce a clearly defined relation between Meroitic script letters and their related graphemes and phonemes in local Sudanese languages. Building this foundation is necessary to successfully read Meroitic and is a step toward a better chance of deciphering this extinct language. Researching this matter using Natural Language Processing tools could lead to enhancements of phonetic comparison algorithms for low-resource or extinct languages, an area which currently lacks researchers' attention.
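
For illustration, a simplified sketch of the classic Soundex coding named in the keywords, which maps similar-sounding (Latin-alphabet) words to the same key; comparable codings underlie cross-language phonetic comparison. The variant below drops some edge-case rules of full Soundex.

```python
# A simplified Soundex sketch: consonant classes collapse to digits,
# vowels are skipped, repeats are merged, code padded to 4 characters.
CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
         **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"),
         "r": "6"}

def soundex(word: str) -> str:
    word = word.lower()
    digits = [CODES.get(c, "") for c in word]
    code, prev = word[0].upper(), digits[0]
    for d in digits[1:]:
        if d and d != prev:          # skip vowels and collapse repeats
            code += d
        prev = d
    return (code + "000")[:4]

print(soundex("Meroe"), soundex("Merowe"))   # same code for similar sounds
```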

KEYWORDS

Natural Language Processing (NLP), Phonetics, Meroitic, Phonemes, Phonetic, Soundex, Sudanic Language, Low Resources Languages.


Comparison of Turkish Word Representations Trained on Different Morphological Forms

Gökhan Güler and A. Cüneyd Tantug, Department of Computer Engineering, Istanbul Technical University, Istanbul, TURKEY

ABSTRACT

The increased popularity of different text representations has brought many improvements in Natural Language Processing (NLP) tasks. Without the need for supervised data, embeddings trained on large corpora provide meaningful relations to be used in different NLP tasks. Even though training these vectors is relatively easy with recent methods, the information gained from the data heavily depends on the structure of the corpus language. Since the most popularly researched languages share a similar morphological structure, problems arising for morphologically rich languages are largely disregarded in studies. For morphologically rich languages, context-free word vectors ignore the morphological structure of the language. In this study, we prepared texts in morphologically different forms in a morphologically rich language, Turkish, and compared the results on different intrinsic and extrinsic tasks. To see the effect of morphological structure, we trained word2vec models on texts in which lemmata and suffixes are treated differently. We also trained the subword model fastText and compared the embeddings on word analogy, text classification, sentiment analysis, and language modeling tasks.
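
A minimal sketch (not the paper's setup) of training gensim's subword fastText model, which composes word vectors from character n-grams and can therefore handle unseen suffixed forms in a morphologically rich language; the toy corpus is hypothetical.

```python
# A minimal sketch of subword fastText on inflected Turkish forms.
from gensim.models import FastText

corpus = [
    ["ev", "evde", "evden"],          # Turkish: house, in the house, from the house
    ["okul", "okulda", "okuldan"],    # school, at school, from school
]
model = FastText(corpus, vector_size=50, window=3, min_count=1,
                 min_n=2, max_n=5)    # character n-gram range

# An out-of-vocabulary suffixed form still gets a vector via its n-grams:
print(model.wv["evlerden"][:5])
```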

KEYWORDS

embedding, vector, morphology, Turkish, word2vec, fastText


A Modern Psycho-virus War of AI for the Global Mind

Alexander G. Yushchenko, Department of Information Systems, National Technical University Kharkiv Polytechnic Institute, Kharkiv, Ukraine

ABSTRACT

From a sociocybernetics point of view, we analyse threats to security resulting from the globalization of the international information space and from Russia's information and communication aggression. As the analysis shows, harmful influence in the processes of mass communication can be exerted not only at the technical level of data transmission and processing, but also through hostile content that affects the brain of the information recipient. The manipulative effect on mass consciousness is multiplied by the application of artificial intelligence methods. We reveal an emerging threat to democratic states resulting from the destabilizing impact of a target state's mass media and social networks being exploited by Russian secret services under the disguise of freedom of speech. Among the socially dangerous infocommunication technologies that we have noted in the media, we point out "information shock" and "Pavlovian conditioning", as well as the initiation of destructive radical movements through social networks. A new body intended for global geopolitical monitoring and management of the synergetic defence of the Russian Federation is described and analysed; it bases its work on an expert system installed on a powerful supercomputer. The importance of the information component of modern synergetic war is underlined, and technologies for Data Mining modelling of media influence on a society are briefly described. Thus, modern civilization has faced new threats of psycho-viral contamination of public consciousness, which in its disastrous social and political consequences can be considered a weapon of mass destruction, requiring the development of protection methods based on an ethical compromise between democratic values and the needs of mental defence. In essence, this is a war of psycho-viruses constructed with the help of AI for mastering the virtual reality of the global network mind.

KEYWORDS

new weapon of mass destruction, data mining algorithms, supercomputer, expert system, security system, communication security, synergetic war, Kremlin propaganda, psycho-viral contamination, infocommunication, mental defence, NATO


Filter-based Active Suspension System with Adapted Reference Input

Adel Djellal1 and Rabah Lakel2, 1Department of Second Cycle, Higher School for Industrial Technologies, Annaba, Algeria and 2Department of Electronics, Badji Mokhtar University, Annaba, Algeria

ABSTRACT

In this paper, an active suspension system is controlled using a PID controller with an adapted reference point. After the derivation of the quarter-car suspension model, three approaches were applied: a passive suspension system, an active suspension system with a constant reference, and an active suspension system with an adapted reference. The proposed approach focuses on system life span: how can brutal controller actions, which can cause car body damage, be reduced while assuring a certain ride comfort? The three approaches were simulated using the quarter-car system and a Matlab simulation model to implement the proposed technique and compare performance in different cases: a road bump and other road disturbances.
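
A minimal sketch (not the paper's controller) of a discrete PID update of the kind used to drive the active suspension actuator; the gains and setpoint handling are hypothetical placeholders.

```python
# A minimal sketch of a discrete PID controller step.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt                    # accumulate I term
        derivative = (error - self.prev_error) / self.dt    # finite-difference D term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=800.0, ki=50.0, kd=30.0, dt=0.001)
# The "adapted reference" idea: the setpoint follows a filtered road profile
# instead of staying constant, softening the control action after a bump.
u = pid.step(reference=0.02, measurement=0.05)   # body displacement in metres
print("actuator force command:", u)
```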

KEYWORDS

Active Suspension system, PID controller, Quarter car model, Passive Suspension system


Separateness Algorithm of Relevant and Irrelevant Documents Based on Lagrange Multipliers

Rabeb Mbarek and Hawete Hattab

ABSTRACT

Relevant and irrelevant documents share some terms (at least the terms of the query which selected these documents). The majority of relevance feedback methods try to optimally separate relevant from irrelevant documents; indeed, these methods build a new set of indexing terms (a vector space basis) which separates these documents. However, there is no fully satisfactory answer to this problem. In this paper, we propose to separate relevant and irrelevant documents using Lagrange multipliers. This new approach is evaluated experimentally on two TREC collections (TREC-7 ad hoc and TREC-8 ad hoc). The experiments show that this method improves on previous work.
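
As a generic illustration (not the paper's exact formulation), the Lagrange-multiplier machinery for separating two document classes can be sketched as a constrained optimization: find a direction w that puts relevant document vectors on one side and irrelevant ones on the other,

```latex
\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w^{\top} d_i + b) \ge 1 \quad \forall i,
```

with y_i = +1 for d_i relevant and y_i = -1 otherwise. Introducing multipliers alpha_i >= 0 gives the Lagrangian

```latex
\mathcal{L}(w,b,\alpha) = \tfrac{1}{2}\lVert w\rVert^{2}
- \sum_i \alpha_i \left[\, y_i\,(w^{\top} d_i + b) - 1 \,\right],
```

whose stationarity condition with respect to w yields w = sum_i alpha_i y_i d_i; that is, the separating direction is a weighted combination of the feedback documents themselves.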

KEYWORDS

Relevance feedback, vector space basis change, Lagrange multipliers, TREC.