7th International Conference on Data Mining & Knowledge Management Process (DKMP 2019)

July 13~14, 2019, Toronto, Canada

Accepted Papers

Attribute Reduction And Decision Tree Pruning To Simplify Liver Fibrosis Prediction Algorithms
Mahasen Mabrouk1, Abubakr Awad2, Hend Shousha1, Wafaa Alakel1,3, Ahmed Salama1, Tahany Awad1
1Cairo University, Egypt, 2University of Aberdeen, Aberdeen, UK, 3National Hepatology and Tropical Medicine Research Institute, Egypt.
ABSTRACT
Assessment of liver fibrosis is vital for enabling therapeutic decisions and prognostic evaluations in chronic hepatitis. Liver biopsy is considered the gold standard for assessing the fibrosis stage but has several limitations, and the FIB-4 and APRI scores have limited accuracy. The “Egyptian National Committee for Control of Viral Hepatitis” has provided a rich pool of electronic data that data mining can explore to discover hidden patterns and trends and to enable the development of predictive algorithms.
KEYWORDS
Liver Fibrosis, Data Mining, Weka, Decision Tree, Attribute Reduction, Tree Pruning.
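For readers unfamiliar with the two scores named in the abstract, the following is a minimal sketch of the commonly published FIB-4 and APRI formulas. The function names, parameter names, and the sample values are illustrative assumptions and are not taken from the paper; the upper limit of normal for AST is laboratory-specific and is passed in explicitly.

```python
import math

def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT)).

    AST and ALT in U/L, platelet count in 10^9/L.
    """
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))

def apri(ast_u_l, ast_upper_limit_u_l, platelets_1e9_l):
    """APRI = (AST / upper limit of normal AST) / platelets x 100."""
    return (ast_u_l / ast_upper_limit_u_l) / platelets_1e9_l * 100.0

# Hypothetical patient values, chosen only to make the arithmetic easy to follow.
example_fib4 = fib4(age_years=50, ast_u_l=80, alt_u_l=64, platelets_1e9_l=200)
example_apri = apri(ast_u_l=80, ast_upper_limit_u_l=40, platelets_1e9_l=200)
print(example_fib4, example_apri)
```

Both scores use only routine laboratory values, which is why they are attractive as simple baselines for data-driven predictors.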

Context-Aware Trust-Based Access Control For Ubiquitous Systems
Malika Yaici, Faiza Ainennas and Nassima Zidi, Computer Department, University of Bejaia, Bejaia, Algeria
ABSTRACT
Ubiquitous computing and context-aware applications are currently undergoing rapid development. This has led organizations to open up more of their information systems, making them available anywhere and at any time and integrating the dimension of mobile users. This cannot be done without careful consideration of access security: a pervasive information system must now be able to take contextual features into account to ensure robust access control. In this paper, access control and a few existing mechanisms are presented, with the aim of showing the importance of taking context into account during an access request. In this regard, our proposal incorporates the concept of trust to establish a trust relationship according to three contextual constraints (location, social situation and time) in order to decide whether to grant or deny a user's request for access to a service.
KEYWORDS
Pervasive systems, Access Control, RBAC, Context-awareness, Trust management

An Ontology Based Approach to Improve Process Mining Result In University Information System
Maryem Dellai and Yemna Sayeb, Research Laboratory, ISAMM, Manouba, Tunisia
ABSTRACT
Process mining algorithms use event logs to extract process-related information in order to discover processes, analyze their conformance, or enhance them. Event logs can be used to analyze and visualize the processes with better insight and improved formal access to the data. Most process mining (PM) applications are based on event logs with keyword-based activity and resource descriptions. In recent years, much effort has been dedicated to exploring logic-based ontology formalisms. In this research work, we use ontologies that are intended to define the semantics of recorded events. The highest quality of event logs requires the existence of ontologies to which events and attributes point. Many human-designed processes are based on explicit workflow or lifecycle models which can be described using taxonomies or more complicated ontologies. Ontologies have been successfully applied to represent the knowledge in many domains. In this paper, we introduce an approach for enriching event logs used in process mining with associated ontology structures. Our proposal is to provide features that help integrate event logs from event sources in order to extract data and put it into a suitable, semantically enriched format so that the data can be exploited with process mining tools (ProM).
KEYWORDS
process mining, event logs, ontologies, process model, Petri net.

Construction Of an Oral Cancer Auto-Classify System Based On Machine-Learning for Artificial Intelligence
Meng-Jia Lian1, Chih-Ling Huang2, Tzer-Min Lee1,3
1School of Dentistry, Kaohsiung Medical University, Kaohsiung, Taiwan
2Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan
3Institute of Oral Medicine, National Cheng Kung University Medical College, Tainan

ABSTRACT
Oral cancer is one of the most prevalent tumors of the head and neck region. An earlier diagnosis can help dentists devise a better therapy plan and give patients better treatment, so reliable techniques for detecting oral cancer cells are urgently required. This study proposes an optical, automated method using reflection images obtained with a scanned laser pico-projection system, and the Gray-Level Cooccurrence Matrix for sampling. Moreover, an artificial intelligence technique, the Support Vector Machine, was used to classify samples. Normal oral keratinocytes and dysplastic oral keratinocytes were classified to simulate the progression of cancer. The accuracy in distinguishing the two cell types reached 85.22%. Compared to existing diagnosis methods, the proposed method has many advantages, including lower cost, a larger sample size, and instant, non-invasive, and more reliable diagnostic performance. As a result, it provides a highly promising solution for the early diagnosis of oral squamous carcinoma.
KEYWORDS
Oral Cancer Cell, Normal Oral Keratinocyte (NOK), Dysplastic oral keratinocyte (DOK), Gray-Level Cooccurrence Matrix (GLCM), Scanned Laser Pico-Projection (SLPP), Support Vector Machine (SVM), Machine-Learning

Automatic Extraction of Feature Lines on 3D Surface
Zhihong Mao, Division of Intelligent Manufacturing, Wuyi University, Jiangmen, China
ABSTRACT
Many applications in mesh processing require the detection of feature lines. Feature lines convey the inherent features of the shape. Existing techniques for finding feature lines on discrete surfaces rely on user-specified thresholds and are inaccurate and time-consuming. We use an automatic approximation technique to estimate the optimal threshold for detecting feature lines. Some examples are presented to show that our method is effective and improves the visualization of feature lines.
KEYWORDS
Feature Lines; Extraction; Meshes.

HMM-Based Dari Named Entity Recognition for Information Extraction
Ghezal Ahmad Jan, Zia
Department of Models and Theory of Distributed Systems, TU Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
ABSTRACT
Named Entity Recognition (NER) is a fundamental subtask of information extraction systems that labels elements into categories such as persons, organizations or locations. The task of NER is to detect and classify words that are parts of sentences. This paper describes a statistical approach to modeling NER for the Dari language. Dari and Pashto are low-resource languages, spoken as official languages in Afghanistan. Named entity detection in Dari differs from that in many other languages, since Dari has no capitalization for identifying named entities. We seek to bridge the gap between Dari linguistic structure and a supervised learning model that predicts a sequence of tags for a sequence of words. A Dari corpus was developed from a collection of news, reports and articles based on the original orthographic structure of the Dari language. The experimental results show a named entity recognition accuracy of 95%.
KEYWORDS
Natural Language Processing (NLP), Hidden Markov Model (HMM), Named Entity Recognition (NER), Part-of-Speech (POS) Tagging
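An HMM tagger of the kind named in the keywords is usually decoded with the Viterbi algorithm. The sketch below is a generic log-space Viterbi decoder, not the author's implementation; the two-tag set, the romanized placeholder tokens, and all probabilities are made-up assumptions chosen only to keep the example small.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely state sequence for the observations.

    Probabilities are floored before taking logs so unseen words do not
    produce log(0).
    """
    def lg(p):
        return math.log(max(p, floor))

    # First column: start probability times emission of the first token.
    trellis = [{s: lg(start_p[s]) + lg(emit_p[s].get(observations[0], 0.0))
                for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            best_prev = max(states,
                            key=lambda r: trellis[-1][r] + lg(trans_p[r][s]))
            scores[s] = (trellis[-1][best_prev] + lg(trans_p[best_prev][s])
                         + lg(emit_p[s].get(obs, 0.0)))
            pointers[s] = best_prev
        trellis.append(scores)
        back.append(pointers)

    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: trellis[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical toy model: PER = person name, O = other.
states = ["PER", "O"]
start_p = {"PER": 0.3, "O": 0.7}
trans_p = {"PER": {"PER": 0.2, "O": 0.8}, "O": {"PER": 0.2, "O": 0.8}}
emit_p = {
    "PER": {"ahmad": 0.90, "went": 0.05, "home": 0.05},
    "O": {"ahmad": 0.05, "went": 0.50, "home": 0.45},
}
tags = viterbi(["ahmad", "went", "home"], states, start_p, trans_p, emit_p)
print(tags)
```

Because the decoder relies only on transition and emission statistics, it does not need capitalization cues, which matches the constraint the abstract describes for Dari.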

Designing Dynamic Protocol for Real-Time IIoT-based Applications by Efficient Management of System Resources
Farzad Kiani, Sajjad Nematzadehmiandoab, Amir Seyyedabbasi
Computer Engineering Dept., Engineering and Natural Sciences Faculty at Istanbul Sabahattin Zaim University, Kucukcekmece, 34303, Istanbul, Turkey
ABSTRACT
Due to their increased applicability, wireless sensor networks have captured the attention of researchers from various fields. Nevertheless, these networks still suffer from various challenges and limitations. These problems are even more pronounced in some areas of the field, such as real-time IoT-based applications. Here, a dynamic protocol that efficiently utilizes the available resources is proposed. The protocol employs five newly developed algorithms that aid the data transmission, neighbor finding and optimal path finding processes. The protocol can be utilized in, but is not limited to, real-time large data streaming applications. This paper defines a structure that enables the sensor devices to communicate with each other over their local network or the internet as required in order to preserve the available resources. Both theoretical and experimental analysis of the entire protocol and of the individual algorithms is also performed.
KEYWORDS
Big data wireless sensor networks, real-time systems, energy efficiency, routing protocol, IoT.

Interactive Mesh Cutout Using Graph Cuts
Zhihong Mao, Division of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China
ABSTRACT
Mesh segmentation is a foundational operation for many computer graphics applications. Although various automatic segmentation schemes have been proposed, precisely obtaining the meaningful part of a mesh remains a challenging issue. In this paper, we introduce an interactive system to efficiently extract meaningful objects from a triangular mesh. The algorithm proposed in this paper extends min-cut based 2D image segmentation techniques to the domain of 3D meshes. We also provide a screen-space user interface that allows the user to indicate the meaningful object easily. In our system, quadric-based surface simplification is adopted for a large mesh: we apply min-cut to the simplified mesh, and graph cuts are then used to refine the resulting cuts in the original mesh. The results show that our proposed method is relatively simple and effective as a powerful tool for mesh cutout.
KEYWORDS
Mesh Segmentation, Mesh Cutout, Graph cuts.

Tough Random Symmetric 3-SAT Generator
Robert Amador1, Chang-Yu Hsieh2, Chen-Fu Chiang3
1,3Department of Computer Science, State University of New York Polytechnic Institute, Utica, NY 13502, USA, 2Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
ABSTRACT
We designed and implemented an efficient tough random symmetric 3-SAT generator. We quantify the hardness in terms of CPU time and the numbers of restarts, decisions, propagations, conflicts and conflicted literals that occur when a solver tries to solve 3-SAT instances. In this experiment, the clause-to-variable ratio was chosen to be around the conventional critical phase transition number 4.24. The experiment shows that instances generated by our generator are significantly harder than instances generated by the Tough K-SAT generator. The difference in hardness between the two SAT instance generators grows exponentially as the number of boolean variables increases.
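As background for the clause-to-variable ratio mentioned above, here is a minimal sketch of a plain uniform random 3-SAT generator. It is a baseline illustration only and does not implement the authors' symmetric construction; the function names `random_3sat` and `to_dimacs` are hypothetical, and the default ratio of 4.24 is simply the value quoted in the abstract.

```python
import random

def random_3sat(num_vars, ratio=4.24, seed=None):
    """Return a uniform random 3-SAT instance as a list of 3-literal tuples.

    Literals follow the DIMACS convention: variable v is the integer v,
    its negation is -v. The number of clauses is round(ratio * num_vars).
    """
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    clauses = []
    for _ in range(num_clauses):
        # Pick three distinct variables, then negate each with probability 1/2.
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses

def to_dimacs(num_vars, clauses):
    """Serialize the instance in DIMACS CNF format, which most SAT solvers read."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(str(lit) for lit in clause) + " 0" for clause in clauses]
    return "\n".join(lines)

instance = random_3sat(50, seed=0)
dimacs_text = to_dimacs(50, instance)
print(dimacs_text.splitlines()[0])
```

Writing the instance in DIMACS form makes it straightforward to feed the same file to different solvers and compare statistics such as decisions and conflicts.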


Data Analysis of Wireless Networks Using Classification Techniques
Daniel Rosa Canêdo1,2 and Alexandre Ricardo Soares Romariz1
1Department of Electrical Engineering, University of Brasília, Brasília, Brazil, 2Federal Institute of Goiás, Luziânia, Brazil
ABSTRACT
In the last decade, there has been a great technological advance in the infrastructure of mobile technologies. The use of wireless local area networks and of satellite services has also increased. The high utilization rate of mobile devices for various purposes makes clear the need to monitor wireless networks to ensure the integrity and confidentiality of the information transmitted. Therefore, it is necessary to quickly and efficiently identify the normal and abnormal traffic of such networks, so that administrators can take action. This work aims to analyze classification techniques applied to data from wireless networks, using some classes of anomalies pre-established according to defined criteria of the MAC layer. For data analysis, the WEKA data mining software (Waikato Environment for Knowledge Analysis) is used. The classification algorithms achieve a viable success rate on the data, which indicates that they are suitable for use in intrusion detection systems for wireless networks.
KEYWORDS
Wireless Networks, Classification Techniques, Weka.

A Survey Of State-Of-The-Art GAN-Based Approaches To Image Synthesis
Shirin Nasr Esfahani and Shahram Latifi, University of Nevada, Las Vegas, USA.
ABSTRACT
In the past few years, Generative Adversarial Networks (GANs) have received immense attention from researchers in a variety of application domains. This new field of deep learning has been growing rapidly and has provided a way to learn deep representations without extensive use of annotated training data. Their achievements may be used in a variety of applications, including speech synthesis, image and video generation, semantic image editing, and style transfer. Image synthesis is an important component of expert systems and has attracted much attention since the introduction of GANs. However, GANs are known to be difficult to train, especially when they try to generate high-resolution images. This paper gives a thorough overview of the state-of-the-art GAN-based approaches in four applicable areas of image generation: text-to-image synthesis, image-to-image translation, face aging, and 3D image synthesis. Experimental results show state-of-the-art performance using GANs compared to traditional approaches in the fields of image processing and machine vision.
KEYWORDS
Conditional generative adversarial networks (cGANs), image synthesis, image-to-image translation, text-to-image synthesis, 3D GANs.

A Call Graph Reduction based Novel Storage Allocation for Smart City Applications
Prabhdeep Singh, Rajvir Kaur, Diljot Singh, Vivek Gupta, Punjabi University, India.
ABSTRACT
Today's world is getting smarter day by day. Smart cities play an important role in making the world smart. Thousands of smart city applications are developed every day, and a huge amount of data is generated every second. The data need to be managed and stored properly so that information can be extracted using various emerging technologies. The main aim of this paper is to propose a storage scheme for data generated by smart city applications. A matrix is used which stores the information of each adjacent node at each level as well as the weight and frequency of the call graph. It has been experimentally shown that the applied algorithm reduces the size of the call graph without changing its basic structure and without any loss of information. Once the graph is generated from the source code, it is stored in the matrix and reduced appropriately using the proposed algorithm. The proposed algorithm is also compared to other call graph reduction techniques, and the experimental evaluation shows that it significantly reduces the graph and stores the smart city application data efficiently.


Comparing String Similarity Measures In The Task Of Name Matching
Aleksandra Zaba, University of Utah, USA.
ABSTRACT
This pilot study reports recall, precision, and f-measures for three groups of string similarity algorithms contained in the ‘stringdist’ package of R: the edit-based Levenshtein, full Levenshtein-Damerau, Hamming, and longest common substring measures; the q-gram based Jaccard, q-gram, and cosine measures; and the heuristic Jaro and Jaro-Winkler measures. The algorithms assign values for the similarity between a base word, a female first name, and three of its variants: that same name, and two of the following: its foreign version (categorized by us as ‘same’), its male version (‘different’), and a different, also female, version of the base name in American English (‘different’). We report f-measures, and these are interpreted in the context of the given algorithm. For our data so far, a relatively low threshold (from ‘match’ to ‘not match’; assigned by us to an algorithm’s value for a given similarity) provides the highest weighted average of recall and precision.
KEYWORDS
Artificial Intelligence, Natural Language Processing, String Similarity Algorithms, R, F-Measure.
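To make two of the measure families concrete, the sketch below implements a Levenshtein edit distance and a q-gram Jaccard distance, plus a thresholded match decision. It is a Python illustration, not the R ‘stringdist’ implementation used in the study; the helper `is_match`, the threshold value, and the sample names are assumptions made for the example.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]

def qgrams(s, q=2):
    """Set of all substrings of length q."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def jaccard_distance(a, b, q=2):
    """1 - |A ∩ B| / |A ∪ B| over the q-gram sets of the two strings."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 0.0
    return 1.0 - len(ga & gb) / len(ga | gb)

def is_match(distance, threshold):
    """Hypothetical decision rule: distances at or below the threshold count as a match."""
    return distance <= threshold

# Hypothetical name pair: a base name and a close variant.
d_edit = levenshtein("maria", "marie")
d_jacc = jaccard_distance("maria", "marie")
print(d_edit, d_jacc, is_match(d_jacc, threshold=0.5))
```

Moving the threshold trades recall against precision, which is the effect the abstract reports when it compares f-measures across thresholds.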

Performance Comparison Of Web-Based Book Recommender Systems
Swathi S Bhat, Pranav P, Shashank K V and Arpitha Raghunandan, National Institute of Technology, India.
ABSTRACT
Recommendation systems are being widely used for personalization on the web today. E-Commerce giants rely highly on their recommendation systems to improve their business. As a result, the quality of recommendations can have a significant impact on their sales. Hence, proper evaluation of such recommender systems is important. Traditional evaluation metrics are limited to error based and accuracy metrics and do not take into consideration factors like diversity, novelty, informedness, markedness etc. We aim to perform a comprehensive performance comparison of two web-based book recommendation systems using lesser known but equally important metrics like diversity, informedness and markedness.
KEYWORDS
Recommendation systems, diversity, metrics, informedness, markedness, precision, recall, ROC, performance testing
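Informedness and markedness, two of the lesser-known metrics named above, can be computed directly from a binary confusion matrix. The sketch below uses the standard definitions (informedness = recall + specificity − 1, markedness = precision + negative predictive value − 1); the confusion-matrix counts are invented for illustration and are not results from the paper.

```python
def informedness(tp, fp, fn, tn):
    """True positive rate + true negative rate - 1 (also known as Youden's J)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return tpr + tnr - 1.0

def markedness(tp, fp, fn, tn):
    """Positive predictive value + negative predictive value - 1."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv + npv - 1.0

# Hypothetical counts: recommended-and-relevant, recommended-but-irrelevant, and so on.
counts = dict(tp=40, fp=10, fn=20, tn=30)
print(informedness(**counts), markedness(**counts))
```

Unlike plain accuracy, both values are zero for a recommender that guesses at random, so they remain interpretable when relevant items are rare.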

Obstacle Avoidance Robot
Faisal Imran and Dr. Yin Yunfei, Chongqing University, China.
ABSTRACT
Obstacle avoidance is one of the most important aspects of mobile robots. Without it, the movement of the robot would be very restricted and fragile. A robot is a machine that can perform tasks automatically or under the direction of an operator. Robotics is a combination of computational intelligence and physical machines (motors). Computational intelligence means programmatic instructions. The project proposes a robotic vehicle with an integrated intelligent device that guides it when an obstacle enters its path. The robotic vehicle is built using ATmega8 series microcontrollers. Ultrasonic sensors are used to detect any obstacle in front of the vehicle and send commands to the microcontroller. Depending on the input signal received, the microcontroller redirects the robot to move in an alternate direction by driving the connected motors through the motor controller. The evaluation of the performance of the system shows an accuracy of 85% and a probability of failure of 0.15, respectively. We made a robotic vehicle that moves in different directions (forward, backward, left and right) when the input is given. The objective of our project is to create an automated robot that intelligently detects obstacles in its path and navigates according to the actions we configure.
KEYWORDS
obstacle avoidance, ultrasonic sensor, arduino microcontroller, autonomous robot, arduino software.

A Performance of Peano Koch Hybrid Fractal Antenna For 2.4 and 5.5 GHz Applications
Er. Inkwinder Singh Bangi and Dr. Jagtar Singh Sivia, Punjabi University, India.
ABSTRACT
With modernization, the demand for wireless devices has increased tremendously. The antenna is an integral component of every wireless electronic gadget. Thanks to hybrid fractal technology, a single antenna can be used for various applications. In this article, a hybrid fractal antenna is designed by combining Peano and Koch antennas. The performance of the hybrid fractal antenna is examined, and various antenna parameters are evaluated to judge its behavior. The hybrid fractal antenna measures 34 × 42 mm² and operates at GHz frequencies. The small antenna has a VSWR below 2 at every resonant frequency. The proposed antenna is lightweight and inexpensive because it is designed on FR4 epoxy material. The current distribution and radiation pattern also demonstrate the omnidirectional radiation of electromagnetic waves. The maximum gain is 18 dB at 2.43 GHz, in the unlicensed band used for Bluetooth applications. The proposed antenna also operates at 5.5 GHz for Wi-Fi and WLAN applications and in the 3G cellular communication band (1.90-1.98 GHz).
KEYWORDS
Peano, Koch, GHz, VSWR, FR4.

Cloud Computing: Issues And Risks Of Embracing The Cloud In A Business Environment
Shafat Khan, Himalayan University, India.
ABSTRACT
Cloud computing is a swiftly advancing paradigm that is drastically changing the way people use their PCs. Over the last couple of years, cloud computing has grown from a promising business idea into one of the fastest-growing segments of the IT industry. Despite the boom of the cloud and its numerous advantages, such as financial benefit, a rapidly elastic resource pool, and on-demand service, enterprise customers are still hesitant to deploy their business in the cloud, and the paradigm also creates challenges for both customers and providers. There are issues, such as unauthorized access, loss of privacy, data replication, and regulatory violation, that require adequate attention. A lack of appropriate solutions to such challenges may cause risks that outweigh the expected benefits of using the paradigm. To address the challenges and associated risks, a systematic risk management practice is necessary that guides customers in analyzing both the benefits and the risks related to cloud-based systems. The aim of this paper is to provide a better understanding of the design challenges of cloud computing and to identify important research directions in this regard, as this is an expanding area.
KEYWORDS
Cloud computing; Data center; Risks; Challenges; Security; Business.

Prediction Model of SCR Outlet NOx Based on LSTM Algorithm
Jiyu Chen, Feng Hong, Mingming Gao, Taihua Chang, Liying Xu, North China Electric Power University, China.
ABSTRACT
Pollutant emissions are strictly controlled in modern power plants, and nitrogen oxides (NOx) are the main contaminants in the exhaust gas. The Selective Catalytic Reduction (SCR) process is commonly used for denitration. To achieve effective control of the SCR outlet NOx concentration, an accurate outlet NOx concentration model is necessary. A model using historical data is proposed, and the long short-term memory (LSTM) algorithm is applied, which can describe relevance in time series. The accuracy of the proposed data-driven model is verified: the root mean square error (RMSE) and mean absolute percentage error (MAPE) are 0.706 mg/m3 and 1.99%, respectively, for the training set, and 1.44 mg/m3 and 2.90%, respectively, for the test set. The verification reveals that the accuracy of the data-driven model is acceptable for control system design.
KEYWORDS
LSTM, SCR, desulfurization and denitration, NOx content at outlet
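The two error measures reported in the abstract can be sketched as follows. This is a plain-Python illustration of the standard RMSE and MAPE definitions, assuming MAPE denotes mean absolute percentage error (the reported values carry percent units); the sample concentrations are invented and are not the paper's data.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the same units as the data (e.g. mg/m3)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical outlet NOx concentrations (mg/m3) and model predictions.
measured = [50.0, 40.0, 60.0]
modelled = [49.0, 42.0, 60.0]
print(rmse(measured, modelled), mape(measured, modelled))
```

RMSE penalizes large individual deviations more heavily, while MAPE expresses the average error relative to the measured level, so the two are usually reported together.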

IoT-Based Approach To Monitor Parking Space In Cities
Fatin Farhan Haque1, Weijia Zhou1, Jun-Shuo Ng2, Frank Walsh2, Kumar Yelamarthi1, Ahmed Abdelgawad1, 1Central Michigan University, USA, 2Waterford Institute of Technology Waterford, Ireland.
ABSTRACT
Internet of Things is the next big thing, as almost everything developed now makes extensive use of data, which is then used to obtain the daily statistics and usage of every individual. The work mainly consists of constructing a screen where the parking space will be shown; a camera module will be set up, and a PIR (Passive Infrared) sensor will be placed at the entrance to detect the entry of a car or any vehicle eligible to park in the lot. The vehicle will be scanned for its registration number in order to check whether the vehicle is registered to park or not. This also acts as security for the parking lot. Moreover, a sensor will be placed at each parking slot, through which the vacancy of each slot will be shown so that the user can determine the exact spot available. To complete the project, we will use a Raspberry Pi 3 with a camera module mounted on it; using TensorFlow and Node-RED, we will identify the car and its license number, and an infrared sensor will detect parking availability, which will be displayed on the screen.
KEYWORDS
IoT, Node-Red, Tensor Flow, smart, parking.