Accepted Papers

10th International Conference on Data Mining & Knowledge Management Process (DKMP 2022)

March 26-27, 2022, Sydney, Australia

Measurement of Software Development Effort Estimation Bias: Avoiding Biased Measures of Estimation Bias

Magne Jørgensen, Simula Metropolitan Center for Digital Engineering, Oslo, Norway

ABSTRACT

In this paper, we propose improvements in how estimation bias, e.g., the tendency towards under-estimating the effort, is measured. The proposed approach emphasizes the need to know what the estimates are meant to represent, i.e., the type of estimate we evaluate and the need for a match between the type of estimate given and the bias measure used. We show that even perfect estimates of the mean effort will not lead to an expectation of zero estimation bias when applying the frequently used bias measure: (actual effort – estimated effort)/actual effort. This measure will instead reward under-estimates of the mean effort. We also provide examples of bias measures that match estimates of the mean and the median effort, and argue that there are, in general, no practical bias measures for estimates of the most likely effort. The paper concludes with implications for the evaluation of bias of software development effort estimates.
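A small numeric illustration of why perfect estimates of the mean do not give zero expected bias under (actual effort - estimated effort)/actual effort: the expectation of that measure is 1 - m·E[1/A] for an estimate m, and by Jensen's inequality E[1/A] ≥ 1/E[A], so it is at most zero when m equals the mean. The Python sketch below assumes, purely for illustration, hypothetical actual efforts of 50, 100 and 150 hours with equal probability. The second measure, (actual - estimated)/estimated, is one example of a measure whose expectation is zero for a perfect mean estimate; it is an assumption here, not necessarily the measure the paper proposes.

```python
from statistics import harmonic_mean, mean

def bias_vs_actual(actuals, estimate):
    """Frequently used measure: mean of (actual - estimate) / actual."""
    return mean((a - estimate) / a for a in actuals)

def bias_vs_estimate(actuals, estimate):
    """Illustrative alternative: mean of (actual - estimate) / estimate."""
    return mean((a - estimate) / estimate for a in actuals)

# Hypothetical, equally likely actual efforts in hours (not data from the paper).
actuals = [50, 100, 150]
perfect_mean_estimate = mean(actuals)  # 100

print(bias_vs_actual(actuals, perfect_mean_estimate))    # about -0.222, not 0
print(bias_vs_estimate(actuals, perfect_mean_estimate))  # 0.0

# The estimate that drives the frequently used measure to zero is the
# harmonic mean (about 81.8 hours), i.e. an under-estimate of the mean effort.
h = harmonic_mean(actuals)
print(h, bias_vs_actual(actuals, h))  # about 81.82 and a value very close to 0
```

The last two lines show the sense in which the measure "rewards" under-estimates of the mean: the estimate that looks unbiased under it lies below the true mean.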

KEYWORDS

Effort estimates, measurement of estimation overrun, proper measurement.

Towards Maintainable Platform Software - Delivery Cost Control in Continuous Software Development

Ning Luo and Yue Xiong, Visual Computing Group, Intel Asia-Pacific Research & Development Ltd, Shanghai, China

ABSTRACT

The delivery cost of modern platform software increases rapidly, as such software usually needs to align with the time-to-market (TTM) schedules of many hardware and silicon products, keeps evolving in features, and involves hundreds of engineers. In this paper, taking one ultra-large-scale software project, the Intel Media Driver, as an example, we analyze the hotspots that lead to increased delivery cost in continuous software development, the challenges they pose to our software design, and our experience in reducing software delivery cost through targeted design enhancements. We expect that the identified hotspots can help more researchers form corresponding research agendas and that the experience shared can help other practitioners apply similar enhancements.

KEYWORDS

Software Delivery Cost Control, Predictable Software Evolvement, Streamlined Parallel Development, Continuous Integration.

Enhanced Grey Box Fuzzing for Intel Media Driver

Linlin Zhang and Ning Luo, Visual Computing Group, Intel Asia-Pacific Research & Development Ltd, Shanghai, China

ABSTRACT

Grey box fuzzing is one of the most successful methods for automatic vulnerability detection. However, conventional grey box fuzzers like AFL can only perform fuzzing against the whole input and tend to spend more time on smaller seeds with lower execution time, which greatly impacts fuzzing efficiency for complicated input types. In this work, we introduce MediaFuzzer, an intelligent grey box fuzzer for the Intel Media Driver that can perform effective fuzzing on selected fields of complicated input. In addition, with a novel calling-depth-based power schedule biased toward seeds that lead to deeper call chains, it dramatically improves vulnerability exposure (~6.6 times more issues exposed) and fuzzing efficiency (~2.7 times more efficient) compared with baseline AFL on the Intel Media Driver, with almost negligible overhead.
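The two ideas in this abstract, mutating only selected fields of a structured input and giving more energy (mutations) to seeds that reach deeper call chains, can be sketched in Python as follows. The linear energy formula, the bit-flip mutator, the `Seed` type and the field ranges are hypothetical illustrations, not MediaFuzzer's actual schedule or AFL's code.

```python
import random
from dataclasses import dataclass

@dataclass
class Seed:
    data: bytes
    call_depth: int  # deepest call-chain depth observed when this seed ran

def depth_energy(seed: Seed, max_depth: int, base_energy: int = 16) -> int:
    """Hypothetical power schedule: seeds reaching deeper call chains get up
    to twice the base number of mutations."""
    if max_depth <= 0:
        return base_energy
    return round(base_energy * (1 + seed.call_depth / max_depth))

def mutate_selected_fields(data: bytes, fields, rng: random.Random) -> bytes:
    """Flip one random bit inside one of the selected [start, end) byte ranges,
    leaving every other field of the structured input untouched."""
    buf = bytearray(data)
    start, end = rng.choice(fields)
    pos = rng.randrange(start, end)
    buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)

def fuzz_round(corpus, fields, rng):
    """Yield mutated inputs, spending more energy on deeper seeds."""
    max_depth = max(s.call_depth for s in corpus)
    for seed in corpus:
        for _ in range(depth_energy(seed, max_depth)):
            yield mutate_selected_fields(seed.data, fields, rng)
```

In a real fuzzer the generated inputs would be executed against the target and the call depth re-measured; that harness is omitted here.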

KEYWORDS

vulnerability detection, automated testing, fuzzing, Grey box fuzzer.

Build Automation Tools for Software Development

Mridula Prakash, Department of Chief Technology Officer, Mysore, India

ABSTRACT

As the pace of design and development of new software picks up, automated processes are playing an increasingly vital role in ensuring seamless and continuous integration. With software build automation tools taking center stage, this paper undertakes a comparative analysis of three available solutions, Maven, Gradle and Bazel, evaluating their efficiency and performance in software build automation and deployment. The study also aims to provide the reader with a complete overview of the selected build automation tools and their relevant features and capabilities. In addition, the paper offers a broader view of the future of the build automation tools ecosystem.

KEYWORDS

Automated process, Build automation tools, Maven, Gradle, Bazel.

Know How of AI, Machine Learning, Deep Learning and Big Data Analysis

Yew Kee Wong, School of Information Engineering, HuangHuai University, Henan, China

ABSTRACT

In the information era, enormous amounts of data have become available to decision makers. Big data refers to datasets that are not only big, but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques. Due to the rapid growth of such data, solutions need to be studied and provided in order to handle and extract value and knowledge from these datasets. Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. Such minimal human intervention can be achieved by applying machine learning, including advanced deep learning techniques, to big data. This paper aims to analyse some of the different machine learning and deep learning algorithms and methods, as well as the opportunities provided by AI applications in various decision-making domains.

KEYWORDS

Artificial Intelligence, Machine Learning, Deep Learning, Big Data.

An IR-based QA System for Impact of Social Determinants of Health on Covid-19

Priyanka Addagudi and Wendy MacCaull, Department of Computer Science, St. Francis Xavier University, Canada

ABSTRACT

Question Answering (QA), a branch of Natural Language Processing (NLP), automates the retrieval of answers to natural language questions from databases or documents without human intervention. Motivated by the COVID-19 pandemic and the increasing awareness of Social Determinants of Health (SDoH), we built a prototype QA system that combines NLP, semantics, and IR systems, with a focus on SDoH and COVID-19. Our goal was to demonstrate how such technologies could be leveraged to allow decision-makers to retrieve answers to queries from very large databases of documents. We used documents from the CORD-19 and PubMed datasets, merged the COVID-19 ontology (CODO) with published ontologies for homelessness and gender, and used the mean average precision metric to evaluate the system. Given the interdisciplinary nature of this research, we provide details of the methodologies used. We anticipate that QA systems can play a significant role in providing information leading to improved health outcomes.
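Mean average precision, the metric used to evaluate the system, can be computed as in the following Python sketch. It uses the standard definition, in which each question's average precision is normalized by the number of relevant documents; the document IDs are hypothetical, and the authors' relevance judgements are not reproduced here.

```python
def average_precision(ranked, relevant) -> float:
    """AP for one question: mean of precision@k over the ranks k of relevant hits."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)

def mean_average_precision(runs) -> float:
    """MAP over (ranked_results, relevant_ids) pairs, one pair per question."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical document IDs, not taken from CORD-19 or PubMed.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
print(mean_average_precision(runs))  # about 0.667
```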

KEYWORDS

Question Answering, Ontology, Information Retrieval, Social Determinants of Health, COVID-19.

An Evaluation Dataset for Legal Word Embedding: A Case Study on Chinese Codex

Chun-Hsien Lin and Pu-Jen Cheng, Department of Computer Science & Information Engineering, National Taiwan University, Taipei, Taiwan

ABSTRACT

Currently, the most common approach to unsupervised encoding of words as vectors is the word embedding model. Converting the vocabulary of legal documents into a word embedding model makes it possible to apply machine learning, deep learning, and other algorithms to legal documents and subsequently to perform downstream natural language processing tasks such as document classification, contract review, and machine translation. The most common and practical approach to evaluating the accuracy of a word embedding model uses a benchmark set built on linguistic rules or relationships between words to perform analogical reasoning via algebraic calculation. This paper proposes establishing a Legal Analogical Reasoning Questions Set (LARQS) of 1,134 questions, built from the 2,388 Chinese Codex corpus using five kinds of legal relations, which is then used to evaluate the accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be ubiquitous in the word embedding model.
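Analogy-based evaluation of this kind is commonly implemented with the 3CosAdd rule: to answer "a is to b as c is to ?", return the word whose vector is closest, by cosine similarity, to b - a + c. The Python sketch below assumes that rule; the terms and 2-D vectors are hypothetical, and the paper's exact scoring of LARQS may differ.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with 3CosAdd: nearest word to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

def analogy_accuracy(questions, vectors):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(a, b, c, vectors) == d for a, b, c, d in questions)
    return correct / len(questions)

# Toy 2-D vectors for hypothetical legal terms; real models use hundreds of dimensions.
vectors = {
    "buyer": [1.0, 0.0], "seller": [1.0, 1.0],
    "lessee": [3.0, 0.0], "lessor": [3.0, 1.0],
    "contract": [0.0, 3.0],
}
print(solve_analogy("buyer", "seller", "lessee", vectors))  # lessor
```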

KEYWORDS

Legal Word Embedding, Chinese Word Embedding, Word Embedding Benchmark, Legal Term Categories.

ESGBERT: Language Model to Help With Classification Tasks Related to Companies’ Environmental, Social, and Governance Practices

Srishti Mehra*, Robert Louka*, Yixun Zhang, University of California, Berkeley School of Information

ABSTRACT

Environmental, Social, and Governance (ESG) are non-financial factors that are garnering attention from investors as they increasingly look to apply them as part of their analysis to identify material risks and growth opportunities. Some of this attention is also driven by clients who, now more aware than ever, are demanding that their money be managed and invested responsibly. As interest in ESG grows, so does the need for investors to have access to consumable ESG information. Since most of it is in text form in reports, disclosures, press releases, and 10-Q filings, we see a need for sophisticated NLP techniques for classification tasks on ESG text. We hypothesize that an ESG domain-specific pre-trained model will help with such tasks, and in this paper we study how to build one. We explored doing this by fine-tuning BERT’s pre-trained weights using ESG-specific text and then further fine-tuning the model for a classification task. We were able to achieve better accuracy than the original BERT and baseline models in environment-specific classification tasks.

KEYWORDS

ESG, NLP, BERT, Universal Sentence Encoder, Deep Averaging Network.

Language and its Role in Bridging International Communication

Georgiana Silvia Leotescu, Faculty of Letters, University of Craiova, Craiova, Romania

ABSTRACT

The English language is in a continuous process of advancement as society evolves and people make distance irrelevant by communicating with each other in various ways. It is unquestionable that the development of technology has substantially improved the English language. Not only was it enriched, but it also became a donor language for many languages around the world. Words changed both their meaning and their grammatical category. A simple noun such as text became the verb to text with the development of mobile phones. The word “google” started as the name of a search engine and developed into a transitive verb. The name of a company, Uber, was also converted into a verb and, further, into a concept: uberization. Moreover, the combination of two basic words, face and book, created a word that began to be used worldwide.

KEYWORDS

Language, Technology, Advancement, Influence, Semantic expansion.

SSL/TLS Encrypted Traffic Application Layer Protocol and Service Classification

Kunhao Li, Bo Lang, Hongyu Liu and Shaojie Chen, State Key Laboratory of Software Development Environment, Beijing, China

ABSTRACT

Network traffic protocol and service classification is the foundation of network quality of service (QoS) and security technologies and has attracted increasing attention in recent years. At present, encryption technologies such as SSL/TLS are widely used in network transmission, so traditional traffic classification technologies cannot analyze encrypted packet payloads. To address this problem, this paper first proposes a two-level application layer protocol classification model that combines packet and session information. The first level extracts packet features, such as the entropy and randomness of the ciphertext, and then classifies the protocol. The second level treats the session as a unit and determines the final classification result by voting on the results of the first level. Many application layer protocols correspond to only one specific service, but HTTPS is used for many services. For the HTTPS service classification problem, we combine session features and packet features and establish a service identification model based on CNN-LSTM. We construct a dataset in a laboratory environment. The experimental results show that the proposed method achieves 99.679% and 96.27% accuracy in SSL/TLS application layer protocol classification and HTTPS service classification, respectively, and that the service classification model performs better than other existing methods.
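The two levels described above can be sketched in Python: a per-packet feature such as the byte entropy of the payload (ciphertext tends toward 8 bits per byte), and a session-level majority vote over per-packet predictions. The per-packet classifier itself is omitted and the protocol labels are hypothetical; this illustrates the idea, not the authors' feature set or model.

```python
import math
from collections import Counter

def byte_entropy(payload: bytes) -> float:
    """Shannon entropy in bits per byte, between 0 and 8."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())

def session_label(packet_labels):
    """Second level: majority vote over per-packet predictions in one session.
    Ties go to the label seen first (Counter.most_common keeps that order)."""
    return Counter(packet_labels).most_common(1)[0][0]

print(byte_entropy(bytes(range(256))))                 # 8.0
print(session_label(["https", "smtps", "https"]))      # https
```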

KEYWORDS

SSL/TLS, HTTPS, Protocol Classification, Service Classification.

Virtualised Ecosystem to Envisage Numerous Applications on an Automotive Microcontroller

Meghashyam Ashwathnarayan, Vaishnavi J, Ananth Kamath and Jayakrishna Guddeti, Infineon Technologies India Pvt Ltd, 11 MG Road, Bengaluru, Karnataka

ABSTRACT

In automotive electronics, new technologies are being integrated into the basic framework, paving the way for new software-defined architectures. Virtualization is one of the most discussed technologies and will offer functional growth in automotive architecture. This paper introduces the concept of validating test cases from multiple IPs on a virtualised framework and also investigates the feasibility of implementing a protection mechanism on the memory segment dedicated to a virtual machine (VM). We describe a proof of concept that can be used to promote the use of virtualisation to extend the coverage of post-silicon validation. Experimental results are presented as a quantitative evaluation of using virtualization for different test case scenarios.

KEYWORDS

Virtualisation, Automotive, Multi-Core Systems, Hypervisor, Post Silicon Validation.

Deep Learning Frameworks Evaluation for Image Classification on Resource Constrained Device

Mathieu Febvay and Ahmed Bounekkar, Université de Lyon, Lyon 2, ERIC UR 3083, F69676 Bron Cedex, France

ABSTRACT

Each new generation of smartphones gains capabilities that increase performance and power efficiency, allowing us to use them for increasingly complex calculations such as deep learning. In this paper, four Android deep learning inference frameworks (TFLite, MNN, NCNN and PyTorch) were used to evaluate the most recent generation of Systems on a Chip (SoCs): Samsung Exynos 2100, Qualcomm Snapdragon 865+ and 865. Our work focused on the image classification task using five state-of-the-art models. Inference was run on the 50,000 images of the ImageNet 2012 validation subset. Latency and accuracy were measured in various scenarios, such as CPU, OpenCL and Vulkan, with and without multi-threading. Power efficiency and a real-world use case were then evaluated by running the same experiment on each device's camera stream until it had consumed 3% of its battery. Our results show that low-level software optimizations, image pre-processing algorithms, the conversion process and cooling design have an impact on latency, accuracy and energy efficiency.

KEYWORDS

Deep Learning, On-device inference, Image classification, Mobile, Quantized Models.

Mutual Inlining: An Inlining Algorithm to Reduce the Executable Size

Yosi Ben-Asher, Western Digital Tefen and The University of Haifa CS, Nidal Faour, Western Digital Tefen, Ofer Shinaar, Western Digital Tefen

ABSTRACT

We consider the problem of selecting an optimized subset of inlinings (replacing a call to a function by its body) that minimizes the resulting code size. In embedded systems, the program's executable file frequently must fit into a small memory. In such cases, the compiler should generate executables that are as small as possible. In particular, we seek to improve on the code size obtained by the LLVM inliner executed with the -Oz option. One important aspect is whether this problem requires a global solution that considers the full span of the call graph, or a local solution (as is the case with the LLVM inliner) that decides whether to apply inlining to each call separately based on the expected code-size improvement. We have implemented a global inlining algorithm called Mutual Inlining that selects the next call site (f() calls g()) to be inlined based on its global properties, namely:

Leveraging Open Amp in Embedded Mixed Critical Systems

Mridula Prakash, Department of Chief Technology Officer, Mysore, India

ABSTRACT

The aim of this paper is to provide details on the Open Asymmetric Multi-Processing (OpenAMP) framework in mixed critical systems. OpenAMP is an open-source software framework that provides software components for working with asymmetric multiprocessing (AMP) systems. The paper provides in-depth details on how to use OpenAMP in multicore systems when designing safety-critical projects.

KEYWORDS

OpenAMP, Multicore, Mixed Critical, & Embedded Systems.

Key Learnings from Pre-Silicon Safety Compliant Bootrom Firmware Development

Chidambaram Baskaran, Pawan Nayak, R. Manoj, Sampath Shantanu and Karuppiah Aravindhan, Texas Instruments India Ltd, Bangalore, India

ABSTRACT

Safety needs of real-time embedded devices are becoming a must in automotive and industrial markets. Because the BootROM firmware is part of the device, it must adhere to the safety standards required by these end markets. Most software practices for safety compliance assume that software is developed once the devices are available. The BootROM firmware development discussed in this paper involves meeting safety compliance needs while the device on which the firmware is to execute is being designed concurrently. In this case, the firmware is developed primarily on pre-silicon development environments, which are slow and to which developers have limited access. These aspects present a unique challenge to developing safety-compliant BootROM firmware. Hence, it is important to understand the challenges and identify the right methodology for ensuring that the firmware meets safety compliance with the right level of efficiency. In this paper, the authors share their learnings from three safety-compliant BootROM firmware developments and propose an iterative development flow in which safety artifacts are also generated iteratively. Developing firmware concurrently with the device design may make iterative development sound risky, and one may wonder whether it leads to more effort, but the learnings suggest that iterative development is ideal. So far, none of the three BootROM firmware developments has resulted in a critical bug that required another firmware update and refabrication of the device.

KEYWORDS

Concurrent development, Firmware development, Safety compliance, Pre-silicon software development.

Investigating Cargo Loss in Logistics Systems using Low-Cost Impact Sensors

Prasang Gupta, Antoinette Young and Anand Rao, AI and Emerging Technologies, PwC

ABSTRACT

Cargo loss and damage is a very common problem faced by almost any business with a supply chain arm, leading to major problems such as revenue loss and reputational damage. This problem can be addressed by employing an asset and impact tracking solution. However, because of the high cost of the sensors and of the overall solution, such tracking is more practical and effective for high-cost cargo than for low-cost cargo. In this study, we propose a low-cost solution architecture that is scalable, user-friendly and easy to adopt, and that is viable for a large range of cargo and logistics systems. Taking inspiration from a real-life use case we solved for a client, we also provide insights into the architecture as well as the design decisions that make this a reality.

KEYWORDS

Asset tracking, Logistics, Cargo loss, Cargo damage, Impact sensor, Accelerometer sensor, Low-cost solution, No code AEP (Application Enablement Platform).

Decentralized Transaction Mechanism Based on Smart Contracts

Rishabh Garg, Department of Electrical & Electronics Engineering, Birla Institute of Technology & Science, K.K. Birla Goa Campus, India

ABSTRACT

Blockchain offers the possibility of replacing outdated identity systems and eliminating intermediaries. Identity management through blockchain can allow individuals to own their identity by creating a global ID that serves multiple purposes. For user security and ledger consistency, asymmetric cryptography and distributed consensus algorithms can be employed. By virtue of key features such as decentralization, persistency, anonymity and auditability, blockchain technology would reduce cost and increase efficiency. Further, a digital identity platform would benefit citizens by allowing them to save time when accessing or providing their personal data and records. Instead of having to appear in person to produce a physical form of ID, users could be given a digital ID on a personal device, such as a smartphone, that can be shared with services conveniently and securely through a distributed ledger technology (DLT).

KEYWORDS

Blockchain, Decentralized Apps, Data Portability, Decentralized Public Key Infrastructure (DPKI), DID, Ethereum, Hash, IAM framework, Identity Management System (IMS), IPFS, Private Key, Public Key, Revocation, SSI, Storage Variables, Validation, Zero Knowledge Proof.

Original Appropriation in the Bitcoin Mining Process

João Victor Barcellos Machado Correia, Law Department, Vale do Cricaré University Center (UVC), São Mateus, Brazil

ABSTRACT

When analyzing the bitcoin mining process, it is first necessary to understand whether the bitcoin protocol truly guarantees ownership of the cryptocurrency against everyone. In this sense, starting from Kantian theory, it is clear that the factual arrangement of the technology does not guarantee ownership, which can only be guaranteed through the state. Moreover, there are practical cases of protocol failure. Thus, given the immutability of the blockchain protocol, the problem of original appropriation in the bitcoin mining process remains to be solved legally, which is done through the theory of cryptographic fruits. Broadly speaking, the mined bitcoin should be viewed as a new kind of civil fruit of the mining device. As a result, by owning the mining machine, one ends up owning what it mines. Overall, the theoretical structure of cryptographic fruits addresses the legal flaws of the bitcoin protocol.

KEYWORDS

Original Appropriation, Bitcoin Mining Process, Cryptocurrency, Blockchain, Cryptographic Fruits.

Understanding the Effect of IoT Adoption on the Behavior of Firms: An Agent-Based Model

Riccardo Occa and Francesco Bertolotti, LIUC – Università Cattaneo, Corso G. Matteotti 22, Castellanza (VA), Italy

ABSTRACT

In the context of the increasing diffusion of technologies related to the world of Industry 4.0 and the Internet of Things in particular, we have developed an agent-based model to simulate the effect of IoT diffusion in companies and verify potential benefits and risks. The model shows how IoT diffusion has the potential to influence the market by supporting both quality and cost improvements. The results of the model also confirm the potential for significant benefits for businesses, suggesting the opportunity to support the introduction and application of IoT, and clearly show how the use of IoT can be a key strategic choice in competitive market contexts focused on cost strategies to increase business performance and prospects.

KEYWORDS

IoT, agent-based modelling, simulation, adoption, risk, blockchain.

Blockchain Enabled Diabetic Patients’ Data Sharing and Real Time Monitoring

Dodo Khan, Low Tan Jung, Manzoor Ahmed Hashmani and Moke Kwai Cheong, Department of Computer and Information Science, Universiti Teknologi PETRONAS (UTP), Seri Iskandar 32610, Malaysia

ABSTRACT

According to the World Health Organization's global report on diabetes, the number of diabetic patients has surged from 108 million in the 1980s to 422 million in 2014. According to researchers, the numbers will continue to climb in the coming decades. Diabetes is a disease that requires long-term self-care and close monitoring to be brought under control. A lack of continuous monitoring, in which blood glucose levels are tested and checked at an appropriate frequency by medical personnel, is highly likely to jeopardize patients' daily lives. Well-managed continuous monitoring of patients' blood sugar levels can save lives. This paper proposes a blockchain-based platform that connects patients, healthcare practitioners (HPs), and caregivers for continuous monitoring and care of diabetic patients. It lets patients connect securely to HPs for remote patient monitoring (telemedicine), while preserving patient data privacy using blockchain technology. IoT sensors are used to read sugar levels and store the data in a tamper-proof, immutable ledger (Hyperledger). The platform provides end-to-end movement of patient data, that is, from the point where it is generated (the sensors) to the point where it reaches the HP. It gives patients a control-and-track function to maintain and track data movement. A unique feature is that patients can keep track of their private data and choose whom to share it with, for how long, and for what reason. The platform is developed in two stages. Initially, the concept is implemented using Hyperledger Fabric. Then, a blockchain based on a novel Proof-of-Review (PoR) consensus model is incorporated to provide efficient performance and scalability in Hyperledger Fabric.
Essentially, the proposed platform aims to alleviate the pain points of traditional healthcare systems in information exchange, data security, and privacy maintenance for real-time diabetic patient monitoring.
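The tamper-evidence that an immutable ledger provides rests on hash chaining: each block stores the hash of its predecessor, so altering any stored reading invalidates the chain. The Python sketch below shows only that generic idea; it does not use the Hyperledger Fabric API or model the Proof-of-Review consensus, and the record fields and values are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash and a canonical JSON encoding of the record."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain: list, record: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": block_hash(prev, record)})

def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited record or broken link returns False."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(prev, block["record"]):
            return False
        prev = block["hash"]
    return True

# Hypothetical glucose readings from an IoT sensor.
ledger = []
append_block(ledger, {"patient": "P-001", "glucose_mg_dl": 112})
append_block(ledger, {"patient": "P-001", "glucose_mg_dl": 135})
print(verify_chain(ledger))                # True
ledger[0]["record"]["glucose_mg_dl"] = 90  # tamper with a stored reading
print(verify_chain(ledger))                # False
```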

KEYWORDS

Blockchain, Consensus Protocols, Secure Data Movement, Real-time Monitoring, Hyperledger.

Improved Lossless Image Compression and Processing

Manga, I.¹, Garba, E. J.² and Ahmadu, A. S., ¹Department of Computer, Adamawa State University, Mubi, Nigeria, ²Department of Computer Science, Modibbo Adama University, Yola, Nigeria

ABSTRACT

The growth and development of modern information and communication technologies has caused the demand for data compression to increase rapidly. Recent developments in computer science and information technology have led to the continual generation of large amounts of data. Data compression is an important aspect of information processing. Data that can be compressed include image data, video data, textual data and audio data. Image compression refers to the process of representing an image using fewer bits. There are two basic types of data compression: lossless and lossy. The major aim of lossless image compression is to reduce the redundancy of image data for more efficient storage and transmission. Lossy compression achieves a high compression ratio at the cost of some loss of image quality. However, there are many cases where the loss of image quality or information due to compression needs to be avoided, such as medical, artistic and scientific images. In such cases, efficient lossless compression becomes paramount, although lossy compressed images are usually satisfactory in many other cases. The objectives of the research are to explore existing lossless image compression algorithms, to design an efficient and effective lossless image compression technique based on LZW-BCH lossless image compression to reduce redundancies in the image, and to demonstrate image enhancement using a Gaussian filter algorithm. A secondary method of data collection was used, and standard research images were used to validate the new scheme. To achieve these objectives, the compression scheme was developed in Java using JDK 8.0, and MATLAB was used to analyze the space and time complexity of the existing compression scheme against the enhanced scheme.
The findings show that the enhanced lossless image compression scheme achieved an average compression ratio of 1.6489 and an average of 5.416667 bits per pixel.
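LZW, the dictionary coder the enhanced scheme builds on, and the two reported metrics can be sketched in Python as follows. This is plain LZW only: the BCH stage and the Gaussian-filter enhancement are not reproduced, and counting compressed bits with one fixed code width is a simplifying assumption.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Classic LZW: start with all 256 single-byte strings and grow the table."""
    table = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            codes.append(table[w])
            table[wc] = len(table)
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes

def lzw_decode(codes: list[int]) -> bytes:
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):  # code not yet in the table: the cScSc case
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)

def compression_stats(pixels: bytes, codes: list[int]) -> tuple[float, float]:
    """Compression ratio and bits per pixel for 8-bit pixels, assuming every
    code is stored with one fixed width large enough for the final table."""
    width = max(9, (256 + len(codes)).bit_length())
    compressed_bits = len(codes) * width
    ratio = (len(pixels) * 8) / compressed_bits
    bpp = compressed_bits / len(pixels)
    return ratio, bpp

sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(sample)
print(len(codes), compression_stats(sample, codes))  # 16 codes, ratio ~1.33, 6.0 bpp
```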

KEYWORDS

Lossless, Image, Compression, Processing.

An Object-Driven Collision Detection with 2D Cameras using Artificial Intelligence and Computer Vision

Yang Liu¹, Evan Gunnell², Yu Sun², Hao Zheng³, ¹Department of Mechanical and Aerospace Engineering, George Washington University, Washington, DC 20052, ²California State Polytechnic University, Pomona, CA, 91768, ³ASML, Wilton, CT 06897

ABSTRACT

Autonomous driving is one of the most popular technologies in artificial intelligence. Collision detection is an important issue in autonomous driving because it directly affects safety. Many collision detection methods have been proposed, but they all have certain limitations and cannot meet the requirements of autonomous driving. Cameras are among the most popular means of detecting objects. Current camera-based obstacle detection mostly relies on two or more cameras (binocular technology) or on cameras used in conjunction with other sensors (such as a depth camera) to measure distance. In this paper, we propose an algorithm to detect obstacle distances from photos or videos taken by a single camera.
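One common way to estimate distance from a single camera, shown here only as an illustration and not necessarily the algorithm the authors propose, is the pinhole-camera relation between an object's known real-world height and its height in pixels. The Python sketch below assumes a calibrated focal length in pixels and a detector that supplies bounding boxes; the time-to-collision helper and all numbers are hypothetical.

```python
def distance_from_height(focal_px: float, real_height_m: float, bbox_height_px: float) -> float:
    """Pinhole estimate: distance = focal length (px) * real height (m) / image height (px)."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_px * real_height_m / bbox_height_px

def time_to_collision(d_prev_m: float, d_now_m: float, dt_s: float) -> float:
    """Seconds until contact if the closing speed stays constant; inf if not closing."""
    closing_speed = (d_prev_m - d_now_m) / dt_s
    return d_now_m / closing_speed if closing_speed > 0 else float("inf")

# Hypothetical numbers: 1000 px focal length, a 1.5 m tall car, frames 0.5 s apart.
d1 = distance_from_height(1000, 1.5, 100)  # 15.0 m
d2 = distance_from_height(1000, 1.5, 120)  # 12.5 m
print(time_to_collision(d1, d2, 0.5))       # 2.5
```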

KEYWORDS

Autonomous driving, computer vision, machine learning, artificial intelligence, distance detection, collision detection.

A Novel Mask R-CNN Model to Segment Heterogeneous Brain Tumors through Image Subtraction

Sanskriti Singh, BASIS Independent Silicon Valley, San Jose, USA

ABSTRACT

The segmentation of diseases is a popular topic explored by researchers in the field of machine learning. Brain tumors are extremely dangerous and require the utmost precision to segment for a successful surgery. Patients with tumors usually undergo four MRI scans, T1, T1gd, T2, and FLAIR, which are then sent to radiologists to segment and analyze for possible future surgery. A second segmentation would help both radiologists and patients be more confident in their conclusions. We propose taking a method used by radiologists, called image subtraction, and applying it to machine learning models to provide a better segmentation. Using Mask R-CNN, with its ResNet backbone pre-trained on the RSNA Pneumonia Detection Challenge dataset, we train a model on the BraTS2020 Brain Tumor dataset. The Center for Biomedical Image Computing & Analytics provides MRI data on patients with and without brain tumors and the corresponding segmentations. We assess how well image subtraction works by comparing our model with models trained without image subtraction, using the DICE coefficient (F1 score), recall, and precision on the untouched test set. Our model achieved a DICE coefficient of 0.75, compared with 0.69 without image subtraction. To further emphasize the usefulness of image subtraction, we compare our final model with current state-of-the-art models for segmenting tumors from MRI scans.
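The three evaluation metrics can be computed from binary tumor masks as in this NumPy sketch, which also makes explicit that the DICE coefficient equals the F1 score, 2PR/(P + R). The masks are toy examples, the convention for two empty masks is an assumption, and the image-subtraction preprocessing itself is not shown.

```python
import numpy as np

def segmentation_scores(pred, truth):
    """DICE (= F1), precision and recall for binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    denom = 2 * tp + fp + fn
    dice = 2 * tp / denom if denom else 1.0  # assumed: two empty masks agree perfectly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(dice), float(precision), float(recall)

# Toy 3x3 masks: 3 true positives, 1 false positive, 1 false negative.
truth = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
pred = [[1, 1, 0], [1, 0, 1], [0, 0, 0]]
print(segmentation_scores(pred, truth))  # (0.75, 0.75, 0.75)
```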

KEYWORDS

computer vision, healthcare, MRCNN.

Brain Tumor Grading Prediction and Brain Health Management based on Multi-Physiological Data Fusion

Xiangrong Han¹, Binhui Wang¹, Guoliang Jiao¹, Liang Wen²,³ and Zhenni Li¹*, ¹College of Information Science and Engineering, Northeastern University, Shenyang 110819, China, ²Neurosurgery Department, General Hospital of Northern Theater Command, ³General Hospital of Northern Theater Command, China Medical School

ABSTRACT

Prediction of brain tumor grade is the only way to obtain the brain tumor grade without a biopsy, and it is a reliable way to help prolong patients' lives. The authors propose an effective PTGP model that predicts the grade of a brain tumor from segmented MRI images, offering another research path in medical image recognition. Using the segmented RGB color images together with the patients' physiological data (age), a classification method divides tumors into high-grade and low-grade gliomas with a prediction success rate of 98%. Finally, the least squares method is used to predict patients' postoperative survival period, and a reasonable brain health management method is proposed.
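The least squares step can be illustrated with NumPy's `np.linalg.lstsq`. The sketch fits a linear model with an intercept; the values in the usage example are synthetic numbers generated from an invented linear rule, not clinical data, and the paper's actual predictors are not specified in the abstract.

```python
import numpy as np

def design_matrix(features):
    """Prepend an intercept column of ones to one or more feature columns."""
    x = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(x)), x])

def fit_least_squares(features, targets):
    """Ordinary least squares; returns [intercept, slope(s)]."""
    coef, *_ = np.linalg.lstsq(design_matrix(features), np.asarray(targets, dtype=float), rcond=None)
    return coef

def predict(coef, features):
    return design_matrix(features) @ coef

# Synthetic illustration only: targets follow an invented rule, 1000 - 10 * age.
ages = [30, 40, 50, 60]
survival_days = [1000 - 10 * a for a in ages]
coef = fit_least_squares(ages, survival_days)
print(coef)                 # close to [1000, -10]
print(predict(coef, [45]))  # close to [550]
```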

KEYWORDS

Prediction of brain tumor grade, PTGP model, medical image, physiological data, brain health management.

Unsupervised Blind Image Quality Assessment based on Multi-Feature Fusion

Qinglin He, Chao Yang and Ping An, School of Communication and Information Engineering, Shanghai University, Shanghai, China

ABSTRACT

Image quality affects the visual experience of observers, and how to evaluate it accurately has been widely studied. Unsupervised blind image quality assessment (BIQA) requires less prior knowledge than supervised BIQA. In addition, most existing BIQA methods trade accuracy against complexity. In this paper, we propose an unsupervised BIQA framework that aims for both high accuracy and low complexity. To represent image structure information, we employ Phase Congruency (PC) and the image gradient. We then calculate the mean subtracted and contrast normalized (MSCN) coefficients and the Karhunen-Loève transform (KLT) coefficients to represent the naturalness of the images. Finally, features extracted from both the pristine and the distorted images are used to calculate image quality with a Multivariate Gaussian (MVG) model. Experiments conducted on six IQA databases demonstrate that the proposed method performs better than state-of-the-art BIQA methods.
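The quality score compares an MVG model of pristine-image features with one fitted to the distorted image. The sketch below uses the NIQE-style distance between two MVG fits, D = sqrt((μ1 − μ2)ᵀ ((Σ1 + Σ2)/2)⁻¹ (μ1 − μ2)). That this exact distance matches the paper's formula is an assumption, and the feature vectors (PC, gradient, MSCN, and KLT statistics) are taken as given rather than computed here.

```python
import numpy as np


def fit_mvg(features: np.ndarray):
    """Fit an MVG model: mean vector and covariance of row-wise feature vectors."""
    feats = np.asarray(features, dtype=float)
    return feats.mean(axis=0), np.cov(feats, rowvar=False)


def mvg_distance(pristine_feats: np.ndarray, distorted_feats: np.ndarray) -> float:
    """NIQE-style distance between two MVG fits; larger means lower predicted quality."""
    mu_p, cov_p = fit_mvg(pristine_feats)
    mu_d, cov_d = fit_mvg(distorted_feats)
    diff = mu_p - mu_d
    pooled = (cov_p + cov_d) / 2.0
    # pinv keeps the computation defined if the pooled covariance is singular
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))
```

Identical feature sets give a distance of zero, and shifting the distorted features away from the pristine mean increases it.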

KEYWORDS

Blind Image Quality Assessment (BIQA), Unsupervised Method, Natural Scene Statistics (NSS), Karhunen-Loève Transform (KLT).

An Efficient Image Compression & Decompression Algorithm based on DWT

Nandeesha R¹ and Dr. Somashekar K², ¹Research Scholar, Dept of ECE Research Centre, SJB Institute of Technology, Bengaluru-560060, Karnataka, India, ²Professor, Dept of ECE, SJB Institute of Technology, Bengaluru-560060, Karnataka, India

ABSTRACT

This study presents an efficient image compression and decompression algorithm that keeps only the approximation coefficients of a three-level Discrete Wavelet Transform decomposition (3D-DWT). Image compression methods are critical for storing and transferring digital data: without sacrificing picture quality, compression minimizes the number of bits required to describe an image. In this procedure, the scalar-quantized third-level approximation sub-band coefficients form the basis of an encoding technique that maximizes the compression ratio, and the image is reconstructed using the inverse processes with improved image quality. The method is tested on a variety of standard datasets and yields good peak signal-to-noise ratio (PSNR), compression ratio (CR), and structural similarity index (SSIM) values, demonstrating the benefits of this approach.
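The core idea, keeping only the third-level approximation sub-band and scalar-quantizing it, can be sketched with a Haar wavelet in NumPy. The Haar filter, the quantization step, and the helper names are assumptions (the abstract names neither the wavelet nor the step size), and the entropy coding that would produce an actual bit stream is omitted. Image sides must be divisible by 8 for three levels.

```python
import numpy as np


def haar_ll(img: np.ndarray) -> np.ndarray:
    """One level of the orthonormal 2D Haar transform, approximation (LL) band only."""
    return (img[0::2, 0::2] + img[0::2, 1::2] + img[1::2, 0::2] + img[1::2, 1::2]) / 2.0


def inverse_haar_ll(ll: np.ndarray) -> np.ndarray:
    """Inverse Haar step with all detail bands set to zero."""
    return np.repeat(np.repeat(ll, 2, axis=0), 2, axis=1) / 2.0


def compress(img: np.ndarray, levels: int = 3, step: float = 8.0) -> np.ndarray:
    """Keep the level-`levels` approximation band and quantize it uniformly."""
    ll = img.astype(float)
    for _ in range(levels):
        ll = haar_ll(ll)
    return np.round(ll / step).astype(np.int32)


def decompress(q: np.ndarray, levels: int = 3, step: float = 8.0) -> np.ndarray:
    """Dequantize, then apply the inverse transform `levels` times."""
    ll = q.astype(float) * step
    for _ in range(levels):
        ll = inverse_haar_ll(ll)
    return ll


def psnr(original: np.ndarray, restored: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (infinite for a perfect reconstruction)."""
    mse = np.mean((original.astype(float) - restored) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

With three levels, only (H/8) × (W/8) coefficients are stored, i.e. 1/64 of the pixel count before any entropy coding.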

KEYWORDS

Image compression, DWT, CR, SSIM, PSNR.

Gradient Different Weight Algorithm for SPECT Imaging Volume Estimation

Mohd Akmal Masud¹, Mohd Zamani Ngali² and Dr. Siti Amira Othman¹, ¹Faculty of Science and Technology, University Tun Hussein Onn Malaysia, Pagoh, Johor, Malaysia, ²Faculty of Mechanical and Manufacturing Engineering, University Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia

ABSTRACT

Single photon emission computed tomography (SPECT) is one of the nuclear medicine modalities used to detect cancer lesions and tumours. The low spatial resolution of SPECT image data makes it challenging to determine the dose required for lesion delineation. Robust techniques for automatic or semi-automatic segmentation of objects in SPECT are still under development. This paper describes a threshold-based approach that uses the gradient different weight method, implemented in MATLAB R2021b. The method computes a weight for each pixel of the grayscale image being processed. Evaluation is performed on the NEMA phantom to determine volume, activity concentration, and the Dice similarity coefficient. In addition, Monte Carlo simulated patient data are used to investigate the tumour volumes delineated by the four-method algorithm.
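The abstract does not give the weighting formula, so the following NumPy sketch only illustrates the general idea of weight-based thresholding for volume estimation: each voxel gets a weight that decays with its gray-level difference from the lesion's peak value, voxels above a weight cutoff form the lesion mask, and the volume is the voxel count times the voxel volume. The exponential weight, the parameter names, and the use of the maximum as reference value are hypothetical, not the authors' MATLAB implementation.

```python
import numpy as np


def gray_difference_weight(volume: np.ndarray, ref_value: float, scale: float) -> np.ndarray:
    """Hypothetical weight: 1 at the reference gray level, decaying exponentially with |difference|."""
    return np.exp(-np.abs(volume.astype(float) - ref_value) / scale)


def estimate_lesion_volume(volume: np.ndarray, voxel_ml: float,
                           scale_fraction: float = 0.3, weight_cutoff: float = 0.5):
    """Return (volume in mL, boolean mask) for voxels whose weight passes the cutoff."""
    ref = float(volume.max())
    if ref <= 0:
        return 0.0, np.zeros(volume.shape, dtype=bool)
    weights = gray_difference_weight(volume, ref, scale_fraction * ref)
    mask = weights >= weight_cutoff
    return float(mask.sum()) * voxel_ml, mask
```

With these defaults, a voxel passes the cutoff when its intensity is within about 21% of the peak (0.3 × ln 2 ≈ 0.21), which makes the weight cutoff behave like a relative threshold.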

KEYWORDS

SPECT, Gradient Different Weight, GrayDifferenceCutoff.

Adaptive Forgetting, Drafting and Comprehensive Guiding: Text-to-image Synthesis with Hierarchical Generative Adversarial Networks

Yuting Xue, Heng Zhou, Yuxuan Ding, Xiao Shan, School of Electronic Engineering, Xidian University, Xi’an, China

ABSTRACT

In this paper, we propose to boost text-to-image synthesis through Adaptive Learning and Generating Generative Adversarial Networks (ALG-GANs). First, we propose an adaptive forgetting mechanism in the generator to reduce error accumulation and learn knowledge flexibly in the cascade structure. Second, to avoid the mode collapse caused by strongly biased supervision, we propose a multi-task discriminator that uses weak-supervision information to guide the generator more comprehensively and to maintain semantic consistency throughout the cascade generation process. Third, to avoid the refinement difficulty caused by bad initialization, we judge the quality of the initialization before further processing: the generator re-samples the noise and re-initializes bad initializations to obtain good ones. All of these contributions are integrated into a unified framework, an adaptive forgetting, drafting and comprehensive guiding based text-to-image synthesis method with hierarchical generative adversarial networks. The model is evaluated on the Caltech-UCSD Birds 200 (CUB) dataset and the Oxford 102 Category Flowers (Oxford) dataset with standard metrics. The results on Inception Score (IS) and Fréchet Inception Distance (FID) show that our model outperforms previous methods.
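The re-initialization step can be read as a rejection loop: sample noise, draft an initial image, score it, and re-sample if the score is too low. A minimal Python sketch follows. `generate` and `judge` stand in for the first-stage generator and the initialization-quality judge, whose actual form the abstract does not specify, and the threshold and retry budget are assumptions.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

Image = TypeVar("Image")


def sample_good_initialization(
    generate: Callable[[List[float]], Image],
    judge: Callable[[Image], float],
    noise_dim: int,
    threshold: float,
    max_tries: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[Image, float]:
    """Re-sample noise until the judged quality reaches `threshold`.

    Falls back to the best-scoring draft if no sample passes within `max_tries`.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    rng = rng or random.Random()
    best = None
    for _ in range(max_tries):
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]   # fresh noise vector
        draft = generate(z)
        score = judge(draft)
        if score >= threshold:
            return draft, score                               # good initialization found
        if best is None or score > best[1]:
            best = (draft, score)
    return best
```

Returning the best draft instead of failing keeps the cascade running when the judge is strict; a real system might instead log how often the budget is exhausted.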

KEYWORDS

Text-to-Image Synthesis, Generative Adversarial Network, Forgetting Mechanism, Semantic Consistency.

Trust for Big Data usage in Cloud

Hazirah Bee Yusof Ali and Lili Marziana Abdullah, Kulliyyah of Information and Communication Technology (KICT), International Islamic University Malaysia, Kuala Lumpur, Malaysia

ABSTRACT

The emergence of big data and the cloud has changed the way companies conduct business. Business and leisure activities are increasingly performed in the cloud, and the Internet is too appealing for companies not to use. Likewise, many companies have emerged to offer solutions for analysing massive datasets, helping to deliver meaningful information to the public. Powerful data analysis tools coupled with the large storage capacity of the cloud enable companies to understand their business data and, consequently, to proceed with proper solutions. Regrettably, the benefits of big data and cloud usage come hand in hand with an abundance of security vulnerabilities: companies are hacked and data are stolen. The fear of being hacked and vandalized makes companies extra careful before they start storing or trusting their data in the cloud. This motivates the present research, in which the researcher validates, through quantitative data analysis, the trust factors identified through qualitative data analysis. The study focuses on the trust that companies in Malaysia place in big data and the cloud, since companies must trust the cloud before putting their data in the hands of a cloud provider.

KEYWORDS

Trust, Big Data, Cloud.

Detecting Parkinson Disease based on Voice Attributes with

Lucas Salvador Bernardo and Robertas Damaševičius, Department of Software Engineering, Kaunas University of Technology, Kaunas, Lithuania

ABSTRACT

Parkinson's disease is the second most widespread neurological disorder in the world. It affects approximately 2 to 3% of the world's population over 65 years of age. Part of the progression of Parkinson's disease is due to the loss of cells in a brain region called the Substantia Nigra (SN). Nerve cells in this region are responsible for improving the control of movement and coordination, and their loss results in the emergence of the motor symptoms characteristic of the disease. However, motor symptoms appear when brain cells are already damaged, whereas voice impairments appear before the brain cells are affected. This study aims to recognize Parkinson's disease using 22 attributes extracted from 195 voice records, 147 from Parkinson's disease patients and 48 from healthy individuals. The data pass through a series of pre-processing steps: balancing, in which the Synthetic Minority Oversampling Technique (SMOTE) equalizes the number of samples per class; training and test segmentation, in which the data are divided into 70% for training and 30% for testing; scaling of the data into the interval 0 to 1; amplification, in which the values are converted into the interval 0 to 100; and image generation, which converts the numerical dataset into an image dataset. The resulting image dataset is then used to train a Visual Geometry Group 11 (VGG11) Convolutional Neural Network (CNN). The proposed solution achieved 93.1% accuracy, a 92.31% F1-score, 96% recall, and 88.89% precision on the test dataset. These metrics were then compared with those of other Convolutional Neural Network solutions, such as ResNet34 and MobileNet, showing the improved performance of the proposed solution.
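The pre-processing chain can be sketched in NumPy as below. The SMOTE step uses the standard interpolation between a minority sample and one of its k nearest minority neighbours. The 5×5 zero-padded layout for image generation is a hypothetical choice (the abstract does not say how the 22 attributes become an image), the scaling bounds are meant to come from the training split, and VGG11 training is omitted.

```python
from typing import Optional

import numpy as np


def smote(minority: np.ndarray, n_new: int, k: int = 5,
          rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Synthesize n_new minority samples by interpolating toward k-nearest minority neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.asarray(minority, dtype=float)
    n = len(x)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    out = np.empty((n_new, x.shape[1]))
    for i in range(n_new):
        j = rng.integers(n)
        nb = x[rng.choice(neighbours[j])]
        out[i] = x[j] + rng.random() * (nb - x[j])  # random point on the segment x_j -> neighbour
    return out


def to_images(features: np.ndarray, lo: np.ndarray, hi: np.ndarray, side: int = 5) -> np.ndarray:
    """Scale to [0, 1], amplify to [0, 100], zero-pad, and reshape into side x side images."""
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = np.clip((features - lo) / span, 0.0, 1.0)
    amplified = scaled * 100.0
    pad = side * side - features.shape[1]
    if pad < 0:
        raise ValueError("side too small for the number of features")
    return np.pad(amplified, ((0, 0), (0, pad))).reshape(-1, side, side)
```

Because every synthetic sample is a convex combination of two minority samples, SMOTE never leaves the bounding box of the minority class.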

KEYWORDS

Parkinson, VGG11, SMOTE

Using Domain Knowledge for Low Resource Named Entity Recognition

Yuan Shi and Heyan Huang, School of Computer Science & Technology, Beijing Institute of Technology, Beijing, China

ABSTRACT

In recent years, named entity recognition has been a popular research topic in natural language processing. Traditional deep learning methods require a large amount of labeled data for model training, so they are not suitable for domains where labeling resources are scarce. In addition, existing cross-domain knowledge transfer methods need to adjust the entity labels for different domains, which increases the training cost. To solve these problems, and inspired by a processing method used in Chinese named entity recognition, we propose using domain knowledge to improve named entity recognition in low-resource domains. The domain knowledge we mainly use is a domain dictionary and domain labeled data: we use dictionary information for each word to strengthen its word embedding, and domain labeled data to reinforce recognition. The proposed model avoids large-scale data adjustments across domains while handling low-resource named entity recognition. Experiments demonstrate the effectiveness of our method, which achieves strong results on a dataset from the domain of scientific and technological equipment, with a significantly higher F1 score than many baseline methods.
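One common way to strengthen word embeddings with a domain dictionary, borrowed from Chinese NER lexicon methods, is to append flags marking whether each token begins (B), continues (M), ends (E), or alone forms (S) a dictionary entry. The sketch below assumes that scheme; the paper's exact feature design is not given in the abstract, and the tokens and dictionary entries in the usage are made-up examples.

```python
from typing import List, Sequence, Set, Tuple

# Positions a token can take inside a matched dictionary entry
TAGS = ("B", "M", "E", "S")


def dictionary_features(tokens: Sequence[str], dictionary: Set[Tuple[str, ...]],
                        max_len: int = 4) -> List[List[int]]:
    """Multi-hot BMES flags per token for every dictionary entry matched in the sentence."""
    feats = [[0] * len(TAGS) for _ in tokens]
    n = len(tokens)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if tuple(tokens[i:j]) not in dictionary:
                continue
            if j - i == 1:
                feats[i][3] = 1              # single-token entry
            else:
                feats[i][0] = 1              # entry begins here
                feats[j - 1][2] = 1          # entry ends here
                for m in range(i + 1, j - 1):
                    feats[m][1] = 1          # token inside a longer entry
    return feats


def enhance_embeddings(embeddings: Sequence[Sequence[float]],
                       feats: Sequence[Sequence[int]]) -> List[List[float]]:
    """Concatenate each word embedding with its dictionary flags."""
    return [list(e) + [float(f) for f in flags] for e, flags in zip(embeddings, feats)]
```

For the tokens `phased array radar system` with dictionary entries `phased array radar` and `system`, the flags are B, M, E, and S respectively.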

KEYWORDS

Named Entity Recognition, Domain Knowledge, Low Resource, Domain Dictionary.

Application of Cross-Wavelet and Singular Value Decomposition on Covid-19 and Bio-Physical Data

Iftikhar U. Sikder¹ and James J. Ribero², ¹Department of Information Systems, Cleveland State University, USA, ²IBA, University of Dhaka, Bangladesh

ABSTRACT

The paper examines the bivariate relationship between COVID-19 and temperature time series using Singular Value Decomposition (SVD) and cross-wavelet analysis. The COVID-19 incidence data and the temperature data for the corresponding period were transformed using SVD into significant eigenstates. Wavelet transformation was performed to analyze and compare the frequency structure of the single and the bivariate time series. The results provide synchronicity and coherence measures across a range of time periods. Additionally, the wavelet power spectrum, paired wavelet coherence statistics, and phase differences were computed. The results suggest statistically significant coherence at various frequencies and indicate complex conjugate dynamic relationships in terms of phases and phase differences.
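Decomposing a single series with SVD into leading eigen-components is commonly done through singular spectrum analysis (SSA): embed the series in a trajectory (Hankel) matrix, take its SVD, keep the leading components, and average back along anti-diagonals. Whether the authors used exactly this embedding is an assumption; the window length and rank are illustrative, and the cross-wavelet step is not shown.

```python
import numpy as np


def ssa_reconstruct(series, window: int, rank: int):
    """Rank-`rank` SSA reconstruction of a 1-D series; returns (reconstruction, singular values)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    if not 2 <= window <= n - 1:
        raise ValueError("window must satisfy 2 <= window <= len(series) - 1")
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window]
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]   # keep the leading eigen-components
    # Diagonal averaging: entry (i, j) contributes to time index i + j
    out = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            out[i + j] += approx[i, j]
            counts[i + j] += 1
    return out / counts, s
```

A pure sinusoid has a rank-2 trajectory matrix, so two components reconstruct it exactly; for noisy incidence or temperature data, the singular values indicate how many components are significant.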

KEYWORDS

COVID-19, SVD, Wavelet analysis, Cross-wavelet power, Wavelet coherence.

Runway Extraction and Improved Mapping from Space Imagery

David A. Noever, PeopleTec, Inc., Huntsville, AL, USA

ABSTRACT

Change detection methods applied to monitoring key infrastructure such as airport runways are an important capability for disaster relief and urban planning. The present work identifies two generative adversarial network (GAN) architectures that translate reversibly between plausible runway maps and satellite imagery. We illustrate the training capability using paired images (satellite-map) from the same point of view with the Pix2Pix architecture, a conditional GAN. In the absence of available pairs, we likewise show that CycleGAN architectures with four network heads (two discriminator-generator pairs) can also provide effective style transfer from raw image pixels to outline or feature maps. To emphasize runway and tarmac boundaries, we show experimentally that the traditional grey-tan map palette is not a required training input and can be augmented by higher-contrast mapping palettes (red-black) for sharper runway boundaries. We preview a potentially novel use case (called "sketch2satellite") in which a human roughly draws the current runway boundaries and the machine automatically outputs plausible satellite images. Finally, we identify examples of faulty runway maps where the published satellite imagery and mapped runways disagree, and show that an automated GAN update renders the correct map.
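In the unpaired CycleGAN setting, the four heads are two generators (satellite to map and map to satellite) and two discriminators. What lets them learn without pairs is the cycle-consistency term, sketched below with NumPy. The weight of 10 is the value commonly used with CycleGAN and is an assumption here; the adversarial terms are omitted, so this is only one component of the full objective.

```python
from typing import Callable

import numpy as np

MapFn = Callable[[np.ndarray], np.ndarray]


def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error over all pixels and channels."""
    return float(np.mean(np.abs(a - b)))


def cycle_consistency_loss(g_sat2map: MapFn, f_map2sat: MapFn,
                           satellite: np.ndarray, road_map: np.ndarray,
                           weight: float = 10.0) -> float:
    """weight * (|F(G(x)) - x|_1 + |G(F(y)) - y|_1), averaged over pixels."""
    forward = l1(f_map2sat(g_sat2map(satellite)), satellite)   # satellite -> map -> satellite
    backward = l1(g_sat2map(f_map2sat(road_map)), road_map)    # map -> satellite -> map
    return weight * (forward + backward)
```

Two generators that exactly invert each other give zero cycle loss, which is why this term pushes the networks toward reversible translation.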

KEYWORDS

Generative Adversarial Networks, Satellite-to-Map, Pix2Pix, CycleGAN Architecture.

DAG-WGAN: Causal Structure Learning with Wasserstein Generative Adversarial Networks

Hristo Petkov¹, Colin Hanley² and Feng Dong¹, ¹Department of Computer and Information Sciences, University of Strathclyde, Glasgow, United Kingdom, ²Department of Management Science, University of Strathclyde, Glasgow, United Kingdom

ABSTRACT

The combinatorial search space presents a significant challenge to learning causality from data. Recently, the problem has been formulated as a continuous optimization problem with an acyclicity constraint, which allows deep generative models to be explored to better capture data sample distributions and to support the discovery of Directed Acyclic Graphs (DAGs) that faithfully represent the underlying data distribution. However, no study has yet investigated the use of the Wasserstein distance for causal structure learning via generative models. This paper proposes a new model named DAG-WGAN, which combines a Wasserstein-based adversarial loss and an auto-encoder architecture with an acyclicity constraint. DAG-WGAN simultaneously learns causal structures and improves its data generation capability by leveraging the strengths of the Wasserstein distance metric. Compared with other models, it scales well and handles both continuous and discrete data. Our experiments evaluate DAG-WGAN against state-of-the-art methods and demonstrate its good performance.
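The continuous acyclicity constraint replaces the combinatorial DAG search with a smooth function h(W) that is zero exactly when the weighted adjacency matrix W has no directed cycles. The sketch below uses the polynomial form h(W) = tr[(I + W∘W/d)^d] − d popularized by DAG-GNN (a variant of the NOTEARS trace-exponential constraint); which form DAG-WGAN uses is an assumption here.

```python
import numpy as np


def acyclicity(w: np.ndarray) -> float:
    """h(W) = tr[(I + W∘W / d)^d] - d; zero iff the graph of W is acyclic."""
    d = w.shape[0]
    # W∘W is non-negative, so every directed cycle adds a positive amount to the trace
    m = np.eye(d) + (w * w) / d
    return float(np.trace(np.linalg.matrix_power(m, d)) - d)
```

During training, h(W) is driven to zero with a penalty or augmented-Lagrangian term alongside the reconstruction and adversarial losses.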

KEYWORDS

Generative Adversarial Networks, Wasserstein Distance, Bayesian Networks, Causal Structure Learning, Directed Acyclic Graphs.

Artificial Intelligence: Framework of Driving Triggers to Past, Present and Future Applications and Influencers of Industry Sector Adoption

Richard Fulton¹, Diane Fulton² and Susan Kaplan³, ¹Department of Computer Science, Troy University, Troy, Alabama, USA, ²Department of Management, Clayton State University, Morrow, Georgia, USA, ³Modal Technology, Minneapolis, Minnesota, USA

ABSTRACT

To gain a sense of the development of Artificial Intelligence (AI), this research analyzes what was done in the past, what has been done in the last decade, and what is predicted for the next several decades. The paper highlights the biggest changes in AI and gives examples of how these technologies are applied in several key industry sectors, along with influencers that can affect the speed of adoption. Lastly, the research examines the driving triggers, such as cost, speed, accuracy, diversity/inclusion, and interdisciplinary research/collaboration, that propel AI to become an essential transformative technology.

KEYWORDS

Artificial Intelligence, Key Industrial Sectors, Adoption of Technology, Driving Triggers, Technology Trends.

Towards Efficient Integration of UDF in Databases

Mohamed Khalefa, SUNY College at Old Westbury, Old Westbury, NY, USA

ABSTRACT

The proposed framework generates efficient C code for SQL user-defined functions (UDFs) by optimizing the memory layout, utilizing compiler optimizations, and exploiting function properties. These function properties may be supplied manually or extracted automatically. Our experiments show that our approach produces efficient algorithms that are faster than their state-of-the-art counterparts. The ultimate goal of our work is to smoothly integrate imperative and declarative code and to generate efficient code based on function properties and data distributions.
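As an illustration of how a declared function property can change the generated plan, suppose a UDF f is known to be non-decreasing. Then the predicate f(x) > c selects a suffix of a column sorted on x, which binary search finds with O(log n) calls to f instead of n. The property and both functions are hypothetical, and the sketch is in Python rather than the C code the framework emits.

```python
from typing import Callable, List, Sequence


def filter_naive(column: Sequence[float], f: Callable[[float], float], c: float) -> List[float]:
    """Evaluate the UDF on every row."""
    return [x for x in column if f(x) > c]


def filter_monotone(sorted_column: Sequence[float], f: Callable[[float], float],
                    c: float) -> List[float]:
    """Assumes f is non-decreasing and the column is sorted ascending."""
    lo, hi = 0, len(sorted_column)
    while lo < hi:                 # find the first index where f(x) > c
        mid = (lo + hi) // 2
        if f(sorted_column[mid]) > c:
            hi = mid
        else:
            lo = mid + 1
    return list(sorted_column[lo:])
```

Both functions return the same rows, but the monotone version calls the UDF only about log2(n) times, which matters when the UDF is expensive.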

KEYWORDS

Database, UDF, SQL.

Deep Learning Framework MindSpore and PyTorch Comparison

Xiangyu XIA and Shaoxiang ZHOU, Department of Information, Beijing City University, Beijing 100191, China

ABSTRACT

Deep learning has been widely used in many fields. Training neural networks involves large amounts of data, and many deep learning frameworks have appeared to serve deep learning practitioners with services that are more convenient to use and perform better. MindSpore and PyTorch are both deep learning frameworks: MindSpore is owned by Huawei, while PyTorch is owned by Facebook. Some people think that Huawei's MindSpore performs better than Facebook's PyTorch, which leaves deep learning practitioners confused about which of the two to choose. In this paper, we analytically and experimentally compare the training speed of MindSpore and PyTorch on a single GPU. To make our survey as comprehensive as possible, we carefully selected neural networks in two main domains, computer vision and natural language processing (NLP). The contribution of this work is twofold. First, we conduct detailed benchmarking experiments on MindSpore and PyTorch and analyze the reasons for their performance differences. Second, this work provides guidance for end users choosing between the two frameworks.
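Comparing training speed fairly usually means timing the same workload in both frameworks, excluding warm-up iterations and synchronizing GPU work before each timer stop, because GPU kernels run asynchronously. The framework-agnostic harness below is a sketch under those assumptions: `step` is a hypothetical callable that runs one training iteration in either framework, and `sync` would be, for example, `torch.cuda.synchronize` in PyTorch.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def time_training_steps(step: Callable[[], None], n_steps: int = 50, warmup: int = 5,
                        sync: Optional[Callable[[], None]] = None) -> Dict[str, float]:
    """Time `n_steps` calls of `step` after `warmup` untimed calls.

    `sync` should block until queued device work finishes; without it,
    asynchronous GPU kernels would make the timings look too fast.
    """
    for _ in range(warmup):          # absorb one-time costs such as graph compilation
        step()
    if sync is not None:
        sync()
    durations = []
    for _ in range(n_steps):
        start = time.perf_counter()
        step()
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    median = statistics.median(durations)
    return {
        "median_s": median,
        "mean_s": statistics.fmean(durations),
        "steps_per_s": float("inf") if median == 0 else 1.0 / median,
    }
```

Reporting the median alongside the mean makes occasional slow steps (for example, garbage collection or data-loader stalls) visible instead of letting them dominate the comparison.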

KEYWORDS

Deep Learning, Performance, MindSpore, PyTorch, Comparison.

Distributed streaming graph partitioning approach with buffer support for big graph

Kekun HU, Gang Dong, Yaqian Zhao, Rengang Li, Jian Zhao, Qichun Cao, Hongbin Yang and Hongzhi Shi, Inspur Electronic Information Industry Co., Ltd., Jinan 250014, China & State Key Laboratory of High-end Server & Storage Technology, Inspur Group Co., Ltd., Jinan 250014, China

ABSTRACT

Graph partitioning is one of the key technologies for the parallel processing of big graphs. Existing offline partitioning algorithms are too time-consuming for big graphs, while online ones cannot offer high-quality partitions. To address this, we propose a distributed streaming partitioning method with buffer support. It adopts a multi-loader-multi-partitioner architecture in which multiple loaders read graph data in parallel to accelerate data loading. Each partitioner first buffers and sorts the vertex stream read by its loader and then assigns the buffered vertices using one of four proposed streaming heuristics with different goals. To further improve partition quality, we design a restreaming mechanism. Experimental results on real and synthetic big graphs show that the proposed distributed streaming partitioning algorithm outperforms state-of-the-art online algorithms in terms of partition quality and scalability.
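A single partitioner's buffer-then-assign loop can be sketched as follows. The abstract does not define its four heuristics or the buffer sort key, so this sketch swaps in the well-known Linear Deterministic Greedy (LDG) score, |N(v) ∩ P_i| × (1 − |P_i|/C), and sorts each buffer by descending degree; both are assumptions, and the multi-loader parallelism and restreaming passes are omitted.

```python
import math
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Vertex = Hashable
Record = Tuple[Vertex, Sequence[Vertex]]   # (vertex, adjacency list) as read by a loader


def buffered_ldg(stream: Iterable[Record], num_vertices: int, k: int,
                 buffer_size: int, slack: float = 0.1) -> Dict[Vertex, int]:
    """Assign streamed vertices to k partitions, buffering and sorting before each assignment batch."""
    capacity = math.ceil(num_vertices / k * (1.0 + slack))
    assignment: Dict[Vertex, int] = {}
    sizes = [0] * k
    buffer: List[Record] = []

    def place(v: Vertex, nbrs: Sequence[Vertex]) -> None:
        best, best_key = None, None
        for p in range(k):
            if sizes[p] >= capacity:
                continue                                    # partition full
            common = sum(1 for u in nbrs if assignment.get(u) == p)
            score = common * (1.0 - sizes[p] / capacity)    # LDG score
            key = (score, -sizes[p], -p)                    # ties: emptier, then lower id
            if best_key is None or key > best_key:
                best, best_key = p, key
        if best is None:                                    # defensive fallback
            best = min(range(k), key=lambda p: sizes[p])
        assignment[v] = best
        sizes[best] += 1

    def flush() -> None:
        buffer.sort(key=lambda rec: len(rec[1]), reverse=True)   # high degree first
        for v, nbrs in buffer:
            place(v, nbrs)
        buffer.clear()

    for record in stream:
        buffer.append(record)
        if len(buffer) >= buffer_size:
            flush()
    flush()
    return assignment
```

Buffering lets high-degree vertices be placed first, so later low-degree vertices see more already-assigned neighbours when their scores are computed.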

KEYWORDS

Big graph, streaming partitioning, distributed, buffering, restreaming.

From Monolith to Microservices: Software Architecture for Autonomous UAV Infrastructure Inspection

Lea Matlekovic and Peter Schneider-Kamp, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark

ABSTRACT

Linear-infrastructure Mission Control (LiMiC) is an application for autonomous Unmanned Aerial Vehicle (UAV) infrastructure inspection mission planning, originally developed with a monolithic software architecture. The application calculates routes along the infrastructure based on the user's inputs, the number of UAVs participating in the mission, and the UAVs' locations. The user selects inspection targets on a 2D map, and the application calculates inspection routes as a set of waypoints for each UAV. LiMiC1.0 is the latest version of the application; it has been migrated from a monolith to microservices and is continuously integrated and deployed using DevOps tools such as GitLab, Docker, and Kubernetes, in order to facilitate future feature development, enable better traffic management, and improve route calculation processing time. In this paper, we discuss the differences between monolithic and microservice architectures to justify our decision to migrate. We describe the methodology for the application's migration and implementation, the technologies we use for continuous integration and deployment, and present performance results showing the improvement of the microservices over the monolithic application.
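The abstract states what the route calculation takes as input (targets, number of UAVs, UAV locations) and what it returns (waypoints per UAV) but not the algorithm. The sketch below is a hypothetical stand-in for such a route-calculation service, not LiMiC's algorithm: each target goes to the nearest UAV, and each UAV visits its targets in nearest-neighbour order.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def plan_routes(uav_positions: Sequence[Point], targets: Sequence[Point]) -> Dict[int, List[Point]]:
    """Hypothetical planner: nearest-UAV assignment, then nearest-neighbour ordering."""
    routes: Dict[int, List[Point]] = {i: [] for i in range(len(uav_positions))}
    # 1. Assign every inspection target to the closest UAV start position.
    for t in targets:
        nearest = min(range(len(uav_positions)), key=lambda i: math.dist(uav_positions[i], t))
        routes[nearest].append(t)
    # 2. Order each UAV's targets greedily, always flying to the closest unvisited one.
    for i, assigned in routes.items():
        current, remaining, ordered = uav_positions[i], list(assigned), []
        while remaining:
            nxt = min(remaining, key=lambda p: math.dist(current, p))
            remaining.remove(nxt)
            ordered.append(nxt)
            current = nxt
        routes[i] = ordered
    return routes
```

Because the function is stateless (inputs in, waypoints out), it is the kind of computation that can be moved into its own microservice and scaled independently.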

KEYWORDS

autonomous UAV, mission planning, microservices, Docker, Kubernetes, CI/CD.

Data Mining and Knowledge Management Process by Bibliometric/Scientometric Study on Solid Waste Management in the Covid-19 Pandemic

Thamirys Suelle da Silva¹, Gabriel Fernandes Ângelo¹, Thaísia Venância Barbosa da Silva¹, Soraya Giovanetti El-Deir¹ and Jorge Alfredo Cerqueira Streit², ¹Environmental Management Research Group in Pernambuco, Brazil, ²University of Brasilia, Brazil

ABSTRACT

In early 2020, the world population was surprised by the pandemic caused by the coronavirus SARS-CoV-2, which causes Covid-19. Its rapid spread across various regions of the world generated major concern in society and produced different socio-environmental impacts. During the Covid-19 pandemic, solid waste management can lead to contamination, representing a risk of SARS-CoV-2 transmission that increases the number of victims in all countries. The objective of the research was to evaluate the progress of scientific research on the thematic axis of Covid-19, solid waste, and the environment, emphasizing the relevance of the articles through the state of the art. To this end, a systematic, step-by-step review of articles on the ScienceDirect platform was carried out, seeking to identify the solid waste management procedures adopted by some countries while confronting the Covid-19 pandemic. The production of articles increases noticeably, especially from 2021 onward, showing that this research area is on the rise. It was also found that the combination of Covid-19 and solid waste has received little attention in academia and in the application of solid waste management during the pandemic.

KEYWORDS

Dissemination, Environment, SARS-CoV-2.

Contact

dkmpconf@yahoo.com

10th International Conference on Data Mining & Knowledge Management Process (DKMP 2022)
