4th International Conference on Natural Language Processing and Computational Linguistics (NLPCL 2023)

July 22 ~ 23, 2023, Toronto, Canada




Accepted Papers


Challenges Performing IoT Forensic Investigations and Frameworks Addressing These Challenges: A Systematic Literature Review


Boitumelo Nkwe and Michael Kyobe, Department of Information Systems, University of Cape Town, Cape Town, South Africa

ABSTRACT

The increasing adoption of the Internet of Things (IoT) has introduced unique challenges to both users and the cybersecurity domain. As IoT evolves, the cybersecurity threats and vulnerabilities directed against IoT devices have also increased. IoT devices are susceptible to breaches; therefore, forensic investigations focusing on IoT technologies need to be improved. This study aims to provide an understanding of the challenges in IoT forensic investigations reported since 2017. Furthermore, the article looks at the different solutions, in the form of frameworks and methodologies, that have been developed to address these challenges, and at the gaps in the existing literature. The researchers adopted a systematic review methodology to guide the synthesis of the literature. The key issues highlighted in this study include the heterogeneous nature of IoT, the lack of proper investigative tools and frameworks that encompass all levels of IoT forensics, the lack of privacy, and the lack of standardization in the investigation process.

KEYWORDS

Internet of Things (IoT), IoT forensics, Cybersecurity, Challenges


Testing Goal-model Based Quality Attributes in Software Product Lines


Ibtesam Gwasem, Weichang Du, Andrew McAllister, Department of Computer Science, University of New Brunswick, Fredericton, Canada

ABSTRACT

Testing in software product lines is crucial for delivering high-quality software products that meet user needs. While research on software product line testing has primarily focused on functional attributes, verifying quality requirements has been overlooked. Quality attributes (e.g., security) are essential for a satisfactory user experience. Goal models have proven useful for capturing both functional and non-functional (quality) requirements in early system development stages. Researchers propose using goal models as a foundation for creating test cases to validate software systems. This paper introduces a methodology for verifying quality requirements in software product lines using goal models. The focus is on testing quality attributes of final software applications in product lines developed based on the feature and goal model approach. The methodology facilitates defining testable quality requirements, effective testing scope design, concrete test case creation, and efficient test case reuse. A prototype testing system was developed to support the methodology.

KEYWORDS

Software product lines, testing, non-functional requirements, test case reuse, reusable test case design


An Overview of Patient Monitoring Systems Based on Machine Learning in the Internet of Things


Saeid Javid, Department of Computer Engineering, University of Salamanca, Salamanca, Spain

ABSTRACT

The Internet of Things (IoT) is widely used in many applications, including patient monitoring systems. The purpose of healthcare systems is to monitor the patient in order to prevent risks, deal with critical cases quickly, and establish long-distance communication for remote treatment. The Internet of Things has a long-term impact on patient monitoring, patient management, patients’ physiological information, and critical care. Sensors are connected to the patient to collect data, which are first sent to system controls and then autonomously to healthcare providers. A variety of biosensors send the medical information to mobile applications or websites via wireless networks. Healthcare providers are thus enabled to monitor the patient and control the treatment outside of hospital walls; therefore, IoT medical devices require accurate methods of patient monitoring in order to predict the patient’s condition more precisely and to increase the efficiency of the network. An overview of patient monitoring systems based on machine learning in the Internet of Things is provided in the following article.

KEYWORDS

IoT, Machine Learning, Healthcare, WBAN


Timor-Leste’s Mikrolet Vehicle Multi-class Classification Performance Evaluation Using Deep Learning Models


Abreu Andre Boavida1, Shan Lu2, 1Faculty of Engineering, Universidade Nacional Timor-Lorosae, Hera, East Timor, 2Department of Intelligent Science and Engineering, Gifu University, Japan

ABSTRACT

This research uses a multiclass deep learning model to identify images containing different mikrolet vehicles. In addition, we study multiclass classification by assessing the performance on mikrolet pictures of the same color and comparing the classification performance of several proposed deep learning models. The objectives of this study are two-fold: first, to identify images containing different mikrolet vehicle lanes, and second, to compare four deep learning models in terms of accuracy, precision, recall, F1-measure, and micro- and macro-average ROC curve areas. We found that our network was able to classify the subjects effectively.

KEYWORDS

Deep Learning Models, Classification, Mikrolet, Multiclass, Vehicles.


Automatic Discovery of Multiword Nouns Based on Syntactic-semantic Representations

Xiaoqin HU, Beijing Language and Culture University, China

ABSTRACT

This research aims to explore a deeper representation of the internal structure and semantic relationship of multiword nouns (MWNs) for improving MWN discovery. This representation focuses on MWN formations, which follow a series of categorical and semantic constraints. The internal semantic relations of MWNs are represented by semantic class combinations of constituents, and the internal structures are represented by a set of categorical combinations in a hierarchy. These linguistically motivated semantic features are combined with statistically motivated semantic features, and the results present an improvement for MWN discovery.

KEYWORDS

Multiword nouns, automatic discovery, internal structure, internal semantic relation, semantic class combination, linguistic knowledge


A Deep Learning System for Domain-specific Speech Recognition

Yanan Jia, Businessolver, USA

ABSTRACT

As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems have been proposed. However, commercial ASR systems usually have poor performance on domain-specific speech, especially under low-resource settings. The author works with pre-trained DeepSpeech2 and Wav2Vec2 acoustic models to develop benefit-specific ASR systems. The domain-specific data are collected using the proposed semi-supervised learning annotation with little human intervention. The best performance comes from a fine-tuned Wav2Vec2-Large-LV60 acoustic model with an external KenLM, which surpasses the Google and AWS ASR systems on benefit-specific speech. The viability of using error-prone ASR transcriptions as part of spoken language understanding (SLU) is also investigated. Results of a benefit-specific natural language understanding (NLU) task show that the domain-specific fine-tuned ASR system can outperform the commercial ASR systems even when its transcriptions have a higher word error rate (WER), and the results between fine-tuned ASR and human transcriptions are similar.
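The word error rate (WER) used to compare these ASR systems is a standard metric: the word-level edit distance between a reference and a hypothesis transcription, normalized by reference length. A minimal illustrative sketch (not code from the paper):

```python
# Word error rate (WER) via Levenshtein edit distance over words.
# Illustrative sketch only; example sentences are hypothetical.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = min edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("submit the claim form", "submit a claim form"))  # 1 substitution / 4 words = 0.25
```

A fine-tuned domain model can score a lower WER than a general-purpose system on such domain-specific utterances even while trailing it on open-domain speech, which is the comparison the abstract describes.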

KEYWORDS

Automatic Speech Recognition, DeepSpeech2, Wav2Vec2, Semi-supervised learning annotation, Spoken language understanding


Acoustic Characteristics and Related Influencing Factors on the Acquisition of Retroflex Vowels in Putonghua by Learners of Different Native Languages

HongLi Deng1, XinZhong Liu2 and XianMing Bei3, 1School of Liberal Arts, Jinan University, Guangzhou, Guangdong, China and College of Culture and Communication, Guangxi Science and Technology Normal University, Laibin, Guangxi, China, 2School of Liberal Arts, Jinan University, Guangzhou, Guangdong, China, 3School of Chinese Language and Culture, Guangdong University of Foreign Studies, Guangzhou, Guangdong, China

ABSTRACT

Based on the theory of second language acquisition, this article analyzes the pronunciation of “er” (which means “two” in Chinese) by learners from different native language backgrounds, and explores the key acoustic characteristics and related influencing factors in the acquisition of retroflex vowels in S&P. The study has four findings: (1) F2 of retroflex vowels rises in S&P and F3 falls, and the difference between the F3 endpoint and the F2 endpoint is small, meaning that F3 and F2 are closer to each other. The key characteristics of the learner’s “two” pronunciation are the slope and the value of F3: the steeper the fall of F3 and the smaller its value, the closer the learner’s “two” pronunciation is to S&P. (2) Retroflex vowels are highly marked phonemes, which makes them difficult to acquire. (3) Factors such as the acquisition environment and the length of second language acquisition time influence the acquisition of retroflex vowels. (4) An early learning environment promotes the acquisition of retroflex vowels in Putonghua.

KEYWORDS

retroflex vowels; slope of F3; acoustic characteristics; influencing factors; acquisition theory


GPT-3 Models Are Few-Shot Financial Reasoners

Raul Salles de Padua, Imran Qureshi and Mustafa U. Karakaplan, Stanford University, University of Texas at Austin, University of South Carolina

ABSTRACT

Financial analysis is an important tool for evaluating company performance. Practitioners work to answer financial questions to make profitable investment decisions, and use advanced quantitative analyses to do so. As a result, Financial Question Answering (QA) is a question answering task that requires deep reasoning about numbers. Furthermore, it is unknown how well pre-trained language models can reason in the financial domain. The current state-of-the-art requires a retriever to collect relevant facts about the financial question from the text and a generator to produce a valid financial program and a final answer. However, recently, large language models like GPT-3 [3] have achieved state-of-the-art performance on a wide variety of tasks with just a few examples. We run several experiments with GPT-3 and find that a separate retrieval model and logic engine continue to be essential components for achieving SOTA performance on this task, particularly due to the precise nature of financial questions and the complex information stored in financial documents. With this understanding, our refined prompt engineering approach for GPT-3 achieves near-SOTA accuracy without any fine-tuning.

KEYWORDS

Question Answering, GPT-3, Financial Question Answering, Large Language Models, Information Retrieval, BERT, RoBERTa, F


Deduplicating Highly Similar News in Large News Corpora

Wu Zhang, Miotech, 69 Jervois St, Sheung Wan, Hong Kong

ABSTRACT

Duplicated training data usually degrades machine learning models’ performance. This paper presents a practical algorithm for efficiently deduplicating highly similar news articles in large datasets. Our algorithm comprises three components (document embedding, similarity computation, and clustering), each utilizing specific algorithms and tools to optimize both speed and performance. We demonstrate the efficacy of our approach by accurately deduplicating over 7 million news articles in less than 4 hours.

KEYWORDS

News deduplication, natural language processing


Synthetic Source Low-resource Indonesian Augmentation for Colloquial Neural Machine Translation

Asrul Sani Ariesandy1, Mukhlis Amien2, Alham Fikri Aji3 and Radityo Eko Prasojo4, 1Sekolah Tinggi Informatika & Komputer Indonesia (STIKI), Malang, Indonesia, 2Kata.ai Research Team, Jakarta, Indonesia, 3Beijing Institute of Technology, China, 4Faculty of Computer Science, Universitas Indonesia

ABSTRACT

Neural Machine Translation (NMT) works better for Indonesian when it takes into account local dialects, geographical context, and regional culture (colloquialism). NMT is typically domain-dependent and style-dependent, and it requires lots of training data. State-of-the-art NMT models often fall short in handling colloquial variations of their source language, and the lack of parallel data in this regard is a challenging hurdle to systematically improving the existing models, despite the fact that Indonesians frequently employ colloquial language. In this work, we develop a colloquial Indonesian-English test set collected from YouTube transcripts and Twitter. We perform synthetic style augmentation on the formal Indonesian source language and show that it improves the baseline Id-En models (in BLEU) on the new test data.

KEYWORDS

Neural Machine Translation, NMT, Natural Language Processing, NLP, Low-Resource Language, Indonesian, Artificial Intelligence


What Makes a Good Dataset for Symbol Description Reading?

Karol Lynch1, Joern Ploennigs1,2 and Bradley Eck1, 1IBM Research Europe, Dublin, Ireland, 2University of Rostock, Rostock, Germany

ABSTRACT

The usage of mathematical formulas as concise representations of a document’s key ideas is common practice. Correctly interpreting these formulas, by identifying mathematical symbols and extracting their descriptions, is an important task in document understanding. This paper makes the following contributions to the mathematical identifier description reading (MIDR) task: (i) introduces the Math Formula Question Answering Dataset (MFQuAD) with 7508 annotated identifier occurrences; (ii) describes novel variations of the noun phrase ranking approach for the MIDR task; (iii) reports experimental results for the SOTA noun phrase ranking approach and our novel variations of the approach, providing problem insights and a performance baseline; (iv) provides a position on the features that make an effective dataset for the MIDR task.

KEYWORDS

Information Extraction, Reading Comprehension, Large Language Models


Robust Hadith IR Using Knowledge-graphs and Semantic-similarity Classification

Omar Shafie, Kareem Darwish, and Bernard J. Jansen, Hamad Bin Khalifa University

ABSTRACT

Hadith is the term used to describe the narration of the sayings and actions of Prophet Mohammad (p.b.u.h.). The study of Hadith can be modeled as a pipeline of tasks performed on a collection of textual data. Although many attempts have been made to develop Hadith search engines, existing solutions are repetitive, text-based, and manually annotated. This research documents 6 Hadith retrieval methods, discusses their limitations, and introduces 2 methods for robust narrative retrieval. Namely, we address the challenge of user needs by reformulating the problem in a two-fold solution: declarative knowledge-graph querying, and semantic-similarity classification for retrieving Takhreej groups. The classifier was built by fine-tuning an AraBERT transformer model on a sample of 200k pairs and scored 90% recall and precision. This work demonstrates how Hadith retrieval could be made more efficient and insightful with a user-centered methodology, which is an under-explored area with high potential.

KEYWORDS

Hadith, Knowledge-graphs, Arabic, Semantic Similarity


A Customised Speech Recognition System for the Indian Telesales Industry: Addressing Indic Languages and Accents

Ved Vasu Sharma and Anit Bhandari, SquadStack Inc., New Delhi, India

ABSTRACT

The telesales market was worth about US$ 27 Bn globally in 2022 and is expected to grow to US$ 55 Bn by 2029. India is not only a huge consumer market but also a large, rewarding market for telesales experts with low operational costs. Even though a large number of call recordings are generated on a daily basis, telesales is one of the most untouched markets when it comes to engineering innovations and AI applications involving linguistics, NLP, audio processing, etc. Speech recognition is generally a prerequisite for most of these applications. Hence, we propose a speech recognition solution for the telesales industry in a huge market like India, operating primarily in Indic languages and accents for which no general-purpose ASR has acceptable performance. Our model achieves a competitive WER of 19.42% on a telesales dataset in the Indian context.

KEYWORDS

Speech Recognition, Telesales, Indic Languages, Hinglish, Audio Processing.


Analyzing Demand and Supply of Jobs to Enhance Smart Employability in Oman and UAE

Mohamed Abdul Karim Sadiq1, Thirumurugan Shanmugam2 and Nasser AlFannah3, 1, 2Department of Information Technology, College of Computing and Information Science, University of Technology and Applied Sciences, Suhar campus, Oman, 3Deputation, Ministry of Transport, Communications and Information Technology, Sultanate of Oman

ABSTRACT

Despite the existence of many job portals, both employers and candidates face difficulties in the search process. Primarily, the problem arises due to the mismatch between expressed requirements and the contents of profiles. Though certain automated systems exist to support the recruitment procedure, the application of Natural Language Processing (NLP) could enhance the extraction of useful information and the ranking of candidates’ resume documents. The challenge is to transform unstructured textual data into structured, reusable information. This issue is more evident in the case of young job seekers with minimal or no previous work experience. Suitable NLP techniques are explored along with relevant data sets to enhance the employment process in a smart manner.

KEYWORDS

Natural Language Processing, Human Resources Management, Information Extraction, Resume Matching.


Women’s Language or Powerless Language

Sherif Alalfy, Cairo university, Egypt

ABSTRACT

Scientists began recording their observations about the differences between the language of women and the language of men in the middle of the seventeenth century AD, when studies appeared showing the linguistic differences between the sexes in the societies of the Amazon and the Caribbean. However, interest in the subject increased at the beginning of the twentieth century at the hands of anthropologists, and the interest grew further when the efforts of anthropologists mixed with those of sociologists. At that time, the conviction grew that gender, like other social structures such as class, geographical area, and age, is a factor affecting speech. However, are the features that scientists have observed present in all societies? Are they caused by women’s gender or by their powerlessness? If it is the first, then the phenomenon must be general to all women; if it is the second, then men can also be characterized by it.

KEYWORDS

Women’s language, man’s language, powerless language.


Bank Personnel Fraud Detection


Ekrem Duman, Department of Industrial Engineering, Ozyegin University, Istanbul, Turkey

ABSTRACT

To build successful predictive models, one should have a sufficient number of examples of the class to be predicted (the positive class). When the number of examples of the positive class is very small, building strong predictive models becomes a very challenging task. In this study we address one such problem: predicting which bank personnel might commit fraud (stealing money from customer accounts). For this problem, in order to have a strong enough predictive model, we decided to combine the powers of descriptive and predictive modeling techniques: we developed several descriptive models and used them as inputs to a predictive model in the final stage. The results show that our solution approach performs quite well.

KEYWORDS

Personnel fraud, predictive modeling, banking.


Exploring the Relationship Between Knowledge Management, Information Technology Investment, and Economic Prosperity in Middle Eastern Countries: A Study of the United Arab Emirates and Saudi Arabia


Amer Abuhantash, Department of Business Administration, University of the People, United States

ABSTRACT

This research study aims to investigate the relationship between knowledge management, information technology (IT) investment, and economic prosperity in Middle Eastern countries, with a specific focus on the United Arab Emirates (UAE) and Saudi Arabia. The Middle East region has witnessed significant economic growth and transformation in recent decades, and understanding the factors that contribute to economic prosperity is crucial for sustainable development. Knowledge management and IT investment have emerged as important drivers of economic growth in various contexts. This research will examine how knowledge management practices and IT investments influence economic prosperity in the UAE and Saudi Arabia, exploring similarities and differences between the two countries. The findings of this research will contribute to the existing literature on knowledge management, IT investment, and economic prosperity, while providing valuable insights for policymakers and business leaders in the Middle East.

KEYWORDS

knowledge management, information technology investment, economic prosperity, Middle Eastern countries, United Arab Emirates, Saudi Arabia.


Knowledge Management Effect on E-learning in Virtual Universities


Changiz Valmohammadi1 and Farkhondeh Mortaz Hejri2, 1Department of Industrial Management, South Tehran Branch, Islamic Azad University, Tehran, Iran, 2Department of IT Management, South Tehran Branch, Islamic Azad University, Tehran, Iran

ABSTRACT

Knowledge management (KM) is an integrated approach used to apply knowledge at different levels of assets to enhance organizational performance. While knowledge management is accepted in many sectors and organizations, higher education does not yet take full advantage of the opportunities it provides. Also, while past research has attempted to highlight the importance of implementation, knowledge management lacks a single, clear pattern in higher education. After the COVID-19 pandemic and the virtualization of university education, the need to pay more attention to this issue became increasingly important. This article aims to discuss how to use knowledge management to raise e-learning effectiveness in virtual universities. This study is developmental and applied in terms of purpose. To implement the quantitative method, we designed an online questionnaire; after measuring its validity and reliability indices, it was provided to the statistical population, including academic and industry experts. One hundred ninety-eight complete questionnaires were received. The structural equation modelling technique and Smart PLS software were employed to analyse the data. The results indicate that knowledge management positively affects e-learning quality and effectiveness.

KEYWORDS

Knowledge Management (KM), e-learning, Virtual university, Iran.



A Data Management Stage Model (DMSM) for Assessing the Maturity of Digital Transformation


Shuo Yan1 and Dr Jeff Jones2, 1Warwick Manufacturing Group, University of Warwick, IMC Building, WMG, The University of Warwick, Coventry, CV4 7AL, UK, 2Warwick Manufacturing Group, University of Warwick, Director of Academic Quality, WMG, The University of Warwick, Coventry, CV4 7AL, UK

ABSTRACT

In this era of digital transformation, organizations and individuals have the ability to generate, collect, process, and analyse a vast volume of data to gain benefits and valuable insights. However, despite recognising the competitive advantage of digital transformation, they often face challenges in navigating the transformation process and aligning their existing practices with on-premise technologies. The main objective of this research is threefold: 1) to establish a Capability Maturity Model (CMM) for data management, known as the Data Management Stage Model (DMSM), which provides a standardized, structured, and objective roadmap for organizations to determine their current data management maturity stage; 2) to guide organizations in progressing to the next maturity stage by using the DMSM as a framework; and 3) to describe the process of creating an up-to-date and comprehensive DMSM. This research adopts a qualitative approach, utilising a Systematic Literature Review (SLR) covering the past ten years and applying Grounded Theory. A total of 14 well-defined CMMs of data management were reviewed, summarized, and consolidated. Compared to previous models, the DMSM includes more specific dimensions, such as Data Management in General, Data Acquisition and Quality, Data Visualization, Data Sharing, Data Preservation, Data Analysis, Budget, Infrastructure, and Data Governance. Within each dimension, this research consistently defines four sub-dimensions to describe the stage development processes throughout the DMSM: Commitment to Perform, Ability to Perform, Activities Performed, and Process Assessment. The DMSM comprises six stages, Stage 0 through Stage 5, with Stage 5 representing the most developed and mature stage. Due to the complexity of the establishment processes of the DMSM, this paper selects Stage 0 as an example to demonstrate the detailed model consolidation process in the research methodology section. Following that, a comprehensive description of the entire DMSM is provided.

KEYWORDS

Data management, Capability Maturity Model, CMMs, Stage Model, Digital Transformation, Data Governance.


From Research to Practice: Does AI Promote or Prevent the Use of an MBSE Tool?

Asma Charfi, Takwa Kochbati and Chokri Mraidha, Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France


ABSTRACT

In this paper, we investigate the role that AI can play in the adoption of a Model-Based System Engineering (MBSE) tool. The MBSE approach is widely adopted in the development of complex systems (real-time systems, cyber-physical systems, systems of systems, etc.); however, in practice, the tools implementing this approach face several problems and are far from being adopted by system providers. We argue that AI can be useful and beneficial if integrated at the right MBSE step, and that the need for AI techniques (machine learning, NLP, etc.) in MBSE tools should be further investigated to fit the stakeholders’ needs.

KEYWORDS

MBSE, AI, ALM, NLP, MBSE Tool


Authentication Technique Based on Image Recognition: Example of Quantitative Evaluation by Probabilistic Model Checker

Bojan Nokovic, McMaster University, Computing and Software Department, 1280 Main Street West, Hamilton, Ontario, Canada


ABSTRACT

Using probabilistic models, we analyze an innovative online authentication process based on image recognition. For true positive identification, the user needs to recognize the relationship between identified objects on distinct images, which we call an outer relation, and the relation between objects in the same image, which we call an inner relation. We use probabilistic computation tree logic (PCTL) formulas to quantify false-negative detection and analyze the proposed authentication process. This helps to tune the process and make it more convenient for the user while maintaining the integrity of the authentication process.
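As an illustration of the kind of property such an analysis quantifies (a hypothetical example, not a formula from the paper, with an assumed atomic proposition falseNegative), PCTL can express both a query for the probability of falsely rejecting a legitimate user within k rounds and a bound on that probability:

```latex
% Query: probability of reaching a false-negative state within k rounds
P_{=?}\,\bigl[\, \mathsf{F}^{\leq k}\; \mathit{falseNegative} \,\bigr]

% Requirement: a legitimate user is eventually rejected with
% probability at most 1\%
P_{\leq 0.01}\,\bigl[\, \mathsf{F}\; \mathit{falseNegative} \,\bigr]
```

A probabilistic model checker evaluates such formulas over the state space of the authentication model, which is what makes the quantitative tuning described above possible.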

KEYWORDS

Hierarchical State Machines; Probabilistic Model Checker; Costs/rewards; Verification


Framework for Development of Platforms for Knowledge Management and Transformative Innovation –F2DKTIN-

Nieto Bernal Wilson1 and Vega Jurado Jaider2, 1Department of Systems Engineering, Norte Universidad, Barranquilla, Colombia, 2Department of Entrepreneurship and Management, Norte Universidad, Barranquilla, Colombia


ABSTRACT

Disruptive technologies today have become the catalyst for developing individual and organizational capacities. This work focuses on integrating the use of emerging technologies such as cloud computing, big data, the Internet of Things, blockchain, datasets, data warehouses, data lakes, machine learning, data analytics, simulation, hyper-automation, and social networks, among others, to respond to organizations’ requirements associated with the fulfillment of objectives, goals, KPIs, regulation, trends, the creation of goods and services, loyalty, and integrated management. The response to this type of requirement translates operationally into new project processes and programs that are structured in a corporate portfolio and addressed with agile methodologies intensive in collaboration, communication, self-management, and virtual environments, enabling the development of these new organizational capabilities and especially the innovation that organizations require to interact within an ecosystem that today is highly digital (customers, suppliers, employees, regulators, investors, states, and competitors in general).
This work presents a framework for the comprehensive development of platforms for knowledge management and transformative innovation, based on an emerging information architecture (IA, DW, ML) for its implementation. It is therefore convenient to develop a current profile of the organization that identifies the innovation capabilities to be developed (organization, processes, products, services, technologies, knowledge, R&D, among others), from which an objective profile establishing the desired capabilities is defined. Finally, the difference between the current capacity and the objective capacity gives rise to a gap, which is addressed through an implementation plan that allows the desired innovation capacities to be achieved. This process develops on a timeline and is projected recursively, giving rise to a process of continuous improvement in the development of innovation capabilities within a digital ecosystem.

KEYWORDS

Framework for developed, Disruptive Digital Platform, DevOps as Service, MOPD.


Requirements Engineering Framework Using Contextualization, Profiling, and Modelling

Arunkumar Khannur1, Manjunatha Hiremath2, 1Computer Science Department, Christ University, Hosur Rd, Bhavani Nagar, S.G. Palya, Bengaluru, Karnataka 560029, India, 2Hosur Rd, Bhavani Nagar, S.G. Palya, Bengaluru, Karnataka 560029, India.


ABSTRACT

Quality software evolves out of quality requirements. In this regard, tremendous progress has taken place over the years in requirements engineering (RE) practices for collecting, corroborating, representing, and specifying requirements to arrive at the formal configuration item known as the Requirements Specification (RS) document. These RE practices predominantly use either a process-centric approach based on frameworks such as the CMM process maturity model or a people-centric agile approach to engage stakeholders. Both approaches generally rely on analytical problem solving using the principles of functional decomposition and stepwise refinement. Although these approaches have advanced RE, they remain parochial: each phase of the software development life cycle (SDLC) bogs down with high defect containment in phase-end artefacts and in the final software, and they are not aligned to the present-day digital world of pervasive computing, characterised by disruptive technologies, user centricity, automation, and continuous deployment. The result is a set of recurring issues: missing alignment to pervasive computing, user reluctance to accept the software, unrealised requirements, unstable software with rising maintenance costs, and dismayed customers. Surveys and findings indicate that the major root causes of these limitations and issues are weak requirements engineering and management practices, poor understanding of the requirements environment, a lack of understanding of user requirements, changing requirements, and weak traceability. We propose a novel RE approach that addresses these perennial requirements-related problems, makes RE practices relevant to the causes of weak requirements, and controls defects in order to develop quality software. The proposed approach uses the concepts of profiling, contextualization, and quality engineering principles to perform requirements modelling.
Requirements profiling combines different types of profiling, including User, Business Value, Feature, and Risk profiling. Contextualization comprises defining the context, visualizing the system, and identifying interactions in the system. Finally, requirements modelling uses clustering, prioritization, representation, and specification to produce effective abstractions through visual notations. Quality engineering principles are applied across all activities to find and fix defects at the very point of their creation, reducing development and maintenance cost and effort and increasing the likelihood of on-time delivery, improved traceability, greater user acceptance, and a stable product that is easily maintainable and continuously deployable.
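The profiling-driven prioritization described above can be sketched minimally; the `Requirement` fields and the ranking rule below are hypothetical illustrations, not the authors' concrete model:

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    """One requirement with illustrative profiling scores (scale 1-5)."""
    rid: str
    user_value: int      # User profile: breadth of user need
    business_value: int  # Business-value profile
    risk: int            # Risk profile: delivery/defect risk

def prioritize(reqs):
    """Rank requirements: highest combined value first, ties broken by lower risk."""
    return sorted(reqs, key=lambda r: (-(r.user_value + r.business_value), r.risk))

reqs = [
    Requirement("R1", user_value=3, business_value=5, risk=2),
    Requirement("R2", user_value=5, business_value=3, risk=4),
    Requirement("R3", user_value=2, business_value=2, risk=1),
]
print([r.rid for r in prioritize(reqs)])  # ['R1', 'R2', 'R3']
```

R1 and R2 tie on combined value (8), so the lower-risk R1 is ranked first; any real instantiation of the approach would of course use the paper's own profiles and clustering.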

KEYWORDS

Modelling, Quality Profiling, Risk Profiling, Quality Engineering, Requirements Traceability, Software Stability, User Acceptance.


Propagation of Software Requirements to Executable Tests

Nader Kesserwan1, Jameela Al-Jaroodi1, Nader Mohamed2 and Imad Jawhar3, 1Department of Engineering, Robert Morris University, Pittsburgh, USA, 2Department of Computing and Engineering Technology, Pennsylvania Western University, California, Pennsylvania, USA, 3Faculty of Engineering, AlMaaref University, Beirut, Lebanon


ABSTRACT

Executable test cases originate, at the beginning of testing, as abstract requirements that represent the system behavior. Developing these test cases manually is labor-intensive, error-prone, and costly. Expressing the system requirements as behavioral models and transforming those models into a scripting language has the potential to automate their propagation to executable tests. Ideally, an efficient testing process should begin as early as possible, refine the use cases with sufficient detail, and facilitate test case creation. We propose an approach that automates the propagation of functional requirements to executable test cases through model transformation. The proposed testing process begins by capturing system behavior as visual use cases, then adopts a domain-specific language, defines transformation rules, and finally transforms the use cases into executable tests.
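The rule-based transformation step can be sketched as follows; the step vocabulary and the target script syntax are invented for illustration and do not reflect the authors' actual DSL or transformation rules:

```python
# Minimal sketch of rule-based model transformation: each abstract
# use-case step is mapped by a rule to one executable test-script line.
RULES = {
    "open":  lambda args: f'driver.get("{args[0]}")',
    "type":  lambda args: f'driver.find("{args[0]}").send_keys("{args[1]}")',
    "click": lambda args: f'driver.find("{args[0]}").click()',
}

def transform(use_case_steps):
    """Propagate abstract steps to concrete test code via the rule table."""
    return [RULES[verb](args) for verb, *args in use_case_steps]

steps = [("open", "https://example.org/login"),
         ("type", "#user", "alice"),
         ("click", "#submit")]
for line in transform(steps):
    print(line)
```

Each use-case step carries only behavior; the rules supply the scripting-language details, which is what makes the propagation automatic and repeatable.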

KEYWORDS

Model-Driven, Model Transformation, Requirements Propagation, Test Cases.



Optimizing Estimation Accuracy: Leveraging the Winner-Takes-All Approach


FuChe Wu1 and Andrew Dellinger2, 1Providence University, Taiwan, 2Elon University, USA

ABSTRACT

This paper proposes an algorithm for improving estimation accuracy in industrial applications. Traditionally, a weighted-sum method is used to map two views, but this approach often leads to blurry results. Instead, the winner-take-all approach is suggested as a means of achieving better accuracy. Three criteria are introduced for evaluating image quality: sharpness (which measures the effect of motion artifacts from the RGB camera), flatness (which estimates how many parts of the depth image belong to planar regions), and fitness (which checks the match between the current view and the existing map). While depth images provide the 3D structure of the environment, they typically lack sufficient resolution to deliver accurate results. By estimating a plane, however, accuracy can be improved, and a more precise boundary can be obtained from the higher resolution of the RGB image.
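The contrast between the two fusion strategies can be sketched in NumPy; the per-pixel quality scores below stand in for some combination of the sharpness, flatness, and fitness criteria, which the abstract does not specify:

```python
import numpy as np

def fuse_weighted(views, scores):
    """Weighted-sum fusion: blends all views, which tends to blur results."""
    w = scores / scores.sum(axis=0, keepdims=True)
    return (w * views).sum(axis=0)

def fuse_winner_take_all(views, scores):
    """Winner-take-all fusion: per pixel, keep only the best-scoring view."""
    winner = scores.argmax(axis=0)                 # best view index per pixel
    return np.take_along_axis(views, winner[None], axis=0)[0]

# Two 1x4 "views" with per-pixel quality scores.
views  = np.array([[[10., 10., 10., 10.]],
                   [[ 0.,  0.,  0.,  0.]]])
scores = np.array([[[0.9, 0.9, 0.1, 0.1]],
                   [[0.1, 0.1, 0.9, 0.9]]])
print(fuse_weighted(views, scores))          # blended: [[9. 9. 1. 1.]]
print(fuse_winner_take_all(views, scores))   # crisp:   [[10. 10.  0.  0.]]
```

The weighted sum mixes both sources at every pixel (hence the blur the paper reports), while winner-take-all commits to the single best source per pixel.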

KEYWORDS

Accuracy, winner-take-all, depth image.


Based on I/Q Signal and AI Algorithms to Identify the Unique Fingerprint of Each Mobile Phone Hardware Device


Kaiqing Fan1, Siwen Yang2, Zongbao Dai3, 1United Automotive Electronic Systems Co., Ltd, China, AI Lab, 2Beijing Brainpower Pharma Consulting Co. Ltd, China, Data Team, 3United Automotive Electronic Systems Co., Ltd, China, Big Data Team

ABSTRACT

Traditional methods of identifying the unique fingerprint of each mobile phone have limitations. These limitations sustain the Internet black and gray industry chain, which caused fraud losses of trillions of dollars worldwide in 2020, with smartphones accounting for a large share. Based on the I/Q signal and AI algorithms, we can correctly identify the unique fingerprint of each mobile phone hardware device, because the I/Q signals produced by the hardware cannot be changed by hackers, whereas many of the parameters that make up a traditional fingerprint can be. By combining I/Q signals with an AI algorithm, we can correctly identify the unique fingerprints of 95% or more of mobile phones. Furthermore, because each mobile phone hardware device can be tracked and identified, the cost of operating the Internet black industry chain becomes much higher than before, since each hardware device is expensive. This may therefore be an effective method of disrupting the Internet black industry chain.
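The premise that hardware impairments leave an immutable signature in the I/Q stream can be illustrated with a toy feature extractor; the specific features (gain imbalance, DC offset) and the simulated impairments are assumptions for illustration, not the paper's method:

```python
import numpy as np

def iq_fingerprint_features(iq):
    """Two toy hardware-impairment features from complex I/Q samples:
    I/Q gain imbalance and DC offset (carrier leakage). Illustrative
    stand-ins for whatever features the AI model actually consumes."""
    gain_imbalance = iq.real.std() / iq.imag.std()
    dc_offset = abs(iq.mean())
    return np.array([gain_imbalance, dc_offset])

rng = np.random.default_rng(0)
n = 10_000
# Simulated device: the I branch has 10% extra gain plus a small DC offset.
iq = (1.1 * rng.standard_normal(n) + 0.05) + 1j * rng.standard_normal(n)
feats = iq_fingerprint_features(iq)
print(feats)   # gain imbalance near 1.1, DC offset near 0.05
```

Because such impairments arise from analog components, they persist across software changes, which is the property the paper exploits.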

KEYWORDS

I/Q Signal, AI Algorithm, Identification, Unique Fingerprint, Mobile Phone Hardware Device.


Deepfake Audio and Video Detection System


Shrijak Dahal and Aayushma Pant, Institute of Engineering, Tribhuvan University, Nepal

ABSTRACT

The manipulation of facial appearances, voices, and photos using deep generative approaches, collectively called Deepfakes, has enabled a wide range of benign and malicious applications. Malicious uses of this technology have produced frauds, false allegations, and hoaxes that undermine and destabilize organizations. Even though many algorithms produce realistic faces and voices, they leave artifacts that are invisible to the naked eye and can be exploited for detection. In this research work, we focus on identifying and detecting Deepfake videos and audio. The Discrete Cosine Transform (DCT) is used to extract image features, which are passed, along with the original video frames, to a multi-layered Convolutional Neural Network (CNN) architecture. Likewise, filter banks and Mel-Frequency Cepstral Coefficients (MFCC) are used for audio processing, followed by a CNN architecture that distinguishes real from fake audio.
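The DCT feature-extraction step can be sketched with a plain NumPy implementation of the orthonormal 2-D DCT-II; the CNN that consumes these coefficients is omitted, and the 8x8 block size is an illustrative choice rather than the paper's stated configuration:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are cosine basis vectors)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    d = np.cos(np.pi * k * (2 * x + 1) / (2 * n)) * np.sqrt(2.0 / n)
    d[0] /= np.sqrt(2.0)
    return d

def dct2(block):
    """2-D DCT of a square image block: frequency-domain features."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

block = np.full((8, 8), 100.0)     # flat block: all energy in the DC term
coeffs = dct2(block)
print(round(coeffs[0, 0]))         # DC coefficient = 8 * mean = 800
```

Generative artifacts often perturb the high-frequency coefficients in characteristic ways, which is why frequency-domain features complement the raw frames fed to the CNN.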

KEYWORDS

Audio Forensics, CNN, DCT, Deepfake, FFT, MFCC, Video Forensics.


India Machinery and Transport Equipment Exports Forecasting With External Factors Using a Chain of Hybrid SARIMAX-GARCH Models


RS Chadha, Jugesh, Embedded System, CDAC Noida

ABSTRACT

To choose the best forecasting model, it is essential to understand the time series data, since external influences such as social, economic, and political events may affect how the data behave. To improve predictions, we take into account outside variables that could have an impact on our target variable. The India Machinery and Transport Equipment dataset is gathered from various sources, then cleaned and preprocessed: missing values are removed, data types are converted, and dependent variables are identified. By combining the SARIMAX model with the GARCH model and experimenting with various parameters and conditions, the current study seeks to enhance forecasting performance. The SARIMAX-GARCH model is a time series forecasting method used to predict market swings and export values. A helper model is developed to forecast the exogenous values, which are then used as input to the final model that forecasts the export value. We performed hyperparameter tuning to find the ideal settings and further enhance the hybrid model's performance. The results of this study provide estimates of future export values and contribute to a better understanding of India's Machinery and Transport Equipment export market. This research focuses on export value forecasting with the use of future exogenous variables; exogenous factors are essential for predicting market changes and therefore support the forecasting of precise export values.
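The GARCH half of the hybrid can be illustrated with the conditional-variance recursion alone; the SARIMAX mean model would normally be fitted with a library such as statsmodels, and the residuals and parameter values below are synthetic stand-ins:

```python
import numpy as np

def garch11_variance(resid, omega, alpha, beta):
    """GARCH(1,1) conditional-variance recursion:
    s2[t] = omega + alpha * resid[t-1]**2 + beta * s2[t-1].
    In the hybrid model, resid would be the residuals of the fitted
    SARIMAX mean equation."""
    s2 = np.empty_like(resid)
    s2[0] = resid.var()                  # initialise at the sample variance
    for t in range(1, len(resid)):
        s2[t] = omega + alpha * resid[t - 1] ** 2 + beta * s2[t - 1]
    return s2

rng = np.random.default_rng(42)
resid = rng.standard_normal(500)         # stand-in for SARIMAX residuals
s2 = garch11_variance(resid, omega=0.05, alpha=0.10, beta=0.85)
print(s2.min() > 0.0)                    # conditional variances stay positive
```

Chaining works as the abstract describes: helper models first forecast the exogenous regressors, SARIMAX then forecasts the mean export value from them, and GARCH models the volatility of the SARIMAX residuals.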

KEYWORDS

Time Series Forecasting, SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors), Data Analysis, Forecasting Models, External Factors, Export Prediction, GARCH, ARCH, pmdarima, Hybrid Forecasting Model, Exogenous Variable, Export Forecasting, India Machinery and Transport Export.


Mitigating Bias and Enhancing Fairness in Recommender Systems


Rodrigo Ferrari de Souza and Marcelo Garcia Manzato, Mathematics and Computer Science Institute, University of São Paulo, Av. Trab. São Carlense 400, São Carlos-SP, Brazil

ABSTRACT

Popularity bias and unfairness are problems caused by the lack of calibration in recommender systems. Despite their relation to each other, works that intend to reduce the effect of popularity bias do not consider the variation of accuracy among different groups of users, which is unfair to some of them. Other studies aim to calibrate the system to generate fair recommendations, but these usually remain biased towards popularity. We propose a system calibration approach based on users’ preferences for different levels of item popularity and for item genres. The proposed approach works in the post-processing stage and can be combined with different recommendation models. We evaluated the system with offline experiments using two state-of-the-art datasets, three recommender algorithms, six baselines, and different metrics for popularity, fairness, and accuracy. The results indicate reduced popularity bias and improved fairness and diversity.
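A minimal sketch of the post-processing idea, in the spirit of greedy calibrated reranking; the relevance scores, popularity levels, and trade-off objective below are invented for illustration and are not the paper's exact formulation:

```python
import math

def calibrated_rerank(candidates, target, k, lam=0.5):
    """Greedy post-processing: trade off relevance against the KL
    divergence between the user's target popularity-level distribution
    and the distribution of the list built so far.
    candidates: list of (item, relevance, level); target: {level: prob}."""
    chosen, counts = [], {lvl: 0.0 for lvl in target}
    pool = list(candidates)
    for _ in range(k):
        def objective(c):
            counts[c[2]] += 1                      # tentatively add item
            n = sum(counts.values())
            kl = sum(p * math.log(p / max(counts[l] / n, 1e-9))
                     for l, p in target.items() if p > 0)
            counts[c[2]] -= 1                      # undo the tentative add
            return lam * c[1] - (1 - lam) * kl
        best = max(pool, key=objective)
        pool.remove(best)
        chosen.append(best)
        counts[best[2]] += 1
    return [c[0] for c in chosen]

# Three popularity levels; this user prefers a mix of head and tail items.
cands = [("a", 0.9, "head"), ("b", 0.8, "head"), ("c", 0.7, "tail"),
         ("d", 0.6, "mid"), ("e", 0.5, "tail")]
print(calibrated_rerank(cands, {"head": 0.4, "mid": 0.2, "tail": 0.4}, k=3))
```

Because the reranker only consumes (item, relevance, level) tuples, it can sit on top of any base recommender, which is the model-agnostic property the abstract emphasises.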

KEYWORDS

Recommender System, Popularity Bias, Fairness, Calibration.


Invertible Neural Network for Time Series Anomaly Detection


Malgorzata Schwab and Ashis Biswas, Department of Computer Science and Engineering, University of Colorado at Denver

ABSTRACT

In this paper we explore the applicability of the Invertible Neural Network architecture to anomaly detection on time series data and hypothesize that a reversible network designed with embedded convolutional transformations is an excellent fit for the task. We leverage previous findings on autoencoders, as well as deep generative maximum-likelihood training focused primarily on image processing, and apply them in an innovative way to time series data such as electrocardiograms and industrial sensor readings. We recognize the challenge posed by common-denominator patterns that occur across the entire sample domain, which can dominate the likelihoods and introduce intrinsic bias. We mitigate it by applying wavelet transforms to decompose a time series into a set of subcomponents, eliminating low-level similarities between healthy and abnormal samples. We conclude that the Invertible Neural Network, designed to solve inverse problems, learns data reconstructions extremely well, and thus provides a remarkable solution for anomaly detection that is applicable to medical diagnostics as well as other use cases in the same problem space, such as predictive maintenance and detecting out-of-distribution inputs to protect the integrity of systems that rely on machine learning components.
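The wavelet-decomposition step can be illustrated with a single level of the Haar transform; the wavelet family and decomposition depth actually used in the paper may differ:

```python
import numpy as np

def haar_level(x):
    """One level of the Haar wavelet transform: split a series into a
    smooth approximation and a detail subcomponent. Illustrative
    stand-in for the decomposition applied before anomaly scoring."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-frequency trend
    detail = (even - odd) / np.sqrt(2.0)   # high-frequency residual
    return approx, detail

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 0.0, 2.0])
approx, detail = haar_level(x)
print(approx)   # smoothed trend shared by most samples
print(detail)   # detail coefficients highlight abrupt changes
```

Scoring the subcomponents separately, rather than the raw series, removes the shared low-frequency trend that would otherwise dominate the likelihoods; the transform is also energy-preserving, so no information is lost.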

KEYWORDS

Invertible, Autoencoder, Anomaly.


Examining Accuracy Heterogeneities in Classification of Multilingual AI-Generated Text


Raghav Subramaniam, Independent Researcher

ABSTRACT

Accurate multilingual differentiation between AI-generated and human-generated text is crucial on a global scale in schooling, academia, and beyond, as plagiarism and cheating become ever easier to facilitate with generative AI tools. Current tools for detecting AI-generated text, such as OpenAI’s “AI Text Classifier”, are already fairly easy to discredit, as misclassifications have been shown to be fairly common; such vulnerabilities often persist, in slightly different ways, for non-English languages as well. Classification of human-written text as AI-generated, misclassification of AI-generated text as human-written, and other such perplexing scenarios could be more likely in some language environments than in others. In this research, the “AI Text Classifier” is tested on a set of AI- and human-generated texts in English, Swahili, German, Arabic, Chinese, and Hindi (all with randomly selected topics and writing styles) to observe the nature of possible accuracy differences.
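The per-language accuracy breakdown at the heart of the study can be sketched as follows; the records below are invented toy data, not the paper's results:

```python
def per_language_accuracy(records):
    """Break classifier accuracy down by language to expose heterogeneities.
    records: iterable of (language, true_label, predicted_label)."""
    totals, hits = {}, {}
    for lang, truth, pred in records:
        totals[lang] = totals.get(lang, 0) + 1
        hits[lang] = hits.get(lang, 0) + (truth == pred)
    return {lang: hits[lang] / totals[lang] for lang in totals}

# Toy results: a hypothetical classifier doing worse on Swahili samples.
records = [
    ("en", "ai", "ai"), ("en", "human", "human"), ("en", "ai", "ai"),
    ("sw", "ai", "human"), ("sw", "human", "human"), ("sw", "ai", "human"),
]
print(per_language_accuracy(records))   # en: 1.0, sw: about 0.33
```

Comparing these per-language rates, rather than one pooled accuracy figure, is what reveals the heterogeneities the paper sets out to observe.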

KEYWORDS

Artificial Intelligence, Generative AI, AI Detection, Natural Language Processing, GPT.


From Conventional to Creative: Writing Pedagogy in the Age of Artificial Intelligence


Sofiya Shahiwala, Department of Humanities and Social Sciences, Indian Institute of Technology (IIT), Dhanbad, India

ABSTRACT

As the world faces a revolutionary intervention by Artificial Intelligence (AI), the educational framework nears a pivotal point. Although AI is believed to be only as good as the humans producing it, harnessing its potential as a catalyst for teaching and learning objectives can be of immense aid to educationists. Tensions in conventional teaching pedagogy gave way to digital trends barging into classroom teaching; nevertheless, the teacher remains a looming presence at the centre of a four-walled room called the ‘class’. The fourth revolution in education, or Education 4.0, attempts to move away from teacher-centred classes towards a more holistic learning experience. The present research paper is a descriptive study of the involvement of Artificial Intelligence in teaching and learning modules to cultivate better outcomes in a creative writing class. At its centre, the study keeps the concerns of teaching creative writing while advocating the promise of AI for education. It explores the drawbacks of conventional teaching methods and furthers the discourse on learner-centric education by introducing AI-assisted sessions.

KEYWORDS

Artificial Intelligence; Creative Writing; Education 4.0; Future of Education; Classroom Learning; Teaching Pedagogy.


Improved Student Learning Experience in Large Programming Classes Using Pseudo-flipped Method


Ritu Chaturvedi, University of Guelph, Ontario, Canada

ABSTRACT

In an effort to improve student engagement in large programming classes, this study proposes a pseudo-flipped (PF) method of teaching that combines the core principles of two popular teaching methods, traditional and flipped (or inverted), thereby mitigating the drawbacks of both. In traditional teaching, class time is mostly used by instructors to deliver a lecture using pre-prepared slides and smartboards or similar tools, while students, mostly passively, listen and take notes. In a purely flipped class, all resources traditionally taught in the classroom are moved outside it, as text, video, or audio; students are expected to read or view the lectures before class, and the instructor uses class time for solving problems. In the proposed PF method, students are taught in a traditional way for half the allocated time; for the other half, students solve problems in class with the instructor’s assistance. As in the flipped method, students learn concepts on their own outside the classroom using an interactive textbook, and to fill gaps in their knowledge, instructors spend class time teaching those core concepts through problem solving. PF promotes active learning by engaging students in solving problems on the concepts they have learnt. A survey was conducted in a programming class to gather student opinion on how the pseudo-flipped method affects engagement compared with traditional teaching. Both quantitative and qualitative analyses of the survey responses strongly favour the proposed method, with more than 70% of students in favour of it.


Design Implications for Next Generation Chatbots With Education 5.0


Gayane Sedrakyan1,2, Simone Borsci2, Stéphanie M. van den Berg2, Jos van Hillegersberg1, Bernard P. Veldkamp2, 1Department of Industrial Engineering and Business Information Systems, University of Twente, The Netherlands, 2Department of Cognition, Data, and Education, University of Twente, The Netherlands

ABSTRACT

Prior research reports that the use of chatbots in education has the potential to significantly improve learning performance and satisfaction and to provide learners with engaging experiences. Chatbots are used in education in multiple ways: to deliver course content, improve student interaction, encourage collaborative learning, practise questioning and answering, and more. Furthermore, the use of chatbots during teaching enables teachers to analyse and assess students’ learning abilities and levels of understanding. However, most research on educational instruments, including educational chatbots, lacks both theoretical support from recent advancements in the learning sciences and an evidence-informed foundation for choosing the data and information models. As a result, chatbot instruments used in education can do more harm than deliver the intended benefits. In this research, we attempt to ground educational chatbot design in the learning sciences. We posit that information communicated through educational chatbots needs to be formulated as a feedback dialogue to be effectively understood by learners. Additionally, we link the design of educational chatbots to learner-centric and mindful-technology concepts following the Industry 5.0 digitization strategy.

KEYWORDS

Education Digitalization, Chatbots, Digital Feedback, Education 5.0


A Crowdsourcing-based Analytical Engine for Virus and Malware Detection Using Artificial Intelligence and Machine Learning


Zonglin Zhang1, Marisabel Chang2, 1Portola High School, 1001 Cadence, Irvine, CA 92618, 2Computer Science Department, California State Polytechnic University, Pomona, CA 91768

ABSTRACT

In recent years, cybersecurity has grown increasingly salient in people’s lives [8]. With the spread of various new malware, the security risks of executable network installation packages are increasing dramatically, and the problems grow with the number of web users. This research work, aimed at a crowdsourcing-based analytical engine for virus and malware detection, prevents malware by examining MS Windows Portable Executable (PE) headers. YARA, a database from Kaggle, and data extracted from actual malware files were combined to create a final dataset [9]. By comparing each section of the PE header to improve detection accuracy, the final absolute accuracy is between 98% and 99%, and the front end displays the final prediction results through a Python GUI.
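The PE-header inspection at the core of the feature extraction can be sketched as follows; the synthetic byte blob stands in for a real executable, and the function is an illustrative fragment, not the authors' pipeline:

```python
import struct

def pe_header_offset(data: bytes) -> int:
    """Locate the PE header in an executable: the DOS header starts with
    'MZ' and stores the PE-header offset (e_lfanew) at byte 0x3C, where
    the 'PE\\0\\0' signature must appear. Header-level checks like this
    feed the features a classifier would compare section by section."""
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    return e_lfanew

# Build a minimal synthetic header (not a real binary) to exercise it.
blob = bytearray(0x84)
blob[:2] = b"MZ"
struct.pack_into("<I", blob, 0x3C, 0x80)    # e_lfanew -> 0x80
blob[0x80:0x84] = b"PE\x00\x00"
print(hex(pe_header_offset(bytes(blob))))   # 0x80
```

From the located header, fields such as the section table, timestamps, and import counts can be read off and used as classifier features.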

KEYWORDS

AI, Machine learning, Cybersecurity.