Accepted Papers

2nd International Conference on Big Data & IOT (BDIoT 2021)

September 18 ~ 19, 2021, Copenhagen, Denmark

Question Answering Systems and Inclusion: Pros and Cons

Victoria Firsanova, Department of Mathematical Linguistics, Saint Petersburg State University, Saint Petersburg, Russia

ABSTRACT

In the field of inclusion, automated question answering (QA) might become an effective tool that allows people, for example, to ask questions about the interaction between neurotypical and atypical people anonymously and get reliable information immediately. However, controlling such systems is challenging. Before QA is integrated into inclusion practice, research is required to prevent the generation of misleading and false answers and to verify that a system is safe and does not misrepresent or alter information. Although the problem of data misrepresentation is not new, the approach presented in the paper is novel because it highlights a particular NLP application in the field of social policy and healthcare. The study focuses on extractive and generative QA models based on the pre-trained Transformers BERT and GPT-2, fine-tuned on a Russian dataset for the inclusion of people with autism spectrum disorder. The source code is available on GitHub: https://github.com/vifirsanova/ASD-QA.

KEYWORDS

Natural Language Processing, Question Answering, Information Extraction, BERT, GPT-2.

Low-Resource Named Entity Recognition without Human Annotation

Zhenshan Bao, Yuezhang Wang and Wenbo Zhang, College of Computer Science, Beijing University of Technology, Beijing, China

ABSTRACT

Named entity recognition (NER), one of the most fundamental tasks in natural language processing (NLP), has received extensive attention. Most existing approaches to NER rely on a large amount of high-quality annotations or on a fairly complete list of specific entities. However, in practice, it is very expensive to obtain manually annotated data, and the available entity list is often not comprehensive. Using an entity list to annotate data automatically is a common method, but under low-resource conditions the automatically annotated data is usually imperfect: some of it is incompletely annotated and some is not annotated at all. In this paper, we propose an NER system for complex data processing that uses an entity list containing only a few entities to obtain incompletely annotated data and trains the NER model without human annotation. Our system extracts semantic features from a small number of samples by introducing a pretrained language model. Based on the model trained on incomplete annotations, we relabel the data using a cross-iteration approach. We use a data filtering method to filter the training data used in the iteration process, and re-annotate the incomplete data through multiple iterations to obtain high-quality data. Each iteration groups and processes the data according to the different types of annotations, which improves model performance faster and reduces the number of iterations. The experimental results demonstrate that our proposed system can effectively perform low-resource NER tasks without human annotation.

KEYWORDS

Named entity recognition, Low resource natural language processing, Complex annotated data, Cross-iteration.
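
The entity-list annotation step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' system: the function name `annotate_with_entity_list`, the BIO tag scheme, the dictionary format and the toy entity list are all assumptions. It shows why a short list yields incomplete annotations: any entity missing from the list (here the person name) is silently tagged `O`.

```python
def annotate_with_entity_list(tokens, entity_list):
    """Tag tokens with BIO labels by longest-match lookup in an entity list.

    entity_list maps a tuple of tokens to an entity type (hypothetical format).
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(entity) for entity in entity_list), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so longer entities win.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + span])
            if candidate in entity_list:
                label = entity_list[candidate]
                tags[i] = "B-" + label
                for j in range(i + 1, i + span):
                    tags[j] = "I-" + label
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy entity list with only two entries (illustrative data, not from the paper).
entities = {
    ("Beijing",): "LOC",
    ("Beijing", "University", "of", "Technology"): "ORG",
}
sentence = "Alice Smith works at Beijing University of Technology in Beijing".split()
tags = annotate_with_entity_list(sentence, entities)
```

In this toy sentence the person name stays unlabelled because it is not in the list, which is the kind of incomplete annotation the abstract's iterative relabelling is meant to repair.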

Online Assessment of English for Specific Purposes

Renáta Nagy, Doctoral School of Health Sciences, Department of Languages for Biomedical Purposes and Communication, Medical School, University of Pécs, Hungary

ABSTRACT

The presentation is about the online assessment of English for Specific Purposes (ESP). The focus is on online delivery as a possible form of language testing. The topic is timely, and the main aim is to examine the validity of online testing. A positive outcome of the study would point to a promising future in a number of respects, not only for language assessors but also for future candidates: namely, a basic online setup that could be used worldwide for online tests. To achieve this, the research covers not only the theoretical side of testing but also the first-hand empirical side, from the point of view of both examiners and examinees. Materials and methods include surveys, needs analysis and trial versions of online tests. In this context, the presentation focuses on the possible questions, techniques and approaches relating to online assessment, which can also be used in language lessons as a classroom technique.

KEYWORDS

Assessment, Online, ESP, Online assessment, validity, testing.

End-to-End Chinese Dialect Discrimination with Self-Attention

Yangjie Dan, Fan Xu*, Mingwen Wang, School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China

ABSTRACT

Dialect discrimination has important practical significance for protecting the inheritance of dialects. Traditional dialect discrimination methods pay much attention to the underlying acoustic features and ignore the meaning of the pronunciation itself, resulting in low performance. This paper systematically explores the validity of pronunciation features of dialect speech, composed of phoneme sequence information, for dialect discrimination, and designs an end-to-end dialect discrimination model based on the multi-head self-attention mechanism. Specifically, we first adopt a residual convolutional neural network and the multi-head self-attention mechanism to effectively extract the phoneme sequence features unique to different dialects and compose novel phonetic features from them. Then, we perform dialect discrimination based on the extracted phonetic features using the self-attention mechanism and bidirectional long short-term memory networks. The experimental results on the large-scale benchmark 10-way Chinese dialect corpus released by iFLYTEK show that our model outperforms the state-of-the-art alternatives by a large margin.

KEYWORDS

Dialect discrimination, Multi-head attention mechanism, Phonetic sequence, Connectionist temporal classification.

Malicious URL Detection using Machine Learning

Siddhant Hosalikar¹, Saikumar Iyer¹, Ankit Limbasiya¹ and Prof. Suvarna Chaure², ¹SIES Graduate School of Technology, Mumbai University, India, ²Department of Computer Engineering, Mumbai University, India

ABSTRACT

Phishing is a type of fraud in which two actors, an attacker and a victim, take part. The attacker creates a phishing webpage that mimics a legitimate one and embeds the website in a URL or other media. Detecting malicious URLs (Uniform Resource Locators) is a difficult yet interesting topic, because attackers mostly generate the URLs randomly and researchers have to detect them while considering the behaviours behind the generated malicious URLs. Among the various detection schemes that exist in the anti-phishing area, the URL-based scheme is safer and more realistic for one important reason: it does not require access to the malicious webpage. In this paper, our aim is to provide a comprehensive investigation of the detection of malicious URLs using machine learning algorithms. Our proposed detection system consists of URL feature extraction, machine learning algorithms and big data technology.

KEYWORDS

URL, Malicious URL detection, Feature extraction, Machine learning.
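
The point that a URL-based scheme needs no access to the page can be illustrated with a small feature extractor that looks only at the URL string. This is a hedged sketch: the abstract does not list the authors' features, so the function name `extract_url_features` and every feature below are assumptions chosen for illustration.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url):
    """Compute simple lexical features from a URL string without fetching it."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        # A raw IPv4 address in place of a domain name (illustrative heuristic).
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "num_digits": sum(ch.isdigit() for ch in url),
    }


features = extract_url_features("http://192.168.0.1/login@secure")
```

A feature dictionary like this could then be passed to any classifier; which features and which model work best is an empirical question the sketch does not answer.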

Industrial big data analytics and cyber-physical systems for future maintenance & service innovation

Temitope O. Awodiji, Computer Information Science Personnel, California Miramar University, California, USA

ABSTRACT

With the fast advancement of Information and Communication Technologies (ICT) and the integration of advanced analytics into manufacturing, products, and services, several industries face new opportunities and, at the same time, the challenge of maintaining their capabilities and meeting market demands. Such integration, termed Cyber-Physical Systems (CPS), is transforming industry to the next level. CPS facilitates the systematic transformation of large volumes of data into information, which makes invisible patterns of degradation and inefficiency visible and leads to better decision-making. This project focuses on existing trends in the development of industrial big data analytics and CPS. It then briefly discusses a system architecture for applying CPS in manufacturing, referred to as 5C. The 5C architecture comprises the steps necessary to fully integrate cyber-physical systems within the manufacturing industry.

KEYWORDS

Information and Communication Technologies (ICT), Big Data, Analytics, Data, Data Architecture.

Natural Language Generation using Link Grammar for General Conversational Intelligence

Vignav Ramesh¹,² and Anton Kolonin²,³,⁴, ¹Saratoga High School, Saratoga, California, USA, ²SingularityNET Foundation, Amsterdam, Netherlands, ³Aigents, Novosibirsk, Russian Federation, ⁴Novosibirsk State University, Russian Federation

ABSTRACT

Many current artificial general intelligence (AGI) and natural language processing (NLP) architectures do not possess general conversational intelligence—that is, they either do not deal with language or are unable to convey knowledge in a form similar to the human language without manual, labor-intensive methods such as template-based customization. In this paper, we propose a new technique to automatically generate grammatically valid sentences using the Link Grammar database. This natural language generation method far outperforms current state-of-the-art baselines and may serve as the final component in a proto-AGI question answering pipeline that understandably handles natural language material.

KEYWORDS

Interpretable Artificial Intelligence, Formal Grammar, Natural Language Generation, Natural Language Processing.

Machine Translation Literacy: A Solution Against Algorithm Bias and Linguistic Impoverishment

Olivia-Jade Tribert, Concordia University, Canada

ABSTRACT

Literature on artificial intelligence, algorithms, how they operate and their epistemic effects on humans has increased significantly in the last decades. While these topics were previously discussed solely among researchers, AI specialists or scholars, they are slowly making their way into popular discourse. Traditional media outlets, social media threads and filmmakers are now taking part in the discussions, generating their own opinions and public debates. Although the epistemic effects of AI and algorithms on humans are still widely debated, the conclusion remains the same everywhere: digital literacy is indispensable in the 21st century. In a world where everything is run by algorithms, it is crucial for individuals to understand how they work and learn how to think critically about them, as information found online is not always correct and always needs to be verified. Machine translation (MT) systems are no exception to this rule, and yet very little is said about them. In fact, aside from language professionals and a small group of other people, very few know how to use MT systems efficiently. If the translated sentence reads well, it is simply copied and pasted elsewhere, without a second thought or verification. This paper will touch on two consequences of poor machine translation literacy, namely the perpetuation of gender bias and artificial language impoverishment. I will begin by defining digital literacy and machine translation literacy. Then, I will present two recent studies that demonstrate gender bias in MT systems such as Google Translate, and the sociolinguistic effects of gender bias in their algorithms. Finally, I will offer some solutions for raising awareness of machine translation literacy that involve both language professionals and MT systems engineers.

Grammatical Variation between China English and American English: The Case of Difficulties/Difficulty (In) Doing

Lixin Xia¹ and Hui Hu², ¹Laboratory of Language Engineering and Computing of Guangdong University of Foreign Studies, China, ²Center for Lexicographical Studies of Guangdong University of Foreign Studies, China

ABSTRACT

This study investigates the factors that determine the presence or absence of the preposition "in" in the construction "difficulties/difficulty (in) doing" in China English and American English, and explores the difference between the two varieties in prepositional use. The search strings "difficulty (in) *ing" and "[have] difficulties (in) *ing" were run first on the Corpus of Contemporary American English (COCA) and then on the Corpus of China English (CCE). Analysis of the statistics for the singular form from the two corpora leads to the conclusion that the construction "difficulty in *ing" is preferred in formal registers over informal ones in both English varieties. The difference in prepositional use between the two varieties lies in the process by which the prepositional gerund is being replaced by a directly linked gerund. The results for the plural form indicate that the complexity principle is operative in determining prepositional use in the construction for both varieties. Importantly, the findings have significant implications for the assessment of previous claims about the construction, and the study offers new insight into its analysis.

KEYWORDS

China English, American English, Grammatical Construction, Corpus Linguistics, Natural Language Processing.
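
The corpus queries described in the abstract can be approximated on plain text with a regular expression. This is only a rough sketch under stated assumptions: COCA and CCE have their own query syntax and part-of-speech tagging, whereas the pattern below (and the names `PATTERN` and `classify_hits`) is hypothetical, ignores the "[have]" lemma, and will also match non-gerund words that happen to end in "ing".

```python
import re

# Matches "difficulty/difficulties", an optional "in", then a word ending in "ing".
PATTERN = re.compile(
    r"\b(difficulty|difficulties)\s+(in\s+)?(\w+ing)\b",
    re.IGNORECASE,
)


def classify_hits(text):
    """Return (noun form, variant, -ing word) for each match in the text."""
    hits = []
    for match in PATTERN.finditer(text):
        noun = match.group(1).lower()
        variant = "with_in" if match.group(2) else "without_in"
        hits.append((noun, variant, match.group(3).lower()))
    return hits


sample = (
    "She had difficulty in finding a job. "
    "They have difficulties understanding the rules."
)
hits = classify_hits(sample)
```

Counting the `with_in` and `without_in` variants per register would give the kind of frequency table the study compares across the two corpora.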

Prediction of Vaccination Side-Effects using Deep Learning

Farhan Uz Zaman, Tanvinur Rahman Siam and Zulker Nayen, Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh

ABSTRACT

Deep learning has been very successful in research fields that involve prediction. This paper discusses one such prediction, which can help to implement safe vaccination. Vaccination is very important in the fight against viral diseases such as COVID-19. However, people at times experience unwanted side effects of vaccination, which may cause serious illness. Modern techniques should therefore be used to make the administration of vaccines safe. In this research, a Gated Recurrent Unit (GRU), which is a form of Recurrent Neural Network, is used to predict whether a particular vaccine will have any side effect on a particular patient. The resulting predictions might be used before deciding whether or not a vaccine should be administered to a particular person.

KEYWORDS

Deep Learning, Gated Recurrent Unit, Recurrent Neural Network.

Multi Modal Space of Users’ Interests and Preferences in Social Networks

Evgeniia Shchepina, Evgeniia Egorova, Pavel Fedotov and Anatoliy Surikov, ITMO University, St. Petersburg, Russia

ABSTRACT

This paper aims to build a model of users’ interests in the multimodal space and obtain a comprehensive conclusion about users’ interests. To do this, we build graphs based on the data of separate modalities, find communities in these graphs, and consider the resulting communities in a single space, further highlighting communities of users’ interests. The constructed model showed better results for the analysis of user similarity than the baseline model. The scientific novelty of our approach lies in the proposed method of multimodal clustering of heterogeneous data on the interests and preferences of social network users. The feature that distinguishes this method from traditional approaches, such as biclustering, is the possibility of flexibly scaling the number of initial modalities. As a result, the average performance of the model increased by 12% in accuracy and by 11% in F1-score compared to the baseline model.

KEYWORDS

Community Detection, Multimodal Space of Interests, Social Network Analysis.

A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method based on Adversarial Learning

Enshuai Hou and Jie Zhu, Tibet University, Lhasa, Tibet, China

ABSTRACT

Tibetan is a low-resource language. In order to alleviate the shortage of parallel corpora between Tibetan and Chinese, this paper uses two monolingual corpora and a small seed dictionary to learn a semi-supervised method with seed dictionaries and a self-supervised adversarial training method, both based on the similarity calculation of word clusters in different embedding spaces, and puts forward an improved self-supervised adversarial learning method that aligns Tibetan and Chinese using monolingual data only. The experimental results are as follows. First, the results for aligning Tibetan syllables with Chinese characters are not good, which reflects the weak semantic correlation between Tibetan syllables and Chinese characters. Second, the semi-supervised method with a seed dictionary achieves a top-10 prediction accuracy of 66.5 (Tibetan-Chinese) and 74.8 (Chinese-Tibetan), while the improved self-supervised method reaches an accuracy of 53.5 in both language directions.

KEYWORDS

Tibetan, Word alignment, Without supervision, adversarial training.

Sign Language Translation System - A Deep Learning Approach

Dr. A.K. Hota¹, A.K. Somasekhar² and Shom C. Abraham³, ¹National Informatics Centre, Mantralaya, Nava Raipur - 492002, Chhattisgarh, India, ²National Informatics Centre, Mantralaya, Nava Raipur - 492002, Chhattisgarh, India, ³National Informatics Centre, Collectorate Campus, Dantewada - 494449, Chhattisgarh, India

ABSTRACT

People with voice/hearing impairments communicate by performing sign gestures. It is difficult for most people to understand them without a human interpreter. However, with the latest advances in image processing and machine learning, we can do away with human interpreters by creating an automated system capable of recognising sign gestures. The same system could also be trained for a different sign language, since sign languages differ by region and native language. Though a large amount of work has been done in the context of American Sign Language (ASL), extending it to Indian Sign Languages poses several challenges due to the involvement of motion and of body parts other than the hands. As a first attempt, we develop a static sign language recognition system trained on an ASL dataset. The focus was on improving the accuracy of existing techniques using transfer learning and hyperparameter tuning.

KEYWORDS

Sign Language Translation, Transfer learning, CNN.

Subtractive Mountain Clustering Algorithm Applied to a Chatbot to Assist Elderly People in Medication Intake

Neuza Claro, Paulo A. Salgado, and T-P Azevedo Perdicoúlis, Escola de Ciências e Tecnologia, Universidade de Trás-os-Montes e Alto Douro, Vila Real 5000-811, Portugal

ABSTRACT

Errors in medication intake among elderly people are very common. One of the main causes is their loss of the ability to retain information. The large amount of medicine required at an advanced age is another limiting factor. Hence, there is demand for an interactive aid system, preferably using natural language, to help the older population with medication. A chatbot based on a subtractive clustering algorithm, which belongs to the family of unsupervised learning models, is the chosen solution, since natural language processing is a necessary step in constructing a chatbot able to answer the questions that older people may ask about a particular drug. The subtractive mountain clustering algorithm has been adapted to the problem of natural language processing. This version of the algorithm allows a set of words to be associated into clusters. After the centre of every cluster, that is, its most relevant word, has been found, all the other words are aggregated according to a defined metric adapted to the language processing realm. The algorithm processes all the relevant stored information as well as the questions. Correct processing of the text enables the chatbot to produce answers that relate to the posed queries. To validate the method, we use the package insert of a drug as the available information and formulate associated questions.

KEYWORDS

chatbot, medicine intake aid-system, natural language processing, subtractive mountain clustering.
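
For readers unfamiliar with the technique, the classical subtractive (mountain) clustering procedure on numeric points can be sketched as below. This is not the authors' adaptation: their version works on words with a language-specific metric that the abstract does not specify. The function name, the Euclidean distance, the radius `r_a`, the 1.5 radius ratio and the 0.15 stopping ratio are illustrative assumptions.

```python
import math


def subtractive_clustering(points, r_a=1.0, stop_ratio=0.15):
    """Pick cluster centres among the data points by repeated potential subtraction."""
    if not points:
        return []
    r_b = 1.5 * r_a              # wider radius used when subtracting potential
    alpha = 4.0 / r_a ** 2
    beta = 4.0 / r_b ** 2

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Initial potential: high where a point has many close neighbours.
    potential = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)
    centres = []
    while True:
        idx = max(range(len(points)), key=potential.__getitem__)
        peak = potential[idx]
        if peak < stop_ratio * first_peak:
            break
        centre = points[idx]
        centres.append(centre)
        # Remove the new centre's influence so nearby points are not chosen next.
        potential = [
            pot - peak * math.exp(-beta * sq_dist(p, centre))
            for p, pot in zip(points, potential)
        ]
    return centres


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centres = subtractive_clustering(data)
```

On this toy data the procedure selects one centre per tight group; replacing `sq_dist` with a word-level distance is where an adaptation to text would plug in.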

Tipping the Scales: A Corpus-Based Reconstruction of Adjective Scales in the McGill Pain Questionnaire

Miriam Stern, Program in Linguistics, Princeton University, Princeton, New Jersey, USA

ABSTRACT

Pain assessment is a critical clinical tool for eliciting information and diagnostic clues. The McGill Pain Questionnaire (MPQ) is one such instrument, relying on 78 pain descriptors to assist in patient-physician communication. This study consists of a corpus-based reconstruction of the questionnaire’s adjective intensity rankings. Text from internet forums was collected and analyzed, and specific sentence constructions using adjectives were identified. Adjective intensity scales were then assembled from the collected adjective relationships and compared to those in the questionnaire. Of 17 adjective relationships predicted by this research, 10 showed agreement with the MPQ. The results suggest at least a minimal level of predictable patterns of adjective use by people experiencing pain. However, while there was some agreement with the MPQ’s adjective orderings, the results of this study call into question the MPQ’s categories for adjective groupings. Suggestions for further research and clinical implications of this study are discussed.

KEYWORDS

Adjective Scales, Pain Assessment, McGill Pain Questionnaire, Corpus.

A Multi-Input Multi-Output Transformer based Hybrid Neural Network for Multi Class Privacy Disclosure Detection

A K M Nuhil Mehdy and Hoda Mehrpouyan, Department of Computer Science, Boise State University, Idaho, USA

ABSTRACT

The concern regarding users’ data privacy has risen to its highest level due to the massive increase in communication platforms, social networking sites, and greater user participation in online public discourse. An increasing number of people exchange private information via emails, text messages, and social media without being aware of the risks and implications. Since a significant amount of data is shared in textual form, researchers from the area of Natural Language Processing (NLP) have focused on developing tools and techniques to detect, classify, and sanitize private information in text data. However, most of the detection methods rely solely on the existence of pre-identified keywords in the text and disregard the inference of the underlying meaning of the utterance in a specific context. Hence, in some situations these tools and algorithms fail to detect disclosure, or the produced results are misclassified. In this paper, we propose a multi-input, multi-output hybrid neural network which utilizes transfer learning, linguistics, and metadata to learn the hidden patterns. Our goal is to better classify disclosure/non-disclosure content in terms of the context of situation. We trained and evaluated our model on a human-annotated ground truth dataset containing a total of 5,400 tweets. The results show that the proposed model was able to identify privacy disclosure through tweets with an accuracy of 77.4% while classifying the information type of those tweets with an impressive accuracy of 99%, by jointly learning for two separate tasks.

KEYWORDS

Feature Engineering, neural networks, Natural Language Processing, Privacy.

A Visual Exploratory Data Analysis of COVID-19 Pandemic in India

Sonam Mittal¹ and Gaurav Sahu², ¹Associate Professor, Dept. of Information Technology, B K Birla Institute of Engineering & Technology, Pilani – 333031, Rajasthan, India, ²Assistant Professor, Dept. of Electronics & Communication Engineering, B K Birla Institute of Engineering & Technology, Pilani – 333031, Rajasthan, India

ABSTRACT

The number of cases of the novel coronavirus disease COVID-19, caused by SARS-CoV-2, has been increasing day by day in India and across the globe. The disease causes a respiratory syndrome that can be life-threatening. It has become very important to study and analyze the pattern of the pandemic’s worldwide spread so that strategies can be set to combat this life-threatening problem. This paper presents a visual exploratory data analysis for India based on the number of tests performed and of confirmed, recovered and death cases, along with a state-wise comparative analysis of mortality and recovery rates across India. It uses the Exploratory Data Analysis (EDA) technique to analyze the impact of COVID-19 on a daily and weekly basis, as well as the arrangement of vaccines and the capacity of India’s healthcare system to handle such a pandemic.

KEYWORDS

COVID-19, Exploratory Data Analysis, Machine Learning, Daily and Weekly Analysis, State-wise Vaccine.

Applying AI and Big Data for Sensitive Operations and Disaster Management

Yew Kee Wong, School of Information Engineering, HuangHuai University, Henan, China

ABSTRACT

Artificial intelligence has become a buzzword that is impacting every industry in the world. With the rise of such advanced technology, there will always be a question regarding its impact on our social life, environment and economy, and thus on all efforts exerted towards sustainable development. In the information era, enormous amounts of data have become available to decision makers. Big data refers to datasets that are not only big, but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques. Due to the rapid growth of such data, solutions need to be studied and provided in order to handle these datasets and extract value and knowledge from them for different industries and business operations. Numerous use cases have shown that AI can ensure an effective supply of information to citizens, users and customers in times of crisis. This paper aims to analyse some of the different methods and scenarios in which AI and big data can be applied, as well as the opportunities provided by their application in various sensitive operations and disaster management.

KEYWORDS

Artificial Intelligence, Big Data, Sensitive Operations, Disaster Management.

Machine Learning Algorithms using Big Data Analysis

Yew Kee Wong, School of Information Engineering, HuangHuai University, Henan, China

ABSTRACT

In the information era, enormous amounts of data have become available to decision makers. Big data refers to datasets that are not only big, but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques. Due to the rapid growth of such data, solutions need to be studied and provided in order to handle these datasets and extract value and knowledge from them. Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. Such minimal human intervention can be provided using big data analytics, which is the application of advanced analytics techniques to big data. This paper aims to analyse some of the different machine learning algorithms and methods which can be applied to big data analysis, as well as the opportunities provided by the application of big data analytics in various decision-making domains.

KEYWORDS

Artificial Intelligence, Machine Learning, Big Data Analysis.

FCM – Computerized Calculations vs the Role of Experts

Arthur Yosef¹, Eli Shnaider² and Moti Schneider², ¹Tel Aviv-Yaffo Academic College, Israel, ²Netanya Academic College, Israel

ABSTRACT

This study presents a method to assign relative weights when constructing Fuzzy Cognitive Maps (FCMs). We introduce a method of computing relative weights of directed edges based on actual past behavior (historical data) of the relevant concepts. There is also a discussion addressing the role of experts in the process of constructing FCMs. The method presented here is intuitive, and does not require any restrictive assumptions. The weights are estimated during the design stage of FCM and before the recursive simulations are performed.

KEYWORDS

FCM, relative importance (weight), Fuzzy Logic, Soft Computing, Neural Networks.
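
The "recursive simulations" mentioned in the abstract can be illustrated with a generic Fuzzy Cognitive Map update loop. The abstract does not state which update rule or threshold function the authors use, so the sigmoid activation, the function names and the convergence tolerance below are assumptions taken from common textbook formulations; the weight matrix is exactly the input that the paper's method would estimate from historical data.

```python
import math


def fcm_step(state, weights, steepness=1.0):
    """One FCM update: each concept aggregates weighted inputs from the others.

    weights[j][i] is the influence of concept j on concept i (assumed layout).
    """
    n = len(state)
    new_state = []
    for i in range(n):
        total = sum(weights[j][i] * state[j] for j in range(n))
        # Sigmoid squashes the activation into the interval (0, 1).
        new_state.append(1.0 / (1.0 + math.exp(-steepness * total)))
    return new_state


def simulate_fcm(state, weights, max_steps=100, tol=1e-6):
    """Iterate the map until activations stop changing or max_steps is reached."""
    for _ in range(max_steps):
        next_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(state, next_state)) < tol:
            return next_state
        state = next_state
    return state


# Toy three-concept map with made-up weights (illustrative only).
toy_weights = [
    [0.0, 0.6, -0.3],
    [0.4, 0.0, 0.5],
    [0.0, -0.2, 0.0],
]
final_state = simulate_fcm([0.2, 0.9, 0.5], toy_weights)
```

Because the weights are fixed before the loop starts, any error in them propagates through every iteration, which is why estimating them at the design stage matters.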

Solving Fully Fuzzy Linear System of Equations with Multidimensional Pentagonal Fuzzy Number Matrices

Abdul Musavvir Parappathiyil, Department of Mathematics, Pondicherry University, India

ABSTRACT

In this article, a linear pentagonal fuzzy number (PFN) is defined. The symmetrical and non-symmetrical PFNs pertaining to the linear PFN are also defined. Some basic arithmetic operations on linear PFNs, such as addition and multiplication, are described. Moreover, the concept of classical two-dimensional (2-D) pentagonal fuzzy number matrices (PFMs) is presented. In addition, the notion of multidimensional pentagonal fuzzy number matrices (MDPFMs) is discussed, along with some of its rules and operations, such as multiplication. Finally, in the light of the rules relating to both 2-D PFMs and MDPFMs, we use the concept of MDPFMs to solve the fully fuzzy linear system of equations (FFLSE) with pentagonal fuzzy numbers as inputs. Two methods, the singular value decomposition (SVD) method and the row reduced echelon (RRE) method, are discussed for solving the FFLSE, with a numerical example.

KEYWORDS

MDPFMs, FFLSE for MDPFMs with RRE method, FFLSE for MDPFMs with SVD method.

Ensemble Creation using Fuzzy Similarity Measures and Feature Subset Evaluators

Valerie Cross and Mike Zmuda, Computer Science and Software Engineering, Miami University, Oxford, OH USA

ABSTRACT

Current machine learning research is addressing the problem that occurs when the data set includes numerous features but the number of training examples is small. Microarray data, for example, typically has a very large number of features (the genes) compared to the number of training examples (the patients). An important research problem is to develop techniques to effectively reduce the number of features by selecting the best set of features for use in a machine learning process, referred to as the feature selection problem. Another means of addressing high-dimensional data is the use of an ensemble of base classifiers. Ensembles have been shown to improve on the predictive performance of a single model by training multiple models and combining their predictions. This paper examines combining an enhancement of the random subspace model of feature selection using fuzzy set similarity measures with different measures of evaluating feature subsets in the construction of an ensemble classifier. Experimental results show potentially useful combinations.

KEYWORDS

Feature selection, fuzzy set similarity measures, concordance correlation coefficient, feature subset evaluators, microarray data, ensemble learning.

High-Frequency Cryptocurrency Trading Strategy Using Tweet Sentiment Analysis

Zhijun Chen, Department of Financial Engineering, SUSTech University, Shen Zhen, China

ABSTRACT

Sentiments are extracted from tweets carrying cryptocurrency hashtags to predict prices, and the sentiment prediction model generates the parameters for an optimization procedure that decides how to re-allocate the portfolio in a further step. After prediction, an evaluation conducted with RMSE, MAE and R² selects the KNN and CART models for the prediction of Bitcoin and Ethereum, respectively. During portfolio optimization, this project uses predictive prescription to make the solution robust to uncertainty while taking full advantage of auxiliary data such as sentiments. As for the outcome of the optimization, the portfolio allocation and returns fluctuate sharply, as illustrated in the figure.

KEYWORDS

Cryptocurrency Trading Portfolio, Sentiment Analysis, Machine Learning, Predictive Prescription, Robust Optimization Portfolio.
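
The three evaluation measures named in the abstract have standard definitions, sketched below in plain Python. The function name `regression_metrics` and the toy numbers are assumptions for illustration; they are not data or code from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Made-up prices: three perfect predictions and one that is off by 2.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Lower RMSE and MAE and higher R² indicate a better fit, so a model comparison of the kind described would rank candidate models on these values per coin.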

Blockchain and Data Analytics in Healthcare Management

Lakmal Rupasinghe, Kanishka Yapa, Sanjeevan. S, Imaz. M.M.M, Swarnamyuran. T, Vijeethan. S, Department of Computer Systems Engineering, Sri Lanka Institute of Information Technology, Colombo, Sri Lanka

ABSTRACT

Inter-hospital and intra-hospital patient details and the medical records of patients with long-term illnesses are to be handled through Blockchain technology, which is far more secure and less vulnerable than standard encrypted cloud storage or a local database. This transfer maintains the continuity of medical care without the hassle of tedious paperwork, physical storage, and retrieval of data. Data as important as patients' medical data must be stored securely, so that an intruder can neither view nor modify it. The data must be accessible only to certain authorized personnel. Data analytics functions are integrated into a DApp named Medi-X for EHR management, taking into account data protection regulations such as GDPR and HIPAA as well as scalability and interoperability limitations. Comprehensive information will be processed with machine learning in order to give a doctor or other medical professional better insight into patients' forthcoming illnesses.

KEYWORDS

Interoperability, Scalability, EHR, GDPR, HIPAA, MediX, Blockchain, Machine learning.

A Comparative Framework for Evaluating Consensus Algorithms for Blockchains

Dipti Mahamuni, Ira A. Fulton Schools of Engineering – School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA

ABSTRACT

The past five years have seen a significant increase in the popularity of Decentralized Ledgers, commonly referred to as Blockchains. Many new protocols have launched to cater to a variety of applications serving individual consumers as well as enterprises. While research is conducted on individual consensus mechanisms and comparison against popular protocols, decision making and selection between the protocols is still amorphous. This paper proposes a comprehensive comparative framework to evaluate various consensus algorithms. We hope that such a framework will help evaluate current as well as future consensus algorithms objectively for a given use case.

KEYWORDS

Consensus Algorithms, Blockchain, Comparative Framework, Decentralized Ledgers.

DMC: Decentralized Mixer with Channel for Transaction Privacy Protection on Ethereum

Su Liu and Jian Wang, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

ABSTRACT

Ethereum is a public blockchain platform with smart contracts. However, it has transaction privacy issues due to the openness of the underlying ledger. Decentralized mixing schemes have been proposed to hide transaction relationships and transferred amounts, but they suffer from high transaction cost and long transaction latency. To overcome these two challenges, we propose the idea of batch accounting, which adopts batch processing at the time of accounting. To realize it, we introduce payment channel technology into the decentralized mixer. Since intermediate transactions between two parties do not need network consensus, our scheme can reduce both transaction cost and transaction latency. Moreover, we provide informal definitions and proofs of our scheme's security. Finally, our scheme is implemented based on zk-SNARKs and Ganache. Experimental results show that our method is practicable, and theoretical and experimental analysis indicates that our scheme performs better as the number of transactions in a batch grows.

KEYWORDS

Ethereum, transaction privacy, decentralized coin mixer, payment channel, zero-knowledge proof.
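The batch accounting idea in the abstract above rests on the observation that intermediate transfers between two parties need no network consensus. The toy sketch below illustrates only that bookkeeping aspect: many off-chain updates followed by one settlement. It omits the mixer, the zero-knowledge proofs and all cryptography, and the class `PaymentChannel` and its methods are hypothetical names, not the authors' design.

```python
from dataclasses import dataclass


@dataclass
class PaymentChannel:
    """Toy two-party channel: transfers stay off-chain until settle()."""
    balance_a: int
    balance_b: int
    offchain_updates: int = 0

    def pay(self, sender, amount):
        """Move funds between the parties without touching the ledger."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if sender == "a" and amount <= self.balance_a:
            self.balance_a -= amount
            self.balance_b += amount
        elif sender == "b" and amount <= self.balance_b:
            self.balance_b -= amount
            self.balance_a += amount
        else:
            raise ValueError("unknown sender or insufficient funds")
        self.offchain_updates += 1

    def settle(self):
        """Summarise the final state as a single on-chain transaction."""
        return {
            "a": self.balance_a,
            "b": self.balance_b,
            "offchain_updates": self.offchain_updates,
            "onchain_transactions": 1,
        }


channel = PaymentChannel(balance_a=100, balance_b=100)
channel.pay("a", 30)
channel.pay("b", 10)
channel.pay("a", 5)
final_state = channel.settle()
```

In this simplified picture the cost of one ledger transaction is shared by every update in the batch, which is consistent with the abstract's claim that the scheme benefits from larger batches.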

A Peer-to-Peer Ownership-Preserving Data Marketplace

Nicolas Serrano and Fredy Cuenca, Yachay Tech University

ABSTRACT

A data marketplace enables trading between those who expect to monetize their data and those interested in gaining insights from it. Unfortunately, the paradigm that drives current marketplaces suffers from data leakage: one who buys data can, in principle, resell the acquired data as many times as he wants, even in defiance of non-disclosure agreements. This work proposes a peer-to-peer ownership-preserving data marketplace, which allows selling data that can be computed on but not unveiled. First, an owner provides encrypted data to a buyer, who can perform arbitrary operations on this encrypted data as if it were regular data. Thanks to homomorphic encryption, the encrypted results obtained on the buyer side can then be decrypted on the seller side, in a second and definitive data exchange.

KEYWORDS

Data Marketplace, Homomorphic Encryption, Blockchain.
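To make the "compute on encrypted data, decrypt at the seller" flow in the abstract above concrete, here is a toy sketch using the Paillier cryptosystem. This is a swapped-in illustration: Paillier supports only addition on ciphertexts, whereas the abstract speaks of arbitrary operations, and the paper's actual scheme is not specified here. The tiny primes are insecure and all function names are assumptions. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random


def keygen(p=17, q=19):
    """Toy Paillier key pair with g = n + 1 (the primes are far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # inverse of lam mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Seller side: encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n mod n^2.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def add_encrypted(n, c1, c2):
    """Buyer side: multiplying ciphertexts adds the hidden plaintexts."""
    return c1 * c2 % (n * n)


def decrypt(n, private_key, c):
    """Seller side: recover the plaintext from a ciphertext."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


n, private_key = keygen()
encrypted_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
result = decrypt(n, private_key, encrypted_sum)
```

The buyer only ever handles `encrypt` outputs and `add_encrypted`, while `decrypt` needs the private key that stays with the seller, mirroring the two-step exchange described in the abstract.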

Creating multi-scripts sentiment analysis lexicons for Algerian, Moroccan and Tunisian dialect

K. Abidi and K. Smaili, Loria University of Lorraine, France

ABSTRACT

In this article, we tackle the issue of sentiment analysis of three Maghrebi dialects used in social networks. More precisely, we are interested in analysing sentiments in Algerian, Moroccan and Tunisian corpora. To do this, we automatically built three sentiment lexicons, one for each dialect. Each entry of these lexicons is composed of a word, written in Arabic script (Modern Standard Arabic or dialect) or Latin script (Arabizi, French or English), with its polarity. In these lexicons, the semantic orientation of a word represented by an embedding vector is determined automatically by calculating its distance to several embedding seed words. The embedding vectors are trained on three large corpora collected from YouTube. In the experimental section, the proposed approach is evaluated using a few existing annotated corpora for the Tunisian and Moroccan dialects. For the Algerian dialect, in addition to a small corpus we found in the literature, we collected and annotated a corpus of 10k comments extracted from YouTube. This corpus represents a valuable resource which will be made freely available to the community.

KEYWORDS

Maghrebi dialect, Word embedding, Semantic orientation.
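The abstract above determines a word's semantic orientation from its distance to seed words in embedding space. A minimal sketch of that idea follows, assuming cosine similarity as the distance measure and mean similarity per seed set as the aggregation. Both choices, the function names and the two-dimensional toy vectors are assumptions for illustration, not the authors' configuration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def polarity(word_vec, positive_seeds, negative_seeds):
    """Label a word by comparing its mean similarity to each seed set."""
    pos = sum(cosine(word_vec, s) for s in positive_seeds) / len(positive_seeds)
    neg = sum(cosine(word_vec, s) for s in negative_seeds) / len(negative_seeds)
    score = pos - neg
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


# Toy 2-D embeddings (hypothetical): axis 0 ~ positive, axis 1 ~ negative.
positive_seeds = [[1.0, 0.0], [0.9, 0.2]]
negative_seeds = [[0.0, 1.0], [0.1, 0.8]]
label, score = polarity([0.9, 0.1], positive_seeds, negative_seeds)
```

Because the comparison happens in vector space, the same procedure applies to words in Arabic or Latin script as long as they share one embedding model.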

Sentiment Analysis of Covid-19 Vaccine Responses in Mexico

Jessica Salinas, Carlos Flores, Hector Ceballos, and Francisco Cantu, School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico

ABSTRACT

Social networks can provide far more information on a given topic than conventional methods. As new COVID-19 vaccines are approved by COFEPRIS in Mexico, society is reacting differently, showing approval or rejection of some of these vaccines on social networks. Data analytics has opened the possibility to process, explore, and analyse a large amount of information that comes from social networks and evaluate people's sentiments towards a specific topic. In this analysis, we present a Sentiment Analysis of tweets related to COVID-19 vaccines in Mexico. The study involves the exploration of Twitter data to evaluate whether there are preferences between the different vaccines available in Mexico and what patterns and behaviours can be observed in the community based on their reactions and opinions. This research will help to provide a first understanding of people's opinions about the available vaccines and how these opinions are formed, in order to identify and avoid possible sources of misinformation.

KEYWORDS

Twitter, Data Mining, Sentiment Analysis, Machine Learning, COVID-19.

An Intelligent and Interactive Gaming System to Promote Environment Awareness using Context-Based Storying

Yilin Luo¹ and Yu Sun², ¹Santa Margarita Catholic High School, Rancho Santa Margarita, CA 92688, ²California State Polytechnic University, Pomona, CA, 91768

ABSTRACT

Since childhood, I have loved to play video games, especially platform games such as Metal Slug™, Mega Man™, etc. Therefore, I was inspired to design my own platform game. In this paper, I have the opportunity to introduce our platform game, a “JFF Game” that we developed using Unity and Visual Studio 2019.

KEYWORDS

Machine Learning, 3D design, Gaming System.

An Automated Analytics Engine for College Program Selection using Machine Learning and Big Data Analysis

Jinhui Yu, Xinyu Luan, Yu Sun, California State Polytechnic University, Pomona, CA, 91768

ABSTRACT

Because the structure and content of each school's website differ, it is often difficult for international applicants to obtain each school's application information in time. They need to spend a lot of time manually collecting and sorting information. In particular, because a school's information may be updated constantly, the information applicants have collected may become inaccurate. We designed a tool with three main steps to solve the problem: crawling links, processing web pages, and building our pages. We mainly use Python and store the crawled data in JSON format [4]. In the process of crawling links, we mainly used Beautiful Soup to parse HTML and designed a crawler. First, we use the crawler to fetch all the links related to admission information on the school's official website. Then we traverse these links and use the noise_remove [5] method to process the corresponding page contents, so as to further narrow the scope of useful information, and save the processed contents in JSON files. Finally, we use the Flask framework to integrate these contents into our front-end page conveniently and efficiently, so that it has the complete function of integrating and displaying information.

KEYWORDS

Data Crawler, Data Processing, Web framework.
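The first step described in the abstract above, collecting admission-related links from a school's site and storing them as JSON, can be sketched as follows. The abstract uses Beautiful Soup; to keep this example self-contained it substitutes the standard library's `html.parser`, and the keyword list, function names and `example.edu` URLs are all hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            url = urljoin(self.base_url, self._href)
            self.links.append((url, "".join(self._text).strip()))
            self._href = None


def admission_links(html, base_url, keywords=("admission", "apply")):
    """Keep links whose URL or anchor text mentions an admission keyword."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [url for url, text in parser.links
            if any(k in (url + " " + text).lower() for k in keywords)]


page = ('<a href="/admissions/intl">International</a>'
        '<a href="/sports">Sports</a>'
        '<a href="/grad">How to apply</a>')
found = admission_links(page, "https://example.edu")
as_json = json.dumps({"links": found}, indent=2)
```

A real crawler would fetch each page over HTTP and follow the collected links; here the HTML is an inline string so the filtering logic can be read in isolation.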

An Anxiety and Stress Reducing Platform based on Minigames and Emotional Release using Machine Learning and Big Data Analysis

Selina Gong¹, John Morris² and Yu Sun², ¹University High School, 4771 Campus Dr, Irvine, CA 92612, ²California State Polytechnic University, Pomona, CA, 91768

ABSTRACT

Today’s students are faced with stress and anxiety as a result of school or work life and have added pressure from social media and technology [7]. Stress is heavily related to many symptoms of depression, such as irritability or difficulty with concentration, as well as symptoms of anxiety like restlessness or feeling tired [8]. Some of these students are able to find a healthy outlet for stress; however, others may not. We have created a program where students will be able to destress and explore their emotions with the help of suggestions from our system based on previously explored thoughts. Our program uses machine learning to help students get the most effective stress relief by suggesting different mental health exercises to try based on input given by the user, and it provides emotional comfort based on the user’s preferences [9].

KEYWORDS

relaxing, destress, game, journal.

A Deep Learning based Approach to Argument Recommendation

Guangjie Li, Yi Tang, Biyi Yi, Xiang Zhang and Yan He, National Innovation Institute of Defense Technology, Beijing, China

ABSTRACT

Code completion is one of the most useful features provided by advanced IDEs and is widely used by software developers. However, as a kind of code completion, recommending arguments for method calls is less used. Most existing argument recommendation approaches provide a long list of syntactically correct candidate arguments, from which it is difficult for software engineers to select the correct one. To this end, we propose a deep learning based approach to recommending arguments instantly when programmers type in the names of the methods they intend to invoke. First, we extract context information from a large corpus of open-source applications. Second, we preprocess the extracted dataset, which involves natural language processing and data embedding. Third, we feed the preprocessed dataset to a specially designed convolutional neural network to rank and recommend actual arguments. With the resulting CNN model trained on sample applications, we can sort the candidate arguments in a reasonable order and recommend the first one as the correct argument. We evaluate the proposed approach on 100 open-source Java applications. Results suggest that the proposed approach outperforms the state-of-the-art approaches in recommending arguments.

KEYWORDS

Argument recommendation, Code Completion, CNN, Deep Learning.

Analysis of DIS Flooding Attack in RPL-based Internet of Things: A Case Study

Rajasekar V.R and Rajkumar. S, School of Computer Science and Engineering, VIT University, Vellore, India

ABSTRACT

In RPL-based Wireless Sensor Nodes or IoT networks, devices send DODAG Information Solicitation (DIS) messages to connect to the network. A malicious node can take advantage of this mechanism to send unauthorized DIS messages to neighboring nodes, thereby initiating a DIS flooding attack. The DIS flooding attack significantly increases the overhead associated with network control packets, degrades network performance, and consumes additional power at network nodes. This paper examines the impact of a DIS flooding attack on a 6LoWPAN network based on RPL using four distinct scenarios, each with two test cases. Our study shows that increasing the number of attacker nodes has a noticeable negative effect on packet delivery rate (PDR), end-to-end delay (E2ED), and power consumption. The experimental results indicate a significant decrease in packet delivery rate, an increase in packet end-to-end delay, and increased power consumption. Additionally, an overhearing mechanism is used to examine the number of packets sent and received.

KEYWORDS

Internet of Things, DIS flooding, RPL, 6LoWPAN, LLN.
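Two of the metrics reported in the abstract above, packet delivery rate and end-to-end delay, can be derived from per-packet send and receive timestamps. The sketch below shows one common way to compute them. The log format (dictionaries keyed by packet id), the function name and the sample timestamps are assumptions, not the simulator output used in the paper.

```python
def network_metrics(sent, received):
    """Compute packet delivery rate and mean end-to-end delay.

    sent:     {packet_id: send_time_in_seconds}
    received: {packet_id: receive_time_in_seconds}
    """
    delivered = [pid for pid in sent if pid in received]
    pdr = len(delivered) / len(sent) if sent else 0.0
    if delivered:
        mean_delay = sum(received[p] - sent[p] for p in delivered) / len(delivered)
    else:
        mean_delay = None  # no packet arrived, so delay is undefined
    return pdr, mean_delay


# Hypothetical trace: four packets sent, two of them delivered.
sent_log = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
received_log = {1: 0.5, 3: 3.0}
pdr, mean_delay = network_metrics(sent_log, received_log)
```

Running the same calculation on traces with and without attacker nodes gives the kind of before-and-after comparison the abstract describes.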

Yolo v5 on Gaofen 3 Airplane Detection data

Fernando Lima, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil

ABSTRACT

Thanks to the increase in computational resources and data availability, deep learning-based object detection methods have achieved numerous successes in computer vision, and more recently in remote sensing. The “You Only Look Once” (or just YOLO) framework is a family of deep learning-based object detection methods that detect targets in a single stage. Single-stage detectors provide faster but less accurate detections in comparison to multi-stage detectors. However, the newest versions of YOLO are reported to achieve very accurate results on a range of datasets. This paper details the process of using the YOLO framework for detecting ships in synthetic aperture radar (SAR) images using the recently introduced GAOFEN 3 Challenge Dataset. The performance and improvements of the newest versions of YOLO are compared, and the results show Yolo v5 to be a framework with the potential to be a big leap towards improving SAR image ship detection performance under limited data.

KEYWORDS

yolo, object detection, sar, gaofen.

Introducing the Viewpoint in the Resource Description using Machine Learning

Ouahiba Djama, Lire Laboratory, University of Abdelhamid Mehri Constantine 2, Constantine, Algeria

ABSTRACT

Search engines provide users with data and information according to their interests and specialty. Thus, it is necessary to exploit descriptions of the resources that take viewpoints into consideration. Generally, resource descriptions are available in RDF (e.g., DBPedia of Wikipedia content). However, these descriptions do not take viewpoints into consideration. In this paper, we propose a new approach that converts a classic RDF resource description into a resource description that takes viewpoints into consideration. To detect viewpoints in the document, a machine learning technique is applied to an instantiated ontology. The latter represents the viewpoint in a given domain.

KEYWORDS

Resource Description, RDF, Viewpoint, Ontology, Machine Learning.

An Intelligent Drone System to Automate the Avoidance of Collision using AI and Computer Vision Techniques

Steven Zhang¹ and Yu Sun², ¹Crean Lutheran High School, Irvine, CA 92618, ²California State Polytechnic University, Pomona, CA, 91768

ABSTRACT

People love to fly drones, but unfortunately many end up crashing or losing them. As drone technology improves, more people are getting involved. With the number of users increasing, people find that flying drones equipped with sensors is safer because such drones can automatically avoid problems, but they are expensive. This paper describes an inexpensive UAV (unmanned aerial vehicle) system that eliminates the need for sensors and uses only the camera to avoid collisions. This program helps avoid drone crashes and losses. We used the Tello Education drone, which is outfitted only with a camera, as our testing drone. The camera feed is transmitted to the program, which then gives commands to the drone to avoid collisions.

KEYWORDS

Machine Learning, Electrical Engineering, Computer Vision, Drone.

An ML-Based Memory Leak Detection Scheme for Network Devices

Minghui Wang, Jiangxuan Xie, Xinan Yang, Xiangqiao Ao, AI Research Institute, H3C Technology Co., Ltd

ABSTRACT

The network is very important to the normal operation of all aspects of society and the economy, and a memory leak in a network device is a software failure that seriously damages the stability of the system. Some common memory checking tools are not suitable for network devices that are running online, so the operations staff can only constantly monitor the memory usage and infer from experience, which has proved to be inefficient and unreliable. This paper first obtains memory utilization information that is not affected by the occupancy of large-scale resource table entries. By analyzing its monotonicity and its correlation with memory leak sequence sets constructed by simulation, a memory leak fault can be found in time. Then, the scheme predicts the time when the memory will reach the alarm threshold and uses a rule engine to obtain more detailed diagnostic information. The simulation experiments show that the scheme is computationally efficient and that the precision rate is as high as 100%, which solves this problem well and is of great practical significance.

KEYWORDS

Memory leak, Resource table entry utilization, Correlation coefficient, Time Sequence monotonicity, Machine Learning.
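The detection steps in the abstract above (checking monotonicity, correlating with a simulated leak sequence, and predicting when the alarm threshold will be reached) can be illustrated with a short sketch. The specific definitions used here, namely the fraction of non-decreasing steps, the Pearson coefficient and a least-squares line for extrapolation, are assumptions chosen for simplicity, as are the function names and sample readings. The paper's exact formulas are not given in the abstract.

```python
import math


def monotonicity(series):
    """Fraction of consecutive steps in which the value does not decrease."""
    steps = list(zip(series, series[1:]))
    return sum(b >= a for a, b in steps) / len(steps)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def samples_until_threshold(series, threshold):
    """Extrapolate a least-squares line; return samples left before threshold."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    if slope <= 0:
        return None  # usage is not growing, so no crossing is predicted
    intercept = y_mean - slope * t_mean
    return (threshold - intercept) / slope - (n - 1)


# Hypothetical memory utilisation samples (%) and a simulated leak pattern.
usage = [50, 52, 54, 56, 58]
reference_leak = [0, 1, 2, 3, 4]
mono = monotonicity(usage)
corr = pearson(usage, reference_leak)
remaining = samples_until_threshold(usage, threshold=70)
```

A series that scores high on both monotonicity and correlation would be flagged as a suspected leak, and `remaining` then estimates how many sampling intervals are left before the alarm threshold.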

Building a Smart Mirror for the Purposes of Increased Productivity and Better Mental Health, Complete with an App

Jonathan Liu¹ and Yu Sun², ¹Arcadia High School, Arcadia, CA, 91007, ²California State Polytechnic University, Pomona, CA, 91768

ABSTRACT

Oftentimes, people find themselves staring in the mirror mindlessly while brushing their teeth or putting on clothes. This time, which may seem unnoticeable at first, can accumulate to a significant amount of time wasted when looked at over the duration of a year, and can easily be repurposed to better suit one’s goals. In this paper, we describe the construction and implementation of the Smart Mirror, an intelligent mirror that boasts several features in order to improve an individual’s daily productivity. As the name suggests, it is a mirror, and so will not take anything away from the user when he or she is performing their daily teeth brushing. It also hosts facial recognition, and can recognize one’s emotions from one glimpse through the camera. The Mirror also comes with an app that is available on the Google Play Store, which helps input tasks and daily reminders that can be viewed on the Smart Mirror UI.

KEYWORDS

Thunkable, Google Firebase, Android, Raspberry Pi.

Baby Cry Classifications using Deep Learning

Shane Grayson¹ and Wilson Zhu², ¹Windward School, Los Angeles, USA, ²Diamond Bar High School, Los Angeles, USA

ABSTRACT

Oftentimes parents are awakened or interrupted by the never-ending cries of their newborn babies. Attempts to quell their children’s anguish sometimes result in louder cries. By first transforming these cries into waveforms and then into sound spectrograms, we were able to test the efficiency and accuracy of three different machine learning models: a support vector machine, a 2-layer neural network, and a long short-term memory model. Finally, we were able to develop an automatic sorter that categorizes each cry by the meaning behind it, using the similarities and differences between cries. Using this method, we hope to eliminate error and wasted time when trying to stop the cries of a baby. We trained, validated, and tested the models on a series of audio files. After testing, the results demonstrate that the program identifies the source of the cries correctly well over half of the time. This shows that the program is more than sufficient at correctly finding the source of a baby's cry, which will save time on the parents' behalf.

KEYWORDS

Infant Cry, Deep Learning, Convolutional Neural Network, Audio Classification.

Reaching Fairness in Biometric Technology

Feiyang Tang, Norwegian Computing Center, Oslo, Norway

ABSTRACT

The accelerated development of biometric technology is changing many aspects of our daily lives; however, much recent research points out severe hidden biases in biometric recognition technology. For example, in the United States, recent large-scale protests for racial equality have drawn attention to the fact that security algorithms with biometric recognition technology unfairly target minorities. Over the years, activists have pointed out that search algorithms and biometric recognition technologies often associate negative images or words with disadvantaged communities. According to recent studies from Harvard University, where facial recognition technology is used, African Americans are more likely than white Americans to be arrested and incarcerated for minor crimes. As artificial intelligence technology rapidly changes our society and economy, we should be alarmed that it may also undermine our society's equality and justice, which emphasizes the importance of reaching fairness in biometric recognition technology. This paper provides a brief analysis of biometric technology with a focus on the underlying fairness and bias issues. We start from the unique properties of biometrics and then analyze the different stages of biometric processing: collection, analysis, and application. Finally, we provide a short discussion of possible fairness concerns among them and give our views and recommendations. We hope this review will raise awareness of relevant issues and motivate deeper reflection on biometric technology applications.

KEYWORDS

Biometric System, Fairness, Bias, Privacy.

Dealing Crisis Management using AI

Yew Kee Wong, School of Information Engineering, HuangHuai University, Henan, China

ABSTRACT

KEYWORDS

Artificial Intelligence, Big Data, Business Operations, Crisis Management.

Contact Us

bdiotconference@yahoo.com
