8th International Conference on Computer Science and Information Technology (CoSIT 2021)


March 27 ~ 28, 2021, Sydney, Australia

Accepted Papers


A Deep Learning Approach to Nightfire Detection Based on Low-light Satellite

Yue Wang, Ye Ni, Xutao Li and Yunming Ye, Department of Computer Science, Harbin Institute of Technology, Shenzhen, China

ABSTRACT

Wildfires are serious disasters that often cause severe damage to forests and plants. Without early detection and suitable control action, a small wildfire can grow into a large and serious one. The problem is especially severe at night, as firefighters generally miss the chance to detect wildfires in the first few hours. Low-light satellites, which take pictures at night, offer an opportunity to detect night fires in a timely manner. However, previous studies identify night fires using threshold methods or conventional machine learning approaches, which are not sufficiently robust or accurate. In this paper, we develop a new deep learning approach that determines night fire locations by pixel-level classification of low-light remote sensing images. Experimental results on VIIRS data demonstrate the effectiveness of the proposed method, which outperforms conventional threshold and machine learning approaches.

KEYWORDS

night fire detection, pixel segmentation, low-light satellite image


Managing the complexity of climate change

Shann Turnbull, Principal: International Institute for Self-governance, Sydney

ABSTRACT

This paper indicates how the knowledge of complex systems can be put into practice to counter climate change. A contribution of the paper is to show how individual behaviour, institutional analysis, political science and management can be grounded and integrated into the complexity of natural systems to introduce mutual sustainability. Bytes are used as the unit of analysis to explain how nature governs complexity on a more reliable and comprehensive basis than can be achieved by humans using markets and hierarchies. Tax incentives are described to increase revenues while encouraging organisations to adopt elements of ecological governance found in nature and in some social organisations identified by Ostrom and the author. Ecological corporations provide benefits for all stakeholders. This makes them a common good to promote global common goods like enriching democracy from the bottom up while countering climate change, pollution, and inequalities in power, wealth and income.

KEYWORDS

Bytes, Climate change, Common good, Ecological governance, Tensegrity


Optimization of Random Forest Model for Assessing and Predicting Geological Hazards Susceptibility in Lingyun County

Chunfang Kong1,2,3,4, Kai Xu1,2,3,*, Junzuo Wang1, Yiping Tian1,3,4, Zhiting Zhang1,3,4, and Zhengping Weng1,3,4, 1School of Computer, China University of Geosciences, Wuhan, 430074, China, 2Hubei Key Laboratory of Intelligent Geo-Information Processing, Wuhan, 430074, China, 3Innovation Center of Mineral Resources Exploration Engineering Technology in Bedrock Area, Ministry of Natural Resources, Guiyang, 550081, China, 4National-Local Joint Engineering Laboratory on Digital Preservation and Innovative Technologies for the Culture of Traditional Villages and Towns, Hengyang, 421000, China

ABSTRACT

The random forest (RF) model is improved by optimizing the unbalanced geological hazards dataset, differentiating continuous geological hazard evaluation factors, calculating sample similarity, and iteratively searching for the optimal random features by calculating out-of-bag (OOB) errors. A geological hazards susceptibility evaluation model based on the optimized RF (OPRF) was established and used to assess susceptibility in Lingyun County. ROC curve analysis and field investigation were then used to verify the performance of the different geological hazards susceptibility assessment models. The AUC values for the five models were estimated as 0.766, 0.814, 0.842, 0.846 and 0.934, respectively, which indicated that the prediction accuracy of the OPRF model can be as high as 93.4%. This result demonstrated that the geological hazards susceptibility assessment model based on OPRF has the highest prediction accuracy. Furthermore, the OPRF model could be extended to other regions with similar geological environment backgrounds for geological hazards susceptibility assessment and prediction.
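The AUC figures quoted in this abstract summarise each model's ROC curve. As a rough illustration only, the sketch below computes an AUC as the fraction of (hazard, non-hazard) pairs that a model ranks correctly. The function name `roc_auc` and the labels and scores are made up for the example and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = hazard observed, 0 = no hazard; scores = predicted susceptibility.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of samples, so it suits small validation sets; a rank-based implementation would be preferable for large ones.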

KEYWORDS

Geological Hazards, Susceptibility Evaluation, Random Forest (RF), Optimized RF (OPRF), Geographical Information Systems (GIS).


Research on Edge Cloud Quality Model and Evaluation System

Liyun Yang, Hang Chen and Yangyang Zhang, Cloud computing Center, China Electronic Standardization Institute, Beijing, China

ABSTRACT

Aiming at the system and software quality of the edge cloud, this paper proposes an Edge Cloud Quality Model (ECQM) and evaluation system. The ECQM is composed of a capability quality model and a process quality model. The capability quality model includes technical capability, product capability, service capability, application capability, and security capability. The process quality model includes procurement, design, deployment, delivery, operation, and cloud service level agreement. Based on the ECQM, we construct an evaluation system and give the definition and description of the edge cloud Quality Evaluation Level (QEL).

KEYWORDS

System and Software Quality, ECQM, Edge Cloud, Evaluation System.


ADGraph: Accurate, Large Mini-batch Training on Graphs

Zhang Lizhi, Lai Zhiquan, Liu Feng, Ran Zhejiang, Parallel and Distributed Key Laboratory of National Defence Technology, National University of Defence Technology, Changsha, China

ABSTRACT

In recent years, graph neural networks (GNNs) have been widely used in the fields of social networks, recommendation systems and knowledge graphs. In these domains, the scale of graph data is immense, so distributed graph learning is required for efficient GNN training. Graph partition-based methods are widely adopted to scale graph training. However, most previous works focus on scalability rather than accuracy and are not thoroughly evaluated on large-scale graphs. In this paper, we introduce ADGraph, exploring how to improve accuracy while keeping large-scale graph training scalable. Firstly, to maintain complete neighbourhood information of the training nodes after graph partitioning, we assign l-hop neighbours of the training nodes to the same partition. We also analyse the accuracy and runtime performance of graph training with different l-hop settings. Secondly, multi-layer neighbourhood sampling is performed on each partition, so that the generated mini-batches can accurately train target nodes. We find that partial neighbourhood sampling can achieve better performance than full neighbourhood sampling. Thirdly, to further overcome the generalization error caused by large-batch training, we choose an appropriate batch size after graph partitioning and apply the linear scaling rule in distributed optimization. We evaluate ADGraph using GraphSage and GAT models with the ogbn-products and Reddit datasets on 32 GPUs. Experimental results show that ADGraph achieves better performance than the benchmark accuracy of GraphSage and GAT, while achieving a 24-29 times speedup on 32 GPUs.
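Two of the steps named in this abstract can be sketched in a few lines: collecting the l-hop neighbours of the training nodes so they can be placed in the same partition, and the linear scaling rule for the learning rate. This is only an illustration under assumptions, not ADGraph's implementation. The adjacency-dictionary graph format, the function names and all numeric values are hypothetical.

```python
from collections import deque


def l_hop_neighbours(adj, seeds, l):
    """Return every node within l hops of any seed node (seeds included)."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == l:
            continue  # do not expand beyond l hops
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


def scaled_learning_rate(base_lr, base_batch, per_worker_batch, num_workers):
    """Linear scaling rule: scale the learning rate with the global batch size."""
    global_batch = per_worker_batch * num_workers
    return base_lr * global_batch / base_batch


# Hypothetical path graph 0-1-2-3-4 with node 0 as the only training node.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
partition_nodes = l_hop_neighbours(adj, seeds=[0], l=2)

# Hypothetical settings: 32 workers, each using the single-worker batch size.
lr = scaled_learning_rate(base_lr=0.01, base_batch=1000,
                          per_worker_batch=1000, num_workers=32)
print(sorted(partition_nodes), lr)
```

A larger l keeps more neighbourhood information in each partition but also duplicates more nodes across partitions, which is the accuracy and runtime trade-off the abstract mentions.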

KEYWORDS

Graph neural networks, Distributed training, Multi-GPU, Deep learning, Parameter Server.


Context-aware Short-term Interest First Model for Session-based Recommendation

Haomei Duan and Jinghua Zhu, School of Computer Science and Technology, Heilongjiang University, Harbin, China

ABSTRACT

When user profiles are not available, recommendation based on anonymous sessions is particularly important. Its main aim is to predict the items that the user may click next, based on the user's access sequence over a period of time. In recent years, with the development of recurrent neural networks, attention mechanisms, and graph neural networks, the performance of session-based recommendation has greatly improved. However, previous methods did not comprehensively consider the context dependencies and the priority of short-term interests in the session. Therefore, we propose a context-aware short-term interest first model (CASIF). In CASIF, we dynamically construct a graph structure for session sequences and capture rich context dependencies via a graph neural network (GNN); the latent feature vectors are captured as inputs to the next step. Then we build the short-term interest first module, which captures the user's general interests from the session in the context of long-term memory and, at the same time, obtains the user's current interests from the last-clicked item. In the end, the short-term and long-term interests are combined as the final interest and multiplied by the candidate vector to obtain the recommendation probability.
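The final scoring step described in this abstract (combine the two interests, multiply by each candidate vector, normalise to probabilities) can be sketched as follows. The abstract does not say how the interests are combined, so the weighted sum with weight `alpha`, the mean over session items as the long-term interest, and the toy embeddings are all assumptions made for illustration. CASIF itself uses a GNN and a dedicated module for these steps.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(session_items, candidates, alpha=0.7):
    """Score candidates against a blend of short-term and long-term interest.

    session_items: item embedding vectors in click order (assumed given).
    alpha: hypothetical weight that favours the short-term interest.
    """
    dim = len(session_items[0])
    short_term = session_items[-1]                      # last-clicked item
    long_term = [sum(v[d] for v in session_items) / len(session_items)
                 for d in range(dim)]                   # assumed: mean embedding
    final = [alpha * s + (1 - alpha) * g
             for s, g in zip(short_term, long_term)]
    scores = [sum(f * c for f, c in zip(final, cand)) for cand in candidates]
    return softmax(scores)


# Toy 2-dimensional embeddings: one click on item A, then two clicks on item B.
session = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0]]
probs = recommend(session, candidates)
print(probs)
```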

KEYWORDS

recommendation, session-based, context-aware, neural network, attention


User Characteristics of Olympic Gold Medallists on Instagram: A Quantitative Analysis of Rio2016

Amirhosein Bodaghi, Federal University of Rio de Janeiro, Department of Computing Science, Centre of Mathematical and Natural Sciences – CCMN, Rio de Janeiro, Brazil

ABSTRACT

The purpose of this study is to examine Olympic champions’ characteristics on Instagram, first to understand whether differences exist between male and female athletes and then to find possible correlations between these characteristics. We used a content analytic method to analyse Olympic gold medallists’ photographs on Instagram. In this way, we fetched data from the Instagram pages of all Rio2016 Olympic gold medallists whose accounts were publicly available. The analysis of the data revealed a positive monotonic relationship between the following/follower ratio and the engagement/follower ratio for male gold medallists, and a strong negative monotonic relationship between age and the ratio of self-presenting posts for both male and female gold medallists, which even takes a linear form for men. These findings, aligned with the relevant theories and literature, may help athletes manage and expand their personal brand on social media.

KEYWORDS

Instagram, self-presenting, user characteristics, Olympics, gold medallists


Towards Adversarial Genetic Text Generation

Deniz Kavi, The Koç School, Turkey

ABSTRACT

Text generation is the task of generating natural language and producing outputs similar to or better than human texts. Due to deep learning’s recent success in the field of natural language processing, computer-generated text has come closer to becoming indistinguishable from human writing. Genetic algorithms have not been as popular in the field of text generation. We propose a genetic algorithm combined with text classification and clustering models that automatically grade the texts generated by the genetic algorithm. The genetic algorithm is given poorly generated texts from a Markov chain; these texts are then graded by a text classifier and a text clustering model. We then apply crossover to pairs of texts, with emphasis on those that received higher grades. Changes to the grading system and further improvements to the genetic algorithm are to be the focus of future research.
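The crossover step with emphasis on higher-graded texts can be illustrated with grade-weighted parent selection. This is a minimal sketch under assumptions, not the author's algorithm. The grader `toy_grade` is a placeholder for the classifier and clustering models, the crossover is a simple single-point splice on word lists, and the seed sentences are invented.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of parent_a joined to tail of parent_b."""
    cut_a = rng.randint(1, len(parent_a) - 1)
    cut_b = rng.randint(1, len(parent_b) - 1)
    return parent_a[:cut_a] + parent_b[cut_b:]


def next_generation(population, grade, rng):
    """Breed a new population, picking parents in proportion to their grade."""
    weights = [grade(text) for text in population]
    children = []
    for _ in range(len(population)):
        parent_a, parent_b = rng.choices(population, weights=weights, k=2)
        children.append(crossover(parent_a, parent_b, rng))
    return children


def toy_grade(words):
    """Placeholder grader (assumption): rewards texts with varied vocabulary."""
    return len(set(words)) / len(words) + 1e-6


rng = random.Random(0)  # fixed seed so the run is repeatable
population = [s.split() for s in [
    "the cat the cat the cat",
    "a quick brown fox jumps",
    "rain falls on the quiet town",
    "on on on the the hill",
]]
children = next_generation(population, toy_grade, rng)
for child in children:
    print(" ".join(child))
```

Every text needs at least two words for the cut points to be valid; a real system would also add mutation and replace `toy_grade` with the trained graders.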


Valences Estimation for Spanish Sentiment Analysis using A Genetic Algorithm

Kevin Mejía1 and Yulia Ledeneva2 and René García3, 1Autonomous University of the State of Mexico, 2Ph. D. Autonomous University of the State of Mexico, 3Ph. D. Autonomous University of the State of Mexico

ABSTRACT

The analysis of opinions in microblogs such as Twitter has attracted great interest due to the large number of unstructured opinions that are not analyzed automatically. To address this, Sentiment Analysis (SA) is applied. SA has three approaches: lexicon-based, machine learning-based, and a combination of the two called the hybrid approach. This article presents a method that automatically estimates the valences of the words of a Spanish-language lexicon using a Genetic Algorithm (GA), and uses these valences as training features for Support Vector Machines (SVM). The proposed method was tested on the corpus of opinions in Spanish (COST). Evaluation was carried out on three main measures: precision, recall and the harmonic mean of the two (F-measure). The results obtained from the experiments carried out with the implemented method showed a great improvement in the classification task.
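For reference, the three evaluation measures named in this abstract can be computed for one class as in the sketch below. The function name and the gold and predicted labels are invented for the example and are not results from the paper.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F-measure for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted sentiment labels for five opinions.
gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "neg", "pos", "pos"]
p, r, f = precision_recall_f1(gold, pred, positive="pos")
print(p, r, f)
```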

KEYWORDS

Genetic Algorithm, lexicons, machine learning, support vector machines, hybrid approach


Opinion Mining and Fine-Tuning Pre-Trained Bert on User Reviews of Gaming Apps in Hungary

Aadil Gani Ganie and Samad Dadvandipour, Institute of Information Sciences, University of Miskolc, Hungary

ABSTRACT

Pre-trained transformer-based models are very useful for multiple NLP tasks. In this paper, a pre-trained BERT model with fine-tuning is proposed for opinion mining on a unique dataset of user reviews of gaming apps in Hungary. We tuned the learning rate, domain knowledge, number of epochs and batch size, and addressed catastrophic forgetting. We observed that our model best predicts the sentiment class of a review with the following settings: a learning rate of 2e-5, 10 epochs and a batch size of 16. Since BERT was trained on a general domain, we also adapted it to the specific domain. Neutral comments were classified less accurately, so the best validation accuracy we achieved is 84%, while training accuracy went up to 97%.

KEYWORDS

BERT, NLP, Fine tuning, Deep learning, Transformer


An NLP-Based Reconciliation Method for CVE Reports

Igor Khokhlov, Ahmet Okutan, Ryan Bryla, Steven Simmons and Mehdi Mirakhorli, Department of Software Engineering, Rochester Institute of Technology, Rochester, New York, USA

ABSTRACT

Common Vulnerability and Exposure (CVE) reports play an important role in understanding how software vulnerabilities impact the overall security of various systems. The lifecycle of a CVE continues after its discovery in the form of updates. A CVE report has various fields, such as a unique identifier or description. During CVE scraping from multiple sources, there is a chance of acquiring the same CVE from different databases or getting a CVE that is already in the database. These instances of the same CVE may differ in the content of their fields, and a decision has to be made about which fields to keep and which to update. This decision-making process is called CVE reconciliation. This paper presents a novel approach to CVE reconciliation that is based on a combination of Natural Language Processing (NLP) techniques and expert system (ES) rules. These methods can be used as part of a vulnerability management system that constantly and automatically acquires thousands of new CVEs and maintains old CVE reports. The paper presents a novel application of NLP and ES techniques to maintaining CVE reports, develops a new methodology for CVE reconciliation, analyzes the developed model’s performance, and validates the developed techniques in real-life use cases.

KEYWORDS

Common Vulnerability and Exposure, Natural Language Processing, Expert System.


Supervised Machine Learning Approaches for Sentiment Analysis on a Movie Review

Ojonukpe S. Egwuche1, Micheal O. Ajinaja2, Kolawole O. Adekunle3 & Israel D. Haruna4, 1Department of Computer Science, Federal Polytechnic, Ile-Oluji, Ondo State, Nigeria, 2Department of Computer Science, Federal Polytechnic, Ile-Oluji, Ondo State, Nigeria, 3Department of Computer Science, Federal Polytechnic, Ile-Oluji, Ondo State, 4Nigeria Road and Building Research Institute, Ota, Ogun State, Nigeria

ABSTRACT

Feedback from consumers/customers in the form of reviews provides a pool of ideas that are of immense importance to the promotional policies of any business. Developments in Information Technology have made personal blogs and online review sites possible as opinion resources for customers to access and assess the opinions of others to decide whether to buy a product or not. E-commerce websites such as Amazon, Alibaba and Jumia, and social media websites such as Twitter, Facebook, etc., are widely used for effective communication of viewpoints. Assigning sentiments, either positive or negative, can assist users in making informed decisions in their product selection and help companies understand their customers. Sentiment analysis is a complex problem that can be solved with either supervised machine learning techniques on labeled data or unsupervised machine learning techniques on unlabeled data.

KEYWORDS

Sentiment analysis, Movie Review Mining, Machine Learning.


The Impact of CrowdSourcing on the Performance of Concatenation of Word2Vec and GloVe Algorithms

Mohammad Jafarabad, Department of Computer Engineering, Qom University, Iran

ABSTRACT

Deep learning algorithms have been effective in recognizing lexical similarity. In cases where there is not enough standard labeled primary data, the volume of data can be increased by crowdsourcing. For machine learning tasks, the results of crowdsourcing can also be combined, so that more, and more accurate, gold data is available. In this study, we combined crowdsourcing with deep learning. Crowdsourcing did the labeling process for us, and we achieved good accuracy in identifying pairs of entities.

KEYWORDS

crowdsourcing, Deep learning, gold data, labeling, word2vec.


A color image blind digital watermarking algorithm based on QR code

Xuecheng Gong and Wanggen Li, School of Computer and Information, Anhui Normal University, Anhui Wuhu, China

ABSTRACT

Current color image digital watermarking algorithms suffer from low robustness. To address this problem, a color image blind digital watermarking algorithm based on QR code is proposed. The algorithm combines the Discrete Wavelet Transform (DWT) and the Discrete Cosine Transform (DCT). First, the color image is converted from RGB space to YCbCr space, the Y component is extracted, and a second-level discrete wavelet transform is performed; second, the LL2 subband is divided into blocks and a discrete cosine transform is applied to each block; finally, the Arnold-transformed watermark information is embedded into the blocks. The experimental results show that the PSNR of the color image embedded with the QR code is 56.7159 dB when it is not attacked. After being attacked, its PSNR is more than 30 dB and its NC is more than 0.95. This shows that the algorithm has good robustness and can achieve blind watermark extraction.
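Two building blocks mentioned in this abstract are easy to show in isolation: Arnold scrambling of a square watermark, and PSNR as the imperceptibility measure. The sketch below assumes the classical Arnold cat map, (x, y) -> (x + y, x + 2y) mod N. The paper's exact variant, iteration count and embedding rule are not given in the abstract, and the 4x4 watermark is a made-up stand-in for a QR code.

```python
import math


def arnold(img, iterations=1):
    """Scramble a square N x N image with the classical Arnold cat map."""
    n = len(img)
    for _ in range(iterations):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(x + y) % n][(x + 2 * y) % n] = img[x][y]
        img = out
    return img


def inverse_arnold(img, iterations=1):
    """Undo arnold() by applying the inverse map (2x - y, y - x) mod N."""
    n = len(img)
    for _ in range(iterations):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(2 * x - y) % n][(y - x) % n] = img[x][y]
        img = out
    return img


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(original, distorted)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 4 x 4 binary watermark standing in for a QR code.
watermark = [[1, 0, 0, 1],
             [0, 1, 1, 0],
             [1, 1, 0, 0],
             [0, 0, 1, 1]]
scrambled = arnold(watermark, iterations=3)
restored = inverse_arnold(scrambled, iterations=3)
print(restored == watermark)
```

Scrambling spreads the watermark bits across the image, so a localised attack damages scattered bits instead of one contiguous region of the QR code.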

KEYWORDS

QR Code, Color Image, Arnold Transform, DWT.


Practical Examples of Signal and System Modeling and Simulation

Elvir Cajic, Department of Faculty of Electrical Engineering Technical Education and Informatics Tuzla Bosnia and Herzegovina

ABSTRACT

Modeling is the process of making a system model, i.e., the process of collecting and organizing knowledge about a given system. Experimental modeling, which is also called identification, is based on experimentation on the inputs and outputs of the real system. In practical cases, combined modeling is most often used. Simulation is an experimental technique for solving problems related to the system with the help of a realized system model. Models can generally be classified into two basic groups: physical and abstract models. According to the nature of the change of state, models can be divided into three large groups: continuous, discrete and combined. In this paper, mathematical models of the system using equations and inequalities are presented, and simulations are performed in Matlab.

KEYWORDS

Modeling and simulations; mathematical model and Matlab.


An Experience on Enhancing Machine Learning Classifier Against Low-Entropy Packed Malwares

Shang-Wen Chen, Tzu-Hsien Chuang, Chin-Wei Tien and Chih-Wei Chen, Cybersecurity Technology Institute, Institute for Information Industry, Taipei, Taiwan R.O.C

ABSTRACT

Both benign applications and malware use packing, for different purposes, to conceal the real part of the program. According to recent research reports, existing machine learning (ML) based malware detection engines have difficulty classifying packed malware effectively, especially low-entropy packed samples. Recently, we counted and found that the ratio of low-entropy packed ransomware is extremely high, which causes a high error rate in currently used ML approaches. Thus, we propose an enhancement of entropy-related features and use a stack model to build an ML malware engine that effectively detects low-entropy packed malware. We evaluate our method using over 15,000 malware samples collected from VirusTotal and compare the results with related research. This experience report shows that our adopted model and features can significantly lower the error rate of low-entropy packed malware detection from 11% to 1%.
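Entropy-related features of the kind this abstract refers to are usually derived from the Shannon entropy of a file's bytes. The sketch below is an illustration only. The window size, the function names and the sample byte strings are assumptions, and the paper's actual feature set is not described in the abstract.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def windowed_entropy(data: bytes, window: int = 256):
    """Entropy of each consecutive window; the window size is an assumption."""
    return [byte_entropy(data[i:i + window])
            for i in range(0, len(data), window)]


# Made-up samples: constant padding versus every byte value exactly once.
low = b"\x00" * 256
high = bytes(range(256))
features = windowed_entropy(low + high)
print(features)
```

A whole-file entropy value can hide a small packed region, so per-window statistics (for example the minimum, maximum and mean of `features`) are one plausible way to expose low-entropy packing to a classifier.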

KEYWORDS

malware detection, low-entropy packing, machine learning classification.