Randy Klepetko and Ram Krishnan, Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, Texas, USA
Convolutional Neural Networks (CNNs) continue to revolutionize image recognition technology and are being used in non-image fields such as cybersecurity. They are known to work as feature extractors, identifying patterns within large data sets, but when dealing with nonnatural data, what these features represent is not understood. Several class activation map (CAM) visualization tools are available that assist with understanding CNN decisions when used with images, but their output is not intuitive when dealing with nonnatural security data. Understanding what the extracted features represent should enable the data analyst and model architect to tailor a model to maximize the extracted features while minimizing the computational parameters. In this paper we offer a new tool, Model Integrated Class Activation Maps (MiCAM), which allows the analyst to visually compare extracted feature intensities at the level of individual layers. We also explore using this new tool to analyse several data sets. We start with the MNIST handwriting data set to gain a baseline understanding. We then analyse two security data sets, computer process metrics from cloud-based application servers infected with malware and the CIC-IDS-2017 IP data traffic set, and identify how reordering nonnatural security-related data affects feature extraction performance.
Convolutional Neural Networks, Security, Malware Detection, Visualizations, Deep Learning.
Anishka Duvvuri, Navya Kovvuri, Sneka Kumar, Rebecca Victor and Tanush Kaushik, Jeeva Health, USA
Anxiety is a chronic illness of particular concern during the Covid and post-pandemic era, and it is important to diagnose it in its early stages. Traditional machine learning (ML) methods for detecting mental health issues have been development-intensive, but automated machine learning (AutoML) lets a novice user build a model to detect a phenomenon such as Generalized Anxiety Disorder (GAD) fairly easily. In this study we evaluate a popular AutoML technique on a recent chat engine (Discord) conversation dataset collected using anxiety hashtags. The multi-symptom AutoML Random Forest predictive model is at least 75% accurate for the most prevalent symptom, namely restlessness. This could be a very useful first step in diagnosing GAD for medical professionals and their hospitals' less specialized IT staff, using pre-diagnostic textual conversations. However, it lacks high quality in predicting GAD for most symptoms, as shown by a low 50% precision on most symptoms (except 5). AutoML is quicker for IT professionals and gives decent performance, but it can be improved upon by more sophisticated ANN methods such as convolutional neural networks, which address AutoML's deficiencies on individual symptoms with at least 80+% precision and 0.4+% in F1 score, namely in detecting the poorly predicted symptoms of concentration and irritability.
Generalized Anxiety Disorder, machine learning, Discord chat, AutoML, Convolutional neural network.
Moseli Motsoehli, Anton Nikolaev, Peter Sadowski, John Lynham, University of Hawaii at Manoa, USA
Fish stock assessment typically involves a lot of manual counting that requires specialists, and so is both time consuming and costly. We propose a method to perform both taxonomic classification and size estimation of fish, for the purpose of building an automated stock assessment system that uses a standard camera. We use a subset (single-fish images) of a large dataset generously provided by the Nature Conservancy and fisheries in Indonesia. We achieve 91% top-1 classification accuracy and a 2.3 cm mean error on size estimation, and propose ways to perform stock assessment on images with multiple fish. The resulting models are published on Hugging Face.
Computer Vision, Fish Stock Estimation, Classification, Size Estimation.
Jacob Simon Bernardo, Emanuel Franz Divinagracia and Kendra Kirsten Go, Department of Information Systems and Computer Science, Ateneo de Manila University, Quezon City, Philippines
Drinking water must be free from all offensive odors, tastes, or turbidity, and must have a consistent quality that is safe for human consumption. The presence of certain toxins and other contaminants in water might cause health concerns such as neurological disorders, reproductive problems, and gastrointestinal illness. This paper compares nine machine learning algorithms in terms of predicting water potability: Random Forest, Extra Trees, Naïve Bayes, KNN, Support Vector Machine, Gradient Boost, AdaBoost, Bagging, and Voting. The dataset used was retrieved from Kaggle and includes information on the levels of certain chemicals present in water such as aluminium, chloramine, nitrate, and uranium. Five-fold cross-validation was implemented to optimize the hyperparameters of each model and evaluate their performance on new data. The results of the experiments show that the Random Forest classifier performed better than all the other aforementioned methods.
Computer Vision, Machine Learning Techniques, Water Potability Classification, Ensemble Machine Learning.
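The five-fold cross-validation mentioned in the water potability abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `k_fold_indices` and the ten-sample example are hypothetical, and the folds are contiguous and unshuffled for simplicity. Each sample is used for validation exactly once and for training in the remaining folds.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k contiguous folds.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so that fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# Example with an assumed data set of 10 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
for fold_number, (train, validation) in enumerate(folds, start=1):
    print(f"fold {fold_number}: validate on {validation}, train on {len(train)} samples")
```

In practice a candidate hyperparameter setting would be scored on each validation fold and the scores averaged, so that the setting is chosen on data the model did not see during fitting.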
Max Berre, Business and Society Department, Audencia Business School, Nantes, France and Jean Moulin Lyon 3, Université de Lyon, Lyon, France
While startup valuations are influenced by revenues, risks, and macroeconomic conditions, specific causality is a black box. Because valuations are not disclosed, roles played by other factors (industry, geography, and intellectual property) can often only be guessed at. VC valuation research indicates the importance of establishing a factor hierarchy to better understand startup valuations and their dynamics, suggesting the wisdom of hiring data scientists for this purpose. Bespoke understanding can be established via construction of hierarchical prediction models. These have the advantage of revealing which factors matter most. In combination with OLS, they also tell us the circumstances in which specific causalities apply. This study explores the deterministic roles of categorical variables in the valuation of start-ups (i.e. the joint combination of geographic, urban, and sectoral denomination variables), in order to build a generalized valuation scorecard approach. This study relies on regressions as well as analysis and exploration of divergent micro-populations by combining econometric and machine-learning techniques.
Valuation, Startup Valuation, Venture Capital, Entrepreneurial Finance, Machine Learning, Decision Tree, Random Forest, Hierarchical Analysis.
O P Joy Jefferson1 and Kota Sai Akshitha Reddy2, 1PES University, 2Birla Institute of Technology
The process of discovering and bringing a drug to the market is historically limited due to the sheer cost and time for development and validation of a new candidate. A significant number of drugs found on the market today were detected using drug screening assays with randomly growing forest techniques. Random forest is a useful machine learning method where the activity in each tree might be inferred from the past activities of that tree, as well as those in surrounding trees. In this paper, the authors focus on using random forests to predict in vitro drug sensitivity. The authors investigate the primary causes of drug sensitivity prediction and summarise the methods used. The paper concludes with the reasoning for using alternate methods.
Machine Learning, Drug Sensitivity, Random Forests.
Hajar Baghcheband, Carlos Soares and Luis Paulo Reis, FEUP - Faculty of Engineering, LIACC-Artificial Intelligence & Computer Science Laboratory, University of Porto
Distributed environments, such as multi-agent systems, where each agent is a data source, have increased interest in transferring learning to nodes. In such cases, local models must be individually built by each agent. Agents handle locally collected and processed data. This leads to the models being ineffective and inaccurate when the penetration of their local data is low. Machine Learning data markets (MLDMs) have been proposed to address this issue by exchanging relevant data to improve model performance. The preliminary results of MLDM indicated that exchanging data leads to better models; however, the analysis was based on a system with only two agents. In this paper, we present extended results that provide further evidence in favor of that conclusion. The purpose of this paper is to examine how data exchange can enhance each agent’s data and improve their learning models. Each agent performs daily learning to achieve equilibrium. Under a simple trading strategy, agents can communicate without cost and trade some data. In order to understand the system’s behavior during data exchange, we study various population sizes in the society. We conclude that data exchange will improve learning performance even in a small society with a small portion of data.
Machine Learning, Data Market, Data Exchange, Collaborative Agent-Based Learning, Incentive Mechanism, Multi-Agent System.
Laura L. Pullum, Mathematics and Computer Science Division, Oak Ridge National Laboratory, Bethel Valley Road, Oak Ridge, USA
Reinforcement learning (RL) has received significant interest in recent years, primarily because of the success of deep RL in solving many challenging tasks, such as playing chess, Go, and online computer games. However, with the increasing focus on RL, applications outside gaming and simulated environments require an understanding of the robustness, stability, and resilience of RL methods. To this end, we conducted a comprehensive literature review to characterize the available literature on these three behaviors as they pertain to RL. We classified the quantitative and theoretical approaches used to indicate or measure robustness, stability, and resilience behaviors. In addition, we identified the actions or events to which the quantitative approaches attempted to be stable, robust, or resilient. Finally, we provide a decision tree that is useful for selecting metrics to quantify behavior. We believe that this is the first comprehensive review of stability, robustness, and resilience, specifically geared toward RL.
Reinforcement Learning, Resilience, Robustness, Stability.
Riasat Abbas, Department of Computer Science and Communications, KTH Royal Institute of Technology, Stockholm, Sweden
Virtual worlds generally run on Windows platforms with the support of the DirectX API. Executing virtual worlds on these platforms requires a high-performance CPU, memory, GPU, and power. People use virtual worlds, or the metaverse, in different environments such as homes, hotels, internet cafes, social community centers and business offices. It is a big advantage to execute virtual worlds on mobiles, tablets, IoT and CE devices, removing the need to keep extra computers and screens in each hotel room, internet cafe, social community center, living room and office. This paper compares the 3D virtual world streaming systems available on the market and proposes a novel cross-platform system for distributed 3D virtual worlds in wired or wireless local networks and cloud computing setups. We introduce a novel system architecture that streams virtual world graphics data across the network to remote devices using the browser.
Bill Xu1 and Yu Sun2, 1École Internationale de Montréal, 11 Cm. de la Côte-Saint-Antoine, Westmount, QC H3Y 2H, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
Recent years have witnessed the dramatic popularity of cryptocurrencies, in which millions invest to join the cryptocurrency community or make financial gains. Investors employ many ways to analyze a cryptocurrency, from a purely technical approach to a more utility-centred approach. However, few technologies exist to help investors find cryptocurrencies with bright prospects through social metrics, an equally if not more important viewpoint to consider due to the importance of communities in the space. This paper proposes an application to evaluate cryptocurrencies based on social metrics by establishing scores and models with machine learning and other tools. We verified the need for our application through surveys, applied it to test investment strategies, and conducted a qualitative evaluation of the approach. The results show that our tool benefits investors by providing them with a different lens to view cryptocurrencies and helps them make more thorough decisions.
Cryptocurrencies, Machine learning, Analysis, Application.
Nicole Ma1 and Yu Sun2, 1Sage Hill School, 20402 Newport Coast Dr, Newport Coast, CA 92657, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
Instances of hate speech on popular social media platforms such as Twitter are becoming increasingly common, and recent studies have shown that online hate speech is correlated with hate crimes and real-life harassment. However, there is still a lack of comprehensive deep-learning models to combat Twitter hate speech. Previous papers on the topic have shown that published natural language processing and deep learning models often do not perform well in terms of classifying hate speech. In this project, a comprehensive detection and reporting platform, entitled “TweetWatch,” was created to solve this issue. A binary classification CNN (Convolutional Neural Network) and a multi-class CNN were created to detect hate speech from real-time Twitter data and classify tweets with hate speech into five categories. Web scraping facilitated by the Twitter API, as well as Dash by Plotly, was used to create a real-time choropleth map of the United States with respect to the amount of hate speech in each state. The results show that the novel deep learning networks are more accurate in terms of AUC and F1 scores than previous models in multiple papers. The binary classification model had an AUC score of 98.95% and an F1 score of 97.88%. The multi-class classification model returned an AUC score of 89.46%. All metrics exceeded a targeted 5% increase over previous papers, validating the proposed solution. Additionally, the only real-time choropleth map for hate speech in the United States was successfully created.
Web scraping, Natural language processing, Deep learning, Neural networks.
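The F1 score reported in the hate speech abstract above is the harmonic mean of precision and recall. The sketch below shows that formula for a binary classifier. The function name `precision_recall_f1` and the confusion-matrix counts are hypothetical, chosen only for illustration, and are not results from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from binary confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for any metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not taken from the paper.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1 score by trading away recall for precision or the reverse.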
Kerry Zhang1 and Yu Sun2, 1University High School, 4771 Campus Drive Irvine, CA 92612, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
We aim to tackle the issue of improving the global situation regarding climate change by creating a mobile application named Climerry, which educates its users on recent news related to climate on the home screen. Climerry also features a second tab that allows users to view opportunities to improve the climate change situation in the vicinity by typing in a ZIP code or city name. Some examples of opportunities include beach cleanups and tree-planting sessions. By informing and encouraging the general public to become more involved in the effort to preserve our planet, the negative effects of climate change may be much less significant in the future.
To prove the effectiveness of this application in encouraging the general public to take action against climate change, one experiment was performed to gauge how much knowledge regarding climate change the participants had gained by using the application. Another experiment tested the reliability of the news API used in the application by testing the accuracy of information in each of the selected articles in the featured news section of the application. The result of the experiments indicated that the application is useful when it comes to providing accurate news and educating its users on the topic of climate change.
Climate Change, News, Global Warming, Social Issue.
Hui Hu, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Resource Description Framework (RDF) is designed as a standard metadata model for data interchange on the Internet. Because of its machine comprehensibility, it has been successfully used in many areas, such as the intelligent processing of large volumes of data. While the generation of RDF from relational databases (RDB) receives much attention, little effort has been put into the automatic construction of RDF from HBase due to its flexible data structure. Since more data is stored in HBase, it is necessary to extract useful information from HBase. Here, we put forward formal definitions of RDF and HBase and propose our strategy for generating RDF from HBase. We develop a prototype system to create RDF, and test results demonstrate the feasibility of our method.
Semantic Web, RDF, HBase, Construction.
Diwanshu Shekhar1, Rohola Zandie2, Dr. Mohammad Mahoor2, 1Department of Computer Science, University of Denver, Denver, Colorado (USA), 2Department of Electrical Engineering, University of Denver, Denver, Colorado (USA)
Reasoning is one of the most important elements in achieving Artificial General Intelligence (AGI), specifically when it comes to abductive and counterfactual reasoning. In order to introduce these reasoning capabilities in Natural Language Processing (NLP) models, there have been recent advances towards training NLP models to perform better on two main tasks: Abductive Natural Language Inference (NLI) and Abductive Natural Language Generation (NLG). This paper proposes CoGen, a model for both NLI and NLG tasks that employs a novel approach of combining the temporal commonsense reasoning for each observation (before and after a real hypothesis) from pre-trained models with contextual filtering for training. Additionally, we use state-of-the-art (SOTA) semantic entailment to filter out contradictory hypotheses during inference. Our experimental results show that CoGen outperforms the current models and sets a new state of the art on the NLI task. On the NLG task, CoGen's performance is very comparable to the current SOTA model, showing the NLP community a different research direction towards the same objective. We also perform a human evaluation of CoGen. We make the source code of the CoGen model publicly available for reproducibility and to facilitate relevant future research.
Abductive Commonsense Reasoning, Abduction, Natural Language Inference, Natural Language Generation.
Yujie Xing and Jon Atle Gulla, Norwegian University of Science and Technology, Norway
Despite the rapid progress of open-domain generation-based conversational agents, most deployed systems treat dialogue contexts as single turns, while systems dealing with multi-turn contexts are less studied. There is a lack of a reliable metric for evaluating multi-turn modelling, as well as of an effective solution for improving it. In this paper, we focus on an essential component of multi-turn generation-based conversational agents: context attention distribution, i.e. how systems distribute their attention over a dialogue’s context. To evaluate this component, we introduce a novel attention-mechanism-based metric: DAS ratio. To improve performance on this component, we propose an optimization strategy that employs self-contained distractions. Our experiments on the Ubuntu chatlogs dataset show that models with comparable perplexity can be distinguished by their ability on context attention distribution. Our proposed optimization strategy improves both non-hierarchical and hierarchical models on the proposed metric by about 10% from baselines.
Natural Language Processing, Response Generation, Dialogue System, Conversational Agent, Multi-Turn Dialogue System.
Lina Lumburovska, Vesna Dimitrova, Aleksandra Popovska-Mitrovikj, Ss. Cyril and Methodius University of Skopje, Faculty of Computer Science and Engineering, Skopje, North Macedonia.
Blockchain technology and decentralized systems have a huge positive impact on everyday life. Their use is becoming more popular in different areas, among which electronic voting takes one of the top places. This paper presents the implementation of a newly proposed electronic voting system based on Blockchain using ECDSA with blind signatures. Additionally, the system is compared with other electronic voting systems based on Blockchain technology. Systems of this type rarely achieve scalability. Nevertheless, the system has an advantage in comparison with the other systems. Since the idea of Blockchain technology is to provide flexibility and equal privileges to all nodes, this implementation with Angular and Spring Boot demonstrates that, so everyone can track the chain. To sum up, this implementation is well suited to smaller departments, because of its performance and all the mathematical operations involved.
Blockchain technology, ECDSA, e-voting, blind signatures.
Tamara Al-Masri, Ahmed Al-Tamimi and Muawya Al-Dalaien, Department of Computer Science, Princess Sumaya University for Technology, Jordan.
Agile methodologies are among the most effective methodologies, as they allow change and modification at each phase of the software life cycle according to the client's requirements. Still, they lack security in those phases, which makes the project vulnerable to attacks or a liability to the corporation that houses it. Constantly changing customer demands, miscommunication, and misinterpretation between the task owner and the customer may overwhelm the corporation’s resources and drain the project’s allocated time and budget. This can be mitigated by appending certain practices to the implemented development framework. In this paper, we propose an enhanced Extreme Programming (XP) framework, XP being one of the most common Agile methodologies, without the need to extend the process. These practices address the agonizing issue of redeveloping certain parts or all of the project and the neglect of the safety aspect throughout the development process.
Yulin Zhang1, Yu Sun2, 1University High School, 4771 Campus Drive. Irvine, CA 92612, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
NLP-learning, distraction, Auto-block.
Sepp Rothwangl1 and Jan Lauth2, 1CALENdeSIGN, Vienna, Austria and 2University of Applied Art, Vienna.
The uneven insertion of leap seconds into UTC, made necessary by its difference from UT1 due to the fluctuating rotation of the Earth, may cause unexpected problems in the future, because the Earth's rotation has recently even accelerated from an as yet unknown cause. This article proposes a new epoch for process-controlled time scales.
Leap second, UTC, UT1, TAI, MJD, new epoch, Earth’s rotation.
Wang Ruixia and Hu Jiankun, Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai, China
In order to explore the development efficiency of coastal ports, this paper uses the improved four-stage super-efficiency DEA method and the Malmquist index to analyze the efficiency of 20 major ports in the Yangtze River Delta port group from static and dynamic perspectives, based on their input and output data from 2015 to 2020. The results show that: (1) Lianyungang Port, Suzhou Port and Nanjing Port are at the forefront of efficiency. (2) The resource allocation of Shanghai Port and Ningbo-Zhoushan Port has been in a balanced state in recent years. (3) From the dynamic point of view, the improvement of total factor productivity of most ports in the Yangtze River Delta is mainly attributable to changes in technical efficiency and technological progress.
Yangtze River Delta port cluster; Four stages; Super-efficiency DEA; Dynamic efficiency evaluation.
Joerg H. Hardy, Department of Philosophy, Free University of Berlin, Germany
Autonomous robots will need to form relationships with humans that are built on reliability and (social) trust. The source of reliability and trust in human relationships is (human) ethical competence, which includes the capability of moral decision-making. As autonomous robots cannot act with the ethical competence of human agents, a kind of human-like ethical competence has to be implemented into autonomous robots (AI-systems of various kinds) by way of ethical algorithms. In this paper I suggest a model of the general logical form of (human) meta-ethical arguments that can be used as a pattern for the programming of ethical algorithms for autonomous robots.
AI Algorithms, Ethical Algorithms, Ethics of Artificial Intelligence, Human-Robot-Interaction.
Muhamad Adib Bahari, Shah Runnizam Mohd Salleh and Muhammad KamalAbdul Kiram, Information Technology & Analytics, TNB Research Sdn. Bhd., Selangor, Malaysia
Augmented Reality is a combination of a real-world and a computer-generated environment that can enhance the operation and maintenance experience. This paper elaborates on the implementation of Augmented Reality to help monitor the health and condition of selected assets in the TNB Research (TNBR) Data Center. Several data integrations with a Knowledge Base AI module assist in the decision-making and troubleshooting process. This research focuses on assets that require frequent maintenance and close monitoring.
Augmented Reality, Knowledge Base AI, Asset Maintenance, Monitoring System
Zhiyu Huang1,2, 1School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China, 2Hunan Key Laboratory for Service Computing and Novel Software Technology, Xiangtan, China
With the development of electric vehicles, more and more electric vehicle owners have difficulties in parking and charging. One of the reasons is that the number of charging piles is insufficient to support the energy supply of electric vehicles, while a large number of private charging piles sit idle for long periods, so the energy supply problem of electric vehicles can be addressed by sharing charging piles. The shared charging pile scheme uses the Paillier encryption scheme and an improved variant of it to effectively protect user data. The scheme is homomorphic under addition and subtraction, and can process information without decryption. Because different users have different needs, matching is carried out after computing on the requirements submitted by users. This scheme can effectively protect users' privacy and provide matching mechanisms for different requirements, so that users can be better matched with appropriate charging piles. The final result shows that its efficiency is better than that of the original Paillier scheme, and that it also meets the security requirements.
Private charging pile sharing service, Privacy protection, Demand analysis, Homomorphic encryption, Internet of things.
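The additive homomorphism that the charging pile abstract above relies on can be seen in a toy version of textbook Paillier encryption. This is a minimal sketch, not the authors' improved scheme: the primes are deliberately tiny and insecure, the generator is fixed to n + 1, and all names (`encrypt`, `decrypt`, `add_encrypted`, `subtract_encrypted`) are assumptions made for illustration. It requires Python 3.8 or later for modular inverses via `pow`.

```python
import math
import secrets

# Toy parameters: far too small for real use, chosen only to keep the example readable.
P, Q = 61, 53
N = P * Q
N_SQ = N * N
G = N + 1  # common simplifying choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)  # valid because G = N + 1 gives L(G^LAMBDA mod N^2) = LAMBDA mod N


def encrypt(message: int) -> int:
    """Encrypt an integer in [0, N) with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in [1, N - 1]
        if math.gcd(r, N) == 1:
            break
    return (pow(G, message, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(ciphertext: int) -> int:
    """Recover the plaintext using L(x) = (x - 1) // N."""
    x = pow(ciphertext, LAMBDA, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo N."""
    return (c1 * c2) % N_SQ


def subtract_encrypted(c1: int, c2: int) -> int:
    """Multiplying by the inverse ciphertext subtracts the plaintexts modulo N."""
    return (c1 * pow(c2, -1, N_SQ)) % N_SQ


# Hypothetical values, e.g. two quantities a matching service combines without seeing them.
c_a, c_b = encrypt(20), encrypt(15)
total = decrypt(add_encrypted(c_a, c_b))
difference = decrypt(subtract_encrypted(c_a, c_b))
print(total, difference)
```

The party performing `add_encrypted` or `subtract_encrypted` never needs the private values `LAMBDA` and `MU`, which is what allows a service to process user requirements without decrypting them.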
Ronke S. Babatunde1, Oyeranmi A. Adigun2, 1Department of Computer Science, Kwara State University, Malete, Nigeria, 2Department of Computer Science, Yaba College of Technology, Lagos, Nigeria
Cerebrospinal meningitis (CSM) is characterized by acute severe infection of the central nervous system causing inflammation of the meninges with associated morbidity and mortality. The information about its symptoms, time and season of spread, most affected region, its fatality rate, type and how easily it causes major disabilities in patients can be modelled and utilized in its treatment and prevention. This research uses data mining techniques to predict the outbreak of CSM in terms of those liable to be infected by the disease, using feature information about the region and the patient. It involves the following stages: data acquisition, data preprocessing, data exploration, algorithm training and evaluation, prediction and web hosting. The intention is to help in managing the resources needed for both treatment and prevention. The outcome of the research indicated that the proposed technique is viable for the task, considering the number of correct predictions that were reported when the application was deployed and tested.
Meningitis, Logistic Regression, Prediction, Machine Learning.
Copyright © MLSC 2023