In this book chapter, we introduce a framework that leverages the intelligence of the crowd to improve the quality, credibility, inclusiveness, long-term impact and adoption of research, particularly in the academic space. This integrated platform revolves around a central knowledge graph (KG) which interacts with the community through artificial intelligence (AI) algorithms. In combination with Internet of Things (IoT) technology and blockchain, a highly productive environment including liquid governance and arbitration is created to fairly acknowledge and attractively incentivize contributions of valuable intellectual property (IP) to this knowledge base. In the proposed platform, various stakeholders customize their terms of agreement as smart contracts, which are enforced while transactions are validated on the blockchain. Through the interaction of smart contracts and stakeholders, agreement based on objective (scientific) criteria will gradually emerge from the simulated interaction and, if applicable, its experimental/empirical verification. © The Institution of Engineering and Technology 2023.
The ever-increasing urban population and the latest technological advances, including the IoT, sensors, big data, cloud computing and data analytics, have replaced the standard methods of delivering services to citizens. IoT devices collect real-time, integrated data by monitoring an individual's daily activities with the aim of providing efficient services including, but not restricted to, smart transportation, waste management, personalized healthcare and recommendations. As personal and sensitive information is collected by these devices, security and privacy become critical concerns. While safety and privacy have always been significant study areas, evolving technological challenges call for a broader perspective on protecting personal data. This chapter introduces the security and privacy issues faced by the existing infrastructure. Some case studies are discussed along with the measures undertaken for data privacy and security. The chapter concludes with open research challenges in security and privacy. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021.
Construction of smart cities is no longer a future endeavor. Even though the implementation of smart cities brings enormous conveniences, realistic implementation is challenged in different aspects. Along with design, maintenance and implementation costs, two of the major aspects are privacy and security. The frameworks introduced for smart cities pose many challenges regarding the privacy and security of citizens. Open networks, smartphones, computers, etc. are used for communication in the smart city, making sensitive data vulnerable to attacks, and it is equally vital to deal with privacy issues. Thus, maintaining security and ensuring privacy in the smart city is necessary and remains an open challenge. The present paper proposes the Cloud Data Security Model (CDSM) for better security of data using the cloud storage mechanism. The CDSM defines four different categories of cloud accounts with special permissions to access the data. Moreover, with the data access record, the owner is completely aware of who is accessing the data. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021.
Computational cost in metaheuristics such as Evolutionary Algorithms (EAs) is often a major concern, particularly with their ability to scale. In data-based training, traditional EAs typically use a significant portion, if not all, of the dataset for model training and fitness evaluation in each generation. This makes EAs suffer from high computational costs incurred during the fitness evaluation of the population, particularly when working with large datasets. To mitigate this issue, we propose a Machine Learning (ML)-driven Distance-based Selection (DBS) algorithm that reduces fitness evaluation time by optimizing test cases. We test our algorithm on 24 benchmark problems from the Symbolic Regression (SR) and digital circuit domains, using Grammatical Evolution (GE) to train models on the reduced datasets; applying GE to SR first produces a system flexible enough to be tested further on digital circuit problems. The quality of the solutions is compared against state-of-the-art and conventional training methods to measure the coverage of the training data selected using DBS, i.e., how well the subset matches the statistical properties of the entire dataset. Moreover, the effect of the optimized training data on run time and on the effective size of the evolved solutions is analyzed. Experimental and statistical evaluations show that our method empowers GE to yield solutions that are superior or comparable to the baseline (using the full datasets), with smaller sizes, and demonstrates computational efficiency in terms of speed. Copyright © 2024 Gupt, Kshirsagar, Dias, Sullivan and Ryan.
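A minimal sketch of the distance-based idea behind such subset selection, assuming Euclidean distance and a greedy farthest-point strategy; the function, the choice of k, and the random start are illustrative stand-ins, not the authors' implementation:

```python
import numpy as np

def distance_based_subset(X, k, seed=0):
    # Greedy farthest-point sampling: pick k cases that are mutually far
    # apart so the subset spans the input space of the full dataset.
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]            # random starting case
    d = np.linalg.norm(X - X[chosen[0]], axis=1)    # distance to the subset
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                     # farthest remaining case
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

X = np.random.rand(10_000, 5)                       # synthetic training inputs
idx = distance_based_subset(X, k=200)               # fitness then uses X[idx] only
```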
Grammar is a key input in grammar-based genetic programming. Grammar design not only influences performance but also program size. However, grammar design and the choice of productions often require expert input, as no automatic approach exists. This research work discusses our approach to automatically reducing a bloated grammar. By utilizing a simple Production Ranking mechanism, we identify productions which are less useful and dynamically prune them to channel evolutionary search towards better (smaller) solutions. Our objective in this work was program size reduction without compromising generalization performance. We tested our approach on 13 standard symbolic regression datasets with Grammatical Evolution. Using a grammar embodying a well-defined function set as a baseline, we compare effective genome length and test performance with our approach. Dynamic grammar pruning achieved significantly better genome lengths for all datasets, while significantly improving generalization performance on three datasets, although it worsened on five. When we utilized linear scaling during the production ranking stages (the first 20 generations), the results dramatically improved. Not only were the programs smaller for all datasets, but generalization scores were also significantly better than the baseline on 6 of the 13 datasets, and comparable on the rest. When the baseline was linearly scaled as well, the program size was still smaller with the Production Ranking approach, while generalization scores dropped on only three datasets without any significant compromise on the rest. © 2023, The Author(s).
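The production-ranking idea can be sketched as frequency counting over the productions used by fitter individuals, then pruning productions that rarely appear; the toy population, thresholds, and data layout below are hypothetical stand-ins for the paper's mechanism:

```python
from collections import Counter

# Toy population: (fitness, productions used by the individual); lower = better.
population = [
    (0.12, ["<e>+<e>", "x", "x"]),
    (0.30, ["sin(<e>)", "x"]),
    (0.80, ["exp(<e>)", "x"]),
    (0.95, ["exp(<e>)", "1.0"]),
]
grammar = {"<e>": ["<e>+<e>", "x", "sin(<e>)", "exp(<e>)", "1.0"]}

def rank_productions(population, top_frac=0.5):
    # Count production usage among the fitter half of the population.
    elite = sorted(population)[: max(1, int(len(population) * top_frac))]
    return Counter(p for _, prods in elite for p in prods)

def prune_grammar(grammar, usage, min_count=1):
    # Drop productions the elite rarely uses; never empty a rule entirely.
    return {nt: [p for p in prods if usage[p] >= min_count] or prods[:1]
            for nt, prods in grammar.items()}

print(prune_grammar(grammar, rank_productions(population)))
# -> {'<e>': ['<e>+<e>', 'x', 'sin(<e>)']}
```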
Neural networks have revolutionised the way we approach problem solving across multiple domains; however, their effective design and efficient use of computational resources remain challenging tasks. One of the most important factors influencing this process is the model hyperparameters, which vary significantly across models and datasets. Recently, there has been an increased focus on automatically tuning these hyperparameters to reduce complexity and to optimise resource utilisation. From traditional human-intuitive tuning methods to random search, grid search, Bayesian optimisation, and evolutionary algorithms, significant advancements have been made in this direction that promise improved performance while using fewer resources. In this article, we propose HyperGE, a two-stage model for automatically tuning hyperparameters, driven by grammatical evolution (GE), a bioinspired population-based machine learning algorithm. GE provides the advantage of allowing users to define their own grammar for generating solutions, making it ideal for defining search spaces across datasets and models. We test HyperGE by fine-tuning the VGG-19 and ResNet-50 pre-trained networks on three benchmark datasets. We demonstrate that the search space is reduced by ~90% in Stage 2, with fewer trials. HyperGE could become an invaluable tool within the deep learning community, allowing practitioners greater freedom when exploring complex problem domains for hyperparameter fine-tuning. © 2023 by the authors.
Health care interoperability paves the way for personalized health care services at a reduced cost. Furthermore, a decentralized system holds the promise of preventing compromises such as cyber-attacks due to data breaches. Hence, there is a need for a framework that seamlessly integrates and shares data across the system stakeholders. We propose the SEquestered aNd SynergIstic BLockchain Ecosystem (SENSIBLE), a blockchain-powered, knowledge-driven data-sharing framework that gives patients complete control of their medical history and can extract the rich information hidden in it using knowledge graphs (KGs). By incorporating both blockchain and KGs, we can provide a platform for secure data sharing among stakeholders, maintaining data privacy and integrity through data authentication and robust data integration. We present a Proof-of-Concept of the SENSIBLE network with Ethereum to share dynamic knowledge across stakeholders. Dynamic knowledge generation on the blockchain provides the two-fold advantage of cooperation and communication amongst the stakeholders in the health care ecosystem. This leads to operational ease by sharing relevant portions of complex information while also ensuring the isolation of sensitive medical data. © 2022 The Authors. Engineering Reports published by John Wiley & Sons Ltd.
This work investigates the potential of using Grammatical Evolution (GE) to generate an initial seed for the construction of a pseudo-random number generator (PRNG) and a cryptographically secure (CS) PRNG. We demonstrate the suitability of GE as an entropy source and show that the initial seeds exhibit an average entropy value of 7.940560934 for 8-bit entropy, which is close to the ideal value of 8. We then construct two random number generators, GE-PRNG and GE-CSPRNG, both of which employ these initial seeds. We use Monte Carlo simulations to establish the efficacy of GE-PRNG, with an experimental setup designed to estimate the value of pi, in which 100,000,000 random numbers were generated by our system. This returned an estimate for pi of 3.146564000, reported to six decimal digits. We propose a new approach, called control_flow_incrementor, to generate cryptographically secure random numbers. The random numbers generated with CSPRNG meet the prescribed National Institute of Standards and Technology SP800-22 and Diehard statistical test requirements. We also present a computational performance analysis of GE-CSPRNG, demonstrating its potential for use in industrial applications. © 2022, The Author(s).
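The Monte Carlo experiment follows the classic quarter-circle construction; a minimal sketch, with Python's built-in generator standing in for GE-PRNG:

```python
import random

def estimate_pi(n, rand=random.random):
    # Classic quarter-circle Monte Carlo: the fraction of uniform points in
    # the unit square with x^2 + y^2 <= 1 converges to pi/4.
    inside = sum(1 for _ in range(n) if rand() ** 2 + rand() ** 2 <= 1.0)
    return 4.0 * inside / n

print(estimate_pi(1_000_000))   # the paper's setup plugs GE-PRNG in as `rand`
```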
Over the past seven decades since the advent of artificial intelligence (AI), researchers have demonstrated and deployed systems incorporating AI in various domains. The absence of model explainability in critical systems, such as medical AI and credit risk assessment among others, has led to the neglect of key ethical and professional principles, which can cause considerable harm. With explainability methods, developers can check their models beyond mere performance and identify errors. This leads to increased time efficiency and reduced development costs. The article argues that steering traditional AI systems toward responsible AI engineering can address the concerns raised in the deployment of AI systems and mitigate them by incorporating explainable AI methods. Finally, the article concludes with the societal benefits of futuristic AI systems and the market share for revenue generation possible through the deployment of trustworthy and ethical AI systems.
In this article, we discuss a data sharing and knowledge integration framework that uses autonomous agents with blockchain for implementing Electronic Health Records (EHRs), enabling us to augment existing blockchain-based EHR systems. We discuss how major concerns in the health industry, i.e., trust, security and scalability, can be addressed by transitioning from existing models to a convergence of three technologies – blockchain, agent-based modeling, and knowledge graphs – in a decentralized ecosystem. Each autonomous agent is responsible for instantiating key processes, such as user authentication and authorization, smart contracts, and knowledge graph generation through data integration among the participating stakeholders in the network. We discuss a layered approach to the design of the proposed system, leading to an enhanced, safer clinical decision-making system. This can pave the way toward more informed and engaged patients and citizens by delivering personalized healthcare. Copyright © 2021 Yao, Kshirsagar, Vaidya, Ducrée and Ryan.
Most plastic goods manufacturers prefer virgin polymers based on petrochemical feedstock over recycled plastic feedstock. A major reason for this is the lack of reliable information about the quality, suitability, and availability of recycled plastics, which is partly due to the lack of proper segregation techniques. In this paper, we present our ongoing efforts to segregate plastics by type and improve the reliability of information about recycled plastics using first-of-its-kind blockchain smart contracts powered by multi-sensor data-fusion algorithms using artificial intelligence. We demonstrate how different data-fusion modes can be employed to retrieve various physico-chemical parameters of plastic waste for accurate segregation. We discuss how these smart tools help in efficiently segregating commingled plastics and can be reliably used in the circular economy of plastic. Using these tools, segregators, recyclers, and manufacturers can reliably share data, plan the supply chain, execute purchase orders and, finally, increase the use of recycled plastic feedstock. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
Machine comprehension is a broad research area from the Natural Language Processing domain, which deals with making a computerised system understand given natural language text. A question answering system is one such variant, used to find the correct 'answer' for a 'query' using the supplied 'context'. Using a sentence instead of the whole context paragraph to determine the 'answer' is quite useful in terms of computation as well as accuracy. Sentence selection can, therefore, be considered a first step in getting the answer. This work devises a method for sentence selection that uses the cosine similarity and common word count between each sentence of the context and the question. This removes the extensive training overhead associated with other available approaches, while still giving comparable results. The SQuAD dataset is used for accuracy-based performance comparison. © BEIESP.
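A minimal sketch of such a scoring scheme, assuming a simple bag-of-words representation and an illustrative weighting `w` between the two signals (the paper's exact combination may differ):

```python
import math
from collections import Counter

def cosine_sim(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_sentence(context_sentences, question, w=0.5):
    # Score each sentence by a blend of cosine similarity and the number of
    # words it shares with the question; return the best-scoring sentence.
    q = Counter(question.lower().split())
    def score(sent):
        s = Counter(sent.lower().split())
        common = len(set(s) & set(q))
        return w * cosine_sim(s, q) + (1 - w) * common / max(len(q), 1)
    return max(context_sentences, key=score)

ctx = ["Paris is the capital of France.",
       "France is in Europe.",
       "The Louvre is in Paris."]
print(select_sentence(ctx, "What is the capital of France?"))
```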
Assessing the quality of food is a major challenge the food industry faces today. It is of utmost importance to test food for contaminants and non-edible material that may be present. To overcome these challenges, metagenomic classification is particularly useful. Several studies have investigated various classification techniques. Difficulties in metagenomic classification include the ever-increasing number of genomes, which requires computational methods that compare DNA sequences to genomes with both high speed and high accuracy. Centrifuge is a classification tool for quantifying the species present in a sample so as to monitor its quality. Given a food sample, Centrifuge effectively classifies the species present in it, enabling a timely and accurate analysis. © BEIESP.
Prostate cancer (PCa) is the second most prevalent cancer among men worldwide, with the majority of cases affecting those over the age of 65. The Gleason Score (GS) remains the gold standard for diagnosing clinically significant prostate cancer (csPCa); however, traditional biopsy can lead to patient discomfort. Algorithmic bias in medical diagnostic models remains a critical challenge, impacting model reliability and generalizability across diverse patient populations. This study explores the potential of Machine Learning (ML) models—Logistic Regression (LR) and multiple Deep Learning (DL) models—as non-invasive alternatives for predicting the GS using the Prostate Imaging Cancer AI challenge dataset. To the best of our knowledge, this is the first attempt to use two modalities with this dataset for risk stratification. We developed an LR model, excluding biopsy-derived features like the GS, to predict clinically significant prostate cancer, alongside an image triage approach with convolutional neural networks to reduce biases in the ML workflow. Preliminary results from LR and ResNet50 showed test accuracies of 69.79% and 60%, respectively. These findings demonstrate the potential for explainable, trustworthy, and responsible risk stratification, enhancing the robustness and generalizability of the prostate cancer risk stratification model.
Machine learning has diverse applications in various domains, including disease diagnosis in healthcare, user behavior analysis, and algorithmic trading. However, machine learning's use in portfolio volatility prediction and optimization has only recently been explored and requires further investigation to prove valuable in real-world settings. We thus propose an effective method that accomplishes both these tasks and is targeted at people who are new to the realm of finance. This paper explores (a) a novel approach of using supervised machine learning with the Random Forest algorithm to predict a portfolio's volatility value and category and (b) a flexible method that takes into account users' restrictions on stock allocations to build an optimized and customized portfolio. Our framework also allows a diversified number of assets to be included in the portfolio. We train our model using historical asset prices collected over 8 years for six mutual funds and one cryptocurrency. We validate our results by comparing the volatility predictions against recent asset prices obtained from Yahoo Finance. The research underlines the importance of harnessing the power of machine learning to improve portfolio performance. © 2024 by SCITEPRESS - Science and Technology Publications, Lda.
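A hedged sketch of the supervised setup using scikit-learn's RandomForestRegressor on synthetic prices; the trailing-window features and the use of the absolute next-step return as a volatility proxy are illustrative assumptions, not the paper's exact target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_features(prices, window=30):
    # Build trailing-window return statistics as features; target is the
    # absolute next-step log return, a simple stand-in for volatility.
    rets = np.diff(np.log(prices))
    X, y = [], []
    for i in range(window, len(rets)):
        past = rets[i - window:i]
        X.append([past.mean(), past.std(), past.min(), past.max()])
        y.append(abs(rets[i]))
    return np.array(X), np.array(y)

prices = np.cumprod(1 + np.random.normal(0, 0.01, 2000)) * 100  # synthetic series
X, y = make_features(prices)
model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X[:-250], y[:-250])
print("held-out R^2:", model.score(X[-250:], y[-250:]))
```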
The analysis of time efficiency and solution size has recently gained huge interest among researchers of Grammatical Evolution (GE). Voluminous data have slowed GE's learning when finding innovative solutions to complex problems. Few works incorporate machine learning techniques to extract samples from big datasets; most work in the field focuses on optimizing GE hyperparameters. This motivates our work, Adaptive Case Selection (ACS), a diversity-preserving test case selection method that adaptively selects test cases during the evolutionary process of GE. We used six symbolic regression synthetic datasets with diverse features and samples in the preliminary experimentation and trained the models using GE. Statistical validation of the results demonstrates that ACS enhances the efficiency of the evolutionary process. ACS achieved higher accuracy on all six problems when compared to the conventional 'train/test split.' It outperforms the recently proposed Distance-Based Selection (DBS) method on four out of six problems while remaining competitive on the other two. ACS accelerated the evolutionary process by factors of 14X and 11X against the two methods, respectively, and resulted in simpler solutions. These findings suggest ACS can potentially speed up the evolutionary process of GE when solving complex problems. © 2023 by SCITEPRESS – Science and Technology Publications, Lda.
Breast cancer is the most prevalent cancer among females worldwide. Early detection is key to a good prognosis, and mammography is the most widely used technique, particularly in screening programs. However, reading mammograms is a highly skilled and often time-consuming task. Deep learning methods can facilitate the detection process and assist clinicians in disease diagnosis. Much research has shown Deep Neural Networks' successful use in medical imaging for early and accurate diagnosis. This paper proposes a patch-based Convolutional Neural Network (CNN) classification approach to classify patches (small sections) obtained from mammogram images into either benign or malignant cases. A novel patch extraction method, which we call Overlapping Patch Extraction, is developed and compared with two other techniques: Non-Overlapping Patch Extraction and Region-Based Extraction. Experimentation is conducted using images from the Curated Breast Imaging Subset of the Digital Database for Screening Mammography. Five deep learning models are trained on the patches extracted using the discussed methods: three configurations of EfficientNet-V2 (B0, B2, and L), ResNet-101, and MobileNet-V3L. Preliminary results indicate that the proposed patch extraction approach, Overlapping, produces a more robust patch dataset. Promising results are obtained using the Overlapping patch extraction technique with the EfficientNet-V2L model, achieving an AUC of 0.90. © 2023 by SCITEPRESS – Science and Technology Publications, Lda.
The ever-present challenge in the domain of digital devices is how to test their behavior efficiently. We tackle the issue in two ways. We switch to automated circuit design using Grammatical Evolution (GE), and we provide two diversity-based methodologies to improve testing efficiency. The first approach extracts a minimal number of test cases from subsets formed through clustering; moreover, the way we perform clustering is problem-agnostic and can easily be used in other domains. The second uses the complete test set and introduces a novel fitness function, hitPlex, which incorporates a test case diversity measure to speed up the evolutionary process. Experimental and statistical evaluations on six benchmark circuits establish that the automatically selected test cases result in good coverage and enable the system to evolve highly accurate digital circuits. Evolutionary runs using hitPlex indicate promising improvements, with up to 16% improvement in convergence speed and up to 30% in success rate for complex circuits when compared to the system without the diversity extension. © 2022 Owner/Author.
This article assesses the impact of decarbonising the transport sector using an evidence-based approach incorporating data analysis and advanced machine learning (ML) modelling. We investigate the radical behavioural and societal changes needed for the decarbonisation of the transport sector in Ireland. We perform a study through our system DECArbonisation in Road Transport (DECART), a suite of statistical and time series ML models for facilitating policy making, monitoring, and advising governments, companies and organisations in the transport sector. Based on data analysis and scenario-modelling approaches, we present alternatives to policy and decision makers for achieving carbon emission mitigation goals in road transport. The models depict how changes in mobility patterns in road transport affect CO2 emissions. Through insights obtained from the models, we infer that renewable energy in Ireland has the potential to meet the growing electricity needs of electric vehicles. Experimentation is conducted on real-world datasets, such as traffic, motor registrations, and data from renewable sources such as wind farms, to build efficient ML models. The models are validated in terms of accuracy, based on their potential to capture hidden insights from real-world events and domain knowledge. Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
Rapid growth in vehicular congestion increases the challenges of traffic management with respect to pollution and infrastructure. Efficient traffic governance can have a significant impact on a country's economy. To alleviate these challenges, we propose an intelligent integrated traffic management system that manages congestion through cost pricing models to achieve smooth traffic flow. We propose a novel rerouting algorithm and ensemble architecture for vehicle detection and classification, tested on live traffic captured in several Indian cities. The ensemble architectures are designed from combinations of existing pre-trained models, and the choice of ensembles is based on accuracy, model interpretability, and energy efficiency. We show that the second-best ensemble produced operates with significantly less energy and better explainability than our best performer and is still within 3% accuracy of the best performer. Based on predefined road priorities, these ensemble models provide traffic and individual vehicle counts, which are then fed to our proposed rerouting algorithm as input. The rerouting algorithm then recommends alternative routes and estimated journey times to the user. The paper also presents the results obtained by testing the models on real-time traffic videos from Aurangabad (India) on a GPU/CPU cluster consisting of machines incorporating different GPU hardware. © 2022 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
Deep learning (DL) networks have the dual benefits of over-parameterization and regularization, rendering them more accurate than conventional Machine Learning (ML) models. However, they consume massive amounts of resources in training and are thus computationally expensive. A single experimental run can consume so many computational resources that it may cost millions of dollars, dramatically inflating project costs. Some of the factors behind the vast expense of DL models are the computational costs incurred during training and the massive storage requirements, along with specialized hardware such as Graphics Processing Units (GPUs). This research seeks to address some of the challenges mentioned above. Our approach, HyperEstimator, estimates the optimal values of hyperparameters for a given Convolutional Neural Network (CNN) model and dataset using a suite of Machine Learning algorithms. Our approach consists of three stages: (i) obtaining candidate values for hyperparameters with Grammatical Evolution; (ii) prediction of optimal values of hyperparameters with supervised ML techniques; (iii) training the CNN model for object detection. As a case study, the CNN models are validated using a real-time video dataset representing road traffic captured in some Indian cities. The results are also compared against the CIFAR10 and CIFAR100 benchmark datasets. Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
With the growing popularity of machine learning (ML), regression problems in many domains are becoming increasingly high-dimensional. Identifying relevant features in a high-dimensional dataset remains a significant challenge in building highly accurate machine learning models. Evolutionary feature selection has been used for high-dimensional symbolic regression using Genetic Programming (GP). While grammar-based GP, especially Grammatical Evolution (GE), has been extensively used for symbolic regression, no systematic grammar-based feature selection approach exists. This work presents a grammar-based feature selection method, Production Ranking based Feature Selection (PRFS), and reports the results of its application to symbolic regression. The main contribution of our work is to demonstrate that the proposed method not only consistently selects the most relevant features, but also significantly improves the generalization performance of GE when compared with several state-of-the-art ML-based feature selection methods. Experimental results on benchmark symbolic regression problems show that the generalization performance of GE using PRFS was significantly better than that of a state-of-the-art Random Forest based feature selection in three out of four problems, while in the fourth the performance was the same. © 2022 Owner/Author.
Symbolic Regression is sometimes treated as a multi-objective optimization problem in which two objectives (Accuracy and Complexity) are optimized simultaneously. In this paper, we propose a novel approach, Hierarchical Multi-objective Symbolic Regression (HMS), in which we investigate the effect of imposing a hierarchy on the objectives. HMS works on two levels. In the first level, an initial random population is evolved using a single objective (accuracy); then, when a simple trigger occurs (the current best fitness is five times better than the best fitness of the initial random population), half of the population is promoted to the next level, where a second objective (complexity) is incorporated. This new, smaller population subsequently evolves using a multi-objective fitness function. Various complexity measures are tested, each explicitly defined as an objective alongside performance (accuracy). The validation of HMS is performed on four benchmark Symbolic Regression problems of varying difficulty. The evolved Symbolic Regression models are competitive with or better than models produced with standard approaches in terms of performance, where performance is accuracy measured as Root Mean Square Error. The solutions are better in terms of size, effectively scaling down the computational cost. © 2022 Owner/Author.
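The two-level control flow can be sketched as follows, assuming a minimising error measure; the dictionary-based individuals and the weighted-sum stand-in for the level-2 multi-objective fitness are illustrative, not the paper's exact formulation:

```python
def hms_trigger(best_now, best_initial):
    # Promote when the current best fitness is five times better than the
    # initial random population's best (minimising: lower error is better).
    return best_now <= best_initial / 5.0

def promote(population):
    # Keep the better half of the population for the second level.
    return sorted(population, key=lambda ind: ind["error"])[: len(population) // 2]

def level2_fitness(ind, w=0.1):
    # Weighted stand-in for the level-2 objectives: accuracy (RMSE) plus
    # one of the tested complexity measures (here, solution size).
    return ind["error"] + w * ind["size"]

pop = [{"error": e, "size": s} for e, s in [(2.0, 9), (0.3, 40), (1.1, 5), (0.7, 12)]]
if hms_trigger(best_now=0.3, best_initial=2.0):
    level2 = promote(pop)
    print(sorted(level2, key=level2_fitness))  # size now changes the ranking
```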
This research work proposes a framework for checking the correctness of Galois field arithmetic operations in digital circuits. The authors propose to automatically generate test cases from the user input, comprising the Galois field width and the respective choice of irreducible polynomial, avoiding reliance upon predesigned test cases. We do this through the use of polynomial arithmetic to verify the circuits. To the best of the authors' knowledge, though extensive work has been carried out in optimising the performance of Galois field arithmetic operations, no testbench exists to evaluate the efficacy of hardware circuits incorporating this concept. By automating the process of generating test cases, the work can be scaled to test circuits of arbitrarily large field widths, providing a flexible architecture that guarantees the correctness of the underlying design under test. We present simulation results for Galois fields of width GF(2^2), GF(2^4) and GF(2^8). This work can be applied to test for and prevent intentional tampering of data bit streams, safeguarding them against malicious activities, especially in applications such as cryptography that rely heavily on Galois field arithmetic. © 2021 IEEE.
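Such a testbench needs a software reference for GF(2^n) arithmetic; below is a standard shift-and-reduce multiplier with a worked check value over GF(2^4) and x^4 + x + 1, a sketch of how expected outputs could be computed for a user-supplied field width and irreducible polynomial:

```python
def gf_mult(a, b, poly, n):
    # Shift-and-reduce multiplication in GF(2^n) modulo the irreducible
    # polynomial `poly` (given as a bit mask that includes the x^n term).
    result = 0
    for _ in range(n):
        if b & 1:
            result ^= a            # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << n):           # degree reached n: reduce modulo poly
            a ^= poly
    return result

# Worked check over GF(2^4) with x^4 + x + 1 (mask 0x13):
# (x^2 + x + 1)(x^3 + x + 1) reduces to x^2.
assert gf_mult(0x7, 0xB, 0x13, 4) == 0x4
```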
The advent of the Covid-19 pandemic has resulted in a global crisis, making health systems vulnerable and challenging the research community to find novel approaches to facilitate early detection of infections. This opens up a window of opportunity to exploit machine learning and artificial intelligence techniques to address some of the issues related to this disease. In this work, we address the classification of ten SARS-CoV-2 protein sequences related to Covid-19 using k-mer frequencies as features, considering two objectives: classification performance and feature selection. The first set of experiments considered the objectives one at a time; four techniques were used for feature selection, and twelve well-known machine learning methods, three of them neural-network-based, were used for classification. The second set of experiments considered a multi-objective approach, in which we tested the well-known Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Multi-dimensional Archive of Phenotypic Elites (MAP-Elites), which uses quality+diversity containers to guide the search through elite solutions. The experimental results show that ResNet with PCA is the best combination using single objectives, whereas, for the multi-objective approach, NSGA-II outperforms MAP-Elites with two out of three classifiers, while MAP-Elites obtains competitive results and brings a more diverse set of solutions. © 2021 by SCITEPRESS - Science and Technology Publications, Lda.
AutoGE (Automatic Grammatical Evolution) is a tool designed to aid users of GE with the automatic estimation of Grammatical Evolution (GE) parameters, a key one being the grammar. The tool comprises a rich suite of algorithms to assist in fine-tuning a BNF (Backus-Naur Form) grammar to make it adaptable across a wide range of problems. It primarily facilitates the identification of better grammar structures and the choice of function sets to enhance existing fitness scores at a lower computational overhead. This research work discusses and reports experimental results for our Production Rule Pruning algorithm from AutoGE, which employs a simple frequency-based approach for eliminating less useful productions. It captures the relationship between production rules and the function sets involved in the problem domain to identify better grammar. The experimental study incorporates an extended function set and common grammar structures for grammar definition. Preliminary results based on ten popular real-world regression datasets demonstrate that the proposed algorithm not only identifies suitable grammar structures, but also prunes the grammar, resulting in a shorter genome length for every problem, thus optimizing memory usage. Despite utilizing a fraction of the budget in pruning, AutoGE was able to significantly enhance test scores for three problems. Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
AutoGE (Automatic Grammatical Evolution), a new tool for the estimation of Grammatical Evolution (GE) parameters, is designed to aid users of GE. The tool comprises a rich suite of algorithms to assist in fine-tuning a BNF grammar to make it adaptable across a wide range of problems. It primarily facilitates the identification of optimal grammar structures and the choice of function sets to achieve improved or equivalent fitness at a lower computational overhead compared to existing GE setups. This research work discusses and reports initial results with one of the key algorithms in AutoGE, Production Rule Pruning, which employs a simple frequency-based approach for identifying less useful productions. It captures the relationship between production rules and the function sets involved in the problem domain to identify optimal grammar structures. Preliminary studies on a set of fourteen standard Genetic Programming benchmark problems in the symbolic regression domain show that the algorithm removes less useful terminals and production rules, resulting in individuals with shorter genome lengths. The results show that the proposed algorithm identifies arity-based grammar as the optimal grammar structure for the symbolic regression problem domain. They also establish that the proposed algorithm yields enhanced fitness for some of the benchmark problems. © 2021 by SCITEPRESS - Science and Technology Publications, Lda.
The desire of human intelligence to surpass its potential has triggered the emergence of artificial intelligence and machine learning. Over the last seven decades, these terms have gained much prominence in the digital arena due to the wide adoption of their techniques for designing rich industry-enabled solutions. In this comprehensive survey of artificial intelligence, the authors provide insights from the evolution of machine learning and artificial intelligence to the present state of the art, and discuss how the technology can be exploited in the future to yield solutions to some of the most challenging global problems. The discussion centers around the successful deployment of diverse use cases in the present state of affairs. The rising interest among researchers and practitioners has led to the unfolding of AI into the many popular subfields we know today. Through the course of this article, the authors provide brief highlights of techniques for supervised as well as unsupervised learning. AI has paved the way for cutting-edge research in complex competitive domains ranging from autonomous driving, climate change, and cyber-physical security systems to healthcare diagnostics. The study concludes by depicting the growing share of market revenues from artificial intelligence-powered products and the forecasted billions of dollars' worth of market share ahead in the coming decade. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Pyramid is a hierarchical approach to Evolutionary Computation that decomposes problems by first tackling simpler versions of them before scaling up to increasingly more difficult versions with smaller populations. Previous work showed that Pyramid was mostly as good as or better than a standard GA approach, but that it did so with a fraction of the individuals processed. Pyramid requires two key parameters to manage the problem complexity: (i) a threshold α as the performance bar, and (ii) β as the container with the maximum number of individuals to survive to the next level down. Pyramid-Z addressed the shortcomings of Pyramid by automating the choice of α (to ensure that the top individuals are highly significantly better than the original population at the current level) and making β less aggressive (to maintain a moderately sized population at the final level). In cases where evolution starts to stagnate at the final level, the population enters a different form of evolution, driven by a form of hyper-mutation that runs until either a satisfactory fitness has been found or the total evaluation budget has been exhausted. The experimental results show that Pyramid-Z consistently outperforms both the previous version and the baseline. Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
The growing interest in the search for and use of alternative resources for renewable energy can lead the future towards a substantially decreased carbon footprint and reduced effects of global warming. The proposed research explores the possibility of harnessing piezoelectric energy from the environment of moving vehicles on the road. Although the technology is still immature, it has the advantage of a zero carbon footprint, making it ideal for investigating the potential for green energy generation. The main objective is to develop regression models that can estimate the energy generated from vehicular traffic. Energy is generated when force is applied to piezoelectric transducers, and it depends on significant factors such as the number of piezoelectric transducers and their arrangement, the load applied, and the frequency. We design Support Vector Machine (SVM) and Generalised Linear Model (GLM) regressors for predicting energy. The best features for training the models were selected by incorporating feature selection techniques such as Pearson's correlation coefficient and Mutual Information Statistics. The experimental setup makes use of simulated data which takes into account vehicle counts of different vehicles with and without load. The accuracies achieved with SVM and GLM are 99.6% and 99.7%, respectively. The energy savings achieved by making use of the generated piezoelectric energy are discussed with a sample scenario of Motorway50 in Dublin, the Irish capital city. Through this work, we propose to investigate more deeply the feasibility and cost-effectiveness of utilizing energy that is otherwise wasted by human and vehicular locomotion. Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
Elliptic curves are a major area of research due to their application in elliptic curve cryptography (ECC). Due to their small key sizes, they offer the twofold advantage of reduced storage and transmission requirements, which also results in faster execution times. The authors propose an architecture to automatically generate test cases for the verification of elliptic curve operational circuits, based on a user-defined prime field and the parameters used in the circuit under test. The ECC test case generation builds on the Galois field arithmetic operations which were the subject of previous work by the authors. One of the strengths of elliptic curve mathematics is its simplicity, involving just three points (P, Q, and R) that lie on a line intersecting the curve. The generated test cases use points from the user-defined prime field, sequentially selecting the input vector points (P and/or Q) to easily calculate the resultant output vector (R). The testbench proposed here targets field-programmable gate array (FPGA) platforms, and experimental results for ECC test case generation on different prime fields are presented, while ModelSim is used to validate the correctness of the ECC operations. © 2021 IEEE.
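A software reference for the three-point relation is standard affine point addition over a prime field; a minimal sketch (the toy curve and test vector below are illustrative, not the paper's parameters):

```python
def ec_add(P, Q, a, p):
    # Add points P and Q on y^2 = x^3 + ax + b over GF(p).
    # None represents the point at infinity; b is implicit in the points.
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

# Toy check on y^2 = x^3 + 2x + 2 over GF(17) with the generator P = (5, 1):
print(ec_add((5, 1), (5, 1), a=2, p=17))                   # 2P = (6, 3)
```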
The quality assurance of circuits is of major importance as the complexity of circuits rises with their capabilities. A high degree of testing is thus required to guarantee proper operation; if, on the other hand, too much time is spent in testing, development time is prolonged. The work presented in this paper proposes a methodology for selecting a minimal set of test cases for validating digital circuits with respect to their functional specification. We do this by employing hierarchical clustering algorithms to group test cases using a Hamming distance similarity measure. The test cases are then selected from the clusters by our proposed approach of distance-based selection. Results are reported for two circuits, viz. a Multiplier and a Galois Field multiplier, which exhibit similar behaviour but differ in the number of test cases and their implementation. It is shown that, for small fraction values, distance-based selection can outperform traditional random selection by preserving diversity among the chosen test cases. Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
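A hedged sketch of the pipeline using SciPy's hierarchical clustering over Hamming distances; picking each cluster's medoid is one plausible reading of distance-based selection, not necessarily the authors' exact rule:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def select_test_cases(tests, n_clusters):
    # Cluster binary test vectors by Hamming distance, then pick from each
    # cluster the medoid (smallest summed distance to its cluster mates).
    d = pdist(tests, metric="hamming")            # condensed distance matrix
    labels = fcluster(linkage(d, method="average"), n_clusters,
                      criterion="maxclust")
    selected = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        sub = tests[idx]
        sums = (sub[:, None, :] != sub[None, :, :]).sum(axis=(1, 2))
        selected.append(int(idx[np.argmin(sums)]))
    return selected

tests = np.random.randint(0, 2, size=(64, 8))     # 8-bit input vectors
print(select_test_cases(tests, n_clusters=6))     # indices of chosen cases
```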
The objective of the proposed research is to design a system called Green Artificial Intelligence Powered Cost Pricing Models for Congestion Control (GREE-COCO) for road vehicles that addresses the issue of congestion control through the concept of cost pricing. The motivation is to facilitate smooth traffic flow on densely congested roads by incorporating static and dynamic cost pricing models. A further objective of the study is to reduce pollution and fuel consumption and to encourage people towards positive usage of the public transport system (e.g., bus, train, metro, and tram). The system will be implemented by charging vehicles driven on a particular congested road during a specific time. The pricing will differ according to the location, type of vehicle, and vehicle count. The cost pricing model incorporates an incentive approach that rewards the usage of electric/non-fuel vehicles. The system will be tested with analytics gathered from cameras installed for testing purposes in some Indian and Irish cities. One of the challenges to be addressed is developing sustainable and energy-efficient Artificial Intelligence (AI) models that consume less power, resulting in lower carbon emissions. The GREE-COCO model consists of three modules: vehicle detection and classification, license plate recognition, and the cost pricing model. The AI models for vehicle detection and classification are implemented with You Only Look Once (YOLO) v3, Faster Region-based Convolutional Neural Network (F-RCNN), and Mask Region-based Convolutional Neural Network (Mask RCNN). The selection of the best model depends upon performance with respect to accuracy and energy efficiency. The dynamic cost pricing model is tested with both the Support Vector Machine (SVM) classifier and the Generalised Linear Regression Model (GLM). The experiments are carried out on a custom-made video dataset of 103 videos of different durations. The initial results obtained from the experimental study indicate that YOLOv3 is best suited for the system, as it has the highest accuracy and is the most energy-efficient. © 2021 by SCITEPRESS - Science and Technology Publications, Lda.
Electronic health records, riding the wave of digitalization, are currently booming in many hospitals. Despite advancements, a plethora of challenges, such as data interconnectivity, interoperability, and data sharing, arises because hospitals running their own hospital management information systems form isolated clusters of data. These challenges can be solved by effectively employing a blockchain platform. The authors of this work propose a novel consensus algorithm, titled Proof of Authenticity, over a distributed platform for all medical stakeholders. Unlike previous approaches, in which researchers were the miners, this work illustrates a methodology for implementing blockchain in health care where the hospitals and clinics assume the roles of both miners and validators. The peer-to-peer network is leveraged with a designed smart contract that follows the proof of authenticity mechanism. The medical stakeholders access the medical data under security protocols and with the patient's consent in a tamper-proof network. The proposed work aims for more patient-centric and transparent health care. © 2020, Springer Nature Singapore Pte Ltd.
One of the challenges in biomedical research and clinical practice is that tremendous efforts need to be consolidated in order to use all kinds of medical data to improve work processes, increase capacity while lessening costs, and enhance efficiencies. Very few medical centers in India have digitized their patient records, and because of poor interoperability among themselves, they end up with scattered and incomplete data. Health data is proprietary and, being a personal asset of the patient, its distribution or use should be accomplished only with the patient's consent and for a specific duration. This research proposes MultiChain as a secure, decentralized network for storing Electronic Health Records. The architecture provides users with a holistic, transparent view of their medical history through disintermediation of trust while ensuring data integrity among medical facilities. This will open up new horizons of vital trends and insights for research, innovation, and development through robust analysis. The platform focuses on an interactive dashboard containing year-, month-, and season-wise statistics of various diseases, which are used to notify the users and the medical authorities on a timely basis. Prediction of epidemics using machine learning techniques will facilitate users by providing personalized care, and will help medical institutions manage inventory and procure medicines. Vital insights like the patient-to-doctor ratio, infant mortality rates, and prior knowledge of forthcoming epidemics will help government institutions to analyze and plan infrastructural requirements and services. © 2020, Springer Nature Singapore Pte Ltd.
Time series forecasting is a technique that predicts future values using time as one of the dimensions. The learning process is strongly controlled by the fine-tuning of various hyperparameters, which is often resource-intensive and requires domain knowledge. This research work focuses on automatically evolving suitable time series hyperparameters for the level, trend and seasonality components using Grammatical Evolution. The proposed Grammatical Evolution Time Series framework can accept datasets from various domains and selects the appropriate parameter values based on the nature of the dataset. The forecasted results are compared with a traditional grid search algorithm on the basis of error metrics, efficiency and scalability. © 2020 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved
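For the level and trend components, the underlying model is exponential smoothing; below is a self-contained sketch of the grid-search baseline over Holt's smoothing parameters on a synthetic series (GE would evolve these values instead; the seasonal component is omitted for brevity):

```python
import itertools
import numpy as np

def holt_forecast_error(y, alpha, beta):
    # One-step-ahead RMSE of Holt's linear method (level + trend); a
    # seasonal component is handled analogously in the full framework.
    level, trend = y[0], y[1] - y[0]
    errs = []
    for t in range(1, len(y)):
        errs.append(y[t] - (level + trend))              # one-step forecast error
        new_level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return float(np.sqrt(np.mean(np.square(errs))))

y = np.cumsum(np.random.normal(0.5, 1.0, 200))           # synthetic trending series
grid = np.linspace(0.05, 0.95, 10)
best = min(itertools.product(grid, grid),
           key=lambda ab: holt_forecast_error(y, *ab))
print("best (alpha, beta):", best)
```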
In Grammatical Evolution (GE), individuals occupy more space than required; that is, the Actual Length of an individual is longer than its Effective Length. This has major implications for scaling GE to complex problems that demand larger populations and more complex individuals. We show how these two lengths vary for different population sizes, demonstrating that Effective Length is relatively independent of population size, while Actual Length is proportional to it. We introduce Grammatical Evolution Memory Optimization (GEMO), a two-stage evolutionary system that uses a multi-objective approach to identify the optimal, or at least near-optimal, genome length for the problem being examined. In Stage 1, it uses a single run with a multi-objective fitness function defined to minimize the error for the problem being tackled while maximizing the ratio of Effective to Actual Genome Length, leading to better memory utilization and hence computational speedup. Then, in Stage 2, standard GE runs are performed with the genome length restricted to the length obtained in Stage 1. We demonstrate this technique on different problem domains and show that, in all cases, GEMO produces individuals with the same fitness as standard GE but significantly improves memory usage and reduces computation time. © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
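The Stage 1 objective can be sketched as trading error against the effective-to-actual length ratio; the weighted form and the dictionary-based individuals below are illustrative stand-ins for the multi-objective fitness:

```python
def gemo_stage1_fitness(ind, w=0.5):
    # Illustrative weighted form of the Stage 1 objectives: minimise error
    # while maximising the effective/actual genome-length ratio.
    return ind["error"] - w * ind["effective_len"] / ind["actual_len"]

def stage2_genome_cap(stage1_population):
    # Cap Stage 2 genome length at the effective length of the best
    # Stage 1 individual (the near-optimal length found in Stage 1).
    best = min(stage1_population, key=gemo_stage1_fitness)
    return best["effective_len"]

pop = [{"error": 0.20, "effective_len": 60, "actual_len": 400},
       {"error": 0.21, "effective_len": 58, "actual_len": 70}]
print(stage2_genome_cap(pop))   # -> 58: the second individual wastes far less genome
```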
Emotion is a state of mind affected by many external parameters, one of which is text, either read or spoken by others or oneself. Recognition of emotion from facial expressions, sound intensity, or text is becoming an interesting research area. Extracting emotions from text is a relatively unexplored but important research problem in the natural language processing domain. It requires the construction of an emotional lexicon in the respective natural language for the classification of text/documents into emotional classes. In this paper, an overview of the state-of-the-art techniques used to construct emotional lexicons for different languages is given. These methods are in the initial stages of research, as much of the work is aimed at optimizing results, and the field is hence open to a wide range of innovative contributions. The author concludes with a proposal for developing a language-independent emotional lexicon. The main challenges in implementing this are discussed, and promising applications in various fields are elaborated. © 2019, Springer Nature Singapore Pte Ltd.
The way to proficient waste management is to guarantee appropriate segregation of waste to ensure its proper reuse. The objective of this paper is to identify the types of waste generated in India, the nature of waste coming from different cities, the current disposal methods employed there, and the amount of waste that gets dumped in landfills. These statistics are useful for identifying efficient methods of segregating waste to enhance the efficiency of the reusing and recycling process. The paper may justify the reason behind the expanding number of landfills in India. Valuable insights can be obtained after classifying waste into biodegradable/non-biodegradable classes and assessing its suitability for disposal. These insights can help in choosing proper disposal techniques for different categories of waste. © 2019 IEEE.
The conversion of unstructured big data into knowledgeable information is a hotspot of search applications today. Nearly 75% of queries issued to Web search engines aim at finding information about entities. Ideally, the user wants to know the relations existing between data objects. A conceptual knowledge graph provides an efficient way of exploring such relations. Past research relied on knowledge bases like DBpedia to build such graphs. In this paper, we introduce a method that automatically extracts the key aspects of a search query from the Wikipedia corpus. The extracted relations are dynamically expressed as a knowledge graph. Additionally, the system returns a list of results, i.e., Wikipedia documents, ranked in order of their relevance to the search query. Thus, the proposed system can be viewed as an information retrieval system that leverages a knowledge graph to provide more promising information to the user. © 2018 IEEE.
Fertility of the soil is considered the most important criterion in any agricultural practice. Nutrients present in the soil define its fertility. Mineral nutrients such as Nitrogen (N), Potassium (K) and Phosphorous (P) are vital for plant growth and food production. Lack of adequate knowledge among farmers about various parameters in farming, like soil fertility and the amount of fertilizer to be used, leads to degradation of overall soil quality. In this paper, we present a system to test soil fertility using the principle of colorimetry. Colorimetry is a technique in which we measure the amount of light absorbed by the color developed in a sample. An aqueous solution of the soil sample is prepared using extracting agents and is subjected to the photodiodes of a color sensor. The solution develops a color due to the reaction of nutrients in the soil with chemicals. The output of the color sensor is calibrated against standard values present in the database. To verify the results obtained by the color sensor, we use the Naive Bayes classification algorithm, which classifies the intensity values of the soil solutions into three class labels, namely low, medium and high. After applying the Naive Bayes classifier, we can estimate the accuracy of the intended system. The intended system is thus beneficial in reducing the time required for testing soil fertility and determining the accuracy of our results. © 2018 IEEE.
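A minimal sketch of the verification step using scikit-learn's GaussianNB; the RGB intensity readings and calibrated low/medium/high labels below are made up for illustration:

```python
from sklearn.naive_bayes import GaussianNB

# Hypothetical RGB intensity readings from the color sensor and their
# calibrated fertility labels (0 = low, 1 = medium, 2 = high).
X_train = [[180, 60, 40], [175, 66, 44], [120, 90, 70],
           [115, 95, 75], [60, 130, 110], [55, 135, 120]]
y_train = [0, 0, 1, 1, 2, 2]

clf = GaussianNB().fit(X_train, y_train)
print(clf.predict([[118, 92, 72]]))      # -> [1], i.e. "medium"
```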
This paper presents work done on an image and video retrieval system. Content-Based Image Retrieval (CBIR) and Content-Based Video Retrieval (CBVR) have attracted researchers from various research fields such as computer vision, artificial intelligence, human factors, machine learning, image processing, and man-machine modeling, to name a few. Existing methods have revealed certain flaws, like noisy data, often leading to the display of irrelevant images or videos. In our system, we use Hypergraph Learning for images and Similarity Matching for videos. All the above challenges are addressed by retrieving relevant images and videos in response to a user's keyword-based search. Users can search by attributes present during the search, or by a new attribute, which gets added to the attribute list in the database; the retrieved results can then be ranked to obtain relevant data. Experimentation is carried out on a Flickr database for images, while the videos under consideration are those available on YouTube. A database of images and videos covering users' interests in diverse domains is designed and used in the experimentation. © 2018 IEEE.
Recommender systems have grown rapidly over the last two decades, providing users with rich insights in diverse applications such as healthcare, e-commerce, education and tourism. There is therefore a growing demand to accurately analyze the reviews posted by users on different social media sites; the tourism industry's economy relies heavily on analysis of such data. We have therefore pursued the idea of building a recommender system that provides users with valuable insights and helps them make the right choice. In our approach, we experimented on data collected for 150 locations in and around Pune city, Maharashtra, India. We first categorized the reviews into location-specific details under the categories 'Temple', 'Historical', 'Hill Station' and 'Educational'. We then integrated the ratings provided by previous users under each category with the user's preferences, such as expense, total number of days and trip distance. The combined approach can recommend a set of tours that most closely matches the user's interests, enabling them to make the best choice. © 2018 IEEE.
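The combined scoring idea can be sketched as follows; the location records, preference fields and the feasibility rule are illustrative assumptions, not the paper's exact formulation:

```python
# Minimal sketch: weight each location's category rating by whether it fits
# the user's stated constraints (expense, days, distance). Placeholder data.
locations = [
    {"name": "Shaniwar Wada", "category": "Historical", "rating": 4.2,
     "expense": 500, "days": 1, "distance_km": 5},
    {"name": "Sinhagad", "category": "Hill Station", "rating": 4.5,
     "expense": 800, "days": 1, "distance_km": 30},
]

user = {"category": "Historical", "max_expense": 1000,
        "max_days": 2, "max_distance_km": 50}

def score(loc):
    """Rating of a category-matching location, zeroed if it is infeasible."""
    if loc["category"] != user["category"]:
        return 0.0
    feasible = (loc["expense"] <= user["max_expense"]
                and loc["days"] <= user["max_days"]
                and loc["distance_km"] <= user["max_distance_km"])
    return loc["rating"] if feasible else 0.0

recommended = sorted(locations, key=score, reverse=True)
print([l["name"] for l in recommended if score(l) > 0])
```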
There is a growing need for innovation in displaying important announcements through wireless notification systems. With Internet use on the rise and most users spending much of their time on mobile phones, wireless communication is preferred over manual methods. The traditional method of posting notices on boards does not serve everyone, because users are not notified individually. Our approach therefore reduces manual effort by sending push notifications to each user individually over wireless communication, so that a user who is not physically present can still see important notices. This approach can be applied in a college, where pattern matching on the notice content automatically routes each notice to the appropriate users; the system does not need the recipients to be selected manually each time. © 2018 IEEE.
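One plausible reading of the pattern-matching step, sketched with invented user records and a simple regular-expression match on the notice text (the paper's actual matching rules are not specified here):

```python
# A sketch of the pattern-matching routing step: a notice is matched
# against each user's department, and only matching users receive a push
# notification. User records and the matching rule are invented examples.
import re

users = [
    {"name": "asha", "dept": "CS"},
    {"name": "ravi", "dept": "IT"},
]

def recipients(notice):
    """Return users whose department appears as a word in the notice."""
    return [u["name"] for u in users
            if re.search(rf"\b{u['dept']}\b", notice, re.IGNORECASE)]

print(recipients("Exam schedule for CS students"))  # -> ['asha']
```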
The entire globe generates large amounts of information every day through diverse social media. Given the availability of such unstructured data, it becomes increasingly difficult to fetch relevant information that interests a user. One cause is the ever-present homographs: words that carry multiple meanings in different contexts. An approach is therefore needed to organize the conflicting information generated by homographs, and topic modeling is one such approach. Twitter is a social media site that greatly challenges researchers to interpret information accurately, since tweets may convey conflicting and contradictory information depending on each user's interpretation. The authors therefore propose using the Latent Dirichlet Allocation algorithm to generate all meaningful combinations, through which users can analyze their peers' opinions by choosing the appropriate homograph models. © 2018 IEEE.
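A small sketch of applying Latent Dirichlet Allocation with scikit-learn; the example tweets around the homograph "bank" are invented, and parameters such as the number of topics are arbitrary choices:

```python
# Hedged sketch: LDA on short texts to separate the senses of a homograph.
# The tweets are fabricated examples, not a real Twitter dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = ["deposited money at the bank today",
          "the river bank was flooded after rain",
          "bank approved my loan application",
          "fishing on the bank of the river"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:][::-1]]  # top-4 words
    print(f"topic {i}: {top}")
```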
Hadoop is a distributed master-slave platform comprising two main components: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS provides distributed storage, whereas MapReduce handles computational processing. When a MapReduce cluster receives multiple jobs simultaneously, overall system performance can deteriorate seriously because of poor job response times; efficient job scheduling is therefore a real challenge in the MapReduce world. Moreover, the traditional scheduling algorithms that ship with Hadoop do not always ensure good average job response times under distinct workloads. To address this problem, we put forward an efficient Hadoop scheduler that collects information on workload patterns and distributes jobs according to our hybrid scheduling technique. The experimental results show that our scheduler improves the average job response time of MapReduce systems under different workload patterns. © 2017 IEEE.
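As a toy stand-in for a workload-aware hybrid policy (not the paper's scheduler), the sketch below favours short jobs while an ageing term keeps long jobs from starving; the runtime estimates and the ageing weight are assumptions:

```python
# Illustrative sketch: dispatch the pending job with the best priority,
# where short estimated runtimes win but waiting time (ageing) gradually
# raises the priority of long jobs to avoid starvation.
import time

pending = []

def submit(job):
    job["submitted"] = time.time()
    pending.append(job)

def priority(job, now):
    # Lower is better; 0.1 is an arbitrary ageing weight.
    return job["est_runtime"] - 0.1 * (now - job["submitted"])

def next_job():
    if not pending:
        return None
    now = time.time()
    best = min(pending, key=lambda j: priority(j, now))
    pending.remove(best)
    return best

submit({"name": "short-map", "est_runtime": 5})
submit({"name": "long-sort", "est_runtime": 600})
print(next_job()["name"])  # -> short-map
```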
Data mining helps its users deduce important information from huge databases. In the medical field, practitioners make use of large volumes of patient data: effective medical treatment is achieved only after a complete survey of ample patient data. Practitioners are, however, usually faced with the obstacle of deducing pertinent information and finding trends or patterns that may help in the analysis or treatment of a disease. Data mining is a tool that sifts through such voluminous data and presents its essential content. In this paper, we design a five-step data mining model that helps medical practitioners determine the appropriate drug for the treatment of epilepsy. Most epileptic seizures are managed through drug therapy, particularly anticonvulsant drugs, and the choice is most often related to aspects particular to each patient. The key to building a successful predictive model is to include data in the database that describes what has happened in the past. A wide range of both older and recent anticonvulsants is on the market; our paper considers both, along with other factors, to justify the choice of a drug suitable for epilepsy treatment. To determine the drug of choice for different types of epilepsy, we selected the classification method. Decision trees, a data mining technique that has been in use for almost 20 years, are increasingly being used for prediction. © Springer International Publishing Switzerland 2016.
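A minimal decision-tree sketch with scikit-learn standing in for the classification step; the patient features, encodings and drug labels are fabricated placeholders, not clinical data or the paper's model:

```python
# Hedged sketch: a decision tree that maps simple patient attributes to an
# anticonvulsant suggestion. All values below are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# features: [age, seizure_type (0=focal, 1=generalized), prior_drug_count]
X = [[25, 0, 0], [60, 1, 2], [30, 1, 0], [45, 0, 1]]
y = ["carbamazepine", "valproate", "valproate", "carbamazepine"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.predict([[35, 1, 1]]))  # predicted drug for a new patient
```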
The brain, an amazing organ of the human body, carries electrical signals that support interaction between its various regions. Functional magnetic resonance imaging (fMRI) is a specialized type of magnetic resonance imaging scan. Although the nature of fMRI data poses various challenges for analysis, it remains an effective method for diagnosing diseases and studying the relationships between brain regions. In this paper, we propose a model that yields better fMRI data analysis. The effective interactions among brain regions can be explored using dynamic causal modeling (DCM), which helps us understand the functionality of the brain to some extent. Bayesian networks, together with Markov blankets, can be used for causal discovery and evaluated with suitable evaluation metrics. © (2012) Trans Tech Publications, Switzerland.
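As a much-simplified stand-in for the causal-discovery step, the sketch below derives candidate edges between regions from correlations of (randomly generated) BOLD time series; the 0.2 threshold is an arbitrary assumption:

```python
# Hedged sketch: a correlation-based connectivity matrix between brain
# regions as a crude precursor to causal modelling. Random placeholder data.
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_regions = 200, 4
bold = rng.standard_normal((n_timepoints, n_regions))  # region time series

connectivity = np.corrcoef(bold, rowvar=False)  # region-by-region correlation
candidate_edges = np.abs(connectivity) > 0.2    # keep only strong links
np.fill_diagonal(candidate_edges, False)
print(candidate_edges.astype(int))
```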
Data mining makes it possible to extract hidden predictive information from large databases. Anemia is the most common disorder of the blood and can be classified in a variety of ways, for example based on the morphology of RBCs or on etiology. In this paper we present an analysis of the prediction and classification of anemia in patients using data mining techniques. The dataset is constructed from complete blood count (CBC) test data from various hospitals. We applied the C4.5 decision tree algorithm and support vector machines, implemented as J48 and SMO (sequential minimal optimization) in Weka, and conducted several experiments with these algorithms. A decision tree for the classification of anemia is generated that gives the best possible classification of anemia, along with its severity, based on CBC reports. We observed that the C4.5 algorithm performs best, with the highest accuracy. © 2011 Springer-Verlag.
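The two classifiers can be sketched in scikit-learn, with an entropy-based decision tree as a rough C4.5/J48 analogue and a linear SVC in place of Weka's SMO; the CBC values and labels below are invented placeholders:

```python
# Hedged sketch of the two classifiers on CBC-style features.
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# features: [haemoglobin g/dL, MCV fL, RBC count 10^6/uL] (made-up values)
X = [[9.0, 70, 3.8], [13.5, 88, 4.9], [7.5, 65, 3.2], [14.2, 90, 5.1]]
y = ["anemic", "normal", "anemic", "normal"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
svm = SVC(kernel="linear").fit(X, y)

sample = [[10.0, 72, 4.0]]
print(tree.predict(sample), svm.predict(sample))
```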
Schizophrenia is a complex psychiatric disorder that leads to local abnormalities in brain activity. Functional magnetic resonance imaging (fMRI) technology enables medical doctors to observe brain activity patterns that represent the execution of subject tasks, both physical and mental. In general, each subject exhibits their own activation pattern for a given task, whose intensity is affected by the physiology of the subject's brain, the use of medications, and the parameters of the scanner used for image acquisition. Since the resulting activation map can be co-registered to a standard brain, activation patterns from different individuals can be analyzed for consistency over the brain sections or brain coordinates where activation is observed. The dynamic causal model using Bayesian networks (DBNs) extracts causal relationships from fMRI data by applying HITON-PC, a local causal discovery algorithm. Based on these relationships, a dynamic causal model is built and used to classify patient data as belonging to healthy or ill subjects. Causal Explorer is a Matlab library of computational causal discovery and variable selection algorithms. © 2011 IEEE.
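HITON-PC itself is not reproduced here; as a loose stand-in for its local (Markov-blanket-style) feature selection, the sketch below keeps the top mutual-information features before classification, on random placeholder data:

```python
# Illustrative sketch only: select the most informative features (a crude
# stand-in for HITON-PC's local causal selection) and classify subjects as
# healthy or ill. Data are random placeholders, not fMRI scans.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 100))   # 40 subjects x 100 features
y = rng.integers(0, 2, size=40)      # 0 = healthy, 1 = ill

clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),  # keep 10 informative features
    LogisticRegression(max_iter=1000),
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```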