

Algorithms for Intelligent Systems
Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

Vinit Kumar Gunjan · P. N. Suganthan · Jan Haase · Amit Kumar, Editors

Cybernetics, Cognition and Machine Learning Applications
Proceedings of ICCCMLA 2020
Algorithms for Intelligent Systems
Series Editors
Jagdish Chand Bansal, Department of Mathematics, South Asian University,
New Delhi, Delhi, India
Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee,
Roorkee, Uttarakhand, India
Atulya K. Nagar, School of Mathematics, Computer Science and Engineering,
Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms
for intelligent systems and their applications to various real-world problems. It
covers research related to autonomous agents, multi-agent systems, behavioral
modeling, reinforcement learning, game theory, mechanism design, machine
learning, meta-heuristic search, optimization, planning and scheduling, artificial
neural networks, evolutionary computation, swarm intelligence and other algorithms
for intelligent systems.
The series includes recent advancements in, modifications of and applications
of artificial neural networks, evolutionary computation, swarm intelligence,
artificial immune systems, fuzzy systems, autonomous and multi-agent systems,
machine learning and other areas related to intelligent systems. The material will be
beneficial for graduate students, post-graduate students and researchers who want a
broader view of advances in algorithms for intelligent systems. The contents will also
be useful to researchers from other fields who are unfamiliar with the power of
intelligent systems, e.g. researchers in bioinformatics, biochemists, mechanical and
chemical engineers, economists, musicians and medical practitioners.
The series publishes monographs, edited volumes, advanced textbooks and
selected proceedings.
More information about this series at http://www.springer.com/series/16171
Vinit Kumar Gunjan · P. N. Suganthan · Jan Haase ·
Amit Kumar
Cybernetics, Cognition
and Machine Learning Applications
Proceedings of ICCCMLA 2020
Vinit Kumar Gunjan
Department of Computer Science
and Engineering
CMR Institute of Technology
Hyderabad, India
Jan Haase
Department of Computer Science
Nordakademie, Elmshorn, Germany
P. N. Suganthan
School of Electrical and Electronic Engineering
Nanyang Technological University
Singapore, Singapore
Amit Kumar
Bioaxis DNA Research Centre (P) Ltd.
Hyderabad, India
Algorithms for Intelligent Systems
ISSN 2524-7565    ISSN 2524-7573 (electronic)
ISBN 978-981-33-6690-9    ISBN 978-981-33-6691-6 (eBook)
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore.
Preface

Cognitive science is the study of the human mind and brain, focusing on how the mind
represents and manipulates knowledge and how mental representations and processes are
realized in the brain. The field is highly transdisciplinary in nature, combining ideas,
principles and methods from psychology, computer science, linguistics, philosophy,
neuroscience and related disciplines. Brain–machine interfaces were envisioned as early
as the 1940s by Norbert Wiener, the father of cybernetics. The opportunities for
enhancing human capabilities and restoring functions are now expanding quickly with a
combination of advances in machine learning, smart materials and robotics.
Automation, artificial intelligence (AI) and machine learning (ML) are pushing the
boundaries of what machines are capable of doing in the software and hardware
industries. From being a figment of someone's imagination in sci-fi movies and
novels, they have come a long way toward augmenting human potential, reducing the
risk of human error and performing tasks faster, more accurately and with greater
precision each time, driven by technology, automation and innovation. This is indeed
creating new business opportunities and acting as a clear competitive differentiator
that helps analyze hidden patterns in data to derive insights. AI and ML can certainly
enrich our future, making the need for intelligent and sophisticated systems more
important than ever.
This book comprises papers selected and presented at the International Conference
on Cybernetics, Cognition and Machine Learning Applications 2020, arranged on the
basis of their approaches and contributions to the scope of the conference. The
chapters present key algorithms and theories that form the core of the technologies
and applications concerned, mainly artificial intelligence, machine learning, neural
networks, face recognition, evolutionary algorithms such as genetic algorithms,
automotive applications, automation devices with artificial neural networks, business
management systems, cybernetics, IoT, cognition, data science and modern speech
processing systems. The book also covers recent advances in medical diagnostic
systems, smart agricultural applications, sensor networks and the cognitive science
domain. Discussion of learning and software modules in deep learning algorithms is
added wherever relevant.
Hyderabad, India
Vinit Kumar Gunjan

Singapore
P. N. Suganthan

Elmshorn, Germany
Jan Haase

Hyderabad, India
Amit Kumar
Contents

IOT Smart Locker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Anurag Narkhede, Vinit Mapari, and Aarti Karande
Brief Analysis on Human Activity Recognition . . . . . . . . . . . . . . . . . . . . . . .
Kaif Jamil, Deependra Rastogi, Prashant Johri, and Munish Sabarwal
Lora-Based Smart Garbage Alert Monitoring System Using
ATMEGA 328, 2560, 128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Anzar Ahmad and Shashi Shekhar
Pre-birth Prognostication of Education and Learning of a Fetus
While in the Uterus of the Mother Using Machine Learning . . . . . . . . . . .
Harsh Nagesh Mehta and Jayshree Ghorpade Aher
Performance Analysis of Single-Stage PV Connected Three-Phase
Grid System Under Steady State and Dynamic Conditions . . . . . . . . . . . .
V. Narasimhulu and K. Jithendra Gowd
Delay Feedback H∞ Control for Neutral Stochastic Fuzzy Systems
with Time Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
T. Senthilkumar
Modeling Crosstalk of Tau and ROS Implicated in Parkinson’s
Disease Using Biochemical Systems Theory . . . . . . . . . . . . . . . . . . . . . . . . . .
Hemalatha Sasidharakurup, Parvathy Devi Babulekshmanan,
Sreehari Sathianarayanan, and Shyam Diwakar
IoT-Based Patient Vital Measuring System . . . . . . . . . . . . . . . . . . . . . . . . . . .
Ashwini R. Hirekodi, Bhagyashri R. Pandurangi,
Uttam U. Deshpande, and Ashok P. Magadum
IoT-Enabled Logistics for E-waste Management and Sustainability . . . . .
P. S. Anusree and P. Balasubramanian
Navigation Through Proxy Measurement of Location by Surface
Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
G. Savitha, Adarsh, Aditya Raj, Gaurav Gupta, and Ashik A. Jain
Unsupervised Learning Algorithms for Hydropower’s Sensor Data . . . . .
Ajeet Rai
Feature Construction Through Inductive Transfer Learning
in Computer Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Suman Roy and S. Saravana Kumar
Decoding Motor Behavior Biosignatures of Arm Movement Tasks
Using Electroencephalography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Rakhi Radhamani, Alna Anil, Gautham Manoj, Gouri Babu Ambily,
Praveen Raveendran, Vishnu Hari, and Shyam Diwakar
Emergency Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Shubham V. Ranbhare, Mayur M. Pawar, Shree G. Mane, and Nikhil B. Sardar
Type Inference in Java: Characteristics and Limitations . . . . . . . . . . . . . . 131
Neha Kumari and Rajeev Kumar
Detection and Correction of Potholes Using Machine Learning . . . . . . . . 139
Ashish Sahu, Aadityanand Singh, Sahil Pandita, Varun Walimbe,
and Shubhangi Kharche
Detecting COVID-19 Using Convolution Neural Networks . . . . . . . . . . . . . 153
Nihar Patel, Deep Patel, Dhruvil Shah, Foram Patel, and Vibha Patel
Electroencephalography Measurements and Analysis of Cortical
Activations Among Musicians and Non-musicians for Happy
and Sad Indian Classical Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Nijin Nizar, Akhil Chittathuparambil Aravind, Rupanjana Biswas,
Anjali Suresh Nair, Sukriti Nirayilatt Venu, and Shyam Diwakar
Signal Processing in Yoga-Related Neural Circuits and Implications
of Stretching and Sitting Asana on Brain Function . . . . . . . . . . . . . . . . . . . . 169
Dhanush Kumar, Akshara Chelora Puthanveedu, Krishna Mohan,
Lekshmi Aji Priya, Anjali Rajeev, Athira Cheruvathery Harisudhan,
Asha Vijayan, Sandeep Bodda, and Shyam Diwakar
Automation of Answer Scripts Evaluation-A Review . . . . . . . . . . . . . . . . . . 177
M. Ravikumar, S. Sampath Kumar, and G. Shivakumar
Diabetes Mellitus Detection and Diagnosis Using AI Classifier . . . . . . . . . 185
L. Priyadarshini and Lakshmi Shrinivasan
Review on Unit Selection-Based Concatenation Approach in Text
to Speech Synthesis System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Priyanka Gujarathi and Sandip Raosaheb Patil
Enhancing the Security of Confidential Data Using Video
Steganography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Praptiba Parmar and Disha Sanghani
Data Mining and Analysis of Reddit User Data . . . . . . . . . . . . . . . . . . . . . . . 211
Archit Aggarwal, Bhavya Gola, and Tushar Sankla
Analysis of DNA Sequence Pattern Matching: A Brief Survey . . . . . . . . . 221
M. Ravikumar and M. C. Prashanth
Sensor-Based Analysis of Gait and Assessment of Impacts
of Backpack Load on Walking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Chaitanya Nutakki, S. Varsha Nair, Nima A. Sujitha,
Bhavita Kolagani, Indulekha P. Kailasan, Anil Gopika, and Shyam Diwakar
Wireless Battery Monitoring System for Electric Vehicle . . . . . . . . . . . . . . 239
Renuka Modak, Vikramsinh Doke, Sayali Kawrkar, and Nikhil B. Sardar
Iris Recognition Using Selective Feature Set in Frequency Domain
Using Deep Learning Perspective: FrDIrisNet . . . . . . . . . . . . . . . . . . . . . . . . 249
Richa Gupta and Priti Sehgal
Enhancement of Mammogram Images Using CLAHE and Bilateral
Filter Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
M. Ravikumar, P. G. Rachana, B. J. Shivaprasad, and D. S. Guru
Supervised Cross-Database Transfer Learning-Based Facial
Expression Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Arpita Gupta and Ramadoss Balakrishnan
Innovative Approach for Prediction of Cancer Disease
by Improving Conventional Machine Learning Classifier . . . . . . . . . . . . . . 281
Hrithik Sanyal, Priyanka Saxena, and Rajneesh Agrawal
Influence of AI on Detection of COVID-19 . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Pallavi Malik and A. Mukherjee
Study of Medicine Dispensing Machine and Health Monitoring
Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
Aditi Sanjay Bhosale, Swapnil Sanjay Jadhav, Hemangi Sunil Ahire,
Avinash Yuvraj Jaybhay, and K. Rajeswari
Building Image Classification Using CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Prasenjit Saha, Utpal Kumar Nath, Jadumani Bhardawaj, Saurin Paul,
and Gagarina Nath
Analysis of COVID-19 Pandemic and Lockdown Effects
on the National Stock Exchange NIFTY Indices . . . . . . . . . . . . . . . . . . . . . . 313
Ranjani Murali
COVID-19 Detection Using Computer Vision and Deep
Convolution Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
V. Gokul Pillai and Lekshmi R. Chandran
Prediction of Stock Indices, Gold Index, and Real Estate Index
Using Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
Sahil Jain, Pratyush Mandal, Birendra Singh, Pradnya V. Kulkarni,
and Mateen Sayed
Signal Strength-Based Routing Using Simple Ant Routing
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Mani Bushan Dsouza and D. H. Manjaiah
Fake News Detection Using Convolutional Neural Networks
and Random Forest—A Hybrid Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Hitesh Narayan Soneji and Sughosh Sudhanvan
An Enhanced Fuzzy TOPSIS in Soft Computing for the Best
Selection of Health Insurance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
K. R. Sekar, M. Sarika, M. Mitchelle Flavia Jerome, V. Venkataraman,
and C. Thaventhiran
Non-intrusive Load Monitoring with ANN-Based Active Power
Disaggregation of Electrical Appliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
R. Chandran Lekshmi, K. Ilango, G. Nair Manjula, V. Ashish,
John Aleena, G. Abhijith, H. Kumar Anagha, and Raghavendra Akhil
Prediction of Dimension of a Building from Its Visual Data Using
Machine Learning: A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Prasenjit Saha, Utpal Kumar Nath, Jadumani Bhardawaj, and Saurin Paul
Deep Learning Algorithms for Human Activity Recognition:
A Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Aaditya Agrawal and Ravinder Ahuja
Comparison of Parameters of Sentimental Analysis Using
Different Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Akash Yadav and Ravinder Ahuja
System Model to Effectively Understand Programming Error
Messages Using Similarity Matching and Natural Language
Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Veena Desai, Pratijnya Ajawan, and Balaji Betadur
Enhanced Accuracy in Machine Learning Using Feature Set
Bifurcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
Hrithik Sanyal, Priyanka Saxena, and Rajneesh Agrawal
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
About the Editors
Vinit Kumar Gunjan is an Associate Professor in the Department of Computer
Science and Engineering at CMR Institute of Technology, Hyderabad (affiliated to
Jawaharlal Nehru Technological University, Hyderabad), and an active researcher;
he has published research papers at IEEE, Elsevier and Springer conferences and
authored several books and edited volumes in Springer series, most of which are
indexed in the SCOPUS database. He was awarded the prestigious Early Career
Research Award in 2016 by the Science and Engineering Research Board, Department
of Science and Technology, Government of India. A senior member of IEEE and an
active volunteer of the IEEE Hyderabad Section, he is presently serving as secretary
of IEEE CIS and has volunteered in the capacities of Treasurer, Secretary and
Chairman of the IEEE Young Professionals Affinity Group and the IEEE Computer
Society. He has been involved as an organizer in many technical and non-technical
workshops, seminars and conferences of IEEE and Springer. During this tenure he had
the honour of working with top leaders of IEEE and was awarded the Best IEEE Young
Professional Award in 2017 by the IEEE Hyderabad Section.
P. N. Suganthan is a Professor at Nanyang Technological University, Singapore, and
a Fellow of IEEE. He is a founding Co-editor-in-Chief of Swarm and Evolutionary
Computation (2010–), an SCI-indexed Elsevier journal. His research interests include
swarm and evolutionary algorithms, pattern recognition, forecasting, randomized
neural networks, deep learning and applications of swarm, evolutionary and machine
learning algorithms. His publications have been well cited (Google Scholar citations:
~33k), and his SCI-indexed publications have attracted over 1000 SCI citations in
every calendar year since 2013. He was selected as one of the highly cited researchers
in computer science by Thomson Reuters every year from 2015 to 2018. He served as
the General Chair of IEEE SSCI 2013 and is an IEEE CIS Distinguished Lecturer (DLP)
for 2018–2020. He has been a member of the IEEE (S'91, M'92, SM'00, Fellow'15)
since 1991 and was an elected AdCom member of the IEEE Computational Intelligence
Society (CIS) for 2014–2016.
Jan Haase (M'07, SM'09) received his Bachelor's, Master's and Ph.D. degrees in
computer science from the University of Frankfurt/Main, Germany. He was then
project leader of several research projects at the University of Technology in Vienna,
Austria, at the Institute of Computer Science, and a lecturer at Helmut Schmidt
University, Hamburg, where he received his habilitation. From 2016 to 2020 he held
a temporary professorship for Organic Computing at the University of Luebeck,
Germany, and he is now a full professor at Nordakademie near Hamburg, Germany.
His main interests are building automation, system specification and modeling,
simulation, low-power design methodologies, wireless sensor networks, automatic
parallelization and modern computer architectures. As a member of several technical
program committees of international conferences, he is involved in the review
process of many research publications and has repeatedly acted as TPC chair, track
chair, etc. at these conferences. He has (co-)authored more than 100 peer-reviewed
journal and conference papers and several book chapters.
As an IEEE volunteer, he is currently Chair of the Germany Section, has been Chair
of the Austria Section and is active in IEEE Region 8, having been R8 Conference
Coordinator, R8 Professional Activities Chair and a member of several R8 committees.
In the IEEE Industrial Electronics Society, he is a voting AdCom member and Chair of
the Technical Committee on Building Automation, Control, and Management for 2020 and
2021. Furthermore, he serves and has served on several society committees such as
the constitution and bylaws committee, the planning and development committee, the
technical activities committee and the membership committee. At the IEEE HQ level he
currently serves on the Conference Finance Committee and chaired the Ad Hoc on
Cultural Differences in the IEEE Conferences Committee. He has also continuously
served as a mentor in the IEEE VoLT program since its very first season.
Amit Kumar, Ph.D., is a passionate forensic scientist, entrepreneur, engineer,
bioinformatician and IEEE volunteer. In 2005, he founded the first private DNA
testing company, BioAxis DNA Research Centre (P) Ltd, in Hyderabad, India, with
a US collaborator. He has vast experience training 1000+ crime investigation
officers and has helped 750+ criminal and non-criminal cases reach justice by
offering analytical services in his laboratory. Amit was a member of the Strategy
Development and Environmental Assessment Committee (SDEA) of IEEE MGA. He
is a senior member of IEEE and has been a very active IEEE volunteer at the Section,
Council, Region, Technical Society (Computational Intelligence and Engineering in
Medicine and Biology) and IEEE MGA levels in several capacities. He has driven a
number of IEEE conferences, conference leadership programmes, entrepreneurship
development workshops and innovation- and internship-related events. Currently, he
is also a Visiting Professor at the SJB Research Foundation and Vice Chairman of the
IEEE India Council and IEEE Hyderabad Section.
IOT Smart Locker
Anurag Narkhede, Vinit Mapari, and Aarti Karande
1 Introduction
Today, security is a key issue for many people, especially in urban and rural areas,
and safeguarding money against fraud is of prime importance. To avoid risks in
day-to-day life, everyone relies on banks to keep important documents, jewelry or
cash. As times change, banks are increasing their branches due to public interest,
and hence it has become more imperative to secure the banks as well. This paper
proposes a system that keeps cash, jewelry or documents safe in a locker with
multistage security, using biometric and password verification, internal security
based on an ultrasonic sensor, and an Android app to set the timer.
The system works on Internet of things sensors. Ultrasonic sensors are used to
measure distances within the locker using ultrasonic sound waves: ultrasonic pulses
are sent and received by a transducer, and these pulses relay back information
about proximity inside the locker. The system also includes a biometric subsystem,
which is enabled only while the timer is on; if the biometric system is off, the
user cannot open the locker, and the timer can be set using the Android app. The
system thus allows only an authorized person to recover money from a bank, home or
office locker, with four-phase security.
A. Narkhede (B) · V. Mapari · A. Karande
MCA Department, Sardar Patel Institute of Technology, Mumbai, India
e-mail: anurag.narkhede@gmail.com
V. Mapari
e-mail: vinit.mapari@spit.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
A. Narkhede et al.
2 Literature Review
The Internet of things (IOT) refers to the ever-growing network between physical
objects and systems. It is a system of interrelated computing tinny devices, mechanical and digital machines, objects. These components are provided with unique identifiers (UIDs). They have the ability to transfer data over a network without requiring
human-to-computer or human-to-human interaction.
A locker is a small cupboard or compartment. They are mostly found in cabinets,
very often in large numbers, in various public places such as offices, homes, and
locker rooms that may be locked especially. Smart locker is a locker with open base,
on authentication and automated base on instructions (Table 1).
3 Workflow of the System
3.1 Steps the User Has to Follow to Access the Smart Locker
Case 1: In step 1, the individual who wants to access the locker asks the admin to
grant access; this enables the password and biometric systems, starts the timer and
generates an OTP. In step 2, the user has to enter the correct password (OTP) and
also be verified by the biometric system; once both are verified, the locker opens.
If the timer ends and the locker is still open, a theft alert is sent to the admin.
If the admin has not granted access to the locker and a person enables the password
and biometric systems manually, a theft alert is sent to the admin stating that the
password and biometric systems have been enabled manually.
The workflow is explained in the flowchart (Fig. 1).
Case 2: An unauthenticated user may obtain all the access credentials and pose as an
authenticated user. At this point, internal security comes into the picture: an
ultrasonic sensor takes care of the internal security system by measuring and
storing a record of the distance between objects. When an unauthenticated user
clears all the other checks, internal security asks, 'is it a holiday or closing
time?' If yes, it further checks 'has the distance record been updated?' If yes, it
sends an alert to all the higher authorities. This is the secret part of the
security, of which malicious users as well as authenticated users are unaware.
The workflow is explained in the flowchart (Fig. 2).
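The two cases above can be summarized as a small state sketch. This is a minimal illustration, not code from the paper; all names (SmartLocker, admin_grant, try_open, internal_check) are hypothetical, and the real system would drive hardware rather than in-memory flags.

```python
import secrets

class SmartLocker:
    """Toy model of the four-phase flow: admin grant, OTP, biometric, timer."""

    def __init__(self):
        self.otp = None
        self.timer_running = False
        self.alerts = []

    def admin_grant(self):
        # Case 1, step 1: the admin enables password + biometric and starts
        # the timer; an OTP is generated at this point.
        self.otp = f"{secrets.randbelow(10**6):06d}"
        self.timer_running = True
        return self.otp

    def try_open(self, otp, biometric_ok):
        # Case 1, step 2: both the OTP and the biometric check must pass
        # while the timer is running; a manual enable without an admin
        # grant raises a theft alert.
        if not self.timer_running:
            self.alerts.append("theft: enabled manually without admin grant")
            return False
        return otp == self.otp and biometric_ok

    def internal_check(self, holiday_or_closing, distance_updated):
        # Case 2: the hidden ultrasonic layer alerts higher authorities
        # when the stored distance record changes during closed hours.
        if holiday_or_closing and distance_updated:
            self.alerts.append("theft: distance updated during closed hours")
            return False
        return True

locker = SmartLocker()
code = locker.admin_grant()
assert locker.try_open(code, biometric_ok=True)   # authorized access succeeds
assert not locker.internal_check(True, True)      # off-hours movement is flagged
```

The point of the sketch is that the ultrasonic check is evaluated independently of the OTP and biometric checks, which is what makes it a "secret" fourth phase.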
3.2 Tool Kit
a. Ultrasonic sensor: As the name says, an ultrasonic sensor or level sensor is used
to measure the distance within the locker using ultrasonic sound waves. Ultrasonic
pulses are sent, and the wave reflected back from the target is received, using a
transducer. Here, the ultrasonic sensor helps to identify whether there is a theft
attempt to break the locker open forcefully.

Table 1 Comparison analysis of referred papers

1. Fingerprint and user authentication with OTP. Working: improves the security of
a home by verifying both the user and the device. Limitation: a malicious user can
get access to the locker multiple times.
2. Motion detector and GSM messaging module. Working: sends an unauthorized-image
detection signal to the microcontroller, and an alert message is generated.
Limitation: the system is not reliable because of the microcontroller.
3. RFID system, user authentication and a biometric system. Working: a security
system implemented using a matrix keypad, an RFID tag and GSM technology.
Limitation: a malicious user can hack the system.
4. PIR sensor and email alert. Working: motion is detected by the PIR sensor, which
then sends an email to the admin warning of theft. Limitation: the theft detection
message is delivered only by email.
5. Face recognition, GSM and Zigbee. Working: the system can be accessed through
the Web, where the equipment can be monitored as well as controlled. Limitation:
face detection requires a more complex algorithm.
6. Dual-key safety lockers with a biometric system. Working: a dual key is used for
opening the locker, with special characters, and the biometric system admits only
the person authenticated with the special ID.
7. Email alerting and SMS. Working: if theft is detected, an SMS is sent to the
user remotely. Limitation: the system does not provide as much security as needed.
8. Fingerprint scanner for the biometric system. Working: the fingerprint, whose
identity is unique, is used for detecting a malicious user. Limitation: security
provided through a fingerprint alone is not as strong as top-level security.
9. PIR sensor, fingerprint and vibration sensor. Working: the PIR sensor and
fingerprint are used for security, while the vibration sensor detects pressure and
gives an alert through the alarm. Limitation: identification of the user is needed
for detection of a malicious user, but the system does not provide it.
10. PIR sensor, IoT and Web server. Working: the system can be controlled through a
mobile phone, switched on and off, and accessed remotely. Limitation: if hackers
hack the mobile system, they are able to get access; a malicious user can easily
take full control of the system.
11. Microcontroller-based automated system. Working: detects physical interference
with the system and sends a warning message immediately. Limitation: if the
microcontroller crashes, it is not able to detect interference.
b. Android smartphone app: The app is accessed from an Android smartphone and
provides access to the locker. The admin has to log in to the app; after login, he
can grant access to the locker, set the locker's timer and receive system
notifications in the app.
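For the ultrasonic part of the tool kit, the standard echo-time-to-distance conversion can be sketched as follows. This is an illustrative sketch, not code from the paper: the timing model assumes an HC-SR04-class module, the 343 m/s figure assumes air at roughly 20 °C, and the theft_attempt helper is a hypothetical name for the stored-distance comparison the paper describes.

```python
# Speed of sound (~343 m/s) expressed in cm per microsecond.
SPEED_OF_SOUND_CM_PER_US = 0.0343

def echo_to_distance_cm(echo_time_us: float) -> float:
    """The pulse travels to the target and back, so halve the round trip."""
    return (echo_time_us * SPEED_OF_SOUND_CM_PER_US) / 2

def theft_attempt(baseline_cm: float, reading_cm: float,
                  tolerance_cm: float = 2.0) -> bool:
    """Flag a forced-entry attempt when the stored distance record changes."""
    return abs(reading_cm - baseline_cm) > tolerance_cm

# A 2915 microsecond round trip corresponds to roughly 50 cm.
assert round(echo_to_distance_cm(2915), 0) == 50.0
assert theft_attempt(50.0, 30.0)        # object moved: raise an alert
assert not theft_attempt(50.0, 49.5)    # within tolerance: no alert
```

The tolerance parameter absorbs sensor jitter so that small measurement noise does not trigger false theft alerts.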
3.3 Proposed Model
See Fig. 3.
4 Results from Observation

The system is more secure because access to open the locker must be obtained from
the admin. If access is granted, the timer begins; the person then has to complete
the process of verifying the password and the biometric, and use of the locker
needs to be completed before the timer ends. The locker has to be closed;
otherwise, a theft alert is sent to the admin.
Fig. 1 Flow diagram of case 1
Fig. 2 Flow diagram of case 2
Fig. 3 IOT-based smart locker system using Tinkercad simulation tool

5 Conclusion

During the building of this system, many challenges were faced while gathering
feedback from bank employees regarding the security of the lockers. After gathering
the feedback, building the security system itself was a challenge. The system has an
established process which will help to control the accessibility of user usage. It
will also store the information of the users, with their location and usage levels,
across the lockers. The system will assure the owner of the locker's security
against a robber. It can be used in any place where security is paramount, and its
process execution can be improved even further by using various techniques of
authentication, including biometric verification of the users. This paper proposes a
system which can be used to check the authenticity of logins and access to the
system.
References

1. Run, C.J., Reza, M., Ning, Y.: Improving home automation security: integrating
device fingerprinting into SmartHome. IEEE Access. https://doi.org/10.1109/ACCESS.2016.2606478
2. Neeraj, K., Amit, V.: Development of an intelligent system for bank security. In: 2014 5th
International Conference-Confluence The Next Generation Information Technology Summit (2014)
3. Ashutosh, G., Medhi, P., Pandey, S., Kumar, P., Kumar, S., Singh, H.P.: An efficient multistage
security system for user authentication, pp. 3194–3197 (2016). https://doi.org/10.1109/ICE
4. Tanwar, S., Patel, P., Patel, K., Tyagi, S., Kumar, N., Obaidat, M.S.: An advanced internet of
things based security alert system for smart home
5. Mrutyunjaya, S., Chiranjiv, N., Abhijeet, K.S., Biswajeet, P.: Web-based online embedded
door access control and home security
6. Srivatsan, S.: Authenticated secure bio-metric based access to the bank safety lockers. In:
ICICES2014, S.A. Engineering College, Chennai, Tamil Nadu, India. ISBN 978-1-4799-3834-6/14
7. Balasubramanian, K., Cellatoglu, A.: Analysis of remote control techniques employed in home
automation and security systems
8. Salil, P., Sharath, P., Anil, K.J.: Biometric recognition: security and privacy concerns
9. Tejesvi, S.V., Sravani, P., Mythili, M.L., Jayanthi, K., Nagesh Kumar, P., Balavani, K.:
Intellectual bank locker security system. Int. J. Eng. Res. Appl. 6(2, Part 2), 31–34 (2016)
10. Safa, H., Sakthi Priyanka, N., Vikkashini Gokul Priya, S., Vishnupriya, S., Boobalan, T.: IOT
based theft preemption and security system. https://doi.org/10.15680/IJIRSET.2016.0503229
11. Rahul, B., Kewal, T., Hiren, K., Sridhar, I.: 3 tier bank vault security. In: 2018
International Conference on Smart City and Emerging Technology (ICSCET) (2018)
Brief Analysis on Human Activity Recognition
Kaif Jamil, Deependra Rastogi, Prashant Johri, and Munish Sabarwal
1 Introduction
Human activity recognition is emerging as a prominent concept for understanding and
developing human interaction with technology, as well as computer vision, in the broad
field of computer science research. It forms the core of scientific endeavours such as
security assurance, health assistance and human interaction with technology.
However, this young field is beset with problems such as sensor placement, sensor
motion, installing video cameras in monitored areas, cluttered backgrounds and the
heterogeneity in the ways we perform the actions and activities involving movements and unique
gestures (citation). To tackle the aforementioned challenges, a more
structured approach is to collect and process data from sensors
worn on a person's body or built into a smartphone
to track the person's gestures and movements. A tri-axial accelerometer, which is
a kind of sensor built into smartphones to track the user's movement, is used for this purpose.
K. Jamil (B) · P. Johri · M. Sabarwal
Department of Computer Science and Engineering, Galgotias University, Greater Noida 226001,
e-mail: kaifjamil01@gmail.com
P. Johri
e-mail: johri.prashant@gmail.com
M. Sabarwal
e-mail: mscheckmail@yahoo.com
D. Rastogi
School of Computing Science and Engineering, Galgotias University, Greater Noida, India
e-mail: deependra.rastogi@galgotiasuniversity.edu.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
1.1 Human Activity Recognition (HAR)
HAR is the process of 'classifying sequences of accelerometer data recorded by
smartphones into known, well-defined movements'. Put another way, HAR is the process
of using sensors to understand the gestures or motions carried out by humans and thereby
learn which human activities or actions are taking place. Our daily
activities can be rationalized and automated if they are identified by a HAR system;
for example, smart lights might respond to recognized hand movements. Importantly,
these systems can be supervised as well as unsupervised.
A supervised system cannot function without prior training on labelled data sets,
whereas an unsupervised HAR system can be configured with a set of rules during development.
A brief explanation of the various uses of HAR in different settings is
provided below.
A. Security System
HAR-based surveillance security systems were first installed at
airports and banks to prevent unlawful activities in public places.
This pattern of human activity prediction was introduced by Ryoo, whose results
confirmed that a HAR surveillance system is able to recognize multiple in-progress
human interactions at an early stage. A better system was proposed by
Lasecki et al., named Legion:AR, which provides robust, deployable activity
recognition alongside pre-existing systems, with real-time activity labels gathered from crowds of annotators.
B. Health Assistance and Treatment
In the field of medical care, HAR is used in residential areas, hospitals and
rehabilitation centres for multiple purposes, such as keeping an eye on the activities
of elderly people being monitored and treated in rehabilitation centres,
disease prevention and management of severe diseases. In rehabilitation
centres, a HAR system is particularly helpful for tracking the activities
of elderly people, monitoring the physical activities of children with (mainly motor)
disabilities, detecting falls, and monitoring patients with movement disorders
such as ASD, psycho-motor slowing or cardiac problems. Such monitoring will
definitely help in ensuring timely clinical intervention.
C. Human Interaction with Technology
This field applies HAR mainly in gaming and exergaming, such as
the Nintendo Wii, Kinect and other motion-based games, both for the general
population and for people with neurological disabilities. The system captures human body movements
and, on the basis of that data, drives the execution of the required tasks.
Anyone with a neurological disorder or lesion can perform basic movements
to interact physically with these games comfortably. This
gives surgeons more control over monitoring and makes it easier and handier
for the affected people to perform tasks that they were earlier incapable of.
2 Related Work
In the last decade, significant progress has been witnessed in HAR. Numerous
studies focus on different approaches to identifying human activity and its
remarkable effect in real-world contexts. These
methods can be classified into the following four categories.
A. Sensor-Based
Chen et al. gave a brief survey of the sensor-based work in HAR, which
arranges the pre-existing research along two main axes: (a) data-driven versus
knowledge-driven and (b) vision-based versus sensor-based. Data-centric
activity recognition techniques are the main focus of this survey.
A survey by Wang et al. brought out how sensors can be used by different deep
learning approaches for HAR. This work categorizes the HAR literature on the
basis of application area, deep model and sensor modality. The main focus of
this survey was to show how deep models can be used to process the
information acquired from sensors.
B. Wearable Device-Based
Lara and Labrador sketched the work performed in HAR using wearable devices
that have sensors. Their survey presents a brief overview of a number of
system-design issues, such as attribute and sensor selection, data-collection
protocols, energy consumption, processing methodologies and recognition performance.
Cornacchia et al. provided a brief survey of the existing research describing
interaction activities that involve limited movement as well as those in which
the whole body is in motion. Their paper also gives a classification based on how the sensors are
placed on the human body and the type of sensor used.
C. Radio Frequency-Based
An analysis and presentation of the research in device-free, radio-based
activity recognition (DFAR) was given by Scholz et al. This survey divides the
existing work into DFAR and device-free radio-based localization (DFL).
Amendola et al. proposed a brief study summarizing the essentials of using
RFID technology for medical applications in the IoT. This also covered RFID
tags used as passive environmental sensors (e.g. temperature sensors) and
body-centric tags.
The existing work falls into four major groups: (a) Wi-Fi-based, (b)
RFID-based, (c) ZigBee radio-based and (d) radar-based (e.g. microwave). The
authors compare all these techniques using metrics such as coverage, activity
types, accuracy and deployment cost.
D. Vision-Based
Vrigkas et al. presented a survey of the research using this approach and, on
that basis, classified the literature into two categories: unimodal and
multimodal methodologies.
Another detailed overview of the research done in the action recognition field
was given by Herath et al., categorized into two major groups:
representation-based solutions and DNN-based solutions.
3 Deep Neural Network (DNN)
Deep learning, realized through DNNs, is a sub-field of machine learning. A DNN consists of
several levels of nonlinear operations with many hidden layers, known as neural
nets. The main goal is to learn feature hierarchies, where
lower-level features help establish features at higher levels of the hierarchy.
A deep learning model is created by employing such a network; two such
architectures are described below:
A. Convolutional Neural Network (CNN)
A CNN, a feed-forward type of artificial neural network, is largely used for
recognizing and processing images and deriving important information from them.
It performs mainly generative and descriptive tasks using deep learning and is
often used in computer vision involving video and image recognition.
Convolution and pooling are the two major operations performed by a CNN.
Here they are applied along the time dimension of the sensor signals. We use
a 1D CNN, in which the kernel slides in a single direction, since HAR
classifies time-series data.
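As a sketch of the two operations named above, a minimal NumPy version of valid 1D convolution and non-overlapping max pooling over a toy one-axis signal might look as follows; the signal and kernel here are illustrative, not the trained network's:

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid 1D convolution (cross-correlation) along the time axis."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the time axis."""
    n = len(x) // size
    return np.array([x[i * size:(i + 1) * size].max() for i in range(n)])

# Toy accelerometer trace (one axis); the kernel acts as a learned feature detector.
signal = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
kernel = np.array([1.0, 0.0, -1.0])   # crude slope/edge detector
feat = max_pool1d(conv1d(signal, kernel))
```

In a real 1D CNN the kernel weights are learned, and each layer applies many kernels per input channel, but the sliding dot product and the pooled down-sampling are exactly these operations.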
B. Recurrent Neural Network (RNN)
Unlike other established neural networks, where each input is processed
independently of previous results, in an RNN the results of the preceding steps
are fed in as data for the next step's analysis. In this paper, we use an
LSTM network. The LSTM design was introduced because of the error-flow problems
in existing RNN models, whose analysis showed that long time lags were not
accessible to learning. An LSTM consists of layers of recurrently
connected blocks known as memory blocks.
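A single step of an LSTM memory block can be sketched in NumPy as follows; the weight shapes, gate ordering and toy dimensions are illustrative assumptions, not the paper's actual network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of an LSTM memory block.
    W: (4n, d) input weights, U: (4n, n) recurrent weights, b: (4n,) bias,
    stacked as [input gate, forget gate, cell candidate, output gate]."""
    z = W @ x + U @ h + b
    n = len(h)
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    g = np.tanh(z[2 * n:3 * n])  # candidate cell state
    o = sigmoid(z[3 * n:4 * n])  # output gate
    c_new = f * c + i * g        # gated memory-cell update
    h_new = o * np.tanh(c_new)   # hidden state fed to the next step
    return h_new, c_new

rng = np.random.default_rng(0)
d, n = 3, 4                      # 3 accelerometer axes, 4 hidden units
W = rng.standard_normal((4 * n, d)) * 0.1
U = rng.standard_normal((4 * n, n)) * 0.1
b = np.zeros(4 * n)
h = c = np.zeros(n)
for x in rng.standard_normal((5, d)):   # a 5-step toy sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

The forget gate `f` is what lets the cell state carry information across long time lags, which is the error-flow problem the LSTM design was created to solve.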
4 System Design
The four phases of human activity recognition are shown in Fig. 1. These are:
(a) sensor selection and deployment, (b) collecting data from the sensor, (c) data
pre-processing and feature selection and (d) developing a classifier to recognize activities.
Fig. 1 Processes in human activity recognition
A. Data Collection
Data are collected using the tri-axial accelerometer built into a smartphone.
The test users carried the smartphones in their pockets while performing the
activities, with the accelerometer sampled at a rate of 20 Hz. The
accelerometer maps these data onto the three axes (X, Y and Z), which capture
the movement of the test user in different directions (horizontal, upwards,
downwards, etc.). Two graphs are shown below (Figs. 2 and 3):
B. Data Pre-processing
Pre-processing transforms the captured data into a format that the machine
accepts and can feed to the algorithm. The data, stored as text files, are
read in and each accelerometer component is normalized. The processed data are
then arranged into time-sliced windows and stored in a data frame. Because the
deep neural network (DNN) can only work with numerical values, we also add an
encoded label for each activity.
Fig. 2 Data visualization of the data collected by accelerometer
Fig. 3 Mapping of frequency of each activity
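The pre-processing steps above (per-axis normalization, time-slicing into windows, and label encoding) can be sketched as follows; the window size, step and activity labels are illustrative assumptions:

```python
import numpy as np

ACTIVITIES = ["Walking", "Jogging", "Standing"]   # illustrative label set

def normalize(col):
    """Scale one accelerometer component to zero mean, unit variance."""
    return (col - col.mean()) / col.std()

def windows(data, size, step):
    """Time-slice a (samples, 3) array into overlapping windows."""
    return np.stack([data[i:i + size] for i in range(0, len(data) - size + 1, step)])

def one_hot(label):
    """Encode an activity label as a numeric vector for the DNN."""
    v = np.zeros(len(ACTIVITIES))
    v[ACTIVITIES.index(label)] = 1.0
    return v

raw = np.random.default_rng(1).normal(size=(200, 3))   # stand-in for the text-file data
norm = np.apply_along_axis(normalize, 0, raw)          # normalize each axis separately
segs = windows(norm, size=80, step=40)                 # 80 samples = 4 s at 20 Hz
label = one_hot("Walking")
```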
The graphs below show a few representative accelerometer records for selected activities (Figs. 4, 5 and 6):
Fig. 4 Standing
Fig. 5 Walking
Fig. 6 Jogging
C. Training and Test Set
The idea is to train the neural network so that it learns from the training
data and can predict the activity of an unseen data set that is totally new to
it. To utilize the data set, we split it into two parts, training and testing,
in an 80/20 ratio. Once this is done, we normalize the data again so that it
can be fed into the neural network.
Fig. 7 LSTM architecture
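The shuffle-and-split step described above can be sketched as follows (toy data; a real pipeline would split the windowed accelerometer records and their labels):

```python
import numpy as np

def train_test_split(X, y, test_frac=0.2, seed=42):
    """Shuffle and split features/labels into training and testing parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(len(X) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], X[te], y[tr], y[te]

X = np.arange(100, dtype=float).reshape(50, 2)   # 50 toy records, 2 features each
y = np.arange(50)
X_tr, X_te, y_tr, y_te = train_test_split(X, y)  # 80/20 split
```

Shuffling before the split matters here because consecutive accelerometer windows come from the same activity, and an unshuffled split would leave some activities out of the test set.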
D. Reshaping the Data for the ML Model
All the collected data are reshaped so that they can be fed to the neural network.
The dimensions are as follows:
I. The number of time periods per record.
II. The number of sensors used (three here).
III. The number of nodes in the output layer.
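A sketch of this reshaping step, assuming 80 time periods per record, three sensor axes and six activity classes (illustrative values, not necessarily the paper's exact ones):

```python
import numpy as np

TIME_PERIODS = 80   # samples per record (dimension I)
N_SENSORS = 3       # accelerometer axes (dimension II)
N_CLASSES = 6       # output-layer nodes, one per activity (dimension III)

flat = np.zeros((10, TIME_PERIODS * N_SENSORS))   # 10 flattened records
X = flat.reshape(-1, TIME_PERIODS, N_SENSORS)     # shape the network expects
labels = np.random.default_rng(2).integers(0, N_CLASSES, size=10)
y = np.eye(N_CLASSES)[labels]                     # one-hot targets for the output layer
```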
E. Training and Building the Model
Our approach is to train two models, a CNN and an LSTM, on the same data
sets. The first (CNN) approach entails a convolution layer, which is then
followed by a max-pooling layer and another convolution layer. This yields
a well-established feature layer, which is then connected to a 'Softmax' output layer
(Fig. 7).
5 Result
Now we proceed to train both models with the training data that was
prepared earlier. The hyperparameters used are a batch size of 300
records and 50 training epochs. We use the 80/20 split mentioned
earlier to separate the training and validation data.
Next, we plot the learning curve for each model. Figure 8 corresponds to the
CNN model, which faced issues in the testing phase: its test loss begins to
rise after 21 epochs, even though its test accuracy remains consistent up to
50 epochs. It is noteworthy that for an accurate model the test loss curve
should trend downward, which is not the case for the CNN model.
Fig. 8 Learning curve
Looking at the LSTM model in Fig. 9, we can easily see that its test loss
curve trends downward as the epochs increase, so its learning capability is
quite good; its accuracy curve also rises at the start and then settles onto a
consistent course.
Comparing both models on parameters such as consistency and learning curves,
it is evident that the LSTM model is more accurate than the CNN model.
Refer to the confusion matrices to check the predictions of the two models
(CNN and LSTM).
Fig. 9 Learning curve
The CNN model's prediction accuracy, represented in Fig. 10, turns out to be
87%. This model struggles to identify activities such as standing and upstairs.
Fig. 10 Confusion matrix
Referring to the confusion matrix of the LSTM model in Fig. 11, it is accurate
in predicting activities such as walking but struggled to identify a few
activities such as upstairs and standing. From the diagonal of its matrix, the
accuracy turns out to be 92%. Both models (CNN and LSTM) struggled somewhat
with the same classes of activities.
Fig. 11 Confusion matrix
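Reading accuracy off a confusion matrix's diagonal, as done for Figs. 10 and 11, can be sketched as follows; the matrix values here are hypothetical, not those of the actual models:

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, cols: predicted class);
# the real matrices in Figs. 10 and 11 cover all six activities.
cm = np.array([[50,  2,  3],
               [ 4, 40,  6],
               [ 1,  5, 44]])

accuracy = np.trace(cm) / cm.sum()        # diagonal = correctly classified samples
per_class = np.diag(cm) / cm.sum(axis=1)  # recall per activity
```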
6 Conclusion
In conclusion, the LSTM model proved more accurate than the CNN model (LSTM
accuracy: 92%; CNN accuracy: 87%). In the real world, every individual's
movements and gestures carry unique data that can be used to train and feed
these models, helping to develop efficient systems that contribute to society.
Our aim was to study these models and compare them on a common data set; the
LSTM came out more accurate, but with more vigorous tuning both could yield
more accurate results and perform better. In the future, we plan to create a
hybrid model combining two or more DNNs to increase the level of abstraction,
which would help in better understanding and functioning of such systems.
Altogether, this can extend the boundaries of what human effort alone can
tackle in situations that were not previously addressed to their utmost
potential.
1. Huynh, T.G.: Human Activity Recognition with Wearable Sensors. Technische Universität
Darmstadt (2008)
2. Lawrence, C., Sax, K. F.N., Qiao, M.: Interactive games to improve quality of life for the
elderly: Towards integration into a WSN monitoring system. In: 2010 Second International
Conference on 112.
3. Chen, L., Hoey, J., Nugent, C.D., Cook, D.J., Yu, Z.: Sensor-based activity recognition. IEEE
Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(6), 790–808 (2012)
4. Lasecki, W.S., Song, Y.C., Kautz, H., Bigham, J.P.: Real-time crowd labeling for deployable
activity recognition. In: Proceedings of the 2013 conference on 1203
5. Wang, J., Chen, Y., Hao, S., Peng, X., Hu, L.: Deep learning for sensor-based activity
recognition: a survey (2017). arXiv preprint arXiv:1707.03502
6. Cornacchia, M., Ozcan, K., Zheng, Y., Velipasalar, S.: A survey on activity detection and
classification using wearable sensors. IEEE Sens. J. 17(2), 386–403 (2017)
7. Amendola, S., Lodato, R., Manzari, S., Occhiuzzi, C., Marrocco, G.: Rfid technology for
iot-based personal healthcare in smart spaces. IEEE Internet Things J. 1(2), 144–152 (2014)
8. Chen, L., Nugent, C.D., Wang, H.: A knowledge-driven approach to activity recognition in
smart homes (2012)
9. Jalal, A., Uddin, Z., Kim, J.T., Kim, T.: Recognition of human home activities via depth
silhouettes and R transformation for smart homes, pp. 467–475 (2011)
10. Han, J., Shao, L., Xu, D., Shotton, J.: Enhanced computer vision with Microsoft Kinect sensor:
a review. IEEE Trans. Cybern. 43(5), 1318–1334 (2013)
11. Ryoo, M.S.: Human activity prediction: early recognition of ongoing activities from streaming
videos. In: 2011 Iccv, pp. 1036–1043 (2011)
12. Lange, B., Chang, C.-Y., Suma, E., Newman, B., Rizzo, A.S., Bolas, M.: Development and
evaluation of low cost game-based balance rehabilitation tool using the Microsoft Kinect sensor.
In: Conference Proceedings IEEE Engineering in Medicine and Biology Society, vol. 2011,
pp. 1831–1834 (2011)
13. Yoshimitsu, K., Muragaki, Y., Maruyama, T., Yamato, M., Iseki, H.: Development and
initial clinical testing of ‘OPECT’: an innovative device for fully intangible control of the
intraoperative image-displaying monitor by the surgeon. Neurosurgery 10 (2014)
14. Banos, O., Damas, M., Pomares, H., Prieto, A., Rojas, I.: Daily living activity recognition based
on statistical feature quality group selection. Expert Syst. Appl. 39(9), 8013–8021 (2012)
15. Herath, S., Harandi, M., Porikli, F.: Going deeper into action recognition: a survey. Image Vis.
Comput. 60, 4–21 (2017)
16. Scholz, M., Sigg, S., Schmidtke, H.R., Beigl, M.: Challenges for device-free radio-based
activity recognition. In: Workshop on Context Systems, Design, Evaluation and Optimisation,
Conference Proceedings (2011)
17. Wang, S., Zhou, G.: A review on radio based activity recognition. Dig. Commun. Netw. 1(1),
20–29 (2015)
18. Vrigkas, M., Nikou, C., Kakadiaris, I.A.: A review of human activity recognition methods.
Front. Robot. AI 2, 28 (2015)
19. Zhang, Z.: Microsoft kinect sensor and its effect. IEEE Multimed. 19(2), 4–10 (2012)
LoRa-Based Smart Garbage Alert
Monitoring System Using ATMEGA 328,
2560, 128
Anzar Ahmad and Shashi Shekhar
1 Introduction
Overflowing trash bins are extremely common in India, and this affects our
society and our surroundings. It damages the environment, leading to pollution
along with health problems for humans and other creatures as well.
We propose an IoT-based, cost-effective garbage monitoring system that will
monitor the bins and raise an alert when the garbage level crosses the threshold
level of the bin. This is accomplished with the help of sensors, a microcontroller
and an ESP8266. It also lets users access the system from a significant
distance over WiFi, alongside alert text messages, Facebook alerts and email
alerts. This reduces human effort and also reduces fuel
consumption. We are living in a period when tasks and systems are joining with
the power of the IoT to work more efficiently and to execute
jobs quickly. With all this power readily available, this is what we have
designed. The Internet of Things (IoT) will be able to join, simply and
seamlessly, a tremendous number of different systems, while offering data to a
large number of people to use and build upon. Building a general architecture for the IoT
is therefore a formidable undertaking, mainly because of the extremely large
variety of devices, link-layer technologies and services that may be
involved in such a system. One of the main concerns for our environment has been solid
waste management, which impacts the health and state of our society.
The detection, monitoring and management of waste is one of the basic issues of
A. Ahmad (B) · S. Shekhar
Graphic Era Deemed To Be University, Dehradun, Uttrakhand, India
e-mail: anz.hmd@gmail.com
S. Shekhar
e-mail: shashishekhar618@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
the present time. The traditional method of manually monitoring the waste in
bins is a cumbersome process that demands considerable human effort, time and
cost, all of which can easily be avoided with present-day technology. This
is our answer: a procedure in which waste management is automated. This
is our IoT garbage monitoring system, an innovative way to help keep
urban areas clean and healthy. Read on to see how you could play a part in
keeping your area, home or surroundings clean, taking us a step closer
to a better way of living:
2 Literature Review
I. Chaware et al. [1] present a garbage monitoring framework that monitors the
trash bins and reports, via a web page, the level of garbage collected in them.
Figure 1 shows the system architecture, wherein the framework uses
ultrasonic sensors placed over the bins to detect the garbage level and
compare it with the bins' depth. The proposed framework uses an
Arduino-family microcontroller (the LPC2131/32/34/36/38 microcontrollers are
based on a 16/32-bit ARM7TDMI-S CPU with real-time emulation), an LCD screen,
a WiFi modem (the ESP8266 supports APSD for VoIP applications and Bluetooth
coexistence interfaces) for sending information, a buzzer, GSM (used to send a
message to the garbage depot if the garbage can exceeds the set threshold
level) and an ultrasonic sensor (which emits a high-frequency sound pulse and
then times how long the echo of the sound takes to reflect back).
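The echo-timing idea in [1] can be sketched as follows; the bin depth and echo time are hypothetical values, not measurements from the cited system:

```python
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C

def echo_to_level(echo_time_s, bin_depth_m):
    """Convert an ultrasonic echo round-trip time to a fill level in [0, 1].
    The pulse travels down to the garbage surface and back, so halve the distance."""
    distance = SPEED_OF_SOUND * echo_time_s / 2.0
    fill = max(0.0, min(1.0, 1.0 - distance / bin_depth_m))
    return fill

# Echo returns after 2.9 ms in a 1 m deep bin: surface is ~0.5 m down, so ~50% full.
level = echo_to_level(0.0029, 1.0)
```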
II. RFID technology is used to collect information regarding the trash
compartments. An RFID tag is detected within the reader's frequency range;
when any tag comes into the range of the RFID reader, the system automatically
reads the data from the reader, then filters the collected data and formats
it into a specifically structured SMS. After that, the information is sent
to a central server, which forwards the data to the web server as well as to
the authorized person's mobile phone.
Fig. 1 Overflowing garbage bins
III. This paper proposed the following technique. The level of garbage in the
bin is detected using an ultrasonic sensor and communicated to the control
room using a GSM system. Four IR sensors are used to distinguish the level of
the trash bin. When the bin is full, the output of the fourth IR sensor goes
active-low, and this output is given to the microcontroller to send a message
to the control room through GSM. In this paper, ZigBee, GSM and an ARM7
controller are used to monitor the trash bin level. When the trash bin is
full, this garbage-level message is sent to the ARM7 controller. The ARM7 then
sends an SMS through GSM to the authority indicating which bin is overflowing
and requires cleaning up.
IV. The paper proposed a strategy in which ultrasonic sensors are used to sense
the level of the bin and a load cell is used as a secondary sensor. If the
level sensors fail, the load cell can be used as a reference. When the bin is
full, GSM sends a message to the server room. This message contains the
position of the bin, which is given by a GPS module. The microcontroller gets
the input from GSM and performs signal processing. The microcontroller
communicates with GSM by using
V. In this paper, the framework is designed to avoid overflow of the bin by
sending an alert. It uses an Arduino Uno R3 as the microcontroller for reading
data from the sensors. The technique mostly uses an RFID reader interfaced
with the microcontroller for the verification process. When an RFID tag
interacts with the RFID reader, the sensor checks the status of the bin and
sends it to the web server.
3 Existing System
In the current framework, garbage is collected by municipal workers on a
scheduled routine basis, for example weekly or 2–3 times a month. As we
commonly see, the trash bins placed in open areas of cities overflow because
of the daily increase in waste. Because of this, the garbage rots and produces
a bad stench, which tends to cause air pollution and spread diseases that can
harm human health. Cleanliness is therefore a major issue. In addition,
finding the route to a garbage bin is a difficult task, especially for a new
driver. To avoid such conditions, we have designed an improved framework.
4 Proposed System
In our proposed framework, an IoT-based smart garbage monitoring system built
around the ESP8266, there is real-time monitoring with an alerting facility.
Earlier system designs were not cost-efficient; they were also bulky in size,
as they used a Raspberry Pi module, a GSM module and, in some cases, a GPS
antenna, and so on. In our proposed framework, we have removed all of that
hardware to reduce the size of the circuitry, which also reduces the cost of
the system, and we use a LoRa sensor to make our hardware more reliable. We
also use a solar panel here for the power supply, with battery backup for
cloudy conditions.
5 Working
The IoT-based smart garbage monitoring system using the ESP8266 is
straightforward and operates in real time. Basically, the procedure starts at
the trash bin. IR sensors are fixed at each level of the trash bin; here we
use five levels of the bin for our project demonstration. We give a unique ID
to every trash bin, and we choose a threshold level for alerting purposes. The
garbage level is sensed by the IR sensors. When the garbage in the bin crosses
the threshold level, an alert text message is delivered to the concerned
individual or to the district office. This message contains the trash bin ID
and is sent via the WiFi module. The WiFi module helps determine whether a bin
is full or empty, and we also send the data to the nearest municipal branch.
This is useful particularly for new drivers of the municipal vehicles (Fig. 2).
The block diagram shows the working of the framework. Essentially there are
five main parts to the whole system: the power supply part, the sensing part,
the processing part, uploading to the server/cloud, and the alerting part. The
IR sensors detect the garbage level and accordingly pass the signals to the
ATMEGA328 microcontroller. Likewise, the GPS coordinates of the trash bin are
given to the microcontroller.
Fig. 2 Working block diagram of garbage monitoring system
The ATMEGA328 processes the received signal and passes it on to the ESP8266,
a WiFi module that also serves as the transmitter in our system. The ESP8266
plays a significant role in reducing the hardware of the system: it replaces
the Raspberry Pi module. As our system is IoT based, the alerting happens with
the help of the IoT; because of this, the GPS module is removed. The alert
message sent over the WiFi module needs no GPS antenna, since we can hard-code
the coordinates of each trash bin in the software, as the position of the bin
is fixed. Thus, when the garbage crosses the threshold level, the alert
message is sent continuously until the garbage in the bin is removed by the
concerned individual. In this way, our whole system works. For the power
supply, we use a solar panel along with battery backup (Fig. 3).
Fig. 3 Garbage monitoring system
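The threshold-alert logic described above can be sketched as follows; the bin IDs, coordinates, level count and message format are illustrative, not the actual firmware's:

```python
# A minimal sketch of the alerting logic; bin IDs, coordinates and the
# message format are hypothetical, not taken from the real system.
THRESHOLD_LEVEL = 4          # alert once the 4th of 5 IR levels is covered

BINS = {
    "BIN-01": {"coords": (30.3165, 78.0322), "level": 5},
    "BIN-02": {"coords": (30.3255, 78.0436), "level": 2},
}

def pending_alerts(bins, threshold=THRESHOLD_LEVEL):
    """Return one alert message per bin whose IR-sensed level crosses the threshold."""
    msgs = []
    for bin_id, info in bins.items():
        if info["level"] >= threshold:
            lat, lon = info["coords"]
            msgs.append(f"ALERT {bin_id}: garbage at level {info['level']}/5, "
                        f"location {lat},{lon}")
    return msgs

alerts = pending_alerts(BINS)
```

Because each bin's position is fixed, the coordinates live in this table rather than coming from a GPS module, which is exactly the hardware saving described above.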
6 Cloud Database
Cloud platforms allow clients to purchase virtual machine instances for a
limited time, and one can run a database on such virtual machines. Clients can
either upload their own machine image with a database installed on it or use
ready-made machine images that already include an optimized installation of a
database. With a database-as-a-service model, application owners do not need
to install and maintain the database themselves. Instead, the database service
provider takes responsibility for installing and maintaining the database, and
application owners are charged according to their usage of the service.
• Most database services offer web consoles, which the end user can use to
provision and configure database instances.
• Database services comprise a database-manager component, which controls the
underlying database instances using a service API. The service API is exposed
to the end user and permits users to perform maintenance and scaling
procedures on their database instances.
Table 1 Simulated output
Time stamp
Trash (high/low)
2020-03-12, 19:09:43
2020-03-12, 20:09:44
2020-04-12, 05:09:46
2020-05-12, 12:09:52
• The underlying software stack typically includes the operating system, the
database and third-party software used to manage the database. The service
provider is responsible for installing, patching and updating the underlying
software stack and ensuring the overall health and performance of the database.
• Scalability features differ between vendors: some offer auto-scaling, while
others let the client scale up using an API but do not scale automatically.
• There is usually a commitment to a specific degree of high availability
(e.g., 99.9% or 99.99%). This is achieved by replicating data and failing
instances over to other database instances.
7 Simulated Output of the System on the Cloud
See Table 1.
8 Components Required
Hardware Requirements
• ATMEGA 328, 128, 2560
• ESP8266 WiFi Module
• HC-SR04 Ultrasonic/IR Sensor
• Crystal Oscillator
• Cables and Connectors
• PCB and Breadboards
• Push Buttons
• IC Sockets.
Software Requirements
• Arduino Compiler
• IOT Gecko.
9 IR Sensor
An infrared (IR) sensor is an electronic device that measures and detects
infrared radiation in its surroundings. Infrared radiation was accidentally
discovered by an astronomer named William Herschel in 1800. While measuring
the temperature of each colour of light (separated by a prism), he noticed
that the temperature just beyond the red light was highest. IR is invisible to
the naked eye, as its wavelength is longer than that of visible light (though
it is still on the same electromagnetic spectrum). Anything that emits heat
(everything that has a temperature above about five degrees Kelvin) gives off
infrared radiation. There are two types of infrared sensor: active and
passive. Active infrared sensors both emit and detect infrared radiation. They
have two parts: a light-emitting diode (LED) and a receiver. When an object
approaches the sensor, the infrared light from the LED reflects off the object
and is detected by the receiver. Active IR sensors act as proximity sensors,
and they are commonly used in obstacle-detection systems (e.g., in robots). In
short, the IR sensor module consists mainly of an IR transmitter and receiver,
an op-amp, a variable resistor (trimmer pot) and an output LED. The IR LED
emits light in the infrared frequency range. IR light is invisible to us
because its wavelength (700 nm–1 mm) is much longer than that of the visible
light range (Fig. 4).
Fig. 4 IR sensor
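The module behavior just described (IR LED, receiver, op-amp acting as a comparator against a trimmer-pot setpoint) can be sketched in a few lines. The ADC readings and threshold below are hypothetical values chosen only to illustrate the comparator logic:

```python
def obstacle_detected(receiver_reading, threshold):
    # The op-amp compares the receiver signal against the trimmer-pot
    # setpoint; the output LED lights when the reflection is strong enough.
    return receiver_reading > threshold

# Hypothetical 10-bit ADC readings: strong reflection from a nearby object,
# weak reflection in open air.
print(obstacle_detected(760, 500))  # nearby object reflects strongly
print(obstacle_detected(120, 500))  # nothing within range
```

Turning the trimmer pot simply moves `threshold`, which sets the detection distance.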
A. Ahmad and S. Shekhar
10 LoRaWAN
LoRa is a method of transmitting radio signals that uses a chirped, multi-symbol format to encode information. It is a proprietary system created by the chip manufacturer Semtech; its LoRa IP is also licensed to other chip makers. Essentially, these chips are standard ISM-band radio chips that can use LoRa (or other modulation types such as FSK) to convert radio frequency to bits, with no need to write code to implement the radio system. LoRa is a lower-level physical-layer technology that can be used in a wide range of applications beyond wide-area networking. LoRaWAN is a point-to-multipoint networking protocol that uses Semtech's LoRa modulation scheme. It is not just about the radio waves; it is about how the radio waves communicate with LoRaWAN gateways to do things like encryption and identification. It also includes a cloud component, to which multiple gateways connect. LoRaWAN is rarely used for industrial (private-network) applications because of its limitations.
LoRaWAN has three device classes that operate simultaneously. Class A is purely asynchronous, which is what we call a pure ALOHA system. This means the end nodes do not wait for a particular time slot to address the gateway; they simply transmit whenever they need to and lie dormant until then. If you had a perfectly scheduled system over eight channels, you could fill every time slot with a message: when one node finishes its transmission, another starts immediately. Even with no gaps in communication, the theoretical maximum capacity of a pure ALOHA network is only about 18.4% of this maximum. This is largely due to collisions: if one node is transmitting and another wakes up and decides to transmit on the same frequency channel with the same radio settings, they will collide.
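The 18.4% figure quoted above is the classical pure-ALOHA result: with normalized offered load G, the throughput is S = G·e^(−2G), which peaks at G = 0.5 with a value of 1/(2e) ≈ 0.184. A quick numerical check:

```python
import math

def pure_aloha_throughput(G):
    """Pure ALOHA: fraction of time carrying successful frames at offered load G."""
    return G * math.exp(-2 * G)

peak = pure_aloha_throughput(0.5)       # the maximum occurs at G = 0.5
print(f"peak throughput = {peak:.4f}")  # about 0.184, i.e. 18.4%
print(f"1/(2e)          = {1 / (2 * math.e):.4f}")
```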
Class B allows messages to be sent down to battery-powered nodes. At regular intervals, the gateway transmits a beacon. (See the slots across the top of the diagram.) All LoRaWAN base stations transmit beacon messages at exactly the same time, as they are slaved to one pulse per second (1PPS). This means every GPS satellite in orbit transmits a message at the start of each second, allowing time to be synchronized around the world. All Class B nodes are assigned a time slot within the 128-second cycle and are told when to listen. You can, for example, tell a node to listen on every tenth time slot, and when this slot comes up, a downlink message can be transmitted (see the diagram above).
Class C allows nodes to listen continuously, and a downlink message can be sent at any time. This is used mainly for AC-powered applications, since it takes a great deal of energy to keep a node actively awake with the receiver running constantly.
Lora-Based Smart Garbage Alert Monitoring System …
11 Principle of Operation
This IoT garbage monitoring system is an innovative system that helps keep cities clean. The system monitors the garbage bins and reports the level of garbage collected in them via a web page. For this, the system uses ultrasonic sensors placed over the bins to detect the garbage level and compare it with the bins' depth. The system uses an AVR-family microcontroller, an LCD screen, a WiFi modem for sending data, and a buzzer, and is powered by a 12 V transformer. The LCD screen displays the status of the level of garbage collected in the bins, while a web page is built to show the status to the user monitoring it. The page gives a graphical view of the garbage bins and highlights the garbage collected in color to show the fill level. The system sounds the buzzer when the garbage level crosses the set limit. In this way, the system helps keep the city clean by reporting the garbage levels of the bins and giving a graphical image of the bins via the IoT Gecko web development platform (Fig. 5).
Fig. 5 Circuit diagram of the garbage monitoring system
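The fill-level logic described above can be sketched as follows. The bin depth, the 80% alert threshold, and the way the ultrasonic reading arrives are illustrative assumptions, not values taken from the paper's firmware:

```python
def fill_percentage(bin_depth_cm, distance_to_garbage_cm):
    """The lid-mounted ultrasonic sensor measures the distance down to the
    garbage surface; the fill level is that distance compared with the depth."""
    level = bin_depth_cm - distance_to_garbage_cm
    return max(0.0, min(100.0, 100.0 * level / bin_depth_cm))

ALERT_THRESHOLD = 80.0  # percent; hypothetical set limit for the buzzer

def check_bin(bin_depth_cm, distance_cm):
    pct = fill_percentage(bin_depth_cm, distance_cm)
    buzzer_on = pct >= ALERT_THRESHOLD
    return pct, buzzer_on

# A 100 cm deep bin with the garbage surface 15 cm below the sensor:
print(check_bin(100, 15))  # fill level shown on the LCD; buzzer sounds
```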
12 Conclusion
This paper introduced an IoT-based smart garbage monitoring system using the ESP8266 with a GPS link. It provides a more efficient solution to the waste-management problem than previous systems, helps reduce health-related issues, and sets a strong example for real-time garbage management. In the cited papers, we studied various technologies for the garbage collection and management process, such as LoRa (long range) and IoT. This smart garbage monitoring system design will be very beneficial to our societies and to economic development, as fuel, cost, and transport requirements are reduced. The system is efficient as it reduces human effort.
References
1. Int. J. Res. Sci. Eng. 3(2) (2017). e-ISSN: 2394-8299
2. Zanella, A., Bui, N., Castellani, A., Vangelista, L., Zorzi, M.: Internet of things for smart cities. IEEE Internet Things J. 1(1) (2014)
3. Mahajan, K., Chitode, J.S.: Waste bin monitoring system using integrated technologies. Int. J. Innov. Res. Sci. Eng. Technol. 3(7) (2014)
4. Int. J. Recent Innov. Trends Comput. Commun. 5(2) (2017). ISSN: 2321-8169
5. Bloor, R.: What is a cloud database? Retrieved 25 Nov 2012 from https://www.algebraixdata.com/wordpress/wp-content/uploads/2010/01/AlgebraixWP2011v06.pdf (2011)
6. Curino, C., Madden, S., et al.: Relational cloud: a database as a service for the cloud. Retrieved 24 Nov 2012 from https://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper33.pdf
7. Finley, K.: 7 cloud-based database services. Retrieved 23 Nov 2012 from https://readwrite.com/2011/01/12/7-cloud-based-database-service (2011)
8. Hacigumus, H., Iyer, B., Mehrotra, S.: Ensuring the integrity of encrypted databases in the database-as-a-service model. Retrieved 24 Nov 2012 from https://link.springer.com/chapter/10.1007%2F1-4020-8070-0_5?LI=true (2004)
9. Santos, A., Macedo, J., Costa, A., Nicolau, M.J.: Internet of things and smart objects for m-health monitoring and control. Procedia Technol. 16, 1351–1360 (2014); Kumar, N.S., Vuayalakshmi, B., Prarthana, R.J., Shankar, A.: IoT based smart garbage alert system using Arduino UNO. In: 2016 IEEE Region 10 Conference (TENCON), pp. 1028–1034. IEEE (2016)
10. Sedra, A.S., Smith, K.C.: Microelectronic Circuits, 5th edn. Oxford University Press, New York (2004)
11. Ma, Y.-W., Chen, J.-L.: Toward intelligent agriculture service platform with LoRa-based wireless sensor network. In: Proceedings of the 4th IEEE International Conference on Applied System Innovation (ICASI), Chiba, Japan, 13–17 Apr 2018, pp. 204–207
12. Pies, M., Hajovsky, R.: Monitoring environmental variables through intelligent lamps. In: Mobile and Wireless Technologies, pp. 148–156. Springer, Singapore (2018)
Pre-birth Prognostication of Education
and Learning of a Fetus While
in the Uterus of the Mother Using
Machine Learning
Harsh Nagesh Mehta and Jayshree Ghorpade Aher
1 Introduction
Education is the process of acquiring knowledge, values, skills, beliefs and moral habits. People must be provided with good-quality education to be able to keep up with this competitive world. Education is vital for any country, and governments apply different strategies for creating a progressive modern society by building human capability and reducing inter-generational disadvantage. Career education and guidance play an important role in a curriculum that supports students' interests, strengths and aspirations, thus helping students make informed decisions about their subject choices and pathways. Governments have already developed strategies to resolve these issues, but newer ways to resolve them are still being studied. In recent times, prenatal education based on pre-birth prognostication of the fetus in the uterus of the mother has gained attention. Recent research has shown that the foundation of the future health, learning and behavior of a baby is laid while in the uterus of the mother, through the child's experiences in the womb with its family, community and early learning environments.
The hierarchical development of the fetus's brain before birth is strongly influenced by the child's early experience through prenatal education. It has been observed that the grasping power of a fetus in the mother's uterus is superior to that of a born child. Researchers have identified the delicate stages at which maximum development of the human brain is possible and have shown that the most receptive stages occur during pregnancy, i.e., when the child is in the womb. This delicate, sensitive spell of 9 months can be an opportunity to boost the baby's development and their readiness for future school learning.
H. N. Mehta (B) · J. G. Aher
SCET, MITWPU, Kothrud, Pune, Maharashtra 411038, India
e-mail: harshnageshmehta@gmail.com
J. G. Aher
e-mail: jayshree.aher@mitwpu.edu.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine Learning Applications, Algorithms for Intelligent Systems,
All three aspects, i.e., health, values and creativity, go together in developing the child's mind. All the skills, values and qualities acquired from the mother during pregnancy can be used to anticipate which education stream the child is likely to be suited to in the future. This prediction can be achieved with the help of the decision tree classification algorithm of machine learning. From the collected information, a decision tree can easily produce the most likely outcome for a given input. Thus, a prediction of the possible future of a baby inside the womb can be made using the decision tree algorithm.
2 Why Invest in Early Brain Development?
2.1 Literature Survey
Humans are born to learn and are observed to evolve through ceaseless learning right from birth. Alongside this, in the current age, attention has shifted to education and learning even before birth, and extensive emphasis is now placed on experiments in prenatal education, providing pre-birth interventions for the fetus inside the uterus of the mother [1].
How a mother reacts and interacts with the baby during her pregnancy may have implications for the baby's development. The University of Cambridge [2] found that mothers who "connect" with their baby during pregnancy are more likely to interact in a positive way with their infant after it is born. A good foundation for the baby's development, future education and learning can be provided by creating an encouraging, caring environment filled with peaceful and warm interaction. Shelina Bhamani [3] has addressed the development of the baby's brain in the womb.
Scientists working in the field of child development predict that when a baby is in the uterus of the mother, the baby is able to hear and remember certain sounds and can easily recall them even after birth [4]. Among the sounds of the pregnant mother's beating heart, her breathing and the blood coursing through her veins, the baby is able to hear muffled noise from the external environment as well [5]. All the sense organs of the fetus begin to develop in the prenatal period; the human ear starts developing after 10 weeks of pregnancy. Using the sensitivity of Doppler ultrasound signals to the maternal voice, Tastan and Hardalac [6] showed that the fetus has a learning process through experience in the uterus. Studies have indicated that the grasping power of a fetus in the mother's uterus is superior to that of a born child. Also, as age rises, the development of the human brain gradually slows. Growth of the brain is greatest during the period from pre-birth to 5 years, with the maximum occurring in the fetus. Apart from the various movements of the mother, maternal age plays a significant role in the overall development of the fetus [7].
Fig. 1 Three aspects of prenatal education
2.2 The Three Key Aspects the Baby Learns
Prenatal development is the evolution of the human body before birth; it is also known as antenatal development. Prenatal education is the process in which the expectant parents undergo a series of activities in order to interact positively with the baby. For the baby, prenatal education is the teaching the baby receives from its family that helps it develop [8].
All three aspects, i.e., health, values and creativity, go together in developing the child's mind, as seen in Fig. 1. All the skills, values and qualities acquired from the mother during pregnancy can be used to anticipate which education stream the child is likely to be suited to in the future. Values are the way in which one distinguishes things, people, actions or situations; a mother who holds values transmits them to the child through her actions. Health is the overall well-being of a human, supported by activities such as maintaining a diet, yoga, spa, etc. Creativity is turning imaginative ideas into reality; language, art, music, mathematics and leadership are some of the creativity aspects that help the baby's development toward education and learning. For pre-birth prognostication of the education and learning of a fetus, we need to consider the creativity aspect of development, as it is directly involved in technical brain development.
2.3 How Early-Stage Career Guidance Can Shape the Future
Half of college drop-outs globally are due to financial conditions or academic disqualification. Academic disqualification occurs mainly when there is a lack of interest in the subject. Career education and guidance play an important role in a curriculum that supports students' interests, strengths and aspirations, thus helping students make informed decisions about their subject choices and pathways [9].
Predicting what the baby can actually do with all the experiences it has received can be used to predict the baby's future stream of education and learning. Different streams of education, i.e., science, commerce and arts, require different skill sets, which a baby can acquire from its mother. The creative aspect of prenatal education can be viewed as the set of basic skills required in the different fields of study. As the baby receives these creative aspects and skills from the mother, we can keep track of them and thus provide newer insight into the field of career guidance. This prediction can be achieved with the help of different machine learning prediction algorithms. A machine learning technique such as the decision tree is most suitable for our pre-birth prognostication of future education and learning.
3 How Future Learning Can Be Predicted
In order to provide career guidance for the baby, the creativity aspect needs to be taken into account. Creativity is turning imaginative ideas into reality (Table 1).
In order to build the various creativity aspects, tasks focused on each individual aspect must be performed. If we analyse the career guidance prediction strategy, it can be observed that the creativity aspects used in developing the fetus's brain are the key skills required to predict the optimal education stream for the fetus. As most countries around the world follow three different streams of education, namely science, commerce and arts, the career guidance strategy in this paper also revolves around these streams for prediction. Each field requires a person to have a different mindset and skills in order to opt for it. Persons satisfying those skills are likely to have a greater liking for the stream and are likely to succeed in it. The decision tree classification algorithm can prove useful in this scenario, as it is easy to interpret and provides high accuracy most of the time.
3.1 Applying Decision Tree to Predict Educational Stream
Decision tree is a tree-like structure wherein the internal nodes are the test attributes and the leaf nodes are the class labels. By investigating the various activities and tasks done by approximately 100 women during pregnancy, an overall dataset was collected. In this data, keeping in mind the future learning of the baby, all the creativity-aspect tasks and activities of the mother were noted down. After investigating the data, it was segregated for prediction purposes into attributes and a class label, as can be seen from Table 2.
Table 1 Skill sets demanded to succeed in the different fields: logical thinking, imaginative thinking, language, planning and problem solving, social interaction, stability of mind, interpretation of data, music, researching skills, evaluation skills
Table 2 Attributes and class label for decision tree prediction of the future education stream
Attributes: Age, Cognition, Logical Thinking, Planning and Problem Solving, Stability of Mind, Observation, Researching Skills, Persistence, Mathematics, Imaginative Thinking, Social Interaction, Interpretation of Data, Comprehension, Evaluation Skills, Self-Confidence, Language, Art, Music, Creativity, Discipline
Class label: Likely Stream (Science, Commerce or Arts)
To find the predicted future education stream in this paper, the decision tree algorithm was applied after splitting the data into training and test sets. The training data are passed to the DecisionTree function, which constructs the model using the .fit() function. In the model-usage phase, we apply the .predict() function to the test attribute data and compare the result with the test class labels to evaluate the model and obtain its accuracy.
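The split/fit/predict flow described above can be sketched with scikit-learn. The miniature dataset below is synthetic and merely stands in for two of the Table 2 attributes; it is not the data collected in the study:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the collected data: two Table 2 attributes
# (logical thinking, imaginative thinking, scored 1-10) and a stream label.
X = [[9, 2], [8, 3], [7, 2], [9, 1], [8, 2], [7, 3],   # science-leaning profiles
     [2, 9], [3, 8], [1, 9], [2, 8], [3, 9], [1, 8]]   # arts-leaning profiles
y = ["Science"] * 6 + ["Arts"] * 6

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)            # model-construction phase
y_pred = model.predict(X_test)         # model-usage phase
print("accuracy:", accuracy_score(y_test, y_pred))
```

With the full attribute set of Table 2 the same pipeline yields the stream prediction discussed below.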
From Fig. 2, we can easily interpret how a decision tree can be used to predict the future of the baby's education and learning by learning from the training data and evaluating on the test data, thereby making predictions on unseen, randomly inputted data. Through the decision tree model, an accuracy of 93.33% was obtained, thus helping us to know the field the baby is currently inclined toward by analyzing the creativity aspect during pregnancy.
3.2 Merits and Demerits of This Prediction
We have seen how the decision tree model works to predict the data. With the help of this model, we can identify some benefits:
• Predict the effect of various tasks and activities performed by the mother during pregnancy on the child's brain.
• Reduce student drop-outs from college caused by academic disqualification, by guiding the child's future through prenatal career guidance.
While this subject of study can be the future of career guidance, it comes with some demerits, some of them being:
• One of the major demerits is that we cannot exactly predict what the child will want to do after a few years, as their learning interests can change.
• Apart from this, many factors, such as higher maternal age, premature birth, and alcohol consumption or smoking by the mother, can affect the brain's development even if the various activities are performed by the mother, thus making it difficult to predict the future education stream.
Fig. 2 Results achieved from the decision tree to predict the future stream
4 Conclusion
Thus, by understanding and analyzing what a mother does during the 9 months of pregnancy, we can predict its effects on the child's brain, and this can help predict the future education and learning of the child.
References
1. Kleindorfer, S., Robertson, J.: Learning before birth. Australas. Sci. 34(9), 27–32 (2013)
2. University of Cambridge: Mother's attitude towards baby during pregnancy may have implications for child's development (2018)
3. Bhamani, S.: Educating before birth via talking to the baby in the womb. J. Educ. Educ. Dev. 4, 368. https://doi.org/10.22555/joeed.v4i2.1736
4. Alvarez-Buylla, A., et al.: Birth of projection neurons in the higher vocal center of the canary forebrain before, during, and after song learning. Proc. Natl. Acad. Sci. U.S.A. 85(22), 8722–8726 (1988). https://doi.org/10.1073/pnas.85.22.8722
5. Reissland, N., et al.: Do facial expressions develop before birth? PLoS One 6(8), e24081. https://
6. Tastan, A., Hardalac, N., Kavak, S.B., Hardalaç, F.: Detection of fetal reactions to maternal voice using Doppler ultrasound signals. In: 2018 International Conference on Artificial Intelligence and Data Processing, 2(5), 99–110 (2016)
7. Stanford Children's Health: "Risks of Pregnancy Over Age 30"
8. Svensson, J., et al.: Effective antenatal education: strategies recommended by expectant and new parents. J. Perinat. Educ. 17(4), 33–42 (2008). https://doi.org/10.1624/105812408X364152
9. Angra, S., Ahuja, S.: Machine learning and its applications: a review. In: 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), Chirala, pp. 57–60 (2017). https://doi.org/10.1109/ICBDACI.2017.8070809
Performance Analysis of Single-Stage PV
Connected Three-Phase Grid System
Under Steady State and Dynamic
V. Narasimhulu and K. Jithendra Gowd
1 Introduction
Photovoltaic systems have become an energy source for a wide range of applications. The applications can be standalone PV systems or grid-connected PV systems. A standalone PV system is used in isolated applications, whereas a grid-connected PV system injects its current directly into the grid. The advantage of the grid-connected system is the ability to sell excess energy. In response to global concerns regarding the production and delivery of electrical power, photovoltaic (PV) technologies are attractive for sustaining and improving living standards without environmental impact. Conventionally, two-stage PV grid-connected systems are used to convert dc to ac power. Two-stage PV systems require both a boost converter and an inverter for power conversion, which increases the cost and complexity of the system. To overcome this, a single-stage PV conversion system is used; the cost and complexity of the system are reduced by eliminating the boost converter. To extract maximum power from the PV system [1], a
robust controller is required to ensure maximum power point tracking (MPPT) [1–3]
and deliver it to the grid through the use of an inverter [4–6]. In a grid-connected
PV system, control objectives are met by using a pulse width modulation (PWM)
scheme based on two cascaded control loops [7]. The current loop is also responsible for maintaining power quality (PQ) and for current protection that has harmonic
compensation. Linear controllers are widely used to operate PV systems at MPP [8–
13]; however, most of these controllers do not account for the uncertainties in the
V. Narasimhulu (B)
EEE Department, RGM College of Engineering and Technology (Autonomous), Nandyal, Andhra
Pradesh, India
e-mail: narasimhapid@gmail.com
K. Jithendra Gowd
EEE Department, JNTUA CEA, Anantapuramu, Andhra Pradesh, India
PV system. The voltage dynamics of the dc link capacitor include non-linearities
due to the switching actions of the inverter. The inclusion of these nonlinearities
will improve the accuracy of the PV system model; however, the grid-connected PV
system will be partially rather than exactly linearized as shown in [14]. Although the
approaches presented in [15–17] ensure the MPP operation of the PV system, they
do not account for inherent uncertainties in the system as well as the dynamics of the
output LCL filter. In this paper, an M2C-based single-stage PV conversion system is proposed. The PV system and the M2C operating principle are presented in Sect. 2. The simulation results validating the proposed topology are presented in Sect. 3. Finally, the conclusions are drawn in Sect. 4.
2 Photovoltaic System
There are two stages of power conversion in the two-stage PV system but only one in the single-stage PV system. The block diagram of the single-stage PV conversion system is shown in Fig. 1. It consists of a PV panel, an inverter, an LCL filter, and PWM along with an MPPT technique. The three-phase voltage signal is sent to the PLL block to track the frequency under different operating conditions. The grid voltage and current signals, together with the generated PV voltage and current signals, are sent to the MPPT controller to track the maximum power and to generate the reference signal for the PWM controller. Here, sinusoidal PWM is used to generate the gate pulses for the inverter due to its simplicity.
2.1 PV System
The PV system is modeled using Eq. (1) [18, 19].
Fig. 1 Block diagram of single-stage PV system
i_pv = N_p I_L − N_p I_s [exp((v_pv/N_se + R_se i_pv/N_p)/α) − 1] − (N_p/R_p)(v_pv/N_se + R_se i_pv/N_p)    (1)
where i_pv and v_pv are the output current and voltage of the PV system, R_p and R_se are the parallel and series resistances of the PV panel, I_L and I_s are the light-generated current and the solar cell saturation current, respectively, and N_p and N_se are the numbers of parallel- and series-connected cells.
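Because i_pv appears on both sides of Eq. (1), the model is implicit and must be solved iteratively; a simple fixed-point iteration suffices for illustration. All parameter values below are assumed for the sketch, not the paper's array:

```python
import math

# Illustrative single-diode parameters (assumed values, not the paper's array):
N_p, N_se = 2, 72        # parallel strings and series-connected cells
I_L, I_s = 5.0, 1e-9     # light-generated and saturation currents (A)
R_se, R_p = 0.01, 50.0   # series and parallel resistances (ohm)
alpha = 0.033            # per-cell thermal-voltage term (V)

def rhs(i_pv, v_pv):
    """Right-hand side of Eq. (1); i_pv appears on both sides."""
    x = v_pv / N_se + R_se * i_pv / N_p
    return N_p * I_L - N_p * I_s * (math.exp(x / alpha) - 1) - (N_p / R_p) * x

def pv_current(v_pv, iters=60):
    """Solve Eq. (1) by fixed-point iteration from the short-circuit estimate."""
    i = N_p * I_L
    for _ in range(iters):
        i = rhs(i, v_pv)
    return i

print(f"i_pv at 36 V = {pv_current(36.0):.3f} A")
```

The iteration converges quickly here because the derivative of the right-hand side with respect to i_pv is small at this operating point.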
2.2 Power Electronic Converter
In this paper, the M2C is adopted to analyze the level of THD in the output voltage, owing to its low conduction losses and its simplicity in modeling. One possible structure of the M2C [20, 21] is shown in Fig. 2. For higher-power applications, i.e., commercial or industrial applications, a three-phase PV power conditioning system is preferable. Sub-modules SM1, SM3, and SM5 are connected on the upper (positive) side, while SM2, SM4, and SM6 are connected on the bottom (negative) side. The sub-modules are used in this application to convert voltage from DC to AC. The inverter selected for the power conditioning unit in this study is a three-level modular multilevel inverter. The M2C is controlled in voltage mode using the well-known sinusoidal pulse width modulation (SPWM) switching technique, generated as sine-triangle PWM. For simulation purposes, due to the high carrier frequency (5 kHz), a much higher sampling frequency must be chosen, which badly reduces the speed of execution. In sine-triangle PWM, to produce an output voltage of the desired magnitude, phase shift, and frequency, the desired signal is compared with a carrier of higher frequency
Fig. 2 PV grid connected using M2C
to generate appropriate switching signals. The output voltage of the VSI does not
have the shape of the desired signal, but switching harmonics can be filtered out by
the series LCL low pass filter, to retrieve the 50 Hz fundamental sine wave. The DQ
method is employed to extract the reference currents for PWM.
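The sine-triangle comparison can be sketched numerically for a single inverter leg. With a 50 Hz reference against a 5 kHz carrier, the gate signal's local duty cycle follows the reference, and its average over one fundamental cycle sits near 0.5 (an idealized per-phase sketch; the modulation index is an assumed value):

```python
import math

F_REF, F_CARRIER = 50.0, 5000.0   # fundamental and carrier frequencies (Hz)
M = 0.8                           # modulation index (assumed)

def triangle(t):
    """Unit triangular carrier in [-1, 1] at F_CARRIER."""
    phase = (t * F_CARRIER) % 1.0
    return 4 * phase - 1 if phase < 0.5 else 3 - 4 * phase

def gate(t):
    """The switch is on whenever the sine reference exceeds the carrier."""
    return 1 if M * math.sin(2 * math.pi * F_REF * t) > triangle(t) else 0

T = 1.0 / F_REF                   # one fundamental period
N = 100_000
avg_duty = sum(gate(k * T / N) for k in range(N)) / N
print(f"average duty over one fundamental cycle = {avg_duty:.3f}")
```

Low-pass filtering this gate waveform (the role of the LCL filter) recovers the 50 Hz reference shape.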
2.3 MPPT Method
For maximum power transfer, the load should be matched to the resistance of the PV panel at the MPP. Therefore, to operate the PV panels at the MPP, the system should be able to match the load automatically and also change the orientation of the PV panel to track the sun if possible. A control system that adjusts the voltage or current to achieve maximum power is needed; this is done using an MPPT algorithm. The incremental conductance method is implemented in this work to track the maximum power. It uses the fact that the derivative of the power with respect to the voltage is zero at the maximum power point. Phase-locked loops (PLL) are employed to track the angular frequency and phase of the three-phase voltages for synchronization.
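The incremental conductance rule follows from dP/dV = I + V·(dI/dV) = 0 at the MPP, i.e., ΔI/ΔV = −I/V there. A minimal hill-climbing sketch on a toy PV curve (the curve shape and step size are illustrative, not the paper's plant):

```python
def pv_current(v, i_sc=5.0, v_oc=40.0, m=8):
    """Toy PV I-V curve: near-constant current that collapses close to V_oc."""
    return i_sc * (1.0 - (v / v_oc) ** m)

def inccond_step(v, i, v_prev, i_prev, dv=0.1):
    """One incremental-conductance update: step V toward dP/dV = 0."""
    delta_v, delta_i = v - v_prev, i - i_prev
    if delta_v == 0:
        return v + dv
    if delta_i / delta_v > -i / v:     # dP/dV > 0: left of the MPP
        return v + dv
    if delta_i / delta_v < -i / v:     # dP/dV < 0: right of the MPP
        return v - dv
    return v                           # at the MPP

v_prev, v = 20.0, 20.1
for _ in range(300):
    v_prev, v = v, inccond_step(v, pv_current(v), v_prev, pv_current(v_prev))

v_mpp = 40.0 * (1.0 / 9.0) ** (1.0 / 8.0)   # analytic MPP of the toy curve
print(f"tracked V = {v:.2f} V, analytic MPP = {v_mpp:.2f} V")
```

The operating voltage climbs to the analytic MPP and then oscillates within one step of it, which is the expected steady-state behavior of fixed-step incremental conductance.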
3 Simulation Result Analysis
To evaluate the performance of the three-phase grid-connected PV system with the proposed topology, a PV array with a total output voltage of 850 V is used. The grid voltage is 660 V with a grid frequency of 50 Hz.
The inverter switching frequency is 5 kHz. A capacitor of 470 µF is used for the dc link. The LCL filter includes an inductor of 5 mH and a capacitor of 2.2 µF. Various operating conditions have been considered in order to verify the performance of the proposed topology, which is corroborated under standard and changing atmospheric conditions. In case 1, normal solar irradiation (1 kW/m²) and ambient temperature (298 K) values are considered. The three-phase grid-connected PV system achieves unity power factor, as shown in Fig. 3.
In case 2, the PV unit is considered to operate under standard atmospheric conditions until 0.5 s. At t = 0.5 s, the atmospheric conditions change such that the solar irradiation of the PV unit is reduced to 50% of the standard value. Figure 4 shows that the PV unit operates under standard atmospheric conditions up to 0.5 s and under the changed atmospheric conditions up to 0.6 s. After that, it operates under standard conditions again, and the system maintains operation at unity power factor. DC link voltage balance is achieved using a PI controller in all cases.
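The dc-link regulation just described can be sketched as a discrete PI loop acting on a simple first-order model of the dc-link voltage. The plant model, gains, and time constant below are illustrative assumptions, not the paper's tuned values:

```python
# Discrete PI loop holding a modeled dc-link voltage at its 850 V reference.
V_REF = 850.0          # dc-link reference (V), as in the simulation study
KP, KI = 2.0, 40.0     # illustrative PI gains (not the paper's tuning)
DT, TAU = 1e-3, 0.05   # time step (s) and assumed plant time constant (s)

v, integ = 0.0, 0.0
for _ in range(2000):                # simulate 2 s
    err = V_REF - v
    integ += err * DT                # integral action removes steady-state error
    u = KP * err + KI * integ       # PI control action
    v += DT * (u - v) / TAU          # first-order dc-link voltage model

print(f"dc-link voltage after 2 s = {v:.1f} V")
```

With these gains the closed loop is overdamped, so the modeled voltage settles at the reference without overshoot, mirroring the balanced dc-link behavior reported in the simulations.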
Fig. 3 Responses at standard atmospheric conditions: (a) irradiation, (b) DC link voltage, (c) displacement of voltage and current, (d) power factor
Fig. 4 Grid voltage and current under changing atmospheric conditions: (a) irradiation, (b) DC link voltage, (c) displacement of grid voltage and grid current, (d) power factor
4 Conclusions
In this paper, a three-phase M2C-based single-stage PV grid-connected system is adopted and operated at unity power factor. The proposed topology performs well under standard and changing atmospheric conditions. The DQ method tracks the PWM reference currents well and synchronizes the grid and the PV panel. The sinusoidal PWM is implemented and controlled in a systematic manner under all conditions. The single-stage PV conversion performs the dc-ac power conversion effectively, and the PI controller effectively controls the dc link voltage under all conditions.
References
1. Jain, S., Agarwal, V.: A single-stage grid connected inverter topology for solar PV systems
with maximum power point tracking. IEEE Trans. Power Electron. 22(5), 1928–1940 (2007)
2. Zimmermann, U., Edoff, M.: A maximum power point tracker for long-term logging of PV
module performance. IEEE J. Photovolt. 2(1), 47–55 (2012)
3. Koutroulis, E., Blaabjerg, F.: A new technique for tracking the global maximum power point of
PV arrays operating under partial shading conditions. IEEE J. Photovolt. 2(2), 184–190 (2012)
4. Kjaer, B., Pedersen, J.K., Blaabjerg, F.: A review of single-phase grid-connected inverters for
photovoltaic modules. IEEE Trans. Ind. Appl. 41(5), 1292–1306 (2005)
5. Esram, T., Chapman, P.L.: Comparison of photovoltaic array maximum power point tracking
techniques. IEEE Trans. Energy Convers. 22(2), 439–449 (2007)
6. Houssamo, I., Locment, F., Sechilariu, M.: Maximum power point tracking for photovoltaic
power system: Development and experimental comparison of two algorithms. Renew. Energy
35(10), 2381–2387 (2010)
7. Blaabjerg, F., Teodorescu, R., Liserre, M., Timbus, A.V.: Overview of control and grid synchronization for distributed power generation systems. IEEE Trans. Ind. Electron. 53(5), 1398–1409
8. Kotsopoulos, A., Darte, J.L., Hendrix, M.A.M.: Predictive DC voltage control of single-phase
pv inverters with small dc link capacitance. In: Proceedings of IEEE International Symposium
on Industrial Electronics, pp. 793–797 (2003)
9. Meza, C., Negroni, J.J., Biel, D., Guinjoan, F.: Energy-balance modeling and discrete control for
single-phase grid-connected PV central inverters. IEEE Trans. Ind. Electron. 55(7), 2734–2743
10. Kadri, R., Gaubert, J.P., Champenois, G.: An improved maximum power point tracking for
photovoltaic grid-connected inverter based on voltage-oriented control. IEEE Trans. Ind.
Electron. 58(1), 66–75 (2011)
11. Selvaraj, J., Rahim, N.A.: Multilevel inverter for grid-connected PV system employing digital
PI controller. IEEE Trans. Ind. Electron. 56(1), 149–158 (2009)
12. Rahim, N.A., Selvaraj, J., Krismadinata, C.C.: Hysteresis current control and sensorless MPPT
for grid-connected photovoltaic systems. In: Proceedings of IEEE International Symposium
on Industrial Electronics, pp. 572–577 (2007)
13. Kotsopoulos, A., Duarte, J.L., Hendrix, M.A.M.: A predictive control scheme for DC voltage
and AC current in grid-connected photovoltaic inverters with minimum DC link capacitance. In:
Proceedings of 27th Annual Conference on IEEE Industrial and. Electronic Society, pp. 1994–
1999 (2001)
V. Narasimhulu and K. Jithendra Gowd
14. Zue, O., Chandra, A.: State feedback linearization control of a grid connected photovoltaic
interface with MPPT. Presented at the IEEE Electronic Power Energy Conference, Montreal,
QC, Canada, (2009)
15. Lalili, D., Mellit, A., Lourci, N., Medjahed, B., Berkouk, E.M.: Input output feedback linearization control and variable step size MPPT algorithm of a grid-connected photovoltaic inverter.
Renew. Energy 36(12), 3282–3291 (2011)
16. Mahmud, M.A., Pota, H.R., Hossain, M.J.: Dynamic stability of three-phase grid-connected
photovoltaic system using zero dynamic design approach. IEEE J. Photovolt. 2(4), 564–571
17. Kaura, V., Blasko, V.: Operation of a phase locked loop system under distorted utility conditions.
IEEE Trans. Ind. Appl. 33(1), 58–63 (1997)
18. Mahmud, M.A., Pota, H.R., Hossain, M.J., Roy, N.K.: Robust partial feedback linearizing
stabilization scheme for three-phase grid-connected photovoltaic systems. IEEE J. Photovolt.
4(1) (2014).
19. Mahmud, M.A., Hossain, M.J., Pota, H.R., Roy, N.K.: Robust nonlinear controller design for
three-phase grid-connected photovoltaic systems under structured uncertainties. IEEE Trans.
Power Deliv. 29(3) (2014)
20. Debnath, S., Qin, J., Bahrani, B., Saeedifard, M., Barbosa, P.: Operation, control, and applications of the modular multilevel converter: a review. IEEE Trans. Power Electron. 30(1)
21. Acharya, A.B., Ricco, M., Sera, D., Teoderscu, R., Norum, L.E.: Performance analysis of
medium-voltage grid integration of PV plant using modular multilevel converter. IEEE Trans.
Energy Conv. 34(4) (2019)
Delay Feedback H∞ Control for Neutral Stochastic Fuzzy Systems with Time Delay
T. Senthilkumar
1 Introduction
Time delays arise in many practical dynamic systems and are often a source
of instability and degraded control performance; they are encountered in various
engineering systems, such as hydraulic, electronic, chemical, communication, and
biological systems [2, 3, 10]. In recent years, the study of time delay systems has
received considerable attention, and various stability analysis and H∞ control
methods have been discussed in [3, 10] and the references therein.
On the other hand, neutral time delays often arise in population ecology, heat
exchangers, and lossless transmission lines. Thus, neutral-type time delay systems,
in both stochastic and deterministic settings, have been considered in
[1, 3, 10, 11] and the references therein.
In recent years, a great number of results on stability analysis, stabilization, and
H∞ control for stochastic systems, with or without neutral terms, have been reported
in the literature, for example [1, 4–6, 9, 11] and the references therein. Since the
mathematical model introduced by Takagi and Sugeno in [8], the Takagi–Sugeno
(T-S) fuzzy approach has become an effective method for representing complex
nonlinear systems by a set of simple local linear dynamic models, and it has been
studied extensively over the past two decades. In [7, 12], the stabilization problem
with time delay and the H∞ control problem with state and input delays were
developed for stochastic systems with fuzzy models. To the author's knowledge, the
fuzzy-model-based delay feedback H∞ control method for neutral stochastic systems
with time delay has not been considered in the past.
Motivated by the above discussion, the problem of delay feedback H∞ control
for neutral stochastic fuzzy systems with time delay is considered in this paper. By
T. Senthilkumar (B)
Department of Mathematics, National Institute of Technology Puducherry, Karaikal, Puducherry
609609, India
e-mail: tskumar2410@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
the Lyapunov stability theory and the linear matrix inequality (LMI) approach,
delay feedback fuzzy controllers are designed, in both the drift and diffusion
parts, such that the closed-loop neutral stochastic system is stochastically stable
and satisfies a prescribed H∞ disturbance attenuation level γ. Finally, a numerical
example is given to show the effectiveness and feasibility of the developed theoretical
method. Throughout this paper, the notations are standard.
2 Problem Description
Consider a class of neutral stochastic T-S fuzzy systems with time delay as follows.

Plant rule i: IF υ1(t) is ωi1, υ2(t) is ωi2, …, and υp(t) is ωip, THEN
\[
d[x(t) - N_i x(t-h)] = [A_i x(t) + A_{di} x(t-h) + B_{1i} u(t) + A_{vi} v(t)]\,dt + [E_i x(t) + E_{di} x(t-h) + B_{2i} u(t)]\,dw(t), \tag{1}
\]
\[
z(t) = C_i x(t) + C_{di} x(t-h) + B_{3i} u(t), \tag{2}
\]
\[
x(t) = \varphi(t), \quad t \in [-h, 0], \tag{3}
\]
where i = 1, 2, …, r, and r is the number of IF-THEN rules; ω_{ij} is the fuzzy set;
υ(t) = [υ1(t) υ2(t) … υp(t)]^T is the premise variable vector; x(t) ∈ R^n denotes the
state vector; v(t) ∈ R^p is the disturbance input defined on L_2[0, ∞); u(t) ∈ R^m and
z(t) ∈ R^q are the control input and the controlled output, respectively; w(t) is a
standard Brownian motion defined on the complete probability space (Ω, F, {F_t}_{t≥0}, P)
satisfying E{dw(t)} = 0 and E{dw(t)^2} = dt; ϕ(t) is a real-valued continuous initial
function; h is a positive scalar; and ρ(N_i) < 1, where ρ(·) denotes the spectral radius.
Utilizing the center-average defuzzifier, product inference, and singleton fuzzifier,
the dynamic T-S fuzzy model (1)–(3) is expressed as follows:
\[
dx(t) = \sum_{i=1}^{r} h_i(\upsilon(t)) \Big\{ [A_i x(t) + A_{di} x(t-h) + B_{1i} u(t) + A_{vi} v(t)]\,dt + N_i\,dx(t-h) + [E_i x(t) + E_{di} x(t-h) + B_{2i} u(t)]\,dw(t) \Big\}, \tag{4}
\]
\[
z(t) = \sum_{i=1}^{r} h_i(\upsilon(t)) [C_i x(t) + C_{di} x(t-h) + B_{3i} u(t)], \tag{5}
\]
\[
x(t) = \varphi(t), \quad t \in [-h, 0], \tag{6}
\]
where
\[
h_i(\upsilon(t)) = \frac{\nu_i(\upsilon(t))}{\sum_{i=1}^{r} \nu_i(\upsilon(t))}, \qquad \nu_i(\upsilon(t)) = \prod_{j=1}^{p} \omega_{ij}(\upsilon_j(t)),
\]
and ω_{ij}(υ_j(t)) denotes the grade of membership of υ_j(t) in ω_{ij}. It is easy to see that
ν_i(υ(t)) ≥ 0 and Σ_{i=1}^r ν_i(υ(t)) > 0 for all t. Hence, h_i(υ(t)) ≥ 0 and
Σ_{i=1}^r h_i(υ(t)) = 1 for all t. For brevity, h_i is used to denote h_i(υ(t)).
Employing the parallel distributed compensation technique, the following
fuzzy-model-based control rules are used in this paper:

Control rule i: IF υ1(t) is ωi1, υ2(t) is ωi2, …, and υp(t) is ωip, THEN
\[
u(t) = K_{1i} x(t) + K_{2i} x(t-h), \quad i = 1, 2, \ldots, r. \tag{7}
\]
Eventually, the delay feedback fuzzy control law is obtained as
\[
u(t) = \sum_{i=1}^{r} h_i \big( K_{1i} x(t) + K_{2i} x(t-h) \big), \tag{8}
\]
where the matrices K_{1i}, K_{2i} are the controller gains. Combining (8) with (4)–(6), the
overall closed-loop system can be expressed as
\[
d\Big[x(t) - \sum_{i=1}^{r} h_i N_i x(t-h)\Big] = f(t)\,dt + g(t)\,dw(t), \tag{9}
\]
\[
z(t) = \sum_{i=1}^{r} \sum_{j=1}^{r} h_i h_j \big[ (C_i + B_{3i} K_{1j}) x(t) + (C_{di} + B_{3i} K_{2j}) x(t-h) \big], \tag{10}
\]
\[
x(t) = \varphi(t), \quad t \in [-h, 0], \tag{11}
\]
where
\[
f(t) = \sum_{i=1}^{r} \sum_{j=1}^{r} h_i h_j \big[ (A_i + B_{1i} K_{1j}) x(t) + (A_{di} + B_{1i} K_{2j}) x(t-h) + A_{vi} v(t) \big],
\]
\[
g(t) = \sum_{i=1}^{r} \sum_{j=1}^{r} h_i h_j \big[ (E_i + B_{2i} K_{1j}) x(t) + (E_{di} + B_{2i} K_{2j}) x(t-h) \big].
\]
Definition 1 In this paper, our aim is to design a state feedback controller such that
(a) the system (9)–(11) is stochastically stable in the sense of Definition 1 in [9]; and
(b) ‖z(t)‖_{E_2} < γ‖v(t)‖_2 for all nonzero v(t) ∈ L_2[0, ∞) under the zero initial
condition x(0) = 0.
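As a small numerical sketch of how the blended control law (8) is evaluated (the gains, memberships, and state values here are illustrative assumptions, not those designed later in the paper):

```python
import numpy as np

# Sketch of evaluating u(t) = sum_i h_i (K1i x(t) + K2i x(t-h)).
# Gains, memberships, and states below are illustrative assumptions.
def fuzzy_control(h, K1, K2, x, x_delayed):
    """Membership-weighted blend of local delay-feedback control laws."""
    h = np.asarray(h, dtype=float)
    h = h / h.sum()  # normalize so that sum_i h_i = 1
    u = np.zeros(K1[0].shape[0])
    for hi, k1, k2 in zip(h, K1, K2):
        u += hi * (k1 @ x + k2 @ x_delayed)
    return u

K1 = [np.array([[0.7, 1.4], [0.4, -1.3]]), np.array([[0.3, 2.1], [0.4, 1.7]])]
K2 = [np.array([[-0.8, 0.2], [0.5, 0.5]]), np.array([[0.2, 1.0], [0.4, -1.1]])]
x, xd = np.array([1.0, 0.0]), np.array([0.5, -0.5])
u = fuzzy_control([0.6, 0.4], K1, K2, x, xd)  # blended 2-dim control input
```

When a single rule dominates (h_i = 1), the blended law reduces to that rule's local delay feedback law, which is the defining property of the parallel distributed compensation scheme.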
3 Main Results
The sufficient condition for the solvability of the H∞ control problem for the neutral
stochastic fuzzy system with time delay is given as follows.

Theorem 1 Consider the closed-loop stochastic fuzzy system (9)–(11). It is stochastically
stabilizable with disturbance attenuation level γ > 0 if there exist matrices Q̄ > 0,
R̄ > 0, X > 0, Y_{1j}, Y_{2j} (1 ≤ j ≤ r), and a scalar h > 0 such that the following LMIs are satisfied:
\[
\Omega_{ii} < 0, \quad 1 \le i \le r, \tag{12}
\]
\[
\Omega_{ij} + \Omega_{ji} < 0, \quad 1 \le i < j \le r, \tag{13}
\]
where
\[
\Omega_{ij} =
\begin{bmatrix}
\Omega_{11} & \mathcal{A}_{dij} & 0 & A_{vi} & \mathcal{A}_{ij}^T & \mathcal{E}_{ij}^T & \mathcal{C}_{ij}^T & 0 \\
* & -\bar{Q} & 0 & 0 & \mathcal{A}_{dij}^T & \mathcal{E}_{dij}^T & \mathcal{C}_{dij}^T & X N_i^T \\
* & * & -\bar{R} & 0 & 0 & 0 & 0 & 0 \\
* & * & * & -\gamma^2 I & A_{vi}^T & 0 & 0 & 0 \\
* & * & * & * & -X & 0 & 0 & 0 \\
* & * & * & * & * & -X & 0 & 0 \\
* & * & * & * & * & * & -I & 0 \\
* & * & * & * & * & * & * & -X
\end{bmatrix},
\]
with
\[
\Omega_{11} = \mathcal{A}_{ij} + \mathcal{A}_{ij}^T + \bar{Q} + h^2 \bar{R}, \quad \mathcal{A}_{ij} = A_i X + B_{1i} Y_{1j}, \quad \mathcal{E}_{ij} = E_i X + B_{2i} Y_{1j},
\]
\[
\mathcal{A}_{dij} = A_{di} X + B_{1i} Y_{2j}, \quad \mathcal{E}_{dij} = E_{di} X + B_{2i} Y_{2j}, \quad \mathcal{C}_{ij} = C_i X + B_{3i} Y_{1j}, \quad \mathcal{C}_{dij} = C_{di} X + B_{3i} Y_{2j}.
\]
Then, the desired delay state feedback controller (8) can be realized by
\[
K_{1j} = Y_{1j} X^{-1}, \quad K_{2j} = Y_{2j} X^{-1}, \quad 1 \le j \le r. \tag{14}
\]
Proof Consider the Lyapunov–Krasovskii functional
\[
V(x(t), t) = \Big[x(t) - \sum_{i=1}^{r} h_i N_i x(t-h)\Big]^T P \Big[x(t) - \sum_{i=1}^{r} h_i N_i x(t-h)\Big] + \int_{t-h}^{t} x^T(s) Q x(s)\,ds + h \int_{-h}^{0}\!\!\int_{t+\theta}^{t} x^T(s) R x(s)\,ds\,d\theta, \tag{15}
\]
where P > 0, Q > 0, and R > 0 are symmetric matrices with appropriate dimensions.
By Itô's formula [5], the stochastic differential of V(x(t), t) is obtained as
\[
dV(x(t), t) = \mathcal{L}V(x(t), t)\,dt + 2\Big[x(t) - \sum_{i=1}^{r} h_i N_i x(t-h)\Big]^T P g(t)\,dw(t). \tag{16}
\]
Using Lemma 1 in [11], it can be seen that
\[
-2x^T(t-h) \sum_{i=1}^{r} h_i N_i^T P f(t) \le x^T(t-h) \Big(\sum_{i=1}^{r} h_i N_i\Big)^T P \Big(\sum_{i=1}^{r} h_i N_i\Big) x(t-h) + f^T(t) P f(t). \tag{17}
\]
From Lemma 2.3 in [6], we can obtain
\[
-h \int_{t-h}^{t} x^T(s) R x(s)\,ds \le -\Big(\int_{t-h}^{t} x(s)\,ds\Big)^T R \Big(\int_{t-h}^{t} x(s)\,ds\Big). \tag{18}
\]
Then, from (17)–(18) and using the Schur complement lemma, we easily obtain
\[
\mathcal{L}V(x(t), t) \le \sum_{i=1}^{r} \sum_{j=1}^{r} h_i h_j \zeta^T(t) \hat{\Omega}_{ij} \zeta(t), \tag{19}
\]
where
\[
\zeta(t) = \Big[ x^T(t) \;\; x^T(t-h) \;\; \Big(\textstyle\int_{t-h}^{t} x(s)\,ds\Big)^T \;\; v^T(t) \Big]^T, \qquad \hat{\Omega}_{ij} = \tilde{\Omega}_{ij} + \tilde{A}_{ij}^T P \tilde{A}_{ij} + \tilde{E}_{ij}^T P \tilde{E}_{ij},
\]
\[
\tilde{\Omega}_{ij} =
\begin{bmatrix}
\mathrm{sym}\big(P(A_i + B_{1i} K_{1j})\big) + Q + h^2 R & P(A_{di} + B_{1i} K_{2j}) & 0 & P A_{vi} \\
* & -Q + N_i^T P N_i & 0 & 0 \\
* & * & -R & 0 \\
* & * & * & 0
\end{bmatrix},
\]
\[
\tilde{A}_{ij} = \big[ A_i + B_{1i} K_{1j} \;\; A_{di} + B_{1i} K_{2j} \;\; 0 \;\; A_{vi} \big], \qquad \tilde{E}_{ij} = \big[ E_i + B_{2i} K_{1j} \;\; E_{di} + B_{2i} K_{2j} \;\; 0 \;\; 0 \big].
\]
Note that
\[
z^T(t) z(t) \le \sum_{i=1}^{r} \sum_{j=1}^{r} h_i h_j \zeta^T(t) \breve{C}_{ij}^T \breve{C}_{ij} \zeta(t), \tag{20}
\]
where \(\breve{C}_{ij} = \big[ C_i + B_{3i} K_{1j} \;\; C_{di} + B_{3i} K_{2j} \;\; 0 \;\; 0 \big]\).
Now, we set
\[
J(t) = \mathbb{E}\Big\{ \int_{0}^{t} \big[ z^T(s) z(s) - \gamma^2 v^T(s) v(s) \big]\,ds \Big\}, \tag{21}
\]
where t > 0. Under the zero initial condition for t ∈ [−h, 0], it follows that
\[
J(t) = \mathbb{E}\Big\{ \int_{0}^{t} \big[ z^T(s) z(s) - \gamma^2 v^T(s) v(s) + \mathcal{L}V(x(s), s) \big]\,ds \Big\} - \mathbb{E}\{V(x(t), t)\}
\]
\[
\le \mathbb{E}\Big\{ \int_{0}^{t} \big[ z^T(s) z(s) - \gamma^2 v^T(s) v(s) + \mathcal{L}V(x(s), s) \big]\,ds \Big\}
\le \mathbb{E}\Big\{ \int_{0}^{t} \sum_{i=1}^{r} \sum_{j=1}^{r} h_i h_j \zeta^T(s) \check{\Omega}_{ij} \zeta(s)\,ds \Big\},
\]
where \(\check{\Omega}_{ij} = \hat{\Omega}_{ij} + \breve{C}_{ij}^T \breve{C}_{ij} + \mathrm{diag}(0, 0, 0, -\gamma^2 I)\). If \(\check{\Omega}_{ii} < 0\) and
\(\check{\Omega}_{ij} + \check{\Omega}_{ji} < 0\) hold for 1 ≤ i < j ≤ r, then J(t) < 0. Pre- and post-multiplying
\(\check{\Omega}_{ij}\) by diag(X, X, X, I) and its transpose, respectively, defining the new variables
X = P^{-1}, Q̄ = XQX, R̄ = XRX, Y_{1j} = K_{1j}X, Y_{2j} = K_{2j}X, and performing some simple
algebraic manipulations, the Schur complement shows that these conditions are equivalent
to the LMIs (12)–(13) and ensure E{LV(x(t), t)} < 0. By Definition 1 and [5], the closed-loop
stochastic fuzzy system (9)–(11) is stochastically stabilizable with disturbance
attenuation level γ.
Remark 1 Setting v(t) = 0 in system (9), the above theorem readily yields the
stabilization result for the delay feedback neutral stochastic fuzzy system with
time delay.
Remark 2 The H∞ control method for uncertain neutral stochastic time delay systems
without a fuzzy approach was discussed in [1, 11]. In this paper, the delay feedback H∞
control for neutral stochastic fuzzy systems is considered, which includes such
systems as a special case.
4 Numerical Example
Consider the delay feedback H∞ control problem for the neutral stochastic fuzzy
system (9)–(11) with the following parameters:
\[
A_1 = \begin{bmatrix} -0.3 & 0.2 \\ 0.1 & -0.4 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} -0.92 & 0.49 \\ 0.15 & 0.51 \end{bmatrix}, \quad
A_{d1} = \begin{bmatrix} -0.1 & -0.5 \\ 0.5 & 0.01 \end{bmatrix},
\]
\[
A_{d2} = \begin{bmatrix} -0.15 & 0.21 \\ -0.1 & -0.3 \end{bmatrix}, \quad
B_{11} = \begin{bmatrix} -0.3 & 0.2 \\ 0.12 & -0.2 \end{bmatrix}, \quad
B_{12} = \begin{bmatrix} -0.23 & -0.4 \\ 0.3 & -0.4 \end{bmatrix},
\]
\[
A_{v1} = \begin{bmatrix} -0.2 & 0.3 \\ -0.02 & 0.1 \end{bmatrix}, \quad
A_{v2} = \begin{bmatrix} -0.2 & 0.15 \\ 0.15 & -0.33 \end{bmatrix}, \quad
N_1 = N_2 = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix},
\]
\[
E_1 = \begin{bmatrix} -0.1 & -0.2 \\ 0.1 & -0.4 \end{bmatrix}, \quad
E_2 = \begin{bmatrix} -0.1 & 0.1 \\ 0.1 & -0.2 \end{bmatrix}, \quad
E_{d1} = \begin{bmatrix} -0.3 & 0.1 \\ 0.1 & -0.2 \end{bmatrix},
\]
\[
E_{d2} = \begin{bmatrix} -0.3 & 0.2 \\ 0.1 & -0.19 \end{bmatrix}, \quad
B_{21} = \begin{bmatrix} -0.1 & 0.1 \\ 0.3 & 0.1 \end{bmatrix}, \quad
B_{22} = \begin{bmatrix} -0.3 & 0.1 \\ 0.10 & -0.27 \end{bmatrix},
\]
\[
C_1 = \begin{bmatrix} -0.1 & 0.1 \\ 0 & -0.1 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} -0.1 & 0.11 \\ 0 & -0.12 \end{bmatrix}, \quad
C_{d1} = \begin{bmatrix} -0.12 & 0.3 \\ 0.22 & -0.13 \end{bmatrix},
\]
\[
C_{d2} = \begin{bmatrix} -0.1 & 0.1 \\ 0 & -0.1 \end{bmatrix}, \quad
B_{31} = \begin{bmatrix} 0.13 & -0.25 \\ 0.35 & -0.31 \end{bmatrix}, \quad
B_{32} = \begin{bmatrix} -0.12 & 0.1 \\ -0.035 & -0.1 \end{bmatrix}.
\]
By utilizing Theorem 1, a delay feedback fuzzy controller can be obtained such that
the closed-loop system above is stochastically stabilizable with a prescribed level γ.
Taking the minimum γ = 17 and the maximum upper bound for the delay h = 7.3225,
and solving the LMIs (12)–(13) using the MATLAB LMI Control Toolbox, we obtain
the following feasible solution:
\[
X = \begin{bmatrix} 0.0392 & 0.0038 \\ 0.0038 & 0.0262 \end{bmatrix}, \quad
\bar{Q} = \begin{bmatrix} 0.0154 & -0.0073 \\ -0.0073 & 0.0131 \end{bmatrix}, \quad
\bar{R} = \begin{bmatrix} 0.3141 & -0.1406 \\ -0.1406 & 0.4432 \end{bmatrix} \times 10^{-7}.
\]
Thus, by using Theorem 1, the controller gains are obtained as
\[
K_{11} = \begin{bmatrix} 0.7131 & 1.3557 \\ 0.4064 & -1.3250 \end{bmatrix}, \quad
K_{12} = \begin{bmatrix} 0.3526 & 2.1322 \\ 0.3751 & 1.6662 \end{bmatrix},
\]
\[
K_{21} = \begin{bmatrix} -0.7859 & 0.1537 \\ 0.4577 & 0.5300 \end{bmatrix}, \quad
K_{22} = \begin{bmatrix} 0.1919 & 1.0364 \\ 0.4058 & -1.0820 \end{bmatrix}.
\]
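Feasibility of such matrix conditions can be cross-checked numerically. The sketch below illustrates only the basic Lyapunov machinery behind Theorem 1 on the delay-free part of A_1 above (the full conditions (12)–(13) require an LMI solver, such as the MATLAB LMI Control Toolbox used here):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# For a Hurwitz matrix A, the Lyapunov equation A^T P + P A = -I has a
# unique solution P > 0; A1 from the example above serves as the test matrix.
A = np.array([[-0.3, 0.2], [0.1, -0.4]])
P = solve_continuous_lyapunov(A.T, -np.eye(2))  # solves A^T P + P A = -I
eigs = np.linalg.eigvalsh(P)                    # all positive iff P > 0
```

Here A_1 has negative trace and positive determinant, so it is Hurwitz and the resulting P is positive definite, confirming the delay-free stability certificate.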
5 Conclusion
In this paper, the delay feedback H∞ control problem for neutral stochastic fuzzy
systems with time delay is investigated. By the Lyapunov stability theory and the
LMI approach, a delay feedback fuzzy controller is designed such that the closed-loop
system is stable in the mean square and satisfies a prescribed H∞ performance
criterion. An illustrative example is provided to examine the effectiveness and
feasibility of the proposed approach.
References

1. Chen, W., Ma, Q., Wang, L., Xu, H.: Stabilisation and H∞ control of neutral stochastic delay
Markovian jump systems. Int. J. Syst. Sci. 49, 58–67 (2018)
2. Fridman, E., Shaked, U.: Delay-dependent stability and H∞ control: constant and time-varying
delays. Int. J. Control 76, 48–60 (2003)
3. Karimi, H.R.: Robust delay-dependent H∞ control of uncertain time-delay systems with mixed
neutral, discrete and distributed time-delays and Markovian switching parameters. IEEE Trans.
Circ. Syst. I(58), 1910–1923 (2011)
4. Li, B., Yang, G.: Robust stabilization and H∞ control of uncertain stochastic time-delay systems
with nonlinear perturbation. Int. J. Robust Nonlinear Control 26, 3274–3291 (2016)
5. Mao, X.: Stochastic Differential Equations and Their Applications. Horwood, Chichester (1997)
6. Senthilkumar, T., Balasubramaniam, P.: Delay-dependent robust stabilization and H∞ control
for nonlinear stochastic systems with Markovian jump parameters and interval time-varying
delays. J. Optim. Theory Appl. 151, 100–120 (2011)
7. Senthilkumar, T., Balasubramaniam, P.: Delay-dependent robust H∞ control for uncertain
stochastic T-S fuzzy systems with time-varying state and input delays. Int. J. Syst. Sci. 42,
877–887 (2011)
8. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and
control. IEEE Trans. Syst. Man Cybern. 15, 116–132 (1985)
9. Xu, S., Chen, T.: Robust H∞ control for uncertain stochastic systems with state delay. IEEE
Trans. Automat. Control 47, 2089–2094 (2002)
10. Xu, S., Lam, J., Chen, B.: Robust H∞ control for uncertain fuzzy neutral delay systems. Eur.
J. Control 10, 365–380 (2004)
11. Xu, S., Shi, P., Chu, Y., Zou, Y.: Robust stochastic stabilization and H∞ control of uncertain
neutral stochastic time-delay systems. J. Math. Anal. Appl. 314, 1–16 (2006)
12. Zhang, B., Xu, S., Zong, G., Zou, Y.: Delay-dependent stabilization for stochastic fuzzy systems
with time delays. Fuzzy Sets Syst. 158, 2238–2250 (2007)
Modeling Crosstalk of Tau and ROS
Implicated in Parkinson’s Disease Using
Biochemical Systems Theory
Hemalatha Sasidharakurup, Parvathy Devi Babulekshmanan,
Sreehari Sathianarayanan, and Shyam Diwakar
1 Introduction
Computational and mathematical modeling of complex biological systems helps us
understand the complex interactions between biomolecules inside a cell and how
disruptions in their connections can lead to complex diseases. To study complex
diseases, instead of studying the responsible protein/gene alone, all the complex
reactions leading to the emergent properties of the entire system must also be studied [1].
Mathematical modeling has been used to solve questions related to the complexity
of living systems such as the brain, due to the difficulties in doing experiments on
humans or other organisms [2]. Computational systems models provide better understanding of the integrated functioning of large-scale distributed brain networks and
show how disruptions in brain function and connectivity impact proper functioning.
Parkinson's disease (PD) is the most common neurodegenerative movement disorder,
affecting approximately six million people worldwide. Although many medications
are available to control the symptoms, there is no complete cure for this disease due to
its complexity. The major goal of this study was to model two major biochemical
sub-pathways involved in PD, tau and ROS, using BST and kinetic equations;
disturbances in these pathways lead to the death of dopamine-producing cells inside the
brain. Insoluble tau aggregates form structures called neurofibrillary tangles, which
brain. Insoluble tau aggregates form a structure called neurofibrillary tangles, which
are characteristic of neurodegeneration in Alzheimer’s disease and PD [3]. Recent
studies have shown that DJ-1 and LAMP2A, two proteins with important roles in
abnormal protein aggregation in the brain, are implicated in the development of PD
conditions [4, 5]. Studies have also shown that disturbed calcium homeostasis and
inflammatory cytokines stimulate tau hyperphosphorylation, causing the production
of neurofibrillary tangles (NFT) inside the
H. Sasidharakurup · P. D. Babulekshmanan · S. Sathianarayanan · S. Diwakar (B)
Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri campus, Kollam,
Kerala 690525, India
e-mail: shyam@amrita.edu
brain, also leading to PD conditions [6]. Although our previous models explained
the role of tau and ROS, along with other factors, in conditions related to cell
death in PD, the interconnection between them has not been well studied [7]. In
this model, the focus is on the interconnection between DJ-1 and calcium ions in
regulating the ROS and tau pathways in PD [5]. The model also explores the relation
between the LAMP2A protein and DJ-1 and how mutations in these proteins lead to
PD [4]. Some of the other factors that trigger ROS production and tau hyperphosphorylation,
such as inflammatory cytokines, calcium homeostasis, mitochondrial dysfunction,
and glutamate, are also discussed in this study. The positive feedback loop between
mitochondrial dysfunction and NFT production has also been studied. In addition,
the model discusses the possible use of oxidized DJ-1 as a biomarker for PD.
2 Methods
In this study, biochemical systems theory (BST) and kinetic equations were used to
reconstruct the biochemical pathways in PD. BST employs time-dependent ordinary
differential equations to represent the different types of reactions in a biochemical
pathway network, to analyze the rates of individual reactions over time and how
together they are responsible for the emergent properties of the system. Initial
concentration values for both diseased and normal conditions and their rate constants
were extracted from previous experimental studies. The values were normalized to
mimic the percentage variations in concentration between control and diseased
conditions observed in experimental studies. The pathway modeling tool CellDesigner
was used to model and simulate the interactions (see Fig. 1). A reaction was created
by connecting the reactant and product using a straight-line arrow in the GUI;
other modifications, such as adding another reactant, can be made similarly.
Individual structures inside a cell, such as mitochondria and the nucleus, can be
represented as compartments in a model. The biomolecules inside cells and their
complex interactions were modeled using kinetic laws including reversible and
irreversible Michaelis–Menten kinetics, convenience kinetics, generalized mass
action, the Hill equation, and zeroth-order forward and reverse kinetics. Some of
these equations are described below:
2.1 Generalized Mass Action Kinetics
\[
\dot{x}_i = \sum_{j} a_{ij} \prod_{k} x_k^{g_{ijk}}, \quad i = 1, \ldots, d,
\]
where each variable x_i represents the concentration of a reactant, and ẋ_i denotes
the time derivative of x_i. The parameters a_ij are known as rate constants, whereas
the parameters g_ijk are kinetic orders.

Fig. 1 Pathway segment showing implication of excess calcium influx on tau phosphorylation
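The GMA form can be integrated directly as a system of ODEs. The two-variable sketch below uses invented rate constants and kinetic orders purely to illustrate the power-law structure, not values from the PD pathway model:

```python
from scipy.integrate import solve_ivp

# Two-variable GMA sketch: dx1/dt = 2*x2^0.5 - x1, dx2/dt = 1 - 0.5*x2.
# Rate constants and kinetic orders are invented for illustration only.
def gma_rhs(t, x):
    x1, x2 = x
    return [2.0 * x2**0.5 - 1.0 * x1,  # power-law production, linear loss
            1.0 - 0.5 * x2]            # constant influx, linear loss

sol = solve_ivp(gma_rhs, (0.0, 50.0), [0.1, 0.1], rtol=1e-8)
x1_ss, x2_ss = sol.y[:, -1]  # trajectory approaches a steady state
```

At steady state this toy system settles at x2 = 2 and x1 = 2·sqrt(2), which is the kind of control-versus-diseased concentration comparison the pathway model performs at larger scale.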
2.2 Michaelis–Menten’s Kinetics
Michaelis–Menten kinetics includes both reversible and irreversible reactions.
\[
\frac{d[p]}{dt} = \frac{v_{\max}[s]}{K_m + [s]},
\]
where v_max is the maximum rate achieved by the system at saturating substrate
concentration, [s] is the concentration of the substrate S, and the Michaelis
constant K_m is equal to the value of [s] at which the reaction rate v is half of v_max.
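The half-saturation property (v = v_max/2 at [s] = K_m) can be checked directly with a one-line rate function (the constants here are illustrative, not from the model):

```python
# Michaelis-Menten rate law v = vmax*[s]/(Km + [s]); constants illustrative.
def mm_rate(s, vmax=10.0, km=2.0):
    """Reaction rate at substrate concentration s."""
    return vmax * s / (km + s)
```

At s = K_m the rate equals v_max/2, and for s much larger than K_m the rate saturates toward v_max.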
2.3 Hill Equation
Gene regulation was modeled in the pathway using the Hill equation.
\[
\theta = \frac{[L]^n}{K_d + [L]^n} = \frac{[L]^n}{(K_A)^n + [L]^n},
\]
where θ is the fraction of the receptor protein concentration bound by ligand, [L] is
the concentration of unbound ligand, K_d is the dissociation constant (K_d = (K_A)^n),
and n is the Hill coefficient.
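Similarly, half occupation at [L] = K_A can be verified directly (parameter values illustrative):

```python
# Hill occupancy theta = [L]^n / (KA^n + [L]^n); parameters illustrative.
def hill_theta(L, ka=1.5, n=2.0):
    """Fraction of receptor bound at free-ligand concentration L."""
    return L**n / (ka**n + L**n)
```

The occupancy is 0.5 at L = K_A and rises sigmoidally with L, the behavior exploited when modeling gene regulation in the pathway.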
In this way, by comparing the normal and diseased conditions, one can identify the
important biomolecules whose concentration changes affect cell homeostasis and
predict the behavioral changes of the system.
3 Results
Major biomolecules interacting with the tau and ROS pathways have been modeled in
both normal and diseased states, and some of the important protein mutations and
changes involved in this system that could lead to diseased conditions have been identified.
3.1 Increased Concentration Levels of ROS, Alpha Synuclein
Aggregation and Oxidized DJ-1 in Diseased Conditions
In the control condition, the concentration levels of ROS and DJ-1 production were
noticeably lower than in the diseased state (see Fig. 2a). Increased production
of ROS leads to oxidative stress and caspase activation, which in turn lead to cell death.
A high concentration of oxidized DJ-1 was observed in the diseased condition
compared to the control (see Fig. 2b). In the diseased state, the presence of oxidized
DJ-1 suppressed its antioxidant properties and led to increased oxidative stress and
cell death. The results showed that DJ-1 triggered the production of LAMP2A, and the
presence of less DJ-1 led to decreased LAMP2A production. Elevated alpha
synuclein aggregation was also noticed as a leading cause of cell death in the
diseased state.
Fig. 2 a Low concentration level of ROS and DJ-1 in control. b Increased concentration levels of
ROS and oxidized DJ-1 and related cell death. c No MPTP formation, mtDNA damage or Complex
1 in control. d Increased levels of tau, neurofibrillary tangles, oxidative stress in diseased state
3.2 Mitochondrial Dysfunction Leads to NFT Production
in Diseased Conditions
Simulation shows that in control conditions, tau protein and its phosphorylation
decreased with time. Neurofibrillary tangle formation increased slightly in the
beginning but decreased shortly after and became stable along with phosphorylated tau.
Formation of MPTP, damage to mtDNA, or Complex 1 damage in mitochondria was not
observed in the control condition (see Fig. 2c). In diseased conditions, the results show
an increase in calcium influx into neurons and mitochondria. A sudden increase in
calcium level inside mitochondria led to increased ROS production. An increase in
mtDNA damage and Complex 1 damage was also observed. The results show that
MPTP formation increased rapidly with time. A decrease in phosphorylated
tau was also observed with an increase in neurofibrillary tangle formation, which
again leads to mitochondrial dysfunction. This loop continues, eventually leading to
cell death (see Fig. 2d).
4 Discussion
The main goal of this study was to understand the major proteins and their interactions
in the tau and ROS pathways that lead to PD, using BST and mathematical equations.
From the model, it was observed that ROS and tau play a major role
in initiating cell death factors that lead to dopaminergic cell death in PD. The results
indicated that ROS production and oxidative stress were directly linked to other
abnormal processes observed in the PD condition, such as alpha synuclein aggregation,
Lewy bodies, and tau phosphorylation. The results showed an elevation in DJ-1, alpha
synuclein aggregation, and neurofibrillary tangles during excess production of ROS in
the diseased state as compared to the control. This suggests that ROS and tau are
interconnected, and that their interplay correlates with the progression of the PD
condition. The model suggested that oxidative stress due to increased ROS production
was a major factor leading to cell death in PD, since it led to the activation of several
caspases and JNK pathways. Disturbances in only some of the interconnections were
involved in determining the emergent properties of the system leading to the disease
condition.
5 Conclusion
A computational model to analyze the importance of the ROS and tau pathways has
been developed to understand how simple reactions in these pathways form complex
interactions regulating the emergent properties of the system. The model predicts
biomarkers of oxidative stress, mitochondrial dysfunction, DJ-1 oxidation, and
calcium homeostasis related to PD, and shows how dysfunction in some of the factors
present in normal conditions leads to diseased conditions. The predictions from
this model can be further tested in animal models and human subjects, extrapolating
existing experimental data.
Acknowledgements This work derives direction and ideas from the Chancellor of Amrita Vishwa
Vidyapeetham, Sri Mata Amritanandamayi Devi. This study was partially supported by the Department of Science and Technology Grant DST/CSRI/2017/31, Government of India and Embracing
the World Research-for-a-Cause initiative.
References

1. Fischer, H.P.: Mathematical modeling of complex biological systems: from parts lists to
understanding systems behavior. Alcohol Res. Health 31, 49–59 (2008)
2. Ji, Z., Yan, K., Li, W., Hu, H., Zhu, X.: Mathematical and computational modeling in complex
biological systems. Biomed Res. Int. 2017, 1–16 (2017). https://doi.org/10.1155/2017/5958321
3. Braak, H., Braak, E., Yilmazer, D., De Vos, R.A.I., Jansen, E.N.H., Bohl, J.: Neurofibrillary
tangles and neuropil threads as a cause of dementia in Parkinson’s disease. J. Neural Trans.
49–55 (1997)
4. Issa, A.R., Sun, J., Petitgas, C., Mesquita, A., Dulac, A., Robin, M., Mollereau, B., Jenny, A.,
Chérif-Zahar, B., Birman, S.: The lysosomal membrane protein LAMP2A promotes autophagic
flux and prevents SNCA-induced Parkinson disease-like symptoms in the Drosophila brain.
Autophagy 14, 1898–1910 (2018). https://doi.org/10.1080/15548627.2018.1491489
5. Xu, X.M., Lin, H., Maple, J., Björkblom, B., Alves, G., Larsen, J.P., Møller, S.G.: The
Arabidopsis DJ-1a protein confers stress protection through cytosolic SOD activation. J. Cell
Sci. 123, 1644–1651 (2010). https://doi.org/10.1242/jcs.063222
6. Guo, T., Noble, W., Hanger, D.P.: Roles of tau protein in health and disease. www.alz.co.uk/res
earch/world-report-2016 (2017)
7. Sasidharakurup, H., Melethadathil, N., Nair, B., Diwakar, S.: A systems model of Parkinson's
disease using biochemical systems theory. OMICS J. Integr. Biol. 21, 454–464 (2017). https://doi.org/10.1089/omi.2017.0056
IoT-Based Patient Vital Measuring System
Ashwini R. Hirekodi, Bhagyashri R. Pandurangi, Uttam U. Deshpande,
and Ashok P. Magadum
1 Introduction
IoT was first proposed by Kevin Ashton in 1999 [1]. It is a physical communication
network in which data are collected from the billions of devices we use and
transformed into usable information [2]. Unprecedented growth in Internet of things
(IoT) technologies was expected to bring about 50 billion devices connected
through the Internet by 2020 [3]. IoT can be used in the medical field so
that doctors can monitor patients from anyplace at any time. Such a system can be used
for patients who need continuous monitoring of their health. A systematic review of
various mobile healthcare approaches was carried out in [4, 5].
IoT establishes a bridge between the 'digital world (Internet)' and the 'real world
(physical devices).' The devices are connected to cloud-based services and are given
unique identities on the Internet [6, 7]. The Raspberry Pi acts as an aggregator of
data collected from the different sensors and provides a communication channel
to external entities, such as Web browsers running on devices over a Wi-Fi/cellular
data network, for transferring the aggregated sensor data. REST APIs and an
asynchronous notification service are used for transferring the sensor data and
triggering measurement activities. Since the pulse and ECG sensors used in the proposed
platform emit analog signals, a 16-bit analog-to-digital converter (ADS1115) is used
for sampling. The digital samples are collected by a Python application running on
the Raspberry Pi for further processing and storage of the aggregated data in a local database.
The other Web server-based applications coordinate with the Python application and
A. R. Hirekodi (B) · B. R. Pandurangi · U. U. Deshpande
KLS, Gogte Institute of Technology, Belagavi, India
e-mail: ashwinihirekodi16@gmail.com
A. P. Magadum
Osteos India, Pvt. Ltd, Belagavi, India
Fig. 1 High-level architecture for vital signs integration
transfer the sensor data for the respective patient being checked for vital signs as
depicted in Fig. 1.
Figure 1 shows the high-level architecture of the different entities involved in the
overall flow of sensor data to the NoSQL cloud-based database where the patient record
is stored. MySQL is used as intermediate local storage for the aggregated digital
sensor samples read from the ADC chip. The overall coordination of sensor data
collection, and the transfer of the aggregated data along with the ID of the patient
being administered for vital signs to the cloud database, happens via the Web server and
the asynchronous notification service. The medical staff checking one of the vital signs,
such as the pulse reading of a patient, initiates the measurement by plugging a pulse
sensor on the patient's finger and clicking a 'START' button in the electronic medical
record (EMR) module UI. The standard pulse reading procedure is carried out, and
sensor data samples are collected as per the standards set by the medical organization.
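The per-measurement processing on the Raspberry Pi can be sketched as follows. The real platform reads samples from the ADS1115 ADC; here a synthetic 1.2 Hz waveform stands in for the digitized pulse signal, and the sampling rate, threshold, and estimator are illustrative assumptions, not the organization's standard procedure:

```python
import math

# Sketch of pulse-rate estimation from digitized sensor samples.
def estimate_bpm(samples, fs, threshold=0.5):
    """Count rising threshold crossings and convert beats to beats/minute."""
    beats = sum(
        1 for i in range(1, len(samples))
        if samples[i - 1] < threshold <= samples[i]
    )
    duration_s = len(samples) / fs
    return beats * 60.0 / duration_s

fs = 100                                   # sampling rate in Hz (assumed)
samples = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(10 * fs)]
bpm = estimate_bpm(samples, fs)            # 1.2 Hz pulse -> 72 beats/min
```

In the deployed system the resulting value, tagged with the patient ID, would be written to the local database and forwarded to the cloud record via the Web server.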
2 Literature Survey
Researchers have worked on this subject for a long time. Here is a brief summary of
the related work.
IoT-Based Patient Vital Measuring System
Rani et al. [8] proposed a system that gathers the readings of various vital signs from patients and sends them to the doctor or the individual concerned to report the health condition. In this project, MQTT communication was used to send the data, in pictorial form, to the cloud platform.
Tastan [9] proposed a wearable sensor-based tracking system to record the patient's heartbeat and blood oxygen level. A patient in a critical health situation can be continuously monitored using this technique. If there is a fluctuation in the patient's health levels, the details of his health condition are sent to the family members or the doctor through mail or Twitter notifications. The purpose of this project is to provide medical treatment as soon as possible in cases of heart disease, so that the patient's chances of survival are increased.
Misbahuddin et al. [10] proposed a system for the victims of mass disasters and emergencies built around MEDTOC, a real-time component of the holistic solution. The proposed system sends real-time details of the affected victims' health condition to the doctor or to a central database even before the patient arrives. However, this approach is only useful if the disaster area has a cellular network; where cellular networks have issues, alternate connectivity such as Wi-Fi can be utilized in the future.
3 Hardware and Software Platform
This section gives a detailed view of the hardware and software implementation of the project.
A. Hardware Platform
Figure 2 shows the hardware implementation of the pulse sensor [11]. The sensor's three wires carry the signal (S), Vcc (3–5 V), and GND. In this project, the sensor is powered by the 3.3 V pin, and the signal pin is connected to the Raspberry Pi through the ADS1115 ADC module, because the Raspberry Pi cannot read analog voltages directly. The pulse sensor and ECG are used for checking a person's heartbeat. The connections shown in the figure are made using jumper wires, with the ECG connected to channel 2 and the pulse sensor to channel 1. Power consumption is low.
B. Software Platform
(i) Python Code: The software platform is Linux, on which Python code runs the sensors. For pulse measurement, the code specifies all the required parameters. To obtain the analog output of the pulse sensor, the ADC module is interfaced via I2C communication. The upper and lower peaks of the pulse are located; the interval between successive peaks is then converted into beats per minute (BPM). The raw analog output and the BPM are sent to the serial port, which is read from the Processing IDE for further processing.
Fig. 2 Hardware implementation for pulse sensor
(ii) MySQL Database: MySQL is an open-source relational database management system. It serves as temporary storage for the digital samples received from the sensors in real time before their aggregated reading is presented to the patient EMR front-end module; eventually, the aggregated values are stored in the cloud database (MongoDB) for patient medical history and continuous monitoring purposes.
(iii) Web server: The Web server used in this system is NodeJS, an open-source server that interacts with the distributed SaaS-based front-end patient medical record system. When staff request a patient's details, all the updated readings are displayed on the EMR UI.
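The peak-to-BPM conversion described under the Python code item can be sketched as follows; the simple threshold-crossing beat detector and the sampling rate are illustrative assumptions rather than the authors' exact code.

```python
def beats_per_minute(samples, threshold, fs):
    """Naive beat detector: a beat is a rising crossing of `threshold`
    (the upper-peak onset); the mean beat-to-beat interval in seconds
    is then converted to beats per minute."""
    beats = [i for i in range(1, len(samples))
             if samples[i - 1] < threshold <= samples[i]]
    if len(beats) < 2:
        return 0.0  # not enough beats to form an interval
    mean_interval = (beats[-1] - beats[0]) / (len(beats) - 1) / fs
    return 60.0 / mean_interval
```

For example, pulses spaced one second apart at a 100 Hz sampling rate yield 60 BPM.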
Algorithmic code for NodeJS (HTTPS server):
Step 1: Declare http and spawn.
Step 2: Assign the child process module, i.e., the Python application, to the spawn variable and run it using the spawn method.
Step 3: Create an HTTP server using the imported http module and assign an unused port.
Step 4: The HTTP server receives a patient ID as a request from the EMR application.
Step 5: The ID is sent to the Python pulse sensor/ECG child application.
Step 6: Pulse and ECG readings are received from the Python application.
Step 7: The received readings are sent as an HTTP response to the EMR application.
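For concreteness, the same request flow can be sketched with Python's standard library (matching the sensor side's language). The child command, port, and JSON shape are placeholder assumptions, not the platform's actual code.

```python
import json
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder child process (Steps 1-2): stands in for the actual
# pulse-sensor/ECG Python application, whose name the text does not give.
SENSOR_CMD = [sys.executable, "-c",
              "import sys, json; "
              "print(json.dumps({'patient_id': sys.stdin.read().strip(), 'bpm': 72}))"]

def read_vitals(patient_id: str) -> dict:
    """Steps 5-6: pass the patient ID to the child and collect its reading."""
    out = subprocess.run(SENSOR_CMD, input=patient_id,
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)

class VitalsHandler(BaseHTTPRequestHandler):
    """Steps 4 and 7: the patient ID arrives in the request path, and the
    readings go back as the HTTP response to the EMR application."""
    def do_GET(self):
        body = json.dumps(read_vitals(self.path.lstrip("/"))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), VitalsHandler).serve_forever()  # Step 3
```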
4 Result and Performance Analysis
The objective of the project is to help patients keep track of their health condition on a regular basis, so that whenever there are any issues they can contact their doctors as soon as possible. Doctors can also retrieve reports from the MongoDB database by supplying the patient's ID, which is stored when the patient comes for a check-up. When the patient ID is entered, the EMR triggers and the readings are updated over the WebSocket. Compared to other devices, the proposed system delivers readings to the doctor faster and more accurately. Every time the patient comes for a check-up, data is acquired and stored for almost a year.
Figure 3 shows the reading of a pulse sensor displayed on the Raspberry Pi and stored on a private cloud. Whenever a beat is found, the pulse is measured; when there is no response from the person, 'no beat found' is displayed. The patient's details are stored under the patient's ID; the name, age, weight, and phone number are kept in spreadsheet form. Medical information can only be accessed if the doctor or medical staff knows the assigned password. At present, only the medical staff and the doctor in charge of a patient are able to go through the medical history. Every time the patient comes for a check-up, data is acquired and stored as long as it is required. Cloud storage ranges from 1 to 10 GB.
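The per-patient storage and retrieval described here can be illustrated with a minimal schema. sqlite3 is used below purely as a self-contained stand-in for the platform's MySQL/MongoDB stores, and the column names are assumptions.

```python
import sqlite3

# In-memory stand-in for the platform's local database; the real system
# uses MySQL locally and MongoDB in the cloud.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE vitals (
    patient_id TEXT, recorded_at TEXT, bpm INTEGER)""")

def store_reading(patient_id, recorded_at, bpm):
    """Append one aggregated reading under the patient's ID."""
    db.execute("INSERT INTO vitals VALUES (?, ?, ?)",
               (patient_id, recorded_at, bpm))

def history(patient_id):
    """What the EMR shows when a doctor enters the patient's ID."""
    return db.execute("SELECT recorded_at, bpm FROM vitals "
                      "WHERE patient_id = ? ORDER BY recorded_at",
                      (patient_id,)).fetchall()
```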
Fig. 3 Pulse-rate recording on the cloud
5 Security Compliance
This digital platform conforms to the safety and security regulations defined by standard bodies such as CDASH and the EHR standards. The implementation considers all the security measures needed to protect patients' personal details: the front-end Web-based integrated EMR system is bound to a premium SSL certificate, and all front-end Web UI requests to the backend are carried over a secured HTTP connection. The platform also uses the encryption features that the NoSQL database provides out of the box. With these security measures, together with the backup and high-availability features of the cloud database server, the critical sections of patients' personal data and any other platform data critical to end users are preserved safely at all times.
6 Conclusion and Future Work
This application enables communication between patients and doctors while also allowing the patient's health to be tracked. The stored data can be accessed by doctors and nurses simply by entering the patient's ID. The expected performance should be close to that of the manual meters used for reading the pulse and ECG of patients in a clinical setting, with a marginal error in the range of 2–5%. Another aspect of expected performance centers on the consistency of the readings from the pulse and ECG sensors as more and more patients are screened in mass screening events such as camps conducted by doctors and specialists.
References
1. Ashton, K.: That 'internet of things' thing. RFID J. 22(7), 97–114 (2009)
2. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): a vision, architectural elements, and future directions. Fut. Gener. Comput. Syst. 29(7), 1645–1660 (2013)
3. Fernandez, F., Pallis, G.C.: Opportunities and challenges of the internet of things for healthcare: systems engineering perspective. In: International Conference on Wireless Mobile
Communication and Healthcare, pp. 263–266 (2014)
4. Jersak, L.C., da Costa, A.C., Callegari, D.A.: A systematic review on mobile health care.
Technical Report 073, Faculdade de Informática PUCRS—Brazil, May 2013
5. Fong, E.-M., Chung, W.-Y.: Mobile cloud-computing-based healthcare service by noncontact
ECG monitoring. Sensors 13(12), 16451–16473 (2013)
6. Deshpande, U.U., Kulkarni, M.A.: IoT based real time ECG monitoring system using Cypress WICED. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 6(2) (2017)
7. Deshpande, U.U., Kulkarni, V.R.: Wireless ECG monitoring system with remote data logging using PSoC and CyFi. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2(6), 2770–2778 (2013)
8. Rani, S.U., Ignations, A., Hari, B.V., Balavishnu, V.J.: IoT patient health monitoring system.
Indian J. Health Res. Dev. 1330–1334 (2017)
9. Tastan, M.: IoT based wearable smart health monitoring system. Celal Bayar Univ. J. Sci.
343–350 (2018)
10. Misbahuddin, S., Zubairi, J.A., Alahdul, A.R., Malik, M.A.: IoT-based ambulatory vital signs
data transfer system. Hindawi J. Comput. Netw. Commun., 8 p (2018)
11. IoT based heartbeat monitoring system using Raspberry pi website (2019). [Online]. Available
IoT-Enabled Logistics for E-waste
Management and Sustainability
P. S. Anusree and P. Balasubramanian
1 Introduction
The Organization for Economic Cooperation and Development (OECD) defines electronic waste or e-waste as “any appliance using an electric power supply that has
reached its end-of-life” [1]. Electrical and electronic equipment (EEE) such as refrigerators, washing machines, computers, television sets, laptops and smartphones that
reach the useful end is considered electronic waste or e-waste [2] and is known as
waste electrical and electronic equipment (WEEE). E-waste is a constantly growing
[3] and complex waste form [4], and its management is extremely challenging [5].
E-waste toxins cause environmental imbalances [6] including global warming [7]
and climate change [8] apart from health implications [9]. The processing of humongous quantities of e-waste [10] is a major setback for economies. This also results
in illegal trading [11] and mismanagement of e-waste [12]. Sustainable Development Goals proposed by the United Nations in 2015 strive for global environmental
sustainability [13, 14]. E-waste management has an important role in this context.
Digital revolution has led to rapid growth in e-waste generation, but today, technologically oriented solutions can ensure e-waste management. Modern computing and
smart environment have evolved a sophisticated and dynamic network of connectivity and mobility [15]. Pervasive computing infrastructure enables a ubiquitous
environment that the user can sense and control at any point [16]. The novel paradigm of the Internet of things (IoT) networks embedded devices [17] such as mobile phones, sensors, tags, actuators and radio-frequency identification (RFID) tags [18]. Electronic devices embedded with sensors would revolutionize the industry by enabling
smart management of end-of-life devices [17]. Enabling information accessibility
P. S. Anusree (B) · P. Balasubramanian
Department of Commerce and Management, School of Arts and Sciences, Amrita Vishwa
Vidyapeetham, Kochi, India
e-mail: anusree7389@gmail.com
between the stakeholders through the pervasive smart network will transform the
e-waste management scenario. The current study attempts to develop a sustainable
and systematized e-waste management procedure that can be implemented by a
well-framed network between different sectors, stakeholders and countries.
2 Literature Review
The Internet network has transformed control over objects that surround us [19].
Pervasive smart environment enables embedded devices to work collaboratively and
adapt as per needs [17, 20]. In India, close to 60% of the IoT market is captured by the industrial domain and the remainder by consumer devices [19]. In the waste management scenario, a significant contribution can come from IoT-enabled [21] identification, positioning, tracking, monitoring and management systems
[22, 23]. Economies that lack state-of-the-art infrastructure for e-waste management
activities such as collection, disposal and recycling can benefit with an integrated
global network that facilitates these functions. The best-of-two-worlds (Bo2W)
concept developed by the StEP Initiative (Solving the E-waste Problem), United
Nations [24, 25], is a collaborative approach between developed and developing
countries. This would encourage global recycling of e-waste [26], which involves
recycling infrastructure accessibility to countries that lack scientific recycling mechanism. Obsolete devices can thus be recycled to ensure a circular economy [27]
through the sustainability of resources and environment.
A study in Malaysia developed a smart e-waste collection box for the households
with sensors that promoted e-waste collectors for timely pickup [28]. A study in Italy
recommended a collaborative robot model to disassemble scrap components and
optimize recycling [29]. A study in India proposed a collaborative e-waste management system for the stakeholders using smart contracts and blockchain technology
ensuring accountability of devices and management activities including collection,
transportation and recycling [30]. In another study, the concept of virtualization, i.e., an efficient server management technique, was proposed to reduce the hazards of electronic waste [31]. A study in Sweden developed the WEEE ID project, built around sensors that enable sorting as well as grading of waste mobile phones for treatment
processes [32]. The philosophy of best-of-2-worlds fosters environmental, economic
and social aspects in e-waste management by integrating geographically distributed
technologies and making these accessible to developed and developing nations [33].
The approach was successfully tested by conducting informal initial processing of
e-waste at units in India and the end processing or smelting undertaken by European
recycling EMPA [33]. The Bo2W approach or international recycling increases the
rate of recovery of metals, reduces toxic emissions and also generates social and
economic opportunities; however, it can lead to difficulties in case of unmonitored
hazardous trade [34]. Hence, we attempt to propose a novel approach integrating
stakeholders and ensuring transparency throughout the procedure.
3 E-waste Management Network
We propose a systematic procedure to integrate the activities of collection, disposal,
recycling and related e-waste management activities enabled through a smart environment. Smart pervasive environment that connects all the stakeholders such as the
manufacturer, collection agent, recycler, customer, financing agent and government
on a common platform makes communication and processing transparent. Here, the
devices introduced in the market by the producer would carry unique identification numbers that enable tracking throughout the life of the product. At the point of sale, the customer's details are captured using Aadhaar or PAN identification [35]. An IoT-enabled application would permit real-time information accessibility to stakeholders
in the network (Fig. 1).
E-waste information—IoT-enabled sensors that integrate electronic devices through their unique numbers make information available on the established application. Consumers can easily report a device for pickup to the collecting vendor through the application, as in e-commerce portals in China [36].
The integrated application makes it possible for all stakeholders including
producer, consumer, collection agent, recycler and government authority to track
the movement of particular devices.
Detailed information regarding the availability of processes for a given e-waste product, and the routes of the necessary processes, should be available; if not, logistics information regarding the location and packaging of consignments should be made available to the producers, retailers and government.
Fig. 1 IoT application flowchart representing the information flow between the stakeholders
Fig. 2 Logistics integrated model representing the logistical view of the proposed model
A specific window for government authorities to assess conformity with applicable rules and regulations for given devices.
Finance and tax-related information within this system is not included in the current study; however, with such massive project integration, monetary flows at various stages of the operations are crucial.
Considering the operations within district, state, country and global scales, the
system needs to be audited by authorized professionals. Additional employment
will be generated at this level too (Fig. 2).
Through the application, consumers (individual and bulk) can, at the touch of a button, report the pickup status of a particular electronic device.
The collection agent and the transport facility are also included in the integrated
model. The informal networks of e-waste scavengers can thus have an improved standing.
The workers collect the devices from the household or workspace and deliver to
the common point or a collection hub. Smart collection hubs can be developed
with time as per the infrastructural capabilities of the district or state concerned.
The collected devices can then be transferred to dismantling and/or recycling
networks within state or country depending on the availability of facilities and
infrastructure. Retrieved materials and resources can be applied for further
processing into the markets.
For complex advanced treatments and processing, e-waste can be contracted
with international facilities for global recycling [34]. Thus, plant capacities that
remain unused in many parts of the world due to shortage of e-waste stock can
be utilized efficiently.
Metals, different materials and substances recovered through the processes can
be introduced to markets.
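As an illustration of the tracking idea above, a toy registry keyed by the device's unique identification number might look as follows; all class, field, and event names here are hypothetical, not part of the proposed platform.

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    uid: str                    # unique ID assigned by the producer
    owner_id: str               # Aadhaar/PAN captured at the point of sale
    events: list = field(default_factory=list)

class EwasteRegistry:
    """Shared event trail: producer, collection agent, recycler and the
    government authority all read and append to the same record."""
    def __init__(self):
        self._devices = {}

    def register(self, uid, owner_id):
        self._devices[uid] = Device(uid, owner_id)

    def record(self, uid, stage):
        # e.g. 'pickup-requested', 'collected', 'dismantled', 'recycled'
        self._devices[uid].events.append(stage)

    def trail(self, uid):
        """The transparent history every stakeholder can inspect."""
        return list(self._devices[uid].events)
```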
The model explains the logistic view of e-waste scrap in the integrated system.
The objective of such a model is to make sure that the existing e-waste industry is transformed into a professional structure; however, doubts arise here regarding the employment of lakhs of workers in the informal sector. Dedicated efforts from the
stakeholders and authorities can uplift the work-life scenarios of the workers by
integrating them in the network. The knowledge and experience of the scavengers
and the scrap workers can be transformed profitably with proper training and guidance. In the future, the traditional methods of e-waste treatment can be completely
eradicated to protect workers from occupational health hazards in dumpsites [37].
Pay-related shortcomings persist in the model; however, user pay alone is insufficient for management companies as far as solid household waste is concerned [38].
Setting aside the existing gaps in the model, it can be ascertained that by integrating stakeholders in capacities built around digital services and IoT, improvements can be attained in the area of e-waste management. The smart environment thus created can lead to efficient use of resources and sustainability.
4 Conclusion
The paper attempted to present an integrated view of the Internet of things (IoT) enabled within a combined formal–informal e-waste scenario. With the linkage of Internet services through a smart environment and the best-of-2-worlds philosophy, the paper proposed a model specifying the IoT application and the logistics view of this system.
Gaps in the model, including financial flows and administrative paths, are the limitations of the study. However, to make sustainable efforts in e-waste management, a perspective has been proposed through the model which would ultimately benefit employment creation, industry development and the welfare of society, as well as environmental sustainability.
References
1. OECD: Extended Producer Responsibility: A Guidance Manual for Governments. Organization for Economic Cooperation and Development (OECD) (2001)
2. Kumar, A., Sharma, L.: A study of e-waste management on the subject of awareness of college
students. Foreword by Secretary, 41
3. Heacock, M., Kelly, C.B., Asante, K.A., Birnbaum, L.S., Bergman, Å.L., Bruné, M.N.,Kamel,
M., et al.: E-waste and harm to vulnerable populations: a growing global problem. Environ.
Health Perspect. 124(5):550–555 (2016)
4. Widmer, R., Oswald-Krapf, H., Sinha-Khetriwal, D., Schnellmann, M., Böni, H.: Global
perspectives on e-waste. Environ. Impact Assess. Rev. 25(5), 436–458 (2005)
5. Bazargan, A., Lam, K.F., & McKay, G.: Challenges and opportunities of e-waste management.
Nova Science (2012)
6. Awasthi, A.K., Wang, M., Wang, Z., Awasthi, M.K., Li, J.: E-waste management in India: a
mini-review. Waste Manage. Res. 36(5), 408–414 (2018)
7. Devika, S.: Environmental impact of improper disposal of electronic waste. In: Recent
Advances in Space Technology Services and Climate Change 2010 (RSTS & CC-2010),
pp. 29–31. IEEE, Nov 2010
8. McAllister, L., Magee, A., Hale, B.: Women, e-waste, and technological solutions to climate
change. Health Hum. Rights J. 16(1), 166–178 (2014)
9. Seeberger, J., Grandhi, R., Kim, S.S., Mase, W.A., Reponen, T., Ho, S.M., Chen, A.: Special
report: E-waste management in the United States and public health implications. J. Environ.
Health 79(3), 8–17 (2016)
10. Tansel, B.: From electronic consumer products to e-wastes: Global outlook, waste quantities,
recycling challenges. Environ. Int. 98, 35–45 (2017)
11. Bisschop, L.: Is it all going to waste? Illegal transports of e-waste in a European trade hub.
Crime, Law Soc. Change 58(3), 221–249 (2012)
12. Awasthi, A.K., Zeng, X., Li, J.: Environmental pollution of electronic waste recycling in India:
a critical review. Environ. Pollut. 211, 259–270 (2016)
13. UN: Transforming our world: the 2030 agenda for sustainable development, United Nations
(2015). Available at https://sustainabledevelopment.un.org/content/documents/21252030%
20Agenda%20for%20Sustainable%20Development%20web.pdf. Accessed 12 Nov 2016.
14. Annan-Diab, F., Molinari, C.: Interdisciplinarity: practical approach to advancing education
for sustainability and for the sustainable development goals. Int. J. Manag. Educ. 15(2), 73–83
15. Malatras, A.: State-of-the-art survey on P2P overlay networks in pervasive computing
environments. J. Netw. Comput. Appl. 55, 1–23 (2015)
16. Satyanarayanan, M.: Pervasive computing: vision and challenges. IEEE Pers. Commun. 8(4),
10–17 (2001)
17. Mukhopadhyay, S.C., Suryadevara, N.K.: Internet of things: challenges and opportunities. In:
Internet of Things. Springer, Cham, pp. 1–17 (2014)
18. Atzori, L., Iera, A., Morabito, G.: The internet of things: a survey. Comput. Netw. 54(15),
2787–2805 (2010)
19. Thapliyal, R., Patel, R.K., Yadav, A.K., Singh, A.: Internet of things for smart environment
and integrated ecosystem. Int. J. Eng. Technol. 7(3.12), 1219–1221 (2018)
20. Ahmed, E., Yaqoob, I., Gani, A., Imran, M., Guizani, M.: Internet-of-things-based smart environments: state of the art, taxonomy, and open research challenges. IEEE Wirel. Commun.
23(5), 10–16 (2016)
21. Saha, H.N., Auddy, S., Pal, S., Kumar, S., Pandey, S., Singh, R., Saha, S.: Waste management using Internet of Things (IoT). In: 2017 8th Annual Industrial Automation and
Electromechanical Engineering Conference (IEMECON), pp. 359–363. IEEE, Aug 2017
22. Ashton, K.: That ‘Internet of Things’ thing. RFid J. 97–114 (2009)
23. Vimal Jerald, A., Rabara, S.A., Bai, T.D.P.: Internet of things (IoT) based smart environment
integrating various business applications. Int. J. Comput. Appl. 128(8), 32–37 (2015)
24. Wang, F., Huisman, J., Meskers, C.E., Schluep, M., Stevels, A., Hagelüken, C.: The Bestof-2-Worlds philosophy: developing local dismantling and global infrastructure network for
sustainable e-waste treatment in emerging economies. Waste Manag. 32(11), 2134–2146. In:
Krüger, C. (ed.) E-Waste Recycling in India–Bridging the Gap Between the Informal and
Formal Sector (2012)
25. Buchert, M., Manhart, A., Mehlhart, G., Degreif, S., Bleher, D., Schleicher, T. Kummer, T.:
Transition to sound recycling of e-waste and car waste in developing countries (2016)
26. Kuehr, R., Wang, F.: Rich and poor nations can link up to recycle e-waste (web). Retrieved 30
Jan 2010, from https://www.scidev.net/
27. Park, J., Sarkis, J., Wu, Z.: Creating integrated business and environmental value within the
context of China’s circular economy and ecological modernization. J. Clean. Prod. 18(15),
1494–1501 (2010)
28. Kang, K.D., Kang, H., Ilankoon, I.M.S.K., Chong, C.Y.: Electronic waste collection systems
using Internet of Things (IoT): household electronic waste management in Malaysia. J. Clean.
Prod. 252, 119801 (2020)
29. Alvarez-de-los-Mozos, E., Renteria, A.: Collaborative robots in e-waste management. Procedia
Manuf. 11, 55–62 (2017)
30. Gupta, N., Bedi, P.: E-waste management using blockchain based smart contracts. In: 2018
International Conference on Advances in Computing, Communications and Informatics
(ICACCI), pp. 915–921. IEEE Sept 2018
31. Krishnan, S.S.R., Balasubramanian, K., Mudireddy, S.R.: Design/implementation of a novel
technique in virtualization to reduce e-waste. Int. J. Adv. Res. Comput. Commun. Eng 2, 12
32. Barletta, I., Johansson, B., Cullbrand, K., Björkman, M., Reimers, J.: Fostering sustainable
electronic waste management through intelligent sorting equipment. In: 2015 IEEE International Conference on Automation Science and Engineering (CASE), pp. 459–461. IEEE, Aug
33. Wang, F., Huisman, J., Meskers, C.E., Schluep, M., Stevels, A., Hagelüken, C.: The Bestof-2-Worlds philosophy: developing local dismantling and global infrastructure network for
sustainable e-waste treatment in emerging economies. Waste Manage. 32(11), 2134–2146
34. Manhart, A.: International cooperation for metal recycling from waste electrical and electronic
equipment: an assessment of the “Best-of-Two-Worlds” approach. J. Ind. Ecol. 15(1), 13–30
35. Tiwar, D., Raghupathy, L., Khan, A.S.: Evolving and e-waste tracking system for India to
mitigate the problems of e-waste management. Int. J. Res. Appl. Sci. Eng. Technol. (IJRASET)
6(1), 1865–1873 (2018)
36. Zhang, B., Du, Z., Wang, B., Wang, Z.: Motivation and challenges for e-commerce in e-waste
recycling under “Big data” context: a perspective from household willingness in China. Technol.
Forecast. Soc. Chang. 144, 436–444 (2019)
37. Annamalai, J.: Occupational health hazards related to informal recycling of e-waste in India:
an overview. Indian J. Occupat. Environ. Med. 19(1), 61 (2015)
38. Boateng, K.S., Agyei-Baffour, P., Boateng, D., Rockson, G.N.K., Mensah, K.A., Edusei, A.K.:
Household willingness-to-pay for improved solid waste management services in four major
metropolitan cities in Ghana. J. Environ. Publ. Health (2019)
Navigation Through Proxy Measurement
of Location by Surface Detection
G. Savitha, Adarsh, Aditya Raj, Gaurav Gupta, and Ashik A. Jain
1 Introduction
Several recent advancements have been observed in the field of mobile robots and
autonomous vehicles. Most of these systems rely on external sources like global positioning system (GPS) for navigation. This increases the costs as well as complexity
of such systems. Moreover, moving robots quite often lose their balance and fall, which damages their parts and increases maintenance expenditure. To deal with these issues, we attempt to develop a moving system that is capable of detecting the type of surface on which it is moving. The detected surface type can be used as a substitute for a measurement of location. Since the surface is detected through inertial measurements, the robot obtains an optimum range of values for acceleration and speed at which it can navigate safely without falling. The system mainly comprises a Raspberry Pi processing device, an IMU sensor and a moving cart. In the following sections, the approaches used for the development are explained in detail.
2 Related Works
Thrun et al. [1] proposed a system for indoor navigation of mobile robots.
The system used grid-based maps learned using artificial neural networks (ANNs)
and Bayesian integration along with topological maps built on top of it. Feder et al.
[2] presented a technique of adaptive mobile robot navigation and mapping which explains how to perform concurrent mapping and localization using sonar.
G. Savitha · Adarsh (B) · A. Raj · G. Gupta · A. A. Jain
Department of Computer Science and Engineering, B.N.M.I.T. (Affiliated To VTU), Bengaluru,
e-mail: adarshs169@gmail.com
G. Savitha et al.
Benn
et al. [3] presented a monocular image processing-based algorithm for detecting and
avoiding obstacles. The authors used a color segmentation technique against the selected plane of the floor to detect collisions. Balch et al. [4] presented an avoiding
past strategy which is implemented using a spatial memory within a schema-based
motor control memory which produced promising results in simulation and with
mobile robots. Conner et al. [5] proposed a methodology for composition of local
potential functions for global control and navigation such that the resulting composition completely solves the navigation and control problem for the given system
operating in a constrained environment. Walas et al. [6] proposed a walking robot
controller for negotiation of terrains of terrains with different traction characteristics. The paper presented the discrete event systems (DES) for the specification of the
robot controller. Kertesz et al. [7] proposed a technique to fuse four different types
of sensors to distinguish six indoor surface types. Random forest (RF) was used for
analyzing machine learning aspects. Rimon et al. [8] proposed a technique for exact
robot motion planning and control using artificial potential function that connects
the kinematic planning problem with the dynamic execution problem in a provably
correct fashion. Brooks et al. [9] proposed an approach to detect geometric hazards
by using self-supervised classification. The accuracy was also assessed for different
data sets with traditional trained classification. Lomio et al. [10] proposed a methodology for indoor surface detection by mobile robots using inertial measurements.
The paper presented several time series classification techniques for classifying the
floor types based on the IMU sensor dataset. Feature extraction techniques proposed
by Savitha et al. [11] provide efficient ways of selecting the required features from a given dataset for offline signature verification. The work describes the use of singular value decomposition for feature selection. Savitha et al. [12] proposed a multimodal biometric fusion methodology which takes face and fingerprint, biometric traits that are mutually independent, as inputs for security purposes. The system proposed a linear discriminant regression classification (LDRC) algorithm. Savitha et al. [13] presented a system which focuses on developing a multimodal biometric authentication system in order to improve the recognition process.
3 Methodology
The work presented here involves five main stages: data collection, data preprocessing, feature extraction, model selection, and performance evaluation. Each of these stages is covered in the upcoming sections.
3.1 Data Collection
The initial step is to collect the time series data of inertial measurements from
the inertial measurement units (IMU) sensors. An IMU is an electronic device that
measures and reports orientation, velocity, and gravitational forces through the use of
accelerometers and gyroscopes and often magnetometers. There are several variants
of IMU sensors available in the market, but for the work here, the MPU 6050 is used. The sensor is interfaced with a Raspberry Pi mounted on a moving wheeled robot. Again, there are several variants of Raspberry Pi available, and for the work described here, model 3B+ is used. The robot is then moved on different kinds of floors to collect the dataset. Here, data is collected for three different surfaces: a mosaic surface (which can be considered a smooth surface and resembles the texture of other similar surfaces like wood, tiles, and marble), pavement, and a moderately rough surface like a concrete floor, as shown in Fig. 1. The sensor collects acceleration, velocity, and orientation data along the x-axis, y-axis, and z-axis, providing a total of six degrees of freedom.
For interfacing the IMU sensor with the Raspberry Pi, the I2C communication protocol must first be enabled on the Raspberry Pi. This can be achieved by typing the command raspi-config in the Raspberry Pi terminal, followed by navigating to ‘Advanced Options’, from where I2C can be enabled. Once the protocol has been enabled, the IMU can be interfaced with the Raspberry Pi board with the pin connections shown in Fig. 2.
Fig. 1 Types of surfaces
G. Savitha et al.
Fig. 2 Interfacing MPU
6050 with raspberry pi
Fig. 3 Data rows of different surfaces
Once a proper connection has been established, the Raspberry Pi can be coded to collect data from the sensor over the surfaces described earlier. Data collected from the sensor is stored in a .csv file for further preprocessing.
For the work here, over 1000 data rows are collected for each surface for better training of the model, as shown in Fig. 3.
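A minimal sketch of reading one six-degree-of-freedom sample from the MPU 6050 over I2C. The register addresses follow the standard MPU-6050 register map; the `bus` object is assumed to come from an I2C library such as smbus2 (an assumption, since the paper does not name the library used):

```python
# Hedged sketch: register addresses are the standard MPU-6050 ones; the `bus`
# object is assumed to expose read_byte_data (e.g. smbus2.SMBus(1)).
MPU_ADDR = 0x68       # default I2C address of the MPU 6050
PWR_MGMT_1 = 0x6B     # power management register (write 0 here to wake the chip)

# High-byte registers of accel x/y/z and gyro x/y/z (temperature regs skipped).
SAMPLE_REGS = [0x3B, 0x3D, 0x3F, 0x43, 0x45, 0x47]

def read_word(bus, reg):
    """Read a signed 16-bit big-endian value from two consecutive registers."""
    high = bus.read_byte_data(MPU_ADDR, reg)
    low = bus.read_byte_data(MPU_ADDR, reg + 1)
    value = (high << 8) | low
    return value - 65536 if value >= 0x8000 else value

def read_sample(bus):
    """Return one 6-DOF sample: [ax, ay, az, gx, gy, gz]."""
    return [read_word(bus, r) for r in SAMPLE_REGS]
```

On the robot, each sample row would then be appended to the .csv file mentioned above (e.g. with Python's csv module) after waking the sensor by writing 0 to PWR_MGMT_1.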
3.2 Data Preprocessing
Data preprocessing is a data mining technique used to transform raw data into a useful and efficient format. Each feature row is first mapped to a target value bearing the name of the floor. This is followed by encoding the target values. Since the target value is the name of the floor type, which is non-numeric categorical data, it needs to be encoded into numeric form so that it can be used for performing the prediction. The dataset is then checked for missing values. If any missing
Fig. 4 Correlation matrix visualization
value is present, it is substituted with the mean of the values in the case of feature data and the median in the case of categorical target values. Hence, a preprocessed set of data suitable for data analysis is obtained. This data is further used for building the classification model.
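The preprocessing steps above (mapping rows to a floor label, encoding the categorical target, and imputing missing values) can be sketched with pandas; the column names and values here are illustrative assumptions, not the paper's actual data:

```python
import pandas as pd

# Illustrative IMU feature rows mapped to a categorical floor label (assumed names).
df = pd.DataFrame({
    "acc_x": [0.11, None, 0.13, 0.12],
    "gyro_z": [1.2, 1.1, None, 1.3],
    "floor": ["mosaic", "pavement", "mosaic", "concrete"],
})

# Encode the non-numeric target into numeric form.
df["floor_code"] = df["floor"].astype("category").cat.codes

# Substitute missing feature values with the column mean.
feature_cols = ["acc_x", "gyro_z"]
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].mean())
```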
3.3 Feature Extraction
Feature extraction starts from an initial set of measured data and builds derived values
intended to be informative and non-redundant, facilitating the subsequent learning
steps, and in some cases leading to better human interpretations. The features which
are useful for training the model are selected, and the remaining ones are eliminated.
The correlation matrix is constructed for the set of features and the target values.
The features having high correlation between them are considered, and among those features, the ones having lower correlation with the target variable are eliminated. The correlation matrix is constructed using the pandas library. The orientation variables of
all axes are eliminated in this process and the remaining ones are selected. Figure 4
shows the visual representation of the correlation matrix.
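The elimination rule described above (among a highly correlated pair of features, drop the one that correlates less with the target) can be sketched as follows; the 0.9 threshold, feature names, and synthetic data are assumptions for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
acc = rng.normal(size=200)
df = pd.DataFrame({
    "acc_x": acc,
    "acc_x_dup": acc + rng.normal(scale=0.01, size=200),  # near-duplicate feature
    "gyro_z": rng.normal(size=200),
    "target": (acc > 0).astype(int),
})

corr = df.corr().abs()
features = ["acc_x", "acc_x_dup", "gyro_z"]
to_drop = set()
for i, a in enumerate(features):
    for b in features[i + 1:]:
        if corr.loc[a, b] > 0.9:  # highly correlated pair found
            # keep the one with the stronger correlation to the target
            to_drop.add(a if corr.loc[a, "target"] < corr.loc[b, "target"] else b)

selected = [f for f in features if f not in to_drop]
```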
3.4 Model Selection
There are several algorithms which can be used for classifying a given set of data.
The process of selecting the best algorithm and developing the model which best fits
the data set is termed here as model selection. The train test split technique is used
to divide the dataset into two parts namely the training set and test set. The model is
trained on the training set and validated against the test set.
The other cross-validation strategy used to select the model is the K-fold cross-validation technique. In this methodology, a value K, the total number of divisions of the dataset, is fed to the system. One such division is taken as the test set and the remaining ones as the training set. This process is iterated until every division has served as the test set, yielding one accuracy value per iteration. The mean accuracy over all iterations is then taken as the performance of the model.
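Both validation strategies can be sketched with scikit-learn; the synthetic six-feature dataset and K = 5 are assumptions (the paper does not state its K):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))                 # six IMU-style features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # synthetic floor label

# Strategy 1: hold-out train/test split, then validate on the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
holdout_acc = RandomForestClassifier(random_state=1).fit(X_tr, y_tr).score(X_te, y_te)

# Strategy 2: K-fold cross-validation; the mean over the K folds is reported.
scores = cross_val_score(RandomForestClassifier(random_state=1), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=1))
kfold_acc = scores.mean()
```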
3.5 Performance Evaluation
The algorithms used to build the classification model are random forest classification
and K nearest neighbors classifier. Random forest consists of a large number of
individual decision trees that operate as an ensemble. Each individual tree in the
random forest produces a class prediction, and the class with the most votes becomes the model’s prediction. The K nearest neighbors classifier works on the principle that similar
data points exist close to each other. The K-nearest data points to the test data point
are considered for prediction of the class.
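Fitting both classifiers and extracting a weighted-average score can be sketched as follows; the synthetic three-class data stands in for the three floor types and is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 3-class stand-in for the three floor types.
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_classes=3, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

results = {}
for name, clf in [("knn", KNeighborsClassifier(n_neighbors=5)),
                  ("rf", RandomForestClassifier(random_state=2))]:
    clf.fit(X_tr, y_tr)
    report = classification_report(y_te, clf.predict(X_te), output_dict=True)
    results[name] = report["weighted avg"]["f1-score"]  # weighted-average F1
```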
Random forest model has performed better than the K nearest neighbors model.
The accuracy of KNN is around 50% with the K-fold cross-validation technique and around 58% with the train test split. In contrast, the accuracy of random forest is around 66% with the train test split and 50% with K-fold CV.
In the classification reports, the weighted average score is 0.59 for K-nearest neighbors and 0.67 for random forest.
The accuracy of KNN in case of train test split is 58% and in case of K-Fold cross
validation is 49.20%. The model is developed for feeding it into the raspberry pi
for performing floor detection. The RAM available on the Raspberry Pi is 1 GB SDRAM, and it also has relatively low processing power compared to PCs or other computing devices. Since the Raspberry Pi has limited processing power, it is not feasible to deploy a highly complex model like random forest to perform the classification. The time complexity for building a complete unpruned decision tree is O(V * n log(n)), where V is the number of variables or features and n is the number of data points; a random forest consists of several such decision trees. In contrast, the time complexity of the KNN algorithm is O(m * n), where n is the number of training examples and m is the number of dimensions in the training set. For simplicity, assuming n ≫ m, the complexity of the brute-force nearest neighbor search is O(n). Hence, both K nearest neighbors and random forest are suitable models for performing the classification of floors based on the data collected by the IMU sensors. The system is thus able to detect the floor quite accurately and efficiently.
4 Conclusion and Future Enhancements
The work presented here suggests an ambient method to make robots more intelligent by providing them with information about the environment, particularly about the type of floor on which they are moving. The floor detection can be used for navigation on indoor as well as outdoor surfaces. Moreover, since the type of floor is inferred from the inertial measurements, the robots can be enhanced by embedding actuators that adjust their speed and acceleration based on the type of floor. This will ameliorate the performance of mobile robots and protect them from damage due to falling during navigation. Further, having an idea of the type of floor also helps the robots navigate safely and protects them from external threats.
The proposed system uses a raspberry pi computing device which is compact and
suitable for smaller mobile robots. In contrast, earlier technologies used a large trolley along with a desktop computer system for data collection. This has significantly improved the mobility of the device. Moreover, the accuracy of classification is also nearly the same, i.e., 66% for our proposed model versus 68% in
case of previous systems. Further, the computational requirements have also been reduced to a great extent in our system. Instead of performing edge computing on the mobile system, cloud-based computing can also be used to further enhance the accuracy of the prediction. A state-of-the-art classification model can be run on a high-performance device such as a GPU to predict the floor type, and the result can be fed to the system using the WiFi connectivity available on the Raspberry Pi. Thus, the input data and the predicted result can be shared between the system and the cloud in a much better way. Although this technique will increase the cost and introduce some delay due to data transfer, the accuracy of the results will be enhanced to a great extent.
References

1. Thrun, S.: Learning metric-topological maps for indoor mobile robot navigation. Artif. Intell. (1998)
2. Feder, H.J.S., Leonard, J.J., Smith, C.M.: Adaptive mobile robot navigation and mapping. Int. J. Robot. Res. (1999)
3. Benn, W., Lauria, S.: Robot navigation control based on monocular images: an image processing algorithm for obstacle avoidance decisions. Math. Probl. Eng. 2012, Article ID 240476, 14 p. (2012). https://doi.org/10.1155/2012/240476
4. Balch, T., Arkin, R.: Avoiding the past: a simple but effective strategy for reactive navigation. In: Proceedings of IEEE (1993)
5. Conner, D.C., Rizzi, A.A., Choset, H.: Composition of local potential functions for global robot control and navigation. In: Proceedings of the IEEE 2003 Conference on Intelligent Robots and Systems (2003)
6. Walas, K.: Terrain classification and negotiation with a walking robot. Springer Science+Business Media, Dordrecht, vol. 78, pp. 401–423 (2015)
7. Kertesz, C.: Rigidity-based surface recognition for a domestic legged robot. IEEE Robot. Autom. Lett. (2016)
8. Rimon, E., Koditschek, D.E.: Exact robot navigation using artificial potential functions (1992)
9. Brooks, C.A., Iagnemma, K.: Self-supervised terrain classification for planetary surface exploration rovers. J. Field Robot. 29(3), 445–468 (2012)
10. Lomio, F., Skenderi, E., Mohamadi, D., Collin, J., Ghabcheloo, R., Huttunen, H.: Surface type classification for autonomous robot indoor navigation. arXiv (2019)
11. Savitha, G., Vibha, L., Venugopal, K.R., Patnaik, L.M.: Textural and singular value decomposition feature extraction technique for offline signature verification. Int. J. Inf. Process. 8(3), 95–105 (2014). ISSN 0973-8215
12. Savitha, G., Vibha, L., Venugopal, K.R.: Multimodal biometric authentication system using LDR based on selective small reconstruction error. J. Theoret. Appl. Inf. Technol. 92(1) (2016). ISSN 1992-8645
13. Savitha, G., Vibha, L., Venugopal, K.R.: Multimodal cumulative class specific linear discriminant regression for cloud security. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 15(2) (2017)
Unsupervised Learning Algorithms
for Hydropower’s Sensor Data
Ajeet Rai
1 Introduction
The maintenance of different machines in hydropower plant is one of the crucial
tasks for the engineers. It is very difficult to monitor each machine manually. Such
types of problems motivate the application of data science in the energy sector. Instead of monitoring manually, we can develop a system that uses data to keep track of the performance of the machines. The data generated by a machine’s sensors behaves consistently if there are no issues with the machine, but abnormal sensor data indicates that engineers have to check whether the machine is working well. There are many techniques to solve such problems using data, and one of
them is anomaly detection. The abnormal data can be treated as outliers, and the
system should be able to detect these outliers after training on the past data. So,
anomaly detection techniques are well suited for such kind of problems. The terms
anomalies and outliers are used interchangeably. Mathematically, data points that are
far away from the mean or median of data can be treated as outliers or anomalies.
Sometimes human error, instrument error, sampling error, novelties, etc., can also
be a reason for anomalies in data. Machine learning approaches like supervised and
unsupervised learning are used to detect anomalies. Supervised algorithms such as support vector machines, k-nearest neighbors, Bayesian networks, and decision trees, as well as unsupervised algorithms such as self-organizing maps (SOM), K-means, C-means, expectation–maximization (EM), and one-class support vector machines, are used for anomaly detection. Methods based on statistics, clustering, distance, and density have also been suggested by researchers. Statistical methods for anomaly detection have been categorized into distribution-based methods, and depth-based methods are also very
A. Rai (B)
iServeU Technology, Bhubaneswar, India
e-mail: ajeetrai2293@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
A. Rai
effective. Parametric Z-score models and non-parametric proximity-based models have also been suggested for anomaly detection.
2 Literature Review
In literature, the depth-based method has been implemented by Ruts and Rousseeuw
[1]. Different clustering methods such as DBSCAN, K-means, etc. have been implemented for anomaly detection [2]. Various statistical, machine learning, and deep
learning methods have been proposed to detect anomalies in data [3, 4]. Netflix used
principal component analysis (PCA) for anomaly detection [5]. There are many other
deep learning models such as LSTMs, RNNs, and DNNs that are implemented for
anomaly detection [6]. Another perspective of detecting anomalies is in time-series
data. Data containing time variables is treated differently from other data. Statistical methods such as seasonal and trend decomposition using Loess (STL) along with the interquartile range (IQR), and a method based on the generalized extreme studentized deviate test (GESD), have also been used to detect anomalies in time-series data. In particular,
for the application in hydropower plants, Ahmed et al. used the minimum spanning
tree approximated distance measures as an anomaly detection method [7]. Also, Liu
et al. used SCADA data mining technique for anomaly detection in wind turbines
[8]. CRISP-DM data mining was used by Galloway et al. for the condition monitoring of
tidal turbines [9].
3 Methodology
Our objective is to build a model that is able to detect anomalies in data. We tried many of the techniques discussed above and present a few of them here: machine learning models such as the one-class support vector machine (SVM) and isolation forest. For our study, the dataset used is taken from the thermal power plant’s turbine machine; the data was collected from the turbine through a sensor. The raw data was very messy and noisy, so a data cleansing step was performed first. The timestamp feature and the corresponding sensor values were extracted from the data, giving 35,040 observations. Before implementing the models, data scaling was necessary, so we used standardization. Although there is data from many sensors in the dataset, only one sensor is used in this paper, because the results will be valid for all the sensors. The plot below depicts the behavior of one sensor over the period of one year (Fig. 1).
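The standardization step mentioned above can be sketched with scikit-learn; the sensor values here are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One sensor's readings as a column vector, the shape scikit-learn expects.
values = np.array([18.2, 19.1, 18.7, 45.0, 18.9]).reshape(-1, 1)  # 45.0: an outlier
scaled = StandardScaler().fit_transform(values)
# After standardization the column has zero mean and unit variance,
# but the outlier remains the extreme value.
```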
Fig. 1 Time series plot for the turbine’s sensor data
3.1 One Class Support Vector Machine
Researchers have developed many machine learning models such as tree-based,
linear, neural network and kernel models, etc. Depending on the data, different models
can be used in different scenarios. One such kernel-based model is support vector
machine (SVM) which is useful when data is not linear. SVM models can outperform
the other models when there are no clear decision boundaries in the class label. There
are two kernel-based methods: the support vector machine (SVM), used for classification problems, and support vector regression (SVR), used for regression problems. A modified version of SVM, known as one-class SVM, is used for solving unsupervised problems. In classification problems, if the data is not balanced and the ratio between the class labels is very large, then the anomaly detection approach is most suitable. A one-class SVM model is trained on normal data only, so that unseen data that is very different from the normal data points is classified as anomalous.
Consider the feature matrix X and the class labels Y, forming the pairs
(X, Y ) = {(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn )}
where xi is the input data and yi the output label,
yi ∈ {0, 1}
We define the hyperplane
Fig. 2 Anomalies in turbine’s sensor data using one class SVM
wT x + b = 0
where wT is the weight vector, which belongs to a higher-dimensional space, and x is a data point,
such that the margin between the two classes is maximum. That is, we have an objective function that can be minimized using any optimization method. The hyperplane separates the classes in such a way that the data points in each region are homogeneous; its position is fixed by solving the objective function so that the margin is maximum. A kernel function is then used to map the data points into a higher-dimensional space, where the abnormal data points can be identified as anomalies.
In the practical implementation, we first build a model with default parameters (kernel, gamma, nu, etc.). After optimizing the parameters, the model classified 3388 data points as anomalies and the remaining 30,880 data points as normal (Fig. 2).
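A minimal one-class SVM sketch with scikit-learn, trained on synthetic "normal" sensor readings; the kernel, gamma, and nu values are illustrative defaults, not the paper's tuned parameters:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 1))  # normal sensor behaviour

# Train on normal data only; nu bounds the fraction of training outliers.
svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal)

# predict() returns +1 for normal points and -1 for anomalies.
labels = svm.predict([[0.0], [8.0]])
```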
3.2 Isolation Forest
Isolation forest is another unsupervised machine learning algorithm for anomaly detection. Most anomaly detection techniques, as we saw with one-class SVM, focus on the normal data points and treat points dissimilar from the normal ones as outliers. The isolation forest algorithm instead focuses on the abnormal data points and treats the others as normal. The algorithm assumes that anomalies are few and different from the normal data points, which makes it easy to isolate the abnormal points. In isolation forest, the data points are divided recursively into partitions, which can be viewed as trees. The number of partitions required to isolate a data point can be treated as its path depth. Anomalous data points require fewer partitions, whereas normal data points require more. This makes the algorithm faster than many
Fig. 3 Anomalies in turbine’s sensor data using isolation forest
other techniques. In isolation forest, the equation below is used to decide whether a data point is anomalous or not:

s(x, n) = 2^(−E(h(x))/c(n))

where h(x) is the path depth to the data point, E(h(x)) is its average over the trees, c(n) is the average path depth of an unsuccessful search, and n is the number of external nodes.
If the above equation gives a score close to 1, the point is anomalous, whereas a score much smaller than 0.5 indicates a normal data point. If all the scores are close to 0.5, the data does not seem to contain outliers.
We performed the same data preparation steps as for one-class SVM. After optimizing the model, we obtained 10,132 anomalous data points (Fig. 3).
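An isolation forest sketch in the same style; the injected anomalies at a mean of 10 and the contamination value are assumptions for illustration. `score_samples` is negated to recover the 0-to-1 score convention of the equation above:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)
X = np.concatenate([rng.normal(size=(990, 1)),            # normal readings
                    rng.normal(loc=10.0, size=(10, 1))])  # injected anomalies

forest = IsolationForest(contamination=0.01, random_state=4).fit(X)
labels = forest.predict(X)            # -1 = anomaly, +1 = normal

# score_samples returns the negated anomaly score; negate it back so that
# values near 1 mean anomalous, matching the s(x, n) convention.
scores = -forest.score_samples(X)
```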
4 Conclusion
Decisions about anomalies are subject to the domain: different domains attach different meanings to anomalies. The plots have shown that one-class SVM and isolation forest are able to classify anomalies, subject to parameter tuning. The anomalous data points marked in red are correctly identified as anomalies by both methods. However, one-class SVM raises fewer false alarms than isolation forest, which identified more normal data points as anomalies. We therefore conclude that one-class SVM is best suited for our work. In particular, for the application in hydropower, these methods will help engineers monitor the performance of sensors in the turbine. The same technique can be used not only for the turbine but for other machines as well.
References
1. Ruts, I., Rousseeuw, P.: Computing depth contours of bivariate point clouds. Comput. Stat. Data
Anal. 23, 153–168 (1996). https://doi.org/10.1016/S0167-9473(96)00027-8
2. Hardin, J., Rocke, D.: Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Comput. Stat. Data Anal. 44, 625–638 (2004). https://doi.org/10.
3. The Science of Anomaly Detection. Numenta, Redwood City (2015)
4. Das, K.: Detecting patterns of anomalies, 174 (2009)
5. RAD—Outlier Detection on Big Data, 10 Feb 2015. Available https://medium.com/netflix-tec
6. Vallis, O.S., Hochenbaum, J., Kejariwal, A.: A novel technique for long-term anomaly detection
in the cloud. Twitter Inc. (2014)
7. Ahmed, I., Dagnino, A., Ding, Y.: Unsupervised anomaly detection based on minimum spanning
tree approximated distance measures and its application to hydropower turbines. IEEE Trans.
Automat. Sci. Eng. 1–14 (2018). https://doi.org/10.1109/TASE.2018.2848198
8. Liu, X., Lu, S., Ren, Y., Wu, Z.: Wind turbine anomaly detection based on SCADA data mining.
Electronics 9, 751 (2020). https://doi.org/10.3390/electronics9050751
9. Galloway, G.S., Catterson, V.M., Love, C., Robb, A.: Anomaly detection techniques for the
condition monitoring of tidal turbines. In: PHM 2014—Proceedings of the Annual Conference
of the Prognostics and Health Management Society 2014, pp. 713–724 (2014)
Feature Construction Through Inductive
Transfer Learning in Computer Vision
Suman Roy and S. Saravana Kumar
1 Introduction
The concept of deep learning was ideated by Rina Dechter in 1986 [1] and, for Boolean threshold neurons, by Igor Aizenberg and colleagues in 2000. In 1971, a paper described a deep network with eight layers trained by the group method of data handling algorithm [2]. Deep learning designed for computer vision began with the Neocognitron, introduced by Kunihiko Fukushima in 1980 [3]. In 1989, Yann LeCun et al. applied the standard backpropagation algorithm to a deep neural network to recognize handwritten ZIP codes on mail; the algorithm took 3 days to train [4]. Currently, transfer learning is evolving into one of the main research areas among researchers; the use of neural networks for transfer learning dates back to 1996 [5].
This paper is organized in the following way. Section 2 discusses deep learning. Section 3 discusses transfer learning and its types: inductive, transductive, and unsupervised transfer learning. Section 4 discusses the target dataset, the Places database, whose very large collection of images serves as the base dataset for other place datasets with different characteristics; ImageNet and its scaled-down versions are discussed similarly. Section 5 presents the overall approach, whose goal is to transfer the model; transfer learning is one of the fastest-evolving fields for researchers in the area of machine
S. Roy
Department of Computer Science, CMR University, Bangalore, India
e-mail: suman.16phd@cmr.edu.in
S. Saravana Kumar (B)
Professor, Department of Computer Science and Engineering, CMR University / iNurture IT
Vertical, Bangalore, India
e-mail: saravanakumars81@gmail.com
S. Roy and S. Saravana Kumar
learning. Section 6 presents the results; in the experiment, the sigmoid function is used as the final classifier. Section 7 gives conclusions and future scope, and discusses the effects of data size and layers and their impact on transfer learning.
2 Deep Learning
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input–output pairs from labeled training data, as described in the foundations of machine learning [6]. In supervised learning, each example is a pair consisting of input data and the desired output value. The details of neural networks [7], and how neural networks are used and applied for intelligent systems, play a role in determining
and application of neural network for intelligent system play a role to determine
the learning-based systems. In the field of deep learning, we have Convolutional
Neural Network [8] (CNN), Artificial Neural Networks (ANN) [9], Recurrent Neural
Network (RNN) [10], Deep Belief Network (DBN) [11], Autoencoder [12, 13],
Generative Adversarial Networks (GANs) [14], Self-Organizing Map (SOM) [15],
and Boltzmann machine [16].
3 Transfer Learning
Transfer learning is a research topic in machine learning which looks at storing knowledge obtained by solving a problem and applying it to a different but related domain. Building a model from scratch is time consuming and costly, depending on the availability of hardware and software. By using transfer learning, one can see significant improvements in performance and computational accuracy. Transfer learning [17, 18] is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned. While most machine learning algorithms are designed to deal with single tasks, the development of algorithms that facilitate transfer learning is a topic of ongoing interest within the machine learning community. There are three types of transfer learning: inductive [19], transductive [20], and unsupervised [20] transfer learning.
The inductive transfer learning settings considered in this paper are as follows:
• Labeled data is not available in the source domain but is available in the target domain; this is known as self-taught learning.
• Labeled data is available in both the source and target domains, so the source and target tasks are learned simultaneously; this is known as multi-task learning.
4 Target Dataset—Place Dataset
The Places database [21] contains 205 scene categories with 2.5 million images, each carrying a category label. Using neural networks, one can learn deep scene recognition tasks and establish new state-of-the-art performance on scene-centric benchmarks. In this study, Places365-Standard is further considered for the experiment. To construct the Places database, one follows these steps: querying and then downloading the images, labeling the images with respect to categories, and then scaling up the dataset using a classifier and further separating it based on the same class categories. Once the images are downloaded, one has to categorize them with respect to places, scenes, or image-level pixels, and then label each image into either the positive or negative image set by cleaning them according to their categories; the category here is the deciding factor in choosing a particular image set.
5 Overall Approach
In this paper, we have used the pre-trained weights of ImageNet [22]. After that, the base dataset and target dataset sizes were varied to check how transfer learning affects the overall performance. If the target dataset is different from the base dataset, then training the whole neural network would be time consuming, but using a pre-trained network will certainly reduce the overall training time.
For inductive transfer learning, there are four different approaches: (1) instance transfer, (2) feature representation transfer, (3) model transfer, and (4) relational knowledge transfer. For our experiment, we are using only model transfer.
5.1 Data Pre-processing
As part of data pre-processing, the following subsets are used: from the Places205 database [21], with 2.5 million images from 205 scene categories, 100 images per category for the validation set and 200 images per category for the test set serve as the base dataset; Places365 [23] is used as the target dataset. In this paper, after examining the Places205 dataset and ImageNet, we work on subsets of the data rather than the full datasets, using different subsets of the base and target datasets.
5.2 Model Transfer
To create the model from which the features will be transferred, we first train the neural network on the source dataset; the weights and parameters of the source model are stored in either TensorFlow [24] or Keras [25] format. In the beginning, to obtain a baseline, transfer learning is not used, as the model itself is trained on the selected training set. Once the model is tuned enough, all the features of the model are applied to the source task, where the attributes play a major role, and we continue with backpropagation on the newly created features. Inception V3 has 48 layers, whereas RESNET50 has 50 and VGG16 has 16 layers. The transfer of features is completed one layer at a time. Within the process, parameters are not updated by gradient descent, as our goal is to transfer the model by looking into the total, trainable, and non-trainable parameters.
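The frozen-parameter idea in the paragraph above can be illustrated with a tiny plain-numpy sketch: a "transferred" base layer stays fixed while gradient descent updates only the new head. This is an illustrative toy under assumed shapes and synthetic data, not the paper's actual Keras setup:

```python
import numpy as np

rng = np.random.default_rng(5)

# Pretend these weights were transferred from a pre-trained source model: frozen.
W_base = rng.normal(size=(4, 8))
W_base_before = W_base.copy()

w_head = np.zeros(8)                         # only the new head is trainable
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)              # toy binary target

def forward(X, W_base, w_head):
    h = np.maximum(X @ W_base, 0.0)          # frozen feature extractor (ReLU)
    return 1.0 / (1.0 + np.exp(-(h @ w_head)))  # sigmoid head

for _ in range(300):
    h = np.maximum(X @ W_base, 0.0)
    p = forward(X, W_base, w_head)
    grad = h.T @ (p - y) / len(y)            # gradient of binary cross-entropy
    w_head -= 0.5 * grad                     # update the head only; W_base untouched

accuracy = ((forward(X, W_base, w_head) > 0.5) == (y == 1)).mean()
```

In the Keras setting described above, the same effect is obtained by marking the transferred layers as non-trainable, which is what the total, trainable, and non-trainable parameter counts reported later reflect.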
5.3 Training the Datasets
To train the model, we use the TensorFlow and Keras frameworks [25] in an Anaconda [25] environment. Good computing power is essential, as the training is time intensive and depends on the choice of CPU and GPU. TensorFlow uses a computational-graph abstraction and provides TensorBoard for visualization. Thus, the ratio used across these models remains consistent. Finally, regarding how many iterations one should train the models for, it is found in model transfer that even if we increase the size of the dataset, the affected parameters remain the same, which signifies that in model transfer the data size does not play a major role. To train the dataset, the “sigmoid” activation function is used at dense layer 1 and “relu” at the 512-unit layer. To compile the model, binary cross-entropy is used.
To fine-tune the pre-trained model, in VGG at the 6th layer, the layers block2_pool, block3_conv1, and block3_conv2 are used. In the Inception V3 model at the 299th layer, the layers conv2d_94, batch_normalization_86, and activation_88 are used, whereas in the RESNET50 model at the 155th layer, the layers res5b_branch2a, bn5b_branch2a, and activation_44 are used for training the dataset.
5.4 Figures and Tables
In this experiment, we use the Places205 [21] dataset along with Places365 [23]
and carry out the model transfer in the Inception V3 [26], VGG16 [27], and RESNET50
[22] models, varying the size of the base dataset with respect to the target dataset. For
the experiment, a batch size of 32 and 3 epochs are used. For the small
dataset, the time consumed is given in Table 1. We have found in this
Table 1 Data with INCEPTION V3, RESNET50, and VGG16: 1 class (base: image-places 205, target: places 365; columns include time in s and validation loss). [Table values not recoverable from the extracted text.]
experiment that even though the data size is increased, the models take longer to
predict the accuracy and value loss, but size has virtually no impact, as shown
in the various tables below.
The first experiment is carried out with the Inception V3 model, where the base dataset is
"ImagePlaces205" and the target dataset is "Places365." The base dataset has
15,100 images, and the target dataset has 5000 images. The total parameters
are 21,802,784, of which 0 are trainable and 21,802,784 are non-trainable [12].
Then, after adding the "sigmoid" function, the total parameters are
23,115,041, of which 1,312,257 are trainable and 21,802,784 are non-trainable [13].
Then, we add the "fully connected layer," for which the total parameters are
23,115,041, of which 1,705,985 are trainable and 21,409,056 are non-trainable (Fig. 1).
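How such totals split into trainable and non-trainable counts follows simple dense-layer arithmetic; a short illustration assuming a hypothetical 2048-dimensional pooled feature vector feeding a 512-unit layer and a 1-unit output (not necessarily the paper's exact configuration):

```python
def dense_params(in_dim, units):
    # a fully connected layer holds in_dim x units weights plus one bias
    # per unit
    return in_dim * units + units

# The frozen backbone contributes only non-trainable parameters; the
# classification head (512-unit relu layer + 1-unit sigmoid layer) is
# trainable. Backbone count taken from the Inception V3 figures above.
backbone = 21_802_784
head = dense_params(2048, 512) + dense_params(512, 1)
total = backbone + head
```

Under these assumptions the head alone contributes 1,049,601 trainable parameters, and the non-trainable count stays equal to the frozen backbone.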
The second experiment is carried out with RESNET50, where the base dataset is
ImagePlaces205 and the target dataset is Places365. The base dataset has 15,100
images, and the target dataset has 5000 images. The total parameters are 23,587,712,
of which 0 are trainable and 23,587,712 are non-trainable.
Then, after adding the "sigmoid" function, the total parameters are 40,628,609, of which
17,040,897 are trainable and 23,587,712 are non-trainable.
Then, we add the "fully connected layer," for which the total parameters are 40,628,609
Fig. 1 Result in graphs, with INCEPTION V3 and 3 epochs and 1 class
, of which 25,972,225 are trainable and 14,656,384 are non-trainable (Fig. 2).
The third experiment is carried out with the VGG16 model, where the base dataset is
ImagePlaces205 and the target dataset is Places365. The base dataset has 15,100
images, and the target dataset has 5000 images and 1 class. The total parameters
are 14,714,688, of which 0 are trainable and 14,714,688 are non-trainable.
Then, after adding the "sigmoid" function, the total parameters are
Fig. 2 Result details. The model is used as RESNET50 with 3 epochs and 1 class
Fig. 3 Result details. The model is used as VGG16 with 3 epochs and 1 class
17,337,665, of which 2,622,977 are trainable and 14,714,688 are non-trainable.
Then, we add the "fully connected layer," for which the total parameters are
17,337,665, of which 17,077,505 are trainable and 260,160 are non-trainable (Fig. 3).
The experiment is then repeated with 2 classes for Inception V3, where the base
dataset is ImagePlaces205 and the target dataset is Places365. The base dataset has
30,200 images, and the target dataset has 5000 images with 2 classes. The total
parameters are 21,802,784, of which 0 are trainable and 21,802,784 are non-trainable.
Then, after adding the "sigmoid" function, the total parameters are 23,115,041, of which
1,312,257 are trainable and 21,802,784 are non-trainable.
Then, we add the "fully connected layer," for which the total parameters are 23,115,041,
of which 1,705,985 are trainable and 21,409,056 are non-trainable (Fig. 4).
The experiment is then repeated with 2 classes for RESNET50, where the base
dataset is ImagePlaces205 and the target dataset is Places365. The base dataset has
30,200 images, and the target dataset has 5000 images with 2 classes. The total
parameters are 23,587,712, of which 0 are trainable and 23,587,712 are non-trainable.
Then, after adding the "sigmoid" function, the total parameters are 40,628,609,
of which 17,040,897 are trainable and 23,587,712 are non-trainable.
Then, we add the "fully connected layer," for which the total parameters are 40,628,609,
of which 25,972,225 are trainable and 14,656,384 are non-trainable (Fig. 5).
The experiment is then repeated with 2 classes for VGG16, where the base dataset
is ImagePlaces205 and the target dataset is Places365. The base dataset has 30,200
Fig. 4 Result details. The model is used as INCEPTION V3 with 3 epochs and 2 classes
Fig. 5 Result details. The model is used as RESNET50 with 3 epochs and 2 classes
images, and the target dataset has 5000 images with 2 classes. The total parameters
are 14,714,688, of which 0 are trainable and 14,714,688 are non-trainable.
Then, after adding the "sigmoid" function, the total parameters are
17,337,665, of which 2,622,977 are trainable and 14,714,688 are non-trainable.
Then, we add the "fully connected layer," for which the total parameters are
17,337,665, of which 17,077,505 are trainable and 260,160 are non-trainable
(Figs. 6, 7, 8, 9, and 10; Table 2) [28].
Fig. 6 Result details. The model is used as VGG16 with 3 epochs and 2 classes
Fig. 7 Result in graphs with value and training accuracy
6 Results
In this paper, we have carried out an experiment on how feature construction through
inductive transfer learning in computer vision works for model transfer with INCEPTION
V3, RESNET50, and VGG16. We have shown how a pre-trained
model works as a feature extractor on different layers, and how combining the layers
and using ImageNet weights in our model avoids re-training on the data, which is very
time consuming. We have also observed the total number of parameters that are
either trainable or non-trainable. We have also observed that the number of epochs does
Fig. 8 Result in graphs with value and training loss
Fig. 9 Result in graphs with value and training accuracy
not play a role in reaching the accuracy of the model. In the experiment, the sigmoid
function is used as the final classifier. In the end, we can say that model transfer is one
of the significant approaches to inductive transfer learning: a highly accurate model
can be obtained in very little time, whereas training a model from scratch would be a
cumbersome and humongous task that is not only time consuming but also costly, and
getting a good, fast result is the need of the hour.
Fig. 10 Result in graphs with value and training loss
Table 2 Data with INCEPTION V3, RESNET50, and VGG16 and 2 classes
Model/base: image-places 205, target: places 365 (columns: time per step in s, epochs, train accuracy and loss). [Per-row table values not recoverable from the extracted text.]
7 Conclusion and Future Work
In this paper, the overall goal was to transfer a model, with the necessary adjustments,
from a pre-trained network to a new network, which is fast and saves both time and
cost; transfer learning is one of the fastest-evolving fields for researchers in
machine learning. We looked into INCEPTION V3, VGG16, and RESNET50, as each has
its own set of layers, and examined how each layer behaves while training on the base
and target datasets. Further research can be carried out on instance transfer,
feature-representation transfer, and relational-knowledge transfer within the same
domain, and domain-to-domain transfer can also be explored.
Acknowledgements I, along with my guide, would like to acknowledge that the amount of available
inputs and information keeps increasing, but it is also important to use the available pre-trained
models, which not only help in understanding the various work being carried out in this field; a
huge scope also exists for future work in the field of transfer learning.
References
1. Dechter, R.: Learning While Searching in Constraint-Satisfaction Problems. University of
California, Computer Science Department, Cognitive Systems Laboratory (1986)
2. Ivakhnenko, A.: Polynomial theory of complex systems. IEEE Transact. Syst. Man Cybern.
1(4), 364–378 (1971)
3. Fukushima, K.: Neocognitron: a self-organizing neural network model for a mechanism of
pattern recognition unaffected by shift in position. Biol. Cybern. 36(4), 193–202 (1980)
4. LeCun, Y. et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput.
1:541–551 (1989)
5. Pratt, L.: Special issue: Reuse of neural networks through transfer. Connect. Sci. Retrieved
2017-08-10 (1996)
6. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. The MIT
Press. ISBN 9780262018258 (2012)
7. Jordan, M.I., Bishop, C.M.: Neural networks. In: Tucker, A.B. (ed) Computer Science Handbook, 2nd edn. (Section VII: Intelligent Systems). Chapman & Hall/CRC Press LLC, Boca
Raton. ISBN 1-58488-360-X (2004)
8. Bengio, Y., Goodfellow, I.J., Courville, A.: Deep Learning, pp. 183–200 (2015)
9. van Gerven, M., Bohte, S.: Artificial Neural Networks as Models of Neural Information
Processing. Front. Res. Topic (2017)
10. Den Bakker, I.: Python Deep Learning Cookbook, pp. 173–189. Packt Publishing. ISBN
978-1-78712-519-3 (2017)
11. Bengio, Y., Goodfellow, I.J., Courville, A.: Deep Learning, pp. 382–384 (2015)
12. Zaccone, G., Rezaul Karim, M., Menshawy, A.: Deep Learning with Tensorflow, p. 98. ISBN
978-1-78646-978-6 (2017)
13. Sammut, C., Geoffrey, I.: Encyclopedia of Machine Learning, p. 99 (2017)
14. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., Bengio, J.: Generative Adversarial Networks (2014)
15. Kohonen, T., Honkela, T.: Kohonen network. Scholarpedia. https://www.scholarpedia.org/article/Kohonen_network (2007)
16. Sammut, C., Geoffrey, I.: Encyclopedia of Machine Learning, pp. 159–162 (2017)
17. Olivas, E.S., Guerrero, J.D.M., Sober, M.M., Benedito, J.R.M., López, A.J.S.: Handbook
of Research on Machine Learning Applications and Trends: Algorithms, Methods, and
Techniques, pp. 242–264. Information Science Reference. ISBN 978-1-60566-767-6 (2009)
18. Torrey, L, Shavlik, J.: Transfer Learning. University of Wisconsin, Madison, WI (2009)
19. Olivas, E.S., Guerrero, J.D.M., Sober, M.M., Benedito, J.R.M., López, A.J.S.: Handbook of
research on machine learning applications and trends: algorithms, methods, and techniques,
pp. 245–246. Information Science Reference. ISBN 978-1-60566-767-6 (2009)
20. Arnold, A., Nallapati, R., Cohen, W.W.: A comparative study of methods for transductive
transfer learning. In: Proceedings of the 7th IEEE International Conference on Data Mining
Workshops. IEEE Computer Society, Washington, DC, pp. 77–82 (2007)
21. Place 205 Dataset. https://places.csail.mit.edu/user/download.php
22. ResNet50. https://www.kaggle.com/dansbecker/transfer-learning/data
23. Places365. https://places2.csail.mit.edu/download.html
24. https://www.tensorflow.org/
25. www.keras.io
26. InceptionV3 Model in Kaggle. https://www.kaggle.com/google-brain/inception-v3
27. VGG16 Model in Kaggle. https://www.kaggle.com/keras/vgg16
28. Pan, S.J., Yang, Q.: A survey of transfer learning (2009). https://ieeexplore.ieee.org/document/
Decoding Motor Behavior Biosignatures
of Arm Movement Tasks Using
Rakhi Radhamani, Alna Anil, Gautham Manoj, Gouri Babu Ambily,
Praveen Raveendran, Vishnu Hari, and Shyam Diwakar
1 Introduction
Human movements are usually volitional in nature, for adaptation to the real world
in the activities of everyday life [1]. Emerging trends in brain–computer interfaces (BCI)
for decoding brain signals have opened a realistic technical possibility of connecting
machines to the human brain [2]. Many countries have been actively designing and
developing wearable assistive devices for rehabilitation, such as robotic prosthetic
hands and exoskeletal orthotic hands, to regain functionality and thereby improve the
quality of daily life activities and other social activities [3]. According to a World
Health Organization (WHO) report on the disabled population in India, 2.21% of the Indian
population has one or another kind of disability; 69% of the overall
disabled Indian population lives in rural areas, owing to road, rail, and agricultural
injuries, and an overall 20% of individuals report locomotion disabilities.
In this situation, it is highly recommended to develop low-cost prosthetic
devices to meet the needs of the amputee population (https://wecapable.com/disabled-pop
The neuroscience community has been relying on low-cost, surface-based portable EEG
sensors for elucidating the neural mechanisms underlying cognitive
processes such as motor movement, motor imagery, attention, memory, and visual
perception at millisecond range [4]. It has been shown that the sensorimotor cortex
generates the mu rhythm (7–13 Hz) and beta rhythm (15–30 Hz), with event-related
desynchronization (ERD) of mu rhythms in motor imagery tasks. Mapping neurologically
relevant signatures for squeeze tasks with EEG reported frontal symmetry
as regions for motor planning and motor execution, characterized by predominant
R. Radhamani · A. Anil · G. Manoj · G. B. Ambily · P. Raveendran · V. Hari · S. Diwakar (B)
Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri Campus, Kollam,
Kerala, India
e-mail: shyam@amrita.edu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
mu and beta oscillations [5]. A recent study on decoding reach-and-grasp actions in
humans using EEG showed central motor cortex (Cz) involvement during movement onset,
stronger on the contralateral side (C1) than on the ipsilateral side (C2). Previous
EEG-based studies on left- and right-hand movement imagery indicated the sensorimotor
areas (C3, C4) as functional areas that were qualitatively the same for imagination
and execution processes [6]. Studies on fast finger movement and index finger movement
demonstrated the presence of the alpha and beta bands over the premotor and primary
sensorimotor cortex, and variations in beta power during preparation and execution
of complex grasping movements [7, 8]. Studies on the temporal organization of
neural grasping patterns indicated different movement covariates in grasping tasks:
a centro-parietal lower-beta frequency band in the pre-shaping stage, and contralateral
parietal EEG in the mu frequency band in the holding stage with muscle activity [9,
10]. Research initiatives are still necessary to advance brain mapping studies toward
the classification of motor tasks in more realistic situations.
This paper highlights the use of a low-cost EEG device for understanding behavioral
changes in brain rhythms (mainly theta and gamma) attributed to cognitive
motor performance in real motor tasks, motor imagery, and visual imagery processes.
The present study intended to explore cortical activity measures as a biosignature for
arm movement in diverse populations of study subjects, and to characterize EEG
spatial pattern variations as an attribute of linear and complex movement-related
tasks applicable to activities of daily life.
2 Methods
2.1 Subject Summary and Data Collection
Sixteen able-bodied volunteering participants (6 males and 10 females; 4 left-handed
and 12 right-handed subjects; age range 18–40 years; mean age 25.4 years)
without a diagnosis of neurological impairment or motor dysfunction voluntarily
participated in this study. The Vividness of Movement Imagery Questionnaire (VMIQ) was
employed for testing visual and kinesthetic imagery of motor tasks. Physiological
data were also measured, and subjects were duly informed about the purpose of the
research. Consent was collected from all the participants. The data collection
and methodology were approved by the ethics review board of the university.
2.2 Design of Experiment and Execution Steps
As a reference for analyzing motor control, subjects were asked to perform a steady-hand
game with as few errors as possible. The experiment consisted of three categories:
Fig. 1 Illustration of linear and complex movement patterns on a marble board game
motor execution as in motor performance, motor imagery, and visual imagery, for
upper left/right arm movement.
A marble board game (40 cm × 40 cm, 9 g) was selected for the study, as it involves
the precise motor control found in daily life activities of holding and moving objects.
Participants were seated comfortably with both forearms resting on the chair's arms, and
a computer screen was in front of them. In the linear movement, using a pincer grasp,
subjects had to move the marble ball from left to right and vice versa with both the
left and right hands. In the complex movements, subjects were asked to move the marble
clockwise and anticlockwise according to the visual cue provided (Fig. 1). The linear
task involved both lateral and medial rotation, partial flexion of the forearm, and
partial flexion of the arm. The complex task involved a series of movements: movement
from O to A involved arm flexion and forearm extension; A to B involved gradual lateral
rotation, partial flexion of the forearm, and acute flexion of the arm; in B to C, the
arm remained in the anatomical position and the forearm was completely flexed; C to D
involved gradual medial rotation, acute arm flexion, and partial flexion of the forearm;
whereas in D to A, the forearm was extended and the arm flexed.
2.3 EEG Recording and Signal Processing
EEG signals were recorded using a 14 + 2 electrode system according to the 10–20
international electrode placement system, at a sampling frequency of 128 Hz. The linear
task lasted 25 s, and the complex task lasted 40 s for each anticlockwise
and clockwise movement pattern (Fig. 2).
Offline signal processing of the EEG data was performed using the EEGLAB tool. EEG
data were band-pass filtered between 0.01 and 60 Hz. Power spectral
density (PSD) for specific frequency bands (delta, 0.01–3 Hz; theta,
4–7 Hz; alpha, 8–12 Hz; beta, 13–30 Hz; and gamma, 31–60 Hz) at the brain lobes
was computationally estimated.
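The per-band PSD estimate can be approximated with a short NumPy periodogram; a simplified stand-in for the EEGLAB analysis, assuming the 128 Hz sampling rate and the band limits listed above:

```python
import numpy as np

FS = 128  # sampling frequency used in the study (Hz)
BANDS = {"delta": (0.01, 3), "theta": (4, 7), "alpha": (8, 12),
         "beta": (13, 30), "gamma": (31, 60)}

def band_power(signal, fs=FS):
    # Periodogram-based power per EEG band: FFT the signal, square the
    # magnitudes into a PSD, and sum the bins falling inside each band.
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = (np.abs(np.fft.rfft(signal)) ** 2) / (fs * len(signal))
    return {name: float(psd[(freqs >= lo) & (freqs <= hi)].sum())
            for name, (lo, hi) in BANDS.items()}

# A 10 Hz test tone: its power should land almost entirely in alpha
t = np.arange(0, 4, 1.0 / FS)
powers = band_power(np.sin(2 * np.pi * 10 * t))
```

In the study, this kind of per-band power would be computed per electrode and time bin before topographic mapping.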
Fig. 2 EEG recording protocol with time periods among movement tasks
3 Results
3.1 Increased θ and γ Activity in Frontal and Temporal
Regions in the Motor Planning Phase of Cognitive Tasks
In the cortical mapping of specific time bins of the planning phase of the motor execution
(ME) tasks, it was observed that averaged θ wave oscillations were higher in the frontal
regions (AF3, AF4, F3, and F4), and γ wave activations were observed in the temporal
regions (T6, T5). In motor imagery tasks, θ and γ wave activations were found
in the temporal (T4, T6) and frontal (F7, F8, F3, F5) regions, respectively. The motor
planning phase of visual imagery showed γ and θ waves in the frontal (F7, FC5, F3, F8)
regions (Fig. 3). As in any cognitive performance, the beta (β), delta (δ), and alpha (α)
brain rhythms were also noted at different brain lobes.
3.2 Behavioral Variations in θ and γ Rhythms in the Frontal,
Temporal, and Parietal Lobes in the Motor Execution Phase
of Cognitive Motor Performances
During the motor execution time bin, θ and γ rhythm activity was observed in the
anterior frontal region (AF4). In motor imagery tasks, no significant variations in the
power spectrum distribution of θ rhythms were observed, while γ rhythms were visualized
in the frontal region (F8). In the visual imagery task, θ rhythms were observed in the
frontal and temporal (AF4, F8, T4) regions, and δ rhythms were visible in the frontal
(AF3, F3, and AF4) regions (Fig. 4). β, δ, and α rhythms were also noted at different
brain lobes.
Fig. 3 Power spectral density plot indicating cortical activation differences during the motor
planning phase of different cognitive tasks
3.3 Activity-Related Increase of Theta and Gamma Waves
in Frontal and Temporal Regions in Linear and Complex
Motor Actions
In clockwise movement, during the motor execution phase, γ and θ rhythms were
observed in the frontal regions (AF3, AF4, F8, F7) in real motor action. In the
anticlockwise movement pattern, θ rhythms were observed in the frontal (F8, AF3, AF4)
regions, and γ rhythms were observed in the temporal regions (T4, T6). In the anticlockwise
movement of the MI task, higher spectral power of θ was observed in the frontal (AF4) and
occipital (O2) regions, with no significant spectral power variations in γ rhythms. In the
clockwise movement, θ and γ waves were higher in the frontal (AF3, AF4, F8, F7)
regions. In the VI anticlockwise movement pattern, it was observed that θ and γ
rhythms were higher in the frontal regions (AF3 and AF4) of the brain (Fig. 5). Left-handed
and right-handed subjects showed theta and gamma spectral variability as
biosignatures for specific arm movement tasks (data not presented due to the limited
number of left-handed subjects for validation).
Fig. 4 Heatmaps showing activation of EEG rhythms during motor execution phase of varying
cognitive tasks
4 Discussion
Toward understanding relevant neural circuitry patterns of motor movements, the
present study focused on cortical activity and the associated neural
dynamics for analyzing specific biosignatures in an upper-limb movement task. Our
methodology attempts to decode the neural correlates of real-time motor tasks and of
imagined and visually imagined linear and complex movement patterns that parallel
activities of daily life. The higher intensity of θ rhythms in the anterior frontal
regions and of γ rhythms in the temporal lobe during motor planning carried substantial
information about movement initiation and execution. The higher spectral power
density of γ rhythms in the frontal region during VI tasks indicated salient changes of
functional activation in motor cortex areas associated with the different cognitive tasks.
Activation of θ waves in the frontal regions during the motor execution phase of real
motor tasks indicated activation of the sensorimotor cortex by the stimuli.
During motor imagery and visual imagery tasks, θ and γ waves showed no significant
variations, indicating a lack of substantial stimulus extraction for the movement
execution process. Moreover, motor execution and motor imagery showed similar patterns
Fig. 5 Topographical plots of spectral power density showing brain rhythm pattern variations
related to direction-related patterns of motor tasks
in γ wave activation, indicating that motor imagery tasks follow neurological
patterns similar to those of the actual movement scenario. During motor execution, brain
rhythms were predominant in ipsilateral regions, whereas imagery tasks showed contralateral
activation of brain regions. Initial cortical mapping studies on varying patterns of
brain rhythms in left- and right-handed subjects for complex movement indicated
handedness as a factor that influences sensorimotor rhythm distribution during
motor performance. The activity patterns of the linear and complex movements
indicated kinaesthetic variations during voluntary actions. The data still need to be
validated with a support vector machine classifier to assess prediction accuracy.
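The proposed validation step could be outlined with scikit-learn's SVC; a hypothetical sketch on synthetic data, where in practice the feature vectors would be per-channel band powers labeled by task:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical feature matrix: per-trial band-power features for two task
# classes, with class 1 shifted so the toy problem is separable.
X = np.vstack([rng.normal(0.0, 1.0, (40, 5)),
               rng.normal(2.0, 1.0, (40, 5))])
y = np.array([0] * 40 + [1] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)   # train the SVM classifier
accuracy = clf.score(X_te, y_te)          # held-out prediction accuracy
```

The same train/test split and `score` call would give the prediction accuracy the discussion refers to, once real EEG features replace the synthetic data.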
5 Conclusion
This study, with ME, MI, and VI tasks, mapped cortical activity of the brain related
to the initiation and execution of motor activity among diverse populations. The results
could be applied for feature extraction in classifying brain rhythm activation patterns
for different motor tasks using low-cost EEG.
Acknowledgements This work derives direction and ideas from the Chancellor of Amrita Vishwa
Vidyapeetham, Sri Mata Amritanandamayi Devi. This work is partially funded by Embracing the
World Research-for-a-Cause initiative.
References
1. Cordella, F., Ciancio, A.L., Sacchetti, R., Davalli, A., Cutti, A.G., Guglielmelli, E., Zollo, L.:
Literature review on needs of upper limb prosthesis users. Front. Neurosci. 10, 1–14 (2016).
2. Schwartz, A.B., Cui, X.T., Weber, D.J.J., Moran, D.W.: Brain-controlled interfaces: Movement
restoration with neural prosthetics. Neuron 52, 205–220 (2006). https://doi.org/10.1016/j.neu
3. Ou, Y.-K., Wang, Y.-L., Chang, H.-C., Chen, C.-C.: Design and development of a wearable
exoskeleton system for stroke rehabilitation. Healthcare 8, 18 (2020). https://doi.org/10.3390/
4. Alazrai, R., Alwanni, H., Baslan, Y., Alnuman, N., Daoud, M.I.: EEG-based brain-computer
interface for decoding motor imagery tasks within the same hand using Choi-Williams timefrequency distribution. Sensors (Switzerland) 17, 1–27 (2017). https://doi.org/10.3390/s17
5. Krishnan, M., Edison, L., Radhamani, R., Nizar, N., Kumar, D., Nair, M., Nair, B., Diwakar, S.:
Experimental recording and computational analysis of EEG signals for a squeeze task: Assessments and impacts for applications. In: International Conference on Advance Computing,
Communications and Informatics, ICACCI 2018, pp. 1523–1527 (2018). https://doi.org/10.
6. Bodda, S., Chandranpillai, H., Viswam, P., Krishna, S., Nair, B., Diwakar, S.: Categorizing
imagined right and left motor imagery BCI tasks for low-cost robotic neuroprosthesis. Int.
Conf. Electrical and Electronical Optimization Technologies, ICEEOT 2016, pp. 3670–3673
(2016). https://doi.org/10.1109/ICEEOT.2016.7755394
7. Schwarz, A., Ofner, P., Pereira, J., Sburlea, A.I., Müller-Putz, G.R.: Decoding natural reachand-grasp actions from human EEG. J. Neural Eng. 15, 1–15 (2018). https://doi.org/10.1088/
8. Gudiño-mendoza, B., Sanchez-ante, G., Antelis, J.M.: Detecting the Intention to move upper
limbs from electroencephalographic brain signals. Comput. Math. Methods Med. 2016, 1–11
(2016). https://doi.org/10.1155/2016/3195373
9. Sburlea, A.I., Müller-Putz, G.R.: Exploring representations of human grasping in neural, muscle
and kinematic signals. Sci. Rep. 8, 1–14 (2018). https://doi.org/10.1038/s41598-018-35018-x
10. Cisotto, G., Guglielmi, A.V., Badia, L., Zanella, A.: Classification of grasping tasks based on
EEG-EMG coherence. In: 2018 IEEE 20th International Conference on e-Health Networking,
Application Services, pp. 6–11 (2018). https://doi.org/10.1109/HealthCom.2018.8531140
Emergency Robot
Shubham V. Ranbhare, Mayur M. Pawar, Shree G. Mane,
and Nikhil B. Sardar
1 Introduction
Nowadays, terrorism and other security issues have grown to a peak, causing
headaches for many nations, including India. To counter these issues, India
has developed its own defense method, i.e., commando operations. In these operations,
soldiers die while performing rescue or encounter operations due to a lack of equipment.
Commandos need advanced rescue tools for successful rescue operations. This
project work shows how a remote-controlled robot can be used to carry out commando
operations. The main application of this robot is search and rescue. Most of
the applications are related to the environment and security; to prevent critical
situations and the risking of life, these bots are good at assisting during military
operations [1]. The main objective of this robot is to ensure the safety of the workers
and provide the necessary equipment. The work performed at the initial stage of sensing
and controlling is shown in this paper. Related research is also provided using the
references [2, 3]. The tests performed during the build of the robot are also described.
The overall design is described using the appropriate structure and data.
This robot can detect:
Presence of terrorists hiding in buildings.
S. V. Ranbhare (B) · M. M. Pawar · S. G. Mane · N. B. Sardar
MIT Academy of Engineering, Alandi, Pune, Maharashtra 412105, India
e-mail: svranbhare@mitaoe.ac.in
M. M. Pawar
e-mail: mmpawar@mitaoe.ac.in
S. G. Mane
e-mail: sgmane@mitaoe.ac.in
N. B. Sardar
e-mail: nbsardar@mitaoe.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
Presence of metallic weapon or explosives.
Presence of poisonous or inflammable gasses.
These techniques are based on processing speed and accuracy, and they provide various
algorithms for threat detection. Further, we discuss a few challenges in the detection
process and draw a conclusion.
1.1 Literature Survey
At the initial stage, the robot was designed following the rules of Isaac Asimov.
These laws say that a robot should not harm any human being and that a robot has
to follow the instructions given by its instructor. Many soldiers were killed in
war, so nations started using technology. As per the data available in "all of
robots: a robotics history," the robot used in war was designed by Germany, and the
design changes in robotics altered Asimov's principle in the twentieth
century. Tanks underwent modifications in which some equipment was replaced
with electronic devices. Considering the environmental conditions, the tanks were
designed such that they could be controlled from a distance of 500–1500 m.
2 Description of Vehicle Section
The robot works with an RF Tx/Rx [3], a controller, robotic sensors, an LCD display, and
a power supply. The initial model is required for detecting, sensing, and
controlling a robotic system. This section discusses the two sensors and two different
devices used to control the robot.
2.1 Gas Detector
This type of device is widely used in the chemical industry and can be found in
locations such as bunkers and gas companies. It is also used for gas-leak detection
and for measuring explosive concentrations [4].
2.2 Metal Detector
The main objective of the metal detector is to identify devices hidden
in complex locations, or to detect landmines that come in the track of military
personnel during an operation. These metal detectors work on the principles of
electromagnetism [5].
2.3 Web Camera
To make this robot more effective in all dimensions, we mounted a webcam; it
gives live footage of the area, which can guide us to accurate information in
less time and is more versatile to use. This camera uses infrared light to cover the
area at night or in low light [6].
2.4 Gun Control
It is used to detect the target designated by a laser.
Using these sensors and electronic devices, the system allows us to control the
robot more effectively in terms of speed and operations. The robot can further be
fitted with collision-avoidance sensors, such as ultrasonic sensors, along with more
efficient suspension and a drone-carrying facility.
3 System Implementation
3.1 Work Flow of the Model
Initialization of the accelerometer with ground as reference.
Setting the potentiometer with the reference as the control signal.
Transmitting the signal over the RF transmitter link.
Collecting the information at the receiver side (robot) and passing it to the controller.
After processing the signal, sending it to the motor drivers to perform actions.
At the receiver side (robot), if any threat is detected, transmitting it via the RF transmitter
on the robot.
This signal is received by the RF receiver on the user side.
The received RF signal is sent to the controller and then to the LCD display for processing.
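As a rough illustration, the message flow above can be modelled in software. The sketch below is a Python simulation with hypothetical function names; the real system runs on an 8051 with RF encoder/decoder ICs, so only the flow of messages is modelled, not the hardware.

```python
# Toy simulation of the transmit/receive workflow described above.
# All function names are hypothetical; the real firmware runs on an 8051.

def user_transmit(command):
    """User side: package a control signal for the RF link."""
    return {"type": "command", "payload": command}

def robot_receive(message, sensors):
    """Robot side: act on a command and report any detected threat."""
    actions = {"forward", "back", "left", "right"}
    response = {"type": "status", "action": None, "threat": None}
    if message["payload"] in actions:
        response["action"] = message["payload"]  # forwarded to motor drivers
    if sensors.get("gas") or sensors.get("metal"):
        response["threat"] = "gas" if sensors.get("gas") else "metal"
    return response

def user_receive(status):
    """User side: show the robot's report on the LCD / beep on threat."""
    if status["threat"]:
        return f"ALERT: {status['threat']} detected"
    return f"moving {status['action']}"

# Example exchange: the robot is told to move forward while its
# gas sensor is triggered, so the user side raises an alert.
status = robot_receive(user_transmit("forward"), {"gas": True})
print(user_receive(status))  # -> ALERT: gas detected
```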
S. V. Ranbhare et al.
The flow chart shows the running process of the program and gives a short description of the
actual project. By interfacing all the components with the 8051 controller and debugging,
we obtain the required outcome. At the very first stage, the robot is idle and initialized.
When any command is issued, the instruction is first compared, and then
the actual movement is started.
The operation includes roaming the surroundings and acquiring an actual position.
If any threat is detected, the robot immediately notifies the user about
the actual threat, which can be gas, metal, or a terrorist. The indication is given by a beep on the
controller side and by the actual image captured by the camera. This helps the team
understand the situation and obtain information (Fig. 1).
3.2 Troubleshooting
It is very common not to get accurate output at the very first stage; the same
happened in our case. Setting up the communication channel was the major problem.
In that case, first set up a wired link: connect the Dout
(Data Out) pin of the encoder IC directly to the Din (Data In) pin of the decoder. Then
check whether the addresses of the encoder and decoder are the same, and verify
the VCC and GND connections. If a communication
link is still not established, change the resistors, as they are responsible for setting
the frequency. If the link still fails, replace the encoder and
decoder ICs. Once communication is established, you
can connect the RF module.
4 Proposed System
The proposed system has transmission on both sides [7], i.e., a robotic transmitter
and a human transmitter. The robotic transmitter informs the controller about
the present situation and the detection of metal, gas, or the presence of terrorists.
The human transmitter is the controlling device; it provides instructions to the
robot (Fig. 2).
4.1 Controlling Section
The proposed system likewise receives data on both sides, i.e., a human receiver and
a robotic receiver. The human receiver collects the data sent by the robot; the data
can be a detection signal (Figs. 3 and 4).
Fig. 1 Flow chart (initialization; signal transmission from the user; signal reception on the robot; sensor and webcam detection; surveying the area and sending the signal back to the user side; buzzer beep and LED on at the receiver side)
Fig. 2 Block diagram of human transmission/reception
The robotic receiver is the decoding section, which decodes the signal sent
by the transmitter and acts according to the programmed instructions.
Fig. 3 Block diagram of
receiver section
Fig. 4 Block diagram of robotic transmitter and receiver section
Fig. 5 Robotics transmitter
5 Hardware Implementation
5.1 Circuit Diagram
Figure 5 shows the transmitting section of the robot, which transmits signals when it detects
a threat.
Figure 6 shows the receiver section of the robot. When the user transmits a signal, this section receives it and performs the corresponding actions:
forward movement, backward movement, right turn, left
turn, and camera rotation.
Fig. 6 Robotics receiver
Figure 7 shows the section that transmits the controlling signals to the robot. The signals
are generated from the accelerometer and compared with a reference, which provides
a variety of combinations and makes it possible to control the robot accordingly. Figure 8 is simply
a display circuit: the threat detected by the robot is received here and shown on
the LCD display.
Fig. 7 Human transmitter
Fig. 8 Human receiver
Fig. 9 Working motor
6 Result
Figure 9 shows the working of the robotic motors used for traveling and taking
an accurate position. Figure 10 shows the initial condition of all sensors, where the
robot has not detected any object or gas.
Figure 11 shows the result where the metal detector has detected a threat of landmines
or a grenade. Figure 12 shows the result where the gas detector has detected a threat of
poisonous gas or a gas leak. Figure 13 shows the actual model and the size of the
robot after mounting the equipment and sensors. Figure 14 shows the mounting of the
RF communication module and the gun-triggering mechanism. Figure 15 shows the gun-control
direction and the guidance provided by the camera to perform any action.
7 Conclusion
This emergency robot can easily move to any location, detecting threats and
giving clear information about the situation. Because of the camera, the
image can be transmitted to a specific receiver for more details. The robot can be
used on the war field or in situations where humans cannot enter.
Fig. 10 Indicating display
Fig. 11 Metal detected
The robot has the capability to move in rocky and sloped regions. This robot can also be used in disaster
situations such as floods and by emergency service providers. Its ability
to detect gas leaks and underground metal can help military personnel make or
find a path. The robot is light in weight and can be carried to any location. It is
Fig. 12 Gas detected
Fig. 13 Actual size
Fig. 14 RF module
Fig. 15 Mounted gun
easy to disassemble and reassemble the robot at any location. It has the capability to
be carried by a drone and can support any military applications.
8 Future Scope
The robot can be improved by using advanced processors, thermal sensors, rotational
cameras, and better robot localization via the Global Positioning System (GPS);
a drone of appropriate capacity can also be used to shift the robot from one location
to another.
References
1. Bainbridge, W.A., Hart, J.W., Kim, E.S., Scassellati, B.: The benefits of interactions with
physically present robots over video-displayed agents. Int. J. Soc. Robot. 3(1), 41–52 (2011)
2. Gao, G., Clare, A.A., Macbeth, J.C., Cummings, M.L.: Modeling the impact of operator trust
on performance in multiple robot control. In: 2013 AAAI Spring Symposium Series (2013)
3. Cardozo, S., Mendes, S., Salkar, A., Helawar, S.: Wireless communication using RF module
and interfacing with 8051 microcontroller. IJSTE—Int. J. Sci. Technol. Eng. 3(07) (2017). ISSN
(online): 2349-784X
4. Parasuraman, R., Miller, C.A.: Trust and etiquette in high criticality automated systems.
Commun. ACM 47(4), 51–55 (2004)
5. Nováček, P., Ripka, P., Pribula, O., Fischer, J.: Metal detector, in particular mine detector.
US Patent 7,265,551 (2007)
6. Mehta, L., Sharma, P.: Spy night vision robot with moving wireless video camera. Int. J. Res.
Eng. Technol. Manag. (IJRETM) (2014)
7. Sample, A., Smith, J.: Experimental results with two wireless power transfer systems. In: Radio
and Wireless Symposium 2009. RWS ‘09. IEEE, pp. 16–18 (2009)
Type Inference in Java: Characteristics
and Limitations
Neha Kumari and Rajeev Kumar
1 Introduction
Java is a statically typed language, which means that the type of a variable is known before
compilation. The type of a variable can be assigned in two ways: explicitly,
where the programmer mentions the type in the code, or implicitly,
where the compiler automatically deduces the type from the code's
context. This automatic deduction of types reduces verbose code and simplifies
both writing and reading it. Type inference was added in Java 5 to reduce the burden
of explicit declaration for generic methods. The Java compiler was enhanced to infer
the types of generic methods using context information. Since then, the Java type
inference algorithm has evolved over several editions. Earlier, type inference was
limited to generic methods, whereas from Java 10 onwards type inference for local variables
is also included. Such enhancements in the type inference algorithm have broadened
its scope in the language. However, there are several situations where the compiler is
unable to infer a type despite the context. One such case is inferring a wildcard as the return
type of a generic method. Java's support for backward compatibility and its complex type
system are among the reasons for this.
A type inference algorithm should be sound. An inference algorithm is sound when
it always produces well-typed results for the inference variables. Java type inference does
not always infer a proper type. Recently, several sound type inference
algorithms based on machine learning techniques have been proposed
[1, 6, 12]. These algorithms use a corpus of program code to train learning models
(e.g., RNNs, GNNs), and the trained models are then used to predict types. Such
N. Kumari (B) · R. Kumar
School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi 110067, India
e-mail: nkumari.cse@gmail.com
R. Kumar
e-mail: rajeevkumar.cse@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
learning-based algorithms can enhance Java's static type inference to become sound. In
this paper, we trace the incremental changes in the Java type inference algorithm and
discuss its limitations. Our aim is to exhibit the limitations and pitfalls of the Java type
inference mechanism; this study can thus help to enhance the inference algorithm
in different contexts.
This paper is organized as follows: Sect. 2 discusses the evolution of the type inference
algorithm in mainstream Java. Section 3 elaborates a few limitations and pitfalls of the
Java type inference algorithm. Section 4 discusses various aspects of type inference
algorithms. Finally, we conclude the paper with future work.
2 Type Inference in Java
The Java type inference algorithm is locally scoped; therefore, it searches for constraints
or type information within the same expression or statement. The following three steps are
the basis of Java type inference: reduction, incorporation, and resolution [5].
• Reduction is the process by which constraints are reduced to a set of bounds on inference
variables.
• Incorporation ensures that the set of bounds on inference variables stays consistent
when new bounds are added.
• Resolution examines the bounds and determines an instantiation that is compatible
with those bounds.
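To make the three phases concrete, here is a deliberately toy model in Python of inferring a single variable. The names, the miniature subtype table, and all simplifications are ours for illustration; the real algorithm in JLS chapter 18 handles many more kinds of bounds and variables.

```python
# Toy model of reduction / incorporation / resolution for one
# inference variable "alpha". Illustration only; not the JLS algorithm.

# SUPERTYPES[t] is the set of supertypes of t (including t itself),
# standing in for Java's class hierarchy.
SUPERTYPES = {"Integer": {"Integer", "Number", "Object"},
              "Number": {"Number", "Object"},
              "Object": {"Object"}}

def reduce_constraints(constraints):
    """Reduction: turn constraints like ('Integer', '<:', 'alpha')
    into lower/upper bounds on the inference variable alpha."""
    lowers, uppers = set(), set()
    for left, op, right in constraints:
        assert op == "<:"
        if right == "alpha":
            lowers.add(left)   # left <: alpha  =>  lower bound
        else:
            uppers.add(right)  # alpha <: right =>  upper bound
    return lowers, uppers

def incorporate(lowers, uppers):
    """Incorporation: the bound set is consistent only if every
    lower bound is a subtype of every upper bound."""
    return all(u in SUPERTYPES[l] for l in lowers for u in uppers)

def resolve(lowers, uppers):
    """Resolution: pick an instantiation; like Java, instantiate to
    the least upper bound of the lower bounds when any exist."""
    for cand in lowers:
        if all(cand in SUPERTYPES[other] for other in lowers):
            return cand
    return next(iter(uppers), "Object")

# alpha gathers Integer as a lower bound and Number as an upper bound:
low, up = reduce_constraints([("Integer", "<:", "alpha"),
                              ("alpha", "<:", "Number")])
assert incorporate(low, up)
print(resolve(low, up))  # -> Integer
```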
The Java type inference algorithm has evolved across several editions. Due to the principles of backward compatibility and incremental upgradation, it has evolved
in a restricted manner. Here, we discuss these incremental changes, covering only those versions of Java in which major changes to the type inference algorithm
occurred.
2.1 Java 5
In Java 5, type inference was introduced for generic methods. Without type inference,
a generic method must specify its type argument explicitly for each method invocation.
public static <T> void sample(T t) {}
//Generic method invocation with explicit type
Util.<Integer>sample(5); // Util is the class declaring sample()
In the above example, T is explicitly specified. The type inference algorithm enables
the compiler to automatically infer the type parameter based on the target type. Consider
the following example:
public static <T> void sample(T t) {}
//Generic method invocation without explicit type
Util.sample(5); // T inferred as Integer
2.2 Java 7
The diamond operator, introduced in Java 7, enables type inference for instance
creation: the type parameter of the generic class is inferred from the assignment context.
Consider the following example:
//Before Java 7
SampleClass<Integer> Obj = new SampleClass<Integer>();
//Java 7 Onwards
SampleClass<Integer> Obj = new SampleClass<>();
In this example, the compiler infers the type “Integer” for the formal type parameter
of the generic class SampleClass<X>.
2.3 Java 8
In Java 8, the type inference mechanism was expanded to the method invocation context.
//assignment context
List<Integer> list1 = new ArrayList<>();
// method invocation context
List<Integer> list2 = SampleClass.m(new ArrayList<>());
In the first example above, the missing constructor type argument is inferred
from the left-hand side of the assignment, whereas in the second, the diamond
operator appears in a method invocation context. Prior to Java 8, type parameter
inference within a method invocation context was not valid [2].
2.4 Java 10
Local variable type inference in Java 10 extends type inference to local variables
[4]. It introduced the reserved type name “var”, which infers the type of a variable from its initializer.
var x = new ArrayList<Byte>(); // infers ArrayList<Byte>
var y = list.stream(); // infers Stream<String>
Unlike earlier versions of type inference in Java, it can infer all kinds of local
variables with initializers and is not limited to generic types and lambda formals.
Local variable type inference has eased code writing and improved readability by
removing redundant code. However, several risks are associated with its use,
for example, using var without initializers or with improper types. Therefore,
a programmer needs to be careful with local type inference.
3 Limitations
In the above section, we discussed how type inference has evolved in Java and how it reduces
verbosity and boilerplate code and improves code writing. However, several
restrictions make the type inference algorithm less expressive and underutilized. For
example, the var type cannot be used as a method return type, argument type, field type,
or in lambda expressions. Also, there are several situations where inference causes an
error only because of the sophisticated structure of the type system. In the following
sub-sections, we mention some cases where type inference could be achieved but
the type system restricts it.
3.1 Wildcard as Return Type
The return type of a method can be inferred based on assignment context, but the
type inference algorithm fails to infer return type if there is a wildcard.
List<?> method(List<?> list){ ....return list;}
List<String> str = new ArrayList<>();
//error: incompatible types
List<String> result= obj.method(str);
The above code works if the wildcard gets replaced by a type variable. Consider
the following example:
<T> List<T> method(List<T> list) { ... return list; }
List<String> str = new ArrayList<>();
List<String> result= obj.method(str); //inferred.
The lack of wildcard inference fails type checking and allows an explicit cast to
an incompatible type. In the following code, an illegal cast of the returned list to a list of
strings is accepted for a list of integers. Due to this, String values can be added to an Integer list.
List<?> method(List<?> list) { ... return list; }
List<Integer> integers = new ArrayList<>();
integers.add(10);
List<String> result = (List<String>) obj.method(integers);
result.add("string"); // heap pollution: a String enters the Integer list
System.out.println(integers); // Output: [10, string]
3.2 Chained Method Call
Java type inference uses the assignment context to infer generic methods. However,
the assignment context fails to infer when generic method invocations are chained.
class Sample<Z> {
  static <T> Sample<T> m1() { Sample<T> t = new Sample<>(); return t; }
  Z m2(Z z) { return z; }
}
//works for single method invocation
Sample<String> s = Sample.m1();
//error: Object cannot be converted to String
String s1 = Sample.m1().m2("str");
The above code works when the programmer explicitly mentions the type argument of the
generic method as follows.
String s2 = Sample.<String>m1().m2("str"); // Valid
3.3 Local Variable Inference
As mentioned in Sect. 2.4, the “var” keyword is used to infer local variables only.
It cannot infer those variables that can appear explicitly in class files, for example,
field types, array types, method parameters, and return types. Also, it cannot infer
variables without an initializer, as the choice of type depends on that expression alone.
Apart from these limitations, local type inference requires the proper guidelines
mentioned in [4] to avoid the associated risks, for example, with the use of primitive
literals. The var type cannot differentiate among numeric literal types and infers all
integer literals as int.
var x = 127;
byte y = x; //error: lossy conversion from int to byte
var z = (byte) x; //need to cast explicitly
Here, x is within the range of the byte type, but var infers it as int. Therefore, we
get an incompatible-type error when assigning it to a byte.
4 Discussion
Type inference is a crucial feature of several programming languages. Functional programming languages like ML, OCaml, and Haskell support complete type inference.
These languages follow Damas and Milner's globally scoped inference algorithm,
which infers types based on how values are used [3], whereas statically typed object-oriented
programming languages (OOPLs) like Java, Scala, and C# follow a local
type inference approach based on locally declared information. Both
type inference systems have their limitations. For example, Milner's type inference
system does not support subtyping, and local type inference cannot
infer global variables. Complete type inference is challenging for OOPLs like
Java. There are several situations where explicit type annotation is necessary, such as
subtyping. Moreover, the local type inference system in Java suffers from some limitations
and issues, for example, uncertain inference while performing
join operations on wildcards and inconsistency with wildcard capture conversion.
These issues of Java type inference have been discussed in [9, 10]. The restricted Java
type system and its support for backward compatibility are among the reasons for these
issues. Several approaches have been suggested to improve
the unsound Java type inference algorithm [8, 10, 11] and to provide a complete solution
for the Java type inference system [7].
5 Conclusion
The Java type inference algorithm is evolving; as of now, it can also infer the types of
local variables. This feature of the Java compiler has reduced a great deal of verbose,
boilerplate code and enhanced readability. However, the Java type inference
algorithm is limited in use: there are many situations where types are not inferable
despite the availability of the required type information. Moreover, the algorithm
does not guarantee that it infers a proper type. The Java type inference algorithm
needs to be enhanced so that it can infer more types and ensure sound results.
In future, we aim to use machine learning techniques to develop a sound type inference
system for Java. Recently, several machine learning-based type inference
algorithms have been proposed for a variety of programming languages [1, 6].
These models learn which types naturally occur in certain contexts and, based on
this, provide type suggestions that can be verified by a type checker. The safety
ensured by the type checker can help develop a robust type inference algorithm for Java.
There are various aspects of a Java program from which we can obtain information about
a variable's type, for example, identifier names, associated comments, and code usage
patterns. Such data helps train a learning model that can precisely predict the
type of a variable.
References
1. Boone, C., de Bruin, N., Langerak, A., Stelmach, F.: DLTPy: Deep learning type inference of
Python function signatures using natural language context. arXiv preprint arXiv:1912.00680 (2019)
2. Cimadamore, M.: JEP 101: Generalized target-type inference. Last accessed: 2020. https://
3. Damas, L., Milner, R.: Principal type-schemes for functional programs. In: Proceedings of 9th
POPL, pp. 207–212, New York. ACM (1982)
4. Goetz, B.: JEP 286: Local-Variable Type Inference. Last accessed: 2020. https://openjdk.java.
5. Gosling, J., Joy, B., Steele, G., Bracha, G., Buckley, A.: The Java Language Specification (Java
SE 8 edition) (2015)
6. Hellendoorn, V.J., Bird, C., Barr, E.T., Allamanis, M.: Deep learning type inference. In: Proceedings of 26th ESEC/FSE, pp. 152–162, New York. ACM (2018)
7. Plümicke, M.: More type inference in Java 8. In: Proceedings Perspectives of System Informatics, pp. 248–256, Berlin, Heidelberg. Springer (2014)
8. Smith, D.: Designing Type Inference for Typed Object-Oriented Languages. Ph.D. thesis, Rice
University, USA (2010)
9. Smith, D., Cartwright, R.: Java type inference is broken: can we fix it? In: Proceedings of 23rd
OOPSLA, pp. 505–524, New York, USA. ACM (2008)
10. Tate, R., Leung, A., Lerner, S.: Taming wildcards in Java’s type system. In: Proceedings of
32nd PLDI, pp. 614–627, New York, USA. ACM (2011)
11. Torgersen, M., Plesner Hansen, C., Ernst, E., von der Ahé, P., Bracha, G., Gafter, N.: Adding
wildcards to the Java programming language. In: Proceeding of 19th SAC, pp. 1289–1296,
New York, USA. ACM (2004)
12. Wei, J., Goyal, M., Durrett, G., Dillig, I.: LambdaNet: Probabilistic Type Inference Using
Graph Neural Networks. arXiv preprint arXiv:2005.02161 (2020)
Detection and Correction of Potholes
Using Machine Learning
Ashish Sahu, Aadityanand Singh, Sahil Pandita, Varun Walimbe,
and Shubhangi Kharche
1 Introduction
Traffic congestion has been increasing worldwide as a result of increased motorization, urbanization, population growth and changes in population density. Congestion
reduces the utilization of transportation infrastructure and increases travel time,
air pollution, fuel consumption and, most significantly, traffic accidents. There is an
exponential increase in the population of Mumbai. As people live far from
their offices, they commute by road transport or trains. This has
led to faster erosion of the roads, which are left unattended because of heavy
traffic; the resulting delays and accidents cause mental strain and in some cases can
be fatal. At present, there are various ways to detect potholes, either manually or
automatically. The main aim of our project is to detect potholes automatically and also
repair them without us being present there. In addition, our system is considerably cheaper
than other proposed systems. To reduce potholes, we have designed a mechanism that can simultaneously detect a pothole and store its data in a
database. For data transmission, we use a LoRaWAN module, which sends the pothole data
to the database. The system then dispatches a robot, without any manual assistance, to that place to correct the pothole. This reduction of manual labour also reduces the
time required to correct a pothole. In our system, image processing is
an integral part of the detection of potholes. For the transmission of the images
and data, we use LoRa instead of Zigbee, as LoRa can transmit data over longer
ranges, i.e. ~10 km. The project also deals with the filling of potholes, which has not
been attempted earlier; here, instead of asphalt, we use chip filling, which can
greatly reduce the recurrence of the filling. Also, by using chip-filling techniques, we
A. Sahu (B) · A. Singh · S. Pandita · V. Walimbe · S. Kharche
Department of Electronics and Telecommunications, SIES Graduate School of Technology, Navi
Mumbai, India
e-mail: ashishsahu26041998@gmail.com
can reduce the carbon emissions from asphalt production, resulting in a greener
world and environment. The project will improve the efficiency of road maintenance
and reduce the labour required for it.
2 Literature Survey
We know that potholes are a nuisance to society and to human well-being.
Several measures have been taken to reduce the number of potholes or to detect them
so that drivers are made aware of them beforehand. In one such case, a camera
was used to detect erosion on the roads using various image processing techniques, such as the Canny edge detector, dilation and contouring. The system was quite portable, as
the number of non-portable components such as computers was reduced. Since image
processing was used, it was relatively faster than previous processes. The
system produced average accuracy, sensitivity and specificity within the expected
output, and the success rate of sending detection reports was found to have no
errors. The accuracy achieved during testing showed that the system
was excellent in terms of overall performance [1]. Another paper presented
an integrated approach to road distress identification, whose key element
was a set of multiple algorithms working synergistically toward a common goal.
By adjusting various functions (pixel, sub-image or object) and the
number of signals to be processed, the authors achieved robust performance
and computational tractability [2]. Another paper suggested the use of 2D LiDAR
and a camera for a pothole detection system. It used two LiDARs specifically so as to
scan the wide road more accurately. The authors then developed the system algorithm,
including filtering, clustering, line extraction and a gradient-of-data function. The error rate
of the pothole detection system shows the performance of the developed system. One of
the novel points was that 3D pothole detection can be performed using 2D LiDAR:
pothole detection using video data is combined with 2D LiDAR data, and the combined
data gives more accurate detection performance [3]. In another paper, the
problem was tackled by designing a Wi-Fi-based infrastructure enabling application data transfer to vehicles moving on the roads. In this method, the driver is
given early notifications of road conditions so as to assist in making
strategic and real-time tactical decisions in varied environments. The architectural
design and system support for the pothole detection and warning system ensured that
the driver gets information about potholes well in advance and has sufficient time
to make decisions according to the prevailing road conditions [4]. Another paper
countered the problem by using an ultrasonic sensor for pothole detection and a
Zigbee module pair for communication. The proposed model uses an NXP LPC1768
microcontroller for making decisions about controlling the speed of the vehicle. Its
main aim was to achieve proper and efficient detection of potholes and communication between multiple vehicles; Zigbee was used to establish multi-vehicle communication. Pothole detection is an important feature of autonomous vehicles, and this idea can be extended to detect nearby vehicles and any type
of obstacle on the road [5]. Another method describes accelerometer-data-based
pothole detection algorithms intended for deployment on devices with limited hardware and
software resources and evaluates them on real-world data acquired using different
Android OS-based smartphones. The evaluation resulted in an optimal set-up for
each selected algorithm, and the performance analysis across different road
irregularity classes shows true positive rates as high as 90%.
From these papers, we see that image processing is an integral part of pothole
detection. Unlike the surveyed systems, we use LoRa instead of Zigbee for
transmitting the images and data, as LoRa can transmit data over longer ranges, i.e. ~10 km.
Our project also deals with the filling of potholes, which has not been attempted
earlier; instead of asphalt we use chip filling, which can greatly reduce
the recurrence of the filling and the carbon emissions from asphalt production,
resulting in a greener world and environment. The project will improve the efficiency of road maintenance and reduce
the labour required for it (Table 1).
3 System Architecture
Figure 1 shows the system architecture, which is explained stepwise as follows:
1. CCTV footage: A CCTV camera takes footage of the surrounding area, i.e. a specific
part of the road. It sends images of that section of the road at specific intervals
to the server via LoRa.
2. Server: The main server hosts the pothole detection model; it receives the
footage from the CCTV and runs it through the SVM model to see whether there is a
pothole in the video.
3. Pothole detection: This is done using the trained SVM model. Once a pothole is
detected, it is recorded in the real-time database so that the bot can go to the
desired location.
4. Database: This is the real-time database of the whole system. The server updates
it with the location and size of each pothole.
5. Camera module: The camera module on the bot is used to detect the pothole at
the exact location when the bot is in the field. It is used for distance and depth
calculation.
6. GPS module: It is installed on the bot to obtain its general location, and also
so that the bot can find its way to the exact location of the pothole.
7. Ultrasonic sensor: It is used for collision detection and depth detection. Ultrasonic sensors are cheap and can be used fairly easily for collision detection. For
depth detection of an irregular surface such as a pothole, LiDAR could be
used, but it is an expensive alternative.
8. Raspberry Pi: It is the heart and brain of the system and its main controller.
It runs the bot and hosts the image processing model for pothole detection.
It also handles the manoeuvring of the bot from one place to another, i.e.
collision detection and moving the bot to exact coordinates. It calculates
the distance of the pothole from the bot, as well as the amount of material
needed for filling the hole and the depth of the hole.
9. Motors: These are used for the movement of the bot. They are high-power, high-RPM, high-torque motors so that the bot can move around with its weight at a
fair speed.
10. Burners: These are used to heat the thermosetting plastic; the heat and
flame are regulated by the Raspberry Pi.
11. Servomotor: These are used to control the valve of the material container
so that the exact amount of material is dispensed.
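For instance, the depth estimation enabled by the ultrasonic sensor reduces to time-of-flight arithmetic. The Python sketch below illustrates the idea with made-up numbers; the sensor baseline and the way the echo is timed (a GPIO echo pin on the Raspberry Pi) are our assumptions, not details from the paper.

```python
# Converting an ultrasonic sensor's echo time to distance and pothole
# depth. A sketch only: on the real bot the echo duration would come
# from timing a GPIO pin on the Raspberry Pi.

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def echo_to_distance(echo_seconds):
    """The pulse travels to the surface and back, so halve the path."""
    return SPEED_OF_SOUND * echo_seconds / 2.0

def pothole_depth(road_echo_s, pothole_echo_s):
    """Depth = distance measured over the pothole minus the
    sensor-to-road baseline measured on flat road."""
    return echo_to_distance(pothole_echo_s) - echo_to_distance(road_echo_s)

# Baseline: sensor 20 cm above the road -> 0.4 m round trip.
baseline = 0.4 / SPEED_OF_SOUND
# Over the pothole the echo takes longer: 0.5 m round trip.
over_hole = 0.5 / SPEED_OF_SOUND
print(round(pothole_depth(baseline, over_hole), 3))  # -> 0.05 (5 cm deep)
```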
4 Support Vector Machine
The image of a pothole is very complicated. Roads are usually dark grey or almost
black in colour, and it is very difficult to track a black object on a black-coloured road
with image processing algorithms. In short, a pothole image contains many
outliers, and the data is nonlinear. SVM is robust, i.e. not much
affected by noisy data and outliers, and this is where it excels among
machine learning algorithms; it is also very accurate compared to other algorithms,
and the prediction results using this model are very promising. Pothole detection is a
binary classification problem, at which SVM excels. For detecting a
pothole accurately on the road, the training data set should be huge, and SVM runs
efficiently on large and expensive data sets. For the classification and detection of
potholes, we therefore use SVM. SVM is a model that can do linear classification and
regression. It is based on the concept of a hyperplane, which draws a boundary between
data instances plotted in the multidimensional feature space. The SVM algorithm builds
an N-dimensional hyperplane model that assigns future instances to one of
two possible output classes.
SVM Algorithm
Step 1: Selection of the two classes on which classification has to be done.
Step 2: A boundary plane (hyperplane) is drawn between the two classes.
Step 3: Find the optimal hyperplane.
Step 4: Data is classified using the correct hyperplane and the input training
data (Fig. 2).
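The four steps can be sketched with scikit-learn's SVC on a toy two-class data set; the feature values below are invented purely for illustration, whereas the real model is trained on image features:

```python
import numpy as np
from sklearn.svm import SVC

# Step 1: two classes, plain road (0) vs pothole (1), as toy feature vectors
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])

# Steps 2-3: fitting draws the boundary and finds the optimal hyperplane
clf = SVC(kernel="linear")
clf.fit(X, y)

# Step 4: classify new data with the learned hyperplane
print(clf.predict([[0.15, 0.15], [0.85, 0.85]]))  # [0 1]
```

The same `fit`/`predict` interface carries over unchanged when the toy vectors are replaced by real image feature vectors.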
Detection and Correction of Potholes Using Machine Learning
Fig. 1 System architecture
Fig. 2 Working of SVM
A. Sahu et al.
4.1 Working of SVM
1. Hyperplane and margin: For an N-dimensional feature space, a hyperplane is a
flat subspace of dimension (N − 1) that separates and classifies a set of data.
For example, if we consider a two-dimensional feature space, the hyperplane is a
one-dimensional subspace, i.e. a straight line.
Mathematically, in a two-dimensional space the hyperplane can be defined by the
equation
c0 + c1X1 + c2X2 = 0,
which is simply the equation of a straight line. This concept is used for binary
classification.
2. Kernel Trick: SVM has a technique called the kernel trick to deal with
nonlinearly separable data. Kernels are functions that transform a
lower-dimensional input space into a higher-dimensional space; in the process,
they convert linearly non-separable data into linearly separable data. There are
three main types of kernels:
I. Linear kernel: K(xi, xj) = xi·xj
II. Polynomial kernel: K(xi, xj) = (xi·xj + 1)^d
III. Sigmoid kernel: K(xi, xj) = tanh(k·xi·xj − ε).
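The three kernels can be written directly as functions. The sketch below (NumPy) evaluates each on a pair of feature vectors; the degree d and the sigmoid parameters k and ε are tuning choices, so the default values here are illustrative assumptions:

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi . xj
    return float(np.dot(xi, xj))

def polynomial_kernel(xi, xj, d=2):
    # K(xi, xj) = (xi . xj + 1)^d ; the degree d is a tuning choice
    return float((np.dot(xi, xj) + 1) ** d)

def sigmoid_kernel(xi, xj, k=1.0, epsilon=0.0):
    # K(xi, xj) = tanh(k * xi . xj - epsilon)
    return float(np.tanh(k * np.dot(xi, xj) - epsilon))

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 1.0])
print(linear_kernel(xi, xj))      # 5.0
print(polynomial_kernel(xi, xj))  # 36.0
```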
4.2 Working of Pothole Detection Robot
Step 1: The bot monitors the area around it in search of potholes with the help
of a camera mounted on top of a servomotor.
Step 2: Frames from the camera are captured in real time using the OpenCV image
processing library and passed through the SVM model. As soon as an image of a
pothole is passed, the model predicts it accurately and gives feedback that the
image is of a pothole. If the image is not of a pothole, the bot resumes
monitoring.
Step 3: After a pothole is detected, the distance of the pothole from the bot is
calculated using simple mathematics and image processing operations. The bot then
approaches the pothole.
Step 4: After reaching the spot, the bot calculates the depth of the pothole
using the ultrasonic sensor and fills the pothole up to its depth, completing the
process. The whole process is repeated when another pothole is detected.
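The depth measurement in Step 4 reduces to time-of-flight arithmetic. A minimal sketch, assuming the ultrasonic sensor reports echo round-trip times and a sound speed of 343 m/s (both assumptions; the paper does not give the sensor's interface):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def ultrasonic_distance(echo_time_s):
    # the pulse travels to the surface and back, hence the division by 2
    return SPEED_OF_SOUND * echo_time_s / 2.0

def pothole_depth(road_echo_s, hole_echo_s):
    # depth = distance to the hole bottom minus distance to the road surface
    return ultrasonic_distance(hole_echo_s) - ultrasonic_distance(road_echo_s)

# e.g. echoes of 1.0 ms over the road and 1.3 ms over the hole -> roughly 5 cm
print(round(pothole_depth(0.0010, 0.0013) * 100, 1), "cm")
```

Multiplying this depth by the pothole's surface area would give the volume of filler material the controller needs to dispense.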
5 Results and Discussions
The results are based on the accuracy achieved with different feature extraction
techniques. For training, we used a total of 698 images: 343 pothole images
(which we call positive images) and 355 plain-road images (which we call negative
images). For testing, we used a total of 16 images: eight pothole and eight
plain-road images.
5.1 Normal Support Vector Classifier (SVC) with No Feature Extraction Gives
62.5% Accuracy
a. Positive image accuracy is 75%. b. Negative image accuracy is 50%.
Inference: Figure 3 suggests that the model identifies a pothole as a pothole 75%
of the time but identifies a normal plain road as a pothole 50% of the time.
Thus, a total accuracy of 62.5% is achieved.
Fig. 3 Normal SVC results
Fig. 4 SVC with corner results
5.2 SVC with Corner Detection Gives 62.5% Accuracy
a. Positive image accuracy is 75%. b. Negative image accuracy is 50%.
Inference: Figure 4 depicts that the model identifies a pothole as a pothole 75%
of the time but identifies a normal plain road as a pothole 50% of the time.
Thus, a total accuracy of 62.5% is achieved.
5.3 SVC with Canny Edge Detection Gives 75% Accuracy
a. Positive image accuracy is 50%. b. Negative image accuracy is 100%.
Inference: Figure 5 depicts that the model identifies a pothole as a pothole 50%
of the time but identifies a normal plain road as a pothole 0% of the time. Thus,
a total accuracy of 75% is achieved.
5.4 SVC with Canny Edge Detection and Corner Detection Gives 68.75% Accuracy
a. Positive image accuracy is 62.5%.
b. Negative image accuracy is 75%.
Fig. 5 SVC with Canny edge results
Inference: Figure 6 suggests that the model identifies a pothole as a pothole
62.5% of the time but identifies a normal plain road as a pothole 25% of the
time. Thus, a total accuracy of 68.75% is achieved.
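Because the test set is balanced (eight positive and eight negative images), each overall accuracy above is just the weighted count of correct predictions; a quick check:

```python
def overall_accuracy(pos_acc, neg_acc, n_pos=8, n_neg=8):
    # correct predictions on each half of the balanced 16-image test set
    correct = pos_acc * n_pos + neg_acc * n_neg
    return correct / (n_pos + n_neg)

print(overall_accuracy(0.75, 0.50))   # 0.625  -> normal SVC, SVC with corners
print(overall_accuracy(0.50, 1.00))   # 0.75   -> SVC with Canny edges
print(overall_accuracy(0.625, 0.75))  # 0.6875 -> Canny edges + corners
```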
Looking at the main accuracy graph in Fig. 7, we can say that Canny edge
detection gives the best overall output, with 75% overall accuracy.
But looking at the graph of only positive images, i.e. how often the model
identifies a pothole image as a pothole, a normal SVC without any feature
extraction or an SVC with corner detection gives the best result, 75% accuracy.
This is shown in the graph in Fig. 8.
Fig. 6 SVC with corner and Canny edge results
Fig. 7 Final comparison
Now consider the graphs of negative images, i.e. a plain road image being
identified correctly and not as a pothole. Here, SVC with Canny edge detection
gives the optimum output of 100% accuracy, as shown in the graph in Fig. 9. With
this, we can say that Canny edge detection gives the best overall output.
Fig. 8 Positive images comparison
Fig. 9 Negative images comparison
It can also be inferred as follows:
The robot can successfully detect potholes by image processing, so we can say the
SVM has been successfully trained. The robot can successfully calculate the
distance of the pothole from itself using mathematical equations and image
processing.
The robot can successfully repair the pothole in a fast and accurate manner.
The robot successfully monitors the area through an angle of 360° in 3D space, so
whatever the angle at which a pothole is located, the bot will be able to find
and repair it.
The robot works in fully autonomous mode; there is no human interference while it
is operating. The robot spots the pothole, goes to it and then repairs it.
The robot operates in four steps:
1. The robot identifies the pothole.
2. The robot calculates the distance to the pothole and approaches it.
3. The robot calculates the depth of the pothole using the LiDAR sensor.
4. The robot then fills the pothole until it is completely filled up to its depth.
The robot then moves randomly in an autonomous manner in order to look for other
potholes. It can also stop moving at times and keep scanning in order to identify
a pothole accurately.
6 Future Scope
Considering the after-effects of using this model, we could see a clear dip in
carbon emissions in the atmosphere, thereby reducing the greenhouse effect.
Reducing the number of potholes also decreases the number of accidents, reduces
commuters' anxiety, makes reaching on time realistic and minimizes the cost borne
by the municipality. Since CCTV cameras are already in use, accidents on the road
can be identified, and preventing road rage and locating crime-bound vehicles
becomes easier. The data can further be used to analyse traffic patterns and lay
new roads, and to understand the wear of roads and plan their total renewal.
7 Conclusion
The proposed system is a completely autonomous vehicle capable of traversing
streets; it can detect road signs, detect obstacles in its path using LiDARs and
react accordingly. Irregularities on the road, i.e. potholes, are detected, and
cold lay asphalt material is dispensed on the affected area, making the road
smooth and pothole free. The whole process is fully automatic. This will also
reduce the heavy machinery used for repairing and the associated expenditure.
Table 1 Literature survey in concise

Author | Applications used | Advantages
Garcillanosa [1] | RPi, image processing, cloud | Image processing is very fast, very portable and efficient
Gill (1997) | Detectors, line trackers, Hough | Integrated approach to road distress identification, robust performance and computational
Choi [2] | 2D LiDAR, camera, OpenCV | Wide area of the road scanned efficiently, more accurate pothole detection performance
Rode [3] | Wi-Fi-based architecture, GPS | Assist in making strategic and real-time tactical decisions
Artis Mednis, Girts | Android OS, accelerometer sensors, GPS module | Detects different road irregularity classes, with true positive rates as high as 90%
References
1. Garcillanosa, M.M.: Smart detection and reporting of potholes via image-processing using Raspberry-Pi microcontroller. In: 2018 10th International Conference on Knowledge and Smart Technology (KST). https://doi.org/10.1109/KST.2018.8426203
2. Choi, S.-i.: Pothole detection system using 2D LiDAR and camera. INSPEC Accession Number: 17063558. https://doi.org/10.1109/ICUFN.2017.7993890
3. Rode, S.S.: Pothole detection and warning system: infrastructure support and system design. In: 2009 International Conference on Electronic Computer Technology. INSPEC Accession Number: 10479675. https://doi.org/10.1109/ICECT.2009.152
4. Hegde, S.: Pothole detection and inter vehicular communication. In: 2014 IEEE International Conference on Vehicular Electronics and Safety. INSPEC Accession Number: 15001142. https://
5. Salavo, L.: Real time pothole detection using android smartphones with accelerometers. In: 2011 7th IEEE International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS). https://doi.org/10.1109/DCOSS.2011.5982206
Detecting COVID-19 Using Convolution
Neural Networks
Nihar Patel, Deep Patel, Dhruvil Shah, Foram Patel, and Vibha Patel
1 Introduction
Coronavirus disease 2019 or COVID-19 is the newest virus in the category of coronaviruses that caused the earlier epidemics of severe acute respiratory syndrome
(SARS-CoV) in 2002 and Middle East respiratory syndrome (MERS-CoV) in 2012.
The COVID-19 outbreak is believed to have started in the Huanan seafood market of
Wuhan city, Hubei province, in the People's Republic of China in late 2019, most
probably passing from a bat to a pangolin and finally to humans. It is a zoonotic
disease, one caused by a pathogen that transmits from non-human animals (usually
vertebrates) to human beings. Other deadly zoonotic diseases include Spanish flu
(1918), HIV (1980), bird flu or H5N1 (2006), swine flu or H1N1 (2009) and Ebola
virus (2013).
The first case and its transmission are reported to have begun in December 2019
as confirmed by the WHO and China. As a result of the severity of the virus, the
World Health Organization (WHO) declared the COVID-19 outbreak a public health
emergency of international concern (PHEIC) on January 30, 2020 and a pandemic on
March 11, 2020. As of 1 July 2020, more than 10.4 million cases have been reported
across 188 countries and territories, resulting in more than 511,000 deaths. The USA
N. Patel (B) · D. Patel · D. Shah · F. Patel · V. Patel
Vishwakarma Government Engineering College, Ahmedabad, Gujarat 382424, India
e-mail: niharpatel1999@gmail.com
D. Patel
e-mail: deeppatel4557@gmail.com
D. Shah
e-mail: shahdhru2000@gmail.com
F. Patel
e-mail: foramp66@gmail.com
V. Patel
e-mail: vibhadp@vgecg.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
remains the most affected country with nearly 2.5 million cases. As studied by
Johns Hopkins University, the death-to-case ratio in the USA is 4.8% as of 1
July, but the number varies across regions.
This is a huge threat affecting mankind, and it is the need of the hour to
contribute collectively to eliminate it. COVID-19 is a respiratory disease, and
so it majorly affects the lungs, though some cases have also involved multiple
organ failure. A chest X-ray can be useful to detect the disease at an early
stage, and if this process could be automated using deep learning, it would
benefit doctors and save their time. In symptomatic patients, the most observed
symptoms are fever, cough,
nasal congestion, fatigue and other signs of upper respiratory tract infections. The
infection can progress to severe disease with dyspnoea and severe chest symptoms
corresponding to pneumonia in approximately 75% of patients, as seen by computed
tomography on admission of the patient [11]. Pneumonia mostly occurs in the second or third week of a symptomatic infection. Prominent signs of viral pneumonia
include decreased oxygen saturation, blood gas deviations, changes visible through
chest X-rays and other imaging techniques, with ground glass abnormalities, patchy
consolidation, alveolar exudates and interlobular involvement, eventually indicating deterioration [9]. Recent findings have revealed that the key imaging techniques
used in the diagnostic test of COVID-19 disease are the chest X-rays and computed
tomography (CT) scans. Hence, CNN can be used efficiently to detect COVID-19
in patients as the patients’ chest X-ray shows certain anomalies in radiography. The
aim of this paper is to test various deep learning models available, both standard and
customized for detecting COVID-19 at an early stage through X-ray images.
2 Related Work
A detailed discussion on diagnosing the COVID-19 disease with the help of X-rays is
given by Apostolopoulos et al. [2]. The authors experimented with various CNN models such as
VGG19, MobileNet v2, Inception, Xception and ResNet v2 using transfer learning
and achieved accuracy of 96.78%. Li et al. [8] developed a CNN model COVNet
to identify COVID-19 from other community acquired pneumonia using chest CT
scans. COVNet is a 3D deep neural model with ResNet50 as its backbone. A series of
computed tomography slices are given as input to COVNet, and it generates features
for each slice. After combining these features by max pooling operation, the final
feature map is then given to the fully connected dense layer and softmax layer for the
prediction of COVID-19 disease. Abbas et al. [1] proposed a novel approach based
on class decomposition for the classification of COVID-19 chest X-ray images. This
research presents a Decompose, Transfer, and Compose (DeTraC) model by applying
a class decomposition layer to a pre-trained ResNet18 architecture to detect
COVID-19 from normal and severe acute respiratory syndrome (SARS) images. The
accuracy of this model was 95.12%, with 97.91% sensitivity and 91.87%
specificity. The study described in [3] introduces a three-phase approach to
fine-tune a pre-trained ResNet-50 architecture to enhance the efficiency of the
COVID-ResNet model for the image
classification of four different classes: normal, bacterial pneumonia, viral
pneumonia and COVID-19. In this work, input images are gradually resized in three
phases, and the model is fine-tuned at each phase.
3 Proposed Approach
The proposed approach diagnoses COVID-19 patients using the patient's X-ray data.
The X-ray image is provided to the pre-trained deep neural network, and an
accurate prediction of whether the patient is infected with COVID-19 or viral
pneumonia is obtained. By training our model on a dataset of 956 X-ray images
along with their labels, the model learns to predict among the three specified
labels accurately.
3.1 Data Preprocessing
The dataset for training was prepared using the COVID-19 images from the GitHub
repository of IEEE8023; the images for the normal and viral pneumonia classes
were downloaded from Kaggle. The dataset included 256 X-ray images of
COVID-19-infected patients, 350 X-ray images of patients diagnosed with viral
pneumonia and 350 X-ray images of normal people. 10% of the data was used for
validation, and a further 10% was used for model testing. The images were
preprocessed before being fed to the deep learning model: the image size was set
to (224, 224, 3), and the pixel values were normalized.
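A minimal sketch of the split and normalization described above; the seed and shuffling scheme are assumptions, and only the 10%/10% fractions and the [0, 1] pixel scaling come from the text:

```python
import numpy as np

def split_dataset(n, val_frac=0.10, test_frac=0.10, seed=0):
    # shuffle image indices, then carve out validation and test subsets
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val, n_test = int(n * val_frac), int(n * test_frac)
    return idx[n_val + n_test:], idx[:n_val], idx[n_val:n_val + n_test]

def preprocess(image):
    # normalize uint8 pixels of a (224, 224, 3) image to float32 in [0, 1]
    return image.astype(np.float32) / 255.0

train, val, test = split_dataset(956)
print(len(train), len(val), len(test))  # 766 95 95
```

Note that 10% of 956 images gives 95 held-out test images, matching the test set size reported in the results section.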
3.2 Implementation
The experimental evaluation of several custom CNN models as well as standard CNN
models is presented in this paper. All these models were fine-tuned by adjusting
different hyperparameters such as the learning rate, momentum, batch size and
optimizer. The model that best fits the data with appropriate hyperparameter
tuning has been identified for the given task. 100 epochs are used as a standard
for all models.
For training, Google Colab was used as it provides GPU-accelerated computing.
For the task of image classification, many standard deep neural network
architectures are available. AlexNet [6] was a major breakthrough in the image
classification domain; it significantly outperformed all prior competitors,
winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 by
reducing the top-5 error from 26 to 15.3%. The network had an architecture very
similar to LeNet [7] but was deeper, with more filters per layer and with stacked
convolutional layers. Inception v1 or GoogLeNet [10] was introduced in 2014 by
Google, the
main intuition being creating “wider” models rather than “deeper” models. The other
variants include Inception v2 and Inception v3, which were modifications of their
predecessors. ResNet, or residual network [4], introduced in 2015, was a great
improvement for image classification. The core idea of ResNet is the so-called
identity shortcut connection that skips one or more layers, thereby mitigating
the vanishing gradient problem. There are many variants of ResNet, which differ
in the number of layers. ResNet-18, ResNet-34, ResNet-50, ResNet-101,
ResNet-110 and ResNet-152 are some of the variants. Another approach to
eliminating vanishing gradients was introduced in DenseNet, or dense convolution
networks [5],
in 2017. In DenseNet, each layer obtains additional inputs from all preceding layers
and passes on its own feature maps to all subsequent layers. Hence, each layer is
receiving a “collective knowledge” from all preceding layers. Different variants of
DenseNet include DenseNet-121, DenseNet-169, DenseNet-201 and DenseNet-264.
These models are standardized by the deep learning community, and hence, they are
preferable to use. Also, the models could be used with pre-trained weights using
transfer learning. For our study, all the standard architectures except Inception v3
are fully trained on the dataset. For Inception v3 network, we applied transfer learning by loading the pre-trained weights of the ImageNet dataset to re-train the network
for our dataset.
Along with the standard architectures, we customized the ResNet architecture by
trying a number of different arrangements of residual blocks, keeping the
configuration the same as that of the standard architecture. ResNet uses four
modules made up of residual blocks, which perform convolution operations with
skip connections. These modules have basic residual blocks arranged sequentially.
The ResNet-18 model has four modules with residual blocks in the sequence
[2, 2, 2, 2], each block comprising two convolution layers; this sums to 16
layers, and the additional input convolution and output softmax layer bring the
total to 18. The custom models, ResNet-10, ResNet-12, ResNet-14, ResNet-18
(customized) and ResNet-20, were implemented and fine-tuned with a batch size of
32 and Nesterov accelerated gradient descent as the optimizer. Each module of
ResNet-10 contains residual blocks in the order [1, 1, 1, 1]. Similarly,
ResNet-12 [2, 1, 1, 1], ResNet-14 [1, 2, 2, 1], ResNet-18 [3, 2, 2, 1] and
ResNet-20 [2, 2, 2, 3] were formed and implemented. The number of filters for
each module is fixed at 64, 128, 256 and 512 filters, respectively. The last
block is followed by a softmax layer with three units, as we have three classes
to predict.
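The block counts above determine each variant's nominal depth; a small check of the counting rule (two convolutions per residual block, plus the input convolution and the output layer):

```python
def resnet_depth(blocks_per_module, convs_per_block=2):
    # conv layers inside residual blocks + input conv + output softmax layer
    return convs_per_block * sum(blocks_per_module) + 2

variants = {
    "ResNet-10": [1, 1, 1, 1],
    "ResNet-12": [2, 1, 1, 1],
    "ResNet-14": [1, 2, 2, 1],
    "ResNet-18": [3, 2, 2, 1],
    "ResNet-20": [2, 2, 2, 3],
}
for name, blocks in variants.items():
    print(name, resnet_depth(blocks))  # e.g. ResNet-18 -> 18
```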
Furthermore, a custom CNN model was also created. Figure 1 shows the architecture
of the CNN model. The model consisted of three pairs of a convolution layer
followed by a max pooling layer, and then a fully connected layer of 128 hidden
units. The dense layer, with 128 units and rectified linear unit (ReLU)
activation, is connected to a softmax layer that gives the probability of each of
the three classes. For this model, RMSProp was used as the optimizer with a
learning rate of 0.001, and categorical cross-entropy was used as the loss
function.
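Assuming 3×3 convolutions with valid padding and 2×2 max pooling (kernel sizes the text does not state), the feature map shrinks through the three conv-pool pairs as follows before reaching the dense layers:

```python
def conv_out(size, kernel=3, stride=1, pad=0):
    # output width of a convolution along one spatial dimension
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # output width of a max pooling layer
    return (size - kernel) // stride + 1

size = 224  # input images are (224, 224, 3)
for i in range(3):
    size = pool_out(conv_out(size))
    print(f"after conv-pool pair {i + 1}: {size} x {size}")
```

The flattened output of the last pair feeds the 128-unit dense layer and the three-unit softmax described above.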
Fig. 1 Custom CNN3 architecture
4 Results and Discussions
Table 1 summarizes the results of different architectures implemented. Inception v3
gave the best results among the standard CNN architectures with the training accuracy
of 99.22% and validation accuracy of 97.89%. Inception v3 network has a number
of salient features that help enhance network performance on our dataset. It
provides factorized convolutions (reducing computational complexity) and an
additional regularization technique called label smoothing that helps prevent
overfitting.
Another main factor behind such an exemplary performance of Inception v3 is that
the network is wider, rather than deeper, in order to solve the information loss problem
that often occurs in very deep neural networks. In Inception v3, the batch-normalized
auxiliary classifiers also take care of the problem of the vanishing gradients. Figure 2a,
b shows the plots for loss and accuracy for training and validation of Inception v3.
For custom models, the best performance was given by the CNN3 model with
training accuracy of 96.61% and validation accuracy of 97.89%. Figure 3a, b shows
the plots for loss and accuracy for training and validation of CNN3. One thing to
Table 1 Loss and accuracy of implemented models
Fig. 2 Inception v3: (a) loss, (b) accuracy
Fig. 3 CNN3: (a) loss, (b) accuracy
notice here is that the plots of the CNN3 model contain many spikes during the
training process compared to those of the Inception v3 model. CNN3 is a smaller
model than Inception v3 but has similar validation accuracy. The reason is that
X-ray images are grayscale and thus have fewer features than multicoloured
images, so smaller networks can also predict the true classes with promising
accuracy. Since we prefer a model with smooth accuracy and loss curves, the
Inception v3 model is preferable to the CNN3 model for the X-ray image
classification task.
After training, the Inception v3 and CNN3 models were tested on X-ray images new
to them. They were tested on 95 X-ray images: 25 images of COVID-19, 35 images of
viral pneumonia and 35 images of normal conditions. The confusion matrix measures
the performance of a machine learning or deep learning model on classification
tasks; its diagonal represents the number of true positives for each class. The
confusion matrix for both the models is shown in Fig. 4a, b. The Inception v3 model
had specificity and sensitivity (recall) for all three classes COVID-19, viral pneumonia and normal as (98.5% and 100%), (96.66% and 94.2%) and (98.3% and 94.2%),
respectively. Similarly, for CNN3 model, the specificity and sensitivity for all three
classes COVID-19, viral pneumonia and normal were (100% and 100%), (100% and
94.2%) and (96.66% and 100%), respectively.
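Specificity and sensitivity can be read off a multiclass confusion matrix by one-vs-rest counting. The sketch below uses a hypothetical matrix, reconstructed only for illustration so that it is consistent with the CNN3 figures reported above (25/35/35 test images per class):

```python
import numpy as np

def sensitivity_specificity(cm):
    # rows = true classes, columns = predictions; one-vs-rest per class
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)

# hypothetical matrix; class order: COVID-19, viral pneumonia, normal
cm = [[25, 0, 0],
      [0, 33, 2],
      [0, 0, 35]]
sens, spec = sensitivity_specificity(cm)
print(np.round(sens * 100, 1))
print(np.round(spec * 100, 1))
```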
Fig. 4 Confusion matrix: (a) Inception v3, (b) CNN3
5 Conclusion
Using deep learning and machine learning in the medical science domain has given
extraordinary results in the past decade. In the current situation, detecting
COVID-19 at an early stage with the help of deep learning would help the medical
community and save precious time. Too much reliance on automatic detection poses
a threat, however, as small errors in results can be hazardous in real life; this
risk could be reduced by enlarging the dataset to build a better model, and a
two-step validation should be in place to reduce prediction errors. The current
work suggests that using CNNs for COVID-19 detection shows promising results.
Inception v3 and CNN3 were found to be better than the others, with similar
validation accuracy. More and better models could be developed on top of the
existing ones for better prediction. Hence, this paper discusses at length a
number of standard and customized deep learning-based COVID-19 detection models
which could be helpful to mankind in this hour of world crisis.
References
1. Abbas, A., Abdelsamea, M.M., Gaber, M.M.: Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. arXiv preprint arXiv:2003.13815 (2020)
2. Apostolopoulos, I.D., Mpesiana, T.A.: COVID-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med., p. 1 (2020)
3. Farooq, M., Hafeez, A.: COVID-ResNet: a deep learning framework for screening of COVID-19 from radiographs. arXiv preprint arXiv:2003.14395 (2020)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
5. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
7. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
8. Li, L., Qin, L., Xu, Z., Yin, Y., Wang, X., Kong, B., Bai, J., Lu, Y., Fang, Z., Song, Q., et al.: Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology, p. 200905 (2020)
9. Rothan, H.A., Byrareddy, S.N.: The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak. J. Autoimmunity, 102433 (2020)
10. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
11. Velavan, T.P., Meyer, C.G.: The COVID-19 epidemic. Trop. Med. Int. Health 25(3), 278 (2020)
Electroencephalography Measurements
and Analysis of Cortical Activations
Among Musicians and Non-musicians
for Happy and Sad Indian Classical Music
Nijin Nizar, Akhil Chittathuparambil Aravind, Rupanjana Biswas,
Anjali Suresh Nair, Sukriti Nirayilatt Venu, and Shyam Diwakar
1 Introduction
Recent advances in neuroscientific research on music perception and cognition
have provided biosignature-based evidence connecting brain plasticity and musical
activity [1]. Music, a perceptual entity, is processed by auditory mechanisms
that influence human cognitive behaviours such as memory, attention, language
processing and perception [2]. With the changing lifestyles of this era,
listening to music has a remarkable influence in promoting physical
rehabilitation, managing stress, improving communication, increasing stimulation
and enhancing other cognitive skills [3]. Given the ubiquitous nature of music,
it is thought to evoke and enhance a wide range of emotions, with sadness and
happiness the most frequent ones. Experimental studies with neuroimaging
techniques on the influence of musical parameters such as tempo, rhythm and tune
indicated that listening to familiar music awakens attention or arousal in
patients with neurological deficits [4]. The brain areas responsible for music
perception involve the superior temporal gyrus of the temporal lobe, the lateral
sulcus, the transverse temporal gyri and the sound processing regions, as well as
parietal and frontal areas of the human cerebral cortex responsible for mental
consciousness [5]. Robust research using
different modalities and experimental designs focussed on identifying the neural
correlates for familiar and unfamiliar musical excerpts among diverse populations
under different geographical conditions [6, 7]. Neuroimaging studies on frequency
tagging techniques have mapped neural dynamics of auditory cues in music listeners
and performers [8]. Previous studies integrated with biosignal analysis reported the
hypothetical approach for understanding the influence of music on brain waves and
N. Nizar · A. C. Aravind · R. Biswas · A. S. Nair · S. N. Venu · S. Diwakar (B)
Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri campus, Kollam,
Kerala, India
e-mail: shyam@amrita.edu
functional neural networks augmenting multimodal sensory and motor information,
auditory, visual-spatial, auditory-spatial and memory skills among musicians and
non-musicians [9]. Perceptual activation of the frontal midline θ rhythm (Fm
theta) during music listening or music training over a fixed period indicated
improved cognitive performance in mental calculation, working memory and
learning. β rhythm activation is thought to be associated with increased
alertness and cognitive performance, while α power synchronization in the left
and right hemispheres of the brain during music listening and perception accounts
for internal processing and creative thinking [10]. The neuroscience community
has been working to understand the impact of Indian classical music on brain
behaviour in different study populations [11]. Research studies have reported the
effect of Indian classical music and rock music on the shift from alpha rhythm to
beta rhythm when switching from techno to the classical music type during music
perception [12]. A study of the Indian Bhupali raga in university students
indicated a significant effect on attention and concentration in a digit span
task when compared to the scores with pop music [13].
Even though the influence of music on the human brain and cognition is well
documented, research on the effects of sad and happy music on human cognition
remains elusive. The present study focuses on understanding the effect of the
Indian classical ragas Reethigowla (a song as an example of happy music) and
Shivaranjini (as an example of sad music) on brain activity using a low-cost
non-invasive EEG signal analysis technique. The objective was to understand the
neural correlates associated with different auditory stimuli (happy music, sad
music, placebo and sham conditions) and to computationally explore the
spatiotemporal characterization of functional circuits among musicians and
non-musicians. The study could be further extended to investigate biosignal
markers elicited in response to human emotion and cognition, memory retention,
visual perception and information processing.
2 Methods
2.1 Subject Selection and Screening
20 healthy university students (16 females and 4 males), age group of 18–23 years,
without any hearing or neurological impairment were recruited for the study. Participants were categorized into two groups, musicians (N = 10) who had musical training
for more than 3 years and non-musicians (N = 10) having no prior musical training
experience. Physiological parameters and the mental state of the study participants
were assessed using cognitive batteries. Informed consent was collected from the
participants prior to the experiment.
Fig. 1 Schematic illustration of experiment protocol
2.2 Auditory Stimuli
Two ragas from Indian Carnatic music, conveying the happy and sad emotions, were
chosen as the music stimuli for this study. The Reethigowla raga was selected as
the happy music stimulus and the Shivaranjini raga as the sad music stimulus, as
both have well-documented evidence of physiological and psychological effects.
For comparing the neural networks underlying different auditory stimuli, a
placebo condition with a familiar voice (the speech of a famous person) and a
sham condition with a stressor noise (traffic noise) were included as auditory
stimuli. The length of each auditory stimulus was 6 min (Fig. 1).
2.3 Experiment Design
The subjects were randomly assigned to one of the two music interventions (either the happy or the sad raga). The set of 10 musicians was divided into two subsets: 5 listened to the happy stimulus and the remaining 5 to the sad stimulus. A similar random split was done for the non-musician group. Real-time EEG data collection was performed for the different auditory cues: happy music, sad music, placebo and sham condition. For analyzing the effect of the ragas on brain activity and on the cognitive skills of memory, attention and concentration, subjects were advised to listen to their respective music stimulus once daily for 6 days during the experiment paradigm. Placebo and sham stimuli were tested on consecutive days over the course of the study.
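The randomized assignment described above can be sketched as follows (a minimal illustration; the subject IDs and the fixed seed are hypothetical, not from the study):

```python
import random

def assign_stimuli(subjects, seed=0):
    """Randomly split a group of subjects into two equal halves:
    one half hears the happy raga, the other the sad raga."""
    rng = random.Random(seed)              # fixed seed for reproducibility
    shuffled = rng.sample(subjects, len(subjects))
    half = len(subjects) // 2
    return {"happy": shuffled[:half], "sad": shuffled[half:]}

# Hypothetical subject IDs for the 10 musicians
musicians = [f"M{i:02d}" for i in range(1, 11)]
groups = assign_stimuli(musicians)
```

The same split would be repeated independently for the non-musician group.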
N. Nizar et al.
2.4 EEG Data Acquisition and Computational Analysis
The participants were seated in a comfortable position in a soundproof, dimly lit laboratory. The recording was carried out in an eyes-closed state in order to avoid possible visual artefacts. Data acquisition was performed using a low-cost, surface-based, non-invasive EEG device with 14 + 2 electrodes, and the sampling rate was set at 128 Hz. The collected biosignal data were computationally converted to numeric values using MATLAB with the EEGLAB toolbox. Artefact-free EEG data were obtained using filtering methods such as band-pass filtering and Independent Component Analysis (ICA). Topographical plots representing the functional regions of the brain for the different auditory stimuli were computationally analyzed [14].
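The band-pass filtering step can be sketched as follows (a minimal NumPy illustration of FFT-based filtering on a synthetic 128 Hz trace; the study itself used MATLAB/EEGLAB and ICA, which are not reproduced here):

```python
import numpy as np

FS = 128  # sampling rate used in the study (Hz)

def bandpass_fft(signal, low, high, fs=FS):
    """Zero out Fourier components outside [low, high] Hz --
    a simple stand-in for the band-pass step before ICA."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Synthetic 2 s trace: a 10 Hz alpha component plus 60 Hz line noise
t = np.arange(2 * FS) / FS
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
clean = bandpass_fft(raw, 1, 50)
```

After filtering, the 10 Hz alpha component survives while the 60 Hz noise is suppressed.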
3 Results
3.1 Brain Rhythms Shift from Gamma to Alpha in Happy Music Perception and from Alpha to Beta to Gamma for Other Auditory Cues in Musicians and Non-musicians
In non-musicians, the normalized topographical plots of alpha and gamma rhythms for the auditory cues (pre-stimulus silence, stimulus and post-stimulus) showed a shift from gamma rhythms in the fronto-parietal and temporal lobes (F4, P8, T8) in the pre-silence condition to alpha rhythms in the parietal and occipital lobes (P8 and O2) with the happy music stimulus. A similar pattern of alpha rhythm intensity was retained in the post-stimulus condition. In musicians, gamma rhythm intensity was higher in the temporal and parietal lobes (T8, P8) during silence and shifted to alpha waves in the frontal, parietal and occipital regions (F4, P8, O2) at the peak of the Reethigowla music. The post-stimulus silence condition showed alpha wave activity in the right fronto-parietal and occipital regions, the same as during the auditory cue. In the placebo condition, non-musicians showed higher alpha rhythm intensity in the frontal lobes (AF4, AF3, F4, F3) in the pre-silence period, and the alpha rhythm shifted to the parietal and occipital lobes (P8/O2) during the cue. After the placebo cue, the alpha rhythms shifted to gamma rhythms of higher intensity in the occipital lobes (O2). In musicians, the pattern of alpha rhythm activity in the frontal and parietal lobes was similar in the pre-silence and auditory-cue periods for the placebo stimulus, and shifted to gamma rhythms in the frontal and parietal lobes (F4/P8) after the auditory stimulus. In the sham condition, for non-musicians, alpha rhythm activity during pre-silence in the frontal and occipital lobes (F4/O2) shifted to beta rhythms in the frontal and temporal lobes (F3/T7) during the auditory stimulus. The activity pattern then shifted to gamma rhythms in the frontal regions (F4) in the post-stress condition. In musicians, for the sham condition, a shift of alpha rhythm activity in the frontal and temporal lobes (F4/T8) before the stress condition to beta activity in the frontal lobes (F4) during the stress
Fig. 2 Differential cortical activation of brain rhythms in musicians and non-musicians to a happy
music cue, placebo and sham stimuli
condition was observed. The pattern then shifted to gamma activity in the frontal lobes (F4) in the post-stress silence condition (Fig. 2).
3.2 Behavioural Pattern Variations of Brain Rhythms for Sad
Music Stimuli and Other Auditory Cues in Musicians
and Non-musicians
Cortical mapping of alpha rhythms for Shivaranjini raga (before, during and after the music) showed higher alpha rhythm activity in all time bins. In non-musicians, frontal (AF4, F4) alpha rhythm activity in the pre-silence condition shifted to the temporal (T8) and parietal (P8) regions during and after the sad stimulus. In musicians, the pre-stimulus condition showed higher beta activity intensity in the frontal lobes (F4 and F8), which shifted to dominant alpha activity in the occipital lobes (O2) during and after the sad stimulus. In the placebo condition, among non-musicians, alpha activity in the frontal and occipital lobes (F4, O2) shifted to beta activity in the frontal and temporal regions from the silence condition to the auditory cue. Beta activity in the frontal (F3) and temporal (T7) regions then shifted to gamma activity in the frontal regions (F4/F8) after the auditory stimulus. In musicians, alpha activity in the frontal lobes (F4, F8) in the silence condition shifted to beta activity in the frontal lobes (F4) with the auditory stimulus, and then to gamma activity in the frontal lobes (F4/F8) in the post-cue silence condition. In the sham condition, for non-musicians, no shifts of brain rhythms were observed. Gamma rhythm activity was higher in the anterior frontal (AF3, AF4) and frontal lobes (F7, F8, F3, F4) during the pre-silence, auditory
Fig. 3 Cortical differentiation of brain rhythms for sad music, placebo and sham stimuli
stimuli and post-silence conditions. In musicians, higher theta activity at the frontal lobe (F4) was observed before stress, which shifted to gamma activity in the frontal lobes (AF4, F4) with stress and to gamma activity in the temporal region (T8) in the post-stress silence condition (Fig. 3).
4 Discussion
The focus of the study was to functionally map cortical activity patterns of brain rhythms among musicians and non-musicians under different auditory stimuli. The higher intensity of gamma rhythms in the silence state of non-musicians may be correlated with the preparation phase of cognitive task execution. Spectral analysis of cortical EEG rhythms indicated a shift from gamma in the fronto-parietal and temporal lobes to alpha in the parietal and occipital lobes during the happy stimulus, suggesting that the music was synchronizing the brain signals to alpha waves and that the subjects were experiencing positive emotions with the auditory cue. The retention of alpha rhythms after the happy stimulus indicated that the subjects were in a relaxed and calm state. Musicians also showed a gamma-to-alpha shift, indicating processing of joyful emotions with the Reethigowla raga. Differential activation patterns of the brain lobes for the same music cue among musicians and non-musicians indicated variations in information and language processing pathways at similar time bins. Both musicians and non-musicians showed higher alpha rhythm intensity for the famous voice clip as auditory cue throughout the experiment, indicating that the subjects were in an alert condition, and the switching of alpha rhythm intensity to different lobes indicated activation of visual and auditory pathways at particular time bins. In the stress condition, among both musicians and non-musicians, the shift from alpha to beta rhythms indicated
motor preparatory processes for sound synchronization and the activation of pathways associated with auditory–motor interaction. The beta-to-gamma shift after the stress stimulus indicated affective processing in the brain, which could be related to the induced stress condition. Patterns of variation of brain rhythms across lobes indicated the activation areas of auditory stimuli and evoked visual responses to select events.
The spectral plot analysis of the EEG signals of non-musicians for Shivaranjini raga showed alpha wave dominance in the anterior frontal, temporal and parietal regions of the brain, indicating activation of the centres responsible for attention, the auditory pathway and language processing. The shift from beta rhythm in the frontal regions to alpha rhythms in the occipital regions of musicians indicated visual perception of the music score and memory retention. For the placebo stimulus, in both non-musicians and musicians, the switching of alpha, beta and gamma among the frontal and temporal lobes indicated the alertness of the subjects to the familiar voice and its perception. The gamma wave dominance in the anterior frontal lobes of non-musicians for the traffic noise cue indicated the attainment of a stressor state. The switching from theta to gamma rhythms in the anterior frontal and temporal regions of musicians indicated a change from an alert state to a stressor state while listening to the sham cue. Behavioural studies of the different auditory cues in musicians and non-musicians evidenced the interconnection of the neural networks of musical activity and brain plasticity.
5 Conclusion
There has been considerable interest in understanding how different music stimuli can evoke and enhance emotional responses. The Indian ragas selected in this study have musical compositions that helped to create emotional states. The study of happy and sad music in musicians and non-musicians may help researchers to moderately correlate varying EEG frequency measures with different auditory cues. Further investigations based on cognitive performance tasks need to be implemented for testing the impact of music perception in music therapy, especially for stress management.
Acknowledgements This work derives direction and ideas from the Chancellor of Amrita Vishwa
Vidyapeetham, Sri Mata Amritanandamayi Devi. This work was partially funded by Embracing the
World Research-for-a-Cause initiative.
1. Jäncke, L.: Music drives brain plasticity. F1000 Biol. Rep. 6, 1–6 (2009). https://doi.org/10.
2. Schellenberg, E.G., Weiss, M.W.: Music and cognitive abilities (2013)
3. Rentfrow, P.J.: The role of music in everyday life: current directions in the social psychology of
music. Soc. Personal. Psychol. Compass. 6, 402–416 (2012). https://doi.org/10.1111/j.1751-9004.2012.00434.x
4. Hurless, N., Mekic, A., Peña, S., Humphries, E., Gentry, H., Nichols, D.F.: Music genre preference and tempo alter alpha and beta waves in human non-musicians. Impuls. Prem. Undergrad.
Neurosci. J. 22, 1–11 (2013)
5. Singh, N.C., Balasubramanian, H.: The brain on music 299–308 (2018)
6. Kumagai, Y., Matsui, R., Tanaka, T.: Music familiarity affects EEG entrainment when little
attention is paid. Front. Hum. Neurosci. 12, 1–11 (2018). https://doi.org/10.3389/fnhum.2018.00444
7. Freitas, C., Manzato, E., Burini, A., Taylor, M.J., Lerch, J.P., Anagnostou, E.: Neural correlates
of familiarity in music listening: A systematic review and a neuroimaging meta-analysis. Front.
Neurosci. 12, 1–14 (2018). https://doi.org/10.3389/fnins.2018.00686
8. Nozaradan, S.: Exploring how musical rhythm entrains brain activity with electroencephalogram frequency-tagging. Philos. Trans. R. Soc. B Biol. Sci. 369, 1–11 (2014). https://doi.org/
9. Gaser, C., Schlaug, G.: Brain structures differ between musicians and non-musicians. J.
Neurosci. 23, 9240–9245 (2003)
10. Lin, Y.P., Duann, J.R., Feng, W., Chen, J.H., Jung, T.P.: Revealing spatio-spectral electroencephalographic dynamics of musical mode and tempo perception by independent component
analysis. J. Neuroeng. Rehabil. 11, 1–11 (2014). https://doi.org/10.1186/1743-0003-11-18
11. Santhosh, A.K., Sangilirajan, M., Nizar, N., Radhamani, R., Kumar, D., Bodda, S., Diwakar,
S.: Computational exploration of neural dynamics underlying music cues among trained and
amateur subjects. Procedia Comput. Sci. 171, 1839–1847 (2020). https://doi.org/10.1016/j.
12. NS, D.S.K.: A Study on Effect of Indian Classical Music on Brain Activity Using EEG Signals.
J. Med. Sci. Clin. Res. 05, 21702–21706 (2017). https://doi.org/10.18535/jmscr/
13. Nagarajan, K., Srinivasan, T., Ramarao, N.: Immediate effect of listening to Indian raga on
attention and concentration in healthy college students: a comparative study. J. Heal. Res. Rev.
2, 103 (2015). https://doi.org/10.4103/2394-2010.168367
14. Krishnan, M., Edison, L., Radhamani, R., Nizar, N., Kumar, D., Nair, M., Nair, B., Diwakar,
S.: Experimental recording and computational analysis of EEG signals for a squeeze task:
assessments and impacts for applications. In: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1523–1527 (2018). https://doi.
Signal Processing in Yoga-Related Neural
Circuits and Implications of Stretching
and Sitting Asana on Brain Function
Dhanush Kumar, Akshara Chelora Puthanveedu, Krishna Mohan,
Lekshmi Aji Priya, Anjali Rajeev, Athira Cheruvathery Harisudhan,
Asha Vijayan, Sandeep Bodda, and Shyam Diwakar
1 Introduction
In today’s world, people are facing a pandemic of lifestyle disorders, mainly attributed to lifestyle and habits including the lack of sufficient physical activity, and this may have led to problems related to psychosomatic and mental health [1]. Several studies have shown that lifestyle factors and psychological issues like stress, anxiety and depression have adverse effects on memory, attention and other cognitive skills, which can lead to neurological disorders like Parkinson’s disease, Alzheimer’s disease, multiple sclerosis and epilepsy [2]. Medications are readily available, but due to their adverse effects, researchers seek non-pharmacological and non-invasive treatments for these disorders. Evidence has demonstrated that regular yoga practice can be a complementary solution for improving memory, attention and visual perception [3] and can also improve psychophysiological measurements associated with anxiety, depression and stress. Studies have shown that regular hatha yoga practice improved working memory and attention in healthy older adults [4]. Yoga focuses on improving one’s self through physical and mental practices that involve more mindful elements that are absent in other forms of exercise [5].
Yoga comprises physical postures, controlled breathing exercises, deep meditation practices and mantras [6] and improves health by increasing physical stamina, flexibility, balance and relaxation [7]. The physical and cognitive benefits of yoga are related to increased gray matter volume in the amygdala, increased body perception, activation of the parasympathetic nervous system, and stronger functional connectivity within the basal ganglia [8, 9] and cerebellar circuits [10].
D. Kumar · A. C. Puthanveedu · K. Mohan · L. A. Priya · A. Rajeev · A. C. Harisudhan ·
A. Vijayan · S. Bodda · S. Diwakar (B)
Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri Campus, Clappana P.O., Kollam, Kerala 690525, India
e-mail: shyam@amrita.edu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
D. Kumar et al.
To understand the spatial, temporal and spectral characteristics of the cortical regions associated with various cognitive tasks such as memory, attention, motor coordination, and visual and auditory perception, a non-invasive neuroimaging technique like electroencephalography (EEG) can be used. The current study focuses on the effects of various yoga-based practices on the brain using computational and statistical analysis of EEG signals. Datasets were compared to address how different aspects of a yoga practice, such as postures and movement, contribute differently to cortical function. The main objective of the paper was to compare stretching and sitting asanas based on the spatiotemporal changes in the brain associated with improved working memory, attention and visual perception.
2 Methods
2.1 Experimental Protocol and Characteristics
The study was conducted among 70 healthy volunteering subjects of mean age 21 years. Subjects were randomized into three groups: a control group and two experimental groups. Each experimental group was further divided into two subgroups, one of which was trained to practice sitting yoga (Swastikasana and Vajrasana) while the other practiced stretching yoga (Suryanamaskar) for 6 weeks. Experimental group 1 consisted of 20 female subjects, of whom ten were asked to perform stretching asanas and the remaining ten to perform sitting asanas, or static yoga. Experimental group 2 consisted of 20 male subjects, whose yoga practices were the same as those of experimental group 1. The control group consisted of 30 subjects who neither had any prior training in sitting or stretching yoga nor were allowed to perform yoga during the session.
Informed consent was collected from all the participants prior to data collection, and the study was approved by an institutional ethics review board. The sitting yoga group and the stretching yoga group performed their asanas for 10 min followed by shavasana. EEG signals were recorded before and after the yoga sessions in an eyes-open state for 2 min 40 s. After the post-yoga recordings, three tasks were assigned to the participants: a digit letter substitution task, a word memory task [11] and the Gollin incomplete figure test. The experimental protocol is summarized in Fig. 1.
Computational analysis of the raw EEG signals was done with MATLAB, and artifacts (eye blinks and muscle movements) were removed using a basic FIR filter, which included filtering the raw data between 1 and 50 Hz using EEGLAB. The fast Fourier transform (FFT) was used to convert time-domain signals to the frequency domain, and the distribution of δ, θ, α, β and γ rhythms was computationally estimated at different brain regions. Statistical tests (t-test and ANOVA) were also carried out on the task scores.
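The FFT-based band-distribution step can be sketched as follows (a NumPy illustration on a synthetic trace; the band edges are the conventional EEG ranges, which the paper does not state explicitly):

```python
import numpy as np

# Conventional EEG band edges in Hz (assumed, not stated in the paper)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def band_powers(signal, fs=128):
    """Relative power of each EEG rhythm, estimated with an FFT
    periodogram over the 1-50 Hz range used for filtering."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = spectrum[(freqs >= 1) & (freqs <= 50)].sum()
    return {name: spectrum[(freqs >= lo) & (freqs < hi)].sum() / total
            for name, (lo, hi) in BANDS.items()}

# Synthetic 4 s trace dominated by a 10 Hz alpha oscillation plus noise
t = np.arange(4 * 128) / 128
noise = 0.1 * np.random.default_rng(0).standard_normal(t.size)
powers = band_powers(np.sin(2 * np.pi * 10 * t) + noise)
```

On such a trace, the alpha band carries most of the relative power, mirroring the per-region rhythm distributions estimated in the study.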
Fig. 1 Experiment protocol
3 Results
3.1 Increased Changes Attributed to Memory and Attention
Observed in Practitioners
From the plot analysis, it was observed that there was a decrease in beta rhythm in the left anterior frontal (AF3) and frontal (F7) regions for the static group when compared to the pre-yoga recordings, which reflects a decrease in semantic processing and an increase in memory performance (Fig. 4) [12]. A decrease in alpha rhythm was also observed in the right temporal (T8), motor cortex (FC6) and frontal regions (F4, F8), which reflects a gradual increase in attention. In the case of the dynamic group, beta activation was observed in the temporal (T7, T8) and parietal (P7, P8) regions, which is known to increase the accuracy of decision making [13]. The right motor cortex region (FC6) was also found to be activated after dynamic yoga (Fig. 2).
Gender-based comparisons showed a decrease in beta rhythm in the left anterior frontal (AF3) and frontal (F7) regions when compared to the pre-yoga recordings, which also reflects a decrease in semantic processing and an increase in memory performance [12] (Fig. 3). In female subjects, there was significant activation in the right frontal lobe (F4, F8, AF4, FC6), which is associated with emotional state [14], whereas in male subjects, alpha activation was observed in the frontal regions, suggesting an increase in momentary memory storage and other cognitive functions [15].
Fig. 2 Bar graphs showing channel-wise activity of alpha and beta rhythms for static versus dynamic yoga practitioners
Fig. 3 Bar graphs showing channel-wise activity of alpha and beta rhythms for male versus female yoga practitioners
Spectral map comparisons revealed a profound alpha activation in the parietal (P7, P8) and occipital (O1, O2) regions in yoga practitioners when compared to the control group, suggesting an increased ability to associate stimuli with their corresponding responses and an increase in attention (Fig. 4).
The raw data were preprocessed and analyzed on a MATLAB (MathWorks, USA) platform on an Intel i3 CPU @ 2.00 GHz with 4 GB RAM and a 64-bit operating system on an x64-based processor. The time complexity of the analysis was calculated, and it was observed that the control data took less time to process than those of the other groups (Table 1): the time taken for the static group was ~11.9 s and for the control ~3.4 s. As the number of samples increases, the computational time also increases.
Fig. 4 Cortical maps comparison between static and dynamic yoga practitioners for α rhythms
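Such processing times can be measured as follows (a generic illustration with a synthetic FFT workload and hypothetical sample counts, not the study's actual pipeline):

```python
import time
import numpy as np

def process(n_samples):
    """Stand-in workload: generate a synthetic trace and take its FFT."""
    data = np.random.default_rng(1).standard_normal(n_samples)
    return np.abs(np.fft.rfft(data)).sum()

def timed(n_samples):
    """Wall-clock time (s) for one processing run, as reported in Table 1."""
    start = time.perf_counter()
    process(n_samples)
    return time.perf_counter() - start

elapsed = timed(4096)
```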
Table 1 Computational time (in s) for processing the data of each group (pre and post): static, dynamic, male static, female static, male dynamic and female dynamic
Fig. 5 Graphical comparison between experimental groups in DLST and WM tasks
3.2 Statistical Analysis on the Task Scores Shows Homogeneous Performance Between the Genders
A t-test was computed to compare the means of experimental group 1 (female subjects) and experimental group 2 (male subjects). The p values for the DLST and WM tasks were 0.86 and 0.89, respectively. At the 0.05 level of significance, these p values suggest that there were no gender-based differences in performance on the two tasks (Fig. 5).
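A comparison of this kind can be sketched as follows (a pure-NumPy permutation test on the difference of means, used here as a nonparametric stand-in for the reported t-test; the scores are hypothetical, not the study's data):

```python
import numpy as np

def permutation_p(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.
    Returns the fraction of label shuffles whose mean difference is
    at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group_a, group_b])
    observed = abs(np.mean(group_a) - np.mean(group_b))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel subjects at random
        diff = abs(pooled[:len(group_a)].mean() - pooled[len(group_a):].mean())
        count += diff >= observed
    return count / n_perm

# Hypothetical DLST scores for two gender groups with similar means
female = np.array([42, 45, 39, 47, 44, 41, 46, 43, 40, 45.0])
male = np.array([43, 44, 40, 46, 45, 42, 44, 41, 39, 47.0])
p = permutation_p(female, male)
```

With nearly identical group means, the resulting p value stays well above 0.05, matching the "no gender difference" interpretation above.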
3.3 Statistical Analysis on the Task Scores Shows a Significant Difference Between the Groups
The scores for both the DLST and WM tasks among the dynamic, static and control groups were compared for significant differences using single-factor analysis of variance (ANOVA). It showed a significant difference between the dynamic, static and control groups, with p values of 0.001 and 0.021 for the DLST and WM tasks, respectively, at the 0.05 level of significance. This correlates with significantly higher attentional skill and memory in the yoga groups when compared to the control group (Fig. 6).
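The single-factor ANOVA can be sketched as follows (a NumPy computation of the F statistic from between- and within-group sums of squares; the group scores are hypothetical, not the study's data):

```python
import numpy as np

def anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square
    divided by within-group mean square."""
    all_data = np.concatenate(groups)
    grand = all_data.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_data) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical task scores: dynamic and static clearly above control
dynamic = [58.0, 61.0, 57.0, 60.0, 62.0]
static = [55.0, 57.0, 54.0, 58.0, 56.0]
control = [45.0, 47.0, 44.0, 46.0, 48.0]
f = anova_f(dynamic, static, control)
```

A large F (here far above typical critical values for df = 2, 12) corresponds to the significant group differences reported; in practice the p value would be read from the F distribution.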
4 Discussion
The current study focuses on understanding the neural correlates of cognitive tasks, such as working memory, internal attention and visual perception, by analyzing yoga-based practices with brain activity mapping using EEG and percentage power spectral density analysis. As an initial study for understanding the functional neural dynamics associated with different cognitive functions of the brain, brain topography mapping between and within the groups was analyzed. Time complexity for
Fig. 6 Graphical comparison between control and yoga groups in DLST and WM tasks
the computations was calculated and found to be less than 12 s for the current dataset, increasing with the number of samples.
Cortical mapping of alpha rhythms in both groups of practitioners showed activation in the parietal (P7, P8) and occipital (O1, O2) regions, which can suggest that yoga practice increased attention and stimulus–response associations. An increase of alpha activity in the frontal regions could suggest regulatory functions related to emotional challenges and attentional engagement. An increase in alpha and beta activity in the right posterior parietal lobe could imply an increase in working memory. A decrease in beta activity in the left frontal regions also suggests there may be a significant change in memory performance. The preliminary results also suggest similar dynamic changes, which need to be validated with a greater number of subjects.
As yoga-based practices are shown to have an impact on anatomical changes in the brain, mainly in the frontal cortex, hippocampus, anterior cingulate cortex and insula, imaging techniques like EEG can be used to study mind–body interventions. These mappings and EEG dynamics suggest the need for further exploration of memory and time-based attenuation attributed to the practice, and of the possible implications of yoga having been used by generations of communities as a sustainable wellness practice for mental and physical well-being.
5 Conclusion
Understanding neural circuits and their functions associated with yoga-based practices helps us to preserve and nurture traditional practices. This initial study has allowed mapping functional activity associated with various cognitive tasks and identifying variations in cortical mapping related to the integration of yoga into daily life.
Acknowledgements This work derives direction and ideas from the Chancellor of Amrita Vishwa
Vidyapeetham, Sri Mata Amritanandamayi Devi. Authors thank staffs and students of Amrita
Vishwa Vidyapeetham, Amritapuri for their role as volunteering subjects. This work was funded
partially by Amrita School of Biotechnology and the Embracing the World Research-for-a-Cause initiative.
1. Singh, S.: Yoga: An Answer To Lifestyle Disorders, vol. 5, pp. 27–34 (2016)
2. Deshpande, R.C.: A healthy way to handle work place stress through Yoga. Meditation Soothing
Humor 2, 2143–2154 (2012)
3. Akshayaa, L., Jothi Priya, A., Gayatri Devi, R.: Effects of yoga on health status and physical
fitness an ecological approach—a survey. Drug Invent. Today. 12, 923–925 (2019)
4. Gothe, N.P., Kramer, A.F., Mcauley, E.: The Effects of an 8-Week Hatha Yoga Intervention on
Executive Function in Older Adults, vol. 69, pp. 1109–1116 (2014)
5. Bhatia, T., Mazumdar, S., Nn, M., Re, G., Rc, G., Vl, N., Mishra, N., Raquel, E., Gur, R.C.:
Protocol to evaluate the impact of yoga supplementation on cognitive function in schizophrenia: a randomised controlled trial (2014)
6. Rocha, K.K.F., Ribeiro, A.M., Sousa, M.B.C., Albuquerque, F.S.: Improvement in physiological and psychological parameters after 6 months of yoga practice. Conscious. Cogn. 21,
843–850 (2012)
7. Tran, M.D., Holly, R.G., Lashbrook, J.: Effects of Hatha Yoga practice on the health-related
aspects of physical fitness. Prev. Cardiol. 4, 165–170 (2001)
8. Monchi, O., Petrides, M., Strafella, A.P., Worsley, K.J.: Functional Role of the Basal Ganglia
in the Planning and Execution of Actions, pp. 257–264 (2006)
9. Roman-Gonzalez, A.: EEG signal processing for BCI applications. HAL Id: hal-00742211 (2012)
10. Fisher, S.D., Reynolds, J.N.J.: The Intralaminar Thalamus—An Expressway Linking Visual
Stimuli to Circuits Determining Agency and Action Selection, vol. 8, 1–7 (2014)
11. Radhamani, R., Nizar, N., Kumar, D., Pillai, G.S., Prasad, L.S., Jitha, S.S., Vannathi Kuniyil,
M.K., Sekhar, A.A., Kumar, V.S., Pillai, S., Diwakar, S.: Computational analysis of cortical
EEG biosignals and neural dynamics underlying an integrated mind-body relaxation technique.
Procedia Comput. Sci. 171, 341–349 (2020)
12. Hanslmayr, S., Matuschek, J., Fellner, M.C.: Entrainment of prefrontal beta oscillations induces
an endogenous echo and impairs memory formation. Curr. Biol. 24, 904–909 (2014)
13. Park, J., Kim, H., Sohn, J.W., Choi, J.R., Kim, S.P.: EEG beta oscillations in the temporoparietal
area related to the accuracy in estimating others’ preference. Front. Hum. Neurosci. 12 (2018)
14. Palmiero, M., Piccardi, L.: Frontal EEG Asymmetry of Mood: A Mini-Review (2017)
15. Fundamentals of Cognitive Neuroscience (2013)
Automation of Answer Scripts
Evaluation-A Review
M. Ravikumar, S. Sampath Kumar, and G. Shivakumar
1 Introduction
To evaluate the performance of students at various levels (primary, high school, undergraduate and post-graduate), an examination process is conducted, in which the preparation and formulation of question papers plays an important role. Evaluating the answer booklets written by students is a very difficult task because handwriting varies from person to person in style, font, size, orientation, etc. In particular, the question paper pattern at the primary and high school levels consists of fill in the blanks, match the following, true or false, one-word answers, and picking the odd one out. All these patterns are answered by the students in their booklets. The questions in the booklets are printed, while the answers written by the students are handwritten.
Evaluation of answer booklets, particularly those written by primary-level students, is a very challenging task for teachers. The handwriting of some students may be in cursive form, connected by ligatures as joining strokes, while other scripts may contain disjointed words and an apparent mixture of uppercase and lowercase letters. Evaluation can be greatly helpful for student development and learning. Handwriting is still a necessary skill in our society, and answers written by students must be clearly legible to teachers if evaluation is to be consistent.
M. Ravikumar (B) · S. Sampath Kumar · G. Shivakumar
DOS in Computer Science, Kuvempu University, Shimoga, Karnataka, India
e-mail: ravi2142@yahoo.co.in
S. Sampath Kumar
e-mail: sampath1447@gmail.com
G. Shivakumar
e-mail: g.shivakumarclk@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
M. Ravikumar et al.
Teachers have to examine the answers carefully to award marks, and it is not easy to understand every student's handwritten answers simply by viewing the booklets and grading them. Handwriting recognition has therefore become one of the most challenging research areas in document image analysis.
The manual system of evaluation for technical subjects is difficult for evaluators. Answers involve various parameters for evaluation, such as question-specific content and writing style. Evaluation of answers may also vary with the perception of the person, because checking hundreds of answer scripts containing the same answers can be a tedious task for evaluators. To overcome these problems and enable faster evaluation, automation of answer script evaluation is needed.
The strategic approach of an automated evaluation process provides a framework for evaluation. From the survey, in Karnataka there are around 23,640 government primary schools and 8216 high schools (excluding aided and unaided). If the answer scripts are evaluated automatically, the process of declaring results will become easier and faster.
Every year, educational institutions must conduct two examinations, and evaluating the answer scripts of all students is a very complex task. An automation framework offers many benefits, such as reuse of resources, reduction in time, less paperwork, and control of human errors during the evaluation process. Automation of answer script evaluation helps teaching faculty reduce their workload.
The paper is organized as follows: Sect. 2 contains the review of related work; Sect. 3 gives a brief idea of the classification techniques used; Sect. 4 covers challenges; and Sect. 5 presents the conclusions of this research work.
2 Related Work
In [1], an algorithm for evaluating single-sentence descriptive answers is proposed. The main intention is to represent the answer in the form of a graph and compare it with a predefined answer; the proposed system also handles grammatically incorrect sentences. The answer written by the student and the standard predefined answer are both converted into graph form, and similarity measures such as string match and WordNet are applied to calculate a similarity score. The model thus provides an approach for automating the subjective answer evaluation process.
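The similarity-scoring step can be illustrated with a minimal word-overlap sketch (plain Jaccard overlap stands in for the paper's graph representation and WordNet measures; this is not the referenced implementation):

```python
def similarity_score(student_answer: str, model_answer: str) -> float:
    """Jaccard overlap between the word sets of a student answer and a
    predefined model answer -- a crude stand-in for graph comparison
    with string-match and WordNet similarity."""
    student = set(student_answer.lower().split())
    model = set(model_answer.lower().split())
    if not student or not model:
        return 0.0
    return len(student & model) / len(student | model)
```

A real system would lemmatize the words and back off to WordNet synonym similarity when an exact string match fails.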
For subjective answer assessment, a method called the subjective answer evaluation system is proposed [2], based on four modules: a login module, an information extraction module, a weighting module, and a score generation module. The proposed model works at the word level: the answers written by students are compared with predefined keywords, and students are graded based on the highest matching score. The three main steps are keyword and synonym extraction, keyword matching, and keyword weighting with score generation.
Automation of Answer Scripts Evaluation-A Review
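The keyword matching and weighting steps can be sketched as follows (the keyword weights and marking scale are illustrative assumptions, not taken from the referenced system):

```python
def grade(student_answer: str, keyword_weights: dict, max_marks: float = 10.0) -> float:
    """Award marks in proportion to the total weight of predefined
    keywords found in the student's answer."""
    words = set(student_answer.lower().split())
    total = sum(keyword_weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for kw, w in keyword_weights.items() if kw.lower() in words)
    return max_marks * matched / total
```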
A method based on Bloom's taxonomy is proposed [3] for different examination patterns in which one-word, true-or-false, and multiple-choice questions are evaluated. For the above-mentioned problem, Bloom's taxonomy is represented as a triangle divided into four steps. The stated method can also be adopted for online evaluation systems. One-word answer evaluation is implemented using text pattern recognition to recognize English words with similar pronunciation.
To assess single-sentence and one-word answers, a cognitive and computation-based algorithm is proposed [4], in which patterns are extracted from the answer and compared with the model answer. The algorithm is implemented using the NLTK Python toolkit and mainly focuses on the inference process required to assess one-word and single-sentence answers.
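The comparison of one-word answers can be sketched with a crude normalisation step (a stand-in for NLTK's stemmer; the suffix list is purely illustrative):

```python
def normalize(word: str) -> str:
    """Lower-case a word and strip a common suffix -- a crude stand-in
    for a proper stemmer such as NLTK's PorterStemmer."""
    word = word.strip().lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def check_one_word(student: str, model: str) -> bool:
    """Accept a one-word answer if it normalises to the model answer."""
    return normalize(student) == normalize(model)
```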
To evaluate descriptive answers, a pattern matching algorithm is proposed [5]. The main intention is to represent the student answer and the teacher's predefined answer as graphs and compare them, applying similarity measures (string match, partial string match, full string match, and WordNet) for the allocation of marks. The method checks the degree of student learning by evaluating descriptive exam answer sheets.
To evaluate multiple-sentence descriptive answers, a pattern matching algorithm is proposed [6]; the system provides a Web application for checking descriptive-type answers, comparing the learner's descriptive answer with a predefined answer. The major steps in the algorithm are string match, WordNet lookup, and a spreading process, which support similarity matching during the assessment process.
Test scoring of non-optical grid answer sheets based on the projection profile method is discussed [7]; the algorithm scores traditional, non-optical grid answer sheets. Projection profile and thresholding methods were used, and the percentage of correctness was measured experimentally. Answer sheets were divided into three types for testing, a total of 16,500 questions were tested, and the average accuracy was reported.
For the evaluation of subjective answers, a method using machine learning and NLP is proposed [8]. The algorithm captures the semantic meaning of the context by performing tasks such as word and sentence tokenization. For experimentation, a Python Flask Web application was used, and an Android app was also developed to present the results.
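The tokenisation-and-comparison step can be sketched with simple regex tokenisers and a cosine similarity over term counts (a stand-in for the paper's NLP pipeline, not its actual implementation):

```python
import math
import re
from collections import Counter

def tokenize(text: str):
    """Split into sentences, then words -- a stand-in for NLTK's tokenizers."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    words = re.findall(r"[a-z']+", text.lower())
    return sentences, words

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-count vectors of two answers."""
    _, a = tokenize(text_a)
    _, b = tokenize(text_b)
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```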
For automatic OMR answer sheet evaluation, a new technique is proposed [9]: a software-based approach using a computer and a scanner, with multiple-choice tests used for generating scores. OpenCV is used to match the marked option against the correct option already saved in the database; the designed software thus decodes answer sheets. The proposed system can be deployed at the micro level in the administration of government sectors.
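The bubble-detection core of such an OMR pipeline can be sketched without OpenCV, using dark-pixel fill ratios over pre-extracted bubble regions (region extraction and image thresholding are assumed to have been done already; the threshold value is illustrative):

```python
def detect_marked_option(bubble_regions, fill_threshold=0.5):
    """bubble_regions: one 2-D 0/1 array per option (1 = dark pixel).
    Return the index of the option with the highest dark-pixel ratio
    above the threshold, or None if no bubble is filled in."""
    best_index, best_ratio = None, fill_threshold
    for idx, region in enumerate(bubble_regions):
        pixels = [p for row in region for p in row]
        ratio = sum(pixels) / len(pixels)
        if ratio > best_ratio:
            best_index, best_ratio = idx, ratio
    return best_index
```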
A field programmable gate array (FPGA) implementation of OMR answer sheet scanning using sensors is proposed [10]. A finite state machine is designed for the FPGA, IR sensors are used to scan the answers, and pre- and post-processing algorithms compute the stored data. An automatic document feeder is used for scanning the OMR answer sheets. This is proposed as an efficient alternative to the optical mark recognition technique over complex images.
M. Ravikumar et al.
The use of natural language processing for evaluating students' descriptive scripts is proposed [11]; statistical, information extraction, and full natural language processing techniques are used for automatic marking of free text. Computer-aided assessment (CAA) is implemented for the evaluation of descriptive answers. The collective meaning of multiple sentences is considered, and the system also tries to check grammatical and spelling mistakes made by the student.
A Python tool for evaluation of subjective answers (APTESA) is proposed [12] for automated evaluation. The tool supports two modes for better evaluation: a semi-automated mode and a fully automated mode. APTESA is developed using PyQt, Python, and its modules. It is established that the semi-automated mode yields better results than the fully automated mode.
A comparative study of techniques for automatic evaluation of free text is presented [13]; techniques such as LSA, BLEU, and NLP have been used for this purpose. LSA can extract the hidden meaning in a text. Bilingual evaluation understudy (BLEU) measures the closeness between candidate translations. NLP is used as a statistical means to disambiguate words or multiple parses of the same sentence. The study reviews these different methods for automatic evaluation of free text.
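BLEU's core idea, clipped n-gram precision scaled by a brevity penalty, can be sketched at the unigram level (a simplified BLEU-1, not the full multi-n-gram metric):

```python
import math
from collections import Counter

def unigram_bleu(candidate: str, reference: str) -> float:
    """Clipped unigram precision times the brevity penalty (BLEU-1)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate word's count by its count in the reference
    clipped = sum(min(c, ref_counts[t]) for t, c in Counter(cand).items())
    precision = clipped / len(cand)
    # Penalise candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```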
Automatic grading of multiple-choice answer sheets is proposed [14]. The process is divided into three parts: checking the problem series number, identifying the examinee's ID, and checking the answer part. The correlation coefficient is applied to check answers against solution sheet images. The proposed system works with any pencil or pen. The accuracy was 99.57% even for poorly erased markings, and the developed system saves 71.05% of the time compared with manual checking of answers.
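The correlation check can be sketched as a plain Pearson correlation between two pixel vectors, one from the scanned answer region and one from the solution sheet image (image extraction is assumed; this is an illustration, not the referenced system):

```python
import math

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length pixel vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0
```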
A novel approach for descriptive answer scripts is proposed [15]. An ITS has been designed with models and different algorithms; it provides content materials and important test sessions. Descriptive answer evaluation works best for simple sentences. A third-party tool is used for the grammar-checking and spell-checking modules for better accuracy. The proposed system thus represents an assembled approach with a number of essential features.
A descriptive examination and assessment system using a comparison-based approach is proposed [16]: descriptive answers stored on a server are compared with standard descriptive answers using a text mining technique that matches keywords and sequences. The proposed algorithm thus automates descriptive answer evaluation.
Soft computing techniques for evaluating students' answer scripts are proposed [17], in which attribute weights are generated for automatic evaluation. To evaluate answer scripts more flexibly, the adjustment quantity is normalized to ensure fairness of the adjustment in each inference result. The new fuzzy evaluation system provides quicker and more valid evaluation.
A performance evaluation based on the template matching technique is proposed [18] for a frequently asked question (FAQ) answering system that provides prestored answers to user questions. Question answering systems are divided into closed-domain and open-domain systems. Three main template matching techniques are used: random classification of templates, similarity-based classification of templates, and weighting of template words. Experiments showed that the similarity-based clustering method with weighted templates is the better choice.
Different approaches to automated question answering are compared [19] using natural language processing (NLP), information retrieval (IR), and question templates. NLP is applied to address difficulties in answer quality, and IR-based QA is used to extract facts from text.
An automatic answering system for closed-domain problems is proposed [20], using a template matching technique. A strategy is developed to handle errors caused by spelling mistakes. The framework is built to answer questions asked through phones and messages, converting SMS language into English.
A summary of the algorithms and classification techniques discussed above is presented in Table 1.
3 Challenges
In this section, we mention some of the challenging issues related to the evaluation of answer scripts; they are as follows.
- Present systems evaluate answers only in the English language; in future, they can be extended to other languages.
- A grammar-checking tool has to be implemented for accurate assessment of a student's marks.
- A framework can be created to assess answers containing diagrams and formulas.
- A technique for assessing handwritten papers by converting them to soft copy using a descriptive examination system and a voice recognition system can be developed.
- To minimize the gap between human and computer assessors, an appropriate algorithm has to be implemented.
- For the same pattern of answer sheet, the algorithm can be improved in many respects for repetitive tasks like line and grid determination.
4 Conclusion
This survey paper gives an immense idea of different approaches and techniques
used for evaluation of answer booklet. Subjective and descriptive type of evaluation is explained, and machine learning and many more techniques are vastly used
in automating the evaluation process. Thus, it is observed that from this research
concept, and this work provides a better basic knowledge for automating the answer
booklet evaluation (Table 1).
Table 1 Summary of algorithms and classification techniques (authors, datasets, and accuracy in %)
References
1. Praveen, S.: An approach to evaluate subjective questions for online examination system. Int. J. Innov. Res. Comput. Commun. Eng. 2, 6410–6413 (2014)
2. Tulaskar, A., Thengal, A., Koyande, K.: Subjective answer evaluation system. Int. J. Eng. Sci.
Comput. 7(4), 10457–10459 (2017)
3. Parmar, V.P., Kumbharana, C.K.: Analysis of different examination patterns having question
answer formulation, evaluation techniques and comparison of MCQ type with one word answer
for automated online examination. Int. J. Sci. Res. Publ. 6(3), 459–463 (2016)
4. Dhokrat, A., Hanumant, G., Mahender, N.: Automated answering for subjective examination.
Int. J. Comput. Appl. 56, 14–17 (2012)
5. Nikam, P., Shinde, M., Mahajan, R., Kadam, S.: Automatic evaluation of descriptive answer
using pattern matching algorithm. Int. J. Comput. Sci. Eng., 69–70 (2015)
6. Sattayakawee, N.: Test scoring for non-optical grid answer sheet based on projection profile
method. Int. J. Inf. Educ. Technol. 3(2), 273–277 (2013)
7. Patil, P., Patil, S., Miniyar, V., Bandal, A.: Subjective answer evaluation using machine learning.
Int. J. Pure Appl. Math., 01–13 (2018)
8. Kulkarni, D., Thakur, A., Kshirsagar, J., Ravi Raju, Y.: Automatic OMR answer sheet evaluation
using efficient reliable OCR system. Int. J. Adv. Res. Comput. Commun. Eng. 6(3), 688–690
9. Patil, A., Naik, M., Ghare, P.H.: FPGA implementation for OMR answer sheet scanning using
state machine and Ir sensors. Int. J. Electr. Electron. Data Commun. 4(11), 15–20 (2016)
10. Patil, S.M., Patil, S.: Evaluating student descriptive answers using natural language processing.
Int. J. Eng. Res. Technol. (IJERT) 3(3), 1716–1718 (2014)
11. Reddy Tetali, D., Kiran Kumar, G., Ramana, L.: Python tool for evaluation of subjective answers
(APTESA). Int. J. Mech. Eng. Technol. (IJMET) 8, 247–255 (2017)
12. Rinchen, P.: Comparative study of techniques used for automatic evaluation of free text answer.
Int. J. Recent Innov. Trends Comput. Commun. 3839–3842 (2014)
13. Patel, D., Zaid, S.: Efficient system for evaluation of Omr sheet—a survey. Int. J. Adv. Res.
Eng. Sci. Manag., 01–08 (2017)
14. Ntirogiannis, K., Gatos, B., Pratikakis, I.: An Objective Evaluation Methodology for Handwritten Image Document Binarization Techniques, 217–224 (2008)
15. Gite, N.S.: Implementation of descriptive examination and assessment system. Int. J. Adv. Res.
Sci. Eng. 07, 252–257 (2018)
16. Malpe, V., Bhatia, S.: Evaluation of students’ answer scripts using soft computing techniques.
Int. J. Mod. Eng. Res. (IJMER) 2(3), 1280–1289 (2012)
17. Gunawardena, T., Pathirana, N., Lokuhetti, M., Ragel, R., Deegalla, S.: Performance evaluation
techniques for an automatic question answering system. Int. J. Mach. Learn. Comput. 5(4),
294–300 (2015)
18. Andrenucci, A., Sneiders, E.: Automated question answering: review of the main approaches.
In: Proceedings of the Third International Conference on Information Technology and
Applications, pp. 514–519 (2005)
19. Gunawardena, T., Lokuhetti, M., Pathirana, N., Ragel, R., Deegalla, S.: An Automatic
Answering System with Template Matching for Natural Language Questions. IEEE, 353–358
20. Singh, P., Sheorain, S., Tomar, S., Sharma, S., Bansode, N.K.: Descriptive answer evaluation.
Int. Res. J. Eng. Technol. 05, 2709–2712 (2018)
Diabetes Mellitus Detection and Diagnosis Using AI Classifier
L. Priyadarshini and Lakshmi Shrinivasan
1 Introduction
According to the International Diabetes Federation (IDF), there are no signs of the diabetes epidemic relenting. More than 463 million adults worldwide struggle with diabetes according to the 9th edition of the IDF Diabetes Atlas. Diabetes is responsible for 4.2 million
deaths each year and can result in severe complications, disability and reduced quality
of life, which to a large extent could be prevented with proper diagnosis and access
to medical care. Diabetes mellitus (DM) is a chronic disease that develops when the
pancreas is no longer capable of producing insulin or the condition when the body
is unable to use the insulin produced. It contributes to elevated blood glucose levels
known as hyperglycaemia. High glucose levels cause damage to body by failure of
various organs and tissues. Diabetes requires ongoing medical treatment and patient
self-management knowledge to avoid severe complications in order to decrease the
likelihood of long-term problems. An improvement in insulin sensitivity or a reduction in hepatic glucose generation will resolve the hyperglycaemia condition. To achieve this, continuous and accurate monitoring of insulin and glucose levels is required.
With advancements in computer technology, there is a huge demand for knowledge-based and intelligent systems, especially in medical diagnosis. This has led to interaction between doctors and engineers in almost all interdisciplinary fields. With the implementation of computer-based technologies, clinical decision support systems (CDSS) were designed to assist medical practitioners [1]; the results obtained are faster and more accurate compared with conventional techniques. However, there is uncertainty in medical data which leads to differences in diagnosis. The introduction of artificial intelligence in the medical domain has provided solutions for such conditions
L. Priyadarshini (B) · L. Shrinivasan
Ramaiah Institute of Technology, Bangalore, Karnataka, India
e-mail: priyanaik0684@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
with expert systems such as fuzzy systems and neural network. In [2, 3], it was
shown that fuzzy logic is a solution to improve these uncertainties providing powerful
decision-making support with improved reasoning capabilities. In [4], a fuzzy expert
system was proposed to manage dynamics of diagnosis and medication of type 1
diabetes in an individual by calculating probability of diabetes. Output is semantically
arranged in terms of fuzzy numbers like very low, low, medium, high and very high
diabetes. Based on these, insulin dosage was recommended. Similarly, in [5], a
fuzzy system was designed to determine risk percentage of getting diagnosed with
diabetes wherein clinicians were assisted through a GUI making it a user-friendly
layout and saving diagnosis time. Joshi and Borse [6] make use of an artificial neural network (ANN) with back propagation as an accurate method to diagnose DM in
an individual. In [7, 8], ANFIS was proposed which incorporates the features of
both fuzzy control interpolation and neural network (NN) with back propagation
for adaptability. The results show that classification accuracy is higher than other
methods. In [9], pre-emptive diagnosis was attempted to identify features and classify
data using machine learning. Experimental results showed that ANN outperformed
SVM, Naïve Bayes and K-nearest neighbour techniques with testing accuracy of
77.5%. In the present work, attempts have been made to apply ANFIS to the PIMA Indian Diabetes Database (PIDD) to obtain better prediction and classification accuracy, with which the inherent degree of uncertainty can be assessed.
2 Proposed Methodology
The proposed work as shown in Fig. 1 presents a simple fuzzy system for diagnosing
DM. The PIDD parameters glucose level, insulin level, body mass index (BMI), diabetes pedigree function (DPF), and age were fed as input attributes, with the probability of diabetes as the output.
The membership functions were computed using fuzzy logic toolbox in
MATLAB, with which the fuzzy inference system (FIS) tracks inputs and output
Fig. 1 Proposed fuzzy-based expert system design
Fig. 2 Classification of PIMA datasets depicting probability of diabetes
data. The FIS was constructed by fuzzifying the crisp input values contained in the knowledge base using the Mamdani approach. If-then rules were created to form the rule base for decision making and aggregated into a single fuzzy output set. Thereafter, defuzzification was carried out with the centroid method, which provides the crisp output shown in Fig. 2. The proposed fuzzy system was robust and capable of classifying diagnoses for a large dataset of various attributes, with both objective and subjective knowledge coordinated in a logical way. However, increasing the number of rules to improve accuracy slowed down the system's response time, which indicated that the system had poor adjusting capability during learning.
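The fuzzify-infer-defuzzify pipeline can be illustrated with a minimal sketch (plain Python standing in for the MATLAB fuzzy logic toolbox; the triangular membership function and output range are hypothetical, not the membership functions used in this work):

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def centroid(aggregate, lo: float, hi: float, steps: int = 1000) -> float:
    """Centroid defuzzification of an aggregated output membership function,
    approximated by uniform sampling over [lo, hi]."""
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        mu = aggregate(x)
        num += x * mu
        den += mu
    return num / den if den else 0.0
```

In a full Mamdani system the `aggregate` function would be the max over all rule outputs, each clipped by its rule's firing strength (the min over the fuzzified inputs).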
ANFIS integrates a NN and a fuzzy inference system, exploiting the benefits of both. ANFIS forms a class of adaptive networks in which the degrees of membership are modified either by the back-propagation algorithm or by a hybrid algorithm. A supervised algorithm such as back-propagation minimises the difference between actual and desired output by gradient descent, thus optimising the premise parameters, whereas with the hybrid algorithm both premise and consequent parameters are optimised, the latter by the least squares estimate (LSE) method, thus converging faster. Figure 3 shows the structure of the proposed ANFIS model, which comprises two fixed and three adjustable node layers interconnected equivalently to a first-order Takagi–Sugeno-type system.
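The layer-wise computation of such a network can be sketched for a single input variable (Gaussian memberships and the rule parameters below are illustrative assumptions; the model in this work uses trapezoidal membership functions over five attributes):

```python
import math

def anfis_forward(x: float, rules) -> float:
    """One forward pass of a first-order Sugeno ANFIS for a single input x.
    Each rule is (mean, sigma, p, q): a Gaussian premise and a linear
    consequent p*x + q."""
    # Layer 1: fuzzification (Gaussian membership degrees)
    mu = [math.exp(-((x - m) ** 2) / (2 * s ** 2)) for m, s, _, _ in rules]
    # Layer 2: firing strengths (product of memberships; trivial for one input)
    w = mu
    # Layer 3: normalisation of firing strengths
    total = sum(w)
    wn = [wi / total for wi in w]
    # Layer 4: normalised strength times first-order consequent
    contrib = [wi * (p * x + q) for wi, (_, _, p, q) in zip(wn, rules)]
    # Layer 5: summation of all incoming signals (defuzzified output)
    return sum(contrib)
```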
The proposed ANFIS model had five layers, five input attributes, and one output. Initially, 80% of the dataset was loaded as training data. The FIS was generated with trapezoidal membership functions. Both the hybrid and back-propagation algorithms were applied for optimisation and compared at different epochs. A Sugeno-based fuzzy model was chosen for its efficiency and adaptability in extracting appropriate knowledge from the database. Input attributes were fuzzified in layer 1, where every node, being adaptive in nature, was associated with a linguistic label defining the degree of membership for each input; these form the premise parameters of the ANFIS model. The fixed nodes in layer 2 perform an AND operation to multiply their inputs, with each node representing the firing strength of a rule. Each node in layer 3 is fixed and normalises the firing strengths. Nodes in layer 4 are adaptive and compute the product of the normalised firing strength obtained from the previous layer and a first-order polynomial, forming the consequent parameters. Layer 5 has a single fixed node which computes the total output by summing all received signals, i.e. the defuzzification process. Outputs were classified based on the weighted mean (WM) method. Once training was complete, the remaining 20% of the dataset was fed to the system for testing. In total, 243 if-then fuzzy rules were created. Further, the performance
Fig. 3 Proposed structure of ANFIS model
of the model was verified by calculating the root mean square error (RMSE) and mean squared error (MSE) during training and testing using the following equations:
RMSE = \sqrt{\frac{1}{p}\sum_{i=1}^{p} (t_i - o_i)^2}   (1)

MSE = \frac{1}{p}\sum_{i=1}^{p} (t_i - o_i)^2   (2)
with ‘p’ indicating number of data points, ‘t’ representing target value and ‘o’
representing output value.
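Equations 1 and 2 translate directly into code:

```python
import math

def mse(targets, outputs) -> float:
    """Mean squared error (Eq. 2) over p target/output pairs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def rmse(targets, outputs) -> float:
    """Root mean square error (Eq. 1): the square root of the MSE."""
    return math.sqrt(mse(targets, outputs))
```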
3 Results and Discussion
The model was trained and tested using the fuzzy logic toolbox in MATLAB. The accuracy of the model during training and testing with both the hybrid and the back-propagation algorithm was substantiated through the statistical indices in Eqs. 1 and 2 of the previous section. The statistics obtained from the training data are presented in Table 1.
As observed from the performance table, the RMSE and MSE values in both cases were marginal for the training data and thus negligible. The same is depicted in the graphs in Fig. 4. The RMSE values obtained with the hybrid algorithm
Table 1 Performance of ANFIS model with hybrid and back-propagation algorithms
Fig. 4 Performance evaluation of ANFIS model for hybrid and back-propagation algorithms
were found to be lower than those obtained with the back-propagation algorithm. Thus, the hybrid algorithm provides better performance accuracy and is thereby more efficient in training the ANFIS model. Validation is an essential phase in evaluating the ability of the model. The ANFIS model was compared with different machine learning classifiers, as tabulated in Table 2, in terms of statistical parameters such as accuracy, specificity and sensitivity obtained from the confusion matrix.
A fivefold cross-validation of SVM, K-nearest neighbour and Naïve Bayes is presented in the table. An optimised recursive general regression neural network (R-GRNN) oracle was also applied to the same PIMA diabetes database in previous work [10]. The accuracy results of all these models were compared, and the most accurate results were obtained with the ANFIS method of classification.
Table 2 Performance validation of classification techniques (accuracy in %)
4 Conclusion
The need for an efficient expert system to diagnose and analyse medical data has grown with the adoption of CDSS in recent years. The ANFIS model was designed as a hybrid of a NN and a fuzzy system to achieve medical diagnosis in a timely and concise manner. ANFIS has the advantage over plain fuzzy logic of adjusting to changes and requiring fewer epochs, thus reducing computation time. Results show the proposed ANFIS model to have a good classification accuracy of 86.48% with minimal error when compared with other algorithms. In future, the model is to be tested on a larger dataset and compared with deep neural networks to provide competent diagnosis and support CDSS in the best way.
References
1. Sutton, R.T., Pincock, D., Baumgart, D.C., Sadowski, D.C., Fedorak, R.N., Kroeker, K.I.: An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digital Med. 3(1), 1–10 (2020)
2. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and
control. IEEE Trans. Syst. Man Cybern. 1, 116–132 (1985). https://doi.org/10.1016/b978-14832-1450-4.50045-6
3. Sikchi, S.S., Sikchi, S., Ali, M.S.: Design of fuzzy expert system for diagnosis of cardiac
diseases. Int. J. Med. Sci. Publ. Health 2(1), 56–61 (2013). https://doi.org/10.5455/ijm
4. Lalka, N., Jain, S.: Fuzzy based expert system for diabetes diagnosis and insulin dosage control.
In: International Conference on Computing, Communication & Automation. IEEE (2015).
5. Abdullah, A.A., Fadil, N.S., Khairunizam, W.: Development of fuzzy expert system for diagnosis of diabetes. In: 2018 International Conference on Computational Approach in Smart
Systems Design and Applications (ICASSDA). IEEE (2018). https://doi.org/10.1109/
6. Joshi, S., Borse, M.: Detection and prediction of diabetes mellitus using Back-propagation
neural network. In: 2016 International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE). IEEE (2016). https://doi.org/10.1109/icmete.201
7. Kalaiselvi, C., Nasira, G.M.: A new approach for diagnosis of diabetes and prediction of cancer
using ANFIS. In: 2014 World Congress on Computing and Communication Technologies. IEEE
(2014). https://doi.org/10.1109/wccct.2014.66
8. Saraswati, G.W., Choo, Y.-H., Jaya Kumar, Y.: Developing diabetes ketoacidosis prediction
using ANFIS model. In: 2017 International Conference on Robotics, Automation and Sciences
(ICORAS). IEEE (2017). https://doi.org/10.1109/icoras.2017.8308066
9. Alassaf, R.A., et al.: Preemptive diagnosis of diabetes mellitus using machine learning. In:
2018 21st Saudi Computer Society National Computer Conference (NCC). IEEE (2018).
10. Kirisci, M., Yılmaz, H., Ubeydullah Saka, M.: An ANFIS perspective for the diagnosis of type II diabetes. Ann. Fuzzy Math. Inform. 17(2), 101–113 (2019). https://doi.org/10.
Review on Unit Selection-Based Concatenation Approach in Text to Speech Synthesis System
Priyanka Gujarathi and Sandip Raosaheb Patil
1 Introduction
A text-to-speech (TTS) system converts raw input text in any natural language into its corresponding spoken waveform. Speech signals are used in many applications in human-computer interactive systems. India has a wide variety of societies and religions, great linguistic diversity, and most Indian states have their own natively written and spoken languages. Nowadays, people are more interested in their native language. A TTS synthesis system mainly consists of two components: natural language processing (NLP) and signal processing. Recent state-of-the-art corpus-based text-to-speech systems generate synthesized speech by concatenating phonetically labeled speech segments selected from a large speech database. The database must contain various combinations of labeled speech segments. TTS systems using a corpus-based concatenative synthesis method produce more natural and higher-quality speech as the size of the recorded database grows. A speech corpus is digitally recorded and stored, and the speech segments are then marked manually with visualization tools or automatically with segmentation algorithms. Manual segmentation is a very tedious and time-consuming task; it is done with tools such as Audacity and WaveSurfer. Speech segments are selected to minimize the discontinuity problems caused by their concatenation. Generally, a combination of diphones, half-syllables, syllables, and triphones
P. Gujarathi (B)
E & TC Engineering Department, JSPM Rajarshi Shahu College of Engineering, Pune,
Maharashtra, India
e-mail: jspmpriyanka@gmail.com
S. R. Patil
E & TC Engineering Department, Bharati Vidyapeeth’s College of Engineering for Women, Pune,
Maharashtra, India
e-mail: sandip.patil@bharatividyapeeth.edu
are chosen as speech segments because they involve most of the co-articulation effects.
2 The Process of Speech Production
Speech signals are fundamentally composed of a sequence of sounds, and the transitions between sounds provide a symbolic representation of information. The arrangement of these sounds is governed by the rules of a language; the study of these rules and their implementation is the linguist's domain, while the study and classification of the sounds themselves is called phonetics. The purpose of speech signal processing is to enhance or extract information in order to learn more about the structure of the signal, i.e. how information is encoded in it [1]. The basic components of a TTS system are (i) text pre-processing, (ii) text to phonetic-prosodic translation, and (iii) signal processing.
(i) Text pre-processing: In this step, the input string of characters is translated into a new string with ambiguities resolved; an example is the translation of "Sr" into "Senior", "Serial" or "Sir", depending on the linguistic context.
(ii) Text to phonetic-prosodic translation: The processed text is parsed to determine its semantic structure. The sequence of words and their derived structure are then used to generate prosodic information and sound units such as phonemes, diphones, syllables, and polysyllables. Sound generation is usually more complex than looking up words in a dictionary because the pronunciation of words is highly dependent on context. Prosodic rules determine quantities such as pitch, amplitude, and duration for each of these sound units.
(iii) Signal processing: After sound unit labeling and pitch, duration, amplitude, and spectrum modification, the signal processing component generates the speech signal [2].
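The text pre-processing step can be sketched as a token-by-token rewrite (the lookup tables below are hypothetical; a real front end resolves abbreviations such as "Sr" from linguistic context rather than a fixed table):

```python
import re

# Hypothetical lookup tables for illustration only
ABBREVIATIONS = {"Sr": "Senior", "Dr": "Doctor", "St": "Street"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize_text(text: str) -> str:
    """Expand abbreviations and spell out digits; drop punctuation tokens."""
    words = []
    for token in re.findall(r"[A-Za-z]+|\d|[^\sA-Za-z\d]", text):
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token in DIGITS:
            words.append(DIGITS[token])
        elif token.isalpha():
            words.append(token)
    return " ".join(words)
```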
2.1 Basic TTS System
The TTS system is mainly divided into two parts: the front end and the back end (Fig. 1).
Front end: (i) Text Normalization, (ii) Phonetic Transcription, (iii) Prosody Information.
(i) Text Normalization: In text pre-processing, input text containing symbols, numbers, and abbreviations is converted into equivalent words; this is also called text tokenization. Text analysis can generally be divided into several stages, such as labeling, word segmentation, syntactic parsing, and semantic interpretation. Lexical ambiguity should be minimized before processing any speech signal. Text normalization plays a major role in TTS systems because the segmentation and part of speech (POS) of the sentence directly influence
Fig. 1 Basic TTS system
the prosody of speech, such as the pitch contour, the duration of syllables or pauses, stress, and many more parameters [3].
(ii) Phonetic Transcription: This is also called text-to-phoneme conversion; each word is assigned its phonetic transcription.
(iii) Prosody information: Three specific features are related to prosody: intonation, segmental duration, and energy. In linguistics, prosody refers to the rhythm, stress, and intonation of speech. Prosody may reflect various features of the speaker and the utterance: for example, it may represent the emotional state of the speaker, or indicate whether the utterance is a statement, question, or command. Intonation is the variation of pitch over a word or sentence. It is used in all languages to shape sentences and indicate their structure; in non-tonal languages it adds attributes to words and differentiates between questions (such as yes/no questions), declarative statements, commands, requests, etc. Fluctuations in pitch give either rising or falling pitch information. The different intonation patterns are: rising intonation (the pitch of the voice increases with time), falling intonation (the pitch decreases with time), dipping intonation (the pitch falls and then rises), and peaking intonation (the pitch rises and then falls). Pitch can also indicate attributes and has four different levels: low pitch is used at the end of utterances; normal conversation uses middle or high pitch; high pitch occurs at the end of yes/no questions; very high pitch is used for strong emotions. Intonation modeling is generally considered to play a key role in producing natural speech synthesis systems.
Together, the phonetic transcription and the prosodic information give the symbolic linguistic representation of the signal.
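As a toy illustration of the text normalization stage described above, the sketch below expands a tiny, hypothetical abbreviation table and spells out digits before tokenizing; a real TTS front end would use a much larger, language-specific lexicon and also handle ordinals, dates, and currency.

```python
import re

# Illustrative abbreviation table; real systems use large, language-specific lexicons.
ABBREVIATIONS = {"Dr.": "doctor", "St.": "street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def spell_digits(match):
    # Naive digit-by-digit expansion; full systems also handle
    # ordinals, dates, and currency amounts.
    return " ".join(ONES[int(d)] for d in match.group())

def normalize(text):
    # Expand abbreviations, then spell out numbers, then tokenize.
    for abbr, word in ABBREVIATIONS.items():
        text = text.replace(abbr, word)
    text = re.sub(r"\d+", spell_digits, text)
    return text.split()  # simple whitespace tokenization

print(normalize("Dr. Smith lives at 42 Elm St."))
# ['doctor', 'Smith', 'lives', 'at', 'four', 'two', 'Elm', 'street']
```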
Back end: The symbolic linguistic representation of the speech signal is applied to the synthesizer block to obtain a sound signal using signal processing. This includes computation of output prosody such as pitch contour, durations, spectral information, cepstrum information, phonemes, etc., to obtain natural-sounding speech output. The back end is a crucial component when the intelligibility and quality of the TTS speech signal are considered. Unit length selection is an important task in concatenative speech synthesis:
(i) A shorter unit length of sound requires less space, but sample collection and labeling become more difficult and complex.
P. Gujarathi and S. R. Patil
(ii) A longer unit length of sound requires more memory space but gives more naturalness, a better co-articulation effect, and fewer concatenation points.
(iii) Choices of unit for TTS are phonemes, diphones, triphones, demi-syllables, polysyllables, syllables, and words.
3 Different Synthesizer Technologies
The essential qualities required of a speech synthesis system are intelligibility and naturalness. Intelligibility means the ease with which the output is understood, i.e., how comprehensible the speech is in given conditions. Naturalness describes how closely the output resembles human speech. An ideal speech synthesizer must be both natural and intelligible. Different speech synthesis technologies are articulatory synthesis, LPC synthesis, formant synthesis, concatenative synthesis, HMM-based synthesis, sine wave synthesis, and source-filter synthesis.
3.1 Concatenative Synthesis
Basically, concatenation means stringing together different speech segments from a pre-recorded speech database. Concatenative synthesis can generally produce natural-sounding speech. First, a speech database is created, and speech segments are extracted from it using different semi-automatic or automatic segmentation algorithms. These segments (sound units: phonemes, syllables, polysyllables) are then joined together to obtain the synthesized speech signal; some audible glitches are observed in the output, and these should be minimized to obtain natural-sounding speech. There are different subtypes of concatenative synthesis: 1. unit selection synthesis, 2. diphone synthesis, 3. domain-specific synthesis, 4. phoneme-based synthesis.
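The joining step can be sketched as follows. This minimal pure-Python example applies a short linear crossfade at each concatenation point to soften the audible glitch; the fade length and the linear ramp are illustrative choices, not taken from any cited system, and each unit is assumed to be longer than the fade.

```python
def concatenate_units(units, fade=32):
    """Join pre-recorded sound units (lists of samples) with a short
    linear crossfade at each join to reduce audible glitches."""
    out = list(units[0])
    for unit in units[1:]:
        for i in range(fade):
            w = i / (fade - 1)  # linear ramp 0 -> 1 across the overlap
            out[-fade + i] = out[-fade + i] * (1 - w) + unit[i] * w
        out.extend(unit[fade:])
    return out

# Two constant toy "units": joined length = 100 + 80 - 32 = 148
joined = concatenate_units([[1.0] * 100, [1.0] * 80])
print(len(joined))  # 148
```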
3.2 Unit Selection Synthesis
In unit-selection synthesis, a large database of recorded speech is used. First, the database is created; then each recorded utterance is segmented into individual phones, diphones, half-phones, syllables, polysyllables, words, phrases, etc., according to the speech unit chosen. The division of the units in the speech database into segments is based on segmentation, and different acoustic parameters for each segment, such as pitch, position and duration in the syllable, and context, are stored. Each segment is then indexed for easy retrieval. In synthesis, the desired output utterance is created by determining the best combination of segments from the speech database, accessed using the index.
3.3 Unit Selection and Specification Process
Many text-to-speech synthesizers for Indian languages have used synthesis techniques that require prosodic models, but due to the unavailability of properly annotated databases for Indian languages, prosodic models for these synthesizers have still not been developed properly. Syllable-like speech units are suitable for concatenative speech synthesis and do not require extensive prosodic models [4]. The general format of an Indian language syllable is CVC, CV, CCV, etc., where C is a consonant and V is a vowel. A syllable is further divided into onset, rime, and coda. During the selection process, phonetic and prosodic constraints are applied. Indian languages are syllable-centered, and pronunciations are based on these syllables. Syllable units can capture co-articulation better than phones.
4 Comparison of Different Synthesis Methods
Prosodic information generation is an important step. Prosodic parameters include pitch contour, energy level, initial duration, final duration, and pause duration. A recurrent fuzzy neural network (RFNN) is a multilayer recurrent neural network technique used for a Chinese TTS system with a Chinese database, based on the time-domain pitch-synchronous overlap-add (TD-PSOLA) method [3]. For formant estimation, a pole analysis procedure is used. Different methods of spectral and formant smoothing have been explored: spectral smoothing is achieved at segment boundaries by interpolating the LP autocorrelation vectors, while formant smoothing methods involve direct modification of the formant frequencies. Spectral continuity at concatenation boundaries and across periodic groups in the sentence is observed [5]. A non-predictive analysis-by-synthesis scheme for speaker-dependent parameter estimation has been implemented to obtain a high compression ratio; the spectral coefficients are quantized using a memoryless split vector quantization (VQ) approach, and non-predictive and predictive methods are combined to improve the coding efficiency in a TTS system [6]. Accurate phone boundaries are essential for acoustic-phonetic analysis in automatic speech recognition and speech synthesis systems. However, the process of manually determining phonetic transcriptions and segmentations is laborious and expensive; it requires expert knowledge and is very time-consuming, and the exact positions in time of some phone or syllable boundaries are hard to determine accurately. The cost and effort required for this process are significant for large databases, so there is a need for automatic segmentation, mainly motivated by the large speech databases used to train and evaluate concatenative text-to-speech (TTS) systems. Mel-frequency cepstral coefficient (MFCC) spectral features are used: phone boundaries are automatically detected at the maximum spectral transition positions and compared with manually detected phone boundaries in the training part of the TIMIT database [7]. There are two types of costs used in concatenation: target costs and concatenation costs. The target cost is calculated by adding the effort required in
finding the relevant unit in the database. The concatenation cost is estimated in joining the speech units [8].
Speech segments or units can be of various sizes, such as phones, diphones, syllables, and even words. Each sound unit has its own advantages and disadvantages, and the selection of sound units depends on the basic characteristics of the language. Indian languages have a well-defined syllable structure, and syllables preserve the co-articulation effect better than phones and diphones; hence, syllables are selected as the basic units for synthesis. The unit selection process is carried out based on two cost functions: concatenation cost and target cost. These cost functions should ensure that the selected optimal unit sequence closely matches the target unit specification and the other adjacent units in the sequence [9]. For modifying speech prosody, a time-domain algorithm requiring little computation is used: the pitch-synchronous overlap-add (PSOLA) algorithm modifies the pitch and duration of speech signals [10], but PSOLA has no control over formant features. By modifying prosodic features such as formants and pitch, natural speech synthesis can be obtained [11]. Another method involves concatenation of pre-recorded speech audio files by selecting the most appropriate unit from a speech corpus database [12]. A unit selection algorithm selects the best annotated speech unit from the database [13]. A statistical parametric approach uses HMMs for generating intelligible speech [14]. The advantages of HMM and unit selection can be combined to generate better speech quality [15]. A unit-selection algorithm has been used for developing the speech synthesizer [16].
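The interplay of the two cost functions can be sketched as a dynamic-programming (Viterbi-style) search that picks one candidate unit per target position while minimizing the summed target and concatenation costs. The candidate units and cost functions below are hypothetical toy values for illustration, not those of any cited system.

```python
def select_units(targets, candidates, target_cost, concat_cost):
    """Viterbi-style search: choose one candidate unit per target
    position, minimizing summed target + concatenation costs."""
    # Cost and backpointer for each candidate at position 0.
    prev = {c: (target_cost(targets[0], c), None) for c in candidates[0]}
    history = [prev]
    for i in range(1, len(targets)):
        cur = {}
        for c in candidates[i]:
            tc = target_cost(targets[i], c)
            # Cheapest predecessor for candidate c.
            best = min(prev, key=lambda p: prev[p][0] + concat_cost(p, c))
            cur[c] = (prev[best][0] + concat_cost(best, c) + tc, best)
        history.append(cur)
        prev = cur
    # Trace the optimal path back from the cheapest final candidate.
    last = min(prev, key=lambda c: prev[c][0])
    path = [last]
    for i in range(len(targets) - 1, 0, -1):
        path.append(history[i][path[-1]][1])
    return list(reversed(path))

# Toy example: targets and units are just pitch values.
targets = [100, 200]
candidates = [[90, 105], [195, 150]]
best = select_units(targets, candidates,
                    target_cost=lambda t, c: abs(t - c),
                    concat_cost=lambda a, b: 0.01 * abs(a - b))
print(best)  # [105, 195]
```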
The basic genetic algorithm searches a large search space at multiple locations rather than at just one [17]. GA has been implemented in speech synthesizers based on unit selection [18]; the approach taken for implementation of GA is based on the concept of reducing the join cost [8]. Unit selection speech synthesis has been used to solve some of the problems of unnaturalness introduced by signal processing techniques [19]. Adoption of deep neural networks for acoustic modeling has further enhanced the prosodic naturalness and intelligibility of synthetic speech [20]. In addition, the ongoing emergence of neural network waveform generation models (neural vocoders) has nearly closed the quality gap between natural and synthetic speech. Raw waveform neural models such as WaveNet and GlotNet have been presented for speech synthesis [21]. WaveNets can be used for text-to-speech (TTS) synthesis in state-of-the-art concatenative and statistical parametric TTS systems [22]. The source-filter vocoder STRAIGHT is used in parametric TTS [23].
Unit selection methods require improvements to prosodic modeling, while HMM-based methods require improvements to spectral modeling for emotional speech. The main drawback of statistical parametric speech synthesis is that the spectra and prosody generated from HMMs can be over-smooth and lacking the detail present in natural spectral and prosodic patterns, because of the averaging in the statistical method [24]. The main drawback of concatenative methods such as unit selection is that the technique requires a large speech database; genetic algorithms have been applied to the unit-selection problem [25]. Compared to diphone-based TTS systems, unit selection TTS synthesis minimizes the number of artificial concatenation points and reduces the need for prosodic modification at synthesis time [26]. At the current stage, unit selection with waveform concatenation synthesis [3] and HMM-based parametric synthesis [1] are the two main speech synthesis methods, and each has its own advantages. For unit selection and waveform concatenation, the original waveforms are preserved and better naturalness can be obtained, especially given a large database. On the other hand, HMM-based parametric synthesis provides better smoothness, robustness, flexibility, and automation in system building [27]. Although statistical parametric speech synthesis offers various advantages over concatenative speech synthesis, the synthetic speech quality is still not as good as that of concatenative synthesis or natural speech [28]. In Table 1, a comparison of different synthesis methods, prosodic features, and performance evaluation for TTS systems is given from different papers.
Table 1 Comparison of TTS systems. For each work, the synthesis method, speech unit, prosodic analysis features, performance evaluation, and year are compared. The compared works include: Lin et al. (recurrent fuzzy neural network; pitch, energy levels, duration, and pause duration; listening test), Low et al. (spectral and formant smoothing), Xia et al. (hidden Markov model (HMM) with log likelihood ratios (LLR); spectral and F0 features; listening test), Lee et al. (pattern matching), Dusan et al. (maximum spectral transition measure; Mel-frequency features), Kasuya et al. (syllable nuclei detection), Jittiwarangkul et al. (smoothing; moving average, root mean square, and square energy), Jitsup et al. (1-D stationary wavelet transform detection analysis), Chou et al. (neural network syllable endpoint detection), Li et al. (root mean square energy and zero-crossing rate (ZCR); power and pitch features), Thomas et al. (monosyllables), Díaz et al. (unit selection method with Viterbi search; accent and pitch; objective evaluation; 2006), Wang et al. (HMM with decision tree context clustering; MFCC and logF0), Barra-Chicote et al. (unit selection and HMM-based; objective evaluation; 2010), Lim et al. (genetic algorithm (GA); MFCCs), Alías et al. (genetic weight tuning; diphones), Ling et al. (minimum unit selection error (MUSE)), Clark et al. (unit selection for the Festival system; diphones), and Juvela et al. (WaveNet neural vocoder; listening test).
5 Conclusion
For text-to-speech conversion, unit-selection speech synthesis is the simplest method. The sound unit plays an important role; different speech units are available, such as phoneme, diphone, triphone, syllable, monosyllable, polysyllable, word, etc. These speech units are synthesized by different speech synthesis technologies. However, unit-selection synthesizers are usually limited to one speaker and one voice. By selecting longer speech units, high naturalness at concatenation is achieved [29]. The available speech synthesis technologies have both advantages and disadvantages. Using different prosodic features, we can obtain natural speech synthesis output, but prosody and naturalness remain open issues, and research is ongoing to improve them and to obtain intelligible speech.
References
1. Rabiner, L.R., Schafer, R.W.: Digital signal processing of speech signals
2. Xydas, G., Kouroupetroglou, G.: Tone-Group F0 selection for modeling focus prominence in
small-footprint speech synthesis. Speech Communication, Greece (2006)
3. Lin, C.-T., Wu, R.-C., Chang, J.-Y., Liang, S.-F.: A novel prosodic-information synthesizer
based on recurrent fuzzy neural network for the Chinese TTS system. IEEE Trans. Syst. Man
Cybern. Part B: Cybern. 34(1) (2004)
4. Thomas, S.: Natural Sounding Text-to-Speech Synthesis Based on Syllable-Like Units. Indian
Institute of Technology Madras (2007)
5. Low, P.H., Ho, C.-H., Vaseghi, S.: Using Estimated Formants Tracks for Formants Smoothing
in Text to Speech (TTS) Synthesis. Brunel University, London, UB8 3PH, UK. LEEE (2003)
6. Lee, C.-H., Jung, S.-K., Kang, H.-G.: Applying a speaker-dependent speech compression
technique to concatenative TTS synthesizers. IEEE Trans. Audio Speech Lang. Proces. 15(2)
7. Dusan, Rabiner, L.: On the relation between maximum spectral transition positions and phone
boundaries. Sorin Center for Advanced Information Processing Rutgers University, Piscataway,
New Jersey, USA
8. Gahlawat, M., Malik, A., Bansal, P.: Integrating Human Emotions with Spatial Speech
Using Optimized Selection of Acoustic Phonetic Units. Computer Science & Engineering
Department, DCRUST, Murthal, India (2015)
9. Narendra, N.P., Sreenivasa Rao, K.: Optimal Weight Tuning Method for Unit Selection Cost
Functions in Syllable Based Text-to-Speech Synthesis. School of Information Technology,
Indian Institute of Technology Kharagpur, Bengal, India (2012)
10. Moulines, E., Charpentier, F.: Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun. 9, 453–467 (1990)
11. Erro, D., Navas, E., Hernáez, I., Saratxaga, I.: Emotion conversion based on prosodic unit selection. IEEE Trans. Audio Speech Lang. Proces. 18, 974–983 (2010)
12. Iida, A., Campbell, N., Higuchi, F., Yasumura, M.: A corpus-based speech synthesis system
with emotion. J. Speech Commun. 40, 161–187 (2003)
13. Black, A.W.: Unit selection and emotional speech. In: Paper Presented at the Eurospeech,
Geneva, Switzerland (2003)
14. Zen, H., Tokuda, K., Black, A.W.: Statistical parametric speech synthesis. Speech Commun.
51, 1039–1064 (2009)
15. Taylor, P.: Unifying unit selection and hidden Markov model speech synthesis. In: Paper
Presented at the Interspeech (2006)
16. Hunt, A.J., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large
speech database. In: Paper Presented at the 1996 IEEE International Conference on Acoustics,
Speech, and Signal Processing. ICASSP (1996)
17. Hue, X.: Genetic Algorithms for Optimization. University of Edinburgh, Edinburgh (1997)
18. Kumar, R.: A Genetic Algorithm for Unit Selection Based Speech Synthesis. InterSpeech-ICSLP, Korea (2004)
19. Black, A.W., Campbell, N.: Optimising selection of units from speech databases for concatenative synthesis. In: Proceedings of the Eurospeech’95, Madrid, Spain (1995)
20. Zen, H., Senior, A., Schuster, M.: Statistical parametric speech synthesis using deep neural
networks. In: Proceedings of International Conference on Acoustics, Speech and Signal
Processing (May 2013)
21. Juvela, L., Bollepalli, B., Tsiaras, V., Alku, P.: GlotNet—A Raw Waveform Model for the
Glottal Excitation in Statistical Parametric Speech Synthesis. IEEE/ACM Trans. Audio Speech
Lang. Proces. (2019)
22. van den Oord, A., et al.: WaveNet: A Generative Model for Raw Audio (2016)
23. Kawahara, H., Estill, J., Fujimura, O.: Aperiodicity extraction and control using mixed mode
excitation and group delay manipulation for a high quality speech analysis, modification and
synthesis system STRAIGHT. In: Proceedings of MAVEBA (2001)
24. Barra-Chicote, R., Yamagishi, J., King, S., Montero, J.M., Macias-Guarasa, J.: Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech (2010)
25. Lim, Y.C., Tan, T.S., Salleh, S.H.S., Ling, D.K.: Application of Genetic Algorithm in Unit
Selection for Malay Speech Synthesis System (2012)
26. Clark, R., Richmond, K., King, S.: Multisyn: open-domain unit selection for the festival speech
synthesis system. Speech Commun. 49, 317–330 (2007)
27. Ling, Z.-H., Wang, R.-H.: Minimum Unit Selection Error Training for Hmm-Based Unit
Selection Speech Synthesis System. iFlytek Speech Laboratory (2008)
28. Takamichi, S., Toda, T., Black, A.W., Neubig, G., Sakti, S., Nakamura, S.: Postfilters to modify
the modulation spectrum for statistical parametric speech synthesis. IEEE/ACM Trans. Audio
Speech Lang. Proces 24, 755–767 (2016)
29. Kayte, S., Mundada, M., Kayte, C.: A review of unit selection speech synthesis. Int. J. Adv.
Res. Comput. Sci. Softw. Eng. (2015)
30. Li, T., Shen, F.: Automatic Segmentation of Chinese Mandarin Speech into Syllable-Like.
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
31. Liberman, I.Y., Shankweiler, D., Fischer, F.W., Carter, B.: Explicit syllable and phoneme segmentation in the young child. Child Psychol. 18, 201–212 (1974)
32. Gold, B., Morgan, N.: Speech and Audio Signal Processing. Wiley India Edition, New Delhi
33. Narendra, N.P., Sreenivasa Rao, K.: Syllable Specific Unit Selection Cost Functions for Text-to-Speech Synthesis. Indian Institute of Technology Kharagpur
34. Xia, X.-J., Ling, Z.-H., Jiang, Y., Dai, L.-R.: HMM-Based Unit Selection Speech Synthesis
Using Log Likelihood Ratios Derived from Perceptual Data. National Engineering Laboratory
for Speech and Language Information Processing, University of Science and Technology of China
35. Dusan, S., Rabiner, L.: On the Relation Between Maximum Spectral Transition Positions
and Phone Boundaries. Center for Advanced Information Processing Rutgers University,
Piscataway, New Jersey, USA (2006)
36. Kasuya, H., Wakita, H.: Automatic Detection of Syllable Nuclei as Applied to Segmentation
of Speech. Speech Communications Research Laboratory, Inc.
37. Jittiwarangkul, N., Jitapunkul, S., Luksaneeyanawin, S., Ahkuputra, V., Wutiwiwatchai, C.:
Thai Syllable Segmentation for Connected Speech Based on Energy. Digital Signal Processing
Research Laboratory Bangkok, Thailand
38. Jitsup, J., Sritheeravirojana, U.-T., Udomhunsakul, S.: Syllable Segmentation of Thai Human
Speech Using Stationary Wavelet Transform. King Mongkut’s Institute of Technology
Ladkrabang, Bangkok, Thailand (2007)
39. Chou, C.-H., Liu, P.-H., Cai, B.: On the Studies of Syllable Segmentation and Improving
MFCCs for Automatic Birdsong Recognition. In: IEEE Asia-Pacific Services Computing
Conference, Department of Computer Science and Information Engineering, Taiwan (2008)
40. Thomas, S., Nageshwara Rao, M., Murthy, H.A., Ramalingam, C.S.: Natural Sounding TTS
Based on Syllable-Like Units. Indian Institute of Technology Madras
41. Díaz, F.C., Banga, E.R.: A Method for Combining Intonation Modelling and Speech Unit
Selection in Corpus-Based Speech Synthesis Systems. Campus Universitario, 36200 Vigo,
Spain (2006)
42. Wang, X.: An HMM-Based Cantonese Speech Synthesis System. Joint Research Center for
Media Sciences, Technologies and Systems, China. IEEE Global High Tech Congress on
Electronics (2012)
43. Alías, F., Formiga, L., Llora, X.: Efficient and Reliable Perceptual Weight Tuning for Unit-Selection Text-to-Speech Synthesis Based on Active Interactive Genetic Algorithms (2011)
44. Clark, R.A.J., Richmond, K., King, S.: Multisyn: Open-Domain Unit Selection for the Festival
Speech Synthesis System CSTR. The University of Edinburgh, 2 Buccleuch Place, Edinburgh
UK (2007)
45. van Santen, J., Kain, A., Klabbers, E., Mishra, T.: Synthesis of prosody using multi-level unit
sequences. Speech Commun. 46, 365–375 (2005)
Enhancing the Security of Confidential
Data Using Video Steganography
Praptiba Parmar and Disha Sanghani
1 Introduction
Over the past few decades, the internet has become a very common medium for transmitting information from a sender to a receiver. This has also opened a new door for attackers to intercept that information and easily obtain users' data. Much information is confidential, such as military secrets, law-enforcement secrets, etc. Terrorists can also use data hiding as well as cryptography techniques to secure their data. Cryptography is the science of using mathematics to encrypt and decrypt data [1]. Steganography is the art and science of hiding secret data inside another medium. The advantages of video steganography are:
• Large space for hiding confidential data.
• Multiple frames for hiding the data, so it is hard for an intruder to detect it.
• Increased security.
The rest of the paper is organized as follows. In Sect. 2, steganography and cryptography are defined. Section 3 reviews related work. Section 4 describes the frame selection approach for selecting frames from the cover video, and Sect. 5 the proposed methodology. In Sect. 6, the parameter metrics are defined. Section 7 presents results and discussion, and Sect. 8 concludes the paper.
P. Parmar (B) · D. Sanghani
Shantilal Shah Engineering College, Bhavnagar, India
e-mail: prapti.parmar97@gmail.com
D. Sanghani
e-mail: dishasanghani83@yahoo.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
P. Parmar and D. Sanghani
2 Steganography and Cryptography
2.1 Steganography
Steganography is the art and science of hiding data behind any medium. Based on the cover medium, it can be categorized into four types (Fig. 1).
Information is hidden behind media such as audio, text, image, or video: hiding data behind audio is called audio steganography, hiding information behind text is called text steganography, and so on. There are two types of data in steganography: secret data and carrier data. The confidential data that is hidden is called secret data; it cannot be seen by the human eye. The cover medium used for hiding the data is called carrier data; this medium is visible to the human eye.
In Fig. 2, the confidential information is in text format. This text data is encrypted by a cryptographic algorithm, and the encrypted data is hidden behind a cover medium. Here, an image is used as the cover medium, which creates a stego image.
In Fig. 3, the stego image is processed by an extraction algorithm. The encrypted data is extracted and then decrypted with the cryptographic decryption algorithm to recover the original data. There are different steganography algorithms, such as: (1) least significant bit (LSB), (2) bit-plane complexity segmentation (BPCS), (3) DCT-based techniques, and (4) DWT-based techniques. The LSB technique embeds the data into the least significant bits of the cover medium. In BPCS, the data is divided using 3D partitioning and then embedded behind the medium. DWT- and DCT-based techniques compress the data and, after compression, embed it into the low, medium, and high frequency bands.
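As a minimal sketch of the LSB technique described above, the example below writes one payload bit into the least significant bit of each cover byte and recovers it on the other side; the flat list of pixel byte values and the toy payload are illustrative simplifications of a real video frame.

```python
def embed_lsb(pixels, payload):
    """Hide payload bytes in the least significant bits of pixel
    byte values (one payload bit per pixel byte)."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover too small for payload")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # replace only the LSB
    return stego

def extract_lsb(stego, n_bytes):
    """Recover n_bytes hidden bytes from the stego pixel values."""
    out = []
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)

cover = [200] * 64          # toy cover: 64 pixel bytes
stego = embed_lsb(cover, b"Hi")
print(extract_lsb(stego, 2))  # b'Hi'
```

Because only the least significant bit of each byte changes, the pixel values differ from the cover by at most 1, which is what keeps the distortion (and hence the MSE) low.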
Fig. 1 Types of steganography [2]
Fig. 2 Steganography process at sender side [3]
Enhancing the Security of Confidential Data …
Fig. 3 Steganography process at receiver side [3]
2.2 Cryptography
Cryptography is the mathematical technique for information security properties such as confidentiality, integrity, and authentication. Cryptographic techniques are widely used to protect against unauthorized access in the communication channel. There are two types of cryptography based on the key:
1. Symmetric key cryptography and
2. Asymmetric key cryptography.
When sender and receiver use the same key for encryption and decryption of the message, it is called symmetric key cryptography. When sender and receiver use different keys, a public and a private key, for encryption and decryption, it is called asymmetric key cryptography.
3 Related Work
The following section discusses steganography and cryptography algorithms that have been used together to provide high security. Many techniques have been implemented in this field. If an attacker analyzes the video sequence, the stego video can easily be detected and the original data recovered. Yadav and Bhogal [4] present video steganography in the spatial and discrete wavelet transform domains. They used DWT and BPCS methods for embedding data behind a video; the main aim of that work is to increase the data-hiding capacity within the DWT technique. 3D SPIHT BPCS steganography uses decomposition into bitmaps: once a video file has been selected and its frames extracted, each frame is decomposed into bitmaps, giving a dualistic frame for each bit-plane [4].
In a related approach, Mstafa et al. [5] present research on the DWT and DCT domains based on multiple object tracking and error correcting codes. Multiple object tracking (MOT) is used for tracking motion-based objects in the cover video, and those objects are used for embedding the data into the video; they propose a novel MOT approach using DWT and DCT coefficients. Kaur et al. [6] present a hybrid approach combining steganography and cryptography methods; using both together gives better security to the embedded data. 4LSB and identical-match algorithms are used for embedding the data and for arranging the data in the cover medium, respectively. A Canny edge detector helps to embed data behind the edges of the cover video frames, because edges contain a large amount of sharp detail, so the embedded data cannot easily be detected. The RSA encryption algorithm is an asymmetric encryption algorithm in which sender and receiver use different keys for encrypting and decrypting the data; in [6], RSA provides high security for the data. Jangid and Sharma [7] propose a multilevel clustering algorithm with the integer wavelet transform (IWT); the multilevel clustering algorithm uses K-means clustering to cluster the cover frames, and the data is partitioned by K-means clustering.
4 Frame Selection Approach
Some attackers attack the data using sequential analysis of the video frames and can easily detect data hidden behind the medium. A frame selection approach using a mathematical function helps to avoid this analysis.
Step 1: Calculate the total number of frames (Total_No_Frames) of the cover video;
Step 2: Take a constant value Alpha;
Step 3: Calculate the slab value:
Slab_value = seed_value + (Total_No_Frames / Alpha);
Step 4: Calculate Floor(Slab_value) and store this value in s1;
Step 5: s1 gives the list of selected frames.
Alpha is a constant value. If the user wants to increase the number of selected frames, the value of Alpha is increased; to decrease the number of selected frames, the Alpha value is decreased. The value of Alpha depends entirely on the data size and the user's requirements.
5 Methodology
Using frame selection together with steganography and encryption gives better security. The least significant bit (LSB) method is used for performing video steganography, and the AES encryption algorithm is used for performing encryption. When both are used together, they provide confidentiality and integrity of the data (Fig. 4).
Algorithm steps:
Step 1: Take the cover video from the user.
Step 2: Extract the video frames and select frames from the video using the arithmetic formula.
Step 3: Take the secret text data from the user.
Step 4: Perform the encryption algorithm.
Fig. 4 Proposed work flow (cover video and secret data → frames selected using the arithmetic formula → stego video → extraction of stego video frames)
Step 5: Perform video steganography on the encrypted data and embed it behind the cover video.
Step 6: The stego video is generated.
Step 7: At the receiver, extract the selected video frames from the stego video.
Step 8: Extract the encrypted data from the video.
Step 9: Decrypt the data to obtain the original data.
6 Parameter Metrics
The results are evaluated using the peak signal-to-noise ratio (PSNR) and the mean square error (MSE). Both are quality measurement methods.
1. Mean square error (MSE)
The mean square error (MSE) represents the cumulative squared error between the stego image and the original image, whereas PSNR represents a measure of the peak error [4]:
MSE = (1 / (M × N)) Σ_{m=1}^{M} Σ_{n=1}^{N} [I1(m, n) − I2(m, n)]²
where M and N are the number of rows and columns in the input images.
2. Peak signal-to-noise ratio (PSNR)
The PSNR block computes the peak signal-to-noise ratio, in decibels, between two images. This ratio is used as a quality measure between the original and a compressed image; the higher the PSNR, the better the quality of the compressed or reconstructed image [8]:
PSNR = 10 log10(R² / MSE)
where R is the maximum fluctuation of the input image data type (255 for an 8-bit image).
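The two metrics can be computed directly from the formulas above; this sketch operates on images given as 2-D lists of pixel values, with R = 255 for 8-bit images.

```python
import math

def mse(original, stego):
    """Mean square error between two equal-size images
    given as 2-D lists of pixel values."""
    m, n = len(original), len(original[0])
    return sum((original[i][j] - stego[i][j]) ** 2
               for i in range(m) for j in range(n)) / (m * n)

def psnr(original, stego, r=255):
    """Peak signal-to-noise ratio in dB; r is the peak value
    of the image data type (255 for 8-bit images)."""
    e = mse(original, stego)
    if e == 0:
        return float("inf")  # identical images
    return 10 * math.log10(r ** 2 / e)

a = [[100, 100], [100, 100]]
b = [[100, 100], [100, 104]]
print(mse(a, b))               # 4.0
print(round(psnr(a, b), 2))    # 42.11
```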
7 Results and Discussion
Table 1 shows the value of PSNR and MSE of average ratio of selected frames from
cover video. If we take PSNR average value of whole value, then it gives 100%
results and it was wrong. Because here, we had used frame selection approach and
we had to take only selected frames of PSNR average value. Different size video
with different data size gives the best comparison results.
Test.mp4 is the original video used for the steganography process. Its size is
1.08 MB, and extracting the video frames yields 132 frames. A frame selection
approach based on arithmetic formulas is used to select frames from the cover
video (Figs. 5 and 6).
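The paper does not state its arithmetic selection formula, so the following is purely a hypothetical example of picking non-sequential frame indices along an arithmetic progression, which is one way "selected frames using arithmetic formula" could look.

```python
# Hypothetical frame selection: indices form an arithmetic progression
# wrapped modulo the frame count, so selected frames are non-sequential.
# The actual formula used in the paper is not given.

def select_frames(total_frames: int, needed: int, step: int) -> list:
    start = step % total_frames
    return [(start + k * step) % total_frames for k in range(needed)]

# 132 frames were extracted from Test.mp4 in the experiment above.
print(select_frames(132, 5, 7))  # [7, 14, 21, 28, 35]
```

Because an attacker would have to know the progression parameters, sequential frame analysis alone does not locate the payload.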
8 Conclusion
Cryptography and steganography are both very useful for protecting data in
transit. Combining them provides multiple layers of security: cryptography
encrypts the secret data, while steganography hides its existence. The proposed
approach achieves a high PSNR and a low MSE with the LSB method and the AES
encryption algorithm along with the frame selection approach. Frame selection
helps select video frames from the original video so that an attacker cannot
easily detect the secret data through sequential analysis of the video frames.
The main advantage of the LSB method is that it can be implemented on any type
of video format.
Fig. 5 Original cover video
Fig. 6 Stego video
Table 1 Various parameter metric values with different video frames

Video size | Data (bytes)     | PSNR (dB)
1.08 MB    | 128 bytes        |
1.08 MB    | 2048 bytes (2 kB)|
752 kB     | 128 bytes        |
References
1. Wang, H.: Cyber warfare: steganography vs. steganalysis. Commun. ACM 47(10) (2004)
2. Wadekar, H., Babu, A., Bharvadia, V., Tatwadarshi, P.N.: A new approach to video steganography using pixel pattern matching and key segmentation. In: 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS) (2017)
3. Prashanti, G., Jyothirmai, B.V., Sai, K.: Data confidentiality using steganography and cryptographic techniques. In: 2017 International Conference on Circuits, Power and Computing Technologies (ICCPCT) (2017)
4. Yadav, S.K., Bhogal, R.K.: A video steganography in spatial, discrete wavelet transform and integer wavelet domain. In: 2018 International Conference on Intelligent Circuits and Systems
5. Mstafa, R.J., Elleithy, K.M., Abdelfattah, E.: A robust and secure video steganography method in DWT-DCT domains based on multiple object tracking and ECC. IEEE. https://doi.org/10.
6. Kaur, R., Pooja, Varsha: A hybrid approach for video steganography using edge detection and identical match techniques. In: IEEE WiSPNET 2016 Conference
7. Jangid, S., Sharma, S.: High PSNR based video steganography by MLC (multi-level clustering) algorithm. In: International Conference on Intelligent Computing and Control Systems (ICICCS)
8. https://in.mathworks.com/help/vision/ref/psnr.html
Data Mining and Analysis of Reddit
User Data
Archit Aggarwal, Bhavya Gola, and Tushar Sankla
1 Introduction
Reddit is a popular user-driven Website that consists of several communities,
called subreddits, each devoted to discussing a predefined subject. Users can
submit original content or links and have discussions with other users. This is
a unique Web platform because the focus is not on a single user but on the
community [13]. The optimal sentiment analysis method would analyse a series of
search results for a given object, produce a list of product characteristics
(quality, features, etc.) and aggregate opinions on each of them (negative,
neutral, positive). Even so, the concept has recently been defined more widely
to incorporate several different forms of assessment of the content under
review. In general, opinions may be conveyed about anything, e.g. a commodity, a
service, a subject, a person, an entity or an event. The general term object is
used to represent the entity about which comments have been made. Our objective
in this study is to create our own corpus from a Reddit program. The system is
trained to take status updates from the corpus as inputs, ignoring updates
without words or emojis. During the testing phase, the capability of the system
is judged by its capacity to categorize the polarity of opinion per status
update.
Research on opinion mining or sentiment analysis began with the identification
of opinion-bearing (or sentiment-bearing) words, e.g. excellent, beautiful,
marvelous, terrible and worst [3]. Several scholars focused on mining these words and
A. Aggarwal (B) · B. Gola · T. Sankla
Bharati Vidyapeeth’s College of Engineering, New Delhi, India
e-mail: archit.aggarwal1998@gmail.com
B. Gola
e-mail: golabhavya@gmail.com
T. Sankla
e-mail: t.sankla97@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
A. Aggarwal et al.
Fig. 1 Steps in opinion mining
defining their polarity (i.e. positive, negative or neutral) or semantic
orientations. The authors identified numerous language features which can be
exploited from a large corpus to identify sentiment words and their orientations
(Fig. 1).
2 Related Works
Many past works have analysed Reddit data such as comments using neural
networks. Earlier, Reddit submissions were evaluated on the basis of the
submission title alone, but results on resubmitted images showed how various
factors such as the title, submission time and the community's image-submission
choices determine the impact of the content. Using a language model that
included bad and good words, part-of-speech tagging, title length and sentiment
analysis, it was concluded that a submission's success was governed by its
title. Besides the language model, the content quality, time of submission and
the community contributed immensely to a post's success.
There is similar analysis for Websites like Facebook, Twitter, Google+,
etc. [1, 4]. Much research has been done on sentiment analysis for blogs and
product reviews as well [11]. Researchers have also studied the impact of
microblogging on sentiment analysis [8]. Various studies have pointed out the
significance of machine learning in text classification [10]. That research
showed how sentiment analysis was performed as a pyramid scheme in which the
text was first categorized as containing sentiment and then as positive or
negative, with the analysis performed by machine learning algorithms. Work on
labelling emoticons as positive or negative sentiment is quite relevant to
Twitter, as users include emoticons in their tweets. Natural language processing
tasks operate at the document level, sentence level and phrase level, and
sentiment analysis is performed at all of these levels [9]. One line of research
focuses on the application of SVMs to sentiment analysis with diverse
information sources [14]. Here, unlike in the previous research, the text is
first classified as polar or neutral and then as positive or negative. In [6],
the algorithm worked with a large number of adjectives, each assigned a polarity
score. The main reason for such scoring was the authors' belief that online
texts contain very little neutrality. They focus on positive and negative words
by traversing synonyms and antonyms from WordNet. Because the recursive search
connects words across groups quickly, preventive steps were important, such as
assigning weights that decay exponentially as the number of hops increases. They
showed that the algorithm was accurate when compared with manually picked word
lists.
Kreger et al. [7] focused on the data that opinion mining uses and on ML and
sentiment analysis tasks for text classification. The research concluded that
fixed syntactic patterns used to express opinions are needed to perform
classification.
3 Research Methodology
On Reddit, only the final score, the difference between upvotes and downvotes,
is displayed; the total number of votes a post has received is hidden. Thus,
posts that have been heavily downvoted may still have positive overall scores.
To identify these posts, Reddit provides a 'controversial post' flag. Posts
which have received significant negative feedback and have negative overall
scores can become hidden in the discussion once their score falls below a
certain threshold. The person posting categorizes each post on Reddit into
whichever topic they think it belongs to, known as a subreddit [2]. Subreddits
are accessible to every user, who can share opinions through comments or posts.
Users can also comment on someone else's comment, creating a nested comment
structure as a discussion goes on. We are interested in examining the polarity
of opinion in responses to any news article. To train our systems, we need
comments that contain a point of view about the article. For this purpose, we
review the gathered data manually to see whether the nested discussions contain
views on the topic itself. We notice that these nested comments in certain cases
deviate from the topic being discussed in the post; most of the time they
contain personal attacks or sarcastic comments.
Firstly, a large amount of data is collected from Reddit and analysed. The
crawler collects data from popular sections of Reddit where users are most
active, and then crawls through comments, likes and vote counts [5]. The data is
divided on the basis of posting date, popularity and highest score within a time
period. Different users can post topics in different subreddits, so we need to
list all subreddits relevant to the topic. A subreddit's metadata contains all
details except the comments on the topic. We have taken the top 250 posts, which
constitute around 1% of the data available on Reddit. After the metadata is
listed, the comments, submissions and the difference between up and down scores
are collected. As a crawler, we have used the PRAW library, which gives a way
to share Python objects across the network. All crawlers were executed on
virtual machines, all with access to shared storage, and the data from distinct
pages is stored. One of the difficulties faced in the project was coping with
Reddit's blocking system. Because many crawlers want to collect data from
Reddit, its systems receive a huge number of requests and cannot handle a large
number of bot requests at a time, so they start blocking crawlers. If a
crawler's request rate exceeds a threshold, anti-crawling techniques block it
for some time and blacklist its IP address. This is a very common technique on
this type of Web platform. For this reason, the crawling speed was decreased so
that the crawler sends only one request per minute [12]. Reddit also restricts
crawling in various ways: a non-authenticated session cannot browse a subreddit
by date, and while an authenticated client can browse a subreddit by date, how
far back it can browse is capped. It is therefore inherently impractical to
obtain a complete crawl of the site. Another problem is that the crawlers make
many attempts to crawl bad or down URLs of Reddit, because they cannot
distinguish between down and bad URLs. It was also noted that systems sometimes
restarted because of load and the difficulty of keeping shared resources
reliable.
Comments on Reddit have a tree-like structure and can be seen as a directed
acyclic graph or, more particularly, a tree rooted at the post itself. While
each Reddit comment tree is distinct, numerous properties recur. For instance,
the higher a comment's score, the more likely it is to have many replies. Many
variables can be examined, including the tree height, the number of leaves, the
average span and more. These parameters can then be compared across other
socio-political topics; using them, we can plot graphs, classify data and
perform clustering for further analysis. The typical comment on Reddit had fewer
than 10 replies, with 95% of all entries having fewer than 100, though posts
with more than 5000 comments have been seen. Properties including the greatest
width and height were found to have distributions similar to the overall tree
size, showing a direct association with the distribution of tree sizes. All
subreddits relevant to the topic need to be listed.
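The tree statistics mentioned above (height, number of leaves) can be computed with a simple recursion over the nested comment structure; the dictionary shape used here is an assumed stand-in for Reddit's comment forest.

```python
# Comment-tree metrics over a nested dict structure (an assumption;
# a real crawl would map Reddit's comment forest into this shape).

def tree_height(node) -> int:
    # A comment (or post) with no replies has height 1.
    kids = node.get("replies", [])
    return 1 + (max(map(tree_height, kids)) if kids else 0)

def count_leaves(node) -> int:
    # Leaves are comments that received no replies.
    kids = node.get("replies", [])
    return 1 if not kids else sum(map(count_leaves, kids))

post = {"replies": [
    {"replies": [{"replies": []}, {"replies": []}]},  # comment with 2 replies
    {"replies": []},                                  # leaf comment
]}
print(tree_height(post), count_leaves(post))  # 3 3
```

Width per depth level could be added the same way with a breadth-first pass.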
Because low comment frequency has a strong impact on sentiment polarity, we
anticipate our sentiment prediction method performing worse on subreddit forums
with a significant number of single-comment posts. For our data, we set a
minimum of two comments per post and discard posts below this threshold, because
we require a broad range of samples covering a variety of topics. With this
constraint, the model has not reported worse classifier performance. Therefore,
as sentiment is predicted for different users with such categorized subreddit
choices, we expect less variability in problems. We have also done subreddit
visualization using R. First, we extracted the Reddit dataset using Google
BigQuery, along with the user information, to create network graphs. Each edge
has a weight, source and destination; the weight is calculated by counting the
users who participate in both subreddits, keeping pairs with more than 5 shared
users, based on the comment sections (Fig. 2).
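The edge-weight rule described above (count the users active in both subreddits, keep pairs with more than 5 shared participants) can be sketched as follows; the input shape is an assumption for illustration.

```python
# Subreddit co-participation graph: edge weight = number of users who
# comment in both subreddits; only weights above the threshold survive.
# The dict-of-sets input shape is an assumption.
from itertools import combinations

def edge_weights(user_subreddits: dict, threshold: int = 5) -> dict:
    # user_subreddits maps a user to the set of subreddits they comment in.
    weights = {}
    for subs in user_subreddits.values():
        for a, b in combinations(sorted(subs), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return {e: w for e, w in weights.items() if w > threshold}

users = {f"user{i}": {"r/worldnews", "r/politics"} for i in range(7)}
users["lurker"] = {"r/worldnews"}  # contributes to no pair
print(edge_weights(users))  # {('r/politics', 'r/worldnews'): 7}
```

Sorting each pair makes the edge key orientation-independent, so (a, b) and (b, a) accumulate into one weight.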
Data Mining and Analysis of Reddit User Data
Fig. 2 Methodology of our experiment
4 Results and Discussion
By conducting this research, we first analysed the edge connections between
various subreddits. This helped us analyse their interconnections and visualize
which subreddits can be recommended to a user.
In the above graphs, we can see that if a user regularly follows r/worldnews,
then he would most likely also like r/politics and r/television; hence, we can
recommend these subreddits to him. Further in our study, we extracted the top
1000 posts in the r/India subreddit in the year 2020 and their respective top
100 comments. Then, we applied sentiment analysis to these comments using the
NLTK library to determine:
1. Overall Sentiment Analysis of the Data
2. Sentiment Analysis on a Specific Topic
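The study applies the NLTK library for this step. As a self-contained stand-in for a lexicon-based scorer such as NLTK's VADER, the following toy classifier illustrates per-comment polarity; the word lists here are invented for the example and are far smaller than a real lexicon.

```python
# Minimal lexicon-based polarity scorer: a toy stand-in for NLTK's
# VADER, illustrating per-comment classification. The word lists are
# assumptions made for this example only.
POSITIVE = {"good", "great", "excellent", "love", "best"}
NEGATIVE = {"bad", "terrible", "worst", "hate", "awful"}

def polarity(comment: str) -> str:
    # Strip simple punctuation, then count positive minus negative hits.
    words = [w.strip(".,!?") for w in comment.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("This is the best policy ever"))  # positive
print(polarity("terrible, just awful"))          # negative
```

A real pipeline would replace `polarity` with `SentimentIntensityAnalyzer().polarity_scores`, which additionally handles negation, intensifiers and emoticons.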
These graphs represent the sentiments of users towards the aforementioned
topics. Based on user interaction with certain posts, we can recommend new
subreddits. We can use this data to better understand our user base, and
companies can use it to identify their potential customers and reach out to them
with their products and services.
5 Conclusion
This project provides us with a dataset on r/India and an overview of the
comment structure of Reddit. We have identified complex correlations and Web
platform properties among subreddits. This work has shown that using NLTK for
sentiment analysis on Reddit posts and comments is appropriate and works well.
The project can provide insights into current trends on a specific subject so
that people's feelings can be understood. Such methods may be used for
marketing, evaluating guests and making operational improvements or capital
expenditure decisions. Discussions in online forums are very rich and complex,
both in the content and dynamics of the conversations and in the characteristics
of the underlying platform. Our proposed archetypes link these important
elements and give us insights into the relationship between feelings, topics and
user actions.
6 Future Scope
Due to the use of the PRAW API, this project is limited to Reddit, but the same
algorithms and tools could be used to evaluate other Websites like 4chan,
Facebook or Twitter. Due to reliability issues of the algorithm, much less data
was used for analysis; by improving the algorithms, we can use this method to
analyse large datasets for more accurate results. Using the knowledge gained
through this work, we suggest a simple concept for predicting public sentiment
in response to any news. Such a model would focus on predicting user responses
to various subjects in news items. Other characteristics based on the text of
the news report, such as the number of responses to the report, may also be
predicted. We present this research effort to evaluate the influence of
additional predictions used as features for sentiment prediction.
References
1. Broy, M.: Software engineering from auxiliary to key technology. In: Software Pioneers, pp.
10–13. Springer (2002)
2. Choi, D., Han, J., Chung, T., Ahn, Y.Y., Chun, B.G., Kwon, T.T.: Characterizing conversation
patterns in Reddit: from the perspectives of content properties and user participation behaviors.
In: Proceedings of the 2015 ACM on Conference on Online Social Networks, pp. 233–243
3. Curiskis, S.A., Drake, B., Osborn, T.R., Kennedy, P.J.: An evaluation of document clustering
and topic modelling in two online social networks: Twitter and Reddit. Inf. Process. Manage.
57(2), 102,034 (2020)
4. Dod, J.: Effective substances. In: The Dictionary of Substances and Their Effects. Royal Society
of Chemistry. Available via DIALOG. http://www.rsc.org/dose/titleofsubordinatedocument.
Cited 15 (1999)
5. Glenski, M., Pennycuff, C., Weninger, T.: Consumers and curators: browsing and voting patterns on Reddit. IEEE Trans. Comput. Soc. Syst. 4(4), 196–206 (2017)
6. Godbole, N., Srinivasaiah, M., Skiena, S.: Large-scale sentiment analysis for news and blogs.
ICWSM 7(21), 219–222 (2007)
7. Kreger, M., Brindis, C.D., Manuel, D.M., Sassoubre, L.: Lessons learned in systems change
initiatives: benchmarks and indicators. Am. J. Commun. Psychol. 39(3–4), 301–320 (2007)
8. Manning, C.D., Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language
Processing. MIT Press (1999)
9. Mullen, T., Collier, N.: Sentiment analysis using support vector machines with diverse information sources. In: Proceedings of the 2004 Conference on Empirical Methods in Natural
Language Processing, pp. 412–418 (2004)
10. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization
based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for
Computational Linguistics, p. 271. Association for Computational Linguistics (2004)
11. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Tr. Inf. Retrieval 2(1–2),
1–135 (2008)
12. Stoddard, G.: Popularity and quality in social news aggregators: a study of Reddit and hacker
news. In: Proceedings of the 24th International Conference on World Wide Web, pp. 815–818
13. Weninger, T.: An exploration of submissions and discussions in social news: mining collective
intelligence of Reddit. Soc. Netw. Anal. Min. 4(1), 173 (2014)
14. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity: an exploration of features
for phrase-level sentiment analysis. Comput. Linguist. 35(3), 399–433 (2009)
Analysis of DNA Sequence Pattern
Matching: A Brief Survey
M. Ravikumar and M. C. Prashanth
1 Introduction
DNA is a macromolecule that contains the genetic instructions for the
functioning of all known living organisms. The main role of deoxyribonucleic
acid molecules is the long-term storage of information. DNA is often compared to
a set of blueprints, since it contains the instructions needed to construct
other components of cells, such as proteins and RNA molecules. The DNA segments
that carry this genetic information are called genes. DNA consists of two long
polymers of basic units called nucleotides, with backbones of sugars and
phosphate groups joined by ester bonds. The structure of DNA was first described
by James D. Watson and Francis Crick. DNA is organized into long structures
called chromosomes, which are duplicated before cells divide in a process called
DNA replication. DNA comprises four bases, Adenine (A), Guanine (G), Cytosine
(C) and Thymine (T), which together make up the DNA code. The DNA of a cell
contains hereditary information, shared through the chromosomes; humans have 23
pairs of chromosomes. The nucleotides bond A to T and C to G between the two
strands of the helix, like the rungs of a ladder or, better, the steps in a
spiral staircase. A pair of complementary nucleotides (or bases) A-T, G-C, T-A
or C-G is called a base pair (bp). DNA replication, which takes place in
association with cell division, involves the separation of the two strands of
the helix and the synthesis of a new strand of nucleotides complementary to each
strand.
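The complementary pairing described above determines the second strand from the first; a short sketch:

```python
# Base pairing A<->T, C<->G gives the complementary strand; the
# complement runs antiparallel, hence the reversal.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    return "".join(PAIR[base] for base in reversed(strand))

print(reverse_complement("GATTACA"))  # TGTAATC
```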
DNA is the most accurate and exact way of identifying an individual, because
every human being carries a unique genetic map in every cell of the body.
M. Ravikumar · M. C. Prashanth (B)
Department of Computer Science, Kuvempu University, Shimoga, Karnataka, India
e-mail: prashanth.m.c87@gmail.com
M. Ravikumar
e-mail: ravi2142@yahoo.co.in
A person's DNA contains information about their heritage and can sometimes
reveal whether they are at risk of certain diseases. A human being has around
three billion DNA bases, and more than 99 percent of those bases are the same in
all people, according to the United States National Library of Medicine. DNA
sequencing is an approach that permits researchers to work out the order of
bases in a DNA sequence. DNA analysis has led to some attention-grabbing and
important findings in the last few decades. Recently, a study published in the
journal Science found that random mistakes in DNA copying, rather than heredity
or environmental factors, account for two-thirds of cancer mutations in cells.
2 Literature Review
This review paper [1] focuses on Bayesian networks as a way to handle the
complexity of real data in settings involving DNA evidence. Such networks are
considered an efficient tool in the domain of forensic science. DNA profiling
evidence is the main category of evidence whose assessment has been studied
through Bayesian networks. The scope of the paper also includes forensic DNA
profiling issues such as DNA stains and stains with low quantities of DNA.
An efficient pattern matching algorithm, derivative Boyer–Moore (d-BM), for
compressed DNA sequences is proposed in [2]. In this algorithm, both the DNA
sequences and the DNA pattern are compressed. Experimentation on both synthetic
and real data shows that as the pattern length increases, the time taken to
process the data decreases; that is, the algorithm is less efficient for short
patterns than for long ones.
Classification of DNA patterns using shape descriptor analysis is proposed in
[3]. For classification, KNN, back-propagation and Naïve Bayes classifiers are
used. Experimentation is carried out on the complete genome sequence of the
E. coli bacterium; the KNN classifier gives an average classification rate of
41.26%. The Friedman statistical algorithm is used as a non-parametric
statistical test, which ranks the classifiers on each dataset. DNA pattern
recognition by canonical correlation analysis (CCA) has been proposed in [4] to
find a required DNA genetic sequence code. The algorithm finds correlations
between observations in two genomic datasets of hemoglobin coding sequences
(HBB), containing the alpha and beta hemoglobin chains, on the basis of
simulated data. For pattern recognition, CCA uses two cases: a presumed-model
investigation of the HBB sequence as the test case, and an integration-site
search as the training set. CCA is an unsupervised numerical tool that finds
correlated functions over sets of different variables and is applied here to DNA
sequences.
A finite automata (FA) model is used to analyze DNA patterns by converting the
patterns to a non-deterministic finite automaton (NFA) and then to a
deterministic finite automaton (DFA) [5]. The proposed model detects changes,
alterations or duplications in the obtained genetic information, with the
advantage that irrelevant mutations, sequencing errors and incomplete
specifications can be identified using FA. The pattern is analyzed by creating a
transition table for conversion of the NFA into a DFA; converting the NFA into a
DFA reduces the time complexity and allows the DNA pattern to be analyzed
efficiently.
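The DFA-matching idea can be illustrated with the classic KMP-style DFA construction over the DNA alphabet. This is a general sketch of deterministic-automaton matching, not the cited paper's specific model.

```python
# KMP-style DFA matcher over the DNA alphabet: dfa[state][char] gives
# the next state; reaching state m reports a match. A general sketch,
# not the cited paper's NFA-to-DFA construction.

def build_dfa(pattern: str, alphabet: str = "ACGT"):
    m = len(pattern)
    dfa = [{c: 0 for c in alphabet} for _ in range(m)]
    dfa[0][pattern[0]] = 1
    x = 0  # simulation state for the pattern shifted by one
    for j in range(1, m):
        for c in alphabet:
            dfa[j][c] = dfa[x][c]     # on mismatch, behave like state x
        dfa[j][pattern[j]] = j + 1    # on match, advance
        x = dfa[x][pattern[j]]
    return dfa, x                     # x = restart state after a full match

def search(text: str, pattern: str):
    # Assumes text and pattern use only the A/C/G/T alphabet.
    dfa, restart = build_dfa(pattern)
    state, hits = 0, []
    for i, c in enumerate(text):
        state = dfa[state][c]
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)
            state = restart           # allows overlapping occurrences
    return hits

print(search("ATATAGC", "ATA"))  # [0, 2]
```

Each text character costs one table lookup, so the scan is O(n) after an O(m · |alphabet|) preprocessing step.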
A fuzzy sequence pattern matching algorithm is proposed for sequence data, where
a similar match is obtained using a fuzzy function [6]. The proposed algorithm
finds the most "similar" match by preprocessing, fuzzification and inference
over sequences. An approximate match is obtained from the reference pattern and
used to identify patterns with zinc finger domain proteins. Non-occurrences are
not allowed by the proposed algorithm.
A faster string matching algorithm based on hashing and bit parallelism is
proposed in [7] to find the occurrences of a pattern of length m in a text of
length n. The proposed algorithm has preprocessing and searching phases.
Preprocessing handles patterns whose length is larger than the bit-vector width,
and the searching phase combines the attempts over 2q-grams by ORing the bit
vectors. A fast variation, improved hashq, is obtained by greedy strip and
hashing. Experimental results show that Fhash is faster than the other
algorithms on the size-4 DNA alphabet.
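The hashing idea can be illustrated with a Rabin–Karp-style rolling hash over the four-letter DNA alphabet; the cited algorithm's exact q-gram and bit-parallel scheme differs, so this is illustrative only.

```python
# Rabin-Karp rolling hash over A/C/G/T: each window hash is updated in
# O(1), and candidate windows are verified by direct comparison.
# Illustrative sketch, not the cited paper's Fhash scheme.

def rabin_karp(text: str, pattern: str, base: int = 4, mod: int = 10**9 + 7):
    q = len(pattern)
    code = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumes A/C/G/T input
    if q > len(text):
        return []

    def h(s):
        return sum(code[c] * pow(base, q - 1 - i, mod) for i, c in enumerate(s)) % mod

    target, window = h(pattern), h(text[:q])
    top = pow(base, q - 1, mod)  # weight of the outgoing character
    hits = [0] if window == target and text[:q] == pattern else []
    for i in range(q, len(text)):
        window = ((window - code[text[i - q]] * top) * base + code[text[i]]) % mod
        if window == target and text[i - q + 1 : i + 1] == pattern:
            hits.append(i - q + 1)
    return hits

print(rabin_karp("GATTACAGATTACA", "TTAC"))  # [2, 9]
```

The direct comparison on hash equality guards against collisions, keeping the matcher exact.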
A context-sensitive pattern matching algorithm is proposed to detect RNA forms
in secondary structures [8]. A Java user interface is used with the RNA_spec
language for RNA sequences to scan the search space and compare with a
conventional context-free grammar. Context-sensitive searching of a particular
pseudoknot has the advantage of representing natural RNA structures.
This article [9] proposes a method for automated similarity evaluation of given
patterns in denaturing gradient gel electrophoresis (DGGE) images. Each targeted
image contains many vertical data stripes carrying their own similarity
information, which is evaluated stripe by stripe. Comparing cross-correlation
analysis with DP (dynamic programming) matching analysis shows that DP matching
performs better than cross-correlation in terms of FRR and FAR.
In [10], a searching approach based on distinct natural numbers is proposed to
find occurrences in a given DNA string via a transition matrix. The algorithm
has only O(n − 1) worst-case time complexity for a given string length and DNA
pattern length. The numerical transition table checks frequencies in the search
scheme and then turns exact matching into a count of comparisons.
A Zhu–Takaoka Boyer–Moore–Horspool (ZTBMH) algorithm for fast pattern matching
in biological sequences is proposed in [11] as a variation of the Zhu–Takaoka
(ZT) algorithm. It absorbs the idea of the BMH algorithm, utilizes only the
bad-character heuristic and reduces the number of comparisons. On nucleotide
genome and amino acid sequence datasets, the algorithm obtained a fast matching
result of 10.3 s. For small alphabets such as nucleotide sequences, the
algorithm performs faster.
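The bad-character heuristic that ZTBMH relies on is the core of the Boyer–Moore–Horspool family; a plain Horspool matcher sketches it (the combined ZT/BMH algorithm itself is more involved).

```python
# Boyer-Moore-Horspool: the shift table records, for each character of
# the pattern except the last, its distance from the pattern's end;
# any other character shifts by the full pattern length.

def horspool(text: str, pattern: str):
    m = len(pattern)  # assumes m >= 1
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, i = [], 0
    while i <= len(text) - m:
        if text[i : i + m] == pattern:
            hits.append(i)
        # Shift by the table entry for the character under the window's end.
        i += shift.get(text[i + m - 1], m)
    return hits

print(horspool("ACGTACGACGT", "ACG"))  # [0, 4, 7]
```

On the small DNA alphabet most window-end characters do appear in the pattern, so shifts are shorter than for English text, which matches the paper's observation that alphabet size affects performance.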
The Berry–Ravindran fast search (BRFS) pattern matching algorithm is proposed in
[12]; it is a fast, exact matching algorithm based on the fast search (FS)
algorithm that absorbs the workings of the Berry–Ravindran (BR) algorithm. From
the BR algorithm, the bad-character heuristic is exploited to get the maximal
shift and reduce the number of character comparisons; the shift values are
calculated and stored in a one-dimensional array. Experiments conducted on both
the BRFS and BR algorithms show that BRFS gives better results than BR for short
patterns.
A character-key-location algorithm for Pattern Matching with Wildcards is
proposed in [13], where valuable information is explored from DNA sequences.
Matching is done under length constraints and wildcards based on the key
character location, with subspace division used to reduce the search time and
the search range when the pattern characters are distributed differently. The
searching efficiency increased by 40–60% in comparison with SAIL.
An efficient pattern matching algorithm is presented in [14] based on
preprocessing the pattern string by considering three consecutive characters of
the text. On a mismatch, the pattern is realigned with the text by sliding both
patterns in parallel until the first match is found. Searching from both sides
gives the best performance when compared with other algorithms.
A composite Boyer–Moore (CBM) algorithm is proposed for string matching [15],
which uses historical matching information and effectively accelerates the
matching speed. In a binary matching test between the BM and CBM algorithms, the
proposed CBM algorithm gives an efficiency of 84% compared with the BM
algorithm.
A comparison algorithm using a logical match technique is proposed in [16],
which generates the numbers of matches and mismatches using fuzzy membership
values. The logical match is performed on the Linux platform using C++ for DNA
sequences. The proposed algorithm generates membership values to find the
similarities between sequences with time complexity O(m + n). This unique method
is analyzed on DNA sequences from the NCBI databank, on both artificial and real
data, with the computational time depending on the length of the sequences.
A distributed computing system for DNA sequence analysis is developed in [17] to
detect disease, for forensic criminal analysis and for protein analysis. A
distributed subsequence identification algorithm is used to detect repeated
patterns in the computational investigation of DNA analysis over the Internet.
The algorithm requires two patterns for searching and comparison against the
broken DNA sequence; each segment of the broken sequence is served by a
Java-based server with a GUI, which is used to search for and identify
trinucleotide patterns. The analysis shows that non-consecutive pattern
identification has higher complexity with the non-consecutive searching
algorithm.
For cluster pattern matching in gene expression data, a hierarchical approach is
proposed in [18], which analyzes clusters of gene expression data without making
a distinction among the DNA sequences called "genes." The proposed method is a
simple gradient-based technique that removes noisy genes using bottom-up and
density-based clustering approaches; by estimating the density of sub-clusters
within large clusters, the proposed algorithm gives the best z-score measure.
To detect codons in large RNA sequences using a hash function, met-and-stop
techniques are proposed in [19]; they search faster on codons of any length and
find the gene sequence even as the string length increases, and the proposed
techniques run in less time.
String matching in DNA using an index-based shifting approach is proposed in
[20], which has preprocessing and searching phases. Character comparisons are
made in the preprocessing phase; in the searching phase, since the first
character of the substring and the pattern are the same, the second character of
the pattern is matched with the second character of the first substring from
left to right. For protein sequences, the algorithm works faster than for
English text as the length increases.
For evaluating DNA mutation patterns, sequences are matched using a
distance-based approach proposed in [21]. To perform this, the nucleotide pattern arrangement
in DNA hamming technique modified to find the primary Hepatitis C Virus (HCV)
pattern from experts. Experimentation compared with positively affected different
age groups isolated 100 sample data. Experiment results the matching score with
hamming value after including the normalization value.
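The normalized Hamming matching score used in this line of work can be sketched as below; the exact normalization in [21] is not given here, so this scaling to [0, 1] is an assumption.

```python
def hamming_score(seq_a, seq_b):
    """Normalized matching score for equal-length DNA sequences:
    1.0 = identical, 0.0 = every position differs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))  # Hamming distance
    return 1.0 - mismatches / len(seq_a)
```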
A compressed genomic pattern matching algorithm is proposed [22], which focuses on
compressing the textual data; the experiments show that the complexity of analysing
DNA sequences is reduced. This indicates that compressed data can be analysed
easily, with benefits for the biological community.
For approximate circular pattern matching, a simple, fast, filter-based algorithm
is proposed [23], which finds all occurrences of the rotations of a pattern in a text.
In practice, the proposed algorithm is much faster because filtering hugely reduces
the search space. In comparisons with the ACSSMF-Simple algorithm, the proposed
algorithm runs twice as fast as the state of the art.
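The core task of circular pattern matching can be illustrated with a naive rotation check; the filtering machinery of [23] is omitted, so this sketch only shows what counts as an occurrence.

```python
def circular_occurrences(text, pattern):
    """Positions in `text` where some rotation of `pattern` occurs."""
    m = len(pattern)
    # all m rotations of the pattern, collected in a set for O(1) lookup
    rotations = {pattern[i:] + pattern[:i] for i in range(m)}
    return [i for i in range(len(text) - m + 1)
            if text[i:i + m] in rotations]
```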
For approximate maximal pattern matching with one-off and gap constraints, a
Nettree approach is proposed [24]. To perform this, a heuristic offline occurrence
search algorithm is used along with the Nettree. Experimentation is conducted on
real-world biological data, and the performance of the proposed algorithm is
compared with the SAIL algorithm.
To evaluate the associated TF and TFBS patterns of DNA, a method that links the
associated patterns into an effective pipeline through unified scoring, summing, and
normalization is proposed [25]. The accurate rankings provide a way to identify the
rules and binding cores, with excellent correlation of the sum scores. Verification
against Protein Data Bank structures yields the highest matching-ratio
correspondence percentage.
Two single pattern matching algorithms (called ILDM1 and ILDM2) are proposed
[26], each of which is composed of a smallest suffix automaton and a forward finite
automaton. The proposed single pattern matching algorithms scan the text with the
help of a window whose size is equal to m. In the LDM algorithm, when the window
is shifted from the previous window to the current window, the useful information
produced by the forward automaton is discarded; the two proposed algorithms use
this information fully or conditionally. The experimentation shows that the average
time complexities of the two algorithms are less than those of RF and LDM for
short patterns and that of BM for long patterns.
M. Ravikumar and M. C. Prashanth
A fast and accurate algorithm is proposed [27] for probe selection from large
genomes. Good probes are selected based on the criteria of homogeneity and
specificity; to find the probes, a set of experiments is conducted on a few genomes
that have been widely used for testing by other probe design algorithms. Based on
the proposed algorithm, optimal short (20-base) or long (50–70-base) probes can be
computed for large genomes.
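A crude stand-in for the probe-selection idea is to keep only k-mers that occur exactly once in the genome; this sketch is an illustrative assumption, not the homogeneity/specificity criteria of [27].

```python
from collections import Counter

def unique_probes(genome, k=20):
    """Candidate probes: k-mers occurring exactly once in the genome
    (a simplified uniqueness filter, not the paper's full criteria)."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return [p for p, c in counts.items() if c == 1]
```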
An online algorithm is proposed [28] for finding significant matches of position
weight matrices in linear time. The proposed algorithm is based on classical
multi-pattern matching filtration and super-alphabet approaches developed for exact
and approximate keyword matching. Some well-known baseline algorithms (the Naïve
and PLS algorithms) as well as six proposed algorithms, ACE, LF, ACLF, NS, MLF,
and MALF, are implemented for the experimentation. Among these, the ACE algorithm
is theoretically optimal (its search speed does not depend on the matrix length, and
it is competitive for short matrices).
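The task these algorithms accelerate can be shown with a naive position-weight-matrix scan; the toy matrix and threshold below are illustrative assumptions, and [28]'s filtration machinery is not reproduced.

```python
def pwm_scan(seq, pwm, threshold):
    """Naive scan: score each window against a position weight matrix
    and report (position, score) pairs meeting the threshold."""
    m = len(pwm)  # pwm[j] maps nucleotide -> weight at position j
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits

# toy 2-column matrix favouring the dinucleotide "AC"
pwm = [{"A": 2, "C": 0, "G": 0, "T": 0},
       {"A": 0, "C": 2, "G": 0, "T": 0}]
```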
A low communication-overhead parallel algorithm for pattern identification in
biological sequences is proposed [29], because low overhead is essential for
achieving reasonable speedups on clusters, where inter-processor communication
latency is usually high. Identifying genes by comparing their protein sequences to
those already identified in databases, while providing scalable parallel approximate
pattern matching with predictable communication efficiency, is of high practical
relevance.
Using multiple hash functions, a fast searching algorithm is proposed for biological
sequences [30], which improves the performance of existing string matching
algorithms when used for searching DNA sequences. The algorithm has two stages,
preprocessing and searching: the preprocessing phase consists in computing different
shift values for all possible strings of length q, while the searching phase is based
on a standard sliding-window mechanism that reads the rightmost substring of
length q of the current window of the text to calculate the optimal shift
advancement. The proposed algorithm serves as a good basis for massive multiple
long pattern search.
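The q-gram shift idea can be sketched with a simplified single-table variant (essentially a Horspool-style rule over q-grams); the multiple hash functions of [30] are not reproduced, and this structure is an illustrative assumption.

```python
def qgram_search(text, pattern, q=2):
    """Sliding-window search shifting by the rightmost q-gram of each
    window (simplified single-table variant of the q-gram approach)."""
    n, m = len(text), len(pattern)
    # shift[g] = distance from the last occurrence of q-gram g in the
    # pattern to the pattern's right end
    shift = {}
    for j in range(m - q + 1):
        shift[pattern[j:j + q]] = m - q - j
    hits, i = [], 0
    while i <= n - m:
        gram = text[i + m - q:i + m]       # rightmost q-gram of window
        s = shift.get(gram)
        if s is None:
            i += m - q + 1                 # q-gram absent: shift fully
        else:
            if s == 0 and text[i:i + m] == pattern:
                hits.append(i)             # suffix matched: verify window
            i += max(s, 1)
    return hits
```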
A pattern matching algorithm is proposed for splicing junction (donor) recognition
in genomic DNA sequences [31]. The pattern information is generated using motif
models, built by creating 9-base DNA sequence data for two groups: a positive
training group of 250 donors and a negative training group of 800 GT-containing
false donors. By analysing the motif scores of both groups, the minimum score on
the positive scoreboard is called the lower positive bound (Lp), and the maximum
on the negative scoreboard is called the upper negative bound (Un). Based on the
positive and negative donor groups found by the motif algorithm, the pattern
matching donor detection algorithm works effectively and efficiently.
A visualization approach to motif discovery in DNA sequences is proposed [32], in
which nucleotide strings represent RNA and DNA molecules. The DNA and RNA
molecules stored in databases are each preprocessed by converting the string to its
numeric equivalent sequence and plotted in 3D space for multiple sequence
alignment. This multiple sequence alignment aligns DNA sequences vertically by
their similar regions and columns. The 3D plotted graph identifies the patterns as
rectangles, and the reverse mapping is then used to regenerate the symbolic DNA
sequences. These are stored in a matrix called the alignment matrix, from which
the nucleotide profile matrix is computed to obtain the patterns or motifs. The
proposed algorithm gives the best performance for visualizing motif discovery in
DNA sequences.
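The numeric conversion and reverse mapping described above can be sketched as follows; the specific A/C/G/T-to-number mapping used in [32] is an assumption here.

```python
# Illustrative numeric encoding of nucleotides (the concrete mapping
# in the cited work may differ).
ENCODE = {"A": 1, "C": 2, "G": 3, "T": 4}
DECODE = {v: k for k, v in ENCODE.items()}

def to_numeric(seq):
    """Convert a DNA string to its numeric equivalent sequence."""
    return [ENCODE[b] for b in seq]

def to_symbolic(nums):
    """Reverse mapping: numeric sequence back to DNA symbols."""
    return "".join(DECODE[n] for n in nums)
```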
For recognition of DNA sequences, an acceptor-site algorithm is proposed [33], as
the acceptor site is an important component in gene recognition. A motif model is
generated for the donor site to represent the degeneracy features of the acceptor
site. From the true and false scores of the acceptor-site motif model, the minimum
score on the true scoreboard is taken as the lower true bound (Li), and the maximum
score on the false scoreboard as the upper false bound (Uf). Using these, the
algorithm discovers a high degree of pattern match, with 88.5% of true acceptor
sites and 91.5% of false acceptor sites, and a correlation coefficient of 0.8004 is
obtained, which is the best in gene structure detection.
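A decision rule built on the Li/Uf bounds can be sketched as below; the exact rule in [33] is not spelled out in the text, so this three-way classification is an illustrative assumption.

```python
def classify_site(score, li, uf):
    """Classify a candidate acceptor site from its motif score using
    the lower true bound (Li) and upper false bound (Uf)."""
    if score >= li and score > uf:
        return "true site"
    if score < li and score <= uf:
        return "false site"
    return "ambiguous"   # score falls between the two bounds
```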
An improved pattern matching algorithm to find DNA patterns is proposed [34];
the algorithm is theoretically predictable and searches by moving a text window in
a chosen direction. The proposed algorithm used 6 run-pairs, where the right and
left arrows indicate the text-window direction. When the algorithm compared the
run-pair characters, the opposite direction gave the best performance.
DNA computing uses the proposed pattern matching algorithm to solve engineering
problems [35]. A parallel operation method is developed to recognize DNA
molecules; the model finds the position of a molecule matching the search image by
classifying the molecule's features, which are then applied to form a network by
PCR and gel electrophoresis. The search time for an image or pattern stays
constant, because sorting and computation are performed simultaneously in each test.
From the literature survey, we briefly outline some of the challenges involved in
DNA sequence pattern searching, pattern matching, and compression of DNA
sequences. Some of the challenging issues are summarized as follows.
1. To improve the efficiency of faster string searching algorithms for DNA
sequences.
2. Analyzing the compressed genomic data in DNA sequences.
3. Approximation of maximal, circular pattern matching in DNA sequences.
4. Building efficient algorithms for DNA profile recognition.
5. Finding the DNA probes in forensic applications.
6. Searching the compressed DNA sequence efficiently.
3 Conclusion
In this review paper, different algorithms for DNA sequence pattern searching,
matching, and compression have been discussed. These matching algorithms may be
used in some cases diagnosed from DNA sequences, such as the diseases HIV,
malaria, dengue, H1N1, and corona. This survey will greatly help new researchers
who are interested in taking up research problems in the domain of DNA pattern
analysis.
References
1. Biedermann, A., Taroni, F.: Bayesian networks for evaluating forensic DNA profiling evidence:
a review and guide to literature. Forensic Sci. Int. Genet. 6, 147–157 (2012)
2. Chen, L., Lu, S., Ram, J.: Compressed pattern matching in DNA sequences. In: Proceedings
of the 2004 IEEE 5 Computational Systems Bioinformatics Conference (CSB 2004)
3. Loya Larios, H.E., Montero, R.S., Hernández, D.A.G., Espinoza, L.E.M.: Shape descriptor
analysis for DNA classification using digital image processing. In: (IJCSIS) International
Journal of Computer Science and Information Security, vol. 15, no. 2, pp. 67–71 (February
4. Sarkar, B.K., Chakraborty, C.: DNA pattern recognition using canonical correlation algorithm.
J. Biosci. 40(4), 709–719 (2015)
5. Qura-Tul-Ein, Saeed, Y., Naseem, S., Ahmad, F., Alyas, T., Tabassum, N.: DNA pattern analysis
using finite automata. Int. Res. J. Comput. Sci. IJRCS 1–5
6. Chang, B.C.H., Halgamuge, S.K.: Fuzzy sequence pattern matching in zinc finger domain
proteins. In: Proceedings Joint 9th IFSA World Congress and 20th Nafips International
Conference (Cat. No. 01th8569). IEEE, pp. 1116–1120 (2001)
7. Al-Salami, A.M., Mathkour, H.: Faster string matching based on hashing and bit-parallelism. Inf. Process. Lett. 123, 51–55 (2017)
8. Sung, K.-Y.: Recognition and modeling of RNA pseudoknots using context-sensitive pattern
matching. In: 2006 International Conference on Hybrid Information Technology (ICHIT’06).
9. Sugiyama, Y., Saito, H., Takei, H.: A Similarity Analysis of Dgge Images Using DP Matching.
10. Le, V.Q.: A natural number based linear time filtering approach to finding all occurrences of a
DNA pattern. In: 2006 Fourth International Conference on Intelligent Sensing and Information
Processing, IEEE
11. Huang, Y., Cai, G.: A fast pattern matching algorithm for biological sequences, 2008. In: 2nd
International Conference on Bioinformatics and Biomedical Engineering
12. Huang, Y., Ping, L., Pan, X., Cai, G.: A fast exact pattern matching algorithm for biological
sequences. In: 2008 International Conference on Biomedical Engineering and Informatics
13. Liu, Y., Wu, X., Hua, X., Gaoa, J., Wua, G.: Pattern matching with wildcards based on key
character location. IEEE Iri 2009, July 10–12 2009, Las Vegas, Nevada, USA
14. Radhakrishna, V., Phaneendra, B., Sangeeth Kumar, V.: A two way pattern matching algorithm
using sliding patterns. 2010 3rd International Conference on Advanced Computer Theory and
Engineering (ICACTE), pp. 666–670
15. Xiong, Z.: A composite Boyer-Moore algorithm for the string matching problem. In: The 11th
International Conference on Parallel and Distributed Computing, Applications and Technologies,
pp. 492–496
16. Sanil Shanker, K. P., Austin, J., Sherly, E.: An algorithm for alignment-free sequence comparison using logical match. In: 2010 The 2nd international conference on computer and
automation engineering (ICCAE), vol. 3, pp. 536–538
17. Kumar, R., Kumar, A., Agarwal, S.: A distributed bioinformatics computing system for analysis
of DNA sequences. In: Proceedings 2007 IEEE Southeast on, pp. 358–363
18. Hoque, S., Istyaq, S., Riaz, M.M.: A hierarchical approach for clustering and pattern matching
of gene expression data. In: 2012 Sixth International Conference on Genetic and Evolutionary
Computing, pp 413–416
19. Upama, P.B., Khan, J.T., Zemim, F., Yasmin, Z., Sakib, N.: A new approach in pattern matching:
codon detection in DNA and RNA using hash function (CDDRHF). In: 2015 18th International
Conference On Computer and Information Technology (ICCIT)
20. Islam, T., Talukder, K.H.: An improved algorithm for string matching using index based shifting
approach. In: 2017 20th International Conference of Computer and Information Technology
(ICCIT), 22–24 December 2017
21. Al Kindhi, B., Afif Hendrawan, M., Purwitasari, D., Sardjono, T.A., Purnomo, M.H.: Distance-based pattern matching of DNA sequences for evaluating primary mutation. In: 2017 2nd
International Conferences on Information Technology, Information Systems and Electrical
Engineering (ICITISEE), pp. 310–314
22. Keerthy, A.S., Manju Priya, S.: Pattern matching in compressed genomic sequence data. In:
Proceedings of the 2nd International Conference on Communication and Electronics Systems
(ICCES 2017) IEEE Xplore Compliant - Part Number: Cfp17awo-Art., pp. 395–399
23. Rahman Azim, M.A., Iliopoulos, C.S., Sohel Rahman, M., Samiruzzaman, M.: A simple, fast,
filter-based algorithm for approximate circular pattern matching. In: IEEE Transactions on
Nano Bioscience, vol. 15, No. 2, March 2016, pp. 93–100
24. Wu, Y., Wu, X., Jiang, H., Min, F.: A Nettree for approximate maximal pattern matching with
gaps and one-off constraint. In: 2010 22nd International Conference on Tools with Artificial
Intelligence, pp. 38–41
25. Chan, T.-M., Lo, L.-Y., Sze-To, H.-Y., Leung, K.-S., Xiao, X., Wong, M.-H.: Modeling associated protein-DNA pattern discovery with unified scores. IEEE/ACM Trans Comput Biol
Bioinform 10(3):696–707
26. Liu, C., Wang, Y., Liu, D., Li, D.: Two improved single pattern matching algorithms. In:
Proceedings of the 16th International Conference On Artificial Reality and Telexistence–Workshops (ICAT’06)
27. Sung, W.-K.: Fast and accurate probe selection algorithm for large genomes. In: Proceedings
of The Computational Systems Bioinformatics (CSB’03)
28. Pizzi, C., Rastas, P., Ukkonen, E.: Finding significant matches of position weight matrices in
linear time. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(1):69–78 (2011)
29. Huang, C.-H., Rajasekaran, S.: Parallel pattern identification in biological sequences on
clusters. IEEE Trans. Nano Biosci. 2(1), 29–34 (2003)
30. Faro, S., Lecroq, T.: Fast searching in biological sequences using multiple hash functions. In: Proceedings of the 2012 IEEE 12th International Conference on Bioinformatics
& Bioengineering (BIBE), Larnaca, Cyprus, 11–13 November 2012, pp. 175–180
31. Yin, M., Wang, J.T.L.: Algorithms for splicing junction donor recognition in genomic DNA
sequences. In: Proceedings. IEEE international joint symposia on intelligence and systems
(Cat. No.98ex174)
32. Rambally, G.: A visualization approach to motif discovery in DNA sequences. In: Proceedings
2007 IEEE Southeastcon, pp. 348–353
33. Zhang, Y., Ruan, X.G.: Algorithms for acceptor sites recognition in DNA. In: Proceedings of
the 5th World Congress on Intelligent Control and Automation, June 15–19 2004, Hangzhou,
P.R. China, pp. 3076–3078
34. Dudás, L.: Improved pattern matching to find DNA patterns. In: 2006 IEEE International
Conference on Automation, Quality and Testing, Robotics (2006)
35. Tsuboi, Y., Ono, O.: Pattern matching algorithm for engineering problems by using DNA
computing. In: Proceedings of the 2003 IEEVASME International Conference on Advanced
Intelligent Mechatronics (Aim 2003), pp. 1005–1008
Sensor-Based Analysis of Gait
and Assessment of Impacts of Backpack
Load on Walking
Chaitanya Nutakki, S. Varsha Nair, Nima A. Sujitha, Bhavita Kolagani,
Indulekha P. Kailasan, Anil Gopika, and Shyam Diwakar
1 Introduction
Gait analysis includes the systematic study of locomotion and the coordination
between nervous and musculoskeletal systems [1]. The emergence of wearable
(accelerometer, gyroscope) [2, 3] and non-wearable sensors used in gait analysis
[4] helps to identify different factors that are influencing gait through biomechanics
and kinematics [5]. The applications of low-cost wearable sensors in gait analysis
are not limited to analysing spatio-temporal parameters and gait events; they also
help to analyse the variability between normal and pathological gait [6].
Physiological factors that influence gait variability include muscle activity and the
functional control between the muscular and nervous systems [7]. Gait variability
has been studied using the Froude number to understand trends in gait patterns [8].
Mathematically, a walking limb is commonly modelled as an inverted pendulum,
where the centre of mass moves through a circular arc centred at the foot [8]. Also,
quantifiable gender-based gait variability may have clinical and biomechanical
importance [9], and it can be analysed using Froude numbers as a combination of
velocity and acceleration.
The speed of walking and posture balancing were analysed using shank- and
lower-back-mounted IMU sensors with different machine learning algorithms to
understand gait alterations in daily life [10]. To classify healthy and pathological
gait, spatio-temporal parameters were extracted from the lower back using
accelerometers on healthy and peripheral neuropathy subjects [11]. Biomechanical changes
due to carrying a backpack may cause alterations in gait kinematic patterns and
ground reaction forces, which lead to postural changes and musculoskeletal injury
[12]. The kinetic and kinematic changes of gait associated with usage of a backpack
C. Nutakki · S. V. Nair · N. A. Sujitha · B. Kolagani · I. P. Kailasan · A. Gopika · S. Diwakar (B)
Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri campus, Kollam,
Kerala, India
e-mail: shyam@amrita.edu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
C. Nutakki et al.
[13] remained the same during swing and stance; however, significant changes were
also noticed during swing while carrying a backpack [14]. The changes due to
carrying backpacks in young adults thus remain unclear.
In our previous studies [2, 3], gait kinematic data extracted using accelerometer
sensors attached to different joints of the body were classified and analysed using
machine learning approaches and inverse dynamic analysis. In this paper, we used
mobile phone accelerometers to study the effects of backpacks with different loads
on gait, with assessments to understand lower trunk kinematic changes in young
adults. Also, gait similarities and gender variability between subject groups were
identified using the Froude number for subjects with similar anthropometric
characters.
2 Methods
Gait kinematic analysis was performed using 11 triaxial mobile phone
accelerometers attached to the pelvis and the left and right knee, ankle, shoulder,
elbow, and wrist. A total of nine subjects (male and female), aged 19–21, with no
pathological gait disorders were selected. A total of 3 trials with 2 gait cycles per
trial was considered. Before the data analysis, the kinematic data were low-pass
filtered using a Butterworth filter (cut-off frequency of 6 Hz), and data analysis was
done using MATLAB (MathWorks, USA). Gait kinetic and postural differences were
analysed using the Froude number (Eq. 1) as a combination of velocity and
gravitational acceleration.
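The low-pass filtering step (done in MATLAB in the paper) can be sketched with an equivalent zero-phase Butterworth filter in Python; the 100 Hz sampling rate and fourth filter order are assumptions not stated in the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal, fs, cutoff=6.0, order=4):
    """Zero-phase low-pass Butterworth filter (6 Hz cut-off as in the
    paper; sampling rate and order are assumptions)."""
    b, a = butter(order, cutoff, btype="low", fs=fs)
    return filtfilt(b, a, signal)   # filter forwards and backwards

fs = 100.0                                    # assumed sampling rate, Hz
t = np.arange(0.0, 2.0, 1.0 / fs)
# simulated accelerometer trace: 1 Hz gait component + 30 Hz noise
acc = np.sin(2 * np.pi * 1.0 * t) + 0.2 * np.sin(2 * np.pi * 30.0 * t)
smooth = lowpass(acc, fs)
```

`filtfilt` applies the filter in both directions, so the smoothed trace has no phase lag relative to the raw signal, which matters when locating gait events in time.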
After data collection, subjects were divided into three groups according to body
weight: subjects weighing between 40 and 50 kg were considered group 1, those
between 50 and 60 kg group 2, and those between 60 and 70 kg group 3. The Froude
number (Fr) (Eq. 1) was calculated from the accelerations extracted from the
accelerometers and the gravitational constant (Eq. 2) during walking.
Fr = v² / (g·L)  (1)
where v = velocity, g = gravitational acceleration, and L = the length of the joints.
v = a · t  (2)
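Equations (1) and (2) can be computed directly; the numeric values in the test below are illustrative only.

```python
def froude(v, L, g=9.81):
    """Eq. (1): Fr = v^2 / (g * L)."""
    return v ** 2 / (g * L)

def velocity(a, t):
    """Eq. (2): v = a * t, velocity from acceleration over time t."""
    return a * t
```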
Sensor-Based Analysis of Gait and Assessment …
2.1 Effect of Backpack on Lower Body
In this study, mobile phone accelerometers were attached to the right and left knee,
the pelvis (lumbar 5) region, and the right and left ankle to extract gait
spatio-temporal parameters such as walking distance, speed, and pelvic movements.
Six healthy subjects, who signed consent forms before participating in the
experiment, were recruited. Each subject was asked to stand up straight, keeping the
left leg forward and the right leg backward, and walk along a straight line at their
natural speed for 5 m. The backpack was positioned at lumbar level 5. Each
subject's gait patterns and pelvic tilt were evaluated with and without weights. After
data collection, all subjects were divided into two groups based on body weight to
check the effect of the backpack on the lower back with respect to a person's
weight: individuals weighing between 40 and 60 kg were considered group 1, and
individuals weighing between 60 and 80 kg group 2. In the model, the mean
activity of each joint rotation Jθ (Eq. 5) during normal and controlled conditions
was analysed, and the accelerations at each joint were also considered across time,
S (Eq. 7).
Jθ =  (5)
JDisplacement =  (6)
S = [Accx, Accy, Accz]  (7)
3 Results
3.1 Measured Gait Variability Across Different Subject
Groups Using Froude Analysis
Gait kinematic data were collected, and the mean Froude number (Fr) of each joint
during stance and swing was measured. The Fr amplitudes of the right shoulder in
groups 1 and 3 and of the left elbow in groups 1 and 2 and groups 2 and 3 were
shown to be statistically significant (Fig. 1a). In the lower joints, the Froude
numbers of the pelvis for groups 1 and 2 were significantly different, and the
differences in the Froude numbers of the left ankle for groups 1 and 2 and groups 2
and 3 were significant (Fig. 1b). However, the variation of the Froude numbers
within each group was not significant, suggesting the subjects belong to the same
group. Based on these observations, the significant differences among the groups
could be employed to discriminate the gait patterns of individuals who share the
same anthropometric characteristics.
Fig. 1 a Graphical comparisons of average Froude amplitudes for upper body joints (R shoulder, R elbow, R wrist, L wrist, L elbow, L shoulder). b Average Froude amplitudes for lower body joints among different subject groups (R knee, R ankle, pelvis, L ankle, L knee)
3.2 Attribute Selection to Classify Gait
We commenced the analysis with the full attribute set and filtered the best attributes
using the WEKA attribute selector. The dataset contained the average Froude
numbers of the right and left shoulder, wrist, knee, and ankle and of the pelvis
during static and dynamic walking. The WrapperSubsetEval with the BestFirst
search method was used to evaluate the attributes using learning classifiers (Naïve
Bayes, SMO, J48). Of the nine attributes, the data belonging to the pelvis, knee,
shoulder, and wrist were ranked highest. CorrelationAttributeEval and
GainRatioAttributeEval also showed that the pelvis, knee, shoulder, and ankle were
crucial for classifying static and dynamic gait movements.
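A rough Python analogue of this attribute-selection step is sketched below; the paper used WEKA's WrapperSubsetEval with BestFirst search, and scikit-learn's RFE with a linear SVM on synthetic data is only an illustrative stand-in.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))             # 9 joint attributes (synthetic)
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # labels driven by attributes 0 and 3

# keep the 4 attributes ranked highest by recursive feature elimination
selector = RFE(SVC(kernel="linear"), n_features_to_select=4).fit(X, y)
kept = np.flatnonzero(selector.support_)  # indices of retained attributes
```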
Fig. 2 Male and female gait spatio-temporal parameters associated with Froude numbers of the shoulder, pelvis, knee and ankle during stance and swing
3.3 Gender-Based Classification with Respect to Fr Number
Gait spatio-temporal parameters between male and female subjects, associated with
Froude numbers, were analysed. Joint motions of the pelvis, knee, shoulder, and
ankle in the sagittal plane during the stance and swing phases were compared
between the two groups. Males and females had unique gait patterns, although
walking speed, cadence, and step length remained the same. The average Fr number
of males was higher than that of females in the shoulder and knee, whereas female
gait data showed more activity in the hip and ankle throughout the gait cycle (Fig. 2).
3.4 Effect of Backpack Load on Pelvis during Walking
to Understand the Gait Pathological Condition
Lower body joint kinematic changes in young adults while carrying backpacks with
different loads were assessed. The recruited subjects were categorised into two
groups based on weight: subjects weighing between 40 and 60 kg were considered
group 1, and subjects weighing between 60 and 80 kg group 2. When the conditions
with and without backpack load were compared, for all subjects, a significant
increase in pelvis rotation was noticed on adding 10% of body weight as a backpack
load (Fig. 3). Error bars represent the standard deviation.
4 Discussion
Froude analysis was effectively employed to identify dynamic gait similarities
between subjects having similar anthropometric characteristics. The present study
Fig. 3 Change in activity of the pelvis in young adults during normal and loaded walking
used the Froude number to identify kinematic differences between subjects during
walking. Though there were differences in gait speed and cadence, similar Froude
number values were found for all subjects within a group. The significant
differences in average Fr amplitude of the shoulder, wrist, elbow, knee, and pelvis
among the subject groups could help differentiate subjects with similar
anthropometric characters; this suggests that gait reliability hinges on skeletal
maturity. Also, gait spatio-temporal parameters and variability in a population of
males and females were analysed using Froude analysis. Joint angular motion for
female subjects in the sagittal plane showed more activity in the ankle and pelvis,
whereas male subjects showed more activity in the knee and shoulder; moreover,
female subjects showed less knee extension during swing. Gender-related
differences in identifying gait-related risk factors help in understanding potential
neuro-pathological conditions and allow mapping of appropriate intervention
designs. Also, using attribute selector filters, it was observed that the pelvis, knee,
shoulder, elbow, and wrist ranked high and can be used to abstract gait
characteristics. The extracted patterns can be used in machine learning to develop
low-cost models that help clinicians analyse pathological conditions. This might be
a possible way for the proper detection of gait-related disorders.
The purpose of the backpack-load study was to examine changes in pelvic
movement and gait alterations while carrying different weights during walking.
Carrying heavy bags is becoming a cause of musculoskeletal disorders, also
affecting the kinematic behaviour of the neck, shoulder, and lower trunk. The
pelvis, being the region that connects the lower and upper trunk, is considered to
balance and stabilise the body's centre of mass during walking, sitting, and
balancing. Forward inclination of the pelvis and trunk while carrying backpack
loads may be associated with abnormal muscle contractions that may lead to strain
in the lower back and, possibly, increased risk of lower back injury. It may be
suggested that the backpack weight be limited to a range predetermined from our
methods relating to trunk kinematic change. This study can further help in lowering
risks when studying pre-clinical scenarios such as in
telemedicine, although more simulations need to be performed to reconstruct gait
from this data.
5 Conclusion
With many existing techniques for quantifying gait becoming expensive and lacking
translation to pre- or post-clinical scenarios, we see that this study could enable
low-cost studies. This analysis can be used for continuous monitoring, which can
help reduce and reconstruct scenarios that may help avoid overcrowded hospital
environments. Although there are 11 attributes that define gait, in the current study
we employed 5 major attributes to estimate healthy gait, and the study suggests that
it is feasible to compute gait similarities and gender variabilities across subject
groups using Froude amplitudes, which may help in identifying walking-related
conditions with attributions to age and weight. The developed models can be
fine-tuned into a more economical, simpler, and non-invasive sensor-app-based
technique to analyse gait-related pathological conditions, especially for those who
live in rural and underserved regions.
Acknowledgements This work derives direction and ideas from the Chancellor of Amrita Vishwa
Vidyapeetham, Sri Mata Amritanandamayi Devi. This study was partially supported by Department
of Science and Technology Grant DST/CSRI/2017/31, Government of India and Embracing the
World Research-for-a-Cause initiative.
References
1. Whittle, M.: Gait Analysis: An Introduction. Butterworth-Heinemann (2007)
2. Nutakki, C., Narayanan, J., Anchuthengil, A.A., Nair, B., Diwakar, S.: Classifying gait features
for stance and swing using machine learning. In: 2017 International Conference on Advances
in Computing, Communications and Informatics (ICACCI). pp. 545–548. IEEE (2017)
3. Balachandran, A., Nutakki, C., Bodda, S., Nair, B., Diwakar, S.: Experimental recording and
assessing gait phases using mobile phone sensors and EEG. In: 2018 International Conference
on Advances in Computing, Communications and Informatics (ICACCI). pp. 1528–1532. IEEE
4. Leu, A., Ristic-Durrant, D., Graser, A.: A robust markerless vision-based human gait analysis system. In: SACI 2011—6th IEEE International Symposium on Applied Computational
Intelligence and Informatics, Proceedings. pp. 415–420 (2011)
5. van Schooten, K.S., Pijnappels, M., van Dieën, J.H., Lord, S.R.: Quality of daily-life gait:
Novel outcome for trials that focus on balance, mobility, and falls. Sensors (Switzerland). 19
(2019). https://doi.org/10.3390/s19204388
6. Weiss, A., Herman, T., Giladi, N., Hausdorff, J.M.: Objective assessment of fall risk in
Parkinson’s disease using a body-fixed sensor worn for 3 days. PLoS One 9 (2014). https://doi.
7. Herman, T., Giladi, N., Gurevich, T., Hausdorff, J.M.: Gait instability and fractal dynamics of
older adults with a “cautious” gait: why do certain older adults walk fearfully? Gait Posture.
21, 178–185 (2005). https://doi.org/10.1016/j.gaitpost.2004.01.014
8. Kramer, P.A., Sylvester, A.D.: Humans, geometric similarity and the Froude number: is “reasonably close” really close enough? Biol. Open. 2, 111–120 (2013). https://doi.org/10.1242/
9. Smith, L.K., Lelas, J.L., Kerrigan, D.C.: Gender differences in pelvic motions and center of
mass displacement during walking: stereotypes quantified. J. Women’s Heal. 11, 453–458
(2002). https://doi.org/10.1089/15246090260137626
10. Mannini, A., Sabatini, A.M.: Machine learning methods for classifying human physical activity
from on-body accelerometers. Sensors (Basel) 10, 1154–1175 (2010). https://doi.org/10.3390/
11. Sejdic, E., Lowry, K.A., Bellanca, J., Redfern, M.S., Brach, J.S.: A comprehensive assessment
of gait accelerometry signals in time, frequency and time-frequency domains. IEEE Trans.
Neural Syst. Rehabil. Eng. 22, 603–612 (2014). https://doi.org/10.1109/TNSRE.2013.2265887
12. Orr, R.M., Johnston, V., Coyle, J., Pope, R.: Reported load carriage injuries of the Australian
army soldier. J. Occup. Rehabil. 25, 316–322 (2015). https://doi.org/10.1007/s10926-014-9540-7
13. Hong, Y., Cheung, C.K.: Gait and posture responses to backpack load during level walking in
children. Gait Posture. 17, 28–33 (2003). https://doi.org/10.1016/S0966-6362(02)00050-4
14. Hong, Y., Brueggemann, G.P.: Changes in gait patterns in 10-year-old boys with increasing
loads when walking on a treadmill. Gait Posture. 11, 254–259 (2000). https://doi.org/10.1016/
Wireless Battery Monitoring System
for Electric Vehicle
Renuka Modak, Vikramsinh Doke, Sayali Kawrkar, and Nikhil B. Sardar
1 Introduction
Today, electric vehicles (EVs) are becoming popular as fuel prices rise day by day. Because of this, many vehicle manufacturers are looking for energy sources other than gasoline. The use of electric power can benefit the environment, as it causes less pollution; in addition, EVs offer great advantages in terms of energy saving and environmental protection. However, overcharging the battery can not only significantly reduce battery life but can also cause serious safety incidents, such as a fire. Therefore, an electric vehicle battery monitoring system is required that can communicate battery status to the user to avoid such problems. Internet of Things (IoT) technology can be used to inform the manufacturer and users about battery condition. The cloud is used as an integration platform to acquire, process, and transmit data, as it provides an excellent environment for building measurement, test, and control frameworks. The objectives of this work are:
1. To design a prototype model for an electric vehicle.
2. To turn on the cooling circuit if the battery temperature exceeds the limit.
R. Modak (B) · V. Doke · S. Kawrkar · N. B. Sardar
MIT Academy of Engineering, Alandi(D), Pune, Maharashtra 412105, India
e-mail: rdmodak@mitaoe.ac.in
V. Doke
e-mail: vadoke@mitaoe.ac.in
S. Kawrkar
e-mail: sukawrkar@mitaoe.ac.in
N. B. Sardar
e-mail: nbsardar@mitaoe.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
2 Literature Survey
The first paper we reviewed combines sustainable energy sources with EVs through charging stations powered by renewables, which could become key charging infrastructure. As EV usage increases, recharging problems will grow; to overcome this, a charging station with a battery storage system dispatches power to EVs, and the system uses a non-cooperative game to dynamically adjust the charging power [1]. The second paper extends the traditional substation concept to a transfer station. The idea has been developed and has gained recognition in different locations; it can be applied to EV charging to improve charging capacity and customer service [2]. The third paper discusses the quality of EV charging systems, the business market, and the structure of a charging system [3]. The fourth paper investigates a metering and charging framework that can effectively display the charging cost of an EV on a remote charging road [4]. The fifth paper demonstrates the effect of the proposed smart charging strategy on the benefits for owners of electric vehicles; its controller requires basic communication with the power company to receive the electricity price signal every 60 min [5].
3 Methodology
Batteries used in EVs should not be overcharged or over-discharged, in order to avoid damage to the battery, reduced battery life, and fire or explosion. The BMS, with functions such as battery modeling, battery state estimation, battery balancing, and so on, is one of the keys to protecting the battery and improving battery utilization in EVs. A battery plays a vital role in keeping an electric car going on the road; thus, the electric car battery pack needs to be protected from damage due to uneven temperature. The ideal temperature range differs depending on the electrochemistry used in the battery, but the ideal temperature of an electric car battery is 45 °C in order to maintain battery life. The state of charge (SoC) is the level of charge of an electric battery relative to its capacity.
State-of-charge determination: One function of the battery monitoring system is to monitor the state of charge of the battery. The SoC can be used to signal the user and to limit the charging and discharging process. There are three methods of determining the SoC:
1. Direct measurement: To measure the SoC directly, one can use a voltmeter, because the battery voltage decreases more or less linearly over the discharging cycle of the battery.
2. Coulomb counting: In the coulomb-counting technique, the current going into or coming out of the battery is integrated over time to compute the corresponding value of its charge.
3. Combination of the two techniques: The two procedures can also be combined. The voltmeter is used to detect the battery voltage, while the battery current is integrated to determine the charge going into and coming out of the battery. The SoC then refers to the remaining capacity as a percentage of the maximum available capacity.
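As a rough illustration, the three determination methods above can be sketched in Python (a host-side sketch, not the microcontroller firmware; the 46 Ah capacity comes from the battery specification given later, while the usable voltage window and the blending factor are illustrative assumptions):

```python
# Sketch of the SoC methods described above. CAPACITY_AH is from the
# battery spec; V_MIN/V_MAX and the blend factor are illustrative.

CAPACITY_AH = 46.0          # rated capacity from the battery spec
V_MIN, V_MAX = 60.0, 84.0   # assumed usable voltage window (illustrative)

def soc_from_voltage(v):
    """Method 1, direct measurement: SoC assumed roughly linear in voltage."""
    return max(0.0, min(1.0, (v - V_MIN) / (V_MAX - V_MIN)))

def soc_coulomb_step(soc, current_a, dt_s):
    """Method 2, coulomb counting: integrate current (A) over dt (s)."""
    delta_ah = current_a * dt_s / 3600.0      # charge moved in this step
    return max(0.0, min(1.0, soc + delta_ah / CAPACITY_AH))

def soc_combined(soc, current_a, dt_s, voltage=None, blend=0.1):
    """Method 3: coulomb counting, nudged toward the voltage estimate."""
    soc = soc_coulomb_step(soc, current_a, dt_s)
    if voltage is not None:
        soc += blend * (soc_from_voltage(voltage) - soc)
    return soc
```

The combined estimator illustrates why method 3 is preferred: coulomb counting gives smooth short-term tracking, while the occasional voltage reading corrects accumulated drift.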
4 System Block Diagram
The block diagram in Fig. 1 gives an overview of the proposed BMS, which is a combination of a sensor network, a Wi-Fi module, a thermoelectric plate, and the Atmega16 microcontroller.
4.1 Sensors
A potential divider module, the DFR0051, has been chosen as the voltage sensor. The DFR0051 module is built on the resistor divider principle and can sense voltages up to 25 V. The current sensor (ACS712-05B) is based on the Hall effect; it delivers a sensitivity of 185 mV/A (at a +5 V supply) and can measure currents up to ±5 A [6]. The internal and ambient temperature of the battery plays an important role in battery performance, since performance changes with temperature. In this work, the LM35 temperature sensor, a simple analog temperature sensor, is used for the remote battery monitoring framework and is operated at 5 V. The output of this sensor is proportional to temperature. The output is measured with one of the microcontroller's built-in analog-to-digital converters, and a calibration formula given by the manufacturer is used to convert the voltage signal to temperature, with an accuracy of ±0.5 °C.
Fig. 1 System block diagram
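The sensor conversions described above can be sketched as follows. The LM35 scale (10 mV/°C) and the ACS712-05B sensitivity (185 mV/A around a 2.5 V zero-current offset) are datasheet figures, and the 10-bit, 5 V ADC matches the ATmega16; the 5:1 divider ratio assumed for the DFR0051 module is illustrative:

```python
# Sketch of the sensor-to-physical-value conversions. The 5:1 divider
# ratio for the DFR0051 (0-25 V scaled to 0-5 V) is an assumption.

ADC_REF_V = 5.0
ADC_MAX = 1023          # 10-bit ADC full scale

def adc_to_volts(adc):
    return adc * ADC_REF_V / ADC_MAX

def lm35_temp_c(adc):
    # LM35: 10 mV per degree Celsius
    return adc_to_volts(adc) * 100.0

def acs712_amps(v_out):
    # ACS712-05B: 2.5 V at 0 A, 185 mV per ampere
    return (v_out - 2.5) / 0.185

def battery_volts(v_out, divider_ratio=5.0):
    # scale the divider output back up to the battery voltage
    return v_out * divider_ratio
```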
4.2 Wi-Fi Module
The ESP8266 is easy to use; it is a low-cost device that can supply an Internet connection to the project. The module can work both as an access point (creating a hotspot) and as a station (connecting to Wi-Fi); hence, it can easily acquire data and upload it to the Web, making IoT as simple as possible.
5 System Flow Chart
The system flowchart in Fig. 2 illustrates the program flow of the system. Once the system is switched ON, it initializes the code. The sensor network of the system measures the voltage, current, and temperature of the EV battery. These measured parameters are then conveyed to the ATMEGA16 microcontroller. The microcontroller passes the physical data to the LCD screen to display the real-time values of the voltage, current, and temperature sensors. Then the microcontroller checks the temperature of the EV battery. If it is less than the threshold value, then
Fig. 2 Wireless BMS flow chart
it continues to pass the data to the LCD screen. But if the temperature of the EV battery exceeds the threshold value, which is approximately 40 °C, the microcontroller turns ON the cooling circuit, which is basically a thermoelectric plate controlled by a relay. The thermoelectric plate then works to maintain the temperature of the EV battery.
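The flowchart's decision step can be sketched as follows (a Python sketch of the logic only; the names are illustrative, since the real implementation runs in C on the ATmega16):

```python
# Sketch of the flowchart logic: read the sensors, display the values,
# and energize the cooling relay when the battery exceeds ~40 degrees C.

TEMP_THRESHOLD_C = 40.0

def cooling_relay_state(temp_c, threshold=TEMP_THRESHOLD_C):
    """Return True when the thermoelectric plate should be switched ON."""
    return temp_c > threshold

def monitor_step(voltage, current, temp_c):
    """One pass of the monitoring loop: readings plus the relay decision."""
    return {
        "display": (voltage, current, temp_c),   # shown on the LCD
        "cooling_on": cooling_relay_state(temp_c),
    }
```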
Before the hardware implementation of this prototype, we visited an electric vehicle showroom, where we studied the specifications of EV batteries. We were unable to purchase an actual EV battery due to its high cost, so we decided to design the EV battery in simulation, according to its specifications. After obtaining successful output from the simulation, we started to design the hardware circuit.
5.1 Electric Vehicle Battery Specification
Rated Voltage: 72 V
Maximum voltage: 84 V
Max. charging current: 20 A
Capacity: 46 Ah
Charging temp.: 0–50 °C
Discharging temp.: − 20 to 60 °C.
The hardware implementation of the proposed system prototype consists of the electric vehicle battery and an IoT sensor network to monitor the different parameters of the battery. The Atmega16 is the main component of the circuit. The sensor network consists of different sensors connected to the EV battery: current sensors, voltage sensors, and temperature sensors
Fig. 3 System hardware setup
to get the physical data from the battery. To process this data, the sensor network passes it on to the Atmega16 microcontroller. The Atmega16 takes the input data from the sensors and passes it to the LCD display to show real-time data while the EV battery is charging; an LED also glows to indicate this working process. The ESP8266 Wi-Fi module then stores the physical data of the EV battery on the cloud. To connect the ESP8266, we used the freely available 'ThingSpeak' platform, which provides free cloud storage. This system is designed for a special purpose: tracking the data of EV batteries for maintenance. The other circuit of the project is the temperature control circuit, since we are not only monitoring the parameters of the EV battery but also trying to control its temperature during charging and deep discharging. To control the temperature of the EV battery, we use a thermoelectric plate to cool it down. If the temperature of the EV battery exceeds the threshold level, the cooling circuit turns ON and tries to control the temperature. The main purpose of this control circuit is to protect the consumer from a battery explosion due to high temperature; it is a safety-measure circuit. This is how the overall battery monitoring system works to achieve its aim.
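A minimal sketch of pushing readings to ThingSpeak is shown below. The update endpoint and its api_key/field1..field8 parameters are ThingSpeak's documented REST API; the write API key is a placeholder, and the mapping of voltage, current, and temperature to fields 1-3 is our assumption:

```python
# Sketch of one ThingSpeak channel update. "YOUR_WRITE_KEY" is a
# placeholder; the field assignments are an assumption of this sketch.
from urllib.parse import urlencode
from urllib.request import urlopen

THINGSPEAK_URL = "https://api.thingspeak.com/update"

def build_update_url(api_key, voltage, current, temp_c):
    params = urlencode({
        "api_key": api_key,
        "field1": voltage,
        "field2": current,
        "field3": temp_c,
    })
    return f"{THINGSPEAK_URL}?{params}"

def push_readings(api_key, voltage, current, temp_c):
    # ThingSpeak replies with the new entry id ("0" means rejected)
    with urlopen(build_update_url(api_key, voltage, current, temp_c)) as r:
        return r.read().decode()
```

Note that the free ThingSpeak tier rate-limits updates (roughly one every 15 s), so the firmware should batch or throttle its uploads accordingly.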
6 Result
The executive performance of WBMS is user friendly and simple to observe parameters like current, voltage, and temperature of the battery through charging and
discharging operation. A wireless battery monitoring system can be registered on
the PC, as well as on smartphones [7].
Figure 4 introduces the temperature feature about the battery. It shows a battery
temperature graph which is relatively fixed at about 30 °C
Figure 5 displays current features of battery for EV on charging and discharging
Fig. 4 Temperature monitoring of WBMS
Fig. 5 Current monitoring of WBMS
Fig. 6 Voltage monitoring of WBMS
procedure. In this work, a constant-current method is used for discharging at around 1.8 A.
Figure 6 displays the voltage characteristic of the EV battery during the charging and discharging process.
7 Software Preparation
The wireless battery monitoring system was developed using two software platforms: the Arduino Integrated Development Environment (IDE) and AVR Studio.
7.1 Arduino IDE
The Arduino IDE is a cross-platform application that works with the C and C++ languages. It is derived from the IDE for the Processing programming language [6].
7.2 AVR Studio
AVR Studio is an Integrated Development Environment (IDE) created by Atmel for developing various embedded applications based on 8-bit AVR microcontrollers.
8 Conclusion and Future Scope
The prototype system for wireless monitoring of electric vehicle batteries has been designed and implemented to increase battery lifespan and to monitor battery parameters such as temperature, voltage, and current. In the future, a highly accurate battery monitoring device will be developed. Using this system, we can also control different battery parameters, and users can monitor them on their smartphones from anywhere using an Android app. It is expected that the wireless communication system presented in this paper will contribute to the realization of an advanced and efficient battery management system.
1. Zhang, J., Yuan, R., Yan, D., Li, T., Jiang, Z., Ma, C., Chen, T.: A non-cooperative game-based charging power dispatch in electric vehicle charging station and charging effect analysis. In: 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2). https://doi.org/10.1109/EI2.2018.8582178
2. Shuanglong, S., Zhe, Y., Shuaihua, L., Yun, C., Yuheng, X., Bo, L., Fengtao, J., Huan, X.: Study
on group control charging system and cluster control technology of electric vehicle. In: Published
in 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2). https://
3. Zhang, J., Yan, H., Ding, N., Zhang, J., Li, T., Su, S.: Electric vehicle charging network development characteristics and policy suggestions. In: Published in 2018 International Symposium
on Computer, Consumer and Control (IS3C). https://doi.org/10.1109/IS3C.2018.00124
4. Danping, V.Z., Juan, L., Yuchun, C., Yuhang, L., Zhongjian, C.: Research on electric energy metering and charging system for dynamic wireless charging of electric vehicle. In: 2019 4th International Conference on Intelligent Transportation Engineering (ICITE). https://doi.org/10.1109/
5. Chen, Z., Lu, J., Yang, Y., Xiong, R.: Research on the influence of battery aging on energy management economy for plug-in hybrid electric vehicles. IEEE (2017)
6. Juang, L.W.: Online battery monitoring for state-of-charge and power capability prediction.
Master of Science thesis, University of Wisconsin- Madison, USA (2010)
7. Yang, Y., Chen, B., Su, L., Qin, D.: Research and development of hybrid electric vehicles CANbus data monitor and diagnostic system through OBD-II and Android-based smartphones. Adv.
Mech. Eng. 2013(Article ID 741240), 9 p (2013)
Iris Recognition Using Selective Feature
Set in Frequency Domain Using Deep
Learning Perspective: FrDIrisNet
Richa Gupta and Priti Sehgal
1 Introduction
Iris biometrics is a useful authentication tool [1]. It is used to avoid the frauds that afflict traditional security systems, such as password hacking, presentation of fraudulent documents, etc. The traditional approaches prevalent in the field of iris biometrics work by using the entire iris data for user authentication. The iris feature set is extracted using techniques such as Gabor filters, log-Gabor filters, 2D wavelets [2, 3], the Fourier transform [4, 5], etc. These approaches show high accuracy rates, but they are easy prey to several security attacks on the biometric system [6–10].
With the advancement of technology and the splurge of data, research is moving toward self-learning approaches. CNNs have shown high accuracy in iris recognition while requiring minimal human intervention [8, 11, 12]. The iris recognition systems present in the literature work by inputting the segmented or normalized iris image to a CNN model and processing information from it. Additionally, the images provided to the model are in the spatial domain. This is the widely used and well-interpreted domain for representing an image, but its frequency counterpart has been found to be more effective for applying convolution operations [13]. This is advantageous because a CNN model's first layer is a convolutional layer, which works directly on the input image. The frequency domain thus helps the CNN model learn parameters efficiently. Further, it is well known that the high-frequency components of the transformed image contain boundary and edge data, while the lower-frequency components represent smooth regions of the image. Hence, the important information contained only in the high-frequency components can be used for authentication.
R. Gupta (B) · P. Sehgal
University of Delhi, Delhi, India
e-mail: richie.akka@gmail.com
P. Sehgal
e-mail: psehgal@keshav.du.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
In this paper, we propose and present a technique for authenticating a user with a reduced iris feature set using a CNN model. The reduced feature set is derived using the approach presented in [1], which works in the spatial domain. This set is further transformed to the frequency domain, and the higher-frequency components are extracted from it. This refined data set is presented to the proposed CNN model, FrDIrisNet. With the help of experiments, we show that the proposed system is capable of identifying the user with greater accuracy than a model given information in the spatial domain. The enhanced security of the proposed approach is shown by mitigating replay attacks and database attacks on the iris recognition system. The experiments are conducted on CASIA-Iris-Interval v4 DB and IIT Delhi DB. The rest of the paper is organized as follows: Sect. 2 summarizes the literature on iris recognition with reduced feature sets, CNN-based models, and security with respect to attacks on the system. Section 3 presents the proposed approach, FrDIrisNet, and the steps involved in the process. Section 4 discusses the experimental results, followed by the conclusion in Sect. 5.
2 Literature Survey
In the recent past, researchers have been exploring and proposing various techniques for biometric authentication and biometric security using deep learning. Menotti et al. [14] proposed the use of a CNN model to handle spoofing attacks on biometric-based authentication systems; they presented results on iris, face, and fingerprint biometrics with high accuracy in most cases. Pedro et al. [15] and Ramachandra et al. [16] presented techniques to detect liveness attacks on iris biometrics by revealing the presence of a contact lens. They achieved results comparable with other state-of-the-art techniques on the IIIT Delhi contact lens iris database and the Notre Dame cosmetic contact lens database 2013. Liu et al. [17] proposed the use of CNNs on heterogeneous sets of iris images; the experimental results show high accuracy on the Q-FIRE and Casia cross-sensor iris databases. Shervin et al. [18] used a pre-trained CNN model for feature extraction on the iris and applied principal component analysis (PCA) to further reduce the dimensionality for classification using a support vector machine (SVM); the approach provides promising results on the IIT Delhi and Casia-1000 iris databases. Similar work has been presented by Nguyen Kien et al. [19], who show the performance of several pre-trained models (AlexNet, VGG, Inception, etc.) on iris-based authentication, achieving high performance on the LG2000 and Casia-Thousand iris databases. Waisy et al. [20] proposed a multi-biometric fusion of the left and right irises of each user, using a deep convolutional network followed by rank-based fusion; they claim promising results on the Casia-Iris-Interval v3, IIT Delhi, and SDUMLA-HMT databases. Although the idea of a reduced feature set is not new, very limited work can be found in this area with respect to iris recognition [11, 12, 21–23]. Recently, Richa and Priti [1] proposed the concept of robust iris regions using local binary
patterns. They presented the accuracy on CASIA and IIT Delhi database as 98.14%
and 98.22%, respectively.
In this paper, we propose a novel CNN model, FrDIrisNet, that uses a reduced and relevant feature set in the transform domain for iris recognition, and we discuss the implicit advantage of this approach in mitigating replay attacks and template attacks on the system.
3 Proposed Methodology
The working of the proposed methodology is presented in Fig. 1 and discussed in the following subsections for two publicly available databases: CASIA-Iris-Interval v4 and IIT Delhi DB.
3.1 Image Augmentation
The amount of data processed plays an important role when dealing with deep neural network models. Image augmentation is used to increase the size of the dataset by applying certain transformations to it. A small dataset at the training phase tends to overfit the classifier. This limitation can be overcome by the use of an artificially produced dataset derived from the original dataset. Deep neural
Fig. 1 Working of the proposed approach
network models are trained over this augmented dataset to make them robust over
small variations in the images.
In our proposed approach, an intensity transformation using gamma correction is applied to the images, as proposed by Miki Yuma et al. [24], with γ values of 0.65 and 1.5. This increases our dataset by a factor of 3. The gamma correction applied to the database is shown in Eq. (1):

y = Imax × (x/Imax)^γ (1)

where x and y are the input and output pixel values of an image, respectively, and Imax is the maximum pixel value of the input image.
Let the captured eye image be represented by E_i,j for the ith user's jth image. This image is used for augmenting the database by applying the intensity transformation over it. The image in the augmented database is represented by I_i,j.
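The augmentation step can be sketched as follows (a NumPy sketch of Eq. (1) applied with the two γ values; an 8-bit Imax of 255 is assumed):

```python
# Sketch of the gamma-correction augmentation of Eq. (1):
# y = Imax * (x / Imax) ** gamma, with gamma = 0.65 and 1.5,
# tripling the dataset (original plus two transformed copies).
import numpy as np

def gamma_correct(img, gamma, i_max=255.0):
    return i_max * (img / i_max) ** gamma

def augment(images, gammas=(0.65, 1.5)):
    out = list(images)
    for g in gammas:
        out.extend(gamma_correct(im, g) for im in images)
    return out
```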
3.2 Image Pre-Processing
The iris image I_i,j for the ith subject, jth eye is of size 320 × 280 pixels. This image is preprocessed to obtain the segmented and normalized iris image, represented by N_i,j, of size 512 × 64 pixels. This step is performed using OSIRIS software version 4.1 [25], a freely available Linux-based tool.
3.3 Robust Region Determination
The popular way of authenticating a user's biometric involves verification of the complete biometric data. This method poses a deterministic approach to authentication. A different, non-deterministic approach has been proposed in previous work [1]. The pre-processed normalized image is partitioned into 64 non-overlapping regions or blocks N^b_i,j, ∀b = 1, 2, ..., 64, of size 16 × 32. This value was empirically derived by Richa and Priti [1], where it is shown that 64 partitions attain the best results. Average Local Binary Pattern (ALBP) thresholding is applied to each block N^b_i,j. ALBP is an improvement over LBP codes [26]; it averages the image around the center pixel to remove sensitivity to the gray value of the center pixel. Uniform patterns in the form of histograms are then obtained as H^b, ∀b = 1, 2, ..., 64, of size 1 × 243 for an ALBP configuration with radius 4 and 16 neighbors. This implies that the histogram features for each image H_i,j are of size 64 × 243.
The feature vector H_i,j is further used to obtain robust iris regions using the chi-square distance method. Let the identified robust regions for each user be represented as R^m_i, where 1 ≤ m ≤ 64. Subjects having m = 40 robust regions are selected for experimentation, denoted by r_i ⊆ R^m_i. The value of m has been empirically derived.
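The block partitioning that feeds the ALBP step can be sketched as follows (a NumPy sketch; laying out the 512 × 64 normalized image as a 64-row × 512-column array is an assumption of this sketch):

```python
# Sketch of partitioning the normalized iris image into the 64
# non-overlapping 16 x 32 blocks used for ALBP feature extraction.
import numpy as np

def partition_blocks(norm_img, bh=16, bw=32):
    h, w = norm_img.shape
    assert h % bh == 0 and w % bw == 0
    return [norm_img[r:r + bh, c:c + bw]
            for r in range(0, h, bh)
            for c in range(0, w, bw)]
```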
3.4 Transformation Layer
This layer transforms the selected robust iris regions of the normalized image into the frequency domain. In the proposed approach, the fast Fourier transform (FFT) is applied to transform this data, and the higher-frequency components are extracted for all the regions. This supports the convolution operation, which likewise works by extracting the detailed features of an image, and it increases the efficiency of the CNN model. The higher-frequency components are collated to form a new image and fed to the CNN model.
For each selected robust region r_i, the FFT is applied to the normalized image blocks N^b_i,j, where r_i ∈ R^m_i and b ∈ r_i (refer to Sect. 3.3). The transformed regions are filtered with a Gaussian high-pass filter (GHPF) to retain the higher-frequency components. This data is collated to form an image, denoted I'_i,j, of size 128 × 160, consisting of 40 regions each of size 16 × 32, as represented in Eq. (2):

I'_i,j = collate(GHPF(FFT(N^b_i,j))) (2)
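A sketch of this transformation layer is shown below: each selected 16 × 32 block is FFT-transformed, filtered with a Gaussian high-pass transfer function, and the 40 filtered blocks are collated into a 128 × 160 image. The 8 × 5 grid layout and the cutoff d0 are assumptions not fixed by the paper:

```python
# Sketch of Eq. (2): FFT each block, apply a Gaussian high-pass
# transfer function H = 1 - exp(-D^2 / (2 d0^2)), and collate the
# 40 filtered 16 x 32 blocks into one 128 x 160 image.
import numpy as np

def ghpf_block(block, d0=8.0):
    F = np.fft.fftshift(np.fft.fft2(block))   # DC moved to the center
    r, c = block.shape
    y, x = np.ogrid[:r, :c]
    d2 = (y - r // 2) ** 2 + (x - c // 2) ** 2
    H = 1.0 - np.exp(-d2 / (2.0 * d0 ** 2))   # Gaussian high-pass
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))

def collate(blocks, rows=8, cols=5):
    assert len(blocks) == rows * cols
    return np.vstack([np.hstack(blocks[i * cols:(i + 1) * cols])
                      for i in range(rows)])
```

Because the filter zeroes the DC term, a featureless (constant) block maps to zero, which matches the intent of keeping only edge and boundary detail.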
3.5 FrDIrisNet: Convolutional Neural Network Architecture
FrDIrisNet consists of 5 convolution layers (CONV), 5 rectified linear unit (ReLU) layers, 5 pooling layers, and a fully connected layer. The classification is performed using a Softmax layer. Each convolution layer is followed by a batch normalization layer, a ReLU layer, and a MaxPool layer, as depicted in Fig. 2. The input to FrDIrisNet is a transformed image of 128 × 160 × 1 pixels. This image is passed through the first convolution layer, conv1, of size 5 × 5 with filters f = 56, stride s = 1, and padding p = 0. The output of 1,083,264 neurons is given to maxpool1 of size 2 × 2 with s = 2. The output of 270,816 neurons is passed to conv2 of size 5 × 5 with f = 112, s = 1, and p = 0, and 480,704 neurons are passed to maxpool2 of size 2 × 2 with s = 2. The output is successively passed through conv3 of size 3 × 3 with f = 124, s = 1, and p = 0; maxpool3 of size 2 × 2 with s = 2; conv4 with the same parameters as conv3; maxpool4 of size 2 × 2 with s = 2; conv5 of size 3 × 3 with f = 136, s = 1, and p = 0; and maxpool5 of size 2 × 2 with s = 2. The output of 272 neurons is given to the fully connected layer, which is mapped to the final probability. The other important parameters used while training the model are MaxEpochs of 30, InitialLearnRate of 0.001, L2Regularization of 0.0001, and MiniBatchSize of 40.
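The neuron counts quoted above can be verified with the standard output-size formulas for valid (unpadded) convolution and 2 × 2, stride-2 max pooling (floor division handles the odd pool inputs):

```python
# Dimension walk-through of FrDIrisNet using out = (n + 2p - k) // s + 1
# for convolution and (n - k) // s + 1 for pooling.

def conv(h, w, f, k, s=1, p=0):
    return ((h + 2 * p - k) // s + 1, (w + 2 * p - k) // s + 1, f)

def pool(h, w, f, k=2, s=2):
    return ((h - k) // s + 1, (w - k) // s + 1, f)

def neurons(shape):
    h, w, f = shape
    return h * w * f

shape = conv(128, 160, 56, 5)              # conv1 -> 124 x 156 x 56
n1 = neurons(shape)                        # 1,083,264 neurons
shape = pool(*shape)                       # maxpool1 -> 62 x 78 x 56
shape = conv(shape[0], shape[1], 112, 5)   # conv2 -> 58 x 74 x 112
shape = pool(*shape)                       # maxpool2
shape = conv(shape[0], shape[1], 124, 3)   # conv3
shape = pool(*shape)                       # maxpool3
shape = conv(shape[0], shape[1], 124, 3)   # conv4
shape = pool(*shape)                       # maxpool4
shape = conv(shape[0], shape[1], 136, 3)   # conv5
shape = pool(*shape)                       # maxpool5 -> 1 x 2 x 136
```

Running the chain reproduces the figures in the text: 1,083,264 neurons after conv1 and 272 neurons entering the fully connected layer.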
Fig. 2 FrDIrisNet: The working CNN model for proposed approach
4 Experiments
The experiments are conducted on two publicly available databases CASIA-IrisInterval v4 and IIT Delhi DB. The experiments are shown on 185 subjects with
1324 images and 330 subjects with 1702 images for CASIA-Iris-Interval v4 and
IITD database, respectively, which is determined as explained in [1]. This dataset
is augmented as described earlier in Sect. 3.1, to get the dataset, which is 3 times
the original dataset. The CNN model FrDIrisNet is trained with 80% of the images
while rest 20% images are used for testing.
4.1 Experimental Results
The aim of this paper is to present a CNN model that authenticates the user using a feature set that is reduced and transformed to its frequency counterpart. This is attained using the self-trained model FrDIrisNet. The technique proposed here authenticates the user by selectively using the robust feature set. It combines the idea of non-determinism with neural networks; the non-deterministic approach makes use of a subset of the available information for authenticating the user.
The results show that the accuracy achieved with FrDIrisNet on the augmented and non-augmented databases is comparable, but the FAR and FRR values for the IITD DB show a steep decrease for the augmented database, as shown in Table 1. This implies that FrDIrisNet is capable of attaining similar accuracy at far lower false rates (FAR and FRR), which is crucial in improving overall system performance. The FAR for the IITD DB decreases from 6 × 10−3 to 8 × 10−4, while the FRR decreased from 2.42
Table 1 Comparison of proposed approach on non-augmented vs augmented database on CASIA-Iris-Interval v4 and IITD database over FAR, FRR, accuracy, and correct recognition rate (CRR)
to 0.3 for the proposed approach. For CASIA-Iris-Interval v4, the FAR decreases from 8 × 10−3 to 2 × 10−3, while the FRR decreases from 1.62 to 0.54. The accuracy of the system is compared with existing state-of-the-art techniques in Table 2. The techniques chosen for comparison use an optimal iris feature set for user authentication with the traditional approach. Table 3 presents the comparison of the proposed approach with other deep learning-based approaches that use the full feature set. In both tables it can be seen that the proposed system not only achieves high accuracy but also a significant improvement in FAR and FRR, which govern the false acceptance and rejection rates of the system.
Table 2 Comparison of proposed approach with respect to traditional approaches using reduced feature set
CASIA-Iris-Interval v4 DB:
Gu et al. [21]: genetic algorithm to optimize features
Roy and Bhattacharya [22]: genetic function
Richa and Priti [1]: local binary patterns with chi-square, accuracy 98.14%
IIT Delhi DB:
Richa and Priti [1]: local binary patterns with chi-square, accuracy 98.22%
Table 3 Comparison of proposed approach with respect to existing deep learning approaches using image in spatial domain with complete feature set
CASIA-Iris-Interval v4 DB:
Waisy et al. [20]: multimodal biometric on eye
Fei [27]: data-driven Gabor filter optimization
Proposed approach: image transformed in frequency domain
IIT Delhi DB:
Waisy et al. [20]: multimodal biometric on eye
Proposed approach: image transformed in frequency domain
Table 4 Comparison and importance of correct region ordering vs. incorrect region ordering (FAR and FRR under each ordering)
Role in Mitigating Replay Attack
The proposed CNN model learns parameters from only partial, robust iris data, which is found to contain considerably important biometric information. The non-deterministic approach of seeking the randomly ordered robust iris regions plays a key role in mitigating replay attacks. It provides flexibility to the system to choose a desired order of regions, such that any interception followed by a replayed message is barred from accessing the system. The experiments were performed based on the approach described in [2]. The selective determination of robust regions and the use of a random subset of these regions for data exchange between the sensor and the system help to mitigate the replay attack. By tapping the interface, an impostor cannot ascertain the correct order of these regions for the next authentication. The probability, as determined earlier in [1], of getting the correct sequence of these regions is just 1/40!. Table 4 presents the results of incorrectly ordered data being presented to FrDIrisNet and its implication on the system, which attains low accuracy with a high FRR.
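The 1/40! figure follows directly from counting the permutations of the 40 selected regions:

```python
# The chance of replaying the 40 robust regions in the exact order the
# system expects is one out of 40! possible orderings.
import math

orderings = math.factorial(40)           # number of possible region orders
replay_success_prob = 1.0 / orderings    # approximately 1.2e-48
```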
Role in Mitigating Template Attack
The innate ability of a CNN model to learn by itself and to operate using the learned parameters rules out the need to maintain a database of templates. This helps mitigate template-based attacks on the system, which have been one of the major security concerns in critical applications.
5 Conclusion
In this paper, we presented a CNN-based approach to authenticate a user using a reduced iris feature set. The proposed model, FrDIrisNet, works by extracting the robust iris regions, transforming them using the Fourier transform, and extracting the high-frequency components. Data augmentation is applied to avoid overfitting and to increase the size of the database to 3 times the original. The approach is additionally shown to mitigate replay attacks and database attacks on the system. The experiments are carried out under different categories, and the comparisons show excellent and competitive results, with a significant decline in false error rates, on the CASIA-Iris-Interval v4 and IITD databases.
1. Gupta, R., Sehgal, P.: Non-deterministic approach to allay replay attack on iris biometric.
Pattern Anal. Appl. 1–13
2. Daugman, J.: How iris recognition works. In: The Essential Guide to Image Processing, pp. 715–739. Academic Press (2009)
3. Wildes, R.P.: Iris recognition: an emerging biometric technology. Proc. IEEE 85(9), 1348–1363
(1997). https://doi.org/10.1109/5.628669
4. Conti, V., Militello, C., Sorbello, F., Vitabile, S.: A frequency-based approach for features
fusion in fingerprint and iris multimodal biometric identification systems. IEEE Trans. Syst.
Man Cybern. Part C Appl. Rev. 40(4), 384–395 (2010). https://doi.org/10.1109/TSMCC.2010.
5. He, X., Lu, Y., Shi, P.: A fake iris detection method based on FFT and quality assessment. In:
Chinese Conference Pattern Recognition, 2008. CCPR’08, pp. 1–4. IEEE (2008). https://doi.
6. Shelton, J., Roy, K., O’Connor, B., Dozier, G.V.: Mitigating iris-based replay attacks. Int. J.
Mach. Learn. Comput. 4(3), 204–209 (2014). https://doi.org/10.7763/IJMLC.2014.V4.413
7. Smith, D.F., Wiliem, A., Lovell, B.C.: Face recognition on consumer devices: reflections on
replay attack. IEEE Trans. Inf. Forensics Secur. 10(4), 736–745 (2015)
8. Rathgeb, C., Uhl, A.: Attacking iris recognition: an efficient hill-climbing technique. In: 2010
20th International Conference on Pattern Recognition (ICPR), pp. 1217–1220. IEEE (2010)
9. Rathgeb, C., Uhl, A.: Secure iris recognition based on local intensity variations. In: Image Analysis and Recognition, vol. 6112 LNCS, no. PART 2, pp. 266–275. Springer, Berlin, Heidelberg
(2010). https://doi.org/10.1007/978-3-642-13775-4_27
10. Tomeo-Reyes, I., Liu-Jimenez, J., Fernandez-Saavedra, B., Sanchez-Reillo, R.: Iris recognition:
threat analysis at the user interface level
11. Bolle, R.M., Pankanti, S., Connell, J.H., Ratha, N.K.: Iris individuality: a partial iris model. In:
Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004,
vol. 2, pp. 927–930. IEEE (2004). https://doi.org/10.1109/ICPR.2004.1334411
12. Hollingsworth, K.P., Bowyer, K.W., Flynn, P.J.: The best bits in an iris code. IEEE
Trans. Pattern Anal. Mach. Intell. 31(6), 964–973 (2009). https://doi.org/10.1109/TPAMI.200
13. Signal Processing. https://dsp.stackexchange.com/questions/38750/why-frequency-domain-conversion-is-important-in-digital-image-processing. Accessed 11 Dec 2019
14. Menotti, D., et al.: Deep representations for iris, face, and fingerprint spoofing detection. IEEE
Trans. Inf. Forensics Secur. 10(4), 864–879 (2015)
15. Silva, P., Luz, E., Baeta, R., Pedrini, H., Falcao, A.X., Menotti, D.: An approach to iris contact
lens detection based on deep image representations. In: 2015 28th SIBGRAPI Conference on
Graphics, Patterns and Images, IEEE, pp. 157–164 (2015)
16. Ramachandra, R., Raja, K.B., Busch, C.: ContlensNet: robust iris contact lens detection using
deep convolutional neural networks. In: IEEE Winter Conference on Applications of Computer
Vision (WACV), pp. 1160–1167 (2017)
17. Liu, N., Zhang, M., Li, H., Sun, Z., Tan, T.: DeepIris: learning pairwise filter bank for
heterogeneous iris verification. Pattern Recognit. Lett. 82, 154–161 (2016)
18. Minaee, S., Abdolrashidi, A., Wang, Y.: An experimental study of deep convolutional
features for iris recognition. In: 2016 IEEE Signal Processing in Medicine and Biology
Symposium (SPMB), p. 1 (2016)
19. Nguyen, K., Fookes, C., Ross, A., Sridharan, S.: Iris recognition with off-the-shelf CNN features: a
deep learning perspective. IEEE Access 6, 18848–18855 (2018)
20. Al-Waisy, A.S., Qahwaji, R., Ipson, S., Al-Fahdawi, S., Nagem, T.A.: A multi-biometric iris
recognition system based on a deep learning approach. Pattern Anal. Appl. 21(3), 783–802
21. Gu, H., Gao, Z., Wu, F.: Selection of optimal features for iris recognition. In: International
Symposium on Neural Networks. Springer, Berlin, Heidelberg, pp. 81–86 (2005)
Iris Recognition Using Selective Feature Set …
22. Roy, K., Bhattacharya, P.: Optimal features subset selection and classification for iris
recognition. EURASIP J. Image Video Process, 1–20 (2008)
23. Rathgeb, C., Uhl, A., Wild, P.: On combining selective best bits of iris codes. In: European
Workshop on Biometrics and Identity Management. Springer, Berlin, Heidelberg, pp. 227–237
24. Miki, Y., Muramatsu, C., Hayashi, T., Zhou, X., Hara, T., Katsumata, A., Fujita, H.: Classification
of teeth in cone-beam CT using deep convolutional neural network. Comput. Biol. Med. 80,
24–29 (2017).
25. Rathgeb, C., Uhl, A.: Adaptive fuzzy commitment scheme based on iris-code error analysis,
pp. 41–44. Department of Computer Sciences University of Salzburg, A-5020 Salzburg, Austria
26. Li, C., Zhou, W., Yuan, S.: Iris recognition based on a novel variation of local binary pattern.
Vis. Comput. Springer 31(10), 1419–1429 (2015)
27. He, F., Han, Y., Wang, H., Ji, J., Liu, Y., Ma, Z.: Deep learning architecture for iris recognition
based on optimal Gabor filters and deep belief network. J. Electron. Imaging 26(2), 023005
Enhancement of Mammogram Images
Using CLAHE and Bilateral Filter
M. Ravikumar, P. G. Rachana, B. J. Shivaprasad, and D. S. Guru
1 Introduction
The human race is now facing serious health issues as an impact of various factors
such as unhealthy food habits, pollution, genetic disorders, and more. Some of these
health issues are not very severe, yet others are life threatening. Cancer is one such
disease, and it is deadliest if not detected and treated at the earliest. Breast cancer is
among the cancer types seen predominantly in women, and it is the second leading
cause of death among women around the world [1]. So, it is of main concern to detect
it as early as possible, because early detection and diagnosis increase the rate of
survival. Mammography has been considered a reliable method for early detection of
breast cancer; it uses low-energy X-rays to examine the human breast for diagnosis
and screening. Interpretation of mammograms requires sophisticated image processing
methods that enhance visual interpretation, and its efficacy depends on the
radiologist's experience and knowledge. Radiologists should be able to detect the
subtle signs of cancer, which is very challenging as mammogram images have low
contrast. Hence, breast images must be
M. Ravikumar · P. G. Rachana (B) · B. J. Shivaprasad
Department of Computer Science, Kuvempu University, Jnanasahyadri Shimoga, Karnataka, India
e-mail: pgrachana@gmail.com
M. Ravikumar
e-mail: ravi2142@yahoo.co.in
B. J. Shivaprasad
e-mail: shivaprasad1607@gmail.com
D. S. Guru
Department of Computer Science, University of Mysore, Manasagangothri, Mysore, Karnataka,
e-mail: dsguruji@yahoo.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
enhanced in order to improve the contrast and make abnormalities more visible and
easier to identify. Enhancement can be done in the spatial domain or the frequency
domain. Spatial domain techniques directly manipulate pixels, whereas frequency
domain techniques operate on the transform coefficients of the image. There are
various enhancement techniques in the spatial domain, such as image inversion (IN),
thresholding, contrast stretching, log transformation (LT), power law transform
(PLT), gray-level slicing (GLS), histogram equalization (HE), adaptive histogram
equalization (AHE), contrast limited adaptive histogram equalization (CLAHE),
and conventional filters. In the frequency domain, the enhancements that can be used
include smoothing filters such as the Butterworth low-pass filter and the Gaussian
low-pass filter, sharpening filters such as the Butterworth high-pass filter and the
Gaussian high-pass filter, as well as the bilateral filter and notch filters.
Early breast cancer is subdivided into two major categories: microcalcifications
and circumscribed lesions. Microcalcifications are small deposits of calcium that
appear as high-intensity bright spots in mammograms, while circumscribed lesions
are tumors formed when healthy DNA is damaged, resulting in unchecked growth of
mutated cells.
The rest of the paper is organized as follows: related work is discussed in Sect. 2,
Sect. 3 details the methodology and different enhancement techniques with their
output images, Sect. 4 discusses the results, and Sect. 5 concludes the paper.
2 Literature Review
Logarithmic transformation [2] is a basic nonlinear gray-level transformation
function which maps a limited range of low gray values to a broader range of output
levels, thus enhancing the contrast and brightness of the image. Histogram
equalization [3–7] is a technique used to improve contrast in images by effectively
spreading out the most frequent intensity values, i.e., stretching out the intensity
range of an image. Contrast limited adaptive histogram equalization [4, 6, 7] is used
widely as it enhances the contrast of medical images. Here, the histogram is clipped
at some threshold before equalization is applied, and CLAHE is applied on small
data portions called tiles instead of on the entire image. Power law transformation,
also called gamma correction, is an enhancement technique where different levels of
enhancement can be obtained for different values of gamma.
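As a quick illustration of the two point transforms above, a minimal NumPy sketch (the synthetic image and the gamma value are illustrative, not taken from the paper's experiments):

```python
import numpy as np

def gamma_correction(img, gamma):
    """Power-law (gamma) transform: s = 255 * (r / 255) ** gamma.
    gamma < 1 brightens dark regions; gamma > 1 darkens the image."""
    out = 255.0 * (img.astype(np.float64) / 255.0) ** gamma
    return np.clip(out, 0, 255).astype(np.uint8)

def log_transform(img):
    """s = c * log(1 + r), with c chosen so that 255 maps to 255.
    Maps a narrow range of low gray values to a broader output range."""
    c = 255.0 / np.log1p(255.0)
    return np.clip(c * np.log1p(img.astype(np.float64)), 0, 255).astype(np.uint8)

# A dark synthetic ramp standing in for a low-contrast mammogram region.
img = np.tile(np.arange(64, dtype=np.uint8), (64, 1))
brightened = gamma_correction(img, 0.4)  # stretches the dark tones upward
logged = log_transform(img)
```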
Median filtering [8] is a nonlinear digital filtering technique, which is applied
mainly to remove noise from an image. It is beneficial in conditions where preservation of edges is important while removing noise. Bilateral filtering [9] is a nonlinear,
edge preserving, and noise reducing smoothing filter for images. It replaces the intensity of every pixel with a weighted average of intensity values from nearby pixels.
It can also be used to reduce blocking artifacts. Bilateral filtering avoids the
introduction of blur between objects while still removing noise in uniform areas. The
Butterworth filter [10] is a type of signal processing filter designed to have as flat a
frequency response as possible (no ripples) in the pass-band and zero roll-off in the
stop-band.
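The edge-preserving behaviour of the bilateral filter comes from weighting each neighbour by both its spatial distance and its intensity difference. A brute-force NumPy sketch (the parameter values are illustrative only):

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Brute-force bilateral filter for a 2-D grayscale image."""
    img = img.astype(np.float64)
    h, w = img.shape
    pad = np.pad(img, radius, mode='edge')
    # Spatial Gaussian kernel: depends only on the pixel offsets.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    spatial = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma_s ** 2))
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range kernel: penalizes intensity differences, so pixels
            # across an edge get near-zero weight and edges survive.
            rng = np.exp(-(patch - img[i, j]) ** 2 / (2 * sigma_r ** 2))
            weights = spatial * rng
            out[i, j] = (weights * patch).sum() / weights.sum()
    return out
```

On a uniform region the range kernel is constant and the filter behaves like a plain Gaussian smoother; near an edge the range kernel suppresses cross-edge neighbours, which is exactly the noise-in-flat-areas, sharp-edges behaviour described above.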
The Gaussian filter [11] is considered an isotropic filter with specific mathematical
properties. It is very common and is used within different applications involving
image processing. It has the unique property of no overshoot to a step function input
while minimizing the rise and fall time. The Gaussian filter is a linear filter, usually
used to blur an image or to reduce noise. It attenuates the image's high-frequency
components; thus it is a low-pass filter.
It is observed that variants of histogram equalization such as AHE and CLAHE
are widely used for spatial domain image enhancement, as they increase the global
contrast without loss of any information. In the frequency domain, it is observed
that the Gaussian filter and variants of the bilateral filter are used more often, as
they remove noise effectively while preserving edges [12–32].
3 Proposed Methodology
In this section, we discuss the proposed method for enhancement of breast images
and the block diagram of the proposed method is given in Fig. 1.
In image processing, enhancement plays a vital role. Its aim is to enhance the
input image, which is further helpful in segmentation and classification. For the
purpose of experimentation, mammogram images are obtained from the MIAS
database; these X-ray images are given as input for enhancement. The input images
might not always be of a standard size, so normalization is needed to make every
sample the same size, which helps in detailed visualization. After resizing,
enhancement techniques in the spatial domain and in the frequency domain are
applied to the input image. These techniques remove noise and further enhance the
image's quality.
Firstly, the resized image is given as input to spatial domain enhancement techniques such as log transform, histogram equalization, adaptive histogram equalization, and CLAHE. Among these, CLAHE gives a better enhancement based on
Fig. 1 Block diagram of the proposed method
quantitative measures. It enhances the image's contrast globally and without any loss
of data. It also reduces the problem of noise amplification found in AHE.
The output of CLAHE is then piped as input to the frequency domain enhancement
technique of bilateral filtering, which performs better than other frequency domain
techniques such as Gaussian filtering, median filtering, and Laplacian filtering. This
technique not only smoothens the image but also preserves the edges, thus enhancing
the image's quality. The quantitative measures used as parameters for image quality
enhancement are PSNR, entropy, SSIM, Michelson contrast, and AMBE. In the next
section, different spatial and frequency domain techniques are discussed and the
results are analyzed.
4 Result and Discussion
The superiority of the proposed method is discussed in this section; here, spatial
domain techniques and frequency domain techniques are discussed in detail with
their results. Finally, the results of the proposed method are given.
To measure the performance of the various enhancement techniques, the images are
quantitatively evaluated using PSNR, AMBE, entropy, Michelson contrast, and SSIM.
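The quantitative measures named above can be computed directly. A sketch of plausible NumPy implementations (the formulas follow the standard definitions; the paper does not give its exact implementations):

```python
import numpy as np

def psnr(original, enhanced):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((original.astype(np.float64) - enhanced.astype(np.float64)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def ambe(original, enhanced):
    """Absolute Mean Brightness Error: |mean(input) - mean(output)|."""
    return abs(float(original.mean()) - float(enhanced.mean()))

def entropy(img):
    """Shannon entropy (bits) of the gray-level histogram."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    p = hist[hist > 0] / img.size
    return float(-(p * np.log2(p)).sum())

def michelson_contrast(img):
    """(Imax - Imin) / (Imax + Imin)."""
    lo, hi = float(img.min()), float(img.max())
    return (hi - lo) / (hi + lo) if hi + lo > 0 else 0.0
```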
4.1 Enhancement Techniques for Spatial Domain
Spatial domain enhancement techniques operate directly on the pixels of an image.
There are different spatial domain techniques available such as log transform,
histogram equalization variants such as AHE and CLAHE, piecewise transform.
The mammogram image is given as input to the above-mentioned enhancement
techniques, and the quality of the output image is measured on quantitative
parameters. Among these, CLAHE gives the best output. The results of gamma
correction, log transform, and the different spatial domain approaches are given in
Figs. 2 and 3, respectively.
The results, tabulated and plotted for the different quantitative measures, are
given in Table 1a and Fig. 4 for gamma correction, Table 1b and Fig. 5 for log
transform, and Table 2a and Fig. 6 for the spatial domain approaches.
In the same way, enhancement can be performed in the frequency domain, as
described in the next section.
Fig. 2 Different values of a gamma to enhance Mammogram image using Gamma correction,
b c to enhance mammogram image using log transform
Fig. 3 Results obtained from various spatial domain approaches. a Original image, b inversed
image, c log transformed image, d gamma = 0.4, e gamma = 1.6, f gamma = 2.2, g HE, h CLAHE
4.2 Enhancement Techniques for Frequency Domain
Frequency domain enhancement techniques operate on the transform coefficients of
an image. There are different techniques available in the frequency domain such as
the median filter, Gaussian filter, Laplacian, and bilateral filter (Fig. 7). Among
these, bilateral filtering gives the best output.
The results, tabulated for the different quantitative measures, are given in Table 2b,
and the values are plotted in the graphs given in Fig. 8.
We take the best result obtained in the spatial domain, i.e., the CLAHE method,
and the best from the frequency domain, i.e., the bilateral filter method. Finally, we
give the CLAHE output as input to the bilateral filter method, and the results are
observed, as described in the next section.
4.3 Combination of CLAHE and Bilateral Filter
In the proposed methodology, the output obtained from CLAHE is given as input to
bilateral filtering technique, and the output is measured quantitatively and the results
are given below.
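As a sketch of the first stage of this pipeline, here is a global clip-limited histogram equalization — a simplified, single-tile stand-in for CLAHE, which applies the same clipping per tile with interpolation between tiles (the clip limit is an illustrative value, not one used in the paper):

```python
import numpy as np

def clip_limited_equalize(img, clip_limit=0.02, nbins=256):
    """Global clip-limited histogram equalization: a simplified, single-tile
    stand-in for CLAHE (which applies this per tile with interpolation)."""
    hist, _ = np.histogram(img.ravel(), bins=nbins, range=(0, nbins))
    hist = hist.astype(np.float64) / img.size
    # Clip the histogram at the limit and redistribute the excess uniformly;
    # this is what bounds noise amplification relative to plain HE.
    excess = np.clip(hist - clip_limit, 0, None).sum()
    hist = np.minimum(hist, clip_limit) + excess / nbins
    cdf = np.cumsum(hist)
    lut = np.round(255 * (cdf - cdf[0]) / (cdf[-1] - cdf[0])).astype(np.uint8)
    return lut[img]

# A narrow-range (low-contrast) synthetic image for illustration.
low = np.tile(np.arange(100, 111, dtype=np.uint8), (16, 1))
stretched = clip_limited_equalize(low)
```

In the proposed pipeline, this contrast-stretched output would then be passed through an edge-preserving bilateral filter before the quality measures are computed.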
Table 1 Comparison of different quantitative values of a Gamma, b log transform
Bold indicates the relatively best value among different values
Fig. 4 Different values of gamma a represent the results for entropy, MC and SSIM, b represent
the results for AMBE and PSNR
Fig. 5 Different values of c a represent the results for entropy, MC, and SSIM, b represents the
results for PSNR and AMBE
Table 2 Comparison of different quantitative values of a different spatial domain
methods (negative slicing, log transform (LT), etc.), b different frequency domain
methods (median blur, Gaussian blur, low-pass filter, bilateral filter)
Bold indicates the relatively best value among different values
A comparative analysis of the proposed method with CLAHE and the bilateral filter
alone is given in Table 3. The table lists the quantitative measures and their values
for the different enhancement techniques. The performance of the various
enhancement techniques is represented graphically in Fig. 9.
Fig. 6 Different values of spatial methods a represent the results for entropy, MC, and SSIM,
b represents the results for AMBE and PSNR
Fig. 7 Results obtained from various frequency domain approaches. a Low-pass filter, b bilateral
filtering, c Gaussian LPF, d Laplacian, e median
Fig. 8 Different values of frequency methods a represent the results for entropy, MC, PSNR, and
SSIM, b represents the results for PSNR and AMBE
Table 3 Comparative analysis
Bilateral filter
Bold indicates the relatively best value among different values
Fig. 9 Value of the proposed method a represents the results for entropy, MC, SSIM and AMBE,
b represent the results for PSNR
References
1. Fear, E.C., Meaney, P.M., Stuchly, M.A.: Microwaves for breast cancer detection. IEEE
Potentials 22, 12–18 (2003)
2. Maini, R., Aggarwal, H.: A comprehensive review of image enhancement technique. J. Comput.
2(3) (2010). ISSN 2151-9617
3. Saini, V., Gulati, T.: A comparative study on image enhancement using image fusion. Int. J.
Adv. Res. Comput. Sci. Softw. Eng. 2(10) (2012)
4. Pisano, E.D., Zong, S., Hemminger, B.M., DeLuca, M., Eugene Johnston, R., Muller, K.,
Patricia Braeuning, M., Pizer, S.M.: Contrast limited adaptive histogram equalization image
processing to improve the detection of simulated spiculations in dense mammograms. J. Digital
Imaging 11(4), 193–200 (1998)
5. Grundland, M., Dodgson, N.A.: Automatic contrast enhancement by histogram warping.
Computer Laboratory, University of Cambridge Cambridge, UK
6. Lu, L., Zhou, Y., Panetta, K., Agaian, S.: Comparative study of histogram equalization
algorithms for image enhancement. In: SPIE Defense, Security, and Sensing. International
Society for Optics and Photonics (2010)
7. Sivaramakrishna, R., Obuchowski, N.A., Chilcote, W.A., Cardenosa, G., Powell, K.A.:
Comparing the performance of mammographic enhancement algorithms: a preference study.
Am. J. Roentgenol. 175(1), 45–51 (2000)
8. Reza, A.M.: Realization of the contrast limited adaptive histogram equalization (CLAHE) for
real-time image enhancement. J VLSI Signal Process 38, 35–44 (2004)
9. Ko, S.-J.: Center weighted median filters and their applications to image enhancement. IEEE
Trans. Circ. Syst. 0098–4094, 984–993 (1991).
10. Gairola, A.C., Shah, O.H.: Design and implementation of low pass butterworth filter. IJCRT
6(2) (2018). 2320-2882
11. Seddik, H., Braiek, E.B.: Efficient noise removing based optimized smart dynamic gaussian
filter. Int. J. Comput. Appl. (0975–8887) 51(5) (2012)
12. Pisano, E.D., Cole, E.B., Hemminger, B.M., Yaffe, M.J., Aylward, S.R., Maidment, R., Johnston, E., et al.: Image processing algorithms for digital mammography: a pictorial essay 1.
Radiographics 20(5), 1479–1491 (2000)
13. Pratt, W.K.: Digital Image Processing. Prentice Hall (1989)
14. Lee, J.S.: Digital image enhancement and noise filtering by use of local statistics. IEEE Trans.
Pattern Anal. Mach. Intell. PAMI-2, 165–168
15. Johnson, R.H., Nelson, A.C., Haralick, R.M., Goodsitt, M.M.: Optimal information retrieval
from complex low frequency backgrounds in medical images. In: 11th Annual Conference on
IEEE Engineering in Medicine and Biology Society, pp. 384, 385. IEEE (1989)
16. Raji, A., Thaibaoui, A., Petit, E., et al.: A gray-level transformation-based method for image
enhancement. Pattern Recogn. Lett. 19(13), 1207–1212 (1998)
17. Morrow, W.M., Paranjape, R.B.: Region-based contrast enhancement of mammograms. IEEE
Trans. Med. Imaging 11(3) (1992)
18. Gonzalez, R.C., Woods, R.E., Eddins, S.L.: Digital Image Processing Using MATLAB,
pp. 167–170, 478. Pearson Education (2007)
19. Xiao, F., Zhou, M., Geng, G.: Detail enhancement and noise reduction with color image
detection based on wavelet multi-scale, pp. 1061–1064 (2011)
20. Jaya, V.L., Gopikakumari, R.: IEM: a new image enhancement metric for contrast and sharpness
measurements. Int. J. Comput. Appl. 79(9) (2013)
21. Panetta, K., Samani, A., Agaian, S.: Choosing the optimal spatial domain measure of
enhancement for mammogram images. Int. J. Biomed. Imaging 2014(Article ID 937849),
8 p (2014)
22. Ye, Z., Mohamadian, H., Pang, S.-S., Iyengar, S.: Image contrast enhancement and quantitative measuring of information flow. In: Proceedings of the 6th International Conference on
Information Security and Privacy (WSEAS’07), Tenerife, Spain, Dec 2007
23. Saleem, A., Beghdadi, A., Boashash, B.: Image fusion-based contrast enhancement. EURASIP
24. Ranota, H.K., Kaur, P.: Review and analysis of image enhancement techniques. Int. J. Inf.
Comput. Technol. 4(6) 583–590 (2014). ISSN 0974-2239 (International Research Publications
25. Mahajan, S., Dogra, R.: A review on image enhancement techniques. Int. J. Eng. Innov. Technol.
(IJEIT) 4(11) (2015)
26. Narnaware, S.K., Khedgaonkar, R.: A review on image enhancement using artificial neural
network and fuzzy logic. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 6(1) (2015)
27. Bedi, S.S., Khandelwal, R.: Various image enhancement techniques- a critical review. Int. J.
Adv. Res. Comput. Commun. Eng. 2(3) (2013)
28. Arun, R., Nair, M.S., Vrinthavani, R., Tatavarti, R.: An alpha rooting based hybrid technique
for image enhancement. Online Publication in IAENG, 24 August 2011
29. Chaofu, Z., Li-ni, M., Lu-na, J.: Mixed frequency domain and spatial of enhancement algorithm
for infrared image. In: 2012 9th International Conference on Fuzzy Systems and Knowledge
Discovery (FSKD 2012)
30. Wang, C., Ye, Z.: Brightness preserving histogram equalization with maximum entropy: a
variational perspective. IEEE Trans. Consumer Electron. 51(4), 1326–1334 (2005)
31. Abdullah-Al-Wadud, M., Kabir, Md.H., Ali Akber Dewan, M., Chae, O.: A dynamic histogram
equalization for image contrast enhancement. IEEE Trans. Consumer Electron. 53(2), 593–600
32. Sundaram, M., Ramar, K., Arumugam, N., Prabin, G.: Histogram based contrast enhancement
for mammogram images. In: 2011 International Conference on Signal Processing,
Communication, Computing and Networking Technologies (ICSCCN), pp. 842–846. IEEE
33. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proceedings of the
1998 IEEE International Conference on Computer Vision, Bombay, India
Supervised Cross-Database Transfer
Learning-Based Facial Expression Recognition
Arpita Gupta and Ramadoss Balakrishnan
1 Introduction
Facial expressions convey messages between humans, playing a major role in
communication. Facial expression recognition (FER) is one of the most active fields
in computer vision, with applications in cognitive psychology, human–computer
interaction, computational neuroscience, and health care [1]. In recent years, deep
learning research has achieved useful results in the automation of FER. The main aim
of FER is to train a model on a facial expression dataset so that the trained classifier
can predict expressions precisely [2]. FER has six primary categories (sadness,
happiness, fear, anger, surprise, and disgust), recognized by Ekman and Friesen [3]
as universal in humans, plus the neutral category.
Most studies have considered feature representations of facial expressions,
experimenting on laboratory-collected datasets. The datasets collected in laboratories
contain posed expressions, aiming at a consistency not available in real life. There
are some datasets collected in wild settings (SFEW [4], FER2013 [5, 6]). This
variation in the consistency of the datasets can be overcome by using deep networks
with pretraining. In such models, we pretrain with a source dataset of the same or a
different domain and then train on the target dataset.
Many researchers are working on and studying different methods in the field of
FER. Most of the methods in FER use the classical setting of training the models
once. Moreover, the training and testing of the models are performed on the same
dataset [2]. One of the main issues with deep learning is the need to be trained on
A. Gupta (B) · R. Balakrishnan
Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India
e-mail: arpitagupta2993@gmail.com
R. Balakrishnan
e-mail: brama@nitt.edu
the labeled dataset, and collecting datasets is very costly [6]. There are studies using
convolutional neural networks, but not many employ deep residual networks with
transfer learning [7]. This all leads to research on models trained cross-database
using transfer learning, addressing the need for a large labeled dataset. In this paper,
we have experimented on deep residual networks of 50 and 152 layers, which are
trained using double fine tuning. The rest of the paper is organized as follows:
Sect. 2 reviews the studies related to transfer learning; Sect. 3 describes the proposed
work; Sect. 4 explains the datasets used in the experimentation; and Sect. 5 explains
the results achieved, followed by the conclusion in Sect. 6.
2 Related Work
Many studies have been proposed in the field of FER. Most have used the same
training and testing dataset, while very few use cross-database settings or double
fine tuning. These methods have shown high accuracy, but under conditions not
applicable to tough real-life scenarios. In this section, we discuss some of the studies
employing cross-database settings or transfer learning. Among the techniques for
FER, one is appearance-based features, using local- or filter-based features. Another
is based on the distances between parts of the face to detect the expressions. There
are also hybrids in which both feature-based and distance-based techniques are
combined for FER.
There is not much existing work that has used transfer learning with multiple
fine tuning and cross-database settings. One of the existing models has used cross
datasets in FER applications based on subspace transfer learning methods [8].
Another study has used convolution and max-pooling layers for FER [9]. For feature
transfer, Gabor features and distance features were used for the FER application [10].
For unsupervised cross-dataset settings, [2] has proposed the super-wide regression
network to bridge the feature space. One more study used a GAN for generating
images for unsupervised domain adaptation, another subdomain of transfer learning
for FER [1]. Deep learning has many methods, and some of those used in FER are
CNN [8, 11], Inception [12], AlexNet [13], VGG-CNN [14], and many more.
3 Proposed Work
Our proposed work is an experimentation with deep residual networks. We have
pretrained our model on two different datasets. The deep residual network (ResNet)
won first prize in the image classification task at ILSVRC 2015 and was proposed
on the ImageNet dataset. It consists of residual connections that transfer knowledge
across the layers of the network, acting as skip connections propagating the gradient
through the layers, as shown in Fig. 1.
Fig. 1 Residual block
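The residual block of Fig. 1 can be sketched in a few lines. Here is a toy fully-connected version in NumPy (the shapes and weights are illustrative; ResNet actually uses convolutional blocks):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy fully-connected residual block: y = ReLU(F(x) + x).
    The identity shortcut lets the gradient bypass F during backprop."""
    fx = relu(x @ w1) @ w2  # residual branch F(x)
    return relu(fx + x)     # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = 0.1 * rng.standard_normal((8, 8))
w2 = 0.1 * rng.standard_normal((8, 8))
y = residual_block(x, w1, w2)
# With an all-zero branch the block reduces to ReLU(x): the identity
# mapping is trivially representable, which is what eases deep training.
```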
We have experimented on two ResNets of 50 and 152 layers. The networks are
first pretrained on the larger annotated datasets (VGG and ImageNet) and then on
the target dataset. The source dataset is the dataset on which the pretraining is done,
and the target dataset is the dataset on which we want the task, in this case FER, to
be performed. The target labels are the seven emotions (sadness, happiness, fear,
anger, surprise, disgust, and neutral).
In Fig. 2, the flow of the deep residual network is shown. Both networks are
trained in the settings shown in Fig. 2, which depicts the network architecture of the
ResNets of 50 and 152 layers pretrained on ImageNet. The model has a learning rate
of 0.1, is compiled with the SGD optimizer, and uses the categorical cross-entropy
loss. Some of the layers are frozen to fine-tune the network. The models employ
batch normalization to make training faster.
The networks pretrained only on the ImageNet dataset performed poorly, not giving
any significant improvement, whereas the model pretrained twice has shown
competitive results. Fine tuning can be done in two ways: by freezing the layers or
by adjusting the parameters to fit the observations. This study shows how the amount
of training data matters in deep learning. The networks are first trained on ImageNet,
then fine-tuned on the VGG dataset, and finally trained and fine-tuned on the target
dataset. Both ResNet-50 and ResNet-152 have shown significant results. The final
output layer is designed to give the seven basic emotion labels as output, and the
networks are tested at 30 epochs.
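Freezing layers during fine tuning simply means skipping their parameter updates. A toy sketch (the layer names, parameter values, and gradients are hypothetical, chosen only to show the mechanism):

```python
import numpy as np

def sgd_step(params, grads, frozen, lr=0.1):
    """One SGD update that skips frozen (pretrained) layers,
    mimicking fine tuning by freezing."""
    for name in params:
        if name in frozen:
            continue  # frozen layer keeps its pretrained weights
        params[name] = params[name] - lr * grads[name]
    return params

params = {"backbone": np.array([1.0, 2.0]),   # pretrained, frozen
          "head":     np.array([0.5, 0.5])}   # new task head, trainable
grads  = {"backbone": np.array([0.3, 0.3]),
          "head":     np.array([0.1, -0.2])}
params = sgd_step(params, grads, frozen={"backbone"})
```

After the step, only the head has moved; the pretrained backbone weights are untouched, which is the behaviour the frozen layers in the proposed networks rely on.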
4 Datasets
In this study, we have used three datasets: two for supervised pretraining (ImageNet
and VGG [15]) and the target dataset FER2013 [5, 16]. In this section, we have
discussed the characteristics of the dataset that plays a massive role in the performance
of the models.
Fig. 2 Flow of the network
4.1 FER2013 Dataset
Introduced at ICML 2013, FER2013 is a labelled dataset with the six basic emotions
and neutral. It contains 35,887 images collected in the wild. The dataset is divided
into three parts: training data (28,709), public test data (3589), and final test data
(3589).
4.2 ImageNet and VGG Dataset
The ImageNet [17] dataset is one of the pretraining datasets for our model; it contains
14 million labelled images across 20,000 categories. The VGG face dataset is our
other pretraining dataset; it contains labelled images of 2622 identities and has
proven to be an enormous dataset for pretraining. These studies have shown that if a
model is trained on a larger labelled dataset, it can perform better.
We have performed double fine tuning and double pretraining on the network,
which has not been done with residual network models until now, using only face
data collected in the wild. Fine tuning is the technique in which a pretrained model
is made to perform another, similar task. In this paper, the model that classifies
images in the pretrained setting is fine-tuned to perform emotion classification of
images. This setting has proved to give a great outcome.
5 Results and Discussions
The results achieved by experimenting on ResNets of 50 and 152 layers are shown in Table 1. We used multiple rounds of pretraining and fine-tuning of the network on cross-datasets. Our experiments confirm that if the network is trained with enough labelled data, the models can perform better. We chose residual networks because they solve the vanishing-gradient problem using skip connections. We also tried single pretraining on ImageNet alone, which performed poorly with a low accuracy of 29%, showing the effect of the amount of labelled training data.
The ResNet-50 pretrained on the ImageNet and VGG datasets achieved an accuracy of 54.17%, and the deeper 152-layer model achieved 55%, outperforming the existing models. The models were trained for 30 epochs, beyond which there was no significant change in performance.
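The skip connections mentioned above can be illustrated with a minimal sketch (NumPy only, with illustrative near-zero weights, not the paper's actual network): a residual block computes relu(F(x) + x), so even when the learned transformation F contributes almost nothing, the block still passes its input through, which is what keeps gradients flowing in very deep stacks.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A minimal residual block: output = relu(F(x) + x), with an identity skip."""
    h = relu(x @ w1)      # first transformation
    fx = h @ w2           # second transformation, F(x)
    return relu(fx + x)   # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w1 = rng.standard_normal((4, 4)) * 0.001   # near-zero weights: F(x) ~ 0
w2 = rng.standard_normal((4, 4)) * 0.001

y = residual_block(x, w1, w2)
# With F(x) ~ 0, the block approximately passes x through unchanged:
print(np.allclose(y, relu(x), atol=1e-2))  # True
```

Without the `+ x` term, a stack of such blocks with tiny weights would shrink every signal (and gradient) toward zero; the skip connection guarantees an identity path through the network.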
Figure 3 shows the performance evaluation of the model. The accuracy achieved is due to better learning, as the model is pretrained twice for emotion classification. The graph shows that our ResNet-based models with double pretraining are superior to all the existing models on the FER dataset when pretrained on the VGG face dataset.
Table 1 Performance evaluation details

| Method | Source dataset | Target dataset | Accuracy |
| AlexNet [12] | | | |
| VGG-CNN [12] | | | |
| CNN [6] | | | |
| Mollahosseini [11] | 6 datasets | | |
| CNN + MMD [6] | | | |
| DETN [6] | | | |
| ResNet-50 + double fine-tuning (proposed) | ImageNet and VGG | FER2013 | 54.17% |
| ResNet-152 + double fine-tuning (proposed) | ImageNet and VGG | FER2013 | 55% |
A. Gupta and R. Balakrishnan
Fig. 3 Performance evaluation graph
6 Conclusion
We have studied the effect of pretraining in deep residual networks, which has outperformed the existing frameworks. The main aim of utilizing pretraining is to make the model learn and distinguish better by learning features from larger, better-quality labelled pretraining datasets. The reason for using deep residual networks is that they solve the issue of vanishing gradients by means of skip connections, leading to better knowledge transfer. The network was double fine-tuned for better classification in a cross-database setting. We experimented on two deep residual networks, one of 50 layers and a deeper one of 152 layers. Our models achieved accuracies of 54.17% and 55%, higher than the existing models, showing the effect of multiple pretraining and fine-tuning.
References
1. Wang, X., Wang, X., Ni, Y.: Unsupervised domain adaptation for facial expression recognition using generative adversarial networks. Comput. Intell. Neurosci. (2018)
2. Liu, N., Zhang, B., Zong, Y., Liu, L., Chen, J., Zhao, G., Zhu, L.: Super wide regression network for unsupervised cross-database facial expression recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1897–1901. IEEE (2018)
3. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc.
Psychol. 17(2), 124 (1971)
4. Dhall, A., Goecke, R., Lucey, S., Gedeon, T.: Static facial expression analysis in tough conditions: data, evaluation protocol and benchmark. In: 2011 IEEE International Conference on
Computer Vision Workshops (ICCV Workshops), pp. 2106–2112. IEEE (2011)
5. Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski,
W., et al.: Challenges in representation learning: a report on three machine learning contests.
In International Conference on Neural Information Processing, pp. 117–124. Springer, Berlin,
Heidelberg (2013)
6. Li, S., Deng, W.: Deep emotion transfer network for cross-database facial expression recognition. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3092–3099.
IEEE (2018)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
8. Yan, H., Ang, M.H., Poo, A.N.: Cross-dataset facial expression recognition. In: 2011 IEEE
International Conference on Robotics and Automation, pp. 5985–5990. IEEE (2011)
9. Liu, M., Li, S., Shan, S., Chen, X.: Au-aware deep networks for facial expression recognition.
In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture
Recognition (FG), pp. 1–6. IEEE (2013)
10. Xu, M., Cheng, W., Zhao, Q., Ma, L., Xu, F.: Facial expression recognition based on transfer
learning from deep convolutional networks. In: 2015 11th International Conference on Natural
Computation (ICNC), pp. 702–708. IEEE (2015)
11. Devries, T., Biswaranjan, K., Taylor, G.W.: Multi-task learning of facial landmarks and expression. In: 2014 Canadian Conference on Computer and Robot Vision, pp. 98–103. IEEE (2014)
12. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition
using deep neural networks. In 2016 IEEE Winter Conference on Applications of Computer
Vision (WACV), pp. 1–10. IEEE (2016)
13. Ng, H.-W., Nguyen, V.D., Vonikakis, V., Winkler, S.: Deep learning for emotion recognition
on small datasets using transfer learning. In: Proceedings of the 2015 ACM on International
Conference on Multimodal Interaction, pp. 443–449 (2015)
14. Ouellet, S.: Real-time emotion recognition for gaming using deep convolutional network
features. arXiv:1408.3750 (2014)
15. Parkhi, O.M., et al.: Deep face recognition. In: British Machine Vision Conference (BMVC), pp. 1–12 (2015)
16. Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski, W.,
Tang, Y., Thaler, D., Lee, D.H., Zhou, Y.: Challenges in representation learning: a report on three
machine learning contests. In: International Conference on Neural Information Processing.
Springer, Berlin, Heidelberg, pp. 117–124 (2013)
17. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical
image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–
255 (2009)
Innovative Approach for Prediction
of Cancer Disease by Improving
Conventional Machine Learning
Hrithik Sanyal, Priyanka Saxena, and Rajneesh Agrawal
1 Introduction
Cancer is a deadly disease that is mostly curable when diagnosed at the right time (at an early stage). Prediction of cancer has always been based on the symptoms seen in patients, which are too many to process by manual systems; such processing is not only very tedious but time-consuming too. Growth in technology has made it fast as well as accurate, but the large number of symptoms still poses big challenges for researchers. ML has proven to be a boon for the prediction of diseases, particularly in cancer detection [1].
Since ML has many different algorithms with different accuracy values, researchers are continuously working to improve accuracy further. The data related to biomedical science is increasing day by day and needs to be included in the prediction process; hence, ML algorithms are being modified to provide high accuracy and efficiency with this increased volume of data. Machine learning not only processes the data for prediction but also helps in preprocessing data that may include noise and errors, i.e., it helps in cleaning the data before
H. Sanyal (B)
Department of Electronics & Telecommunications, Bharati Vidyapeeth College of Engineering,
Pune, India
e-mail: hrithiksanyal14@gmail.com
P. Saxena
Department of Computer Science, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India
e-mail: anusaxena1218@gmail.com
R. Agrawal
Comp-Tel Consultancy, Mentor, Jabalpur, India
e-mail: rajneeshag@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
Among the various diseases, cancer, particularly breast cancer, is very deadly and prone to recurrence, causing deaths among women at a rate as critical as lung cancer [2]. The ML approach requires various classifiers, such as decision tree, Naïve Bayes, support vector machine and logistic regression, to classify the data in various ways before applying prediction. It is performed in two steps, i.e., training and testing.
In this paper, a machine learning algorithm is modified to enhance accuracy and predict breast cancer efficiently using the Wisconsin dataset.
Section 1 introduces the usage of machine learning in cancer diagnosis, Sect. 2 details the techniques of machine learning, Sect. 3 discusses the literature survey, and Sect. 4 briefly describes the classifiers of machine learning. Section 5 discusses the Wisconsin dataset that will be used in the simulation, Sect. 6 presents the proposed methodology with a diagrammatic representation, and finally, Sect. 7 discusses the expected outcome of the proposed system as a conclusion.
2 Machine Learning Techniques
Machine learning is considered a central part of artificial intelligence. It is a framework that takes in information, discovers patterns, trains itself using the information and yields a result.
First, machines can work much faster than people. A biopsy, for the most part, takes a pathologist 10 days; a computer can process a prodigious number of biopsies in surprisingly little time.
Machines can also accomplish things that people are not capable of. They can repeat a task a humongous number of times without getting exhausted, and after every cycle the machine repeats the procedure to improve. People do this as well; we call it practice. But while practice may make perfect, no amount of training can put a human anywhere near the computational speed of a computer. There is so much information in the world that people cannot, in any way, shape or form, go through it all. That is where machines help us: they can work faster than us, make precise calculations and discover patterns in the information. That is why they are called computers.
AI involves computers discovering how to perform tasks without being explicitly programmed to do so. For more advanced tasks, it can be tiring for a human to manually create the required algorithms. In practice, it can turn out to be more effective to let the machine develop its own algorithm rather than having human programmers specify every required step. In situations where huge numbers of potential answers exist, one methodology is to mark a portion of the correct answers as valid; this can then be used as training data for the computer to improve the algorithm(s) it uses to determine the right answers. For instance, to train a system for the task of handwritten digit recognition, the MNIST dataset has regularly been used (Fig. 1).
Innovative Approach for Prediction of Cancer Disease …
Fig. 1 Machine learning techniques (machine learning branches into supervised learning and unsupervised learning)
ML is a subfield of artificial intelligence (AI). Machine learning understands the structure of data and puts that data into new structural models that are understandable and useful to people. Machine learning uses two types of techniques, as follows:
2.1 Supervised Learning
Supervised learning is a kind of learning that trains a model on known input and output data, which helps in predicting future outputs accurately. By learning from known inputs and outputs, it builds and trains a model that can make predictions based on evidence in the presence of uncertainty. Supervised learning is mostly used for prediction when the output of the data is known. Supervised learning uses classification and regression techniques for building a predictive model.
The classification technique is used to predict discrete responses, e.g., whether a tumour is benign or malignant, or whether an email is genuine or spam. This technique categorizes the input data into different categories. It is most useful when we can tag, categorize or separate data into classes or groups.
Regression techniques are used to predict continuous responses.
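The distinction between classification and regression can be sketched with toy data in NumPy (the nearest-centroid rule and the feature values here are illustrative assumptions, not the paper's classifiers or dataset):

```python
import numpy as np

# Classification: predict a discrete response (0 = benign, 1 = malignant)
# from hypothetical tumour features [mean radius, mean texture].
X = np.array([[1.0, 1.2], [1.1, 0.9], [5.0, 4.8], [5.2, 5.1]])
y = np.array([0, 0, 1, 1])

# Nearest-centroid rule: predict the class whose training mean is closest.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def classify(sample):
    return int(np.argmin(np.linalg.norm(centroids - sample, axis=1)))

print(classify(np.array([4.9, 5.0])))   # 1 (malignant)

# Regression: predict a continuous response with a least-squares line fit.
t = np.array([0.0, 1.0, 2.0, 3.0])
v = np.array([1.0, 3.1, 4.9, 7.2])      # roughly v = 2t + 1
slope, intercept = np.polyfit(t, v, 1)
print(round(slope, 1))                  # 2.0
```

The classifier returns one of a fixed set of labels, while the regression returns a real number on a continuous scale; that is the whole distinction the text draws.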
2.2 Unsupervised Learning
Unsupervised learning is a class of ML that searches for hidden patterns or structures in data. It helps to make inferences from datasets consisting of responses that are not tagged or labelled. Unsupervised learning mostly uses the clustering technique.
Clustering is the most used unsupervised learning technique. It is used for finding hidden patterns or groupings in datasets and thus analyzing them.
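A minimal sketch of the clustering idea, assuming a toy two-blob dataset and plain NumPy (an illustrative k-means loop, not the authors' implementation):

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Minimal k-means: alternate point assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centre
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        # move each centre to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Two well-separated blobs of unlabelled points
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 5.0], [4.9, 5.2]])
labels, centers = kmeans(X, k=2)
print(labels)  # the first three points share one label, the last three the other
```

No labels are given to the algorithm; the grouping emerges purely from the hidden structure (the two blobs) in the data, which is exactly what the text means by finding hidden patterns.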
3 Literature Review
Breast cancer is considered to be the most deadly type of cancer amongst all cancers. Despite being treatable and healable if diagnosed at an early stage, a humongous number of people do not survive, since the diagnosis is often made at a very late stage, when it is too late. An effective way to classify data in medical fields, and also in other fields, is by using ML data mining and classifiers, which help important decisions to be made through the methodology of diagnosis.
The dataset which has been used is UCI's Wisconsin dataset for breast cancer. The ultimate objective is to pigeonhole data from both algorithms and show results in terms of precision. Our result concludes that the decision tree classifier gives higher precision than all the other classifiers [1].
Cancer is a dangerous kind of disease, driven by variation in cells inside the body. Variation in cells is accompanied by an exponential increase in malignant cell growth. Although dangerous, breast cancer is also a very frequent type of cancer. Among all diseases, cancer has undoubtedly been the most deadly. It occurs due to variation and mutation of infectious and malignant cells, which spread quickly and infect surrounding cells as well. To increase the survival rate of patients suffering from breast cancer, early detection of the disease is very much required. Machine learning techniques help in the accurate and probable diagnosis of cancer in patients. They make intelligent systems, which learn from historical data and keep learning from recent predictions to make decisions more accurate and precise [3].
Machine learning is a branch of artificial intelligence that employs an assortment of statistical, probabilistic and optimization strategies, allowing computers to "learn" from earlier examples and to distinguish hard-to-perceive patterns from large, noisy or complex datasets. This capability is particularly appropriate for clinical applications, especially those that depend on complex proteomic and genomic measurements. Hence, machine learning is frequently used in cancer examination and detection, and it is likewise helping to improve our basic comprehension of cancer development and progression [2].
Chinese women are seriously threatened by breast cancer, with high morbidity and mortality. The absence of effective prediction models makes it difficult for doctors to set up a fitting treatment strategy that may prolong a patient's survival time [4].
Data mining is a basic part of the knowledge discovery process, where intelligent methods are combined for pattern extraction. In the process of creating data mining applications, the most challenging and fascinating undertaking is disease prediction. This line of work is useful for clinical experts and examiners diagnosing disease precisely, depicting different data mining methods. Data mining applications in healthcare hold great potential and utility; however, the effectiveness of data mining procedures in the healthcare domain depends upon the availability of refined healthcare data. In the present examination, a few classifier methods used in clinical data investigation are discussed. Additionally, a few disease-prediction problems, such as breast cancer prediction, heart disease diagnosis, thyroid prediction and diabetes, are considered. The outcome shows that the decision tree algorithm suits disease prediction well, as it produces better precision results [5].
4 Machine Learning Classifiers
Breast cancer is the most prominent disease in the region of clinical diagnosis, and its incidence is increasing each year. A comparative investigation of widely used machine learning methods has been performed on the Wisconsin breast cancer database (WBCD) to anticipate breast cancer recurrence:
Multilayer perceptron (MLP),
Decision tree (C4.5),
Support vector machine (SVM),
K-nearest neighbour (K-NN),
Naive Bayes.
Various classifiers are available for the classification of large volumes of data through machine learning strategies. A list of some well-known classifiers used in ML is as follows:
1. Bayes Network Classifier
2. Logistic Regression
3. Decision Tree
   The Random Tree
   The C4.5 Tree (J48)
   The Decision Stump
   The Random Forest.
5 Dataset Description
For the proposed work, the UCI Wisconsin dataset for breast cancer has been used, as it is quite popular amongst various machine learning implementations. The dataset was primarily used for recognizing and differentiating the malignant samples from the benign samples.
Table 1 shows the characteristic counts for the different Wisconsin datasets.

Table 1 Description of the Wisconsin datasets

| Dataset | Attribute count | Instance count | Class count |
| Original data | | | |
| Diagnosed data | 32 | | |
| Prognosis data | | | |
6 Proposed Work
This work proposes a decision tree-based cancer patient data processing environment that will not only be faster but will also provide high accuracy. The system will leverage a multi-threaded design in which two different decision trees are created, each with a different set of attributes, and processed in parallel. The results obtained from both threads are then combined to get the final result (Fig. 2).
The proposed system will be executed in the following steps:
1. Identification and separation of attributes for making a decision tree
2. Generation of threads for implementation of multiple decision trees
3. Combining multiple decision trees for getting the final results (Fig. 3).
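The three steps above can be sketched in plain Python (the toy dataset, the one-level `best_stump` trees, and the `gini` helper are illustrative assumptions standing in for the full decision trees, not the authors' code):

```python
import threading
import numpy as np

def gini(groups):
    """Gini impurity of a split: weighted sum over the two branches."""
    n = sum(len(g) for g in groups)
    score = 0.0
    for g in groups:
        if len(g) == 0:
            continue
        p1 = np.mean(g)  # labels are 0/1, so the mean is P(class 1)
        score += (1 - (p1 ** 2 + (1 - p1) ** 2)) * len(g) / n
    return score

def best_stump(X, y, features):
    """One-level decision tree restricted to a subset of the attributes."""
    best = (None, None, 1.0)  # (feature, threshold, gini)
    for f in features:
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            g = gini((left, right))
            if g < best[2]:
                best = (f, t, g)
    return best

# Hypothetical 4-attribute data; the label depends only on attribute 0.
rng = np.random.default_rng(1)
X = rng.random((100, 4))
y = (X[:, 0] > 0.5).astype(int)

results = {}
def worker(name, features):
    results[name] = best_stump(X, y, features)

# Steps 1-2: separate the attributes and grow one tree per thread.
t1 = threading.Thread(target=worker, args=("dt1", [0, 1]))
t2 = threading.Thread(target=worker, args=("dt2", [2, 3]))
t1.start(); t2.start(); t1.join(); t2.join()

# Step 3: combine -- keep the purer (lower-Gini) result of the two trees.
winner = min(results.values(), key=lambda r: r[2])
print(winner[0], round(winner[2], 3))  # 0 0.0
```

Each thread only ever sees half of the attributes, which is what shrinks the per-tree work; the combine step here is the simplest possible one (pick the better tree), standing in for the accuracy combination of Fig. 2.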
This approach will have the following time complexities:
1. Separation complexity:
O(z1) = O(k/r) · r
where k is the count of attributes and r is the count of decision trees ("dt").
2. Processing complexity:
O(z2) = O(h · r²)
where h is the size of the training data and r is the count of "dt".
3. Combination complexity:
O(z3) = O(r)
where r is the count of "dt".
4. Overall complexity:
Fig. 2 System flow of the proposed work (attribute sets 1 and 2 → decision trees 1 and 2 → accuracies 1 and 2 → final accuracy)
O(mdt) = O(z1) + O(z2) + O(z3)
O(mdt) = O(k/r) · r + O(h · r²) + O(r)
Comparison of the simple decision tree and the modified decision tree:
O(sdt) = O(h · k²) [6]
O(mdt) = O(k/r) · r + O(h · r²) + O(r)
From the above two equations, it is clear that O(mdt) << O(sdt) when r > 1, as the complexity of a simple decision tree becomes too high when the number of attributes is too high.
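Plugging illustrative numbers into the two expressions makes the gap concrete (hypothetical values: k = 32 attributes, h = 569 instances, r = 2 trees, constant factors dropped):

```python
# Compare the two complexity expressions with representative constants:
# k = attribute count, h = training-set size, r = number of decision trees.
k, h, r = 32, 569, 2

sdt = h * k ** 2                       # simple decision tree: O(h * k^2)
mdt = (k // r) * r + h * r ** 2 + r    # modified: O((k/r) * r) + O(h * r^2) + O(r)

print(sdt)        # 582656
print(mdt)        # 2310
print(mdt < sdt)  # True
```

Because the dominant term changes from h·k² to h·r², the advantage grows with the number of attributes k, matching the claim that O(mdt) << O(sdt) for r > 1.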
7 Conclusion and Future Work
This paper proposes a better decision tree algorithm which will not only have high performance but will also have high accuracy. The paper delivers a meticulous review of ML, its techniques and its need in industry and in the enhancement of artificial intelligence. Further, studies of earlier research have been presented, which clearly explain that the focus of the research is on finding a better solution than the existing classifiers in different scenarios. A comparative complexity calculation of the simple decision tree algorithm and the proposed modified decision tree algorithm shows the enhanced time complexity, and implementation will show how it provides comparable accuracy.

Fig. 3 Flowchart of the complete proposed system (input dataset → evaluate attributes → divide the attributes → start one process for each attribute set → apply the Gini index in each → calculate each accuracy → combine → display final accuracy)

This work can be further enhanced by applying other mechanisms of separating the attributes for building multiple decision trees. It can also be further enhanced and tested on real-time data for both high performance and accuracy.
References
1. Sanyal, H., Agrawal, R.: Latest trends in machine learning & deep learning techniques and their applications. Int. Res. Anal. J. 14(1), 348–353 (2018)
2. Singh, S.N., Thakral, S.: Using data mining tools for breast cancer prediction and analysis. In:
2018 4th International Conference on Computing Communication and Automation (ICCCA),
Greater Noida, India, 2018, pp. 1–4. https://doi.org/10.1109/CCAA.2018.8777713
3. Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National Conference on Artificial Intelligence, vol. 1 (AAAI'06), pp. 500–505. AAAI Press (2006)
4. Gupta, M., Gupta, B.: A comparative study of breast cancer diagnosis using supervised machine
learning techniques. In: 2018 Second International Conference on Computing Methodologies
and Communication (ICCMC), Erode, 2018, pp. 997–1002. https://doi.org/10.1109/ICCMC.
5. Fu, B., Liu, P., Lin, J., Deng, L., Hu, K., Zheng, H.: Predicting invasive disease-free survival
for early stage breast cancer patients using follow-up clinical data. IEEE Trans. Biomed. Eng.
66(7), 2053–2064 (2019). https://doi.org/10.1109/TBME.2018.2882867
6. Angra, S., Ahuja, S.: Machine learning and its application. In: International Conference on Big
Data Analytics and Computational Intelligence (ICBDAC) (2017)
7. Deepika, M., Kalaiselvi, K.: An empirical study on disease diagnosis using data mining techniques. In: 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, 2018, pp. 615–620. https://doi.org/10.1109/ICI
Influence of AI on Detection
of COVID-19
Pallavi Malik and A. Mukherjee
1 Introduction
The aspect of digital clinical diagnosis raises concern in terms of certainty
and completeness of medical knowledge. These aspects were duly addressed by
researchers that culminated in better diagnostic features in cognizing the contagious
diseases [1–6]. Due to the onset of COVID-19 pandemic, medical research worldwide has undergone rapid changes not only in terms of treatment protocols but also in
terms of diagnosis. The introduction of computer-based diagnosis has been gradual
due to social and technological reasons. However, researchers as early as 1980 introduced the concept of artificial intelligence in medical practice. Casimir [7] explored
the idea of artificial intelligence methods for medical consultation. Some of the
proposed knowledge-based systems were EMYCIN, EXPERT and AGE. Giger et al.
[8] suggested pattern classification techniques to detect and characterize images of
different patients. A low-power EEG data acquisition was also proposed by Verma
et al. [9]. Due to the large data available, it became imperative to adopt data analysis that would be effective in the determination of ailments. Exploration in this context led to hybridized data mining techniques to reduce the gaps arising from the analysis of large data. Researchers introduced the concept of ensemble classifiers for diagnosis of different types of tuberculosis, which subsequently led to improved results. However, the main obstacle preventing wide usage of machine learning in medical diagnosis is the lack of training data. The solution to this would be to collect a varied set of heterogeneous data. Sorensen et al. [10] and de Bruijne [11] addressed the aspect of weak labels and suggested multiple-instance classification to address this drawback.
P. Malik (B) · A. Mukherjee
University of Engineering and Management Jaipur, Jaipur, India
e-mail: pallavi_malikk@rediffmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
P. Malik and A. Mukherjee
The pharmaceutical and biotechnology organizations worldwide are working in
tandem with the governments to check the outspread of the COVID-19 pandemic.
This also includes the issues pertaining to the maintenance of the global supply
chain management and possible invention of the vaccine. Research work carried
out by Gundlapally et al. [13] established that proteins, particularly the viral membrane proteins and those involved in the replication of the genetic material, are the best targets for vaccines and anti-viral drugs. It was further explored that by disrupting this batch of proteins, the growth of the virus may be largely curtailed.
Keeping in view the research work done so far on diagnosis of contagious diseases, an attempt has been made to revisit the dimensions of clinical diagnosis with a known model, training the machine with the available dataset obtained from the IEEE data port. The basic objective of this research is to clinically diagnose patients for COVID-19 accurately and with minimal physical contact.
1.1 Model Used for Prediction
The pandemic has opened up new frontiers of challenges in clinical diagnosis. It is
true that machine learning can expedite the development of pharmaceutical drugs,
but there are certain areas that need to be cautiously addressed. There are many hindrances pertaining to the limited data available, as extracting data during such a pandemic is quite challenging. Apart from this, the need to integrate the data into machine learning models and the accessibility of the data are also uphill tasks. Under these circumstances, the notable uses of the datasets available are listed below [14–16]:
(1) Most of the datasets available are based on protein structures and their molecular interactions with chemical compounds. This is important to facilitate the development of new pharmaceutical drugs.
(2) Forecasting the rise of infection rates and the subsequent spread of the disease, so as to allow the healthcare system to be well prepared beforehand.
(3) Diagnosing the medical images available of the population for the infection.
(4) Mining the data available on social media to gauge public perception and the subsequent spread of the disease.
Of all the above components, the research work attempted in this paper falls
under category (3) as mentioned above. The images obtained from the data available
at IEEE data port have been taken into consideration.
The algorithm used for the purpose has been depicted as hereunder:
(a) X-ray image sets are obtained from the dataset available.
(b) The images are processed based on their Hounsfield unit (HU) values. The HU defines the attenuation coefficient of a particular tissue relative to the attenuation coefficient of water.
(c) Image segmentation is done to sub-divide the image into various segments so as to aid detection of the infection. A convolutional neural network (CNN) is used for the purpose.
(d) The images are then classified for each candidate region. The image classification process is performed with TensorFlow, an open-source programming library in Python.
(e) Finally, the infection probability is predicted using a noisy-OR (Bayesian) function. The Bayesian function is used for quantifying the uncertainty of the model.

Fig. 1 Training of the model using TensorFlow in Python 3.6
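Steps (b) and (e) can be sketched as follows. The [-1000, 400] HU lung window and the noisy-OR combination below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

# Step (b): window raw Hounsfield-unit values into [0, 1].
# A [-1000, 400] HU window is a common choice for pulmonary tissue.
def hu_window(img_hu, lo=-1000.0, hi=400.0):
    return np.clip((img_hu - lo) / (hi - lo), 0.0, 1.0)

scan = np.array([-1200.0, -1000.0, -300.0, 400.0, 900.0])
print(hu_window(scan))  # [0.  0.  0.5 1.  1. ]

# Step (e): combine per-region infection probabilities with a noisy-OR,
# a simple Bayesian-style aggregation over candidate regions: the image is
# "infected" unless every candidate region independently fails to be.
def noisy_or(region_probs):
    return 1.0 - np.prod(1.0 - np.asarray(region_probs))

print(round(noisy_or([0.2, 0.3, 0.5]), 2))  # 0.72
```

Note how the noisy-OR result (0.72) exceeds every individual region probability: weak evidence from several candidate regions accumulates into a stronger overall prediction.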
For the purpose of training, twenty-five X-ray images of pneumonia patients, twenty-five X-ray images of COVID-19-positive patients and twenty-five X-ray images of healthy patients are used to train the model. The code used for training is displayed in Fig. 1.
The output results can be observed from the figure, with a 100% confidence level for the images of patients infected with COVID-19. A sample image of a COVID-19-positive patient is shown in Fig. 2. It clearly shows the white patches in the pulmonary segment of the image, indicating severe pulmonary congestion. The sample size chosen is small compared to the dataset, which contains over 500 images, just to train the model with the set of data. Training was run for 200 epochs with a batch size of 25 (Fig. 3).
Thus, radiologists can inspect the set of images from the X-ray of different patients
suspected with pulmonary infection and allow the code to detect the possibility of
COVID-19 infection.
Fig. 2 X-ray image of a sample for COVID-19 infected case
2 Conclusions
From the study done so far, it can be inferred that artificial intelligence may be used
for detection of contagious diseases with high accuracy. The dataset available from the IEEE data port was quite helpful and extensive in this regard. This method of using
AI to detect the possibility of COVID-19 infection has reduced the possible human
error in diagnosing the viral infection. In a country like India, community testing on
a massive scale is relatively difficult given the time span required for the test for such
contagious disease. Such measures may be adopted, wherein the scanned images of
pulmonary portion may be used for detection of possible spread of the virus using
AI. This shall not only reduce the dangers of medical practitioners getting infected
while treating the patient but also shall help in error-free detection of the disease.
However, the sample size needs to be sufficiently large to reduce the possibility of error. Further, different models are available, and a possible future study could explore the advantages and disadvantages of the different models in predicting the spread and detecting the ailment.
Fig. 3 Output in console with the evaluation time
References
1. Elstein, A.S., Shulman, L.S., Sprafka, S.A.: Medical Problem Solving: An Analysis of Clinical Reasoning. Harvard University Press, Cambridge, MA (1978)
2. Feinstein, A.R.: Clinical judgment. Williams & Wilkens, Baltimore, MD (1967)
3. Komaroff, A. L.: The variability and inaccuracy of medical data. Proceedings IEEE, vol. 67,
no. 9, pp. 1196–1207 (1979)
4. Newell, A., Simon, H.A.: Human problem solving. Prentice-Hall, Engelwood Cliffs, NJ (1972)
5. Nii, H. P., Aiello, N.: AGE (attempt to generalize): A knowledge-based program for building
knowledge based programs. In: Proceedings 6th International Joint Conference Artificial
Intelligence, Tokyo, pp. 645–655 (1979)
6. Nilsson, N.: Principles of Artificial Intelligence. Tioga, Palo Alto, CA (1980)
7. Casimir, K.: Artificial intelligence methods and systems for medical consultation. IEEE Trans.
Pattern Anal. Mach. Intell. 2(5), 464–476 (1980)
8. Giger, M. L.: Computer-aided diagnosis of breast lesions in medical images. In: Computing in
Science and Engineering, vol. 2, no. 5, pp. 39–45, Sept–Oct 2000
9. Verma, N., Shoeb, A., Bohorquez, J., Dawson, J., Guttag, J., Chandrakasan, A.P.: A micropower EEG acquisition SoC with integrated feature extraction processor for a chronic seizure
detection system. IEEE J. Solid-State Circuits 45(4), 804–816 (2010)
10. Sorensen, L., Nielsen, M., Lo, P., Ashraf, H., Pedersen, J.H., de Bruijne, M.: Texture-based
analysis of COPD: A data-driven approach. IEEE Trans. Med. Imaging 31(1), 70–78 (2012)
11. de Bruijne, M.: Machine learning approaches in medical image analysis: from detection to diagnosis. Med. Image Anal. 33, 94–97 (2016)
12. Shi, F., et al.: Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19. IEEE Rev. Biomed. Eng. 1–12 (2020)
13. Gundlapally, J., Kumar, A., Kashyap, A., Saxena, A.K., Sanyal, A.: In search of novel
coronavirus 19 therapeutic targets. Helix 10(02), 01–08 (2020)
14. Yuan, J., Liao, H., Luo, R., Luo, J.: Automatic radiology report generation based on multi-view
image fusion and medical concept enrichment. In: International Conference on Medical Image
Computing and Computer-Assisted Intervention, pp. 721–729 (2019)
15. Wang, X., Peng, Y., Lu, L., Lu, Z., Summers, R. M.: TieNet: Text-image embedding network
for common thorax disease classification and reporting in chest X-rays. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 9049–9058 (2018)
16. Hao, J., Kim, Y., Mallavarapu, T., Oh, J.H., Kang, M.: Interpretable deep neural network for
cancer survival analysis by integrating genomic and clinical data. BMC Med. Genomics 12,
1–13 (2019)
Study of Medicine Dispensing Machine
and Health Monitoring Devices
Aditi Sanjay Bhosale, Swapnil Sanjay Jadhav, Hemangi Sunil Ahire,
Avinash Yuvraj Jaybhay, and K. Rajeswari
1 Introduction
People in rural areas face different health issues compared to those living in towns and cities, and the rate of chronic disease has increased in rural areas relative to urban ones. Chemical pesticides are used on a large scale in farming, but they are harmful to individuals and may lead to cancer and other severe diseases. For minor ailments, people in rural areas often prefer home remedies to visiting a doctor. Home remedies can help, but it is sometimes difficult to identify the cause of a disease, and the remedies may fail. A medicine dispensing machine addresses this gap by dispensing doctor-prescribed medicines based on the patient's symptoms.
The medicine dispensing machine is an embedded system comprising hardware and software [1]. The hardware mainly consists of IoT-based sensors or wearable devices that measure body temperature and pulse [2]. In recent years, a number of solutions have become available for primary health care, but rural areas have fewer facilities, so in an emergency people have to go to nearby
A. S. Bhosale (B) · S. S. Jadhav · H. S. Ahire · A. Y. Jaybhay · K. Rajeswari
Department of Computer Engineering, PCCOE, Pune, India
e-mail: bhosaleaditi01@gmail.com
S. S. Jadhav
e-mail: swapnil.j0207@gmail.com
H. S. Ahire
e-mail: hemangi.sa601@gmail.com
A. Y. Jaybhay
e-mail: avinashjaybhay1919@gmail.com
K. Rajeswari
e-mail: kannan.rajeswari@pccoepune.org
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
cities for the required medicine, which is not always feasible, particularly in difficult situations. The Centre for Development of Advanced Computing (CDAC) is actively working on products that report the status of healthcare management, track how many primary health centres (PHCs) are available in rural areas, and provide mobile-based solutions that enable web technologies through a GSM module [3]. IoT wearable gadgets measure body temperature, heart rate and other parameters through sensors, which a specialist can monitor remotely. Doctors monitor the physiological parameters of patients both locally and remotely; the sensor data are uploaded to a server and sent to a computer or mobile device for the doctor's reference [2]. For disease prediction, machine learning provides a number of algorithms: naïve Bayes, decision tree and J48 are used to predict multiple diseases at the same time by exploiting shared patterns among them. The accuracy of an algorithm must be verified first, because an incorrect prediction may affect human life. Data mining and visualization techniques are therefore used to extract data, retrieve the patient's medical history and current disease, and visualize the data with 2D/3D graph techniques [4].
Improved technology and its application in the healthcare sector play an important role in urban as well as rural areas. Many software products serve the medical market: they support decision making in medicine, education and training, offer guidance on health care, and let doctors provide online guidelines, diagnose illness online and prescribe medication. Many software packages also provide home delivery of medicines [5]. As the population grows rapidly, providing healthcare devices for everyone is difficult. Patients can be monitored remotely; in serious ICU cases, the system checks the patient's medical history and sends an exact report to the doctor through email or other media [6]. In rural areas, many people and families are illiterate and do not know how to use apps, which is another difficulty they may face in emergencies. Under these circumstances they visit large, expensive hospitals, which are not cost-effective for low-income rural individuals. To control the spread of disease and reduce the rising mortality caused by a shortage of facilities, special attention must be given to health care in rural areas, and many companies are working on improving healthcare and medicine delivery systems there. Surveys of disease prediction show that many machine learning algorithms, as well as big data techniques, are used; structured and unstructured approaches, such as CNNs over unstructured big data, have been applied [7]. A hospital dataset contains a great deal of information, but the need is to find the appropriate data within it. Datasets of drug purchases are analysed to predict the disease a patient is probably suffering from; the parameters in such a dataset include the age, gender, name and prescribed medicines of each patient. For predicting the disease using data mining, different techniques have been compared, and results show that the stacking method's accuracy compares favourably with other techniques [8].
2 Literature Review
Due to unavailability of healthcare centres and sometimes healthcare personnel in
remote areas, it becomes difficult for people to get treatment for diseases which they
are facing. For this purpose, the following literature survey has been carried out. Desai et al. [1] describe a prototype vending machine which dispenses medicines as per the prescription given by a doctor; the machine is controlled by a microcontroller and provides an online portal accessible to patients and doctors. Kumari et al. [2] give an IoT-based approach for monitoring
and transmitting health parameters using sensors like temperature sensor, heart rate
sensor, GSR monitor and ECG through wireless medium; this data is taken from
patient and uploaded on a server and sent to doctor. Ramana Murthy [3] describes
the increasing market of mobile devices in urban as well as in rural areas, and these
mobile devices would be helpful in management of primary healthcare by considering
mobile web technology which will make it transparent and easily accessible. Dar [9] compares private medical practitioners with primary health care and finds that the high treatment cost in private hospitals leads people to prefer primary health centres. Leoni Sharmila et al. [10] describe a few machine learning techniques for classifying a liver dataset and note that machine learning can be used for early detection, analysis and prediction of disease. Gurbeta et al. [11] use artificial neural networks and Bayesian networks for classification of diseases such as diabetes and cardiovascular disease. Vitabile et al. [12] discuss data collection, fusion,
models and technologies for medical data processing and analysis as well as big
medical data analytics for remote health monitoring using concepts of IoT for sensing
data such as heart rate, electrocardiogram, body temperature, respiratory rate, chest
sounds or blood pressure to monitor patients' physiological and health conditions.
Utekar and Umale [6] present an automated IoT-based healthcare system for remotely
located patients which helps doctors by alerting via email if abnormal conditions
are observed in patients by monitoring parameters using sensors like temperature
and heart beat for real-time monitoring. Penna et al. [13] implement an automatic medicine dispensing machine which can be used in remote areas; it stores medicines and dispenses them according to the patient's condition, and also provides testing of basic human parameters such as blood pressure and temperature. Kimbahune and Pande [14]
specify role of Information Communication Technology in Primary Health Centre
which will help people in villages and tribes by studying the needs of logistic problem
in rural or tribal population and considering existing Primary Health Centre.
Kunjir et al. [4] give insight into classifying and predicting specific diseases from healthcare data; the naïve Bayes algorithm is used for prediction, and the work notes that a data mining system can help avoid wrong clinical decisions. Data mining methods such as naïve Bayes and the J48 algorithm are compared for accuracy and performance.
Dehkordi and Sajedi [8] use data mining techniques to find patterns in a dataset provided by a research centre in Tehran; the system predicts which physician a patient has been referred to and which disease the patient is suffering from. Different data mining techniques such as k-nearest neighbour, decision tree and naïve Bayes were compared, and a stacking classifier was used, which achieved better accuracy than the single classifiers. Caban et al. [15] give a simple and robust classification
technique that can be used to automatically identify prescription drugs by considering
colour, shape and imprint on the pills to avoid medication error using modified shape
distribution technique for the system. Lee and Ventola [5] describe advantages of
using mobile and mobile application by healthcare Professionals for quick decisions
with less error rate and accessibility in real time. Chen et al. [7] highlighted machine
learning algorithms for better prediction of chronic disease and also proposed a new
convolutional neural network-based multimodal disease risk prediction algorithm
using data from a hospital in Central China from 2013 to 2015, achieving an accuracy of 94.8% with a convergence speed faster than that of the CNN-based unimodal disease risk prediction algorithm. Paper [16] gives information about big data predictive analytics for heart disease using machine learning techniques, in which the naive Bayes algorithm is used. An SVM with the sequential minimal optimization learning algorithm is considered good for medical disease diagnosis in paper [17], in which an India-centric dataset is used for heart disease diagnosis. Paper [18] proposes a medicine dispensing machine in which microcontrollers such as Raspberry Pi and Arduino are used; the Raspberry Pi controls an image processing module which verifies the amount paid, and the Arduino controls the dispensing of medicine and the payment module. A logistic regression
model is used in [19] along with machine learning algorithms like decision tree, random forest and naïve Bayes; symptoms are taken from users, and the disease is predicted accordingly. Paper [20] compares various algorithms for the prediction of heart disease and concludes that hyper-parameter optimization gives better accuracy than algorithms like KNN, SVM, naïve Bayes and random forest. In [21], it is noted that CNN-UDRP uses only structured data, whereas CNN-MDRP can use structured as well as unstructured data; disease prediction is more accurate and faster with CNN-MDRP.
3 Methodology
(1) Naive Bayes: In paper [16], heart disease prediction is performed with the naive Bayes algorithm, as it gives the highest accuracy; it is based on probabilistic logic and works successfully with health-related data. The algorithm uses Bayes' theorem, a mathematical technique for calculating the probability of an event (Fig. 1). The theorem takes the form:
Fig. 1 Flow diagram for database and operations
p(c|x) = p(x|c) p(c) / p(x)

where p(c|x) is the posterior probability of the class c (target) given the predictor x with attributes A = {A1, A2, …, An}; p(c) is the prior probability of the class (e.g. yes/no); p(x|c) is the likelihood, i.e. the probability of the predictor given the class; and p(x) is the prior probability of the predictor.
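As an illustration of this probabilistic classification, the following minimal sketch applies Gaussian naive Bayes via scikit-learn to two invented symptom features; the readings and labels are illustrative, not taken from the paper's dataset.

```python
# Sketch only: toy symptom data, not the heart disease dataset from [16].
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Each row: [body temperature (deg C), heart rate (bpm)]; labels: 0 = healthy, 1 = fever
X = np.array([[36.6, 72], [36.8, 75], [38.9, 95],
              [39.2, 100], [37.0, 70], [39.5, 98]])
y = np.array([0, 0, 1, 1, 0, 1])

model = GaussianNB()
model.fit(X, y)

# Posterior p(c|x) for a new reading, computed via Bayes' theorem internally
probs = model.predict_proba([[39.0, 97]])[0]
print(model.predict([[39.0, 97]])[0])  # class with the highest posterior
```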
(2) SVM: The SVM finds the frontier (hyper-plane) that best segregates the two classes. It is used for classification and regression analysis, as the algorithm falls under supervised learning, and it performs structural risk minimization. The equation of the hyper-plane is given below (Fig. 2):

w^T x + b = 0

where x is the input vector, w the adjustable weight vector and b the bias (Fig. 3). A sample x_i is classified according to which side of the hyper-plane it falls on:

w^T x_i + b ≥ 0 for y_i = +1, and w^T x_i + b < 0 for y_i = −1
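The hyper-plane classification rule above can be sketched with a linear SVM; the two point clouds below are invented illustration data, not the medical datasets cited in this paper.

```python
# Sketch only: classify a point by the sign of w^T x + b from a fitted linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1], [4, 4], [5, 5], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
# Decision rule: y_i = +1 if w^T x_i + b >= 0, else y_i = -1
point = np.array([4.5, 4.0])
label = 1 if w @ point + b >= 0 else -1
print(label)  # → 1 (positive side of the separating hyper-plane)
```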
Paper [1] describes a vending machine to dispense drugs as per a doctor's prescription, together with an online portal for generating e-prescriptions. The doctor's portal is accessible to doctors, who can upload a prescription of medicine for a patient; the dispensing machine then dispenses the medicines from the vending machine as mentioned in the prescription.
In [8], the stacking method is mainly used for the prediction of diseases. It combines several machine learning algorithms into one model to improve the efficiency of prediction.
Fig. 2 Block diagram for dispensing machine
Fig. 3 Comparison of the two algorithms with respect to accuracy, precision, recall, F 1 -measure
The stacking method uses two main approaches:
1. Data Collection
The dataset contains all the information about the patients, such as the prescriptions of medicine given by doctors. For accuracy, the stacking method considers different doctors: some doctors may prescribe different medicines for the same disease, and hence all approaches are considered. The purpose was to predict which physician a particular patient was referred to and which type of disease the patient is suffering from. Using the dataset, the system predicts basic diseases like cold, fever, chills and poisoning. Hence, data is collected to determine the name of the disease and the type of doctor.
2. Modelling
The dataset contains a large number of attributes relative to the number of instances. It is difficult for the prediction to consider all of these attributes, and doing so may reduce accuracy. Hence, to reduce the number of parameters, principal component analysis was used; it is a dimension reduction method which reduces the number of attributes while retaining most of the information.
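The two steps above, PCA for dimension reduction followed by a stacking ensemble of several base classifiers, can be sketched as follows; the synthetic data stand in for the prescription dataset, which is not public.

```python
# Sketch only: PCA + stacking on synthetic data, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

stack = make_pipeline(
    PCA(n_components=5),                      # reduce 20 attributes to 5 components
    StackingClassifier(
        estimators=[("nb", GaussianNB()),     # base learners, as in [8]
                    ("knn", KNeighborsClassifier()),
                    ("rf", RandomForestClassifier(random_state=0))],
        final_estimator=LogisticRegression()),
)
stack.fit(X_train, y_train)
print(round(stack.score(X_test, y_test), 2))  # held-out accuracy
```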
Penna et al. [13] used sensors to get health data (particularly for sprain, fever, BP and headache) and trigger specific compartments, with the help of an Arduino and a stepper motor which dispenses the medicines from that compartment.
In [14], input from the patient is taken using sensors such as temperature and heart beat sensors, which are controlled by an Arduino; the Arduino is the controlling unit, and it triggers a stepper motor which dispenses medicine to the patient. Internet of Things technology is used to measure the physical parameters of the body, from which the disease is classified as major or minor; the medicine dispensed by the machine depends on this category. The machine is thus available 24 h a day for people in rural areas. Patient history is also stored in the cloud and processed with data mining techniques, so the doctor can easily access it through the Internet or other media.
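The sensor-to-compartment logic described above might look like the following sketch; the thresholds, disease categories and compartment numbers are hypothetical, invented purely for illustration, and a real machine would use doctor-approved rules.

```python
def classify(temp_c, pulse_bpm):
    """Map sensor readings to a disease category (illustrative thresholds)."""
    if temp_c >= 39.5 or pulse_bpm >= 120:
        return "major"
    if temp_c >= 37.5 or pulse_bpm >= 100:
        return "minor"
    return "normal"

# Category -> stepper-motor compartment number (hypothetical layout)
COMPARTMENT = {"minor": 1, "major": 2}

def dispense(temp_c, pulse_bpm):
    """Return the compartment to trigger, or None if no medicine is needed."""
    category = classify(temp_c, pulse_bpm)
    return COMPARTMENT.get(category)

print(dispense(38.2, 80))  # 1 (minor case -> compartment 1)
```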
In [4], the machine learning disease prediction algorithm can give lower accuracy when the dataset provided is incomplete; some unique characteristics of the diseases are therefore supplied, which may reduce this problem. The prediction can be done by several other algorithms, but CNN-MDRP is used because it gives the highest accuracy, of about 94.8%, and the main aim is the prediction of diseases. The hospital dataset of the last 2 years was used for the study, with the focus on risk prediction.
4 Conclusion
This paper describes a medicine dispensing system which would help patients in rural areas. Since primary health care is largely unavailable in rural areas, such systems can be used to overcome this problem; they are easy to use, can be handled by any individual and give accurate results. The aim of this paper is to describe primary health care in rural areas and the means of providing easy access to its facilities. The issues mentioned above create problems, so to improve medication in rural areas, a vending machine can be used which accepts symptoms from the patient and then provides the prescribed medicine. IoT-based health monitoring systems are used to get real-time data from patients who may be in remote areas and accordingly provide them proper medication. More features can be added to the system, such as blood pressure detection, a weight checking system and helpful guidelines for the patient in case of any inconvenience. Machine learning likewise plays an important role in the classification and prediction of diseases: different machine learning prediction algorithms are available, and IoT helps to monitor patients remotely. The overall system can thus be used to provide proper medication to the patient.
References

1. Desai, P., Pattnaik, B., Aditya, T.S., Rajaraman, K., Dey, S., Aarthy, M.: All time medicine and health device. Vellore Institute of Technology, Vellore, India
2. Neha, Kumari, P., Kang, H.P.S.: Smart health monitoring system. UCIM/SAIF/CIL, Panjab University, Chandigarh, India
3. Ramana Murthy, M.V.: Mobile based primary health care system for rural India. In: Mobile Computing and Wireless Networks. CDAC, Electronics City, Bangalore 560100
4. Kunjir, A., Sawant, H., Shaikh, N.F.: Data mining and visualization for prediction of multiple diseases in healthcare. Modern Education Society College of Engineering, Pune
5. Ventola, C.L.: Mobile devices and apps for healthcare professionals: uses and benefits
6. Utekar, R.G., Umale, J.S.: Automated IoT based healthcare system for monitoring of remotely located patients. Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune 411044
7. Chen, M., Hao, Y., Hwang, K., Wang, L., Wang, L.: Disease prediction by machine learning over big data from healthcare communities
8. Dehkordi, S.K., Sajedi, H.: A prescription-based automatic medical diagnosis system using a stacking method. Department of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
9. Dar, K.H.: Utilization of the services of the primary health centres in India—an empirical study. Department of Economics, Central University Jammu, India
10. Leoni Sharmila, S., Dharuman, C., Venkatesan, P.: Disease classification using machine learning algorithms—a comparative study. Ramapuram Campus, SRM University, Chennai 600089, India
11. Gurbeta, L., Badnjević, A., Alić, B.: Machine learning techniques for classification of diabetes and cardiovascular disease. International Burch University, Sarajevo; Technical Faculty Bihać, University of Bihać, Bosnia and Herzegovina
12. Vitabile, S., Marks, M., Stojanovic, D., Pllana, S., Molina, J.M.: Medical data processing and analysis for remote health and activities monitoring
13. Penna, M., Gowda, D.V., Shivashankar, J.J.J.: Design and implementation of automatic medicine dispensing machine. Department of ECE, Sri Venkateswara College of Engineering
14. Kimbahune, S., Pande, A.: mHEALTH-PHC: a community informatic tool for primary healthcare in India. TCS Innovation Labs, Tata Consultancy Services, Mumbai, India
15. Caban, J.J., Rosebrock, A., Yoo, J.S.: Automatic identification of prescription drugs using shape distribution models. National Intrepid Center of Excellence (NICoE), Naval Medical Center; University of Maryland, Baltimore County (UMBC); National Institutes of Health, Bethesda, MD
16. Venkatesh, R., Balasubramanian, C., Kaliappan, M.: Development of big data predictive analytics model for disease prediction using machine learning technique
17. Ghumbre, S.U., Ghatol, A.A.: Heart disease diagnosis using machine learning algorithm. Computer Engineering Department, College of Engineering Pune, Maharashtra; Dr. B.A.T. University, Lonere, Maharashtra
18. Tank, V., Warrier, S., Jakhiya, N.: Medicine dispensing machine using Raspberry Pi and Arduino controller. Department of Electronics and Communication Engineering, CHARUSAT, Anand, India
19. Sharma, S., Parmar, M.: Heart diseases prediction using deep learning neural network
20. Shabaz Ali, N., Divya, G.: Prediction of diseases in smart health care system using machine learning
21. Shirsath, S.S., Patil, S.: Disease prediction using machine learning over big data. Department of Computer Engineering, SITS, Lonavala, India
Building Image Classification Using CNN
Prasenjit Saha, Utpal Kumar Nath, Jadumani Bhardawaj, Saurin Paul,
and Gagarina Nath
1 Introduction
Artificial intelligence (AI) is a well-organized alternative to classical modeling techniques. Until the 1990s, only traditional machine learning approaches were used to classify images [1], but the accuracy and scope of the classification task were limited by several challenges, such as the hand-crafted feature extraction process. In recent years, deep neural networks (DNNs), also called deep learning, have found composite structure in large datasets using the backpropagation algorithm [2]. Within deep learning, the convolutional neural network (CNN) has achieved very good results in computer vision tasks, especially image classification. Hubel and Wiesel discovered that animal visual cortex cells detect light in small receptive fields. Motivated by this work, in 1980 Kunihiko Fukushima introduced the neocognitron, a multi-layered neural network able to recognize visual patterns hierarchically through learning; this network is regarded as the theoretical forerunner of the CNN. To improve performance, we can collect larger datasets, learn more powerful models and use good techniques to prevent overfitting [3]. Convolutional neural networks have had great success in image classification [4]: the CNN is the most widely used model because of its high prediction accuracy, since it can classify without any pre-determined features, which other algorithms fail to achieve.
P. Saha (B) · U. K. Nath · J. Bhardawaj · S. Paul · G. Nath
Assam Engineering College, Jalukbari, Guwahati, India
e-mail: prasenj1ps@gmail.com
2 Methodology
2.1 Definition and Working Principle
A CNN is formed of one or more blocks of convolution and sub-sampling layers, followed by one or more fully connected layers and an output layer [5]. The convolutional layers generate feature maps with linear convolutional filters, immediately followed by activation functions.
2.2 Convolutional Layer

The convolutional layer is the prime part of a CNN. A feature learnt in one region can match a similar pattern in another region. For a large input image, we take a tiny window and move it over all positions in the image; at each position, the overlapping values are combined into a single output value. The small matrix that moves over the large image is called the kernel (filter), and its weights are later learned with the backpropagation technique.
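The sliding-kernel operation described above can be sketched in plain NumPy; this minimal version handles no padding or stride and is for illustration only.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (valid mode, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # elementwise product of the kernel with the patch beneath it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # simple diagonal-difference filter
print(conv2d(image, kernel))  # a 3x3 feature map, constant -5 for this filter
```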
2.3 Sub-Sampling or Pooling Layer
Pooling is an operation which down-samples an image. It takes a small area of the convolutional output as input and sub-samples it to produce a single output. There are different pooling methods, such as max pooling, mean pooling and average pooling. Max pooling takes the largest pixel value of a region, as shown in Fig. 3. Pooling minimizes the number of variables to be determined and makes the system invariant to translations in shape, size and scale.
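A minimal sketch of 2×2 max pooling, assuming non-overlapping regions, following the description above:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Replace each non-overlapping size x size region by its maximum."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = feature_map[i:i + size, j:j + size].max()
    return out

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [0, 1, 3, 4]], dtype=float)
print(max_pool(fm))  # [[6. 5.]
                     #  [7. 9.]]
```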
2.4 Fully Connected Layer
The fully connected layer is mainly the last part of a CNN. This layer takes input from all neurons in the previous layer and performs an operation with each individual neuron in the current layer to generate the output.
2.5 Optimizer
Here, the Adam optimizer is used. Adam is a stochastic optimization method that requires only first-order gradients and little memory [6]; it computes individual adaptive learning rates for the various parameters from estimates of the first and second moments of the gradients. Dropout is used to reduce overfitting; the word dropout refers to dropping out units in a neural network [7] (Fig. 1).

Fig. 1 Block diagram of the CNN model
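A minimal sketch of the Adam update on a one-parameter quadratic, following the first- and second-moment estimates described above; the hyperparameter values are commonly used defaults, not settings taken from this paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad         # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2    # second-moment (uncentred variance) estimate
    m_hat = m / (1 - b1 ** t)            # bias corrections
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(theta)  # close to the minimiser at 0
```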
3 Design Steps
In our model, we divided the dataset into two groups: one for training and one for testing. The training part consists of 70% of the images, and the test part contains 30% of the dataset. After that, we performed various data preprocessing steps such as resizing and greyscaling. We then built our CNN model, which has several layers; when we input our images, it automatically learns the best features of each image and uses them to classify the test images. Input data: here, we used around 1000 images, all collected from nearby areas.
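The 70/30 split and greyscale preprocessing described above can be sketched as follows; the random arrays stand in for the roughly 1000 collected photographs, and the CNN itself would then be built on top of this with a deep learning library (e.g. Keras).

```python
# Sketch only: data preparation, assuming NumPy and scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
images = rng.random((100, 32, 32, 3))      # 100 dummy RGB images
labels = rng.integers(0, 2, size=100)      # 0 = school building, 1 = other

def to_grey(batch):
    # Luminosity weights for an RGB -> greyscale conversion
    return batch @ np.array([0.299, 0.587, 0.114])

X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.3, random_state=0)  # 70% train, 30% test

print(len(X_train), len(X_test), to_grey(X_train).shape)  # 70 30 (70, 32, 32)
```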
4 Results
4.1 Classification
Classification is done on three sets of data, where the model can distinguish a school building from the others with at least 73% accuracy. Figures 2, 3 and 4 show the results obtained on the three sets of data for the classification of buildings.
Fig. 2 Classified dataset 1
Fig. 3 Classified dataset 2
Fig. 4 Classified dataset 3
4.2 Accuracy Versus Training Step
A graph is obtained to show the change in accuracy with the training step, for both validation and prediction. It shows that as the training step increases, the accuracy increases for both the validation and prediction steps (Figs. 5 and 6).
4.3 Loss Versus Training Step
A graph is obtained to show the change in loss with the training step, for both validation and prediction. It shows that as the training step increases, the loss decreases for both the validation and prediction steps.
Fig. 5 Accuracy versus training step
Fig. 6 Loss versus training step
Acknowledgment This work is a part of the project funded by ASTU, CRSAEC20, TEQIP-III
References

1. Singh, P.: Application of emerging artificial intelligence methods in structural engineering—a review. IRJET 5(11) (2018). e-ISSN 2395-0056
2. Sultana, F., Sufian, A., Dutta, P.: Advancements in image classification using convolutional neural network. In: 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN)
3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105. Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf (2012)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
5. Lin, M., Chen, Q., Yan, S.: Network in network. In: International Conference on Learning Representations (2014). arXiv:1312.4400 [cs.NE]
6. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations, San Diego, 7–9 May 2015
7. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Analysis of COVID-19 Pandemic
and Lockdown Effects on the National
Stock Exchange NIFTY Indices
Ranjani Murali
1 Introduction
The shock effects of the COVID-19 pandemic outbreak have been felt in many economies, and the effects are expected to persist. The resilience of various sectors of the economy can also be observed in equity segments, in addition to various fiscal
parameters. India is the fifth-largest economy by GDP, of which 60% is contributed by domestic consumption. In sector-wise GDP figures, the service sector contributes nearly 60%, industry 23%, and agriculture 15.4%. India has one of
the largest workforces, of around 520 million. The COVID-19 outbreak resulted in a complete lockdown on March 24, 2020, and consequent enforcement measures from the government, imposed when the number of cases was around 500 to prevent the spread given India's mammoth population. The first lockdown of 21 days brought nearly all sectors to a grinding halt, and the continued outbreak necessitated further lockdown phases 2 to 5 till June 30, each with gradual relaxation of economic activities.
The imposed lockdown resulted in a sequence of events, such as the return of migrant laborers to their home states, provision of an economic stimulus package, travel restrictions, and adoption of work from home by many companies, which would further impact the economy. Several assistance measures like Shramik train provision,
food supply chain, free rations, direct account transfer, and relief packages to the
economically poor strata of the country were also taken to minimize the effects. These
effects of lockdown measures and the increase in cases are continuously impacting
various sectors of the economy. Stock markets being traded on a daily basis would
be one of the sensitive indicators to analyze the correlations and trends [1]. The
BSE Sensex and NIFTY 50 are the two main stock indices used in Indian equity
R. Murali (B)
Department of Computer Science, University of Toronto, Toronto, Canada
e-mail: rmurali@cs.toronto.edu
markets. NIFTY 50 covers around 17 sectors and represents the weighted average of 50 stocks. In the past, the NIFTY index has shown sensitivity to major economic and social events like demonetization [2], Brexit, the subprime mortgage crisis in the US, etc.
[3]. This work leverages the NIFTY index to analyze the effects of the COVID-19 pandemic and the consequent lockdown events on various sectors of the Indian economy.
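As background on how such an index level is formed, the following sketch computes a market-cap weighted average for three invented stocks; the names, prices and share counts are hypothetical, and the real NIFTY 50 additionally uses free-float market capitalization and a base-period divisor maintained by the exchange.

```python
stocks = {
    # name: (price, shares outstanding in millions) — invented values
    "AAA": (100.0, 50.0),
    "BBB": (250.0, 20.0),
    "CCC": (40.0, 125.0),
}

def index_level(stocks, divisor=100.0):
    """Total market capitalization scaled by a fixed divisor."""
    market_cap = sum(price * shares for price, shares in stocks.values())
    return market_cap / divisor

print(index_level(stocks))  # (100*50 + 250*20 + 40*125) / 100 = 150.0
```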
2 Literature Survey
Stock market prediction and analysis have been widely reported in the literature, leveraging both unique datasets and novel techniques. The work in [4] reviews relevant work
in financial time series forecasting using deep learning and has grouped the work in
categories of convolutional neural networks, deep belief networks, and long short-term memory. The work helps in gaining insight into the efficiency and applicability
of these architectures for the financial domain providing a comparative performance
analysis. In [2], the demonetization event is analyzed on the Indian stock markets using GARCH modeling on the NIFTY index for impact assessment and future policy decisions. The work in [5] proposes a novel deep learning architecture combining Long Short-Term Memory (LSTM) with paragraph vectors for financial time series forecasting; it can process both numerical and textual information, converting newspaper articles into distributed representations and extracting their temporal effects on opening prices. The work in [6] predicts the trend of the stock instead of the actual values, using adversarial training to make the neural network more robust and general and to overcome the overfitting problem. The work in [7] proposes a deep convolutional neural network architecture to model the long- and short-term influences of events on S&P 500 stock price movements and proves comparable to traditional baseline methods by effectively using event-based stimulus dynamics analysis.
In [8], a three-stage method is used
to obtain the most effective feature set with the best risk-return prediction capacity, identified by filter and function-based clustering; the selected set is then evaluated by re-prediction of risk and return. In previous work [9], direct prediction of the closing
price is attempted using a combined architecture of a Generative Adversarial Network
with a Multi-layer perceptron as the discriminator and the Long Short-Term Memory
as the generator. The work in [10] leverages regression architecture with adversarial
feature changes at test time for accurate prediction. The work studies adversarial
linear regression with multiple learners and approximates the resulting parameters
by an upper bound on the loss function, with a resulting unique equilibrium. In [11], the impact of COVID-19 on consumption baskets is examined for inflation by analyzing credit and debit card data to calculate the Consumer Price Index of the US. Regression models have also proven effective in stock market prediction [1], where ARIMA
modeling was used for short-term forecasting during COVID-19. Event-based and
dynamic feature changes have also been effectively handled by adversarial regression
modeling in [12]. The present work attempts to analyze the NIFTY trends based on both discrete events, like the economic stimulus and lockdown, and independent variables, like the number of COVID-19 cases.
3 Data Analysis Methodology
3.1 Data
The data from [3], including the NIFTY 50 index and the sector-wise indices, have been taken from December 31, 2019, to June 8, 2020, for analyzing the effects of the lockdown and the COVID-19 pandemic.
The total number of COVID-19 positive cases and total deaths on each day were taken to represent the pandemic parameters [13]. The lockdown and its effects were represented by a dataset created from news articles, with attributes representing the economic stimulus given, migrant laborer availability, transport provision, and the stringency of the lockdown measures.
3.2 Analysis Methodology
The collected data was analyzed for the trends influenced by lockdown and COVID-19 events through four modules.
Data Preprocessing
The first module performs data preprocessing through missing value replacement, feature extraction, and standardization. Missing values were replaced with the last available temporal data points. Lockdown features and derived attributes were obtained in this module. The lockdown phase parameter was represented via enumeration, while the economic stimulus and migrant laborer availability parameters were normalized [13].
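The replacement and normalization steps above can be sketched with pandas; the column names and values below are illustrative stand-ins, not the paper's dataset.

```python
import pandas as pd

# Illustrative frame: an index series with gaps (e.g., market holidays)
# and a lockdown attribute on an arbitrary scale.
df = pd.DataFrame({
    "nifty_close": [12100.0, None, 11800.0, 11950.0, None],
    "economic_stimulus": [0.0, 0.0, 1.7e12, 1.7e12, 2.0e12],
})

# Missing values are replaced with the last available temporal data point.
df["nifty_close"] = df["nifty_close"].ffill()

# Min-max normalization of the stimulus attribute to [0, 1].
col = df["economic_stimulus"]
df["economic_stimulus"] = (col - col.min()) / (col.max() - col.min())
```

The forward fill preserves the last traded value across non-trading days, and min-max scaling puts heterogeneous attributes on a common [0, 1] scale before regression.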
The transport availability parameter (T_allowed) was obtained using zone demarcation data and is represented as a percentage relative to pre-lockdown conditions:

T_allowed = R_dist · R_allowed + O_dist · O_allowed + G_dist · G_allowed

During lockdown, the districts were divided into Red (R_dist), Orange (O_dist), and Green (G_dist) zones, each of which was allowed a certain percentage (R_allowed, O_allowed, G_allowed) of public transport vehicles during different phases (Fig. 1).
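The T_allowed computation above is a simple weighted sum; the zone shares and allowed fractions in this sketch are hypothetical, not the paper's actual data.

```python
def transport_availability(r_dist, o_dist, g_dist,
                           r_allowed, o_allowed, g_allowed):
    """T_allowed as a weighted sum over Red, Orange, and Green zones:
    share of districts in each zone times the transport fraction
    allowed in that zone."""
    return (r_dist * r_allowed
            + o_dist * o_allowed
            + g_dist * g_allowed)

# Hypothetical phase: 30% Red zones (no transport), 30% Orange (50%
# allowed), 40% Green (fully allowed).
t = transport_availability(0.3, 0.3, 0.4, 0.0, 0.5, 1.0)
# t is the fraction of pre-lockdown transport available nationwide
```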
Fig. 1 Analysis methodology (Data Preprocessing → Trend Analysis → Correlation Analysis → Multi-Linear Regression → Model Prediction)
Trend Analysis
NIFTY indices were plotted from December 31, 2019, to observe the effects of the COVID-19 pandemic and the consequent lockdown events. The sensitivity of the NIFTY 50 index with respect to each lockdown announcement is illustrated in Table 1. Figure 2 gives the time trend of the NIFTY indices, where a steep fall is observed coinciding with the announcement of the Janata curfew lockdown event.
Correlation Analysis
Pearson correlation values of the NIFTY stock indices with the lockdown events and the COVID-19 pandemic parameters are given in Table 2. A sector-wise analysis has been carried out to understand the broad influence of the COVID-19 pandemic during its incipient stages and the lockdown. Public sector banks, pharmaceuticals, realty, service sectors, commodities, and government securities started a suggestive trend well before the lockdown measures, indicating sensitivity to the pandemic's progress elsewhere in the world.
Certain sectors, like government bonds and pharmaceuticals, show positive correlation with the increase in the number of COVID-19 positive cases, reflecting investor sentiment. Sectors like energy, consumption, oil and gas, and FMCG were least correlated with the COVID-19 parameters. The financial sectors and realty showed negative correlation and were affected by the increasing severity of the pandemic.
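The Pearson correlations behind Table 2 can be reproduced with pandas; the two series below are toy stand-ins for a real sector index and the case counts from [13].

```python
import pandas as pd

# Toy daily series: an index level and a pandemic parameter.
df = pd.DataFrame({
    "nifty_pharma": [100, 102, 105, 109, 114],
    "total_cases":  [10, 50, 250, 1200, 6000],
})

# Pairwise Pearson correlation, one cell of a Table-2-style matrix.
corr = df["nifty_pharma"].corr(df["total_cases"], method="pearson")
```

Calling `df.corr()` on the full frame of indices and pandemic parameters would produce the whole correlation matrix at once.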
Table 1 Variation of the NIFTY 50 index with lockdown phases

Lockdown phase | Change in NIFTY 50 index
Janata curfew on 22 March | —
Phase 1: 25 March 2020 to 14 April 2020 | —
Phase 2: 15 April 2020 to 3 May 2020 | —
Phase 3: 4 May 2020 to 17 May 2020 | −566.4
Phase 4: 18 May 2020 to 31 May 2020 | —
Phase 5: 1 June 2020 to 30 June 2020 | 152.95
Fig. 2 NIFTY indices before COVID-19 pandemic to June 8, 2020 [3]
Multi-Linear Regression
The attributes of the lockdown were used to derive a relation with the individual NIFTY indices by fitting a linear regression model:
Y_i,t = α_i + β_i,1 LD_t + β_i,2 ES_t + β_i,3 T_t + β_i,4 ML_t + β_i,5 C_t + β_i,6 D_t + β_i,7 NC_t + β_i,8 ND_t + e_i,t

where Y_i,t is the value of NIFTY index i at time t, β_i,n are the model coefficients, LD is the lockdown phase, ES the economic stimulus, T the transport availability, ML the migrant laborer percentage availability, C the number of total COVID-19 cases, D the number of total COVID-19 deaths, NC the number of new COVID-19 cases, ND the number of new COVID-19 deaths, and e_i,t the error term.
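The paper does not publish its fitting code; the sketch below fits a model of the same form by ordinary least squares with numpy, using synthetic regressors standing in for LD, ES, T, ML, C, D, NC, and ND.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # trading days in a study window of this length

# Synthetic regressors standing in for the eight lockdown/COVID terms.
X = rng.normal(size=(n, 8))
true_beta = np.array([-3.0, 1.5, 2.0, 0.5, -1.0, -2.0, -0.5, -0.8])
alpha = 100.0
y = alpha + X @ true_beta + rng.normal(scale=0.1, size=n)

# Fit Y_t = alpha + sum_n beta_n x_{n,t} by ordinary least squares.
A = np.column_stack([np.ones(n), X])          # prepend intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

# R^2 of the fitted model, the quantity reported in Table 4.
resid = y - A @ coef
r2 = 1 - resid.var() / y.var()
```

With real data, each NIFTY index i would be fitted separately, giving one coefficient vector and one R² per index.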
Table 2 Correlation of NIFTY indices with the lockdown events and COVID-19 pandemic parameters (total cases, total deaths, new cases, new deaths). Most equity indices, e.g., NIFTY Media and NIFTY Realty, correlate negatively, with values ranging from about −0.04 to −0.84, while the government-security indices correlate positively (NIFTY 4–8 yr G-Sec: 0.74; NIFTY 15 yr and abv G-Sec: 0.67).
Table 4 lists the R2 values, and Table 3 gives the multi-linear regression coefficients for the lockdown and COVID-19 parameters of the models whose predictions show relatively high correlation, or R2, against the actual values.
The R2 values of the models are in line with the relative rank obtained by the
Pearson correlation values. NIFTY PSU, NIFTY Composite G-sec, NIFTY Financial
Services, NIFTY 4–8 yr G-sec, NIFTY SERV SECTOR, NIFTY Realty, and NIFTY
Media models have relatively higher correlation with the actual values and hence
can be best estimated with the lockdown and COVID-19 parameters. NIFTY Oil & Gas, NIFTY FMCG, and NIFTY Energy have low correlation, indicating that they are not affected by the events of the COVID-19 pandemic.

Table 3 Linear regression model parameters (coefficients for the total-case, total-death, new-case, and new-death terms; e.g., NIFTY Realty)

Table 4 R2 values of the multi-linear regression models for the equity indices (NIFTY 4–8 yr G-Sec, NIFTY Commodities, NIFTY Composite G-sec, NIFTY Energy, NIFTY Financial Services, NIFTY 15 yr and abv G-Sec, NIFTY Consumption, NIFTY Oil & Gas, NIFTY Pharma, NIFTY Realty)
Figure 3 illustrates the actual NIFTY trend and the linear regression models’
predicted values for various indices. The model is able to accommodate the shock
effects in the initial phases and progressively fits closer to the actual trend for later
phases 3 and 4 of the lockdown for all the indices.
Fig. 3 Predicted and actual NIFTY indices
Analysis of COVID-19 Pandemic and Lockdown Effects …
4 Conclusion
The COVID-19 pandemic and the consequent lockdowns have impacted the NIFTY indices. Short- and long-term government securities and NIFTY Pharma show higher positive correlation with increasing pandemic severity, while NIFTY SERV SECTOR, NIFTY Realty, and NIFTY PSU Bank show negative correlation. Investor sentiment is turning toward stable government securities and the pharma sector due to volatility in other sectors. The discrete event variables and independent variables used in the multi-linear regression models successfully capture the trends in the stock indices. Since the pandemic is still progressing, further extensive modeling of the data would provide clearer insights and predictions.
1. Ahmar, A.S., del Val, E.B.: SutteARIMA: Short-term forecasting method, a case: COVID-19 and stock market in Spain. Science of the Total Environment, vol. 729 (2020)
2. Patil, A., Narayan, P., Reddy, Y.V.: Analyzing the impact of demonetization on the Indian
stock market: Sectoral evidence using GARCH Model. Australasian Accounting, Business
and Finance J. 12, 104–116 (2018). https://doi.org/10.14453/aabfj.v12i2.7
3. https://www1.nseindia.com/products/content/equities/indices/historical_index_data.html
4. Sezer, O.B., et al.: Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Preprint, arXiv:1911.13288v1 (2019)
5. Akita, R., et al.: Deep Learning for Stock Prediction using Numeral and Textual Information:
ICIS (2016)
6. Feng, F., et al.: Enhancing Stock Movement Prediction with Adversarial Training: International
Joint Conference on Artificial Intelligence, pp. 5843–5849 (2019)
7. Ding, X., et al.: Deep learning for event-driven stock prediction. In: International Joint Conference on Artificial Intelligence (2015)
8. Barak, S., Modarres, M.: Developing an approach to evaluate stocks by forecasting effective
features with data mining methods: Expert Systems with Applications. Elsevier 42, 1325–1339
9. Zhang, K., et al.: Stock Market Prediction Based on Generative Adversarial Network
Heidelberg: Procedia Computer, Elsevier, vol. 147, pp. 400–406 (2019)
10. Liang, T., et al.: Adversarial Regression with Multiple Learners: International Conference on
Machine Learning (2018)
11. Cavallo, A.: Inflation with COVID Consumption Baskets. https://ssrn.com/abstract=3622512
12. Tong, L., et al.: Adversarial Regression with Multiple Learners: International Conference on
Machine Learning (2018)
13. World Health Organization, Coronavirus disease (COVID-2019) situation reports. https://www.
who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports (2020)
COVID-19 Detection Using Computer Vision and Deep Convolution Neural Network
V. Gokul Pillai and Lekshmi R. Chandran
1 Introduction
The outbreak of the novel coronavirus disease (COVID-19) has spread widely across many populations around the globe. It is a contagious pneumonia caused by SARS-CoV-2, transmitted from human to human, and it has by now produced a pandemic situation all over the globe. Studies by the WHO state that 16–21% of people with the virus develop severe respiratory symptoms, with a 4–5% mortality rate. The possibility of spreading the virus from infected to non-infected and immune populations is about 3.77% [1]. Hence, it is highly necessary to detect infected individuals early and to apply quarantine and treatment procedures to prevent community spread. The polymerase chain reaction laboratory test is considered the confirmation test for COVID-19 [2]; in our research, we show the capability of CT images for predicting whether patients are affected with corona, and taking the body temperature of the patients further supports our prediction.
The disease can be distinguished in the early stages by the presence of ground-glass blurriness or opacities in the lungs, accompanied by a "crazy paving" pattern and increasing consolidation. These findings led to an increase in CT scans in China, mainly in the Hubei province, eventually becoming an efficient diagnostic tool.
Artificial Intelligence (AI) has been brought into medical imaging through deep learning systems developed for identifying and extracting image features, including shape and spatial relational features, so that it can give assistance to
V. Gokul Pillai (B) · L. R. Chandran
Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Amritapuri, India
e-mail: gokulpillai05@gmail.com
L. R. Chandran
e-mail: lekshmichandran@am.amrita.edu
the doctor in taking a wise decision. The convolutional neural network was initially designed to perform deep learning tasks using the concept of a "convolution": a sliding window or "filter" passes over the input array, identifying essential features, analyzing them one at a time, reducing them to their essential characteristics, and repeating the process until the final representation is obtained. Nowadays, CNNs are even used for enhancing low-light images from videos with a very small amount of training data. There are numerous techniques to identify viral pathogens based on imaging patterns, which are correlated with their particular pathological processes [7, 8]. COVID-19 is characterized by a bilateral distribution of dappled profiles and ground-glass blurriness, as shown in the selected areas in Fig. 2. In this research work, our deep learning model was able to classify the CT images of patients as COVID-19 positive or not with an accuracy of 90% using a deep convolutional neural network. These observations elucidate the use of deep learning to identify and extract radiological graphical features from CT images for better and more efficient COVID-19 diagnosis.
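The sliding-window "filter" idea described above can be made concrete with a small numpy sketch of a valid 2D convolution; the image and kernel here are invented examples, not the paper's code.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a filter over the input array and sum elementwise
    products at each position ('valid' convolution, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A vertical-edge filter applied to a tiny image with a bright band:
# the response peaks where the intensity jumps from 0 to 9.
img = np.array([[0, 0, 9, 9],
                [0, 0, 9, 9],
                [0, 0, 9, 9]], dtype=float)
edge = np.array([[-1.0, 1.0],
                 [-1.0, 1.0]])
fmap = conv2d_valid(img, edge)
```

Stacking many such learned filters, interleaved with pooling, is what lets a CNN progressively reduce an image to its essential characteristics.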
2 Background
Chest radiographs are the most common examinations in radiology for identifying a patient's problem [9], and they are now also used for predicting whether an individual has COVID-19. SVM, CNN, and other machine learning algorithms are used for classifying objects in medical and non-medical fields [10–12]. These classification methods, along with computer vision, can now solve pixel classification issues and improve the appearance of objects in biomedical images [13–15]. Deep learning techniques have already solved many detection problems in identifying various pathology types from chest X-rays, with the best performance achieved using CNNs. X-rays can also identify broken bones in the body, which can be achieved through deep learning [16, 17]. Mitosis detection in breast cancer performed by computer vision and deep convolutional methods reduces the error and improves results. A large number of RGB images were fed into a deep convolutional network, achieving an efficiency of 71.8% in identifying the spot correctly [10].
Convolutional neural networks along with computer vision are used to differentiate between benign and malignant skin lesions in skin cancer. Melanoma classification, seborrheic keratosis classification using CNNs, and the pathogenesis of viral pneumonia using CT feature extraction have been studied [18, 19]. Machine learning, with the support of other tools, can even be used for the detection of various diseases [20, 21].
Chest imaging using computed tomography (CT) and X-ray is considered a potential screening tool due to its high sensitivity and expediency. COVID-19 prediction studies are ongoing with X-ray and CT images using classification methods such as SVM, transfer learning, and other deep learning techniques [22–28]. In the proposed method, the features are extracted using a CNN, classified using K means, and evaluated using fourfold cross-validation on CT images of COVID-19.
3 Methodology of COVID-19 Detection Using CNN
The main purpose of our research is the detection of COVID-19 and the classification of cases into normal, low-risk, and high-risk categories. The proposed CNN-based COVID-19 detection is shown in Fig. 1. For our research, the architecture comprises the following processing steps; classified CT images are shown in Fig. 2.
(a) CT image collection: A pool of chest CT images with corona infection and normal cases is collected [29].
(b) Feature extraction and training: Feature extraction is done by localizing the lung region in the CT image, which is then used for CNN training.
(c) CNN model prediction.
3.1 CNN Model and COVID-19 Detection
The architecture of the M-Inception CNN model for COVID detection is shown in Fig. 1. The CT image is fed into a two-layer convolutional network; the convolutional layers extract the important features present in the input vector, analyzing them one at a time and reducing them to their essential characteristics. After the second max pooling function, we updated the inception model by adding a dropout of 0.5 in the last two dense classification layers; this modification prevents overfitting. The network carries out two major functions: first, the pre-trained inception network converts images into a 1D feature vector; second, the main prediction of whether the patient has COVID-19 is made using K means clustering. The final part is a fully connected network, which classifies a given test CT image into multiple classes.
3.2 Feature Extraction
Feature extraction is done by localizing the lung region of interest in the chest CT images, which reduces the computational complexity. Reference CT images showing the various cases are shown in Fig. 2. From the images, based on the features characterizing infection, the region of interest (ROI) is identified and extracted from the CT images. Each ROI is sized 234 × 234 pixels. Here, 300 ROIs from 503 CT images of COVID-19-positive patients (including both high- and low-risk cases) and 300 ROIs from 503 CT images of COVID-19-negative patients are selected. This supports building a transfer learning neural network based on the inception network.
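The ROI extraction described above amounts to cropping a fixed-size window from the CT slice; the slice dimensions and center coordinates in this sketch are hypothetical, since in practice the window is placed over infection-characteristic lung regions.

```python
import numpy as np

def extract_roi(ct_slice, center_row, center_col, size=234):
    """Crop a size x size region of interest centered on a lung
    location, matching the 234 x 234 ROI used in the paper."""
    half = size // 2
    r0, c0 = center_row - half, center_col - half
    return ct_slice[r0:r0 + size, c0:c0 + size]

# A dummy 512 x 512 CT slice with the ROI taken near its center.
ct = np.zeros((512, 512), dtype=np.float32)
roi = extract_roi(ct, 256, 256)
```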
Fig. 1 M-Inception CNN model for COVID detection
Fig. 2 CT images of CORONA positive and negative
3.3 Model Training
The model is iterated for 400 epochs. A total of 500 ROIs were used to train the model, and 100 ROIs were set aside for validation. Training is carried out by updating the normal inception network and adjusting the updated network with our pre-trained weights. The original inception part was not trained during the training period. By adding our modification to the original model, we were able to reduce the error caused by overfitting.
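The dropout modification described above (rate 0.5 before the last two dense layers) can be illustrated with a small numpy sketch of inverted dropout; this is a stand-in for the mechanism, not the authors' code, and the 2048-dimensional vector is a hypothetical inception feature size.

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, rate=0.5, training=True):
    """Inverted dropout: randomly zero activations at the given rate
    during training and rescale the survivors by 1/(1-rate) so the
    expected activation is unchanged at inference time."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

# Stand-in for the 1D feature vector produced by the inception backbone.
features = np.ones(2048)

# Dropout of 0.5, as applied before each of the last two dense layers.
h = dropout(features, rate=0.5)
```

Because roughly half the units are zeroed on every training step, the dense layers cannot rely on any single feature, which is what reduces overfitting.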
In our research, to identify prominent manifestations of the coronavirus disease, we use unsupervised K means feature clustering. The K means clustering was performed with the optimal number of clusters found using the elbow method; we clustered the predicted data into three clusters, as shown in Fig. 4. The features extracted from the training data are flattened and classified as COVID-19-positive high risk, COVID-19-positive low risk, or COVID-19-negative based on the input CT image. On analyzing the model, we obtained an accuracy of 93.4% on the training data, as shown in Fig. 3.
The K means CT image clustering patterns into normal, low risk, and high risk
are as shown in Fig. 4.
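The elbow-method K means step can be sketched with scikit-learn; the synthetic vectors below stand in for the flattened CNN features, with three artificial groups playing the roles of normal, low-risk, and high-risk cases.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)

# Stand-in for flattened CNN feature vectors of the CT images.
feats = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 8)),
    rng.normal(3.0, 0.3, size=(50, 8)),
    rng.normal(6.0, 0.3, size=(50, 8)),
])

# Elbow method: within-cluster sum of squares (inertia) for k = 1..6;
# the "elbow" in this curve suggests the cluster count.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0)
            .fit(feats).inertia_ for k in range(1, 7)]

# The paper settles on three clusters (normal / low risk / high risk).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats)
```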
3.4 Prediction and Performance Evaluation
Fig. 3 Testing accuracy of CNN model
Fig. 4 K means clustering of Corona cases

After extracting the features from the training set with the CNN, the last step is to classify the images based on the extracted features. By activating the classifiers, the classification accuracy of our model was improved. In this research, the incoming test data is classified by comparing it with previous CT images. On completely unseen test data, our model is capable of classifying the CT images with an accuracy of 90.03%, as shown in Fig. 5.
Fig. 5 Accuracy comparison in testing and prediction
Table 1 K fold validation of the model: accuracy (%)

Cross-validation is used to test the effectiveness of the CNN model for COVID-19 prediction. In the K fold validation (k = 4), the entire testing data is divided into four parts; each part is tested, and the resulting accuracies for this study are given in Table 1.
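The fourfold validation can be sketched with scikit-learn's KFold; the majority-class predictor below is a hypothetical stand-in for the CNN + K means pipeline, used only to make the sketch self-contained.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(100, 1)   # stand-in for 100 CT ROIs
y = np.arange(100) % 2               # stand-in binary labels

accuracies = []
kf = KFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    # The real study evaluates the CNN + K means pipeline here;
    # a majority-class dummy takes its place in this sketch.
    majority = np.bincount(y[train_idx]).argmax()
    acc = float(np.mean(y[test_idx] == majority))
    accuracies.append(acc)
```

Each of the four folds is held out exactly once, so the four accuracies together characterize the model rather than one lucky split.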
4 Conclusion
COVID-19 has created a pandemic situation all around the globe, taking millions of lives. To help diagnose it, we have conducted research showing the importance of computer vision and convolutional neural networks in the medical field for predicting whether a patient has COVID-19, and we obtained an accuracy of 90.4% with our CNN model in classifying the test CT images as COVID-19 positive or negative. The model's predictions can be made more accurate by massive training on different sorts of related CT images of lung infections. This diagnosis, combined with the patient's temperature, can be used to increase the accuracy of COVID-19 prediction.
1. Yang, Y., Lu, Q., Liu, M., Wang, Y., Zhang, A., Jalali, N.: Epidemiological and clinical features
of the 2019 novel coronavirus outbreak in China
2. Bernheim, A.: Chest CT findings in coronavirus disease-19 (COVID-19): Relationship to
duration of infection
3. Wang, D., Hu, B., Hu, C., Zhu, F., Liu, X., Zhang, J.: Clinical characteristics of 138 hospitalized
patients with 2019 novel Coronavirus-Infected Pneumonia in Wuhan, China
4. Chen, N., Zhou, M., Dong, X., Qu, J., Gong, F.: Han epidemiological and clinical characteristics
of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study, Lancet
5. Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y.: Early transmission dynamics in Wuhan,
China, of novel Coronavirus-Infected Pneumonia. In: The New England Journal of Medicine
6. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y.: Clinical features of patients infected
with 2019 novel coronavirus in Wuhan, China in Lancet (2020)
7. Corman, V.M., Landt, O., Kaiser, M., Molenkamp, R., Meijer, A., Chu, D.K.: Detection of
2019 novel coronavirus by real-time RT-PCR. Euro surveillance. In: European communicable
disease bulletin 2020, p. 25 (2020)
8. Chu, D.K.W., Pan, Y., Cheng, S.M.S., Hui, K.P.Y., Krishnan, P., Liu, Y.: Molecular diagnosis
of a novel Coronavirus (2019-nCoV) causing an outbreak of Pneumonia. In: Clinical chemistry
9. Wang, S., Kang, B., Ma, J.: A deep learning algorithm using CT images to screen for Corona
Virus Disease (COVID-19)
10. Shan, F., Gao, Y., Wang, J.: Lung infection quantification of COVID-19 in CT images with
Deep Learning
11. Sethy, P.K., Behera, S.K.: Detection of coronavirus Disease (COVID-19) based on deep
features. https://www.preprints.org/manuscript/202003.0300/v1 (9 March 2020)
12. Gomez, P., Semmler, M., Schutzenberger, A., Bohr, C., Dollinger, M.: Low-light image
enhancement of high-speed endoscopic videos using a convolutional neural network. Med.
Biol. Eng. Comput. 57, 1451–1463 (2019)
13. Choe, J., Lee, S.M., Do, K.H., Lee, G., Lee, J.G., Lee, S.M.: Deep Learning-based image
conversion of CT reconstruction kernels improves radiomics reproducibility for pulmonary
nodules or masses. Radiology
14. Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C.S., Liang, H., Baxter, S.L.: Identifying
medical diagnoses and treatable diseases by image-based deep learning
15. Negassi, M., Suarez-Ibarrola, R., Hein, S., Miernik, A., Reiterer, A.: Application of artificial
neural networks for automated analysis of cystoscopic images: A review of the current status
and future prospects. World J. Urol. (2020)
16. Wang, P., Xiao, X., Glissen Brown, J.R., Berzin, T.M., Tu, M., Xiong, F.: Development and
validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat.
Biomed. Eng.
17. Vezzetti, E., Moreno, R., Moos, S., Tanzi, L.: X-Ray bone fracture classification using Deep
Learning: A baseline for designing a reliable approach
18. Bar, Y., Diamant, I, Wolf, L., Greenspan, H.: Deep learning with non- medical training used
for chest pathology identification
19. Koo, H.J., Lim, S., Choe, J., Choi, S.H., Sung, H., Do, K.H.: Radiographic and CT features of
viral Pneumonia. Radiographics
20. Vignesh, D., Ramachandran, K. I., Adarsh, S.: Pedestrian detection using data fusion of leddar
sensor and visual camera with the help of Machine Learning. Int. J. Pure Appl. Math. 118, 20
21. Krishna Kumar, P., Ramesh Kumar, K., Ramachandran, K. I.: Feature level fusion of vibration
and acoustic emission signals in tool condition monitoring using machine learning classifiers.
Int. J. Prognostics Health Manage. 9(8), 2153–2648 (2018)
22. Behnke, S.: Hierarchical neural networks for image interpretation. LNCS, vol. 2766 Springer,
Heidelberg, p. 412 (2003)
23. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document
recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998). 412
24. Simard, P.Y., Steinkraus, D., Platt, J.C.: Best practices for convolutional neural networks applied
to visual document analysis. In: Seventh International Conference on Document Analysis and
Recognition, pp. 958–963 (2003) 412
25. Cireşan, D.C., Giusti, A., Gambardella, L.M., Schmidhuber, J.: Mitosis detection in breast cancer histology images with deep neural networks
26. Mahbod, A., Schaefer, G., Wang, C., Ecker, R., Ellinger, I.: Skin lesion classification using
hybrid deep neural network
27. Lakshmi Priyadarsini, S., Suresh, M.: Factors influencing the epidemiological characteristics
of pandemic COVID 19: A TISM approach. Int. J. Healthcare Manage. 1–10 (2020)
28. https://doi.org/10.1148/radiol.2020200274
29. Anand Kumar, A., Kumar, C. S., Negi, S.: Feature normalization for enhancing early detection
of Cardiac Disorders. In: IEEE Annual India Conference. INDICON 2016, (2016)
30. Sreekumar, K. T. , George, K.K., Kumar, C., S., Ramachandran, K. I.: Performance enhancement of the machine-fault diagnosis system using feature mapping, normalisation and decision
fusion. IET Sci. Meas. Technol. 13, 1287–1298 (2019)
Prediction of Stock Indices, Gold Index, and Real Estate Index Using Deep Neural Networks
Sahil Jain, Pratyush Mandal, Birendra Singh, Pradnya V. Kulkarni,
and Mateen Sayed
1 Introduction
Investment is important from an individual's perspective: just earning money is not sufficient, since earning requires hard work, so we need the money to work hard as well. This is why we let money increase its value over time. Money lying idle in a bank account is a lost opportunity. Therefore, money should be invested so as to fetch good returns. The investment options can be broadly classified as stocks, gold, and real estate. Stocks are company shares that allow us to participate in a company's growth and earnings; they are offered through stock exchanges and can be bought by individuals. Gold is a precious metal and the most popular investment in the commodities section; the gold market is subject to speculation and is very volatile. Real estate is another category, which an investor can acquire directly by buying commercial and residential properties. Alternatively, one can purchase shares in REITs (Real Estate Investment Trusts) or real estate ETFs, which are traded in the same way as company stocks.
S. Jain (B) · P. Mandal · B. Singh · P. V. Kulkarni
Maharashtra Institute of Technology, Pune, Maharashtra, India
e-mail: sahilj310@live.com
P. Mandal
e-mail: pratyush.m99@gmail.com
B. Singh
e-mail: birendrasingh1123@gmail.com
P. V. Kulkarni
e-mail: pradnya.kulkarni@mitpune.edu.in
M. Sayed
Persistent Systems Limited, Pune, Maharashtra, India
e-mail: mateen_sayed@persistent.com
2 Related Work
For stock price prediction, Verma et al. [1] proposed an LSTM model that showed the importance of news data and how it impacts the prices of stock indices. Selvin et al. [2] illustrated time-series forecasting on the stock prices of three different NSE-listed companies using three architectures, LSTM, RNN, and sliding-window CNN, with the CNN producing the best results. Shah et al. [3] proposed an ARIMA and LSTM-based model using news data and a predefined dictionary to obtain a directional accuracy of 70%; according to the polarity score of the news, a decision to buy, sell, or hold the stock is taken. Mankar et al. [4] exhibited the use of social media sentiment from Twitter to predict stock prices. Sentiment scores of the collected tweets were fed to SVM and Naïve Bayes classifiers separately. The limitation was that only recent tweets were available, so an exhaustive historical analysis was not possible.
Suranart et al. [5] suggest a correlation between commodities and the sentiment around them. The authors used the SentiWordNet library to quantify the emotions expressed in text on a weekly basis. The results observed after training the neural network show that publicly expressed sentiments impact the price movement. Liu et al. [6] developed and compared gold price prediction using historical data with support vector regression and ANFIS models over a period of 8 years. Dubey et al. [7] studied and analyzed neural networks, radial basis functions, and support vector regression for the period between June 2008 and April 2013. Keshawn Kunal et al. [8] observed multiple factors affecting the movement of gold prices and concluded that a combination of the Dow Jones Industrial Average (DJIA) and the Standard & Poor's 500 index, fed into a random forest model, reveals a favorable forecasting capability.
Tabales et al. [9] built an ANN model that showed better results than other hedonic models in solving the common problems related to non-linearity in extreme price ranges and the effect of outliers.
The above works have considered a multitude of factors and models to predict indices, but there is no evidence of combining multiple factors into a single architecture. Our work combines historical factors, news-related factors, absolute prices, and index movements into a single architecture to achieve better performance.
3 Data Preprocessing and Feature Extraction
The data used for the proposed model is collected from two sources.
Prediction of Stock Indices, Gold Index, and Real Estate Index …
3.1 Numeric Historic Data
Previous historic index values are downloaded via the Yahoo API for the period between 2008 and the current date. The required attributes for each day, i.e., opening, closing, and average, are kept, and the other attributes are discarded. Rows with missing values are also removed from the dataset. To predict the trend, the movement for each day is calculated as avg_current − avg_previous. All data values are normalized to the range 0–1.
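A sketch of this preprocessing in pandas; the `Open`/`Close` column names are assumptions about the downloaded data rather than the authors' exact schema:

```python
import pandas as pd

def preprocess_index(df: pd.DataFrame) -> pd.DataFrame:
    """Keep open/close/average, drop missing rows, compute the daily
    movement, and min-max normalize every column to the range [0, 1]."""
    df = df[["Open", "Close"]].dropna().copy()
    df["Average"] = (df["Open"] + df["Close"]) / 2
    # Trend signal: avg_current - avg_previous (the first row has no previous day)
    df["Movement"] = df["Average"].diff()
    df = df.dropna()
    # Min-max normalization, applied column by column
    return (df - df.min()) / (df.max() - df.min())
```

Note that the scaling is per column, so each attribute independently spans 0–1 after preprocessing.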
3.2 Textual News Data
News headlines are collected using a custom web scraper. CountVectorizer (n-grams) and Doc2Vec models are used to represent the textual features in the form of vectors.
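A dependency-free sketch of the bag-of-n-grams counts that CountVectorizer with `ngram_range=(1, 2)` would produce; whitespace tokenization is a simplifying assumption:

```python
from collections import Counter
from typing import List

def ngram_counts(text: str, n_range=(1, 2)) -> Counter:
    """Count unigrams and bigrams over whitespace tokens for one headline."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_range[0], n_range[1] + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

def vectorize(headlines: List[str]) -> List[List[int]]:
    """Map each headline to a fixed-length count vector over a shared vocabulary."""
    per_doc = [ngram_counts(h) for h in headlines]
    vocab = sorted(set().union(*per_doc))
    return [[c[term] for term in vocab] for c in per_doc]
```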
3.3 Flow Diagrams for Data Preprocessing and Feature Extraction
See Fig. 1.
4 Prediction Models
Regression is the method of determining a relationship between a dependent variable and one or more independent variables such that the value of the dependent variable can be estimated by applying some function to the independent variables.
It is possible to perform regression-based tasks using various machine learning
techniques such as support vector machines, linear regression, decision trees, and
Fig. 1 Fetching dataset and feature extraction
S. Jain et al.
neural networks. During the training phase, the training data is fed to a machine learning algorithm, which thus "learns" the mapping between input and output variables. After training, the model's performance must be evaluated on a completely independent set of data, known as the testing data. The predictions for the dependent variable made by the model from the independent variables of the testing data are compared with the actual or expected values of the dependent variable to evaluate the performance of the model.
In our experiment, we train two different models, namely a support vector machine
and a deep neural network, to predict future prices of stock indices.
4.1 Support Vector Machine
Support vector machine works by fitting a hyper-plane between a set of points from
the training data mapped into a multi-dimensional space. In support vector regression,
a function is learned which maps input values to real numbers. The best hyper-plane
is the one in which maximum points lie within the decision boundary. The testing is
done on the data of recent one year, and the rest of the data is used for training. For
each training or testing example, a time window of three days was used.
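The windowing described here, with the most recent year held out for testing, can be sketched as follows; the three-day window comes from the text, while treating a "year" as roughly the last 250 trading days is an assumption. The resulting (X, y) pairs would then be fed to a regressor such as scikit-learn's SVR:

```python
import numpy as np

def make_windows(series, window=3):
    """Turn a price series into (X, y) pairs: the previous `window`
    days predict the next day's value."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X, y

def train_test_split_by_time(X, y, test_size):
    """Hold out the most recent `test_size` samples for testing and
    train on everything before them, preserving time order."""
    return X[:-test_size], X[-test_size:], y[:-test_size], y[-test_size:]
```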
4.2 Deep Neural Network
A deep neural network is a machine learning technique in which a model consists of an input layer, an output layer, and multiple hidden layers, each composed of multiple nodes. The output of one layer is passed as the input to the next layer. The output of a layer is calculated from parameters such as weights and biases and by applying activation functions. The output of the last layer is compared with the expected output to calculate the error, which is used to adjust the parameter values across the entire network. A neural network is thus capable of capturing complex patterns in data such as speech, images, and text.
A convolutional neural network (CNN) uses a special operation called convolution, which is helpful for capturing spatial or temporal features of an input. CNNs are generally used for image data but can be applied to time-series data as well. A long short-term memory (LSTM) layer is used to capture attributes of sequential data, such as human speech or daily weather readings.
In our experiment, we create a neural network with CNN layers at the beginning, followed by LSTM layers and fully connected layers. The output is a single value that indicates the future price of the index. The training and testing sets are prepared in the same fashion as for the SVR.
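The described stack, CNN layers first, then LSTM, then fully connected layers ending in a single price output, could be sketched in Keras as below; every layer width and kernel size here is an illustrative assumption, not the authors' exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(window: int = 3, features: int = 1) -> tf.keras.Model:
    """CNN -> LSTM -> fully connected sketch producing one price value."""
    model = models.Sequential([
        layers.Input(shape=(window, features)),
        layers.Conv1D(32, kernel_size=2, padding="same", activation="relu"),
        layers.Conv1D(32, kernel_size=2, padding="same", activation="relu"),
        layers.LSTM(32),
        layers.Dense(16, activation="relu"),
        layers.Dense(1),  # predicted future index value
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```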
Table 1 MAPE values by index and model

Index               SVM (%)    Neural network (%)
Stock index
Gold index
Real estate index
5 Results
We trained and tested our model separately for the stock index, gold index, and housing price index, and each index was evaluated using both the SVM and the neural network model. The predictions generated by each model were compared with the actual prices to evaluate its performance. The metric used in our experiment is the mean absolute percentage error (MAPE).
MAPE is calculated by taking the average of the percentage deviation of the predicted price from the actual price for each index. MAPE can be written mathematically as

M = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \quad (1)

where A_t and F_t are the actual and predicted (forecast) values for sample number t, n is the total number of samples, and M denotes the MAPE (Table 1).
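Eq. (1) translates directly into a short helper, here reported as a percentage to match Table 1:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, eq. (1): the average of
    |A_t - F_t| / A_t over all samples, expressed as a percentage."""
    assert len(actual) == len(forecast) and len(actual) > 0
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)
```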
5.1 Visualization of Prediction Results
See Figs. 2, 3 and 4.
6 Conclusion
This paper attempts to make contributions in the research and development of techniques in prediction and analysis of various indices representing different investment
options such as stocks and gold aiding an individual in the decision making. For this
purpose, have used machine learning models which try to find patterns among the
previous historical prices and news sentiments and map it to the future price value. A
major milestone achieved in our work is the integration of historical as well as news
factor which has led to better identification of sudden changes in the future values.
It can be deduced from the proposed implementation and dataset used that neural
network models perform better than the SVM regression models due to the better
mapping of non-linearity and hidden features in the textual news data in the layers
of neural network architecture. The MAPE values corresponding to neural networks
Fig. 2 Prediction results for stock index
Fig. 3 Prediction results for gold index
model is ~1%, indicating a high degree of learning and hence better predictions. As future work, a model can be implemented that takes as input broad macroeconomic factors such as inflation and GDP, and company-related factors (for stocks) such as profit and loss. One observed challenge is that predicted prices lag behind actual prices at certain instances due to the high correlation between the previous day's price and the predicted price. Our model can be further improved in this respect to reduce prediction error and generate better results for the end user.
Fig. 4 Prediction results for housing index
References
1. Verma, I., Lipika, D., Hardik, M.: Detecting, quantifying and accessing impact of news events
on Indian stock indices. In: Proceedings of the International Conference on Web Intelligence,
ACM (2017)
2. Selvin, S., et al.: Stock price prediction using LSTM, RNN and CNN-sliding window model. In:
2017 International Conference on Advances in Computing, Communications and Informatics
(ICACCI). IEEE (2017)
3. Shah, D., Haruna, I., Farhana, Z.: Predicting the effects of news sentiments on the stock market.
In: 2018 IEEE International Conference on Big Data (Big Data). IEEE (2018)
4. Mankar, T., et al.: Stock market prediction based on social sentiments using machine learning. In: 2018 International Conference on Smart City and Emerging Technology (ICSCET). IEEE (2018)
5. Suranart, K., Kiattisin, S., Leelasantitham, A.: Analysis of comparisons for forecasting gold price using neural network, radial basis function network and support vector regression. In: The 4th Joint International Conference on Information and Communication Technology, Electronic and Electrical Engineering (JICTEE). IEEE (2014)
6. Liu, D., Li, Z.: Gold price forecasting and related influence factors analysis based on random
forest. In: Proceedings of the Tenth International Conference on Management Science and
Engineering Management. Springer, Singapore (2017)
7. Dubey, A.D.: Gold price prediction using support vector regression and ANFIS models. In: 2016
International Conference on Computer Communication and Informatics (ICCCI). IEEE (2016)
8. Keshwani, K., Agarwal, P., Kumar, D.: Prediction of market movement of Gold, Silver and Crude
Oil using sentiment analysis. Advances in Computer and Computational Sciences. Springer,
Singapore, pp. 101–109 (2018)
9. Tabales, J.M.N., Caridad, J.M., Carmona, F.J.R.: Artificial neural networks for predicting real estate price. Revista de Métodos Cuantitativos para la Economía y la Empresa 15, 29–44 (2013)
Signal Strength-Based Routing Using
Simple Ant Routing Algorithm
Mani Bushan Dsouza and D. H. Manjaiah
1 Introduction
Ant colony optimization (ACO) [1] uses the foraging behavior of ants to determine the optimal path between communicating nodes. In these algorithms, the optimal solution to the routing problem is found through the cooperative actions of every node involved in communication, with control packets acting as mobile agents during the route discovery and maintenance phases. Simple Ant Routing Algorithm (SARA) is an ACO protocol that detects congestion across the path and reduces overhead.
The signal strength between communicating nodes depends on the distance between them. As the distance between two nodes across a link increases, the signal strength decreases. By constantly monitoring the signal strength, it is possible to predict whether the nodes are moving toward or away from each other: if the signal strength decreases continuously, the nodes are moving apart. As the nodes move away from each other, the link between them breaks, resulting in packet loss as well as extra overhead for repairing or establishing a new path. We can set a threshold value for the signal strength between two nodes communicating across a link; when the measured signal strength drops below this threshold, the route repair process can be initiated to avoid data loss. In this work, the SARA protocol is taken as the base, and the signal strength is used as the deciding factor for changing the pheromone concentration and triggering the route repair process. The algorithm is simulated using NS2, and the modified algorithm is found to provide better throughput and reduced delay during transmission.
M. B. Dsouza (B) · D. H. Manjaiah
Mangalore University, Mangaluru, Karnataka, India
e-mail: mani_bushan@hotmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
2 Related Work
Swarm intelligence (SI) is a technique of achieving optimization through the collective behavior of self-organized simple agents which locally interact with each other
to collectively achieve a goal in a decentralized environment [2, 3]. Ant colony optimization (ACO) is a category of SI used to find optimal routes in ad hoc networks [4]. AntNet is one of the first routing algorithms based on ACO. In
this algorithm, forward ants (FANT) are sent proactively toward random destinations
and backward ants (BANT) are used to reply back to the source from the destination.
During the traversal, BANT are used to update the routing table at intermediary nodes
[5]. ARA [6] is another routing algorithm that has similar behavior as that of AntNet.
It uses sequence numbers to avoid duplication of FANTs. It reduces the overhead
by using data packets for route maintenance rather than periodic ants. Every node
tries to repair the link failure and inform the neighbors about the link failure. If a
node fails to recover from the link failure, a route error is sent back to the source
and the source will initiate a fresh route discovery once again. In an improvement,
PERA [7] uses three ant agents. Its FANT and BANT behave as before, while the third type, called uniform FANTs, is used to reinforce the discovered routes and helps avoid congestion along them. As a further improvement, ANT-E [8] uses expanding ring search [9] to limit the
local retransmission and achieves a better packet delivery ratio. SARA [10] follows
a controlled neighbor broadcast of FANT. Here, only one of the neighboring nodes
is allowed to broadcast the FANT further. Asymmetry in packet transmission occurs when traffic flows more in one direction than in the other. This causes the pheromone concentration in one direction of the link to deplete faster than in the
other. SARA uses super FANT to balance the pheromone concentration across the
link. In an ACO routing protocol, namely AntHocNet [11], six types of ants are
used. These are proactive FANTs and BANTs, reactive FANTs and BANTs and
repair FANTs and BANTs. Reactive ants are used to set up the routes and proactive
ants are used to maintain the routes. To handle route failure, repair ants are used.
Location of the nodes plays an important role in real-life applications, and it
can be used to achieve further optimization. Determining node coordinates that are consistent with those of neighboring nodes is challenging [12]. There are many
techniques used for estimating locations of nodes. These include acoustic methods
[13], based on directional antenna or antenna array [14, 15], by using infrared [16]
and by measuring received signal strength (RSS) [17]. POSition based ANT colony
routing protocol (POSANT) [18] combines ACO with location information to reduce
the time taken to establish the route and minimize the control packets. This algorithm
assumes that every node knows the position of itself and its neighbors. Enhanced
multi-path dynamic source routing (EMP-DSR) algorithm [19] is an enhancement of
earlier algorithm MP-DSR [20]. It takes into consideration both link reliability and
time consumed by the FANT while selecting the path. In EMP-DSR [21], a local repair scheme is used wherein, instead of changing the whole path, adjacent nodes are used for re-routing. Here, promiscuous mode is used to search for the nearest adjacent
node. However, this consumes more energy and incurs overhead, thereby decreasing efficiency. The algorithm suffers from high congestion, and it does not properly handle the issue of broken links due to node failure. GSM [22] is another protocol that extends
DSR. This protocol uses generalized salvaging and cache update of intermediate node
during route request phase.
3 Proposed Solution
An ad hoc network is an active wireless reconfigurable network in which every
node acts as a router and cooperatively participates in routing activity. Arbitrary
movements of the nodes often break the link between the nodes which leads to either
route repair or fresh route discovery. As it is not possible to avoid link breaks due
to random movement of the nodes, we can predict when the link is going to break
and take alternate action before it really happens. By measuring the received signal
strength (SS), we can predict the link breakage. Signal strength [23] of a node i
situated at a distance x can be calculated as in (1),
SS_i(x) = \frac{G_r \cdot G_t \cdot S_t}{(4\pi x / \lambda)^2} \quad (1)
Here, G_r and G_t are the gains of the receiving and transmitting antennas, λ is the wavelength of the electromagnetic wave used for transmission, and S_t is the maximum transmitting power of the transmitting antenna. Assuming the antenna has a circular coverage area of radius R, the average distance between any two mobile nodes is given as 0.9054R
[24]. The threshold value of signal strength [23] for a given link i can be calculated
using the expression (2).
TSS_i = \frac{G_r \cdot G_t \cdot S_t}{(4\pi \cdot 0.9054R / \lambda)^2} \quad (2)
For a given node, its receiving antenna gain G_r, coverage range R, and wavelength λ are known. Every node sends a HELLO packet to its neighboring nodes containing its transmitting antenna gain G_t and maximum transmitting power S_t. Using this, a node computes the threshold TSS_i for a given link. It can be observed that the threshold value does not depend on the position of the node; it is fixed for a given neighboring node. By comparing the received signal strength from Eq. (1) with the threshold value from Eq. (2), we can compute the Signal Strength Metric (SSM).
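Eqs. (1) and (2) translate directly to code; by construction, the received strength at the average distance 0.9054R equals the threshold:

```python
import math

def received_ss(gr, gt, st, x, lam):
    """Eq. (1): received signal strength at distance x (free-space model)."""
    return (gr * gt * st) / (4 * math.pi * x / lam) ** 2

def threshold_ss(gr, gt, st, R, lam):
    """Eq. (2): threshold signal strength, evaluated at the average
    inter-node distance 0.9054*R for a circular coverage area of radius R."""
    return received_ss(gr, gt, st, 0.9054 * R, lam)
```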
For a given link that connects two nodes, say i and j, if RSS_i(x) < TSS_i, then the pheromone concentration for that link is reduced by γ according to condition (3).
ph(i, j) = \begin{cases} ph(i, j) - \gamma, & ph(i, j) > \gamma \\ 0, & ph(i, j) \le \gamma \end{cases} \quad (3)
By reducing the pheromone level, the probability of selecting this link for communication is reduced, which can help sustain the route for a longer time. There are two cases in which the route repair process is activated. Whenever RSS_i(x) ≤ 0.5·TSS_i, the link is assumed to be broken; for an actively used link, the route repair process is activated, whereas if the link is not being used for communication, the route entry for that link is removed from the routing table.
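The pheromone update of condition (3) and the two repair cases can be sketched as follows; the zero floor on the pheromone value follows the piecewise form of (3):

```python
def update_pheromone(ph, gamma):
    """Condition (3): reduce the pheromone by gamma when the received
    signal strength drops below the threshold, never going negative."""
    return ph - gamma if ph > gamma else 0.0

def link_action(rss, tss, link_active):
    """Route-maintenance decision: repair an active link once RSS falls
    to half the threshold; drop the route entry for an unused link."""
    if rss <= 0.5 * tss:
        return "repair" if link_active else "remove_route_entry"
    return "keep"
```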
The status of the link can also be evaluated based on the number of successful
and unsuccessful transmission across the link. Whenever a successful transmission
occurs across a link, the packet transmission value is decreased by δ; similarly, for an unsuccessful transmission, it is increased by λ. This can be formulated as (4):
NTx_i = \begin{cases} NTx_i - \delta, & \text{for successful transmission} \\ NTx_i + \lambda, & \text{for unsuccessful transmission} \end{cases} \quad (4)
When the value of NTx_i for a given link exceeds the maximum number of transmission attempts (MAX_Tx), the link is assumed to be problematic and the route repair process is activated. The deep search algorithm (DSA) [25], a modified form of expanding ring search (ERS) [26] with the TTL value set to 2, is used during route repair. In DSA, a Repair FANT (R_FANT) is broadcast and a Route Repair Timer (RRT) is set. When the R_FANT reaches a node with a valid route to the destination, that node unicasts a Repair BANT (R_BANT) to the sender. However, if the node is unable to repair the route within the RRT, an error message is sent back to the source.
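The link-health counter of eq. (4) and the MAX_Tx trigger can be sketched as follows; δ = 1 and MAX_Tx = 5 mirror the simulation parameters in Sect. 4, while λ = 1 and clamping the counter at zero are assumptions:

```python
MAX_TX = 5  # maximum transmission attempts before a link is deemed problematic

def update_ntx(ntx, success, delta=1, lam=1):
    """Eq. (4): decrease the counter by delta on a successful transmission,
    increase it by lam on an unsuccessful one (clamped at zero, an assumption)."""
    return max(0, ntx - delta) if success else ntx + lam

def needs_repair(ntx):
    """Trigger the DSA-based route repair once NTx exceeds MAX_Tx."""
    return ntx > MAX_TX
```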
4 Result Analysis
Simulation of the algorithm was carried out using the NS 2.34 simulator over a square area of 1000 × 1000 m² for 100 s. A pause time of 20 s was maintained, with the number of nodes varied over 10, 30, 50, 70, 90, and 120. The other parameter values were as follows: T0 = T1 = 100 ms, F = 5, τ = 1 s, δ = 1, α = 0.7, γ = 1, MAX_Tx = 5, frequency = 2.48 GHz, node initial energy = 20 J, Tx = 0.003 J, and Rx = 0.001 J. Two protocols, namely SARA and Signal Strength-based Simple Ant Routing Algorithm (SS-SARA), were simulated, and the results tabulated. It was observed that the modification indeed provides better throughput as the number of nodes increases.
Due to the random motion of the nodes, they may move away from each other,
resulting in link failure. As the nodes move far away from each other, their signal
strength reduces, and when the strength is lower than the threshold value, route repair
is initiated. In a sparse environment, where the number of nodes is less, route repair
Fig. 1 Variation of normalized routing load with number of nodes
may not be effective and the source may be forced to initiate a fresh discovery once again, causing extra control packets to float across the network. This is evident in Fig. 1, which shows higher control overhead with fewer nodes; as the number of nodes increases, route repair becomes more effective, resulting in reduced overhead.
As congestion across nodes is not separately considered, there is no significant difference between the delays of the two protocols. However, it can be observed that SS-SARA takes slightly more time to deliver packets; this may be attributed to the fact that SS-SARA predicts link breakage and tries to resolve the problem locally. Such an action can lead to additional packets being generated and may cause congestion that results in queuing of packets. This is evident from Fig. 2.
As there is less chance of link failure in SS-SARA, it is able to deliver more packets. This is due to the precaution taken in SS-SARA, which initiates the route repair routine even before an actual route failure occurs, as shown in Fig. 3.
Lower delay and less packet loss in a moderately dense network lead to higher throughput, as is evident from Fig. 4.
5 Conclusion
Maintaining a stable path for a prolonged time in a MANET is challenging due to the rapid, random movement of nodes. As the nodes move away from each other, their signal strength decreases, and when it goes below a threshold level, we can
Fig. 2 Variation of end-to-end delay with increasing number of nodes
Fig. 3 Variation of PDR with increasing number of nodes
Fig. 4 Variation of throughput with increasing number of nodes
anticipate that the link will fail. Thus, before a link failure occurs, we can repair the
route and select an alternate path. This way we can sustain the path for a longer time.
In the proposed protocol, the received signal strength is used as the trigger for activating route repair. Simulation of the algorithm indicates that the modification does provide better packet delivery and a decent increase in network throughput. From this, we conclude that received signal strength can be used effectively to provide better packet delivery and throughput in a moderately dense network, albeit at the cost of increased delay and routing overhead.
References
1. Correia, F., Vazão, T.: Simple ant routing algorithm strategies for a (Multipurpose) MANET
model. Ad Hoc Netw. 8, 810–823 (2010). https://doi.org/10.1016/j.adhoc.2010.03.003
2. Di Caro, G., Dorigo, M.: AntNet: Distributed Stigmergetic Control for Communications
Networks. J Artif Intell Res 9, 317–365 (1998). https://doi.org/10.1613/jair.530
3. Chatterjee, S., Das, S.: Ant colony optimization based enhanced dynamic source routing algorithm for mobile Ad-hoc network. Inf Sci (Ny) 295, 67–90 (2015). https://doi.org/10.1016/j.
4. Bettstetter, C., Hartenstein, H., Pérez-Costa, X.: Stochastic Properties of the Random Waypoint
Mobility Model. Wirel Networks 10, 555–567 (2004). https://doi.org/10.1023/b:wine.000003
5. Günes M, Spaniol O, Kähmer M (2003) Ant-routing-algorithm (ARA) for mobile multi-hop
ad-hoc networks-new features and results. Conf: the Med-Hoc Net 2003. 9–20
6. Dorigo M, Stützle T (2003) The Ant Colony Optimization Metaheuristic: Algorithms,
Applications, and Advances. Handb. Metaheuristics 250–285
7. Kordon AK (2009) Swarm Intelligence: The Benefits of Swarms. Appl. Comput. Intell. 145–
8. Anderson, C.: Swarm Intelligence: From Natural to Artificial Systems. Eric Bonabeau, Marco
Dorigo. Guy Theraulaz. Q Rev Biol 76, 268–269 (2001). https://doi.org/10.1086/393972
9. Di Caro G, Dorigo M Mobile agents for adaptive routing. Proc. Thirty-First Hawaii Int. Conf.
Syst. Sci.
10. Baras J, Mehta H (2003) A probabilistic emergent routing algorithm for mobile ad hoc networks.
WiOpt03 Model. Optim. Mob.
11. Sethi S, Udgata SK (2010) An Efficient Multicast Hybrid Routing Protocol for MANETs.
Commun. Comput. Inf. Sci. 22–27
12. Pu, I.M., Shen, Y., (2009) Enhanced Blocking Expanding Ring Search in Mobile Ad Hoc
Networks. : 3rd Int. Conf. New Technol. Mobil, Secur (2009)
13. Di Caro, G., Ducatelle, F., Gambardella, L.M.: AntHocNet: an adaptive nature-inspired algorithm for routing in mobile ad hoc networks. Eur Trans Telecommun 16, 443–455 (2005).
14. Navarro-Alvarez E, Siller M, O’Keefe K (2013) GPS-Assisted Path Loss Exponent Estimation
for Positioning in IEEE 802.11 Networks. Int J Distrib Sens Networks 9:912029. https://doi.
15. Kushwaha M, Molnar K, Sallai J, et al Sensor node localization using mobile acoustic beacons.
IEEE Int. Conf. Mob. Adhoc Sens. Syst. Conf. 2005.
16. Niculescu, D., Nath, B.: Position and orientation in ad hoc networks. Ad Hoc Netw. 2, 133–151
(2004). https://doi.org/10.1016/s1570-8705(03)00051-9
17. Kułakowski, P., Vales-Alonso, J., Egea-López, E., et al.: Angle-of-arrival localization based
on antenna arrays for wireless sensor networks. Comput Electr Eng 36, 1181–1186 (2010).
18. Want, R., Hopper, A., Falcão, V., Gibbons, J.: The active badge location system. ACM Trans
Inf Syst 10, 91–102 (1992). https://doi.org/10.1145/128756.128759
19. Bahl P, Padmanabhan VN RADAR: an in-building RF-based user location and tracking system.
Proc. IEEE INFOCOM 2000. Conf. Comput. Commun. Ninet. Annu. Jt. Conf. IEEE Comput.
Commun. Soc. (Cat. No.00CH37064)
20. Kamali, S., Opatrny, J., (2007) POSANT: A Position Based Ant Colony Routing Algorithm
for Mobile Ad-hoc Networks. : Third Int. Conf. Wirel. Mob, Commun (2007)
21. Asl, E.K., Damanafshan, M., Abbaspour, M., (2009) EMP-DSR: An Enhanced Multi-path
Dynamic Source Routing Algorithm for MANETs Based on Ant Colony Optimization. , et al.:
Third Asia Int. Conf. Model, Simul (2009)
22. Leung R, Liu J, Poon E, et al MP-DSR: a QoS-aware multi-path dynamic source routing
protocol for wireless ad-hoc networks. Proc. LCN 2001. 26th Annu. IEEE Conf. Local Comput.
23. Aissani M, Senouci MR, Demigna W, Mellouk A (2007) Optimizations and Performance Study
of the Dynamic Source Routing Protocol. Int. Conf. Netw. Serv. (ICNS ’07)
24. Sung D-H, Youn J-S, Lee J-H, Kang C-H (2005) A Local Repair Scheme with Adaptive
Promiscuous Mode in Mobile Ad Hoc Networks. Lect. Notes Comput. Sci. 351–361
25. Hassan J, Jia S Performance analysis of expanding ring search for multi-hop wireless networks.
IEEE 60th Veh. Technol. Conf. 2004. VTC2004-Fall. 2004
26. Correia, F., Vazão, T., Lobo, V.J., (2009) Models for Pheromone Evaluation in Ant Systems
for Mobile Ad-hoc Networks. : First Int. Conf. Emerg. Netw, Intell (2009)
Fake News Detection Using
Convolutional Neural Networks
and Random Forest—A Hybrid
Hitesh Narayan Soneji and Sughosh Sudhanvan
1 Introduction
The rapid spread of unwanted, unsolicited news, rumors, etc., has increased exponentially over the years due to the boom in communication and technology making the
Internet more easily accessible to the public. Furthermore, due to the lack of proper
verification of the author of the news article, anyone on the web can write a fake
article, and due to easy access by the public, the news spreads amazingly fast. The
motive to spread fake news articles can be due to various factors such as political,
monetary, and revenge. According to [1], in six weeks around the time of the 2016
presidential election, research suggests that as many as 25% of Americans visited a
fake news website.
Most US adults (62%) depend on news primarily sourced from social media. 66%
of Facebook users, 59% of Twitter users, and 70% of Reddit users depend on the
subsequent platforms for their news [2].
Increased awareness of this type of fake article began during the 2016 United States presidential election, which led to many controversial news articles whose spread was purely for political gain. There have also been many documented incidents around the world where fake news created riots, mob lynchings, etc. Another reason for the increase in fake news articles besides
the traditional news media is the huge presence of social media sites and people
sharing those fake articles without verifying them. Social media sites like Facebook,
Instagram, Twitter, and WhatsApp have given a huge boost to the spread of these
fake articles.
H. N. Soneji (B) · S. Sudhanvan
MPSTME, NMIMS University, Mumbai, India
e-mail: hiteshsoneji25@gmail.com
S. Sudhanvan
e-mail: bssughosh27@gmail.com
A need to bifurcate news articles into true or false has become pressing: the spread of fake news articles has created many problems, and the general public cannot easily differentiate between fake and true articles because of factors such as writing style and word choice that make an article feel real. Human fact-checking is also slow and not cost-effective. Hence, much research has been carried out to classify articles using automated methods and models based on machine learning (ML), deep learning (DL), and natural language processing (NLP).
In this paper, we propose a method using convolutional neural networks (CNN), random forest (RF), and the PhishTank API [3]. The model is divided into three phases. The first phase is URL detection: the article text body is searched for URLs, which are then verified and classified as phishing or non-phishing using the PhishTank API. The second phase consists of data pre-processing, where the data is cleaned and shaped into the required format. In the last phase, CNN and RF classifiers are trained and used to classify articles as fake or true.
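Phase 1 can be sketched as below. The regex and the `is_phishing` callable are illustrative stand-ins: the actual PhishTank lookup is an HTTP query whose details are not shown here.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")

def extract_urls(article_text: str):
    """Pull every URL out of an article body so each one can be
    checked against the phishing database."""
    return URL_PATTERN.findall(article_text)

def article_has_phishing_link(article_text, is_phishing) -> bool:
    """`is_phishing` is a hypothetical callable (URL -> bool) standing
    in for the real PhishTank lookup."""
    return any(is_phishing(u) for u in extract_urls(article_text))
```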
The remainder of the paper is organized as follows: Sect. 2 discusses related work in this field. In Sect. 3, we discuss the dataset and our proposed model: the PhishTank API, data pre-processing, and the classifiers used for evaluation. Section 4 presents the result analysis of the proposed model, and Sect. 5 concludes with future work.
2 Related Work
Granik et al. [4] used a dataset of Facebook news posts from BuzzFeed News; shares, comments, and reactions were also taken into consideration, and posts were classified as fake or true using Naive Bayes with a bag-of-words representation, which gave the authors an accuracy of 74%. Kaliyar [5] worked along the same lines but used a CNN model instead of Naive Bayes, with term frequency–inverse document frequency (tf–idf) for pre-processing, and achieved an accuracy of 91.3%.
In [6, 7], different machine learning classifiers were compared using tf–idf for pre-processing: in Katsaros et al. [6], CNN gave better accuracy than the other models, while in Poddar et al. [7] the support vector machine (SVM) performed best. Lin et al. [8], on a similar basis, compared different machine learning models over 134 different features, among which XGBoost gave the best accuracy of 86.6%.
Han and Mehta [9] have compared machine learning and deep learning models on
three features, normalized frequency, bi-gram tf–idf, and the union of the above two
for training the model. While the machine learning models performed poorly, CNN
and Recurrent Neural Network (RNN) gave accuracies above 80%.
Kim and Jeong [10] tried to detect Korean fake news using a fact database built from humans' direct judgments of collected facts. They used Bidirectional Multi-Perspective Matching for Natural Language Sentences (BiMPM) as the deep learning model, but its accuracy decreases as sentence length increases; it achieved an accuracy of 88% for true news and 47% for false.
Al-Ash et al. [11] detected fake news by combining term frequency, inverse document frequency, and reversed frequency with tenfold cross-validation using a support vector machine classifier, which gave an accuracy of 96.74% across 2561 articles. Reddy et al. [12] used a hybrid approach called a multinomial voting algorithm, which gave an accuracy of 94%, whereas most machine learning algorithms achieve accuracies around 80%.
Girgis et al. [13] used the LIAR dataset with RNN and long short-term memory (LSTM) deep learning models to classify articles as fake or not fake; the RNN model gave the best accuracy. Kong et al. [14] created n-gram vectors from the text and applied deep learning models to them; the best model gave an accuracy of 97%. Verma et al. [15] used the FastText model for word embeddings and LSTM along with RNN for classification, which gave them a high accuracy of 94.3%.
On evaluating the different models used in [16–18], Hlaing and Kham [16] mention decision tree (DT) and RF as giving the best accuracies, Shabani and Sokhn [17] used logistic regression (LR), SVM, and a neural network (NN), which gave high accuracies, while Harjule et al. [18] showed that RNN gave the highest accuracy. Refs. [19–21] analyze the different approaches taken for fake news classification, among which Lahlou et al. [19] list the linguistic and network approaches as best suited, Mahid et al. [20] propose a new model for classification, and Manzoor et al. [21] discuss the different classifiers.
3 Dataset
We have used a dataset that serves our need to classify news articles. We obtained the dataset, named Fake-News, from Kaggle.com [22]. The news articles are labeled with 0 or 1 to identify whether each article is true or false. The dataset was part of a Kaggle challenge and is given in CSV format. The news articles in the list cover different categories, such as political and business. The contents of the dataset are described in Table 1, and the accuracies obtained for the different combinations of models and metadata are given in Table 2.
A total of 20,800 news article entries are found in the dataset. We split the dataset into training and testing sets: for CNN (title metadata), the split is 30% for training and 70% for testing; for CNN (text metadata), the split is 50% each for training and testing; and for RF (author), the split is likewise 50% each for training and testing.

Table 1 Description of the dataset columns
id: Unique ID for the news article
title: The title of the news article
author: The author of the news article
text: The text body of the article
label: Used to mark the article as potentially unreliable, 1: Unreliable, 0: Reliable

H. N. Soneji and S. Sudhanvan

Table 2 Accuracies for the different combinations of the models and metadata
Model used (metadata): Accuracy (%)
CNN (title): 94
CNN (text): 96
RF (author): 96
CNN (title + text): 95
CNN (title) + RF (author): 95
CNN (text) + RF (author): 96
CNN (title + text) + RF (author): 95.33
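As a rough illustration of the splits above, the following sketch uses a made-up frame standing in for the Kaggle Fake-News train.csv (the column names follow the dataset description; the rows and the use of pandas `sample` are our own illustration, not the authors' code):

```python
import pandas as pd

# Toy stand-in for the Kaggle Fake-News train.csv; rows are made up.
df = pd.DataFrame({
    "id": range(10),
    "title": [f"title {i}" for i in range(10)],
    "author": [f"author {i % 3}" for i in range(10)],
    "text": [f"body {i}" for i in range(10)],
    "label": [i % 2 for i in range(10)],  # 1: unreliable, 0: reliable
})

# 30/70 train/test split as used for the CNN (title) model
train = df.sample(frac=0.3, random_state=0)
test = df.drop(train.index)
print(len(train), len(test))
```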
4 Proposed Model
Figure 1 shows the flowchart of the proposed model. As can be seen, the proposed model is divided into four phases: 1. URL classification using the PhishTank API, 2. Data pre-processing, 3. Training of the classification models and their combination, and 4.
4.1 URL Classification
We have used an API called PhishTank, which has a large database of URLs with details of whether each URL is a phishing URL or not. Our motive here was to check all the URLs for phishing. If a URL is phishing, then we can surely say that the news article is fake.
For example, we consider ‘’ as the URL. When we pass this URL in the API call, the database is searched, and we get the result that it is a phishing URL.
An API Call Example:
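The paper does not reproduce the call itself, so the following is only a sketch of how such a lookup is typically shaped. The endpoint, form fields (`url`, `format`, `app_key`), and response keys (`in_database`, `valid`) follow PhishTank's public API documentation; the helper names, sample URL, and sample response are illustrative assumptions, not the authors' code:

```python
import base64

# Sketch only: endpoint and field names per PhishTank's API documentation;
# the app key, sample data, and helper names are illustrative assumptions.
PHISHTANK_ENDPOINT = "https://checkurl.phishtank.com/checkurl/"

def build_request(url, app_key):
    """Build the POST form fields for a PhishTank lookup."""
    return {
        "url": base64.b64encode(url.encode()).decode(),  # URL may be sent base64-encoded
        "format": "json",
        "app_key": app_key,
    }

def is_phishing(response_json):
    """A URL counts as phishing only if it is in the database and the
    report is marked valid."""
    results = response_json.get("results", {})
    return bool(results.get("in_database")) and bool(results.get("valid"))

# A live call would POST build_request(...) to PHISHTANK_ENDPOINT (e.g. with
# the requests library) and pass the JSON reply to is_phishing().
sample = {"results": {"in_database": True, "valid": True, "phish_id": 12345}}
print(is_phishing(sample))  # True -> the article containing this URL is flagged fake
```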
Fig. 1 Proposed model flowchart
4.2 Data Pre-processing
Data pre-processing is an important step to convert the given data into the desired structure. Of all the features in the dataset, only the required features were selected for the different classification models used. The title, text, author, and label were used in the different models; for instance, CNN for title classification used the title and the label, CNN for text classification used the text and the label, and RF used the author and the label. For the combined model, we used four features from the dataset: title, author, text, and label.
Figure 2 shows the steps followed for data pre-processing, which are discussed below.
Fig. 2 Steps followed for data pre-processing
Removing Punctuation: Removal of punctuation marks (full stop, comma, question
mark, etc.) in the given data.
Word Tokenization: With the help of word tokenization, we convert our text into a list of words. A 2D list is made, in which each sub-list constitutes one news article entry. We used the Natural Language Toolkit (NLTK) module of Python for word tokenization.
Stop Word Removal: Stop words are commonly used words such as ‘a,’ ‘an,’ ‘the,’ ‘in,’ ‘for,’ and ‘of’ which need to be ignored for getting better accuracies and making the training and testing data consistent. We have used NLTK for stop word removal.
Building vocabulary and finding maximum sentence length: A vocabulary is
built for the training and testing data, with the help of the tokens created in the
word tokenization step, and this vocabulary serves as a corpus for holding the words.
The greater the vocabulary size, the better the results. The maximum sentence length helps us assign the appropriate embedding values to the CNN model.
Word Embedding: The given words are converted into word vectors with the help of the pre-trained word2vec model GoogleNews; the model contains vectors of size 300. Word embeddings are used to capture the semantic meaning of the words. The GoogleNews model was trained on a news corpus of billions of words, which helps in the word embedding process.
Tokenizing and Pad Sequencing: Finally, we tokenize our text corpus into vectors of integers based on the embeddings, and then padding is done to obtain equal-size vectors. The padding size is determined from the maximum sentence length found earlier.
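The pre-processing steps above can be sketched end to end as follows. This is a minimal stand-in: a plain `split()` replaces NLTK's `word_tokenize`, a six-word stop list replaces NLTK's stop-word corpus, and the word2vec embedding step is omitted:

```python
import string

# Tiny stand-in stop list (NLTK's corpus would be used in practice)
STOP_WORDS = {"a", "an", "the", "in", "for", "of"}

def preprocess(articles):
    table = str.maketrans("", "", string.punctuation)
    # Remove punctuation, lowercase, and tokenize into a 2D list
    # (one sub-list per news article entry)
    tokenized = [a.translate(table).lower().split() for a in articles]
    # Stop word removal
    tokenized = [[w for w in ws if w not in STOP_WORDS] for ws in tokenized]
    # Build vocabulary (index 0 reserved for padding) and max sentence length
    vocab = {w: i + 1 for i, w in enumerate(sorted({w for ws in tokenized for w in ws}))}
    max_len = max(len(ws) for ws in tokenized)
    # Tokenize to integer vectors and pad each to max_len
    padded = [[vocab[w] for w in ws] + [0] * (max_len - len(ws)) for ws in tokenized]
    return padded, vocab, max_len

vectors, vocab, max_len = preprocess(
    ["The markets fell sharply.", "A new study, in brief."]
)
print(max_len, vectors)
```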
4.3 Model Architecture
The hybrid approach implemented in this model consists of the use of two different algorithms, CNN and RF. It is a combination of the two, where CNN is used when the title and text are used for classification, whereas RF is used when the author name is used for classification. Using CNN for the author metadata consumed unnecessary time and gave inferior results, as expected; hence, we tried other algorithms and found that RF performed best. Using a hybrid approach, a combination of CNN and RF, helped us improve the overall accuracy and reduced the overall space and time complexity. Both models are discussed below:
Convolutional Neural Networks (CNN): In deep learning, CNN is a class of neural networks most commonly used for image classification. A CNN consists of an input layer, an output layer, and multiple hidden layers. The hidden layers consist of a series of convolutional layers that convolve with a dot product. The different layers in a CNN model are the embedding layer, convolutional layer, MaxPooling layer, flatten layer, and fully connected layer.
We have used the CNN model for two features, text and title. The following architecture is used to build the model:
Embedding Layer: At the input layer, for the title metadata, the tokens are embedded into a vector of size 50, and any text whose length is less than 50 is padded to make it 50. The sentence length in this case becomes 50, whereas in the case of the text metadata, the vector size is kept at 1000. The outputs are then forwarded to the convolutional layer.
Convolutional Layer: For the text metadata, five filters of five different sizes are used to extract the features. The activation function used is ReLU. Similarly, for the title metadata, we use three filters of three different sizes.
MaxPooling Layer: The main work of the MaxPooling Layer is to reduce the size
of the feature map. MaxPooling is used to retain the most important features.
Flatten Layer: Flattening is used to transform the 2D matrix of features into a vector.
Fully Connected Layer: The fully connected layer converts the passed matrix from
the flattening layer into an output range between 0 and 1 using the sigmoid function.
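A minimal forward pass through the described layers (embedding, convolution with ReLU, MaxPooling, flatten, fully connected sigmoid) can be sketched in NumPy. All dimensions and weights here are toy stand-ins, not the trained model:

```python
import numpy as np

# Toy stand-ins: title sequence length 50 as in the text; vocab, embedding,
# and filter sizes are illustrative, and the weights are random.
rng = np.random.default_rng(0)
vocab_size, embed_dim, seq_len, filt_w, n_filt = 100, 8, 50, 3, 4

E = rng.normal(size=(vocab_size, embed_dim))      # embedding layer weights
W = rng.normal(size=(n_filt, filt_w, embed_dim))  # convolutional filters
b = np.zeros(n_filt)
Wd = rng.normal(size=n_filt)                      # fully connected layer weights

def forward(token_ids):
    x = E[token_ids]                              # embedding lookup: (seq_len, embed_dim)
    # Convolutional layer: slide each filter over the sequence, ReLU activation
    conv = np.array([
        [max(float((x[i:i + filt_w] * W[f]).sum() + b[f]), 0.0)
         for i in range(seq_len - filt_w + 1)]
        for f in range(n_filt)
    ])
    pooled = conv.max(axis=1)                     # MaxPooling: keep the strongest response
    flat = pooled.ravel()                         # Flatten to a vector
    return 1.0 / (1.0 + np.exp(-(flat @ Wd)))     # sigmoid output in (0, 1)

score = forward(rng.integers(0, vocab_size, size=seq_len))
print(score)
```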
Random Forest: Random forest fits several decision tree classifiers on the training data and uses the average of all the decision trees for better predictive accuracy. We have used the random forest classifier model for author name classification; it helps to distinguish between authors of fake and true news articles. We chose the random forest model because it gave better results compared to other models for author classification.
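A sketch of the author-based classifier using scikit-learn's `RandomForestClassifier`; the author names, labels, and the `CountVectorizer` encoding are illustrative assumptions, since the paper does not state how author names were encoded:

```python
# Synthetic sketch, not the paper's data or encoding.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

authors = ["alice smith", "bob jones", "alice smith",
           "eve fakewriter", "eve fakewriter", "bob jones"]
labels = [0, 0, 0, 1, 1, 0]  # 1: unreliable, 0: reliable (as in the dataset)

vec = CountVectorizer()                           # encode author names as token counts
X = vec.fit_transform(authors)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)

print(clf.predict(vec.transform(["eve fakewriter", "alice smith"])))
```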
5 Result Analysis
5.1 Performance Parameters
We have used different parameters for analyzing our results. The confusion matrix consists of true positives, true negatives, false positives, and false negatives; we have also used precision, recall, F1 score, and accuracy.
True positive (TP) indicates the outcome where the model predicts a true label as true.
False positive (FP) indicates the outcome where the model predicts a false label as true.
True negative (TN) indicates the outcome where the model predicts a false label as false.
False negative (FN) indicates the outcome where the model predicts a true label as false.
Precision gives the proportion of positive identifications that are correct:
Precision = TP / (TP + FP)
Recall gives the proportion of actual positives that are identified correctly:
Recall = TP / (TP + FN)
F1 score is a function of precision and recall and is used to seek a balance between the two:
F1 = 2 * (Precision * Recall) / (Precision + Recall)
Accuracy gives the proportion of all predictions that are correct:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
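The four parameters can be computed directly from the confusion-matrix counts; the counts below are illustrative, not the paper's:

```python
def metrics(tp, fp, tn, fn):
    """Evaluation parameters from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Illustrative counts (not the paper's confusion matrices)
p, r, f1, acc = metrics(tp=90, fp=10, tn=85, fn=15)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 3))
```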
5.2 Results and Analysis
We had three modules for detecting fake news using different metadata (author, text,
and title). We have presented the confusion matrix for each model as below:
From the above confusion matrices (Figs. 3, 4, and 5), we can calculate TP, TN, FP, and FN and hence precision, recall, and F1 score. For the CNN model (text), precision was 0.96, recall was 0.97, and the F1 score was 0.96. For the CNN
Fig. 3 Confusion matrix for
Fig. 4 Confusion matrix for
Fig. 5 Confusion matrix for
model (title), precision was 0.94, the recall was 0.94, and the F1 score was 0.94, and
for the RF model (author), precision was 0.96, the recall was 0.96, and the F1 score
was 0.96. From the scores obtained, we can see high scores for F1 for all the three
models, which indicates a balance between the precision and recall values.
The accuracies for title, author, and text are 94%, 96%, and 96%, respectively. When we consider the combined accuracy for title and text, we get 95%; in the same way, for title and author we get 95%, and the remaining pair gives 96%. When we combine all three, we get an accuracy of 95.33%. So, according to the accuracies, we can infer that detecting fake news using the text and author gives the best results. For detecting phishing URLs in the news text, we used PhishTank to verify, track, and share phishing URLs. If a URL is already in its database and is reported as phishing, then we can directly declare the news fake, since it contains a phishing URL, which in turn increases the overall accuracy rate.
Compared with other models from the literature survey, in [6, 13, 14], and [15], the different authors applied deep learning models, i.e., CNN and RNN, for the detection of fake news. In all of these models, the accuracy is only about 90%, and the dataset used is either based on one category of news or too small to correctly detect articles that were not used during training. Our model is an improvement over these, since our accuracy reaches about 95% and we have incorporated a large dataset so that more articles can be included for training.
6 Conclusion
In today’s world, there is a rapid spread of fake news due to ease of access to social
media sites, messaging, etc. That is the reason a person should verify the authenticity
of the news article which is time-consuming and at times cannot be done. With our
model, the user can easily check the authenticity by providing the title, author, and
article text to our system. With the pre-trained machine learning and deep learning
models, the output whether it is fake or not is obtained. This system is very efficient
as we obtained a combined accuracy of around 96%, and hence, it is reliable.
Although this work is giving high accuracies, we wish to continue working on
the same. Our next challenges are: (i) to be able to train the models with even larger
datasets by keeping the accuracy same or above par (ii) to try and increase the
accuracy even more by taking some extra features like the URL of the news article.
References
1. Mike, W.: The (almost) complete history of ‘fake news’. Retrieved from https://www.bbc.com/news/blogs-trending-42724320, 22 Jan 2018
2. Gottfried, J., Shearer, E.: News use across social media platforms 2016. Retrieved from https://
www.journalism.org/2016/05/26/news-use-across-social-mediaplatforms-2016/ 26 May 2016
3. Phishtank API Developer site. https://www.phishtank.com/api_info.php
4. Granik, M., Mesyura, V.: Fake news detection using naive Bayes classifier. 2017 IEEE First
Ukraine Conference on Electrical and Computer Engineering (UKRCON), Kiev, pp. 900–903
5. Kaliyar, R.K.: Fake News Detection Using A Deep Neural Network, 2018 4th International
Conference on Computing Communication and Automation (ICCCA), Greater Noida, India,
pp. 1–7 (2018)
6. Katsaros, D., Stavropoulos, G., Papakostas, D.: Which machine learning paradigm for fake
news detection? 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI),
Thessaloniki, Greece, pp. 383–387 (2019)
7. Poddar, K., Amali, G.B.D., Umadevi, K.S.: Comparison of various machine learning models
for accurate detection of fake news. In: 2019 Innovations in Power and Advanced Computing
Technologies (i-PACT), Vellore, India, pp. 1–5 (2019)
8. Lin, J., Tremblay-Taylor, G., Mou, G., You, D., Lee, K.: Detecting fake news articles. In: 2019
IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, pp. 3021–3025
9. Han, W., Mehta, V.: Fake news detection in social networks using machine learning and
deep learning: Performance evaluation. In: 2019 IEEE International Conference on Industrial
Internet (ICII), Orlando, FL, USA, pp. 375–380 (2019)
10. Kim, K., Jeong, C.: Fake news detection system using article abstraction. In: 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE), Chonburi,
Thailand, pp. 209–212 (2019)
11. Al-Ash, H.S., Wibowo, W.C.: Fake news identification characteristics using named entity recognition and phrase detection. In: 2018 10th International Conference on Information Technology
and Electrical Engineering (ICITEE), Kuta, pp. 12–17 (2018)
12. Reddy, P.B.P., Reddy, M.P.K., Reddy, G.V.M., Mehata, K.M.: Fake data analysis and detection using ensembled hybrid algorithm. In: 2019 3rd International Conference on Computing
Methodologies and Communication (ICCMC), Erode, India, pp. 890–897 (2019)
13. Girgis, S., Amer, E., Gadallah, M.: Deep Learning Algorithms for Detecting Fake News in
Online Text. In: 2018 13th International Conference on Computer Engineering and Systems
(ICCES), Cairo, Egypt, pp. 93–97 (2018)
14. Kong, S.H., Tan, L.M., Gan, K.H., Samsudin, N.H.: Fake news detection using deep learning. In:
2020 IEEE 10th Symposium on Computer Applications and Industrial Electronics (ISCAIE),
Malaysia, pp. 102–107 (2020)
15. Verma, A., Mittal, V., Dawn, S.: FIND: Fake information and news detections using deep
learning. In: 2019 Twelfth International Conference on Contemporary Computing (IC3), Noida,
India, pp. 1–7 (2019)
16. Hlaing, M.M.M., Kham, N.S.M.: Defining news authenticity on social media using machine
learning approach. In: 2020 IEEE Conference on Computer Applications(ICCA), Yangon,
Myanmar, pp. 1–6 (2020)
17. Shabani, S., Sokhn,M.: Hybrid machine-crowd approach for fake news detection. In: 2018 IEEE
4th International Conference on Collaboration and Internet Computing (CIC), Philadelphia,
PA, pp. 299–306 (2018)
18. Harjule, P., Sharma, A., Chouhan, S., Joshi, S.: Reliability of news. in: 2020 3rd International Conference on Emerging Technologies in Computer Engineering: Machine Learning
and Internet of Things (ICETCE), Jaipur, India, pp 165–170 (2020)
19. Lahlou, Y. El Fkihi, S., Faizi, R.: Automatic detection of fake news on online platforms: A
survey. In: 2019 1st International Conference on Smart Systems and Data Science (ICSSD),
Rabat, Morocco, pp. 1–4 (2019)
20. Mahid, Z. I., Manickam, S., Karuppayah, S.: Fake news on social media: Brief review on
detection techniques. In: 2018 Fourth International Conference on Advances in Computing,
Communication and Automation (ICACCA), Subang Jaya, Malaysia, pp. 1–5 (2018)
21. Manzoor, S.I., Singla J., Nikita: Fake news detection using machine learning approaches:
A systematic Review. In: 2019 3rd International Conference on Trends in Electronics and
Informatics (ICOEI), Tirunelveli, India, pp. 230–234 (2019)
22. Fake-news Dataset, retrieved from https://www.kaggle.com/c/fake-news/data?select=train.csv
An Enhanced Fuzzy TOPSIS in Soft
Computing for the Best Selection
of Health Insurance
K. R. Sekar, M. Sarika, M. Mitchelle Flavia Jerome, V. Venkataraman,
and C. Thaventhiran
1 Introduction
In big data, the most complex task is decision making, and various methods have been suggested to solve different problems in making decisions. Among those methods, the solution for problems related to uncertain qualitative information in decision making is provided by fuzzy linguistic terms and linguistic decision making (LDM). In LDM, hesitant decision makers are represented by fuzzy linguistic values. Similar to the above, another method is the conventional fuzzy linguistic approach, in which the linguistic variables are artificial language words, leading to certain constraints such as a limited number of linguistic terms, computational complexity, lack of accuracy, and information loss in the estimation process. Many methods have been suggested to minimize these constraints, such as the linguistic model, the 2-tuple model, the linguistic virtual model, and the “hesitant fuzzy linguistic term set”. After weighing the pros and cons of all the methods proposed above, we came to the conclusion that decision making with hesitant fuzzy terms in linguistic sets is persistent in LDM, and hesitant fuzzy decision making is regarded as the best model. Multi-criteria decision makers are still limited in various applications. This paper introduced one method called R-TOPSIS. The proposed approach has also been validated using dispersion statistics and similarity tests against the classic TOPSIS method [1]. In LDM methods,
multi-criteria hesitant decision matrices are expressed as linguistic assessments of the alternatives by all decision makers; in community multi-criteria decision making, an assessment is considered to be a simple linguistic term, the empty set, or an interval
K. R. Sekar · M. Sarika · M. Mitchelle Flavia Jerome · C. Thaventhiran
School of Computing, Sastra Deemed University, Thanjavur, India
e-mail: sekar1971kr@gmail.com
V. Venkataraman (B)
School of Arts Science and Humanities, Sastra Deemed University, Thanjavur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
K. R. Sekar et al.
in linguistic or hesitant fuzzy sets, etc. In this paper [2], spherical fuzzy values are introduced with their score and accuracy functions under intuitionistic sets, and novel spherical fuzzy values evaluated by intervals are proposed. To test the developed approach, the method solves a multiple-criteria selection problem among 3D printers [2].
This paper [3] presents the MCDM principle for decision making and ideas to overcome poor results in decision making. MCDA methods are commonly used in different fields; one of them, TOPSIS, has been analyzed and categorized into various application areas such as supply chain, environment, conventional energy sources, business, and healthcare applications, with different variants such as fuzzy values and intuitionistic hesitant fuzzy values. It is the most popular approach for group decision making [3]. This paper [4] proposed a hybrid AHP approach for mHealth applications: AHP was used to calculate the criterion and sub-criterion weights, and the TOPSIS approach was used to rank the various applications. The proposed approach is used to pick the proper mHealth application in this digital world [4]. Tools are required in the monitoring system to
incorporate information, priorities and expectations from a community of Technical
Experts. This paper [5] proposes a weighted TOPSIS, both modified and unmodified. The weighted modified TOPSIS approach is a simple and effective technique for solving decision-making problems where ideal and non-ideal solutions are profiled by decision makers [5]. Cloud computing has developed into a way to deploy business processes and various applications; [6] suggested two metrics, EC and EIF, for a better appreciation and study of trends, and a case study on generic parameters is proposed to rate effective values using an enhanced TOPSIS method [6]. Medical tourism is a part of healthcare tourism. This paper
suggested two multi-criteria decision-making strategies, DEMATEL and fuzzy hesitant TOPSIS, to expose the interrelationships between the variables. Such findings would potentially allow the allocation of medical tourism investments in some developing countries [7]. A fuzzy hesitant TOPSIS method was also defined and presented for a multi-criteria group decision-making scenario in which linguistic and intuitionistic decision makers are given the optimal solutions. In that article, the fuzzy weightage has been calculated in two ways: each linguistic term, such as low, medium, high, and very high, has significant intuitionistic values, all on a scale of 1 to 10, given as triplets. In our research work, the above has been applied with objective-based weightage, so the accuracy has been increased through this work.
2 Related Work
This paper [8] focuses on systematic human error investigation and fuzzy TOPSIS to define primary human error features in Taiwan’s emergency departments. An MCDM framework and hierarchical process approach have been introduced; this is used to calculate the importance of each error feature to determine the accuracy of the criteria [8]. This paper [9] proposes regression-TOPSIS, analyzing data from disasters.
This resource provides decision makers and planners with open risk information to identify the most at-risk zones for flood risk management on national and local scales [9]. This paper [10] proposes an integrated methodology in which the SWOT approach, AHP, and F-TOPSIS are used to evaluate sustainable energy planning. It describes the critical factors and sub-factors for sustainable energy planning, and the results validate the strength of the scientific approach to developing and estimating energy assessments for renewable energy planning [10]. HFLTS and MAGDM are applied using two methods: the first deals with VIKOR and the second with TOPSIS, resulting in a compromise between group benefit and individual regret, and similarly with positive and negative ideal solutions [11]. Vague set TOPSIS is implemented for hotel selection in order to save travelers’ effort and time based on certain hotel options. The decision support algorithm is proposed to satisfy mathematical evidence, computer-scientific stochastic model experiments, and a numerical case study using numerical values, resulting in the selection of the best hotel among several hotels [12]. The Geographical Information System (GIS), AHP, and TOPSIS methods are integrated in order to obtain structural and pairwise quantification, to find appropriate locations for industrial development, and for comparison between elements and priority-ranking purposes [13].
Fuzzy TOPSIS is carried out in sustainable acid rain control for the appropriate selection of affordable acid rain prevention options for society under environmental, social, economic, and institutional considerations; positive and negative ideal solutions result in different energy sources being ranked highest [14]. The modified and unmodified weightage TOPSIS studies are incorporated to prefer selection methods in surveillance. The modified weightage TOPSIS method resulted in a higher calculation of the relative measures of closeness to the ideal solution, while the unmodified weightage TOPSIS analysis results in a similar estimation of the relative measures of closeness to the ideal solution.
3 Methods
δ_pq = (α_pq / c*_q, β_pq / c*_q, η_pq / c*_q), q ∈ B̃, with c*_q = max_p c̄_pq, q ∈ B̃  (1)

The above-mentioned normalization technique maintains the property that the resulting normalized triangular fuzzy numbers lie in [0, 1]. By taking the diverse potential in every condition, the weighted normalized hesitant fuzzy decision matrix is produced as

V = [v_pq]_{l×m}, p = 1, 2, ..., m and q = 1, 2, ..., n.  (2)
This results in the weighted normalized hesitant fuzzy decision matrix, in which the entries v_pq are statistical normalized positive absolute fuzzy scores in the range [0, 1]. Then, the fuzzy positive-ideal solution (FPIS, Ã*) and fuzzy negative-ideal solution (FNIS, Ã#) are outlined, and the distances of each alternative from them are

d*_p = Σ_q d1(v_pq, v*_q),  d#_p = Σ_q d1(v_pq, v#_q),  p = 1, 2, ..., m  (3)

where d1 denotes the distance calculated between two hesitant fuzzy numbers. A closeness coefficient is defined to determine the ranking order of all alternatives once d*_p and d#_p of each alternative Ā_p (p = 1, 2, ..., m) have been calculated. The coefficient of relative closeness for each alternative is formed by

CC_p = d#_p / (d*_p + d#_p),  p = 1, 2, ..., m  (4)

An alternative Ã_p is closer to the FPIS (Ã*) and more distant from the FNIS (Ã#) as CC_p approaches 1. The relative closeness coefficient determines the ranking order of all the alternatives, and the highest-quality one is selected from among the feasible options. Illustration of a MAGDM problem:
A MAGDM problem can be described as follows: let X^s = [x^s_pq]_{l×m}, s ∈ T1, be the decision matrix of the sth decision maker, with rows 1, 2, ..., l for the alternatives and entries c^s_pq, and let R^s = [r^s_pq]_{l×m}, s ∈ T1, be the corresponding normalized decision matrix.
For the weight vector W = (w1, w2, ..., wn) of the attributes, it is feasible to get the weighted statistical normalized decision matrix as

Z^s = [z^s_pq]_{l×m} = [w_q r^s_pq]_{l×m}, s ∈ T1.

The resultant multi-criteria group decision matrix Z1 = [z_pq]_{l×m} is obtained by applying

Z1 = Σ_s α_s Z^s = [z_pq]_{l×m}  (5)

where α = (α1, α2, ..., αt) is the weight vector of the hesitant decision matrices, with α_s ≥ 0 and Σ_s α_s = 1, and z_pq = Σ_s α_s z^s_pq. Then y_p = Σ_q z_pq, p ∈ M, the total of all the elements in the pth row of Z1 = [z_pq], gives the complete characteristic values z_p (p ∈ M) of the alternatives Ā_p (p ∈ M). To obtain this, the weight vector α = (α1, α2, ..., αt) of the DMs plays an important role in multi-attribute group decision making.
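Steps (1), (3), and (4) can be sketched for triangular fuzzy scores as follows. The scores, weights, and the vertex distance used for d1 are illustrative assumptions, not the paper's insurance data:

```python
import numpy as np

# Illustrative triangular fuzzy scores (alpha, beta, eta) for 3 alternatives
# and 2 benefit criteria; values and weights are made up.
X = np.array([
    [[5, 7, 9], [3, 5, 7]],
    [[1, 3, 5], [7, 9, 10]],
    [[3, 5, 7], [5, 7, 9]],
], dtype=float)
w = np.array([0.6, 0.4])

# (1) normalize each benefit criterion by c*_q = max_p of the upper value
c_star = X[:, :, 2].max(axis=0)
R = X / c_star[None, :, None]
# (2) weighted normalized decision matrix
V = R * w[None, :, None]
# FPIS / FNIS: element-wise best and worst weighted scores per criterion
A_pos = V.max(axis=0)
A_neg = V.min(axis=0)

def d1(u, v):
    """Vertex distance between triangular fuzzy numbers (one assumed choice of d1)."""
    return np.sqrt(((u - v) ** 2).mean(axis=-1))

# (3) distances to FPIS and FNIS, summed over the criteria
d_pos = d1(V, A_pos[None]).sum(axis=1)
d_neg = d1(V, A_neg[None]).sum(axis=1)
# (4) closeness coefficient; the alternative with the largest CC ranks best
CC = d_neg / (d_pos + d_neg)
print(CC, CC.argmax())
```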
4 Results and Discussions
In this research article, the proposed applications, methodologies, and outcomes are an eye-opener for upcoming researchers. So far, the enterprise applications using TOPSIS are 3D printers, healthcare, business resources, supply chain, surveillance systems, cloud computing, medical tourism, and sustainable energy planning. The methodologies used to gauge the applications were R-TOPSIS, novel interval-valued spherical fuzzy sets, Analytical Hierarchy Process, weight normalization, DEMATEL, SWOT analysis, fuzzy Analytical Hierarchy Process, and PROMETHEE. The outcomes of the methodologies are to determine the accuracy of criteria, to calculate the closeness to the ideal solution, and to make the finest ranking selection for hotels and the best investment in medical tourism. The new method introduced here is more logical and efficient for solving various types of MCDM problems than the other approaches. In our research work, the Jupyter Python interface with the Anaconda Navigator was used; the improved TOPSIS for the health insurance program offers a higher accuracy of 92.26 percent. It is necessary to make the final decision by applying MCDM algorithms to infer better results for the same problems in order to illustrate the applicability and potential of the method.
4.1 Illustration Work
Linguistic values were assigned to the health insurance alternatives as follows.
LEGEND 1: C1—Individual plan, C2—Family plan, C3—Entry age, C4—Premium, C5—Claim, C6—Sum assured
5 Limitation of Hesitant TOPSIS
In the hesitant TOPSIS method, one of the main limitations is that the out-ranking is based on two factors. The first is attribute selection, the attributes otherwise being called criteria: the addition or deletion of attributes changes the preferences and will affect the ranking. The second is that the article contains two objectives, the beneficial and non-beneficial attributes chosen according to the application scenario, and another objective, the weightage, which depends on the beneficial and non-beneficial attributes. The above phenomena can reverse or change the OUT-RANKING (Tables 1, 2, 3, 4, 5, 6 and 7).
6 Conclusion
In this research work, identification of a good insurance company is the major and hectic task for customers. Selecting good insurance is very much needed in day-to-day practice in order to get good service in return. We have taken criteria such as Individual Plan, Family Plan, Entry Age, Premium, Claim, and Sum Assured, and alternatives of around four insurance companies. Multiple criteria with multiple objectives were taken into account for evaluating and ranking the insurance companies using the enhanced fuzzy TOPSIS methodology. Three decision makers with their linguistic and intuitionistic values were incorporated in this study. Stable rankings were produced to identify the feasible insurance according to the customers’ needs. All the decision makers’ intuitionistic values were integrated and named as an aggregation. In the aggregation table, we applied two objectives, beneficial and non-beneficial, and the second objective, the weightage, has been provided according to the beneficial order in triple form. All the criteria and alternatives were thoroughly investigated using final positive ideal solutions, which will increase the service and reduce the cost factor. On the other hand, final negative ideal solutions will increase the cost and reduce the service. Both were analyzed through the coefficient of closeness among the alternatives and the chosen criteria. Our contribution is to put the weightage in triple form on the basis of the first objective, the beneficial and non-beneficial ordering. In this TOPSIS work, the ranking of insurance companies was made in an optimum way and is really an eye-opener for upcoming researchers to do more work on the same basis for different applications.
Table 1 Intuitionistic values
Health insurance
HI 1
Table 2 Intuitionistic values
Health insurance
HI 1
K. R. Sekar et al.
Table 3 Intuitionistic values
Health insurance
HI 1
Table 4 Aggregation phase
8 3
HI 1
6.33 8 3
5.33 8 4
8 2
4.33 8 4
5.67 8
4.67 6 2
3.67 6 1
6 1
4.67 7 3
6.33 8 1
8 2
4.33 7 4
5.67 8 3
5.67 8
4.67 8 1
5.33 7 3
5.33 8 2
4.67 8 3
5.33 7 1
7 2
7 2
B—Beneficial, NB—Non beneficial Using (1)
Table 5 Final positive ideal solution (FPIS) for the health insurance alternatives. Using (3)
Table 6 Final negative ideal solution (FNIS) for the health insurance alternatives. Using (3)
An Enhanced Fuzzy TOPSIS in Soft Computing for the Best Selection …
Table 7 Coefficient of closeness for the health insurance alternatives (columns: DI+, coefficient of closeness). Using (4) and (5)
Non-intrusive Load Monitoring
with ANN-Based Active Power
Disaggregation of Electrical Appliances
R. Chandran Lekshmi, K. Ilango, G. Nair Manjula, V. Ashish, John Aleena,
G. Abhijith, H. Kumar Anagha, and Raghavendra Akhil
1 Introduction
The worldwide demand for energy is increasing rapidly and is predicted to double between 2020 and 2050; hence, energy usage should be studied properly. In the 1980s, G.W. Hart put forward the concept of NILM [1]. NILM is the process of analyzing the aggregate power of a house and deducing the energy consumption of individual appliances. Each device has a unique load signature that allows the NILM system to analyze patterns and recognize which devices are running. Real-time load monitoring is the more effective way to obtain useful feedback and derive energy-saving measures [2].
R. C. Lekshmi (B) · G. N. Manjula · V. Ashish · J. Aleena · G. Abhijith · H. K. Anagha · R. Akhil
Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Amritapuri, India
e-mail: lekshmichandran@am.amrita.edu
V. Ashish
e-mail: ashishv@am.students.amrita.edu
J. Aleena
e-mail: aleenajohn@am.students.amrita.edu
G. Abhijith
e-mail: abhijithgokul@am.students.amrita.edu
H. K. Anagha
e-mail: anaghahari@am.students.amrita.edu
R. Akhil
e-mail: akhilraghavendra@am.students.amrita.edu
K. Ilango
Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Chennai, India
e-mail: kilango2002@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
R. C. Lekshmi et al.
Non-intrusive load monitoring (NILM) methods have become a major approach to energy disaggregation, as they help in separating individual consumption. The growth of NILM methods has been stimulated by the Internet of Things (IoT) platform. A NILM system analyzes voltage and current waveforms to classify load signatures that can be related to the design and operating state of individual devices. Based on signal analysis and processing, it performs decomposition and identification of loads in order to determine the power consumption of all loads [3].
Many systems using NILM techniques seek energy conservation by performing energy use surveys in both residential and industrial settings. One of NILM's major advantages is that it deduces the actions of appliances from a centralized point, without the need for sensors at individual loads. It is a pattern-classification-based recognition algorithm that judges changes in the load signature [3]. Choosing the load signature is a major task [4, 5].
This paper identifies appliances with different active powers using NILM methods. Electrical load data and a neural-network-based classification method are used to check the model accuracy for identifying appliances [6]. Machine learning algorithms such as the artificial neural network are trained on these appliances to achieve greater recognition accuracy with less error [7].
This paper consists of the following sections. Section 2 covers the background of NILM algorithms. Section 3 describes ANN-based active power disaggregation with results. Section 4 gives the conclusion.
2 Background
Energy disaggregation and appliance identification algorithms using machine learning, deep learning and optimization are in progress [8, 9]. Machine learning algorithms for NILM include supervised and unsupervised algorithms [10]. Supervised learning algorithms have several limitations, including the need for additional labeling before training, as they are designed for limited implementations. Any NILM model uses either transient or steady-state features of electrical appliances; approaches differ in feature extraction, sampling periods, number of samples, training time, processing time, etc. Decision trees are used for the identification of loads in NILM [3]. Super-state HMMs or the factorial hidden Markov model (FHMM) are used for modeling a household and identifying different power states of appliances [11]. The (sparse) Viterbi algorithm and the particle filter are used as disaggregation algorithms. However, these algorithms do not provide very accurate disaggregation results and do not scale as the number of appliances grows [12]. On the basis of consumption trends and a set of pre-specified patterns, a machine learning algorithm can identify appliances from aggregated datasets. This paper focuses on real-time disaggregation of active power using an ANN, reducing the complexity of disaggregation algorithms.
3 ANN-Based Active Power Disaggregation
Neural networks and classification algorithms are used to predict usage patterns and to classify and cluster electrical loads for real-time appliance detection [13, 14]. In this study, an artificial neural network (ANN) is used to disaggregate the active power of different appliances. The ANN matches a new dataset against an earlier trained dataset. The disaggregation process flow is shown in Fig. 1. ANN modeling for NILM involves the following steps:
Data collection
Feature extraction
Training and validation
Estimation or prediction.
3.1 Data Collection and Feature Extraction
Data collection from a smart meter is the initial step in monitoring the load, as given in Fig. 1. The current and voltage measured by the smart meter are used for feature extraction; the data is collected at a sampling rate of 4 kHz. The next step is to obtain steady-state features, transient-state features and non-traditional features. This model extracts the active power and the RMS value of the current as steady-state features for training.
Fig. 1 Block diagram of NILM with ANN disaggregation
3.2 Training and Validating
After extracting the features, the next task is to identify the loads that are active at a particular time. Figure 3 gives the process involved in training as a flowchart. Training the artificial neural network requires feature data for each individual appliance so that it can correctly identify and segregate the loads with respect to time. Figure 5 shows the steady-state features used as inputs and the active power of each appliance as the training target. The proposed NILM considers four different appliances: a bulb, a hairdryer, a laptop charger and a table fan. The design specifications of these appliances are given in Table 1. The characteristic features are extracted for each load and supplied for training. The artificial neural network model consists of an input layer with two neurons (one per feature), a hidden layer with ten neurons, and an output layer with four neurons corresponding to the active power of the individual loads. The network was trained with the Bayesian regularization algorithm, as it produced the most accurate outputs. After training for 1000 epochs, the neural network achieved a mean squared error (MSE) of 1.07995 for training and an MSE of 1.5447 during testing and validation, as shown in Figs. 2, 3, 4 and 5.
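The network topology described above (two input features, a hidden layer of ten neurons, four per-appliance outputs) can be sketched as follows. This is not the authors' code: scikit-learn has no Bayesian regularization (a MATLAB trainbr feature), so L2 weight decay stands in for it here, and all data is synthetic:

```python
# Illustrative sketch of the disaggregation network: 2 inputs (RMS current,
# total active power) -> 10 hidden neurons -> 4 per-appliance power outputs.
# L2 weight decay (alpha) is a rough stand-in for Bayesian regularization.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
per_appliance = rng.uniform(0, 500, size=(1000, 4))  # hypothetical targets (W)
total_power = per_appliance.sum(axis=1)              # aggregate active power
rms_current = total_power / 230.0                    # crude I = P/V feature
X = np.column_stack([rms_current, total_power])

model = MLPRegressor(hidden_layer_sizes=(10,), alpha=1e-3,
                     max_iter=1000, random_state=0)
model.fit(X, per_appliance)
estimate = model.predict(X[:1])                      # 4 per-appliance powers
```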
Fig. 2 Mean squared error during training and testing
Fig. 3 Flowchart of training
using ANN
Table 1 Specifications of the appliances used (columns: Voltage (V), Active power, Reactive power; rows include table fan, hair dryer and laptop charger at 230 V)
3.3 Estimation or Prediction
The trained ANN model can be used for actual load estimation. The root mean square error (RMSE) is calculated for each test case using Eq. (1), where N is the number of samples, p is the predicted value and q is the actual value.
Fig. 4 Flowchart of ANN
RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (p_i - q_i)^2} \quad (1)
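Eq. (1) can be computed directly; the sample values below are illustrative, not measurements from the paper:

```python
# Root mean square error between predicted (p) and actual (q) samples,
# as in Eq. (1): sqrt of the mean squared difference.
import math

def rmse(p, q):
    n = len(p)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / n)

print(rmse([100.0, 60.0, 40.0], [98.0, 61.0, 43.0]))  # sqrt(14/3) ~ 2.16
```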
The effectiveness of the model in estimating the active power is analyzed for
different scenarios as shown below.
Scenario 1: All Loads are Present
In scenario 1, the RMS current and active power consumption of the bulb, hairdryer, laptop charger and table fan were taken and tested. The RMS current and active power obtained for scenario 1 are shown in Fig. 6. Figure 7 shows the actual and estimated ANN active power of the different appliances. Table 2 shows the RMSE values for the different
Fig. 5 ANN with feature inputs and outputs for NILM
Fig. 6 Total RMS current and total power consumption for scenario 1
scenarios. From Table 2, it can be inferred that the individual appliance power can
be identified with an average root mean square error of 0.0783.
Scenario 2: Any 3 Loads are Present
In scenario 2, RMS current and active power consumption of hairdryer, laptop charger
and table fan were taken and tested. The RMS current and active power obtained for
Fig. 7 Actual versus predicted ANN estimation for scenario 1
Table 2 Root mean square error of ANN estimation of different loads in different scenarios (columns: RMSE of bulb, RMSE of laptop charger, RMSE of table fan, average RMSE)
scenario 2 are shown in Fig. 8. Figure 9 shows the actual and estimated ANN active power of the different appliances. From Table 2, it can be inferred that the individual appliance power can be identified with an average root mean square error of 0.1031.
Scenario 3: Any 2 Loads are Present
In scenario 3, RMS current and active power consumption of bulb, hairdryer and
laptop charger were taken and tested. The RMS current and active power obtained for
scenario 3 are shown in Fig. 10. Figure 11 shows the actual and estimated ANN active power of the different appliances. From Table 2, it can be inferred that the individual appliance power can be identified with an average root mean square error of 0.0631.
Fig. 8 Total RMS current and total power consumption for scenario 2
Fig. 9 Actual versus predicted ANN estimation for scenario 2
Scenario 4: Any One Load is Present
In scenario 4, the RMS current and active power consumption of the bulb are tested. The RMS current and active power for scenario 4 are shown in Fig. 12. Figure 13 shows the actual and estimated ANN active power of the different appliances. From Table 2, it can be inferred that the individual appliance power can be identified with an average root mean square error of 0.0124.
Fig. 10 Total RMS current and total power consumption for scenario 3
Fig. 11 Actual versus predicted ANN estimation for scenario 3
Scenario 5: Different Loads Operated at Different Times
In this scenario, different loads are operated at different times to verify the effectiveness of the ANN model estimation. The RMS current and active power obtained for scenario 5 are shown in Fig. 14. Figure 15 shows the actual and estimated ANN active power of the different appliances. From Table 2, it can be inferred that the individual appliance power can be identified with an average root mean square error of 0.1642.
From Table 2, it can be inferred that NILM with ANN-based power estimation
has an overall RMSE of 0.08422 when tested for different load combinations.
Fig. 12 Total RMS current and total power consumption for scenario 4
Fig. 13 Actual versus predicted ANN estimation for scenario 4
Fig. 14 Total RMS current and total power consumption for scenario 5
Fig. 15 Actual versus predicted ANN estimation for scenario 5
4 Conclusion
Energy management at affordable cost is a concern in today's world. NILM with single-point measurement and appliance monitoring reduces sensor cost and enables effective monitoring of the system. The ability to identify power usage from the signatures of electrical appliances will be essential for smart electronic control systems that automate their power consumption. In this paper, the effectiveness of the ANN for power disaggregation of appliances from a single-point smart meter measurement was studied. From the analysis of different scenarios, it is evident that the ANN is able to recognize each appliance and provides good estimation with low error. In future, the work can be extended to the identification and estimation of appliances with similar power characteristics.
References
1. Hart, G.W.: Nonintrusive appliance load monitoring. Proc. IEEE 80(12), 1870–1891 (1992)
2. Hui, L.Y., Logenthiran, T., Woo, W.L.: Non-intrusive appliance load monitoring and identification for smart home. In: 2016 IEEE 6th International Conference on Power Systems (ICPS), pp. 1–6. New Delhi (2016)
3. Sreevidhya, C., Kumar, M., Karuppasamy, I.: Design and implementation of non-intrusive load
monitoring using machine learning algorithm for appliance monitoring. In: 2019 IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing
(INCOS), pp. 1–6. Tamil Nadu, India (2019). https://doi.org/10.1109/INCOS45849.2019.895
4. Yang, H., Chang, H., Lin, C.: Design a neural network for features selection in non-intrusive
monitoring of industrial electrical loads. In: 2007 11th International Conference on Computer
Supported Cooperative Work in Design, pp. 1022–1027. Melbourne, VIC (2007)
5. Hasan, T., Javed, F., Arshad, N.: An empirical investigation of V-I trajectory based load signatures for non-intrusive load monitoring. In: 2014 IEEE PES General Meeting | Conference and
Exposition, p. 1. National Harbor, MD (2014)
6. Popescu, F., Enache, F., Vizitiu, I., Ciotîrnae, P.: Recurrence plot analysis for characterization of appliance load signature. In: 2014 10th International Conference on Communications
(COMM), pp. 1–4. Bucharest (2014)
7. Jiao, D., Yu, H.Y., Wu, X.: A new construction method for load signature database of load
identification. In: 2018 2nd IEEE Conference on Energy Internet and Energy System Integration
(EI2), pp. 1–6. Beijing (2018)
8. Iksan, N., Sembiring, J., Haryanto, N., Supangkat, S.H.: Appliances identification method of
non-intrusive load monitoring based on load signature of V-I trajectory. In: 2015 International
Conference on Information Technology Systems and Innovation (ICITSI), pp. 1–6. Bandung
9. Lin, Y., Tsai, M.: A novel feature extraction method for the development of nonintrusive
load monitoring system based on BP-ANN. In: 2010 International Symposium on Computer,
Communication, Control and Automation (3CA), pp. 215–218. Tainan (2010)
10. Rajasekaran, R.G., Manikandaraj, S., Kamaleshwar, R.: Implementation of machine learning
algorithm for predicting user behavior and smart energy management. In: 2017 International
Conference on Data Management, Analytics and Innovation (ICDMAI), pp. 24–30. Pune (2017)
11. Wang, L., Luo, X., Zhang, W.: Unsupervised energy disaggregation with factorial hidden
Markov models based on generalized backfitting algorithm. In: 2013 IEEE International
Conference of IEEE Region 10 (TENCON 2013), pp. 1–4. Xi’an (2013)
12. Chang, H., Chen, K., Tsai, Y., Lee, W.: A new measurement method for power signatures of
nonintrusive demand monitoring and load identification. IEEE Trans. Ind. Appl. 48(2), 764–771
13. Janani, K., Himavathi, S.: Non-intrusive harmonic source identification using neural networks.
In: 2013 International Conference on Computation of Power, Energy, Information and
Communication (ICCPEIC), pp. 59–64. Chennai (2013)
14. Srinivasan, D., Ng, W.S., Liew, A.C.: Neural-network-based signature recognition for harmonic
source identification. IEEE Trans. Power Delivery 21(1), 398–405 (2006)
15. Abubakar, I., Khalid, S.N., Mustafa, M.W., Shareef, H., Mustapha, M.: An overview of nonintrusive load monitoring methodologies. In: 2015 IEEE Conference on Energy Conversion
(CENCON), pp. 54–59. Johor Bahru (2015)
16. Dhananjaya, W.A.K., Rathnayake, R.M.M.R., Samarathunga, S.C.J., Senanayake, C.L., Wickramarachchi, N.: Appliance-level demand identification through signature analysis. In: 2015
Moratuwa Engineering Research Conference (MERCon), pp. 70–75. Moratuwa (2015)
17. Lin, Y., Tsai, M.: Non-intrusive load monitoring by novel neuro-fuzzy classification considering
uncertainties. IEEE Trans. Smart Grid 5(5), 2376–2384 (2014)
18. Lin, Y.-H., Tsai, M.-S.: Application of neuro-fuzzy pattern recognition for non-intrusive
appliance load monitoring in electricity energy conservation. In: 2012 IEEE International
Conference on Fuzzy Systems, pp. 1–7. Brisbane, QLD (2012)
19. Alcalá, J., Ureña, J., Hernández, Á., Gualda, D.: Event-based energy disaggregation algorithm
for activity monitoring from a single-point sensor. IEEE Trans. Instrum. Meas. 66(10), 2615–
2626 (2017)
20. Sultanem, F.: Using appliance signatures for monitoring residential loads at meter panel level.
IEEE Trans. Power Delivery 6(4), 1380–1385 (1991)
21. Ridi, A., Gisler, C., Hennebert, J.: ACS-F2—a new database of appliance consumption signatures. In: 2014 6th International Conference of Soft Computing and Pattern Recognition
(SoCPaR), pp. 145–150. Tunis (2014)
22. Chang, H., Lin, C., Lee, J.: Load identification in nonintrusive load monitoring using steadystate and turn-on transient energy algorithms. In: The 2010 14th International Conference on
Computer Supported Cooperative Work in Design, pp. 27–32. Shanghai, China (2010)
23. Seema, P.N., Deepa, V., Manjula, G.N.: Implementation of consumer level intelligence in a
smart micro-grid along with HEMS based price prediction scheme. In: 2016 IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES),
pp. 1–5. Delhi (2016). https://doi.org/10.1109/ICPEICES.2016.7853143
Prediction of Dimension of a Building
from Its Visual Data Using Machine
Learning: A Case Study
Prasenjit Saha, Utpal Kumar Nath, Jadumani Bhardawaj, and Saurin Paul
1 Introduction
In circumstances where we need to measure something without a measurement tool, we can only make do with the estimate provided by our eyes. There is a whole field of science, called photogrammetry, through which the size or dimensions of a structure can be deduced from just a 2D image of it. Photogrammetry is the science and technology of obtaining reliable information about physical objects and the environment by recording, measuring and interpreting photographic images and patterns of electromagnetic radiant imagery and other phenomena. In general, the photogrammetric approach can be applied together with laser surveys for complicated surface measurements. Various types of structural form can also be determined using this approach [1]. The positions of the camera right after image capture can be inspected to achieve the best result [2]. In this paper, the size of a building is predicted from its image using machine learning and the concept of photogrammetry; the machine learning method used is the decision tree.
2 Methodology
2.1 Definition and Working Principle
Photogrammetry depends on a certain type of photographic record as the origin of data. Such a record represents a two-dimensional registration of, in general, a three-dimensional object. After absolute contraction, each photograph may be regarded as a central perspective projection of the object [3]. Photogrammetry has been the subject of substantial research, from the continuous and rapid evolution of sensors to methodologies in different fields [4] (Fig. 1).
P. Saha (B) · U. K. Nath · J. Bhardawaj · S. Paul
Assam Engineering College, Guwahati, India
e-mail: prasenj1ps@gmail.com
Fig. 1 Distance calculation using the concept of
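The distance calculation of Fig. 1 rests on the pinhole-camera (presumably similar-triangles) relation. A minimal sketch follows; the focal length and building height used are hypothetical, since the paper does not state its camera parameters:

```python
# Hedged sketch of the pinhole-camera distance relation behind Fig. 1:
# distance = real_height * focal_length / image_height. Numbers are
# illustrative only.
def distance_to_object(real_height_m, focal_length_px, image_height_px):
    return real_height_m * focal_length_px / image_height_px

# A 15 m building imaged 600 px tall with a 1000 px focal length:
print(distance_to_object(15.0, 1000.0, 600.0))  # 25.0 (m)
```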
2.2 Data Acquisition
Photogrammetry is concerned with collecting reliable information about the properties of surfaces and objects. The remotely sensed information can be divided into three categories: geometric information involves the spatial position and shape of objects; physical information refers to properties of electromagnetic radiation; and temporal information relates to the change of an object over time, mostly obtained by analyzing several images recorded at different times.
2.3 Photogrammetric Instruments
The various photogrammetric instruments are as follows. A rectifier is a kind of copy machine for making plane rectifications. To produce orthophotos, an orthophoto projector is required. A comparator is an instrument used to measure points on a diapositive (photo coordinates). A stereo plotter performs the transformation from central projection to orthogonal projection in an analog fashion; the analytical plotter performs the transformation computationally.
2.4 Decision Tree
The decision tree is one of the most widely used inductive learning algorithms [5]. It is mainly used for data analysis and forecasting; the algorithm is easy to understand and implement, and the model is easy to evaluate through static testing [6]. Machine learning is one of the finest fields of computer science and has given mankind innumerable and invaluable solutions to complex problems. The decision tree is one such solution to decision-making problems: it learns from data in the problem domain and builds a model that can be used for prediction, supported by systematic analytics [7]. It approximates a discrete-valued target function and can also represent a Boolean function. It is an instance-based inductive learning algorithm commonly used to build classification and prediction models; from a group of unordered, rule-less cases, classification rules can be derived according to the decision tree. Among the many ways to solve classification problems, decision trees are a commonly used method. It is a predictive modeling method that divides the search space into a number of subsets through "divide and rule". A tree must be built for modeling classification, clustering and prediction, which is the classification process of this method. Once the tree is built, a tuple from the dataset is applied to it to obtain the classification result. The tree is a flowchart-like structure built in a top-down recursive way: internal nodes compare an attribute value and select the branch under the node according to that value, and the conclusion is finally read from the leaf nodes. The entire process is repeated at each new node as a child of the root. Figure 2 shows the basic structure of a decision tree. From the perspective of the whole tree, a decision tree represents a disjunction of conjunctions of attribute-value constraints: every path from the root to a leaf corresponds to a conjunction of attribute tests, and the tree itself corresponds to the disjunction of these conjunctions (Fig. 2).
Fig. 2 Basic structure of a
decision tree
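The top-down attribute tests described above can be illustrated with scikit-learn's decision tree on a toy Boolean (XOR-like) function; this is an illustrative sketch, not the paper's building dataset:

```python
# A decision tree learning a Boolean function: internal nodes test one
# attribute at a time, and leaves carry the class labels.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                       # XOR of the two attributes

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.predict([[1, 0]]))          # class 1 for input (1, 0)
```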
3 Prediction Model
3.1 Input Data
A dataset was built by collecting different building images from Guwahati City. For each building, the dataset contains both front-view and side-view images, along with the distance at which each image was taken and the angle of view. The angle of view is calculated from the focal length of the camera.
3.2 Working of the Model
In this model, the dataset is divided into two parts, training and testing; the training part consists of 80% of the dataset and the test part the remaining 20%. The images are encoded to their respective R, G and B pixel values using OpenCV, and a separate dataset is built to store the pixel values. The R, G and B pixel values are stored in a separate database along with the respective elements of the previous database. A decision tree regression model is first trained on the images, using at least 80% of the dataset; after training, the model is used to predict the image distance, the angle of view and the number of floors on the testing dataset (Fig. 3).
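This workflow can be sketched as follows, with synthetic arrays standing in for the Guwahati images (with real images, OpenCV's cv2.imread would supply the pixel arrays); the feature choice of per-image mean R, G, B values is an illustrative assumption:

```python
# Hedged sketch of the pipeline: mean pixel values per image as features,
# an 80/20 train/test split, and a decision tree regressor predicting
# the camera-to-building distance. All data below is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
images = rng.integers(0, 256, size=(50, 32, 32, 3))  # fake 32x32 color images
X = images.reshape(50, -1, 3).mean(axis=1)           # mean value per channel
y = rng.uniform(5, 30, size=50)                      # fake distances (m)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=1)
model = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)
pred = model.predict(X_te)                           # distances for test images
```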
4 Results
The images are encoded to their respective R, G and B pixel values. The pixel values for three sample images are found to be as follows: for the first image, 110, 200, 112; for the second image, 112, 90, 140; and for the third image, 60, 100, 200. The graphs below show the R, G and B pixel values of the images along with their respective distances (Figs. 4 and 5).
Fig. 3 Block diagram of the prediction model
Fig. 4 Pixel values of images for R, G, B as a histogram
Fig. 5 Pixel values of images for R, G, B as a scatter plot
4.1 Output
The predicted distances for the above buildings are found to be 10, 8 and 18 m. The accuracy of the model is found to be 98.27%. The graph below shows accuracy versus training step: as the number of training steps increases, the accuracy increases (Fig. 6).
Fig. 6 Accuracy versus
training step
5 Conclusion
Thus, we can predict the distance of a building from its visual data with machine learning with 98.27% accuracy. In a similar way, we can predict the angle of view of the image. Using the concept of photogrammetry, we can then find the length and breadth of a building; hence, the total area of the building is calculated from its image.
Acknowledgment This work is a part of the project funded by ASTU, CRSAEC20, TEQIP-III.
References
1. Sužiedelytė-Visockienė, J., Domantas, B.: Digital photogrammetry for building measurements
and reverse-engineering. Geodesy Cartogr. (2009)
2. Aminia, A.S.: Optimization of close range photogrammetry network design applying fuzzy
computation. Int. Arch. Photogram. Rem. Sens. Spat. Inf. Sci. XLII-4/W4, 31–36 (2017)
3. Mikhail, E.M.: An introduction to photogrammetry. In: Proceedings of SPIE 0045, Coherent
Optics in Mapping. Rochester, United States (1974)
4. Mancini, F., Salvini, R.: Application of photogrammetry for environmental research. Int. J.
Geo-Inform. (2019)
5. Zhong, Y.: The analysis of cases based on decision tree. In: 7th IEEE International Conference
on Software Engineering and Service Science (ICSESS) (2016)
6. Tang, X., Liu, Z., Li, T., Wu, W., Wei, Z.: The application of decision tree in the prediction
of winning team. In: 2018 International Conference on Virtual Reality and Intelligent Systems
(ICVRIS), pp. 239–242. Changsha (2018). https://doi.org/10.1109/ICVRIS.2018.00065
7. Patil, S., Kulkarni, U.: Accuracy prediction for distributed decision tree using machine learning approach. In: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1365–1371. Tirunelveli, India (2019). https://doi.org/10.1109/ICOEI.2019.
Deep Learning Algorithms for Human
Activity Recognition: A Comparative
Aaditya Agrawal and Ravinder Ahuja
1 Introduction
In the vast domain of computer science research, human activity recognition is emerging as a core concept for understanding and developing computer vision and human–computer interaction. It forms the core of scientific endeavors such as health care, surveillance, and human–computer interaction. But the field is beset with difficulties such as sensor placement, sensor motion, the need to install video cameras in the places to be monitored, cluttered backgrounds, and the diversity in the ways we perform activities. To tackle these challenges, a more efficient approach is to analyze the information gathered from inertial measurement sensors worn by the user or built into the user's smartphone to keep track of his/her movements. For this purpose, a tri-axial accelerometer, a sensor built into smartphones, is used to track the user's movements.
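Before any classifier sees such accelerometer data, the stream is typically segmented into fixed-length windows; a minimal sketch, with window length and overlap chosen purely for illustration:

```python
# Sketch of the usual first step in smartphone-based HAR: splitting a
# tri-axial accelerometer stream into overlapping fixed-length windows
# before feature extraction or classification.
import numpy as np

def sliding_windows(signal, window=128, step=64):
    """Split an (N, 3) x/y/z accelerometer array into overlapping windows."""
    return np.stack([signal[i:i + window]
                     for i in range(0, len(signal) - window + 1, step)])

acc = np.zeros((512, 3))            # placeholder for a recorded stream
wins = sliding_windows(acc)
print(wins.shape)                   # (7, 128, 3)
```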
Human Activity Recognition: Human activity recognition (HAR) is the mechanism for "classifying sequences of accelerometer data recorded by specialized harnesses or smartphones into known, well-defined movements." It is the process of understanding human body gestures or movements by using sensors to ascertain that a human activity or action is taking place. Our daily activities can be simplified and automated if a HAR system recognizes them (e.g., smart lights that recognize a hand gesture). Most importantly, HAR systems are of two types, unsupervised and supervised. A HAR system based on the supervised method functions only after previous training is
A. Agrawal (B) · R. Ahuja
School of Computer Science and Engineering, Galgotias University, Greater Noida, Uttar
Pradesh, India
e-mail: aadityaagr1007@gmail.com
R. Ahuja
e-mail: ahujaravinder022@gmail.com
done with staunched datasets. A HAR system based on the unsupervised method can
configure with a collection of regulations during augmentation.
A short description of the various uses of HAR in diverse environments is provided below.
1.1 Surveillance System
HAR surveillance systems were first installed in public places like airports and banks
to prevent crimes and perilous activities from occurring there. The idea of human
activity prediction was introduced by Ryoo [1]. The results confirmed that the HAR
surveillance system is able to recognize multiple in-progress human interactions at
an early stage. Lasecki et al. [2] proposed a system named Legion:AR that
yields robust activity recognition by feeding existing recognition models with
real-time activity labels obtained from crowds in public places.
1.2 Healthcare
In healthcare systems, HAR is deployed in hospitals, residential areas, and rehabilitation
centers for multiple reasons, such as supervising elderly people living in
rehabilitation centers, disease prevention, and managing chronic diseases [3]. In
rehabilitation centers, a HAR system is particularly useful to track the activities of
elderly people, detect falls, monitor the physical exercises of children with motor
disabilities and troubling motion conditions like autism spectrum disorders, and monitor
patients with dysfunction, post-stroke patients, patients with psychomotor slowing,
and cardiac patients with abnormal conditions [4, 5]. This monitoring ensures timely
clinical intervention.
1.3 Human–Computer Interaction
In this category, activity recognition is usually applied in exergaming and gaming,
such as Wii [6, 7], Kinect, Nintendo [8, 9], and other full-body movement-based exercises
and games for adults, even for adults with a neurological ailment [10]. HAR detects
body gestures carried out by a person and accordingly directs the completion of the
corresponding tasks [11]. Senior citizens and adults who suffer from some
category of neurological ailment can carry out uncomplicated gestures to engage
with exergames and games without any discomfort. This feature also allows surgeons
to have touchless command over intraoperative image monitoring through
natural free-hand maneuvers.
Deep Learning Algorithms for Human Activity Recognition …
The paper is divided into the following sections: Sect. 2 reviews related work,
Sect. 3 describes the deep neural networks used, Sect. 4 presents the system design,
and Sect. 5 reports the results, followed by Sect. 6, which contains the conclusion
and future scope.
2 Related Work
The domain of HAR has proliferated, with substantial progress in the last decade.
Different research works and surveys focusing on various approaches have been carried
out to identify human actions and their significant effects in real-world
scenarios. The different approaches can be categorized into four classes,
as described below.
2.1 Based on Sensor
The paragraphs below describe some of the literature focusing on activity recognition
hinged on sensors. Chen et al. [12] carried out a thorough study
in the sub-field of sensor-based human activity recognition. Their study
classifies previous research along two principal dimensions: (i) sensor-based
versus vision-based and (ii) knowledge-driven
versus data-driven. This paper, however, concentrates on activity identification
methods that are data-centric.
Alternately, another study, carried out by Wang et al. [13], exemplified the various
deep learning (deep neural network) approaches for HAR incorporating
sensors. Their study categorizes the existing literature on activity identification by three
factors, viz. sensor modality, application area, and deep learning model. This
paper outlines the research in activity recognition and focuses on the deep models
employed to analyze the information gathered from sensors.
2.2 Based on Wearable Device
The following paragraphs describe some of the literature that focuses on activity
recognition based on wearable devices. Lara and Labrador [14]
conducted their study on human activity recognition with wearable sensors. Their
study expounds a comprehensive analysis of a variety of design issues in a human
activity recognition system, among them sensor selection and attributes, protocols
and data collection, processing methods, energy consumption, and
recognition performance.
Alternately, Cornacchia et al. [15] conducted an in-depth survey and accordingly
divided the current exploration work into two comprehensive categories: global body
movement activities, which involve displacement/motion of the entire body (e.g.,
jogging, walking, and running), and localized interaction activities, which include
movements dealing with the extremities (e.g., usage of an object). This literature
also outlines a categorization on the basis of two factors: the sensor type
being utilized and the position on the human body where the sensor is placed, such as
a sensor mounted on the user's chest or wrist.
2.3 Based on Radio Frequency
The following paragraphs describe some of the literature on activity
recognition based on radio frequency. Scholz et al. [16] conducted a study in the
domain of device-free radio-based activity recognition. Their study classifies
previous research progress in this field into two primary categories: device/equipment-free
radio-based activity recognition (DFAR) and device/equipment-free radio-based
localization (DFL).
Alternately, the study conducted by Amendola et al. [17] summarizes the usage of radio
frequency identification (RFID) technology in health-related Internet of Things
(IoT) applications. Their study discusses the multiple uses of RFID tags in cases
such as ambient passive sensors, which comprise temperature sensors and volatile
compound sensors, and body-centric tags, such as implantable and wearable tags.
Another study, conducted by Wang and Zhou [18], reviews existing exploratory
work in the domain of activity identification based on radio technology. Their study
sums up the existing research in four crucial categories: (i) based on Wi-Fi,
(ii) based on RFID, (iii) based on ZigBee, and (iv) additional radio-based methods such as
microwave and FM radio. The researchers present a comparative analysis of all the
methodologies mentioned above along the following characteristics:
coverage, precision, types of activities, and cost of deployment.
2.4 Based on Vision
The paragraph below describes some research work that focuses on activity
identification based on vision. Vrigkas et al. [19] study the existing
body of exploratory analysis that contains vision-based approaches to activity
identification. They categorized the research into two major classes: multi-modal
and uni-modal approaches. Alternately, the study conducted by Herath et al. [20]
focused on the crucial research undertaken in the domain of activity
identification/recognition by deploying vision-based methods. Their study catalogs
the previous works into two main groups: deep neural network-based solutions and
representation-based solutions.
3 Deep Neural Network
A deep neural network (DNN), or deep learning, is a subset of machine learning.
A DNN comprises multiple levels of nonlinear operations with many hidden layers,
known as neural nets. The main focus of deep learning is to learn feature hierarchies,
where features at higher levels of the hierarchy are built with the help
of features at lower levels. We employ two such networks to create the deep
learning models described below.
3.1 Convolutional Neural Network (CNN)
A CNN, a type of feed-forward artificial neural network, is broadly used in image
recognition and processing. It executes a set of descriptive and generative tasks
using deep learning, often for computer vision tasks involving image and video
recognition. The two major operations performed by a CNN are pooling and convolution,
which are applied along the temporal dimension of the sensor signals. Since HAR
involves classifying time series data, we use a 1D CNN model, in which
the kernel slides along one dimension.
3.2 Recurrent Neural Network (RNN)
Unlike traditional neural networks, where all the inputs/outputs behave
independently of one another, in an RNN the outputs of the previous steps are provided
as input to the present step. In this paper, we use the long short-term
memory (LSTM) network, a category of RNN (an improved version of RNN). The
architecture of LSTM was prompted by an analysis of error flow in existing RNNs,
which established that prolonged time lags were inaccessible to pre-existing structures,
the reason being that the back-propagated error either blows up or decays exponentially.
Focusing on the architecture of an LSTM, its layers contain a set
of recurrently connected blocks, technically called memory blocks or cells. These
memory cells share their concept (in a comparative sense) with
the memory chips inside a digital computer. Every block consists of one or more
recurrently connected memory cells and three multiplicative units, the output,
forget, and input gates, which are responsible for providing continuous analogs
of read, reset, and write operations for the cells.
4 System Design
Human activity recognition can be categorized into four broad aspects, as showcased
in Fig. 1. These phases are (i) selecting the appropriate sensor and its deployment,
(ii) collection of data using sensors (using wearable type in our case), (iii) preprocessing of the data like normalization and feature selection, and (iv) using deep
learning algorithms to recognize activities.
4.1 Data Collection
A tri-axial accelerometer in a smartphone was used to collect the data. The
smartphones were carried in the pocket by 36 users while they performed the six
activities, at a sampling rate of 20 Hz (20 values per second). The accelerometer
captures the acceleration of the users along the X-, Y-, and Z-axes (hence
the term tri-axial), as shown in Fig. 2a. These axes represent the motion of the user in
the horizontal, sideways, downward/upward, and backward/forward directions. The
distribution of the dataset with respect to activities is shown in Fig. 2b. The dataset
contains 1,098,207 rows and 6 columns.
4.2 Data Pre-processing
Data pre-processing is the method employed to transform the available data into a
format that the machine can accept and feed to the algorithm. Our dataset is in
text file format. The data from the file is read, and then each of the accelerometer
components (x, y, and z) is normalized. The accelerometer data is converted and
transformed into a time-sliced representation and loaded into a data frame. We have
Fig. 1 Process involved in human activity recognition (sensor selection and
deployment, collecting data from sensors, data pre-processing and feature
selection, developing a classifier to recognize activities)
Fig. 2 (a) Direction of movement recorded by accelerometer. (b) Frequency of data corresponding
to each activity
to add encoded values for the activities, since the deep neural network cannot work
with non-numerical labels.
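The pre-processing described above (per-axis normalization, time-slicing into fixed windows, and integer encoding of the activity labels) can be sketched as follows. The 80-sample window (4 s at 20 Hz), the 50% overlap, and the column names `x`, `y`, `z`, `activity` are illustrative assumptions, not values stated in the paper:

```python
# Sketch of the pre-processing step: normalize each accelerometer axis,
# encode the activity labels as integers, and slice the stream into
# fixed-size windows, each labeled by its most frequent activity.
import numpy as np
import pandas as pd

def preprocess(df, window=80, step=40):
    df = df.copy()
    # Normalize each axis to zero mean and unit variance.
    for axis in ("x", "y", "z"):
        df[axis] = (df[axis] - df[axis].mean()) / df[axis].std()
    # Deep networks need numeric targets, so encode the string labels.
    df["label"], classes = pd.factorize(df["activity"])
    # Slice into overlapping windows of `window` consecutive samples.
    segments, targets = [], []
    for start in range(0, len(df) - window, step):
        chunk = df.iloc[start:start + window]
        segments.append(chunk[["x", "y", "z"]].to_numpy())
        targets.append(chunk["label"].mode().iloc[0])
    return np.stack(segments), np.array(targets), list(classes)
```

Each returned segment has shape (window, 3), matching the (periods × sensors) input described in Sect. 4.4.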
4.3 Training and Test Set
The neural network needs to learn from some of the users who took part in the
experiment, and after that, we need to check how well it predicts
the movements of persons it has not seen before. We split the data into training
and test sets in an 80/20 ratio.
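Because the test set must contain people the network has never seen, the split is done over users rather than over individual windows. A minimal sketch, assuming an array of user IDs aligned with the windows (the function name and arguments are our own):

```python
# Hold out whole users so the model is evaluated on unseen people.
import numpy as np

def split_by_user(X, y, user_ids, test_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    users = np.unique(user_ids)
    rng.shuffle(users)
    n_test = max(1, int(round(len(users) * test_frac)))
    test_users = set(users[:n_test].tolist())
    is_test = np.array([u in test_users for u in user_ids])
    return X[~is_test], y[~is_test], X[is_test], y[is_test]
```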
4.4 Reshaping the Data and Preparing It for the ML Model
The data stored in the data frame needs to be reshaped before being fed into the neural
network. The dimensions used are (i) number of periods, the number of time steps
within one record; (ii) number of sensors, which is three, as we use acceleration
over the x-, y-, and z-axes; and (iii) number of classes, the number of nodes
in the output layer of the neural network.
4.5 Building and Training the Model
We will be designing two models, namely CNN and LSTM, with the same set of
data and training and test sets. The CNN model will involve one convolution layer
followed by a max-pooling layer and another convolution layer. This feeds into a
fully connected layer associated with the softmax layer. As given in Fig. 3, the
softmax layer is a type of squashing function limiting each output to the range
from 0 to 1, which allows the outputs to be read as probabilities.
Fig. 3 CNN architecture
Fig. 4 Stacked LSTM architecture
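A hedged Keras sketch of this CNN (assuming a TensorFlow/Keras implementation; the filter counts, kernel size, and window size are illustrative, not values from the paper): one convolution layer, a max-pooling layer, a second convolution layer, then a fully connected softmax output over the six activities.

```python
# 1D CNN for windows of tri-axial accelerometer data:
# conv -> max-pool -> conv -> fully connected softmax.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

TIME_PERIODS, NUM_SENSORS, NUM_CLASSES = 80, 3, 6  # assumed window size

cnn = keras.Sequential([
    keras.Input(shape=(TIME_PERIODS, NUM_SENSORS)),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.Flatten(),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # outputs in 0..1
])
cnn.compile(optimizer="adam", loss="categorical_crossentropy",
            metrics=["accuracy"])
```

Training then follows the hyperparameters reported in Sect. 5, e.g. `cnn.fit(X_train, y_train, batch_size=400, epochs=50, validation_data=(X_test, y_test))`.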
The long short-term memory model consists of two densely interlinked layers and
two LSTM layers, stacked on each other, with about 60 hidden units
each. An RNN is made up of cyclic links, enabling it to understand
the temporal dynamics of sequential data. A hidden layer in an RNN consists of
several nodes, where every node produces the present hidden state
and an output by combining the present input and the prior hidden state
[2]. A similar concept is involved in an LSTM, but instead of nodes there are memory
cells. An LSTM unit consists of a cell, an input gate, an output gate (hidden state), and a
forget gate. The three gates are responsible for regulating the flow of data into and
out of the cell. These gates control when to forget a prior hidden state and when
it is time to update it with new information.
We use a stacked LSTM, as the addition of layers increases the level of
abstraction of the input observations (Fig. 4).
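The stacked LSTM above can be sketched in the same way (again assuming a Keras implementation; the size of the dense layers is an assumption): two LSTM layers of 60 hidden units stacked on each other, followed by fully connected layers ending in a softmax.

```python
# Stacked LSTM: two 60-unit LSTM layers followed by dense layers.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

TIME_PERIODS, NUM_SENSORS, NUM_CLASSES = 80, 3, 6

lstm = keras.Sequential([
    keras.Input(shape=(TIME_PERIODS, NUM_SENSORS)),
    layers.LSTM(60, return_sequences=True),  # pass full sequence onward
    layers.LSTM(60),                         # second stacked LSTM layer
    layers.Dense(60, activation="relu"),     # fully connected layer
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
lstm.compile(optimizer="adam", loss="categorical_crossentropy",
             metrics=["accuracy"])
```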
5 Results
The next step is to train both models with the training data that we prepared. The
following hyperparameters are used for training: a batch size of 400 records, and
the models are trained for 50 epochs. Plotting the learning curves for both models,
as seen in Figs. 5 and 6, respectively, the CNN model faced issues during the testing
phase. The test loss rises after 20 epochs, while the test accuracy maintains
Fig. 5 Plot of the learning curve for CNN
Fig. 6 Plot of the learning curve for LSTM
consistency until 50 epochs. For the model to be more accurate, the test loss curve
should have a downward curve. It is noteworthy that the model performs well during
the training phase as the training loss has a downward curve.
In the case of the LSTM model, from Fig. 6 we can see that the model
learns quite well, as the test loss forms a downward curve with increasing
epochs. Similarly, the test accuracy curve rises at the start and then settles
into a consistent course along with the training accuracy. It is evident from both
the learning curves that the LSTM model is more accurate compared to the CNN
model. Figure 7 represents the prediction accuracy for the CNN model. Looking at
the diagonal matrix, the accuracy of the CNN model turns out to be 87%. The model
faced difficulty in identifying activities like upstairs and standing.
The LSTM model's confusion matrix in Fig. 8 shows that the model accurately
predicts activities like walking and faces difficulty identifying activities like standing
and upstairs. Looking at the diagonal, the LSTM model's accuracy turns out
Fig. 7 Confusion matrix for the CNN model
Fig. 8 Confusion matrix for the LSTM model
to be approximately 92%. Both models had slight problems identifying the
same set of activities.
6 Conclusion and Future Scope
In this study, the LSTM model came out to be more accurate with an accuracy rate
of 92% compared to the CNN model, which yielded 87% accuracy in identifying
various everyday physical activities using a tri-axial accelerometer (wearable sensor).
The data was obtained from various individuals in real-world conditions with the
accelerometer device or smartphone carried by the subjects in their pockets. Our
study’s goal was to compare how two deep neural network models like convolutional
and long short-term memory neural networks would perform comparatively on the
given set of data. With more vigorous tuning, it is estimated that CNN can perform
even better, and so can the LSTM model. For future work, we plan to create hybrid
models that combine the concept of two or more deep neural networks. We also plan
to divide the dataset based on the ages of the participants/subjects to increase the
abstraction level of our model and get new insights.
References
1. Ryoo, M.S.: Human activity prediction: early recognition of ongoing activities from streaming
videos. In: 2011 International Conference on Computer Vision, pp. 1036–1043. IEEE, Nov 2011
2. Lasecki, W.S., Song, Y.C., Kautz, H., Bigham, J.P.: Real-time crowd labeling for deployable
activity recognition. In: Proceedings of the 2013 Conference on Computer Supported
Cooperative Work, pp. 1203–1212, 23 Feb 2013
3. Banos, O., Damas, M., Pomares, H., Prieto, A., Rojas, I.: Daily living activity recognition based
on statistical feature quality group selection. Expert Syst. Appl. 39(9), 8013–8021 (2012)
4. Chen, L., Nugent, C.D., Wang, H.: A knowledge-driven approach to activity recognition in
smart homes. IEEE Trans. Knowl. Data Eng. 24(6), 961–974 (2011)
5. Jalal, A., Uddin, M.Z., Kim, J.T., Kim, T.S.: Recognition of human home activities via depth
silhouettes and transformation for smart homes. Indoor Built Env. 21(1), 184–190 (2012)
6. Han, J., Shao, L., Xu, D., Shotton, J.: Enhanced computer vision with microsoft Kinect sensor:
a review. IEEE Trans. Cybernet. 43(5), 1318–1334 (2013)
7. Zhang, Z.: Microsoft Kinect sensor and its effect. IEEE Multimedia 19(2), 4–10 (2012)
8. Huynh, D.T.: Human activity recognition with wearable sensors. Doctoral dissertation.
Technische Universität
9. Lawrence, E., Sax, C., Navarro, K.F., Qiao, M.: Interactive games to improve quality of life for
the elderly: Towards integration into a WSN monitoring system. In: 2010 Second International
Conference on eHealth, Telemedicine, and Social Medicine, pp. 106–112. IEEE, 10 Feb 2010
10. Lange, B., Chang, C.Y., Suma, E., Newman, B., Rizzo, A.S., Bolas, M.: Development and evaluation of low-cost game-based balance rehabilitation tool using the Microsoft Kinect sensor.
In: 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology
Society, pp. 1831–1834. IEEE, 30 Aug 2011
11. Yoshimitsu, K., Muragaki, Y., Maruyama, T., Yamato, M., Iseki, H.: Development and
initial clinical testing of “OPECT”: an innovative device for fully intangible control of the
intraoperative image-displaying monitor by the surgeon. Oper. Neurosurg. 10(1), 46–50 (2014)
12. Chen, L., Hoey, J., Nugent, C.D., Cook, D.J., Yu, Z.: Sensor-based activity recognition. IEEE
Trans. Syst. Man Cybernet. Part C Appl. Rev. 42(6), 790–808 (2012)
13. Wang, J., Chen, Y., Hao, S., Peng, X., Hu, L.: Deep learning for sensor-based activity
recognition: a survey. arXiv preprint arXiv:1707.03502 (2017)
14. Lara, O.D., Labrador, M.A.: A survey on human activity recognition using wearable sensors.
IEEE Commun. Surv. Tutor. 15(3), 1192–1209 (2012)
15. Cornacchia, M., Ozcan, K., Zheng, Y., Velipasalar, S.: A survey on activity detection and
classification using wearable sensors. IEEE Sens. J. 17(2), 386–403 (2016)
16. Scholz, M., Sigg, S., Schmidtke, H.R., Beigl, M.: Challenges for device-free radio-based
activity recognition. In: Workshop on Context Systems, Design, Evaluation and Optimisation,
Dec 2011
17. Amendola, S., Lodato, R., Manzari, S., Occhiuzzi, C., Marrocco, G.: RFID technology for
IoT-based personal healthcare in smart spaces. IEEE Internet Things J. 1(2), 144–152 (2014)
18. Wang, S., Zhou, G.: A review on radio based activity recognition. Digit. Commun. Netw. 1(1),
20–29 (2015)
19. Herath, S., Harandi, M., Porikli, F.: Going deeper into action recognition: a survey. Image Vis.
Comput. 1(60), 4–21 (2017)
20. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and
other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
Comparison of Parameters of Sentimental Analysis Using Different
Akash Yadav and Ravinder Ahuja
1 Introduction
Nowadays, the usage of social media is increasing exponentially, and various
sectors are targeting social media platforms as their launch pad, for example,
the use of social media to influence elections, for self-promotion, etc. There is
therefore a need to analyze opinions drawn from the responses available on social
media. Twitter is one place where people voice their views very strongly on different
issues. For examining user thoughts, sentiment analysis has become a significant
means of uncovering hidden patterns in a large number of tweets with the
help of machine learning algorithms. We prepared our work with ten algorithms to
sort out the performance of the classifiers, treating feature extraction and the machine
learning algorithm as two different entities. Our main contribution is to find the best
classification algorithm for realizing the maximum potential of sentiment
analysis by comparing four significant performance factors of each classification
algorithm: F1-score, precision, accuracy, and recall.
2 Related Work
The sentiment140 dataset we used in our work was generated using an automated
labeling method [1]. Baccianella et al. [1] used automation to take advantage
A. Yadav (B) · R. Ahuja
School of Computing Science and Engineering, Galgotias University, Greater Noida, Uttar
Pradesh, India
e-mail: akashyadav1197@gmail.com
R. Ahuja
e-mail: ahujaravinder022@gmail.com
A. Yadav and R. Ahuja
of the emoticons found in texts. Emoticons are combinations of symbols that express
some type of feeling and depict different polarities. Tweets were gathered to form
a dataset that records the texts together with their polarities. A total of 1.6 million
records were collected, machine learning algorithms were run on them, and their
performance was measured. The algorithms used in [1] are naive Bayes, SVM, and
maximum entropy. Much recent research has been done on sentiment analysis; these
works are based on subjectivity or sentiment level.
In [2], both the movie review and sentiment140 datasets were used, and
five machine learning algorithms were applied to feature combinations, which offers
a unique perspective on sentiment analysis. The algorithms used there were naive
Bayes, support vector machine, and maximum entropy.
3 Methodology Used
This section describes the model used in our experiment for sentiment classification
and the whole process of classification. Various steps are involved in our
system. We have taken a labeled dataset, which is why we developed
a supervised algorithm. In the first step, we collected the data from
one source and indexed it for pre-processing. Next, we applied various text
pre-processing steps such as text cleaning, and then we performed
data visualization to understand the data on which we are working. We then performed
various feature extractions and used the extracted features with at least
ten classifiers. Finally, the performance of each classifier is
compared, so that one can select which classifier achieves the best
outcome with which feature extraction.
3.1 Data Description
The sentiment140 dataset consists of six columns, which include [3] sentiment, id,
date, query, user, and text; Table 1 describes the format of the dataset.
Table 1 Format of the dataset
Comparison of Parameters of Sentimental Analysis Using …
3.2 Performance Metrics
We use four metrics to compare our results; these measures are described below.
Precision
Precision is a measure of closeness between predicted and actual values; it tells how
accurate our model's positive predictions are.
Recall
Recall measures how many of the actual positives our model captures by labeling
them as positive.
F1-Score
The F1-score is a measurement of the process's accuracy. It uses the two quantities
precision and recall; their harmonic mean helps in assessing the results more accurately.
Accuracy
Accuracy in machine learning simply means the number of correct
predictions divided by the total number of inputs.
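The four measures can be computed directly from true/false positive counts. A small self-contained sketch for the binary positive/negative case (the function and variable names are our own):

```python
# Precision, recall, F1-score, and accuracy from paired label lists.
def metrics(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return precision, recall, f1, accuracy
```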
3.3 Data Preprocessing and Text Cleaning
Data pre-processing contains various steps, which include tokenization, text cleanup,
and encoding. Raw tweets or texts usually contain mentions, escaped negation words,
special characters, mixed cases, URLs, user names, hashtags, punctuation, etc.,
which do not convey any sentiment. Texts that do not
carry any polarity are removed using the third-party Python package NLTK
stop word corpus [4]. After cleanup of the textual data, the remaining texts are
stored in a CSV file for the further feature extraction process.
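A minimal sketch of this cleanup step (the regex patterns are illustrative, and the small inline stop-word set stands in for the NLTK English stop-word corpus so the example stays self-contained):

```python
import re

# Stand-in for nltk.corpus.stopwords.words("english").
STOP_WORDS = {"i", "a", "an", "and", "the", "is", "at"}

def clean_tweet(text):
    text = re.sub(r"https?://\S+", "", text)         # drop URLs
    text = re.sub(r"[@#]\w+", "", text)              # drop mentions, hashtags
    text = re.sub(r"[^a-zA-Z\s]", "", text).lower()  # drop punctuation, case
    return " ".join(w for w in text.split() if w not in STOP_WORDS)
```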
A. Yadav and R. Ahuja
3.4 Feature Extraction
Bag of Words
A bag of words is simply a representation of the occurrence of words within a
document. It first creates a vocabulary of the words that exist in our dataset and
then counts the occurrences of those vocabulary words in each document.
N-Gram Features
We have experimented with unigram, bigram, and trigram as features and applied
machine learning classifiers for sentimental analysis. We have used the pipeline to
create feature vector and evaluation.
Term Frequency Inverse Document Frequency (TF-IDF)
TF-IDF simply states how important a tokenized term is in a document, and it
converts the raw corpus into weighted vectors.
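The three extractors map naturally onto scikit-learn's vectorizers, which we assume here for illustration (the toy three-document corpus is our own):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["good movie", "bad movie", "good good plot"]

bow = CountVectorizer()                        # bag of words: raw counts
X_bow = bow.fit_transform(docs)

ngrams = CountVectorizer(ngram_range=(1, 2))   # unigram + bigram features
X_ngrams = ngrams.fit_transform(docs)

tfidf = TfidfVectorizer(ngram_range=(1, 2))    # TF-IDF-weighted n-grams
X_tfidf = tfidf.fit_transform(docs)
```

`X_bow` has one column per vocabulary word, while the n-gram variants add one column per observed bigram.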
3.5 Classification Models
We applied ten classifiers to our feature vectors; they are described below.
Ridge Classifier
The ridge classifier uses ridge regression to find estimates that minimize the bias and
variance in linear classification. Its primary objective is to reduce the sum of the
squares of the errors.
Multi-Layer Perceptron
The perceptron is a classifier used as the learning model. It changes its weights only
if an error occurs; it does not have a learning rate or a regularization parameter.
Multinomial Naïve Bayes Classifiers
Multinomial naïve Bayes classifiers are used to calculate a probabilistic result based
on Bayes' theorem. This classifier calculates the conditional probability
of a specific word given a domain as the relative frequency of term q in the records
belonging to the domain (d).
K-Nearest Neighbor
It helps in classification by simply assigning an observation the label of the class
whose mean is closest to the observation.
Bernoulli Naive Bayes Classifier
The Bernoulli naive Bayes classifier is used when the prediction is in
Boolean form. It is very similar to multinomial NB.
AdaBoost Classifier (Adaptive Boosting)
It is a combination of many algorithms used to improve the performance of the model.
SGD Classifier
This classifier uses stochastic gradient descent to minimize the loss from
irregularities or errors so that the model becomes fit for classification.
Passive Aggressive Classifier
It is a simple classifier for large-scale learning that does not require a learning
rate and has a regularization parameter; it is openly available in the sklearn package.
Linear SVC
Linear SVC is part of the support vector machine family; it is a non-probabilistic
binary algorithm that creates a hyperplane between two classes.
Logistic Regression
Logistic regression simply learns to estimate the result with the help of its training
dataset; it is very efficient and does not require advanced computational resources.
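Each feature extractor/classifier pairing in the comparison can be assembled as a scikit-learn Pipeline, assuming that library, as sketched below with TF-IDF bigrams and logistic regression (the four inline tweets are illustrative only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_texts = ["love this movie", "great day", "hate mondays", "so sad"]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # feature extraction
    ("model", LogisticRegression(max_iter=1000)),    # classifier
])
clf.fit(train_texts, train_labels)
pred = clf.predict(["what a great movie"])
```

Swapping the vectorizer or the final estimator reproduces each of the other pairings evaluated in Sect. 4.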
4 Experiment and Result Evaluation
We first cleaned the tweets in the dataset, and then we performed
data visualization, with the following results.
During examination of the dataset, we determined the exact numbers of positive
and negative sentiments that exist in the dataset (Table 2).
With the help of a word cloud, we were able to form a cloud of the words that appear
most frequently in our dataset, helping us find the words associated with
each polarity; the figures below show the text of tweets with positive
and negative polarity (Fig. 1).
From this, we can analyze how our data is defined in the dataset and how people's
opinions generally contain similar words used for both sentiments,
positive and negative. Some words like "today", "one", and "still" can be termed
neutral, while words like "sad", "bad", "hate", "suck", "wish", etc., make sense as
negative words (Fig. 2).
Table 2 Number of records in the dataset for each sentiment
Fig. 1 Word cloud of negative words in the dataset
Fig. 2 Word cloud of positive words in the dataset
Table 3 Words in the dataset and the sentiments they represent (count of "love" in
negative tweets, "lol" in positive tweets, and "lol" in negative tweets)
In this word cloud of positive tweets, neutral words like "today", "tonight",
"still", etc., are present. Also, words like "thank", "haha", "awesome", "good", etc.,
stand out as positive words. With the help of the word clouds, we found that words
that generally symbolize positive polarity do not always mean the same in the context
of a sentence. As we saw, "love" is used in negative tweets, and "lol" is used in both
sentiment contexts; this is shown in Table 3.
During the process of data visualization, we also searched for the most and least
frequent tokens of both polarities.
Among negative tokens, the most used are words like "just", "work", "day", and "got",
which do not convey any negative emotion. On the other hand, words like "sad",
"bad", "miss", "sorry", and "hate" clearly represent negative emotions.
Among positive tokens, the most used, such as "excellent", "love", "like", "thanks",
and "new", clearly represent positive emotions, while other frequent words like
"just", "got", "today", and "day" convey minimal emotion.
For the next step of our experiment, we split the dataset
into three parts, as follows.
Train set: 98% of the dataset.
Development set: 1% of the total dataset.
Test set: 1% of the total dataset.
We used TextBlob as a baseline for our project; it serves as a point of
reference. TextBlob gave 61.41% accuracy on the validation set.
The next step in our experiment is to apply feature extraction to these tokenized
dataset records, which converts the text into vector form.
After performing feature extraction, we experimented with all the classifiers
discussed in the earlier segment. First, we evaluated the n-gram features and
calculated the validation accuracy of the bag of words and TF-IDF representations
using n-grams. The outcome showed which feature extraction achieves the maximum
accuracy at which number of features, with the following readings as results:
bigram TF-IDF at 90,000 features gives the highest validation accuracy at 82.45%,
and the trigram count vectorizer at 80,000 features gives the highest accuracy.
Now that we have seen which features give the highest accuracy, we run all the
classifiers on these vectors at their maximum number of features. The resulting
values are stored in the two tables below.
Term frequency and inverse document frequency bigram: accuracy (%), precision,
recall, and F1 score for the ridge classifier, logistic regression, multi-layer
perceptron, passive aggressive classifier, stochastic gradient descent, linear SVC,
multinomial NB, k-nearest neighbor, Bernoulli naïve Bayes, AdaBoost classifier,
and L1-based linear SVC; the reported accuracies range from 70.23 to 82.43%.
Comparison of Parameters of Sentimental Analysis Using …
Count vectorizer trigram: accuracy (%), precision, recall and F1 score for ridge classifier, logistic regression, multi-layer perceptron, passive aggressive classifier, stochastic gradient descent, linear SVC, multinomial NB, k-neighbors classifier, Bernoulli NB, Adaboost classifier and L1-based linear SVC.
5 Conclusion and Future Scope
The above work addresses the task of examining the performance of classifiers in our system. From this work, we are able to determine which classifier, combined with which feature extractor, gives the best result in the sentiment analysis of tweets or comments. By selecting appropriate methods for sentiment analysis, we can achieve higher success, which helps today's business models. In the future, we will try combinations of feature extractors, which may achieve a higher success rate than this paper provides.
1. Baccianella, S., Esuli, A., Sebastiani, F.: Sentiwordnet 3.0: an enhanced lexical resource for
sentiment analysis and opinion mining. In: LREC, vol. 10, no. 2010, pp. 2200–2204, May 2010.
2. Iqbal, N., Chowdhury, A.M., Ahsan, T.: Enhancing the performance of sentiment analysis
by using different feature combinations. In: 2018 International Conference on Computer,
Communication, Chemical, Material and Electronic Engineering (IC4ME2), pp. 1–4. IEEE,
Feb 2018
3. https://help.sentiment140.com/for-students
4. Bindal, N., Chatterjee, N.: A two-step method for sentiment analysis of tweets. In: 2016
International Conference on Information Technology (ICIT), pp. 218–224. IEEE, Dec 2016
System Model to Effectively Understand
Programming Error Messages Using
Similarity Matching and Natural
Language Processing
Veena Desai, Pratijnya Ajawan, and Balaji Betadur
1 Introduction
The development and growth of technology have made computers remarkably capable of interpreting human intent and displaying it on the screen. Human speech can be understood by humans, but a computer has no brain of its own; it must be trained to understand human speech, recognize it, analyze it, process it and respond appropriately. This is performed using programming techniques like natural language processing (NLP). In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence; Turing had earlier helped decipher the German Enigma messages [1]. Natural language processing is the process of understanding speech and responding to it accordingly. It deals with two parts: natural language understanding and natural language generation. The speech is first segmented and then divided into smaller parts called tokens. For each token, the part of speech is predicted. Simultaneously, stemming and lemmatization, which use vocabulary and morphological analysis of words, aim to remove inflectional endings and return the base or dictionary form of a word. Stop words are then identified, and the resulting data is presented to the computer for further processing. The algorithm recognizes the input accurately and renders a specific answer even for short queries.
V. Desai (B) · P. Ajawan · B. Betadur
KLS Gogte Institute of Technology, Belgaum, India
e-mail: veenadesai@git.edu
P. Ajawan
e-mail: psajawan@git.edu
B. Betadur
e-mail: balajibetadur@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
V. Desai et al.
2 Methodology and Preprocessing
Preprocessing methods are carried out on the raw data as well as on the input query in order to analyze it and respond. Natural language processing supports these methods, which are shown in Fig. 1.
Once the input query and the data are processed, the algorithm uses cosine similarity matching to compare the input query with the processed data. The data with maximum similarity is returned as the result. Python provides various modules to carry out all these functions easily.
Fig. 1 Sequence of events in
the algorithm
Fig. 2 Data after tokenization
2.1 Tokenization
When raw data is fed to the algorithm, it includes unwanted noise which can cause defects and needs to be corrected. The algorithm performs tokenization as the first method, tokenizing both the data and the input query before proceeding to further preprocessing techniques. Every word is treated as a token; data after tokenization is represented in Fig. 2.
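Tokenization can be sketched with a simple regular expression; real pipelines often use a library tokenizer, and the pattern below is an illustrative assumption:

```python
import re

def tokenize(text):
    """Split raw text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("SQL statements execute() - multiple outputs!"))
# ['sql', 'statements', 'execute', 'multiple', 'outputs']
```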
2.2 Stop Words Removal
Stop words are common function words, such as articles and helping verbs, that do not play an important role in natural language processing or similarity matching, nor in helping the computer understand the speech. Data before and after removal of stop words is represented in Figs. 3 and 4, respectively.
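Stop-word removal amounts to a set-membership filter; the stop list below is a tiny illustrative subset, whereas libraries such as NLTK ship full lists:

```python
# Tiny illustrative subset of a stop-word list (real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "do", "does",
              "to", "of", "in", "and", "it"}

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(["the", "error", "is", "in", "the", "query"]))
# ['error', 'query']
```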
The word cloud describes the frequency of words that occurred in the data. It is
represented in Fig. 5. The size of the word in the word cloud is directly proportional
to the frequency of the word in the data.
Fig. 3 Data before removal
of stop words
Fig. 4 Data after removal of stop words
Fig. 5 Word cloud of the tokens after tokenization
2.3 Count Vectorizer
Count vectorizer provides a simple way to tokenize a collection of text documents and build a vocabulary of known words, and also to encode new documents using that vocabulary. The syntax of count vectorizer, with all of its parameters, is shown below. The fit_transform method applies to feature extraction objects such as count vectorizer and the tfidf transformer (Table 1).
class sklearn.feature_extraction.text.CountVectorizer(input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, stop_words=None, token_pattern=r'(?u)\b\w\w+\b', ngram_range=(1, 1), analyzer='word', max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.int64'>)
The "fit" element fits the feature extractor to the data; the "transform" element receives the data and returns the transformed data. Count vectorizer counts the frequency of each word present in the data. In Table 1 [4], the first row shows the words that are in the data, and the second row gives the frequency of occurrence of each word in the data.
Table 1 Word frequency table
Fig. 6 List of samples to illustrate IDF and tfidf transformer
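What fit (build the vocabulary) and transform (count occurrences) do can be mimicked in a few lines of plain Python; this is a sketch of the idea, not scikit-learn's implementation:

```python
from collections import Counter

def fit(documents):
    """Build a sorted vocabulary of all words seen in the documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})

def transform(documents, vocab):
    """Encode each document as a vector of word counts over the vocabulary."""
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        rows.append([counts[word] for word in vocab])
    return rows

docs = ["SQL statements execute", "execute multiple SQL outputs"]
vocab = fit(docs)
print(vocab)
print(transform(docs, vocab))
```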
Figure 7 shows the IDF values computed by calling tfidf_transformer.fit(word_count_vector) on the word counts. The IDF value is inversely proportional to the frequency of the words in the sample data shown in Fig. 6, which also indicates that the IDF value is directly proportional to the uniqueness of a word [3]: the lower the IDF value of a word, the less unique it is to any particular document.
In Fig. 8, only a few words have values and the others do not; this is because the first document is "SQL statements execute ()—multiple outputs", so all the words in this document have a tf-idf score and everything else shows up as zeroes. The word "a" is missing from this list because of the internal preprocessing of count vectorizer, which removes single characters. From Fig. 9, the document frequency of a word is inversely proportional to its tf-idf value; from this, it can be said that uniqueness is directly proportional to the tf-idf value.
With TfidfTransformer, the algorithm systematically computes word counts using CountVectorizer, then computes the inverse document frequency (IDF) values, and only then computes the tf-idf scores. With TfidfVectorizer, on the contrary, the algorithm does all three steps at once: under the hood, it computes the word counts, IDF values and tf-idf scores using the same dataset [2].
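The IDF weights of Fig. 7 follow scikit-learn's default smoothed formula, idf(t) = ln((1 + n) / (1 + df(t))) + 1, which can be reproduced directly; this is a sketch of what TfidfTransformer computes under its default settings:

```python
import math

def idf_weights(documents):
    """Smoothed IDF values as scikit-learn computes them by default:
    idf(t) = ln((1 + n_docs) / (1 + df(t))) + 1."""
    n = len(documents)
    tokenized = [set(doc.lower().split()) for doc in documents]
    vocab = sorted(set().union(*tokenized))
    df = {t: sum(t in doc for doc in tokenized) for t in vocab}
    return {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

docs = ["sql statements execute", "execute multiple outputs", "sql outputs"]
weights = idf_weights(docs)
# Words appearing in more documents get lower weights; rarer words score higher.
print(sorted(weights, key=weights.get))
```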
Fig. 7 IDF weights table
2.4 Similarity Matching
Cosine similarity is a measure of similarity between two nonzero vectors of an inner product space that measures the cosine of the angle between them. Similarity matching is the process of matching two pieces of data, i.e., comparing two things and returning an index of similarity.
In Eq. (1), A_i and B_i are the components of vectors A and B, respectively. The resulting similarity ranges from −1 to 1, where −1 means the two vectors are opposite to each other and 1 means they are completely similar. Zero represents orthogonality, and other values show intermediate similarities or dissimilarities. Figure 9 illustrates the same with an example.
\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \quad (1)
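Equation (1) translates directly into code; this is a dependency-free sketch, while scikit-learn's cosine_similarity computes the same quantity over whole matrices:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 1, 0], [1, 1, 0]))   # 1.0 (identical direction)
print(cosine_similarity([1, 0, 0], [0, 1, 0]))   # 0.0 (orthogonal)
print(cosine_similarity([1, 0, 0], [-1, 0, 0]))  # -1.0 (opposite)
```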
Figure 10 represents the data on which the user compares and finds the similarity match between different items. It shows the data and the code where the user defines three variables and assigns them some data. The result of the same is shown in Fig. 11.
Fig. 8 tfidf values table
Fig. 9 Cosine similarity match
Fig. 10 Similarity match working
After comparing, the outcome is a matrix in which the words are compared with the three variables, taking one of two possible values, 1 and 0. A value of 1 indicates that they are similar; a value of 0 means they are not similar, i.e., they are orthogonal.
When a query is encountered, it is matched against all the data samples, returning a list of similarity values, among which the top n values are selected and returned as the response. In Fig. 12, the first row has the maximum similarity and is therefore the most suitable answer.
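Selecting the top-n matches can be sketched by scoring every stored sample against the query vector and sorting; the vectors here would come from any of the extractors above, and the sample data is illustrative:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_n_matches(query_vec, samples, n=10):
    """Return the names of the n stored (name, vector) samples most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in samples]
    scored.sort(reverse=True)  # highest similarity first
    return [name for score, name in scored[:n]]

samples = [("doc_a", [1, 0, 1]), ("doc_b", [0, 1, 0]), ("doc_c", [1, 1, 1])]
print(top_n_matches([1, 0, 1], samples, n=2))  # ['doc_a', 'doc_c']
```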
Fig. 11 Output of code in Fig. 10
Fig. 12 Listing of top 10 matched items in data
3 Algorithm
The algorithm is the most important part of the project, by which the entire process is analyzed. Figure 13 illustrates the flow of the algorithm. The algorithm steps are given below.
Fig. 13 Flow of algorithm
Step 1: The user provides the query as input, and the algorithm reads it.
Step 2: The input given by the user is preprocessed through steps such as tokenization and stop-word removal.
Step 3: The refined query is searched in the database for its match with the help of the cosine similarity match.
Step 4: If data with the maximum match is obtained, the corresponding answer is returned to the user.
Step 5: If the query did not match, or matched with a very low cosine similarity value, data for the user's query is obtained from other sources.
Step 6: The obtained data is added to the main database along with the query asked by the user.
Step 7: The obtained data is returned to the user, and the next input is requested.
Step 8: Steps 1 to 7 are repeated until the user has finished all queries.
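The steps above can be sketched end to end with a stub in place of the web scraper; the similarity threshold, the word-overlap measure and the fetch_from_web stub are illustrative assumptions, not parts of the paper's implementation:

```python
SIMILARITY_THRESHOLD = 0.5  # assumed cutoff for a "good enough" local match

def fetch_from_web(query):
    """Stand-in for the web-scraping fallback (step 5)."""
    return f"scraped answer for: {query}"

def word_overlap(a, b):
    """Crude similarity: Jaccard overlap of word sets (stands in for cosine)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def answer(query, local_db, similarity):
    """Steps 3-7: match locally, fall back to the web, cache the new answer."""
    best_score, best_q = max(
        ((similarity(query, q), q) for q in local_db), default=(0.0, None)
    )
    if best_q is not None and best_score >= SIMILARITY_THRESHOLD:
        return local_db[best_q]          # step 4: confident local match
    result = fetch_from_web(query)       # step 5: no good match, go to the web
    local_db[query] = result             # step 6: cache for next time
    return result                        # step 7: return to the user

db = {"index out of range": "check list bounds before accessing"}
print(answer("index out of range", db, word_overlap))
print(answer("null pointer exception", db, word_overlap))
print("null pointer exception" in db)  # True: the new answer was cached
```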
4 System Architecture
The complete model is flexible and scalable, so that even abrupt traffic can be handled by the system. Local data is explored first for matching the input query, and once the solution is found, the corresponding answer is returned as the response. This is achieved by the cosine similarity matching technique, which returns the data with maximum similarity to the user's input query.
If there is no match in the local data, or only data with a low similarity value, then the answer corresponding to that data cannot be returned, as it would be inappropriate. The algorithm then needs to find the answer by web scraping: it takes the input given by the user, preprocesses it, searches for the answer on the web using a web scraper over documents and many other resources, and returns the top 10 results to the algorithm, which in turn returns them to the user.
The algorithm not only returns the result to the user but also saves a copy of the new question and its corresponding answer in the local data. If a similar or related question is asked again, the answer can then be given directly from the local data, which is faster than the web scraping method.
5 Scalability of the System
For the system's functionality, many search operations run frequently over enormous documents or data from the database or local data so that the solution can be found. When these operations run every single day for thousands of users, the data requirement grows so large that the data in the database is no longer enough; it is limited data, and if the number of users increases and the types of queries received turn out to be very different from the database, the algorithm needs to find an alternative.
The algorithm then starts collecting data from other resources to get results for users' queries. Professional documents can be obtained from different sources, and the retrieved data can be added to the database, which makes the database a huge pool of data; that in turn can be a problem for a machine to process. This delays fetching the result, and the end user must wait until the result loads. This can be solved by web scraping methods, which let the user get the data from the World Wide Web immediately after the query is asked.
Web scraping is made easy by Python modules like Beautiful Soup, Selenium, etc. The algorithm uses Selenium to operate the browser and download data automatically from the World Wide Web into the local database without any human input. The data is downloaded automatically every month and added to the database, and each month's data is also saved separately. Selenium performs the clicks, downloads the new data file and moves the files to the required location.
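The text names Beautiful Soup and Selenium for scraping; for a dependency-free illustration of the same idea, the standard library's html.parser can already pull links out of a fetched page (the HTML snippet below is illustrative):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect every href seen in <a> tags while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<html><body><a href="/docs/errors">errors</a> <a href="/faq">faq</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/docs/errors', '/faq']
```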
6 Conclusion and Result
This work shows the potential of data extraction, besides how high availability and scalability can be achieved. For this purpose, a prototype is designed that classifies and fetches the required solutions and descriptions, collecting everything in a queryable database with a simple grammar so that the collected knowledge can be retrieved anytime thereafter. Sometimes the data is obtained from various resources, which makes the system more scalable, accurate and efficient. The problem becomes solvable in a single click without consuming much time. The system can be advanced in such a manner that when an error occurs, it automatically recognizes the error, runs the complete algorithm, and provides a solution for the user, or applies it automatically if desired. There is no need for a complex general system when the user can tackle the same issue more efficiently from a domain-specific point of
1. Gómez-Pérez, P., Phan, T.N., Küeng, J.: Knowledge extraction from text sources using a
distributed mapreduce cluster. In: 2016 27th International Workshop on Database and Expert
Systems Applications (DEXA), pp. 29–33. IEEE (2016)
2. Wei, Q., Zhong, C., Yu, J., Luo, C., Chen, L.: Human-machine dialogue system development
based on semantic and location similarity of short text model. In: 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC),
pp. 2029–2032. IEEE (2018)
3. Revathy, M., Madhavu, M.L.: Efficient author community generation on NLP based relevance feature detection. In: 2017 International Conference on Circuit, Power and Computing
Technologies (ICCPCT), pp. 1–5. IEEE (2017)
4. Google Images (google.com)
Enhanced Accuracy in Machine
Learning Using Feature Set Bifurcation
Hrithik Sanyal, Priyanka Saxena, and Rajneesh Agrawal
1 Introduction
During the last few decades, medical research has been taking advantage of high-speed data processing through technology [1]. Researchers have applied different techniques, such as screening, to identify the stages of diseases before they show symptoms. Additionally, experts have developed new approaches for extrapolating diagnosis and treatment in the early stages of various diseases. With the development of new methods, accurate prediction of disease and correct diagnosis has become one of the most challenging and exciting tasks for physicians.
A large volume of data is being collected extremely fast in the biomedical field and forms a rich foundation of information for medical research. According to the World Health Organization (WHO) [1], machine learning lets us partition this huge data and extract information from it based on past diagnoses, identifying hard-to-see symptomatic knowledge in enormous and noisy datasets. For example, breast cancer has aspects like survivability and recurrence (return of cancer after treatment); these are critical phenomena in breast cancer behavior, inherently related to the death of the patient [2]. Prediction of diseases is possible by deciphering the information from the data which is found to be indicative of the
H. Sanyal (B)
Department of Electronics and Telecommunications, Bharati Vidyapeeth College of Engineering
Pune, Pune, India
e-mail: hrithiksanyal14@gmail.com
P. Saxena
Department of Computer Science, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India
e-mail: anusaxena1218@gmail.com
R. Agrawal
Comp-Tel Consultancy, Mentor, Jabalpur, India
e-mail: rajneeshag@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
H. Sanyal et al.
disease. Application of ML in the medical domain is growing rapidly due to the effectiveness of its approach in prediction and classification, especially in medical diagnosis to predict diseases like breast cancer; it is now widely applied to biomedical research.
The purpose of this work, in general, is to predict the disease from historical data. In the research carried out, a comparative analysis of ML techniques is provided to increase the prediction rate, using the example of a breast cancer dataset, and also to provide better accuracy evaluation with higher speed.
In this paper, Sect. 1 introduces the usage of ML in cancer diagnosis, Sect. 2 details the techniques of machine learning and their descriptions, Sect. 3 discusses the literature survey, and Sect. 4 discusses the use of feature-set bifurcation and the Wisconsin dataset used in the simulation. Section 5 details the proposed methodology with a diagrammatic representation, Sect. 6 discusses the results obtained, and finally Sect. 7 presents the conclusion and future work.
2 Machine Learning Techniques
2.1 Machine Learning Techniques
Machine learning is a framework which takes information, uses algorithmically developed designs, trains them using the input information and yields a result which is fed back for use in future processing. ML is a subset of artificial intelligence (AI). Machine learning understands the structure of data and puts that data into new structural models that are understandable and useful to people. Machine learning uses different types of techniques. These techniques are:
Supervised Learning: Supervised learning prepares a model on known input and output information. This aids classification and regression in foreseeing future outputs precisely. By taking and learning from known inputs and outputs, it builds and trains a model that forecasts accurately from the known facts. Supervised learning is typically used for prediction when the output of the data is known.
Unsupervised Learning: Unsupervised learning is a class of ML which searches for hidden patterns or structures in data. It helps to make inferences from datasets consisting of responses that are not tagged or labelled. Unsupervised learning mostly uses clustering, which is the most used unsupervised learning technique: it finds concealed patterns or associations in datasets and thus analyzes them.
Enhanced Accuracy in Machine Learning Using Feature …
2.2 Machine Learning Classifiers
Various classifiers have been designed by researchers in the field. A list of some well-known classifiers used in ML is as follows:
Logistic regression
Naïve Bayes network classifier
Decision tree.
Logistic Regression: Logistic regression is a classification algorithm in ML which makes use of one or more independent input variables to come up with an outcome/output. It works on discrete output labels which can take either of two values, such as yes/no, true/false, 1/0. Logistic regression is specifically meant for classification; it helps to see how much the independent factors influence the result of the dependent variable. The main shortcoming of the logistic regression algorithm is that it only works when the predicted variable is discrete.
Naïve Bayes Network Classifier: Based on Bayes' theorem, it assumes that the presence of a specific feature in a class is not related to the presence of any other feature. It is useful for enormous datasets. Despite the simplicity of its methodology, Naïve Bayes is known to perform better than many other classifier calculations in ML. The Naïve Bayes classifier requires an extremely modest quantity of training data to give a correct estimation of the fundamental parameters needed for results. Compared to other classifiers, it is extremely fast. One big disadvantage is that its estimation is based on probability and hence not always very accurate.
Decision Tree: The decision tree algorithm uses a tree structure of mutually exclusive "if-then" rules for classification. The procedure starts by separating the information into small pieces/structures and connecting them in a tree, which grows gradually. This proceeds on the training data until the endpoint, i.e., the termination point, is reached. A decision tree is easy to comprehend, envision and visualize, and it can work with very little data for predictions. One disadvantage of the decision tree is that simple trees can become very complex, which may turn out to be wasteful later on; similarly, it may be very unstable, subsequently ruining the entire structure of the decision tree.
3 Literature Review
Breast cancer is considered the deadliest type of cancer among all cancers. Notwithstanding being treatable and curable, a humongous number of people do not survive it, since the diagnosis is often made at a very late stage, when it is too late. An effective way to classify data in medical fields, and in other fields as well, is by using ML data mining and classifiers, which help to make important decisions in the methodology of diagnosis.
The dataset used is UCI's Wisconsin dataset for breast cancer. The ultimate objective is to classify the data with both algorithms and show the results in terms of precision. The result concludes that the decision tree classifier gives higher precision than all the other classifiers [1].
Cancer is a dangerous kind of disease, driven by variation in cells inside the body. Variation in cells is accompanied by an exponential, uncontrolled increase in malignant cell growth. Albeit dangerous, breast cancer is also a very frequent type of cancer. Among all diseases, cancer has undoubtedly been the most deadly. It occurs due to variation and mutation of infectious and malignant cells, which spread quickly and infect surrounding cells as well. To increase the survival rate of patients suffering from breast cancer, early detection of the disease is very much required. Machine learning techniques help in the accurate and probable diagnosis of cancer in patients. They make intelligent systems which learn from historical data and keep learning from recent predictions to make decisions more accurate and precise [3].
Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques allowing computers to "learn" from prior examples and to detect hard-to-discern patterns in large, noisy or complex datasets. This capability is particularly suited to medical applications, especially those that depend on complex proteomic and genomic measurements. Hence, machine learning is frequently used in cancer detection and diagnosis, and it is likewise helping to improve our essential understanding of cancer development and progression [2].
Chinese women are seriously threatened by breast cancer, with high morbidity and mortality. The absence of effective prediction models makes it difficult for specialists to establish an appropriate treatment strategy that may prolong patients' survival time [4].
Data mining is a basic part of the knowledge discovery process, in which intelligent methods are incorporated for pattern extraction. In the course of developing data mining applications, the most challenging and interesting task is disease prediction. This line of work is useful to clinical experts and examiners for diagnosing disease precisely, describing different data mining methods. Data mining applications in healthcare hold huge potential and benefit; however, the effectiveness of data mining procedures in the healthcare domain depends on the availability of refined healthcare data. In the present examination, a few classifier methods used in clinical data analysis are discussed, and a few disease-prediction studies, such as breast cancer prediction, heart disease diagnosis, thyroid prediction and diabetes, are considered. The outcome shows that the decision tree algorithm suits disease prediction well, as it produces better precision results [5].
Table 1 Depiction of the breast cancer datasets (attribute count, instance count and class count for the original, diagnosed and prognosis data; the diagnosed data has 32 attributes)
4 Feature Set Bifurcation
In machine learning, the decision tree classifier has always been a good choice for authors due to its high prediction accuracy. But when the feature set is large, i.e., there are too many attributes, both performance and accuracy are reduced. Therefore, this work proposes forming subsets of the main feature set, which are then used to implement multiple decision trees. This can be done by simple bifurcation, or some algorithm may be used for the same. In this work, simple bifurcation of the feature set has been applied, and the implementation gives encouraging results.
To recognize and differentiate the malignant samples from the benign samples, UCI's Wisconsin dataset for breast cancer is used in different ML implementations.
Table 1 shows the characteristic counts for the different Wisconsin datasets.
5 Proposed Work
This work proposes to create a decision tree-based cancer patient data processing environment which will not only be faster but will also provide high accuracy. The system will leverage a multi-threaded design in which two different decision trees, each with a different set of attributes, are created and processed in parallel. The outcomes obtained from both threads are then combined to get the final results.
The proposed system will be executed in the following steps:
Identification and separation of attributes for making a decision tree
Generation of threads for implementation of multiple decision trees
Combining multiple decision trees for getting the final results.
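The three steps can be sketched as follows, assuming scikit-learn's bundled copy of the Wisconsin diagnostic dataset; the simple halving rule and the averaging combiner are illustrative simplifications of the proposed system:

```python
from concurrent.futures import ThreadPoolExecutor

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Wisconsin diagnostic dataset: samples with 30 numeric attributes.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Step 1: separate the attributes into two sets (simple bifurcation by halving).
half = X.shape[1] // 2
attribute_sets = [slice(0, half), slice(half, X.shape[1])]

# Step 2: one thread per attribute set, each fitting its own decision tree.
def fit_and_score(cols):
    tree = DecisionTreeClassifier(random_state=1)
    tree.fit(X_tr[:, cols], y_tr)
    return tree.score(X_te[:, cols], y_te)

with ThreadPoolExecutor(max_workers=2) as pool:
    accuracies = list(pool.map(fit_and_score, attribute_sets))

# Step 3: combine the two threads' results (here, by averaging the accuracies).
print(accuracies, sum(accuracies) / len(accuracies))
```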
This approach will have the following time complexities:
Separation Complexity
O(z1) = O(k/r) · r
where k is the count of the attributes and r is the count of decision trees ("dt").
Processing Complexity
Fig. 1 System flow of the proposed work (feature set bifurcated into Set 1 and Set 2)
O(z2) = O(h · r²)
where h is the size of the training data and r is the count of decision trees.
Combination Complexity
O(z3) = O(r)
where r is the count of decision trees.
Overall Complexity
O(z) = O(z1) + O(z2) + O(z3) = O(k/r) · r + O(h · r²) + O(r)
Comparison of a simple decision tree and the modified decision tree:
O(sdt) = O(h · k²) [6]
O(mdt) = O(k/r) · r + O(h · r²) + O(r)
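Plugging illustrative numbers into the two expressions shows the gap; h = 500 training rows, k = 30 attributes and r = 2 trees are assumed values, with constants ignored as in the O-notation above:

```python
h, k, r = 500, 30, 2  # illustrative sizes: training rows, attributes, decision trees

sdt = h * k**2                     # simple decision tree: O(h * k^2)
mdt = (k // r) * r + h * r**2 + r  # modified: O(k/r)*r + O(h*r^2) + O(r)

print(sdt, mdt)  # 450000 2032
assert mdt < sdt  # the modified tree costs less whenever r > 1
```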
Fig. 2 Flowchart of the complete proposed system (input dataset → evaluate attributes → apply division of attributes → start one process for each set of attributes → label & feature set 1 / label & feature set 2 → apply Gini index → calculate accuracy → combine step → display final result)
From the above two equations, it is clear that O(mdt) < O(sdt) when r > 1, since the complexity of the simple decision tree becomes too high when the number of attributes is large.
6 Results Obtained
Decision tree is one of the most used classifiers in machine learning, providing high accuracy in predicting correct results. It is a supervised learning algorithm and works on a dataset which uses a label (result) and a feature set (inputs). It learns from the provided dataset (training set) and evaluates results based on test inputs. Its basic strategy is to apply a condition on the input features and generate binary results; the results are filtered by the application of each condition, forming a binary decision tree. Since conditions are applied on each feature from the feature set, its accuracy depends on both relevant and non-relevant features.

Table 2 Accuracies obtained for the bifurcated feature sets and the composite feature set

S. No.  Applied set                     Accuracy obtained (%)
1       First bifurcated feature set    94.15
2       Second bifurcated feature set   86.55
3       Composite feature set           90.06
In this effort, a new feature set bifurcation technique is applied. The feature set has been divided into two groups, and different decision trees have been formed to obtain the accuracies. These accuracies have been averaged to obtain the accuracy of the system. The proposed system has been implemented using the Wisconsin dataset, and two sets of attributes have been formed; the resulting accuracies are depicted in the following table and graphs.
Table 2 shows the accuracies obtained from the execution of the implemented system using Python.
Inference: From the graph, it is seen that the accuracies obtained from the distributed feature sets differ due to the impact of the features. The accuracy of the first feature set is 94.15%, whereas that of the second feature set is 86.55%. The average accuracy is 90.35% on the available workbench.
Fig. 3 Accuracies obtained from the two feature set along with average accuracy
Fig. 4 Comparison of accuracies obtained for the composite feature set and the distributed feature
set average accuracy
Inference: The second graph shows the accuracy obtained from the composite feature set, which is 90.06%, while the average accuracy is 90.35%. The average accuracy is found to be higher than the composite accuracy, which proves the hypothesis correct. The reason for this is also explicit: the data available for decision making is the smaller, filtered set remaining after application of the other features on the main feature set.
7 Conclusion and Future Work
This paper proposes a decision tree implementation that offers not only high performance but also high accuracy. It provides an overview of machine learning, its techniques, its importance to industry, and its role in advancing artificial intelligence. A review of earlier research shows that the focus has been on obtaining better solutions than the existing classifiers in different scenarios; although the enhancement of accuracy and performance has been studied, no prior work applies a bifurcation technique to the feature set. A comparative complexity calculation of the simple decision tree algorithm and the proposed modified decision tree algorithm shows improved time complexity, and the implementation compares the composite and average accuracies, where the average accuracy of the bifurcated (distributed) dataset is found to be better than the accuracy obtained from the composite feature set.
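The complexity comparison referred to above can be sketched in rough, back-of-the-envelope form (an illustrative assumption, not the paper's exact calculation). Taking a common O(m · n log n) cost model for building a decision tree over n samples and m features:

```latex
% Single (composite) tree over all m features:
T_{\text{composite}} = O(m \cdot n \log n)
% Two bifurcated trees over m/2 features each:
T_{\text{bifurcated}} = 2 \cdot O\!\left(\tfrac{m}{2} \cdot n \log n\right)
                      = O(m \cdot n \log n)
```

Under this model the total work is of the same order, but each bifurcated tree is smaller and the two trees are independent, so they can be built in parallel, which is one way the modified algorithm can improve observed running time.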
This work can be further enhanced by applying other mechanisms for separating the attributes when building multiple decision trees. It can also be extended and tested on real-time data for both high performance and high accuracy.
Author Index
Abhijith, G., 371
Adarsh, 79
Aggarwal, Archit, 211
Agrawal, Aaditya, 391
Agrawal, Rajneesh, 281, 427
Aher, Jayshree Ghorpade, 31
Ahire, Hemangi Sunil, 297
Ahmad, Anzar, 21
Ahuja, Ravinder, 391, 403
Ajawan, Pratijnya, 413
Akhil, Raghavendra, 371
Aleena, John, 371
Ambily, Gouri Babu, 109
Anagha, H. Kumar, 371
Anil, Alna, 109
Anusree, P. S., 71
Aravind, Akhil Chittathuparambil, 161
Ashish, V., 371
Babulekshmanan, Parvathy Devi, 55
Balakrishnan, Ramadoss, 273
Balasubramanian, P., 71
Betadur, Balaji, 413
Bhardawaj, Jadumani, 307, 385
Bhosale, Aditi Sanjay, 297
Biswas, Rupanjana, 161
Bodda, Sandeep, 169
Chandran, Lekshmi R., 323
Desai, Veena, 413
Deshpande, Uttam U., 63
Diwakar, Shyam, 55, 109, 161, 169, 231
Doke, Vikramsinh, 239
Dsouza, Mani Bushan, 341
Gokul Pillai, V., 323
Gola, Bhavya, 211
Gopika, Anil, 231
Gujarathi, Priyanka, 191
Gupta, Arpita, 273
Gupta, Gaurav, 79
Gupta, Richa, 249
Guru, D. S., 261
Harisudhan, Athira Cheruvathery, 169
Hari, Vishnu, 109
Hirekodi, Ashwini R., 63
Ilango, K., 371
Jadhav, Swapnil Sanjay, 297
Jain, Ashik A., 79
Jain, Sahil, 333
Jamil, Kaif, 9
Jaybhay, Avinash Yuvraj, 297
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2021
V. K. Gunjan et al. (eds.), Cybernetics, Cognition and Machine
Learning Applications, Algorithms for Intelligent Systems,
Jithendra Gowd, K., 39
Johri, Prashant, 9
Kailasan, Indulekha P., 231
Karande, Aarti, 1
Kawrkar, Sayali, 239
Kharche, Shubhangi, 139
Kolagani, Bhavita, 231
Kulkarni, Pradnya V., 333
Kumar, Dhanush, 169
Kumari, Neha, 131
Kumar, Rajeev, 131
Lekshmi, R. Chandran, 371
Magadum, Ashok P., 63
Malik, Pallavi, 291
Mandal, Pratyush, 333
Mane, Shree G., 117
Manjaiah, D. H., 341
Manjula, G. Nair, 371
Manoj, Gautham, 109
Mapari, Vinit, 1
Mehta, Harsh Nagesh, 31
Mitchelle Flavia Jerome, M., 361
Modak, Renuka, 239
Mohan, Krishna, 169
Mukherjee, A., 291
Murali, Ranjani, 313
Nair, Anjali Suresh, 161
Nair, S. Varsha, 231
Narasimhulu, V., 39
Narkhede, Anurag, 1
Nath, Gagarina, 307
Nath, Utpal Kumar, 307, 385
Nizar, Nijin, 161
Nutakki, Chaitanya, 231
Pandita, Sahil, 139
Pandurangi, Bhagyashri R., 63
Parmar, Praptiba, 203
Patel, Deep, 153
Patel, Foram, 153
Patel, Nihar, 153
Patel, Vibha, 153
Patil, Sandip Raosaheb, 191
Paul, Saurin, 307, 385
Pawar, Mayur M., 117
Prashanth, M. C., 221
Priya, Lekshmi Aji, 169
Priyadarshini, L., 185
Puthanveedu, Akshara Chelora, 169
Rachana, P. G., 261
Radhamani, Rakhi, 109
Rai, Ajeet, 89
Raj, Aditya, 79
Rajeev, Anjali, 169
Rajeswari, K., 297
Ranbhare, Shubham V., 117
Rastogi, Deependra, 9
Raveendran, Praveen, 109
Ravikumar, M., 177, 221, 261
Roy, Suman, 95
Sabarwal, Munish, 9
Saha, Prasenjit, 307, 385
Sahu, Ashish, 139
Sampath Kumar, S., 177
Sanghani, Disha, 203
Sankla, Tushar, 211
Sanyal, Hrithik, 281, 427
Saravana Kumar, S., 95
Sardar, Nikhil B., 117, 239
Sarika, M., 361
Sasidharakurup, Hemalatha, 55
Sathianarayanan, Sreehari, 55
Savitha, G., 79
Saxena, Priyanka, 281, 427
Sayed, Mateen, 333
Sehgal, Priti, 249
Sekar, K. R., 361
Senthilkumar, T., 47
Shah, Dhruvil, 153
Shekhar, Shashi, 21
Shivakumar, G., 177
Shivaprasad, B. J., 261
Shrinivasan, Lakshmi, 185
Singh, Aadityanand, 139
Singh, Birendra, 333
Soneji, Hitesh Narayan, 349
Sudhanvan, Sughosh, 349
Sujitha, Nima A., 231
Thaventhiran, C., 361
Venkataraman, V., 361
Venu, Sukriti Nirayilatt, 161
Vijayan, Asha, 169
Walimbe, Varun, 139
Yadav, Akash, 403