WEATHER PREDICTION AND FORECASTING SYSTEM USING IOT AND DATA SCIENCE
AT
INDIA METEOROLOGICAL DEPARTMENT
LODHI ROAD, NEW DELHI

SUMMER TRAINING REPORT
IN PARTIAL FULFILMENT OF THE AWARD OF FULL TIME
Bachelor of Technology in Electronics and Communication Engineering
by
Shesh Narayan Singh (1900970310152)

Under the Guidance of
Shri. K C Sai Krishnan
Scientist F
India Meteorological Department

Galgotias College of Engineering and Technology, Greater Noida.
(Affiliated to Dr. A.P.J Abdul Kalam Technical University, Lucknow)
July, 2022

DECLARATION

I hereby declare that the report entitled “WEATHER PREDICTION AND FORECASTING SYSTEM USING IOT AND DATA SCIENCE”, submitted by me to the Branch of Electronics and Communication Engineering, India Meteorological Department (IMD), New Delhi, in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Electronics and Communication Engineering, is a record of bona fide work carried out by me under the supervision of Shri. K C Sai Krishnan, Scientist ‘F’. I further declare that the work reported in this report has not been submitted and will not be submitted, either in part or in full, for the award of any other degree or diploma of this institute or of any other institute or university.

Shesh Narayan Singh (1900970310152)
Date:

ABSTRACT

The proposed system is an advanced solution for weather monitoring that uses IoT to make its real-time data easily accessible over a very wide range. The system monitors weather and climate parameters such as temperature, humidity, wind speed, moisture, light intensity, UV radiation and even carbon monoxide levels in the air, using multiple sensors. These sensors send their data to a web page, where the sensor data is plotted as graphical statistics. The data uploaded to the web page can easily be accessed from anywhere in the world. The data gathered on these web pages can also be kept for future reference.
The project also includes an app that sends notifications, acting as an alert system to warn people about sudden and drastic weather changes. For more complex forecasts that cannot be produced by sensors alone, we use an API that analyses the data collected by the sensors and predicts the outcome. This API can be used to access the data anywhere and at any time with relative ease, and can also be used to store data for future use. Owing to its compact design and fewer moving parts, the system requires less maintenance. The components in this project do not consume much power and can even be powered by solar panels. Compared to other devices available in the market, the smart weather monitoring system is cheaper. This project can be of great use to meteorological departments, weather stations, the aviation and marine industries, and even the agricultural industry.

Key Words: Internet of Things (IoT), development boards, embedded systems, Raspberry Pi, NodeMCU, ESP8266, Arduino IDE, Ubidots, and API

ACKNOWLEDGEMENT

I am extremely pleased to present this summer training report entitled “WEATHER PREDICTION AND FORECASTING SYSTEM USING IOT AND DATA SCIENCE”. I take this opportunity to express my acknowledgement and deep sense of gratitude to the individuals who rendered valuable assistance and guidance to me. Their input has played a vital role in the success of this summer training, and a formal piece of acknowledgement may not be sufficient to express my gratitude towards the people who have helped me in successfully completing my summer training.

I wish to express my indebted gratitude and special thanks to Shri. K C Sai Krishnan, Scientist ‘F’, India Meteorological Department, New Delhi and Dr.
Kuldeep Srivastava, Scientist ‘E’, India Meteorological Department, New Delhi, who, in spite of being very busy with their duties, took time to hear, guide and keep me on the correct path, allowing me to carry out my summer training in the best possible way. I do not know where I would have been without them.

I would like to express my gratitude to Mrs. Komal Srivastava, Scientific Assistant, India Meteorological Department, New Delhi, for supporting and helping me throughout the project work during my summer internship.

I would like to express my gratitude to Dr. Lakshamanan. M, Head of Department of ECE, for encouraging me and creating a stimulating and supportive working environment conducive to the completion of my summer training report.

I would also like to thank the co-employees who cooperated and answered all my queries to a great degree of satisfaction while I was getting acquainted with the work.

Shesh Narayan Singh (1900970310152)

CERTIFICATE

CONTENTS OF THE REPORT

Title Page
Declaration
Abstract
Acknowledgement
Certificate
Table of Contents
List of Figures
Chapter 1: Introduction to India Meteorological Department
    1.1 Introduction
    1.2 Current Forecasting Organization
    1.3 Responsibility of Forecasting Centres
    1.4 Forecast Scheme
Chapter 2: About the Project
    2.1 Introduction
    2.2 Components Used
    2.3 Methodology
    2.4 Experimentation
    2.5 Implementation Setup
    2.6 Result
Chapter 3: Conclusion
References
Appendix

LIST OF FIGURES

Figure No.  Title
1.1   Meteorological sub-divisions of the Country
1.2   Weather forecasting organisation of IMD
2.1   DHT11 Sensor
2.2   BMP180 Sensor
2.3   ESP8266 NodeMCU Module
2.4   16 x 2 LCD display
2.5   Historical Weather Dataset of Kanpur City
2.6   Plot for each factor for 10 years
2.7   Plot for each factor for 1 year
2.8   Circuit Diagram of IOT based Weather Monitoring System
2.9   Circuit of IOT based Weather Monitoring System
2.10  Result of IOT based Weather Monitoring System
2.11  Experimental Result

CHAPTER-1
Introduction to INDIA METEOROLOGICAL DEPARTMENT

1.1 INTRODUCTION

The India Meteorological Department (IMD) is the principal government agency in the country in all matters relating to meteorology and allied subjects. It has continuously ventured into new areas of application and services, and has steadily built upon its infrastructure over its history of 146 years. It has simultaneously nurtured the growth of meteorology and atmospheric science in India and is poised at the threshold of an exciting future.

India, being a tropical country, experiences various severe weather events like cyclones, severe thunderstorms, squalls, flash floods, snow avalanches, heat waves, cold waves, heavy rainfall etc. These severe weather events can cause widespread loss of life and property. Owing to the high impact of severe weather events, their consequential influence on the social, cultural, commercial, health, defence and transport sectors, and increased public awareness, there is a need for a well laid out system/methodology for the monitoring of these weather events by the India Meteorological Department (IMD). Considering this, IMD has brought out a Standard Operating Procedure (SOP) to provide uniform monitoring of weather, especially disastrous weather events.
The manual contains chapters on General Forecasting Organization of IMD, Satellite Application in Weather Forecasting, Radar Application in Weather Forecasting, Public Weather Services, Heavy Rainfall Warning Services, Thunderstorm Warning Services, Heat & Cold Wave Warning Services, Fog Warning Services, Nowcasting Services, Multi-hazard Early Warning System, Urban Meteorological Services, Marine Weather Forecasting Services, Meteorological Communication and Early Warning Dissemination, Post Event Survey and Forecast Verification. This manual will prove to be very helpful to operational forecasters and will serve as a valuable document for carrying out research activities.

The reduction of damage due to a disastrous weather event depends on several factors, viz. the skill in its prediction, timely dissemination of warnings and the public perception of the credibility of the official predictions and warnings. While formulating these guidelines, we have involved experts from various forecasting units of IMD so that a standard procedure is followed throughout the country for effective analysis, monitoring and dissemination of warnings to minimize damage to life and property from disastrous weather events.

The Mission of India Meteorological Department is: “To effectively forecast High Impact Weather events to strengthen Disaster Preparedness Mechanism.”

Weather forecasting in India commenced with the establishment of the India Meteorological Department (IMD) in 1875, and over a period of time a network of forecasting organizations has been developed in IMD. Being a tropical country, India experiences severe weather events like cyclones, severe thunderstorms, flash floods, snow avalanches etc. To understand the science behind such weather systems, there is a need to understand tropical meteorology at different space and time scales.
With the development of science and technology and advancements in computers, together with the induction of observational aids like Doppler weather radars and satellites, there has been a better understanding of weather phenomena at all scales, leading to improvement in daily operational weather forecasts. The recent modernization of IMD’s activities, including the commissioning of newly acquired modern observing equipment, the induction of high resolution numerical weather prediction models and the utilization of high power computing systems for running numerical models, has led to further improvements in the quality of forecasting services. The forecasting service has also gained importance in sector specific applications, and demands have increased for sector specific, tailor made, high resolution forecast products at both spatial and temporal scales. These demands are met through a strong organizational set up of the forecasting services. Details of the same are given in this chapter.

The forecasting organization is set up with the following objectives:
● To improve coordination between all relevant operational centres across the country at the national, regional and state level, in matters related to daily forecasting.
● To update the products & warnings several times a day to meet user expectations.
● To increase the consistency and accuracy of all the forecast products for different services, viz. general weather, agromet, marine, aviation, mountain weather etc., through the use of better fitted techniques, collaborative work, and complementary roles.
● To issue district level forecasts and nowcasts and bring out further improvements in the system.
This is a challenge as it deals with downscaling short range forecasts to district level.
● To improve city forecasts, which require location specific assessment of the weather scenario.
● To provide marine forecasts (for both high seas and coastal areas) and Fishermen Warnings, and bring out further improvements in the system.
● To provide cyclone warning services for the low pressure systems forming over the North Indian Ocean through modernized tropical cyclone tracking modules/systems.
● To provide impact based warning services related to different weather scenarios.
● To provide an impact based weather forecast for heavy rainfall for the capital cities.
● To take into account the new requirements of forecasting services.
● To do Research & Development work to support the betterment of the services.

1.2 Current Forecasting Organization

The National Weather Forecasting Centre (NWFC) at IMD New Delhi coordinates IMD’s forecasting activities for the entire country, and the Weather Central, IMD, Pune functions as the standby centre for NWFC. While the Regional Meteorological Centres (RMCs) carry out weather monitoring and forecasting for their respective regions, the Meteorological Centres (MCs) at the state capitals do the same for their respective states. Cyclone related operational activities are monitored and coordinated by the Cyclone Warning Directorate (CWD), IMD, New Delhi at the headquarters level. This unit also functions as the Regional Specialized Meteorological Centre (RSMC) for tropical cyclones for the WMO region. Area Cyclone Warning Centres (ACWCs) and Cyclone Warning Centres (CWCs) take care of the cyclone warning services of the coastal states as well as marine weather services, as per their area of responsibility. The Hydrometeorology Division, IMD, New Delhi coordinates the flood forecasting related services carried out through Flood Meteorological Offices (FMOs), collects the data and prepares rainfall statistics for the entire country.
Agromet Forecasting Services are coordinated by the Agricultural Meteorology Division, IMD Pune, whereas the liaison work related to Agromet services is carried out by the Agro Advisory Service Division (AASD), IMD, New Delhi.

1.3 Responsibility of Forecasting Centres

In order to deliver effective forecasts and related services to the general public and different user agencies, including disaster management authorities, the India Meteorological Department (IMD) has a three-tier structure for providing weather forecasts and warnings for natural calamities like heavy rainfall, snowfall, thunderstorms, hailstorms, heat waves, cold waves etc. The National Weather Forecasting Centre (NWFC) at IMD Headquarters, New Delhi issues the All India Weather Bulletin for the 36 meteorological subdivisions of the country as a whole on a daily basis, and the same is updated three times within twenty four hours. This bulletin serves as a guidance bulletin for the subordinate offices; based upon it, the forecasting centres of the Regional Meteorological Centres (RMCs) and State Meteorological Centres (SMCs) issue forecasts and warnings at the district level. The three-tier structure of the forecasting services is summarized below:

● National Weather Forecasting Centre (NWFC): Functions from IMD New Delhi and is responsible for weather monitoring and forecasting for the entire country. Forecasts are issued at the sub divisional scale from this centre four times a day.

Figure 1.1: Meteorological sub-divisions of the Country

● Regional Weather Forecasting Centre (RWFC): The RWFCs function from the Regional Meteorological Centres (RMCs) situated at New Delhi, Mumbai, Nagpur, Kolkata, Guwahati and Chennai. They monitor weather and issue forecasts/warnings for their area of responsibility at the sub divisional scale or for parts of the subdivisions. A region normally consists of a few meteorological subdivisions.
The Regional Meteorological Centre also has the responsibility to issue district wise forecasts and warnings, district wise/location specific nowcasts and city/tourism forecasts for the state in which it is located. In the case of Maharashtra, however, this responsibility is shared between RWFC Mumbai and RWFC Nagpur. Among the different RWFCs, those located at Kolkata, Chennai and Mumbai also function as ACWCs and carry out the marine forecast services and cyclone warning services for their respective areas of responsibility.
● State Weather Forecasting Centre (SWFC): The SWFCs situated at the state capitals carry out weather monitoring and issue district level forecasts/warnings and district wise/location specific nowcasts for the state for which they are responsible. They also have the responsibility to issue city forecasts for important cities/tourist places within the state. The SWFCs at Ahmedabad, Thiruvananthapuram and Bhubaneshwar also function as CWCs and carry out the marine forecast services and cyclone warning services for their respective areas of responsibility.
● Weather Central (WC), Pune: The functioning of Weather Central, Pune is similar to that of NWFC in respect of technical analysis and finalisation of forecasts. The centre archives past weather data, prepares the Indian Daily Weather Report (IDWR) and carries out analysis and documentation of seasonal weather for publication purposes. The centre also regularly analyses the morning and evening surface and upper air weather charts and archives them in digital format for future use. These digitised charts are also circulated to different forecast centres on a daily basis, immediately after their preparation, for display and archival. In addition, the centre conducts a map discussion every Friday in which the realised weather of the previous meteorological week and the forecast for the running week are discussed in detail.
These map discussions are attended by officials from IITM, students from universities and also by retired officials of IMD and IITM.

Figure 1.2: Weather forecasting organisation of IMD

1.4 Forecast scheme

The scheme for the issue of forecasts by various offices is shown in Table 1.2.
(i) Forecasts will be issued 4 times a day by NWFC. These include one main weather bulletin around midday; the remaining three are updates of the same. RWFCs/SWFCs will issue forecasts two times a day: one main bulletin around midday and an update at night.
(ii) All the bulletins should include the time of issue, the time of the observations on which the bulletin is based, the validity period of the bulletin and the time of issue of the next update.
(iii) The meteorological day is considered to run from 0830 hrs IST of any given day to 0830 hrs IST of the next day. The forecasts issued from NWFC/RWFCs/SWFCs on any day are valid for 120 hours (five days) from the date and time of issue. Thus, forecasts issued for Day 1 on, say, the 11th of a month in the morning, midday, evening and night would all be valid up to 0830 hrs of the 12th only, and the forecast issued for Day 5 on the 11th in all the above bulletins will be valid from 0830 hrs IST of the 15th up to 0830 hrs of the 16th. Thus the validity period is not exactly 120 hrs with respect to the time of issue of every forecast bulletin within a day.
(iv) The outlook will be valid for a subsequent period of 48 hrs (2 days). For example, forecasts issued on, say, the 11th of a month in the morning, midday, evening and night will contain the outlook for the 16th and 17th, valid from 0830 hrs IST of the 16th up to 0830 hrs of the 18th.
(v) Forecasts issued for Day 1 cover the weather expected during the first 24 hrs, Day 2 the period between 24 and 48 hrs, Day 3 the period between 48 and 72 hrs, Day 4 the period between 72 and 96 hrs and Day 5 the period between 96 and 120 hrs.
(vi) Warnings for expected severe weather are also included in the bulletins as per the ongoing season. For example, warnings for fog/cold wave etc. are included during the winter season, whereas warnings for heat waves are included during the summer/pre monsoon season, as per the criteria followed.
(vii) The forecasts and warnings issued from NWFC are at the subdivisional scale for the country as a whole, whereas those from SWFC are at the district scale for the state concerned. Forecasts and warnings issued from RWFC are for the subdivision as a whole or for sectors of the subdivision; it also issues district level forecasts and warnings for the state in which it is located.
(viii) The weather bulletin contains a brief summary of the observations and a description of the prevailing synoptic situation and significant features, in addition to the forecast, warnings and the outlook.

CHAPTER-2
ABOUT THE PROJECT

2.1 INTRODUCTION

Weather prediction is the task of predicting the state of the atmosphere at a future time for a given area. In the early days this was done through physical equations in which the atmosphere is treated as a fluid. The current state of the atmosphere is observed, and the future state is predicted by solving those equations numerically. However, accurate forecasts cannot be made for more than about 10 days ahead, and this can be improved with the help of science and technology. Machine learning can be used to process immediate comparisons between historical weather forecasts and observations. With the use of machine learning, weather models can better account for prediction inaccuracies, such as overestimated rainfall, and produce more accurate predictions. Temperature prediction is of major importance in a large number of applications, including climate-related studies, energy, agriculture and medicine.
There are numerous kinds of machine learning algorithms, such as Linear Regression, Polynomial Regression, Random Forest Regression, Artificial Neural Networks, and Recurrent Neural Networks. These models are trained on the historical data provided for a given area. Input is then supplied to the models; for example, when predicting temperature, the minimum temperature, mean air pressure, maximum temperature, mean humidity, and weather classification for 2 days are given. Based on this, the minimum and maximum temperature for the next 7 days are predicted.

Measuring physical quantities requires the right technique, both to obtain the characteristics of the system and to select an accurate measuring sensor. With accurate measurements, the quality of a sensor or measurement system can be known precisely. Teaching the measurement and calibration of measuring and control instruments requires practical, relevant media that can be used directly in the field. This report discusses the design of an Internet of Things (IoT)-based system to measure, read and process the physical quantities of weather conditions. The weather conditions considered are temperature and humidity, intensity of sunlight, rainfall, and wind speed and direction. These quantities are read with analog and digital sensors integrated with the ESP8266 microcontroller. This sensor system is placed in the field station. The readings processed on the microcontroller are uploaded to an online server. A client system, called a base station, periodically requests the sensor data from the server. The acquired data is then processed again on a Raspberry Pi to be displayed and stored in Excel form. The results of this study can be used as calibration media for analog and digital sensors that measure the quantities recorded by weather stations.
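As a rough illustration of the base-station step described above, the sketch below parses one reading returned by the server and appends it to a CSV file, which Excel can open. The JSON layout, the field names (`timestamp`, `temperature_c`, `humidity_pct`, `pressure_hpa`), the sample values and the helper names are all assumptions made for this example; the report does not specify the server's response format.

```python
import csv
import io
import json

# Hypothetical field names: the real server response format is not given in the report.
FIELDS = ["timestamp", "temperature_c", "humidity_pct", "pressure_hpa"]


def parse_reading(payload: str) -> dict:
    """Turn one JSON reading into a flat row, keeping only the expected fields."""
    data = json.loads(payload)
    return {name: data.get(name) for name in FIELDS}


def append_reading(stream, reading: dict, write_header: bool = False) -> None:
    """Append one reading to an open CSV stream (a file or an in-memory buffer)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow(reading)


# Made-up payload standing in for a server response.
sample = (
    '{"timestamp": "2022-07-01T12:00:00", "temperature_c": 34.5, '
    '"humidity_pct": 61, "pressure_hpa": 998.2}'
)

buffer = io.StringIO()  # a real base station would open a file in append mode
append_reading(buffer, parse_reading(sample), write_header=True)
csv_text = buffer.getvalue()
```

In a deployment, the payload would come from a periodic request to the server, and the header would be written only when the file is first created.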
Stored data can also be used as a learning medium for measurement analysis and the characterization of measuring instruments.

2.2 Components Used

● DHT11 temperature and humidity module
● BMP180 barometric sensor
● Generic ESP8266 Module
● 16 x 2 LCD display
● Arduino IDE
● Jupyter

2.2.1 DHT11 temperature and humidity module

Figure 2.1. DHT11 Sensor

The DHT11 is a digital sensor that collects temperature and humidity data from its surroundings. It consists of a humidity sensing component, an NTC temperature sensor and an IC on the back side of the sensor. The bare sensor is a 4-pin device: pin 1 is connected to 3.3 V, pin 2 is the data output, which feeds pin D4 of the NodeMCU, pin 3 is left unconnected and pin 4 is connected to GND. A 10k resistor should be connected between pin 1 and pin 2 as a pull-up. When supplied as a module, the sensor has three terminals, namely:
● Vcc
● GND
● Data
Vcc connects to the supply (the DHT11 accepts 3.3 V or 5 V), GND connects to GND and the data pin connects to a digital input of the microcontroller (D4 of the NodeMCU in this project).

2.2.2 BMP180 barometric sensor:

Figure 2.2. BMP180 Sensor

The module illustrated above is a barometric sensor capable of measuring atmospheric data; it can provide atmospheric pressure at ground level, atmospheric pressure at sea level and altitude. It is a very small module that measures the absolute pressure of the air around it. It has a measuring range from 300 to 1100 hPa with a resolution down to 0.02 hPa. It can also measure altitude and temperature. The BMP180 communicates via the I2C interface, which means that it talks to the microcontroller using just two pins (SDA and SCL). The BMP180 measures both pressure and temperature, because temperature changes the density of gases like air. At higher temperatures, air is less dense, so it exerts less pressure on the sensor. At lower temperatures, air is denser, so it exerts more pressure on the sensor.

2.2.3 Generic ESP8266 Module:

Figure 2.3. ESP8266 NodeMCU Module

The module illustrated above is the generic ESP8266, which is responsible for connecting the weather monitoring system to the internet. The module is mounted on a breakout board adapter so that the ESP8266 can be used on a breadboard. It is the heart of the device and provides the platform for IoT. It is a Wi-Fi module running ESP8266 firmware. All the sensors are connected to this microcontroller. They send their measured values to it, and it uploads the values to the cloud, where they are analysed. The board was developed by the ESP8266 Open Source Community. It has an operating system called XTOS. The CPU is the ESP8266 (LX106). It has 128 KBytes of built-in memory and a storage capacity of 4 MBytes.
● Processor: L106 32-bit RISC microprocessor core based on the Tensilica Diamond Standard 106Micro running at 80 or 160 MHz
● Memory:
    ○ 32 KiB instruction RAM
    ○ 32 KiB instruction cache RAM
    ○ 80 KiB user-data RAM
    ○ 16 KiB ETS system-data RAM
● External QSPI flash: up to 16 MiB is supported (512 KiB to 4 MiB typically included)
● IEEE 802.11 b/g/n Wi-Fi
    ○ Integrated TR switch, balun, LNA, power amplifier and matching network
    ○ WPA/WPA2 authentication, or open networks
● 17 GPIO pins
● I²C (software implementation)
● UART on dedicated pins, plus a transmit-only UART can be enabled on GPIO2
● 10-bit ADC (successive approximation ADC)

2.2.5 16 x 2 LCD display:

Figure 2.4. 16 x 2 LCD display

We use a 16 x 2 LCD display to show sensor data locally; it can display 16 alphanumeric characters in each of 2 rows. An I2C display module is used in this project to reduce the number of wires between the microcontroller and the LCD display to four; otherwise we would need to connect 16 wires. The I2C display module operates on the I2C bus and has the following four pins:
● SDA – Serial data.
● SCL – Serial clock.
● Vcc – 5V.
● GND – ground

2.2.6 Arduino IDE:

Arduino is an open-source hardware and software company, project, and user community that designs and manufactures single-board microcontrollers and microcontroller kits for building digital devices. Its hardware products are licensed under a CC BY-SA license, while its software is licensed under the GNU Lesser General Public License (LGPL) or the GNU General Public License (GPL), permitting the manufacture of Arduino boards and software distribution by anyone. Arduino boards are available commercially from the official website or through authorized distributors. Arduino board designs use a variety of microprocessors and controllers. The boards are equipped with sets of digital and analog input/output (I/O) pins that may be interfaced to various expansion boards ('shields') or breadboards (for prototyping) and other circuits. The microcontrollers can be programmed in the C and C++ programming languages, using a standard API which is also known as the Arduino language, inspired by the Processing language and used with the Arduino IDE, a modified version of the Processing IDE.

2.2.7 Jupyter:

Project Jupyter is a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages. Jupyter is 100% open-source software, free for all to use and released under the liberal terms of the modified BSD license. It is developed in the open on GitHub, through the consensus of the Jupyter community, and its governance approach is described in the project's Governance Document. All online and in-person interactions and communications directly related to the project are covered by the Jupyter Code of Conduct, which sets expectations to enable a diverse community of users and contributors to participate in the project with respect and safety.
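Returning to the BMP180 of Section 2.2.2: the sensor measures absolute pressure, and the altitude and sea-level pressure it is said to provide are derived from that measurement. The sketch below shows one way to do this in Python (the language used in the Jupyter notebooks), assuming the international barometric formula quoted in the BMP180 datasheet and a standard sea-level pressure of 1013.25 hPa. The function names are illustrative and are not taken from the project code.

```python
STANDARD_SEA_LEVEL_HPA = 1013.25  # assumed reference pressure at sea level


def pressure_to_altitude_m(pressure_hpa: float,
                           sea_level_hpa: float = STANDARD_SEA_LEVEL_HPA) -> float:
    """Estimate altitude in metres from absolute pressure (barometric formula)."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


def sea_level_pressure_hpa(pressure_hpa: float, height_m: float) -> float:
    """Reduce a pressure measured at a known height to its sea-level equivalent."""
    return pressure_hpa / (1.0 - height_m / 44330.0) ** 5.255


# Example: a station at a known height of 250 m reads 980 hPa.
reduced = sea_level_pressure_hpa(980.0, 250.0)
# Feeding the reduced pressure back in recovers the station height.
recovered_height = pressure_to_altitude_m(980.0, reduced)
```

The two functions are inverses of each other, so a station with a surveyed height can report sea-level pressure, while a station with a trusted sea-level reference can report altitude.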
2.3 Methodology

The dataset used in this work was obtained from Kaggle: “Historical Weather Data for Indian Cities”, from which we have chosen the data for “Kanpur City”. The dataset was created with the community's need for such historical weather data in mind. It covers the top 8 Indian cities by population and was compiled with the help of the worldweatheronline.com API and the wwo_hist package. The datasets contain hourly weather data from 01-01-2009 to 01-01-2020, so the data for each city spans more than 10 years. This data can be used to visualize changes due to global warming, or to predict the weather for upcoming days, weeks, months, seasons, etc.

Note: The data was extracted with the help of the worldweatheronline.com API and we cannot guarantee the accuracy of the data.

The main use of this dataset is to predict the weather for the next day or week from the large amount of data it provides. Furthermore, the data can also be used to produce visualizations that help in understanding the impact of global warming on various aspects of the weather, such as precipitation, humidity and temperature.

In this project, we concentrate on predicting the temperature of Kanpur city with the help of several machine learning regression algorithms. We apply the regressions to the historical weather dataset of Kanpur city in turn: first Multiple Linear Regression, then Decision Tree Regression, and after that Random Forest Regression.

2.3.1 Machine Learning:

Machine learning is relatively robust to perturbations and does not need any other physical variables for prediction. Machine learning is therefore a promising direction in the evolution of weather forecasting. Before the advancement of technology, weather forecasting was a hard nut to crack.
Weather forecasters relied upon satellites and data models of atmospheric conditions, with less accuracy. Over the last 40 years, weather prediction and analysis have improved vastly in accuracy and predictability, more recently with the use of the Internet of Things. With the advancement of Data Science and Artificial Intelligence, scientists now forecast the weather with high accuracy and predictability.

2.3.2 Uses Of Algorithm:

There are different methods of predicting temperature using regression and variants of functional regression, in which datasets are used to perform the calculations and analysis. To train the algorithms, 80% of the data is used, and the remaining 20% is set aside as a test set. For example, to predict the temperature of Kanpur, India using these machine learning algorithms, we use 8 years of data to train the algorithms and 2 years of data as a test dataset. In contrast to conventional weather forecasting, which depends essentially on simulation based on physics and differential equations, artificial intelligence is also used for predicting temperature, with models such as Linear Regression, Decision Tree Regression and Random Forest Regression. To conclude, machine learning has greatly changed the paradigm of weather forecasting, offering high accuracy and predictability. In the next few years, further progress will be made using these technologies to predict the weather accurately and help avoid catastrophes caused by typhoons, tornadoes and thunderstorms.

Figure 2.5. Historical Weather Dataset of Kanpur City

Figure 2.6. Plot for each factor for 10 years

Figure 2.7. Plot for each factor for 1 year

2.4 Experimentation

The dataset has already been separated into a train set and a test set, and each record has already been labeled. First, we take the train set folder.
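The 80/20 split described in Section 2.3.2 (eight years for training, two for testing) can be sketched as below. This is only an illustration in plain Python: it assumes the split is chronological rather than random, uses synthetic rows in place of the Kanpur dataset, fits a single-feature least-squares line as a stand-in for the regressions, and scores it with mean absolute error. All function and variable names are hypothetical.

```python
from datetime import datetime, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Split time-stamped rows into (train, test) without shuffling."""
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, ignoring its sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic, illustrative rows only: (timestamp, humidity %, temperature in C).
start = datetime(2009, 1, 1)
rows = [
    (start + timedelta(hours=i), 40 + (i % 24), 30.0 - 0.25 * (i % 24))
    for i in range(100)
]

train, test = chronological_split(rows)
slope, intercept = fit_line([r[1] for r in train], [r[2] for r in train])
predicted = [slope * r[1] + intercept for r in test]
mae = mean_absolute_error([r[2] for r in test], predicted)
```

Because the synthetic temperature is an exact linear function of humidity, the fitted line reproduces it and the error is essentially zero; real weather data would leave a non-zero error to compare across models.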
We examine the training data with the help of histograms and plots. The distribution of each feature is summarized in a histogram, and this is done for every feature in the train set. Next, we build our models. The models we take into account are Linear Regression, Decision Tree Regression, and Random Forest Regression, and each of them is fitted on the train set. The most important part of this process is tuning the model parameters so that we get the most accurate results. Once the training is complete, we take the test set. For each record of the test set, the trained model takes the feature values and predicts the output, so that a prediction is produced for every test record. To calculate accuracy, we compare each predicted value with the labeled value. The metrics we use include the mean absolute error, the mean squared error, and the R2 score.

2.5 Implementation Setup

The implemented system consists of a main block, the NodeMCU, to which the sensors are connected. The NodeMCU collects the information from the different sensors and then sends the data to ThingSpeak.

Figure 2.8. Circuit Diagram of IoT based Weather Monitoring System
Figure 2.9. Circuit of IoT based Weather Monitoring System

2.6 RESULT

The sensor devices are placed in the particular area of interest. Once a proper connection is established with the server, the sensed data is automatically sent to the web server.

2.6.1 IoT Hardware Implementation:

Figure 3 shows two different hardware implementations of the IoT devices. It should be noted that both IoT device implementations are connected to a 5 V power supply through a micro-USB interface. Each of the IoT devices draws about 100 to 400 mA of current, depending on the usage.
However, if low reporting rates (e.g., one sample per second or slower) are acceptable, the ESP8266 device can be configured for deep-sleep mode, which reduces the overall power consumption significantly. Deep-sleep mode is a desirable feature for remote deployments of IoT devices.

Figure 2.10. Result of IoT based Weather Monitoring System

2.6.2 Software Implementation:

Figure 2.11. EXPERIMENTAL RESULT

The results of the implementation of the project are shown above.

Multiple Linear Regression: This regression model has a high mean absolute error and hence turned out to be the least accurate model. Given below is a snapshot of the actual result from the project implementation of multiple linear regression.

                                                Actual  Prediction   diff
2019-12-22 17:00:00  24  16  0.0   8.7   6   6      10       10.17  -0.17
2019-10-17 14:00:00  30  25  0.0   7.2   7   7      10       10.06  -0.06
2019-03-26 17:00:00  33  23  0.0  10.2   7   8      10       11.27  -1.27
2019-12-03 23:00:00  27  18  0.0   8.7   6   1      10       10.54  -0.54
2019-12-15 05:00:00  24  16  0.0   8.7   6   1      10       10.06  -0.06
...                 ... ...  ...   ... ... ...     ...         ...    ...
2019-11-01 15:00:00  35  26  0.0   8.7   8   8      10       10.48  -0.48
2019-04-09 22:00:00  42  33  0.0  12.8   9   1      10       10.61  -0.61
2019-11-27 04:00:00  32  22  0.0   8.7   7   1      10        9.99   0.01
2019-09-11 01:00:00  36  29  0.0  12.7   7   1      10        9.61   0.39
2019-09-13 03:00:00  30  26  0.0   9.2   6   1       9        9.33  -0.33
1711 rows × 3 columns

Decision Tree Regression: This regression model has a medium mean absolute error and hence turned out to be a moderately accurate model. Given below is a snapshot of the actual result from the project implementation of decision tree regression.

                                                Actual  Prediction   diff
2019-12-22 17:00:00  24  16  0.0   8.7   6   6      10        10.0    0.0
2019-10-17 14:00:00  30  25  0.0   7.2   7   7      10        10.0    0.0
2019-03-26 17:00:00  33  23  0.0  10.2   7   8      10        10.0    0.0
2019-12-03 23:00:00  27  18  0.0   8.7   6   1      10        10.0    0.0
2019-12-15 05:00:00  24  16  0.0   8.7   6   1      10        10.0    0.0
...                 ... ...  ...   ... ... ...     ...         ...    ...
2019-11-01 15:00:00  35  26  0.0   8.7   8   8      10        10.0    0.0
2019-04-09 22:00:00  42  33  0.0  12.8   9   1      10        10.0    0.0
2019-11-27 04:00:00  32  22  0.0   8.7   7   1      10        10.0    0.0
2019-09-11 01:00:00  36  29  0.0  12.7   7   1      10         9.0    1.0
2019-09-13 03:00:00  30  26  0.0   9.2   6   1       9         8.0    1.0
1711 rows × 3 columns

Random Forest Regression: This regression model has a low mean absolute error and hence turned out to be the most accurate model. Given below is a snapshot of the actual result from the project implementation of random forest regression.

                                                Actual  Prediction   diff
2019-12-22 17:00:00  24  16  0.0   8.7   6   6      10       10.00   0.00
2019-10-17 14:00:00  30  25  0.0   7.2   7   7      10        9.99   0.01
2019-03-26 17:00:00  33  23  0.0  10.2   7   8      10       10.00   0.00
2019-12-03 23:00:00  27  18  0.0   8.7   6   1      10       10.00   0.00
2019-12-15 05:00:00  24  16  0.0   8.7   6   1      10       10.07  -0.07
...                 ... ...  ...   ... ... ...     ...         ...    ...
2019-11-01 15:00:00  35  26  0.0   8.7   8   8      10       10.09  -0.09
2019-04-09 22:00:00  42  33  0.0  12.8   9   1      10       12.97  -2.97
2019-11-27 04:00:00  32  22  0.0   8.7   7   1      10       10.00   0.00
2019-09-11 01:00:00  36  29  0.0  12.7   7   1      10        9.35   0.65
2019-09-13 03:00:00  30  26  0.0   9.2   6   1       9        8.31   0.69
1711 rows × 3 columns

CHAPTER-3 CONCLUSION

All the machine learning models considered (linear regression, multiple linear regression, decision tree regression, and random forest regression) were outperformed by professional weather forecasting services, although the error in their performance decreased significantly for later days, suggesting that over longer time frames our models may outperform professional ones. Linear regression proved to be a low bias, high variance model, whereas polynomial regression proved to be a high bias, low variance model. Linear regression is inherently a high variance model because it is sensitive to outliers, so one way to improve the linear regression model is to gather more data.
Functional regression, however, showed high bias, indicating that the choice of model was poor and that its predictions cannot be improved by collecting more data. This bias could be due to the design decision to forecast temperature based on the weather of the previous two days, which may be too short to capture the trends in weather that functional regression requires. If the forecast were instead based on the weather of the past four or five days, the bias of the functional regression model could probably be reduced. However, this would require considerably more computation time along with retraining of the weight vector w, so it is deferred to future work.

Random Forest Regression proved to be the most accurate regression model in this project. It is also a widely used regression model, since it is accurate and versatile. Below is a snapshot of the implementation of Random Forest in the project.

A major challenge in weather forecasting is producing accurate predictions, which are used in many real-time systems such as power plants, airports, tourism centers, and so forth. The difficulty of forecasting lies in the complex nature of the parameters: every parameter has a different range of values.

REFERENCES

The content of this report draws on the following sources:
1. https://mausam.imd.gov.in/
2. https://jupyter.org/about#:~:text=Project%20Jupyter%20is%20a%20non,computing%20across%20all%20programming%20languages.
3. https://www.ibm.com/in-en/cloud/learn/machine-learning
4. https://www.kaggle.com/
5. https://www.arduino.cc/

APPENDIX A1

CODE DESIGN AND IMPLEMENTATION OF WEATHER PREDICTION AND FORECASTING SYSTEM USING DATA SCIENCE

Importing Needed Packages

    import warnings
    warnings.filterwarnings('ignore')

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    %matplotlib inline

Reading the CSV file as weather_df and making the date_time column the index of the dataframe

    weather_df = pd.read_csv('kanpur.csv', parse_dates=['date_time'], index_col='date_time')
    weather_df.head(5)

Checking the columns in our dataframe

    weather_df.columns

Now the shape

    weather_df.shape
    weather_df.describe()

Checking whether there are any null values in the dataset

    weather_df.isnull().any()

Now let's select the numeric columns we will work with. Further below, the feature to be predicted (i.e. temperature) is separated from the rest of the features: weather_x stores the rest of the dataset, while weather_y holds the temperature column.

    weather_df_num = weather_df.loc[:, ['maxtempC', 'mintempC', 'cloudcover', 'humidity', 'tempC',
                                        'sunHour', 'HeatIndexC', 'precipMM', 'pressure', 'windspeedKmph']]
    weather_df_num.head()

Shape of the new dataframe

    weather_df_num.shape

Columns in the new dataframe

    weather_df_num.columns

Plotting all the column values

    weather_df_num.plot(subplots=True, figsize=(25,20))

Plotting all the column values for 1 year

    # Resample to one value per day, forward-filling any gaps
    weather_df_num.loc['2019':'2020'].resample('D').ffill().plot(subplots=True, figsize=(25,20))

    weather_df_num.hist(bins=10, figsize=(15,15))

    # Keep a copy of the most recent data (still including tempC) for the scatter plots
    weth = weather_df_num.loc['2019':'2020']
    weth.head()

    # Separate the target (tempC) from the remaining features
    weather_y = weather_df_num.pop("tempC")
    weather_x = weather_df_num

Now our dataset is prepared and ready to be fed to the model for training. It's time to split the dataset into training and testing sets.
    # Random 80/20 split; random_state fixes the shuffle for reproducibility
    train_X, test_X, train_y, test_y = train_test_split(weather_x, weather_y, test_size=0.2, random_state=4)

    train_X.shape
    train_y.shape

train_X has all the features except temperature, and train_y has the corresponding temperature for those features. In supervised machine learning, we first feed the model with inputs and their associated outputs, and then we check it with new inputs.

    train_y.head()

Multiple Linear Regression

    plt.scatter(weth.mintempC, weth.tempC)
    plt.xlabel("Minimum Temperature")
    plt.ylabel("Temperature")
    plt.show()

    plt.scatter(weth.HeatIndexC, weth.tempC)
    plt.xlabel("Heat Index")
    plt.ylabel("Temperature")
    plt.show()

    plt.scatter(weth.pressure, weth.tempC)
    plt.xlabel("Pressure")
    plt.ylabel("Temperature")
    plt.show()

    model = LinearRegression()
    model.fit(train_X, train_y)

    prediction = model.predict(test_X)

    # Calculating error (mean absolute error)
    print('Mean absolute error: %.2f' % np.mean(np.absolute(prediction - test_y)))
    # score() returns the coefficient of determination (R2)
    print('Variance score: %.2f' % model.score(test_X, test_y))

    prediction = np.round(prediction, 2)
    pd.DataFrame({'Actual': test_y, 'Prediction': prediction, 'diff': (test_y - prediction)})

Decision Tree Regression

    from sklearn.tree import DecisionTreeRegressor
    regressor = DecisionTreeRegressor(random_state=0)
    regressor.fit(train_X, train_y)

    prediction2 = regressor.predict(test_X)
    print('Mean absolute error: %.2f' % np.mean(np.absolute(prediction2 - test_y)))
    print('Variance score: %.2f' % regressor.score(test_X, test_y))

    prediction2 = np.round(prediction2, 2)
    pd.DataFrame({'Actual': test_y, 'Prediction': prediction2, 'diff': (test_y - prediction2)})

Random Forest Regression

    from sklearn.ensemble import RandomForestRegressor
    regr = RandomForestRegressor(max_depth=90, random_state=0, n_estimators=100)
    regr.fit(train_X, train_y)
    prediction3 = regr.predict(test_X)
    print('Mean absolute error: %.2f' % np.mean(np.absolute(prediction3 - test_y)))
    # score() returns the coefficient of determination (R2)
    print('Variance score: %.2f' % regr.score(test_X, test_y))

    prediction3 = np.round(prediction3, 2)
    pd.DataFrame({'Actual': test_y, 'Prediction': prediction3, 'diff': (test_y - prediction3)})

    from sklearn.metrics import r2_score

Calculating R2-score for Multiple Linear Regression

    print("Mean absolute error: %.2f" % np.mean(np.absolute(prediction - test_y)))
    print("Mean squared error (MSE): %.2f" % np.mean((prediction - test_y) ** 2))
    print("R2-score: %.2f" % r2_score(test_y, prediction))

Calculating R2-score for Decision Tree Regression

    print("Mean absolute error: %.2f" % np.mean(np.absolute(prediction2 - test_y)))
    print("Mean squared error (MSE): %.2f" % np.mean((prediction2 - test_y) ** 2))
    print("R2-score: %.2f" % r2_score(test_y, prediction2))

Calculating R2-score for Random Forest Regression

    print("Mean absolute error: %.2f" % np.mean(np.absolute(prediction3 - test_y)))
    print("Mean squared error (MSE): %.2f" % np.mean((prediction3 - test_y) ** 2))
    print("R2-score: %.2f" % r2_score(test_y, prediction3))
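The three error measures printed for each model can also be written out directly, which makes clear what each number means. The sketch below is an illustrative NumPy-only version, not the project's code: the function name regression_metrics and the four sample values are hypothetical and are not taken from the Kanpur data. The quantity np.mean((prediction - test_y) ** 2) is the mean squared error, and the R2 score is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np


def regression_metrics(actual, predicted):
    """Return MAE, MSE and R2 for two equal-length sequences of numbers."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = actual - predicted

    mae = np.mean(np.abs(residual))        # mean absolute error
    mse = np.mean(residual ** 2)           # mean squared error
    ss_res = np.sum(residual ** 2)         # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    # R2 is undefined when all actual values are identical (ss_tot == 0)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "r2": r2}


# Hypothetical temperatures in degrees Celsius, for illustration only
actual = [20, 22, 24, 26]
predicted = [21, 22, 23, 28]
metrics = regression_metrics(actual, predicted)
print("Mean absolute error: %.2f" % metrics["mae"])
print("Mean squared error (MSE): %.2f" % metrics["mse"])
print("R2-score: %.2f" % metrics["r2"])
```

A lower mean absolute error and mean squared error and an R2 score closer to 1 indicate a better fit, which is the basis on which the three regression models are ranked in Section 2.6.2.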