Diet Recommendation System

Submitted in partial fulfillment of the requirements of
Summer Internship Program 2022-23

By
Akansh Jatav 21CE2050
Aditya Wani 21CE1377
Prachiti Chavan 21CE1344
Shreya Kumar 21CE1322

Supervisor
Mrs. Sumithra T. V.

Department of Computer Engineering
Ramrao Adik Institute of Technology, Sector 7, Nerul, Navi Mumbai
(Under the ambit of D. Y. Patil Deemed to be University)
July 2023

Ramrao Adik Institute of Technology
(Under the ambit of D. Y. Patil Deemed to be University)
Dr. D. Y. Patil Vidyanagar, Sector 7, Nerul, Navi Mumbai 400 706.

Certificate

This is to certify that the Internship project entitled “Diet Recommendation System” is a bonafide work done by

Akansh Jatav 21CE2050
Aditya Wani 21CE1377
Prachiti Chavan 21CE1344
Shreya Kumar 21CE1322

and is submitted in partial fulfillment of the requirement for the Summer Internship Program 2022-23.

Supervisor
Mrs. Sumithra T. V.

Dr. Dhananjay Dakhane
Internship Coordinator

Dr. Amarsinh Vidhate
Head of Department

Dr. Mukesh D. Patil
Principal

Abstract

People around the world are increasingly concerned about their health and way of life. However, avoiding junk food and exercising are not sufficient on their own; we also need to eat a balanced diet. A balanced diet based on our height, weight, and age helps us live a healthy life. A food recommendation engine using a content-based approach is an important tool for promoting healthy eating habits. This type of engine uses information about the nutritional content and ingredients of foods to make personalized recommendations to users. One of the key advantages of a content-based approach is that it can take an individual's dietary restrictions and preferences, such as allergies or food preferences, into account. By providing users with tailored recommendations, a content-based food recommendation engine can help them make better choices about what to eat and improve their overall health.

Contents

1. Introduction
   1.1 Overview
   1.2 Motivation
   1.3 Problem Statement and Objectives
2. Literature Survey
   2.1 Survey of Existing System
   2.2 Limitations of Existing System
3. Proposed System
   3.1 Problem Statement
   3.2 Proposed Methodology / Techniques
   3.3 System Design
   3.4 Details of Hardware and Software Requirements
4. Results and Discussion
   4.1 Implementation Details
   4.2 Project Outcomes
5. Conclusion
References

Chapter 1 Introduction

1.1 Overview

Many factors are known to influence an individual’s health: physical exercise, sleep, nutrition, heredity, and pollution, among other external factors. Since nutrition is one of the biggest modifiable factors in our lives, it is not surprising that small changes can produce significant outcomes.

Our diet system is based on Indian cuisine. Indian cuisine consists of a variety of regional and traditional cuisines native to India. Given the diversity in soil, climate, culture, ethnic groups, and occupations, these cuisines vary substantially and use locally available spices, herbs, vegetables, and fruits. It is therefore essential that our diet system accommodates this dietary diversity. Affordability is very likely a major barrier to improving diets in India. Our diet system focuses mainly on the most commonly used ingredients in India, which helps to reduce this barrier.

Some molecules are known to have a positive effect on health, in particular in fighting diseases. Identifying which ingredients contain them in the highest concentrations may help in treating and preventing those diseases. Moreover, including these ingredients in tasty and affordable meals can promote a shift in the nutritional habits of the population.

1.2 Motivation

Dietary guidelines are a translation of scientific knowledge on nutrients into specific dietary advice. They represent the recommended dietary allowances of nutrients in terms of diets that should be consumed by the population.
The guidelines promote the concept of nutritionally adequate diets and healthy lifestyles from the time of conception to old age. Formulating dietary goals and specific guidelines helps to give people the guidance they need to ensure nutritional adequacy. The dietary guidelines can be applied directly to the general population or to specific physiological or high-risk groups to derive health benefits. They may also be used by medical and health personnel, nutritionists and dietitians.

1.3 Problem Statement and Objectives

Dietary quality is recognized as a key factor affecting human nutritional status, and empirical studies have linked dietary diversity to micronutrient adequacy and dietary risk factors to morbidity and mortality from non-communicable diseases. The major food issues of concern are insufficient or imbalanced intake of foods and nutrients. The common nutritional problems of public health importance in India are low birth weight, protein energy malnutrition in children, chronic energy deficiency in adults, micronutrient malnutrition and diet-related non-communicable diseases. However, diseases at either end of the spectrum of malnutrition (under- and over-nutrition) are important. Recent evidence indicates that undernutrition in utero may set the pace for diet-related chronic diseases in later life.

With undernutrition, overnutrition, and micronutrient deficiencies afflicting the country, India experiences a triple burden of malnutrition. Recent decades have seen modest progress when it comes to health in India, but progress has been uneven and inequitable. This project therefore aims to recommend recipes to users based on their nutrient levels, as a suitable response to the cases of malnutrition in India.

Chapter 2 Literature Survey

Nutrition is a basic human need and a prerequisite to a healthy life. A proper diet is essential from the very early stages of life for proper growth, development and to remain active.
Food consumption, which largely depends on production and distribution, determines the health and nutritional status of the population. The recommended dietary allowances (RDA) are nutrient-centered and technical in nature. Apart from supplying nutrients, foods provide a host of other components (non-nutrient phytochemicals) which have a positive impact on health. Since people consume food, it is essential to advocate nutrition in terms of foods, rather than nutrients. Emphasis has, therefore, shifted from a nutrient orientation to a food-based approach for attaining optimal nutritional status.

Dietary guidelines are a translation of scientific knowledge on nutrients into specific dietary advice. They represent the recommended dietary allowances of nutrients in terms of diets that should be consumed by the population. The guidelines promote the concept of nutritionally adequate diets and healthy lifestyles from the time of conception to old age.

2.1 Survey of Existing System

On many websites, diet plans are recommended on the basis of BMI. Body mass index (BMI) is a measure of body fat based on height and weight that applies to adult men and women. However, according to an October 24, 2022 article in the Montreal Gazette, criticism is growing that BMI, a formula that uses height and weight to estimate body fat, is a “flawed, crude, archaic and overrated proxy for health.” Those opposed to its continued use argue that it was developed based on white males and has little validity for other racial and ethnic groups, and that it is sometimes used to deny certain people joint replacements and other surgeries. Experts have also pointed out that BMI fails to take into account factors such as how much fat versus muscle a patient has, the distribution of fat in their body (typically, fat around the waist increases disease risk more than fat in other places), and their metabolic health.
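For context, the BMI figure these sites rely on is a single ratio: weight in kilograms divided by the square of height in metres. The following is a minimal sketch of that calculation together with the standard WHO adult categories; the function names are illustrative and are not taken from the surveyed sites or from this project.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """Map a BMI value to the WHO adult weight categories."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


value = bmi(70.0, 1.75)
print(round(value, 1), who_category(value))  # 22.9 normal weight
```

Because the formula sees only height and weight, two people with very different body composition can receive the same value, which is the weakness the critics point to.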
Some research suggests that other measures of body fat, such as skinfold thickness, bioelectrical impedance, underwater weighing, and dual-energy X-ray absorptiometry, may be more accurate than BMI. Yet BMI is still used as a diagnostic tool for body fat measurement.

Fig 2.1 BMI Calculators

Fig 2.2 BMR calculator

A recent journal study found that changes in a person’s basal metabolic rate (BMR), essentially the number of calories they burn at rest each day, have quite a bit to do with both how well people lose weight and how easy it is to keep the pounds from returning. Whether an individual aims to lose or gain weight, it is very important to know how many calories are required every day at rest, and also once physical activity is factored in.

A 2005 meta-analysis on BMR showed that, even when all known factors of metabolic rate are controlled for, a 26% unexplained variance between people remains. An average person eating an average diet will likely have the expected BMR value, but some of the factors that determine BMR precisely are still not understood. Therefore all BMR calculations, even those made by specialists using the most precise methods, will not be perfectly accurate. Not all human bodily functions are well understood yet, so values of total daily energy expenditure (TDEE) derived from BMR estimates are just that: estimates.

Recipe Recommender Limitations:
• Limited Personalization: Recipe recommenders typically rely on user inputs such as dietary restrictions, preferences, and health goals. However, the recommendations may not account for individual variations in nutritional needs, food tolerances, or allergies.
• Lack of Contextual Understanding: Recipe recommenders may not consider cultural, regional, or seasonal factors that influence food choices and availability.
Additionally, they may not account for specific dietary approaches or ethical considerations (e.g., vegetarianism, veganism) unless explicitly specified.

It is essential to consider these limitations when using BMI, BMR, or recipe recommenders. A diet recommendation system should be used as a supportive tool and not as a substitute for professional medical advice or guidance from a registered dietitian or nutritionist. Individual health conditions, allergies, and specific dietary requirements should also be considered when implementing personalized dietary recommendations.

2.2 Limitations of Existing System

Challenges regarding recommendation algorithms

In order to calculate nutritional recommendations for users, any algorithm needs the following information:

1) User information (e.g., likes, dislikes, food consumption, or nutritional needs): Like recommendation systems in other domains, food recommendation systems face the cold-start problem when the system is used for the first time. This problem can be overcome by using information about users’ previous meals to calculate similarity and then recommending new recipes to them. However, this solution requires considerable effort from users and reduces their willingness to use the system.

How many recipes should the system have? The number of gathered recipes should be large enough to accommodate the preferences of many users and to vary the recommended recipes, while still keeping the time needed to make recommendations low. This is a tricky problem, because the system has to balance the variety of recommendations against its response time.

2) A set of constraints or rules: Considering more constraints and rules in the recommendation process will improve the quality of recommendations. For instance, for a user who has heart disease, the system should recommend menus with less fat and salt.
Moreover, it is necessary to detect conflicts among the constraints or rules, since such conflicts prevent the recommendation algorithm from finding a solution. However, with a large database (e.g., thousands of foods or recipes), checking constraints and rules against the database degrades system performance. In addition, food recommendation systems should take into account the availability of ingredients in the household, in order to help users save money and avoid food waste. The challenge here is to propose food that meets the health situation and nutritional needs of users while also making use of the ingredients that are currently in the fridge. In this scenario, the recommendation system demands a lot of effort from users, because they have to report the consumption of all ingredients regularly, and this can discourage them from using the system in the long term.

Challenges regarding changing eating behavior of users

Nowadays, many people suffer from health problems because of inappropriate eating habits. For instance, some people eat too much for their level of physical activity and gradually become obese, whereas others (e.g., the elderly, dieters) restrict their nutrient intake severely, which leads to malnutrition. Therefore, one of the main functions of food recommendation systems is to understand users’ eating behaviors and persuade them to change those behaviors in positive ways. This is a big challenge, because eating is a lifelong behavior that is influenced by many factors, especially psychological ones. Hence, food recommendation systems should integrate health psychology theory in order to encourage users to adopt healthy eating behaviors. One approach is to apply one simple change at a time until the new behavior becomes habitual.
Another approach is to compare actual intake with the ideal nutrient intake. Users can find the structure of an ideal diet for their age and physical activity level from reliable sources (e.g., USDA, DACH) and then compare what they ate with what is recommended.

Chapter 3 Proposed System

The designed diet recommendation system suggests suitable recipes that contain a key ingredient, which the system recommends based on the nutritional requirements entered by the user. By providing users with tailored recommendations, a content-based food recommendation engine can help them make better choices about what to eat and improve their overall health. The web application uses a content-based approach with Scikit-Learn, FastAPI and Streamlit.

3.1 Problem Statement

With under-nutrition, over-nutrition, and micro-nutrient deficiencies afflicting the country, India experiences a triple burden of malnutrition. Recent decades have seen modest progress when it comes to health in India, but progress has been uneven and inequitable. Hence the need of the hour is a diet recommendation system that is based on nutrient levels, so that the recipes suit everybody’s nutritional needs.

3.2 Proposed Methodology / Techniques

The nutrients considered by the system to recommend a diet are Calcium, Fats, Fiber, Carbohydrates, Sodium, Thiamine, Riboflavin, Niacin, Pantothenic acid, Biotin, Vitamin B6, Vitamin C and Protein.

Our model is a content-based recommendation engine. A content-based recommendation engine is a type of recommendation system that uses the characteristics or content of an item to recommend similar items to users. It works by analyzing the content of items, such as text, images, or audio, and identifying patterns or features that are associated with certain items. These patterns or features are then used to compare items and recommend similar ones to users.

Why content-based approach?
• No data from other users is required to start making recommendations.
• Recommendations are highly relevant to the user.
• Recommendations are transparent to the user.
• The “cold start” problem for new items is avoided, because items are matched on their content rather than on past ratings.
• Content-based filtering systems are generally easier to create.

Two data sets were used:
1. ifct2017: The Indian Food Composition Tables provide nutritional values for 528 key ingredients.
2. Cleaned_Indian_Food_Dataset: a data set of Indian food items with their recipes.

The web application is built using HTML, CSS and JavaScript. To translate the approach worked out on paper into our website, we used the unsupervised implementation of the Nearest Neighbors algorithm, as it matches our requirement for the website directly. Unsupervised nearest neighbor search means finding the nearest neighbors of a given data point in a dataset without using any labeled target information. It is primarily used for tasks such as clustering, anomaly detection, and data exploration. The algorithm keeps the entire dataset in memory and answers each query by its similarity to the stored points.

The k-nearest neighbors (KNN) algorithm is a simple and popular machine learning algorithm used for both classification and regression tasks. It is a type of instance-based or memory-based learning, where the algorithm stores the entire training dataset in memory and makes predictions based on the similarity of new data points to the existing training data. Here is an overview of the KNN algorithm:

1. Training Phase:
• The KNN algorithm starts by storing the feature vectors and corresponding labels of the training dataset.
• The algorithm does not learn explicit parameters or construct a model during the training phase.

2. Prediction Phase:
• Given a new, unseen data point, the KNN algorithm calculates the distance between this point and all points in the training dataset.
• The distance metric can be Euclidean distance, Manhattan distance, or any other appropriate distance measure.
• The algorithm selects the k nearest neighbors (data points) to the new point based on the calculated distances.
• For classification, the algorithm assigns the class label that is most frequent among the k nearest neighbors.
• For regression, the algorithm calculates the average (or weighted average) of the labels of the k nearest neighbors.

3. Choosing the Value of k:
• The choice of the parameter k, the number of neighbors to consider, is important in the KNN algorithm.
• A small value of k can be sensitive to noise or outliers in the data, potentially resulting in overfitting.
• A large value of k may smooth out the decision boundaries, potentially leading to underfitting.
• The optimal value of k depends on the dataset and should be determined through experimentation and validation.

4. Scaling Features:
• Before applying KNN, it is often necessary to scale or normalize the features of the dataset. This is important because KNN makes predictions based on the distances between data points, and if the features have different scales, some features may dominate the distance calculation.

The KNN algorithm is relatively easy to understand and implement. However, it can be computationally expensive for large datasets, since it requires calculating distances between the new point and all training points. In addition, careful feature scaling and an appropriate choice of k are crucial for good performance.

In scikit-learn (sklearn), the sklearn.neighbors module provides various algorithms and tools for nearest neighbor-based learning tasks. Within this module, the KNeighborsClassifier and KNeighborsRegressor classes implement the k-nearest neighbors algorithm, which is a popular method for classification and regression tasks.
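The training and prediction phases of KNN described above can be sketched in plain Python without any library. This is a minimal illustration, assuming Euclidean distance and majority voting; the helper names (standardize, scale_query, knn_predict) and the toy data are made up for the example and are not part of the project code.

```python
import math
from collections import Counter


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def scale_query(query, means, stds):
    """Apply the training-set scaling to a new point."""
    return [(v - m) / s for v, m, s in zip(query, means, stds)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Prediction phase: distance from the query to every stored point.
    distances = sorted(
        (euclidean(point, query), label) for point, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: the second feature is on a much larger scale than the first.
train = [[1.0, 100.0], [1.2, 110.0], [0.9, 95.0],
         [3.0, 300.0], [3.2, 310.0], [2.9, 290.0]]
labels = ["low", "low", "low", "high", "high", "high"]

# "Training" only stores the (scaled) points and their labels.
scaled_train, means, stds = standardize(train)

print(knn_predict(scaled_train, labels, scale_query([1.1, 105.0], means, stds)))
```

Scaling the query with the means and standard deviations of the training set, rather than its own, keeps both in the same coordinate system, which is the point made under "Scaling Features".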
The main purpose of sklearn.neighbors is to perform nearest neighbor-based computations, such as finding the k nearest neighbors of a given data point or making predictions based on the neighbors' labels or values. In our case we return the ingredients that most closely match the nutritional requirements.

The fit() function prepares the model by storing the training dataset in memory. For the supervised KNN estimators it takes the feature vectors (X) and the corresponding labels (y); the unsupervised NearestNeighbors class needs only X. The data is kept in the internal memory of the model.

Then we use the kneighbors() function. The kneighbors() method is provided by scikit-learn's NearestNeighbors, KNeighborsClassifier and KNeighborsRegressor classes. It finds the k nearest neighbors of one or more query points in a fitted model. By default the method returns the distances to, and the indices of, the k nearest neighbors.

User Inputs
• Physical Metrics
• Nutritional deficiency Metrics

Processing
Using the KNN algorithm to find the 5 nearest matching recipe ingredients

Word processing
Tokenising the ingredient and searching for the ingredient in the cleaned Indian recipes database

Output
List of ingredients and recipes

3.3 System Design

Our model extracts ingredients from the data set based on the nutritional deficiencies stated in the input.

User flow: Users send a request to the system by providing their physical information. After analyzing the data, the system (ML model) responds by recommending a diet based on that information.

System Architecture
1. Users enter their nutritional deficiency label on the website.
2. The information then goes through the ML model in the following manner:
2.1 KNN is used to find the ingredients fitting the nutritional requirements of the user.
3. The system then recommends a diet to the users.

The Indian Diet recommendation system successfully recommends Indian ingredients to the users based on the nutrient values given to it as inputs.
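The word processing step of the flow above (tokenising a recommended ingredient and looking it up in the recipes data) could look roughly like the sketch below. It is an assumption-laden illustration rather than the project's code: the record fields "name" and "ingredients", the DESCRIPTORS list and the sample recipes are all hypothetical, and plain regular expressions stand in for whatever tokenizer the project used.

```python
import re

# Descriptor words that say how an ingredient is prepared rather than what it is
# (an illustrative list, not taken from the project).
DESCRIPTORS = {"raw", "dried", "fresh", "whole", "milled"}


def tokenize(text):
    """Lowercase `text` and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def key_tokens(ingredient_name):
    """Tokens of an ingredient name with preparation descriptors removed."""
    return [t for t in tokenize(ingredient_name) if t not in DESCRIPTORS]


def find_recipes(ingredient_name, recipes):
    """Return names of recipes whose ingredient list mentions any key token."""
    wanted = set(key_tokens(ingredient_name))
    return [
        recipe["name"]
        for recipe in recipes
        if wanted & set(tokenize(recipe["ingredients"]))
    ]


# Made-up records standing in for rows of the cleaned recipes data set.
recipes = [
    {"name": "Palak Paneer", "ingredients": "spinach, paneer, onion, garlic"},
    {"name": "Jeera Rice", "ingredients": "rice, cumin seeds, ghee"},
    {"name": "Dal Tadka", "ingredients": "toor dal, tomato, garlic, cumin seeds"},
]

print(find_recipes("Spinach, raw", recipes))  # ['Palak Paneer']
```

Matching on whole tokens rather than substrings avoids accidental hits such as "rice" inside "licorice", at the cost of missing spelling variants of the same ingredient.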
Modules that have been used in Python are:

1) NumPy, SciPy and Pylab: NumPy is a library for the Python programming language that adds support for large, multidimensional arrays and matrices, as well as a wealth of high-level mathematical operations that can be performed on these arrays. Pylab is a module in the matplotlib library that provides a MATLAB-like interface for creating plots and visualizations in Python.

2) Pandas: pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

3) Matplotlib: Matplotlib is a cross-platform data visualization and graphical plotting library for Python and its numerical extension NumPy. It provides an object-oriented API for embedding plots into applications.

4) StandardScaler from sklearn.preprocessing: scikit-learn is an open-source Python library that implements a variety of machine learning, pre-processing, cross-validation, and visualization algorithms behind a unified interface. The StandardScaler() class standardizes each feature to zero mean and unit variance.

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    # Standardize columns 3 to 16 of extracted_data
    prep_data = scaler.fit_transform(extracted_data.iloc[:, 3:17].to_numpy())

Where,
• ‘scaler’ is the initialized StandardScaler() object.
• ‘fit_transform()’ combines the fit() and transform() methods into a single step, allowing you to fit the transformer on the data and transform the data in one call.
• ‘to_numpy()’ converts a pandas DataFrame or Series object into a NumPy array.

5) NearestNeighbors from sklearn.neighbors: the NearestNeighbors class provides an unsupervised implementation of nearest neighbor search. It allows you to find the k nearest neighbors, or the neighbors within a specified radius, for a given data point or set of data points.
    from sklearn.neighbors import NearestNeighbors

    neigh = NearestNeighbors(metric='cosine', algorithm='brute')
    neigh.fit(prep_data)

where,
• ‘neigh’ is an object of the NearestNeighbors class using the cosine metric for distance computation. The cosine similarity of two vectors A and B is cos(θ) = (A · B) / (‖A‖ ‖B‖), and the cosine distance is 1 − cos(θ).
• The fit() function prepares the model by storing the dataset in memory. It takes the feature vectors (X), ‘prep_data’ in this case, and keeps them in the internal memory of the model.

6) Pipeline from sklearn.pipeline and FunctionTransformer from sklearn.preprocessing: Pipeline is a utility class in scikit-learn that allows you to chain multiple transformers and an estimator into a single object. It simplifies the process of creating and applying a sequence of data transformations followed by a final estimator, especially in machine learning pipelines with multiple preprocessing steps.

    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import FunctionTransformer

    # Wrap the neighbor lookup so it can act as the last pipeline step
    transformer = FunctionTransformer(neigh.kneighbors, kw_args={'return_distance': False})
    pipeline = Pipeline([('std_scaler', scaler), ('NN', transformer)])

• Where ‘transformer’ is an object of the FunctionTransformer class that takes the kneighbors function as a parameter.
• And the pipeline takes the scaler and the transformer as parameters.

7) train_test_split from sklearn.model_selection: The train_test_split() method is used to split our data into train and test sets.

    from sklearn.model_selection import train_test_split

    TrainSet, TestSet = train_test_split(extracted_data, test_size=0.2, random_state=42)

Where ‘train_test_split()’ is the splitter, test_size is the fraction of the data placed in the test set, and random_state is the seed for the random number generator.
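The scaling, fitting and neighbor lookup steps described above can be put together as in the sketch below. It is a minimal example, assuming scikit-learn and NumPy are installed; the nutrient table, the names list and the recommend() helper are made up for illustration and are not values or code from the project. The explicit scaler.transform() followed by kneighbors() does by hand what the report's pipeline chains together.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Made-up nutrient rows (three features per ingredient), for illustration only;
# these are not values from ifct2017.
names = ["ingredient_a", "ingredient_b", "ingredient_c",
         "ingredient_d", "ingredient_e", "ingredient_f"]
nutrients = np.array([
    [10.0, 1.0, 0.10],
    [200.0, 5.0, 0.30],
    [50.0, 20.0, 0.05],
    [120.0, 2.0, 1.20],
    [30.0, 8.0, 0.60],
    [300.0, 15.0, 0.90],
])

# Standardize the features, then store the scaled rows for brute-force search.
scaler = StandardScaler()
prep_data = scaler.fit_transform(nutrients)

neigh = NearestNeighbors(n_neighbors=3, metric="cosine", algorithm="brute")
neigh.fit(prep_data)


def recommend(requirement):
    """Return the names of the ingredients closest to a nutrient requirement."""
    # The query must go through the same scaling as the stored data.
    query = scaler.transform(np.asarray(requirement, dtype=float).reshape(1, -1))
    indices = neigh.kneighbors(query, return_distance=False)
    return [names[i] for i in indices[0]]


print(recommend([120.0, 2.0, 1.2]))
```

A query that equals one of the stored rows has cosine distance 0 to that row, so it comes back first; the remaining neighbors are ordered by increasing cosine distance.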
3.4 Details of Hardware and Software Requirements

The project is created with:
• Python: 3.10.8
• scikit-learn 1.1.3
• Pandas: 1.5.1
• Streamlit: 1.16.0
• NumPy: 1.21.5
• beautifulsoup4 4.11.1
• Tensorflow 2.13
• NLTK: Natural Language Toolkit
• Matplotlib

Software used:

Visual Studio Code: Visual Studio Code, popularly known as VS Code, is a source-code editor created by Microsoft for Windows, Linux, and macOS that is built on the Electron framework. Its features include debugging support, syntax highlighting, intelligent code completion, snippets, code refactoring, and integrated Git. Users can change the theme, keyboard shortcuts and options, and install extensions that add functionality.

Python (Version-3.10.9): Python is a high-level, general-purpose programming language. Its design philosophy prioritizes code readability and makes significant use of indentation. Python has garbage collection and dynamic typing. It supports a variety of programming paradigms, including structured (particularly procedural), object-oriented, and functional programming.

HyperText Markup Language (HTML): HTML is the standard markup language for documents intended to be displayed in a web browser. It is often assisted by technologies such as Cascading Style Sheets (CSS) and programming languages such as JavaScript. Web browsers receive HTML documents from a web server or from local storage and render them into multimedia web pages. HTML describes the structure of a web page semantically and originally included cues for the document's appearance.

Cascading Style Sheets (CSS): CSS is a style sheet language used to describe the presentation of a document written in a markup language such as HTML or XML (including XML dialects like SVG, MathML, or XHTML). Along with HTML and JavaScript, CSS is one of the foundational technologies of the World Wide Web. The purpose of CSS is to make it possible to separate content from presentation, including layout, colors, and fonts.
Hardware used:

Laptop: A laptop computer, also called a notebook, is a compact, portable personal computer (PC). Although 2-in-1 PCs with a detachable keyboard are frequently marketed as laptops or as having a "laptop mode," most laptops have a clamshell design with a flat panel screen (typically 11-17 in or 280-430 mm in diagonal size) on the inside of the upper lid and an alphanumeric keyboard (typically QWERTY) on the inside of the lower lid.

Wi-Fi: Wi-Fi is a family of wireless network protocols based on the IEEE 802.11 family of standards, which is commonly used for wireless networking and enables nearby devices to exchange data through radio waves. In public spaces like coffee shops, hotels, libraries, and airports, wireless local area networks (WLAN) are frequently used to link laptop computers, tablet computers, smartphones, smart televisions (smart TVs), printers, and smart speakers together, as well as to a wireless router that connects them to the Internet.

Chapter 4 Results and Discussion

There are many different ways to build recommender systems. Some use algorithmic and formulaic approaches like PageRank, while others use more model-centered approaches like collaborative filtering, content-based filtering, link prediction, etc. We have used a model-centered approach to build our project. The website is intended to be used only after the user has obtained a certified medical report.

The model accepts complete as well as partial nutrient requirement ranges.

Input: a complete range from a randomly selected test input produced by train_test_split
Output: [output figure]

Input: [.01, .03]. For thiamin and riboflavin the range is 10:12
Output: [output figure]

As stated before, we have built a user-friendly website using Bootstrap and JavaScript.

Fig 4.1 Home page of website

We present the user with 2 options, a daily calorie calculator and a form to customize a diet plan.
The daily calorie calculator is a BMR calculator that takes the age, gender, height and weight as input and returns the number of calories the user needs to consume for basic bodily functions.

Fig 4.2 Daily Calories Calculator

Our diet plan form takes input from the user about the levels of the various nutrients mentioned in their report. Based on this input we recommend a diet plan.

Fig 4.3 Customize your diet plan form

Chapter 5 Conclusion

The goals of this project were, first, to find data on Indian recipes and ingredients with their respective nutrient compositions in order to build a diet recommendation system for India's entire diverse population, and specifically for the rural population; second, to train, evaluate and test a model that can predict food items on the basis of an individual's nutrient deficiencies; and finally, to build a user-friendly web application as a step forward in building a recommendation system. Our next step would be to make the model capable of recommending region-specific diets and to take the user’s taste preferences into account.
References

• End to End Recipe Cuisine Classification, AWS Lambda functions, BeautifulSoup, Python, Sci-Kit Learn
  https://towardsdatascience.com/https-towardsdatascience-com-end-to-end-recipecuisine-classification-e97f4ac22104
• Building a Website Starter with FastAPI
  https://levelup.gitconnected.com/building-a-website-starter-with-fastapi-92d077092864
• A Simple Example of Pipeline in Machine Learning with Scikit-learn
  https://towardsdatascience.com/a-simple-example-of-pipeline-in-machine-learning-withscikit-learn-e726ffbb6976
• Reference code
  https://github.com/vishalvermaCred/ML_project
• Report of a diet recommendation Website using machine learning
  https://www.irjet.net/archives/V8/i4/IRJET-V8I4702.pdf
• A Hybrid Approach Based Diet Recommendation System using ML and Big Data Analytics
  https://assets.researchsquare.com/files/rs-1044422/v1_covered.pdf?c=1664358567
• Explaining & Implementing Content Based, Collaborative Filtering & Hybrid Recommendation Systems in Python
  https://towardsdatascience.com/recommendation-systems-explained-a42fc60591ed
• An Introduction to Recommender Systems
  https://www.iteratorshq.com/blog/an-introduction-recommender-systems-9-easyexamples/
• Recommender Systems: Behind the Scenes of Machine Learning-Based Personalization
  https://www.altexsoft.com/blog/recommender-system-personalization/
• Dietary guidelines for Indians
  https://www.nin.res.in/downloads/DietaryGuidelinesforNINwebsite.pdf
• Building a Food recommendation system
  https://towardsdatascience.com/building-a-food-recommendation-system-90788f78691a