Natural Language Processing
MINI-PROJECT REPORT ON
“Sentiment Analysis of Movie Reviews”
BY
Shloka Daga (A774)
Namrata Jaiswal (A780)
Bhavesh Maurya (A765)
Under the guidance of
Internal Guide
Prof. Suresh Mestry
Juhu-Versova Link Road, Versova, Andheri (W), Mumbai-53
Department of Computer Engineering
University of Mumbai
October 2022

Declaration
We wish to state that the work embodied in this project titled “Sentiment Analysis of Movie Reviews” forms our own contribution to the work carried out under the guidance of Prof. Suresh Mestry at the Rajiv Gandhi Institute of Technology. I declare that this written submission represents my ideas in my own words and where others' ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.
1. Shloka Daga (A774)
2. Namrata Jaiswal (A780)
3. Bhavesh Maurya (A765)

Abstract
Sentiment analysis is

Contents
1 Introduction
1.1 Introduction Description
2 Proposed System
2.1 Proposed Statement
2.2 Design Details: Architecture / Flow Diagram
2.3 Text Pre-Processing Steps
2.4 Model Processing Steps
2.5 Software and Hardware Requirements
2.6 Dataset Used
3 Results
3.1 Results
4 Conclusion

CHAPTER 1 Introduction
1.1 Introduction Description
Sentiment analysis is the contextual mining of text, which identifies and extracts subjective information in source material and helps a business understand the social sentiment of its brand, product or service while monitoring online conversations. However, analysis of social media streams is usually restricted to basic sentiment analysis and count-based metrics. This is akin to just scratching the surface and missing out on the high-value insights that are waiting to be discovered. So what should a brand do to capture that low-hanging fruit?
With the recent advances in deep learning, the ability of algorithms to analyse text has improved considerably. Creative use of advanced artificial intelligence techniques can be an effective tool for doing in-depth research.
Sentiment analysis is a common NLP task that data scientists need to perform. This project builds a barebones movie review classifier in Python; improving the classifier is left to future work. For this analysis we use a dataset of 50,000 movie reviews taken from IMDb. The data was compiled by Andrew Maas and is available as the IMDb Reviews dataset.

CHAPTER 2 Proposed System
2.1 Proposed Statement
The goal of Sentiment Analysis of Movie Reviews is to classify each review as either positive or negative. The dataset used is "IMDB Dataset of 50K Movie Reviews", and the machine learning models used are Logistic Regression, Random Forest and LinearSVC.
The data is split into 35k reviews for training and 15k for testing the classifier. The test set has 7.5k positive and 7.5k negative reviews; since the full dataset is balanced between the two classes, this leaves 17.5k of each class for training. IMDb lets users rate movies on a scale from 1 to 10.
To label these reviews, the curator of the data labelled anything with ≤ 4 stars as negative and anything with ≥ 7 stars as positive. Reviews with 5 or 6 stars were left out.

2.2 Design Details
Architecture / Flow Diagram

2.3 Text Pre-Processing Steps
2.3.1 Tokenization
2.3.2 Stop words removal
2.3.3 Any other text preprocessing steps
2.3.4 N-gram Model

2.4 Model Processing Steps
2.4.1 LinearSVC
2.4.2 Random Forest
2.4.3 Logistic Regression

2.5 Software and Hardware Requirements
• I3-3200u
• 2 GB RAM
• Intel HD 4000
• PyCharm
• Jupyter Notebook
• Windows

2.6 Dataset Used
Kaggle: imdb-dataset-of-50k-movie-reviews

CHAPTER 3 Results
3.1 Results
Step 1: This is the main page for taking an input.
Fig. 3.1: Main Page
Step 2: Click on Submit for Summarization process.
Fig. 3.2: Input page
Step 3: After clicking on the Submit button, the summarized output will be displayed.
Fig. 3.3: Text processing and summarized output

CHAPTER 4 Conclusion and future work
We’ve gone over several options for transforming text that can improve the accuracy of an NLP model. Which combination of these techniques yields the best results depends on the task, data representation, and algorithms you choose. It’s always a good idea to try out many different combinations to see what works. A higher accuracy on this data may be attainable with a different combination of the techniques outlined in this project. The next parts of this series will explore deep learning approaches to building a sentiment classifier.
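
Supplementary sketch: the labelling rule from Section 2.1 and the pre-processing steps listed in Section 2.3 (tokenization, stop words removal, n-grams) can be illustrated in plain Python. This is a minimal illustration only, not the project's actual code: the function names, the regular-expression tokenizer, and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real project would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "was", "this", "it", "and", "of", "to"}


def label_from_rating(stars):
    """Map an IMDb rating (1-10) to a label using the rule in Section 2.1.

    Returns "negative" for <= 4 stars, "positive" for >= 7 stars,
    and None for 5 or 6 stars (those reviews are left out).
    """
    if stars <= 4:
        return "negative"
    if stars >= 7:
        return "positive"
    return None


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the illustrative stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    review = "This movie was not good, it was a waste of time."
    tokens = remove_stop_words(tokenize(review))
    print(tokens)             # unigram features
    print(ngrams(tokens, 2))  # bigram features
    print(label_from_rating(3), label_from_rating(6), label_from_rating(9))
```

In this example the bigram ("not", "good") survives as a single feature, whereas the individual words "not" and "good" lose the negation. Capturing such word pairs is one common motivation for adding an n-gram model to the pre-processing steps.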