Natural Language Processing
MINI-PROJECT REPORT
ON
“Sentiment Analysis of Movie Reviews”
BY
Shloka Daga (A774)
Namrata Jaiswal (A780)
Bhavesh Maurya (A765)
Under the guidance of
Internal Guide
Prof. Suresh Mestry
Juhu-Versova Link Road Versova, Andheri(W), Mumbai-53
Department of Computer Engineering
University of Mumbai
October 2022
Declaration
We wish to state that the work embodied in this project titled “Sentiment Analysis of Movie Reviews” forms our own contribution to the work carried out under the guidance of Prof. Suresh Mestry at the Rajiv Gandhi Institute of Technology.
We declare that this written submission represents our ideas in our own words and, where others' ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented, fabricated or falsified any idea/data/fact/source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.
1. Shloka Daga (A774)
2. Namrata Jaiswal (A780)
3. Bhavesh Maurya (A765)
Abstract
Sentiment analysis is
Contents
1. Introduction
   1.1 Introduction Description
2. Proposed System
   2.1 Proposed Statement
   2.2 Design Details (Architecture / Flow Diagram)
   2.3 Text Pre-Processing Steps
   2.4 Model Processing Steps
   2.5 Software and Hardware Requirements
   2.6 Dataset Used
3. Results
   3.1 Results
4. Conclusion
CHAPTER 1
Introduction
1.1 Introduction Description
Sentiment analysis is the contextual mining of text to identify and extract subjective information from source material. It helps a business understand the social sentiment around its brand, product or service while monitoring online conversations. However, analysis of social media streams is usually restricted to basic sentiment analysis and count-based metrics. This only scratches the surface and misses the high-value insights that are waiting to be discovered.
So how can a brand capture those insights? With recent advances in deep learning, the ability of algorithms to analyse text has improved considerably, and creative use of advanced artificial intelligence techniques can be an effective tool for in-depth research.
Sentiment analysis is a common NLP task that data scientists need to perform. This report is a straightforward guide to building a barebones movie review classifier in Python; future work will focus on improving the classifier.
For this analysis we use a dataset of 50,000 movie reviews taken from IMDb. The data was compiled by Andrew Maas and is available as the IMDb Reviews dataset.
CHAPTER 2
Proposed System
2.1 Proposed Statement
The goal of this project is to classify each movie review as either positive or negative. The dataset used is "IMDB Dataset of 50K Movie Reviews", and the machine learning models used are Logistic Regression, Random Forest and LinearSVC.
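The data is split 70/30, with 35k reviews used for training and 15k for testing the classifier. Because the dataset is balanced between positive and negative reviews, the test set holds about 7.5k positive and 7.5k negative reviews, and the training set about 17.5k of each.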
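The 70/30 split above can be sketched with scikit-learn's `train_test_split`. This is a minimal sketch under assumptions: the report does not say how the split was made, so the stratified split (`stratify=labels`), the seed and the helper name `split_reviews` are illustrative, and the toy lists stand in for the real 50k reviews.

```python
from sklearn.model_selection import train_test_split


def split_reviews(reviews, labels, test_size=0.3, seed=42):
    """Split reviews 70/30 into train and test sets (hypothetical helper).

    stratify=labels is an assumption: it keeps the positive/negative ratio
    identical in both sets, so a balanced 50k dataset gives 7.5k of each
    class in a 15k test set.
    """
    return train_test_split(
        reviews, labels, test_size=test_size, stratify=labels, random_state=seed
    )


# Toy stand-in for the dataset: 10 positive (1) and 10 negative (0) reviews.
reviews = [f"review {i}" for i in range(20)]
labels = [1] * 10 + [0] * 10
X_train, X_test, y_train, y_test = split_reviews(reviews, labels)
print(len(X_train), len(X_test))  # 14 6
print(sum(y_test), len(y_test) - sum(y_test))  # 3 3
```

On the real data the same call yields 35,000 training and 15,000 test reviews.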
IMDb lets users rate movies on a scale from 1 to 10. To label these reviews, the curator of the data labelled any review with ≤ 4 stars as negative and any review with ≥ 7 stars as positive. Reviews with 5 or 6 stars were left out.
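The dataset already ships with these labels, so the project does not apply this rule itself. The sketch below only illustrates the labelling rule described above; `label_from_stars` is a hypothetical helper.

```python
from typing import Optional


def label_from_stars(stars: int) -> Optional[str]:
    """Map an IMDb star rating (1-10) to a sentiment label (hypothetical helper).

    <= 4 stars -> "negative", >= 7 stars -> "positive",
    5 or 6 stars -> None (such reviews were left out of the dataset).
    """
    if not 1 <= stars <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {stars}")
    if stars <= 4:
        return "negative"
    if stars >= 7:
        return "positive"
    return None


ratings = [1, 4, 5, 6, 7, 10]
print([label_from_stars(r) for r in ratings])
# ['negative', 'negative', None, None, 'positive', 'positive']

# Keep only the ratings that receive a label, as the curator did.
kept = [r for r in ratings if label_from_stars(r) is not None]
print(kept)  # [1, 4, 7, 10]
```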
2.2 Design Details
Architecture / Flow Diagram
2.3 Text Pre-Processing Steps
2.3.1 Tokenization
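Tokenization splits a review into individual word tokens. The report does not name the tokenizer it used, so the following is a minimal regex-based sketch (an assumption; NLTK's `word_tokenize` or spaCy are common alternatives). The pattern and the `tokenize` name are illustrative.

```python
import re

# Illustrative pattern: runs of letters/digits, optionally followed by an
# apostrophe suffix so contractions such as "didn't" stay as one token.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens in order."""
    return TOKEN_PATTERN.findall(text.lower())


print(tokenize("I didn't like this movie. The plot was DULL!"))
# ['i', "didn't", 'like', 'this', 'movie', 'the', 'plot', 'was', 'dull']
```

Keeping contractions whole matters for sentiment, since splitting "didn't" into "didn" and "t" would hide the negation.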
2.3.2 Stop words removal
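Stop words are very frequent words ("the", "is", "and") that carry little sentiment. A minimal sketch, assuming a small hand-picked list for illustration; a real system would typically start from a library list such as NLTK's. Negations like "not" and "no" are deliberately kept out of the list, because removing them can flip a review's meaning.

```python
# Illustrative stop-word list (an assumption, not the project's list).
# Negation words such as "not" and "no" are intentionally NOT included.
STOP_WORDS = {"a", "an", "the", "this", "that", "is", "was", "were", "it",
              "of", "and", "to", "in", "i"}


def remove_stop_words(tokens):
    """Return the tokens that are not stop words, preserving their order."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = ["this", "movie", "was", "not", "good", "and", "the", "acting", "was", "bad"]
print(remove_stop_words(tokens))
# ['movie', 'not', 'good', 'acting', 'bad']
```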
2.3.3 Other text preprocessing steps
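Other common cleaning steps are lowercasing, removing HTML tags (IMDb reviews contain `<br />` line breaks) and dropping punctuation and digits. The sketch below is one possible cleaning function under those assumptions; the exact steps and the name `clean_review` are illustrative, not taken from the project.

```python
import re

HTML_TAG = re.compile(r"<[^>]+>")      # e.g. the "<br />" tags in IMDb reviews
NON_LETTER = re.compile(r"[^a-z\s']")  # anything except letters, whitespace, apostrophes
WHITESPACE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Lowercase a review, replace HTML tags with spaces, drop non-letter
    characters and collapse repeated whitespace."""
    text = HTML_TAG.sub(" ", text.lower())
    text = NON_LETTER.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


raw = "One of the best films of 2008!<br /><br />10/10, would watch again."
print(clean_review(raw))
# one of the best films of would watch again
```

Note the trade-off: stripping digits also removes explicit scores such as "10/10", which can be a strong sentiment cue.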
2.3.4 N-gram Model
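An n-gram model uses sequences of n consecutive tokens as features, so a bigram such as "not good" captures a negation that the unigrams "not" and "good" miss. A minimal sketch in plain Python follows; the `ngrams` name is illustrative. In scikit-learn, the same effect is usually obtained with the `ngram_range` parameter of `CountVectorizer` or `TfidfVectorizer` (for example `ngram_range=(1, 2)` for unigrams plus bigrams).

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams in tokens, each joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["the", "movie", "was", "not", "good"]
print(ngrams(tokens, 1))  # ['the', 'movie', 'was', 'not', 'good']
print(ngrams(tokens, 2))  # ['the movie', 'movie was', 'was not', 'not good']
```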
2.4 Model Processing Steps
2.4.1 LinearSVC
2.4.2 Random Forest
2.4.3 Logistic Regression
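The three models can be trained and compared in the same way. The sketch below assumes scikit-learn (`LinearSVC` is that library's class name) and TF-IDF features over unigrams and bigrams; the feature choice, all hyperparameters, the helper names `build_models` and `evaluate`, and the toy data are assumptions, not the project's actual settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_models():
    """Return one TF-IDF + classifier pipeline per model named in the report.

    TF-IDF with unigrams and bigrams and every hyperparameter below are
    illustrative assumptions.
    """
    return {
        "LinearSVC": make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC()),
        "Random Forest": make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            RandomForestClassifier(n_estimators=100, random_state=0),
        ),
        "Logistic Regression": make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000)
        ),
    }


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit every pipeline on the training set and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Toy data standing in for the IMDb reviews (1 = positive, 0 = negative).
X_train = ["a great and moving film", "loved the acting", "brilliant story",
           "a wonderful movie", "boring and too long", "terrible acting",
           "a dull plot", "awful movie, not worth it"]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
X_test = ["great story", "dull and boring"]
y_test = [1, 0]

for name, acc in evaluate(build_models(), X_train, y_train, X_test, y_test).items():
    print(f"{name}: {acc:.2f}")
```

With the real dataset, the 35k training and 15k test reviews would be passed to `evaluate` instead of the toy lists.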
2.5 Software and Hardware Requirements
• Processor: Intel i3-3200U
• RAM: 2 GB
• Graphics: Intel HD Graphics 4000
• IDE: PyCharm
• Jupyter Notebook
• Operating system: Windows
2.6 Dataset Used
• Kaggle: imdb-dataset-of-50k-movie-reviews
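Loading the dataset can be sketched with Python's standard `csv` module. This assumes the Kaggle CSV has a `review` column and a `sentiment` column holding "positive"/"negative"; those column names, the file name mentioned in the comment, and the helper `load_reviews` are assumptions to check against the downloaded file.

```python
import csv
import io


def load_reviews(file_obj):
    """Read review texts and 0/1 labels from a CSV with "review" and
    "sentiment" columns (column names assumed); positive -> 1, negative -> 0."""
    label_map = {"positive": 1, "negative": 0}
    reviews, labels = [], []
    for row in csv.DictReader(file_obj):
        reviews.append(row["review"])
        labels.append(label_map[row["sentiment"].strip().lower()])
    return reviews, labels


# In-memory sample standing in for the real file. With the downloaded dataset,
# open it with open("IMDB Dataset.csv", newline="", encoding="utf-8") instead
# (file name assumed).
sample = io.StringIO(
    'review,sentiment\n'
    '"A fine film, well acted.",positive\n'
    '"Dull, slow and far too long.<br /><br />Skip it.",negative\n'
)
reviews, labels = load_reviews(sample)
print(len(reviews), labels)  # 2 [1, 0]
```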
CHAPTER 3
Results
3.1 Results
Step 1: The main page takes the input text.
Fig. 3.1: Main Page
Step 2: Click Submit to start processing.
Fig. 3.2: Input page
Step 3: After clicking the Submit button, the output is displayed.
Fig. 3.3: Text processing and output
CHAPTER 4
Conclusion and future work
We have gone over several options for transforming text that can improve the accuracy of an NLP model. Which combination of these techniques yields the best results depends on the task, the data representation, and the algorithms chosen. It is always a good idea to try out many different combinations to see what works; a higher accuracy on this data may be attainable with a different combination of the techniques outlined in this project. Future work will explore deep learning approaches to building a sentiment classifier.
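Trying many combinations can be automated. A minimal sketch, assuming scikit-learn's `GridSearchCV` over a TF-IDF + Logistic Regression pipeline: grid search is a swapped-in technique, not something the report describes, and the parameter values and toy data are illustrative assumptions. It compares stop-word removal on/off, unigrams vs unigrams+bigrams, and three regularisation strengths.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Every combination of these settings is cross-validated (2 x 2 x 3 = 12 candidates).
param_grid = {
    "tfidf__stop_words": [None, "english"],
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# Toy data standing in for the training reviews (1 = positive, 0 = negative).
X = ["great moving film", "loved the acting", "brilliant story", "wonderful movie",
     "boring and long", "terrible acting", "dull plot", "awful movie"]
y = [1, 1, 1, 1, 0, 0, 0, 0]

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 2))
```

On the full dataset a larger `cv` (for example 5) would give more reliable estimates, at the cost of more training runs.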