
Classification of Movie Reviews

BEYKOZ UNIVERSITY
NATURAL LANGUAGE
PROCESSING
"CLASSIFYING MOVIE REVIEWS" USING
PYTHON
Name: Heraa Farooq
Synopsis
INTRODUCTION:
Sentiment relates to the meaning of a word or sequence of words and is
usually associated with an opinion or emotion. And analysis? Well, this is
the process of looking at data and making inferences; in this case, using
machine learning to learn and predict whether a movie review is positive
or negative.
People often decide whether to watch a movie by checking whether its
reviews are positive or negative.
Sentiment analysis is a commonly used NLP (natural language processing)
technique to determine whether a text is positive, negative, or neutral. It
has frequently been used to gauge customer satisfaction based on review
sentiment, or to serve as an additional perspective when analyzing
text data.
DATASET:
The dataset we will use for sentiment analysis is the Internet Movie
Database (IMDb) review set: 50,000 reviews of movies from all over the
world. It is a binary classification dataset that labels each review as
positive or negative. It has 25,000 samples for training and 25,000 for
testing.
PREPARATION OF DATA:
1. Clean and Preprocess
The raw text of these reviews is pretty messy, so before we can do any
analytics we need to clean things up.
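As a rough illustration of what such cleaning might look like, the sketch below uses only the Python standard library. It assumes the reviews contain HTML line breaks and entities (common in scraped review text) and that lowercasing and dropping punctuation is acceptable for this task; the function name clean_review and the sample sentence are made up for the example.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")            # HTML tags such as <br />
NON_LETTER_RE = re.compile(r"[^a-z\s']")   # anything except letters, whitespace, apostrophes
WHITESPACE_RE = re.compile(r"\s+")         # runs of whitespace


def clean_review(raw: str) -> str:
    """Return a lowercased review with tags, entities, and punctuation removed."""
    text = html.unescape(raw)              # turn entities like &amp; into characters
    text = TAG_RE.sub(" ", text)           # replace tags with a space
    text = text.lower()
    text = NON_LETTER_RE.sub(" ", text)    # drop digits and punctuation
    return WHITESPACE_RE.sub(" ", text).strip()


# Hypothetical sample, not taken from the dataset.
sample = "This movie was GREAT!<br /><br />I didn't expect that &amp; loved it."
print(clean_review(sample))
```

Whether to keep digits, punctuation, or capitalization is a design choice; some classifiers benefit from them, so this is one option rather than the required preprocessing.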
2. Vectorization
In order for this data to make sense to our machine learning algorithm
we’ll need to convert each review to a numeric representation, which we
call vectorization.
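One simple way to do this is a bag-of-words count vector, where each position holds how often a vocabulary word occurs in the review. The sketch below is a minimal pure-Python version under that assumption; the synopsis does not say which vectorization scheme the project uses, and the helper names and toy reviews are invented for illustration.

```python
from collections import Counter


def build_vocabulary(reviews):
    """Map each distinct word to a column index, in alphabetical order."""
    words = sorted({word for review in reviews for word in review.split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(review, vocabulary):
    """Return a list of word counts, one entry per vocabulary word."""
    counts = Counter(review.split())
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:             # ignore words unseen when building the vocabulary
            vector[vocabulary[word]] = count
    return vector


# Toy reviews, assumed to be already cleaned and lowercased.
reviews = ["good movie good acting", "bad movie"]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(review, vocabulary) for review in reviews]
print(vocabulary)   # {'acting': 0, 'bad': 1, 'good': 2, 'movie': 3}
print(vectors)      # [[1, 0, 2, 1], [0, 1, 0, 1]]
```

With a real vocabulary of tens of thousands of words, a sparse representation would be more practical than plain lists.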
3. Tokenizing
Tokenization is the process of breaking down chunks of text into smaller
pieces. spaCy comes with a default processing pipeline that begins with
tokenization, making this process a snap. In spaCy, you can do either
sentence tokenization or word tokenization:
- Word tokenization breaks text down into individual words.
- Sentence tokenization breaks text down into individual sentences.
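To make the two kinds of tokenization concrete, here is a small regex-based sketch using only the standard library. It is a simplified stand-in, not spaCy's tokenizer: it splits sentences after ".", "!" or "?" followed by whitespace, so abbreviations such as "Dr." would be split incorrectly. The function names and the sample text are assumptions made for the example.

```python
import re

# A word (optionally with an apostrophe part, e.g. "didn't") or a single punctuation mark.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]")
# Whitespace that follows sentence-ending punctuation.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Break text into sentences at ., ! or ? followed by whitespace."""
    return [part for part in SENTENCE_END_RE.split(text.strip()) if part]


def split_words(text):
    """Break text into word and punctuation tokens."""
    return WORD_RE.findall(text)


# Hypothetical sample text.
text = "The plot was thin. Still, I didn't mind! Would you?"
sentences = split_sentences(text)
print(sentences)
print(split_words(sentences[1]))
```

A library tokenizer handles many more edge cases (abbreviations, URLs, emoticons), which is the main reason to prefer one for the real reviews.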
Tools Required:
spaCy, NLTK