BANGLA FAKE NEWS DETECTION USING MACHINE LEARNING, DEEP NEURAL NETWORK AND TRANSFORMER MODELS
by
Anower Hossen Zihad
18102008
BACHELOR OF SCIENCE
IN
COMPUTER SCIENCE AND ENGINEERING
Chittagong Independent University
Chittagong, Bangladesh
November, 2022
This project titled, “Bangla Fake News Detection using Machine Learning, Deep
Neural Network and Transformer models”, submitted by Anower Hossen Zihad,
Roll No.: 18102008, Session: September 2018, has been accepted as satisfactory in
partial fulfillment of the requirement for the degree of BACHELOR OF SCIENCE in
Computer Science and Engineering on 10th November, 2022.
BOARD OF EXAMINERS
Risul Islam Rasel
Assistant Professor
SSE
HOD(CSE)
(Supervisor)
Candidate’s Declaration
This is to certify that the work presented in this thesis entitled, “Bangla Fake News Detection using Machine Learning, Deep Neural Network and Transformer models”, is the
outcome of the research carried out by Anower Hossen Zihad under the supervision of
Risul Islam Rasel, Assistant Professor, School of Science and Engineering, Chittagong
Independent University, Chittagong-4000, Bangladesh.
It is also declared that neither this thesis nor any part thereof has been submitted anywhere else for the award of any degree, diploma, or other qualifications.
Signature of the Candidate
Anower Hossen Zihad
18102008
Dedication
This study is dedicated to my parents for always believing in me.
Contents

Certification
Candidate's Declaration
Dedication
List of Figures
List of Tables
Acknowledgement
Abstract

1 Introduction
  1.1 Problem Statement
  1.2 Motivation of the Study
  1.3 Scope of the Study
  1.4 Objectives of the Project

2 Literature Review

3 Data Description and Feature Selection
  3.1 Data Collection and Merging
    3.1.1 Primary collection of fake news [19]
    3.1.2 BanFakeNews [20]
    3.1.3 Bangla Fake-Real News Small Dataset [6]
  3.2 Data Pre-Processing
    3.2.1 Stop Words Removal
    3.2.2 Punctuation Removal
    3.2.3 Numeric Digits Removal
    3.2.4 Special Character/Emoticons Removal
  3.3 Feature Extraction
    3.3.1 Count Vectorizer
    3.3.2 TF-IDF

4 Methodology
  4.1 Machine Learning Classifier Algorithms
    4.1.1 Multinomial Naive Bayes
    4.1.2 Logistic Regression
    4.1.3 K-Nearest Neighbors
    4.1.4 Support Vector Machine
    4.1.5 Decision Tree
    4.1.6 AdaBoost
  4.2 Deep Neural Network Classifiers
    4.2.1 Convolutional Neural Network
    4.2.2 Long Short-Term Memory network
    4.2.3 Bidirectional LSTM
    4.2.4 CNN-LSTM
    4.2.5 CNN-BiLSTM
  4.3 Transformer Classifiers
    4.3.1 BERT

5 Experimental Results
  5.1 Result on Final Dataset
  5.2 Recall and Accuracy Comparison on Separate Datasets
  5.3 Comparison of the Proposed Models with Previous Best Performing Studies
  5.4 ROC curves and AUC of proposed models
    5.4.1 ROC curves and AUC of ML classifiers
    5.4.2 ROC curves and AUC of DNN classifiers
    5.4.3 ROC curves and AUC of Transformer models
  5.5 Developing a Web App and Integrating the model

6 Conclusions
  6.1 Conclusions
  6.2 Future Prospects of Our Work

References
List of Figures

3.1 Sample Pre-processing
4.1 Architecture of the proposed models
4.2 K-Nearest Neighbors
4.3 Support Vector Machine
4.4 Decision Tree
4.5 Decision Tree
4.6 Convolutional Neural Network
4.7 Long Short-Term Memory unit
4.8 Bidirectional LSTM
4.9 Architecture of CNN-LSTM model
4.10 Architecture of CNN-BiLSTM model
4.11 The Transformer – Model Architecture
5.1 ROC and AUC- ML Classifiers
5.2 ROC and AUC- DNN Classifiers
5.3 ROC and AUC- Bangla-BERT
5.4 Abstract workflow of the app
5.5 Example of real news prediction in web app
5.6 Example of fake news prediction in web app
List of Tables

3.1 Dataset description
5.1 Classification report of all the models
5.2 Performance of the ML and DNN models on separate datasets
5.3 Comparison with previous two benchmark scores
Acknowledgement
I want to start by thanking Allah for his grace, which allowed me to complete my
Bachelor’s degree without experiencing major setbacks.
I also want to express my gratitude to my project supervisor, Mr. Risul Islam Rasel. All along the journey, he served as my mentor. Without the assistance of my experienced supervisor, it would have been challenging to prepare a solid research paper. He made himself available whenever I needed help and guided me all the way from gathering data to creating classifier models and then, finally, writing a research paper.
Abstract
News categorization is one of the primary applications of text classification. In this era of virtualization, ensuring factual information circulation is more necessary than ever. Even though there are hundreds of studies in this field in resource-rich languages, there has not been much significant work in Bangla, the reasons being a lack of resources and limited language processing tools. For this study, a portion of fake news data is collected and then merged with the publicly available datasets of Bangla fake news. The final dataset has 4678 news items in total, of which half are real and the other half fake. The dataset is experimented on with multiple Machine Learning (LR, SVM, KNN, MNB, AdaBoost and DT), Deep Neural Network (LSTM, BiLSTM, CNN, LSTM-CNN, BiLSTM-CNN) and Transformer (Bangla-BERT, m-BERT) models to attain some state-of-the-art results. The three best performing models are CNN, CNN-LSTM, and BiLSTM with 95.9%, 95.5%, and 95.3% accuracy, respectively. All of the developed ML, DNN and Transformer models are also applied to the previously existing datasets separately, and a 1.4% to 3.8% improvement in accuracy is observed. Besides accuracy, the models show a great increase in recall on fake news data compared to the previous studies.
Chapter 1
Introduction
Fake news classification is the task of identifying real and fake news given a set of news items as input. Because of the internet, e-papers, online news portals, and social media have become substitutes for newspapers in many ways. The flow of information has never been this easy. This free flow of information, however, comes at the cost of misinformation circulating from many sources. The circulation of fake news, ill-intended agendas, or distasteful satire creates confusion, conflict, and unrest among the masses. In the worst cases, it can lead to violence. We have witnessed communal violence fueled by fake news on social media many times before, from Ramu [1] to Bhola [2]. There are many other instances [3] of violence and unrest sparked by fake news on social media or online news portals. Therefore, monitoring virtual space and filtering out fake news is a necessity.
1.1 Problem Statement
Bangla Fake News Detection using Machine Learning, Deep Neural Network and Transformer models
1.2 Motivation of the Study
Text classification has been a hot topic in Natural Language Processing (NLP) for a while. Over the past few decades, a variety of statistical and machine learning techniques have been used to extract significant features and correctly classify textual data. Bag of Words, TF-IDF and word embeddings (Word2Vec) are examples of such feature extraction techniques. These features have been used to train supervised learning models like LR, SVM, or Naive Bayes to categorize documents. Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) are two examples of deep learning algorithms that have been used to extract useful features from unstructured textual data to categorize documents. Though fewer in number, there have been some good works in Bangla language processing as well. However, modern Deep Learning and Transformer models are still not being taken full advantage of, mainly because of the scarcity of collected data and the limitations of language processing tools. In the fake news classification category, there have been a handful of works using ML models but no mentionable work with DL and Transformer technology. Moreover, previous works in this field were done using imbalanced datasets and a limited collection of news.
1.3 Scope of the Study
Whilst expanding the news collection has been a challenge, a primary dataset is added to the previously available datasets of real and fake Bangla news. The final dataset is then balanced so that there is the same number of samples for each label. In this work, I experiment with traditional ML models and DNN models, with and without ensembling. Finally, the transformers mBERT and Bangla-BERT are applied to attain some interesting results. Comparative results of all these models are presented to determine the best performing model. Six ML, five DL, and two Transformer models are used to train on and predict fake news. Then the best performing model is integrated into a web app that can predict whether given news data is real or fake.
1.4 Objectives of the Project
The main contributions of this study can be summarized as follows:
• Expand previously available Bangla real and fake news datasets by adding collected fake news and merging all of them together.
• Build ML, DNN, and Transformer models to categorize Bangla real and fake
news.
• Present a comparative analysis of the results of the developed models on the final dataset, and compare the performance of the models on separate datasets to find out whether the proposed models have better prediction accuracy than previously proposed models.
• Integrate the best performing model into a web app that can predict whether given news data is real or fake.
Chapter 2
Literature Review
There are some works on Bangla fake news classification with ML models, but DNN and Transformer models have not been utilised to their full potential yet. The works related to this study are featured in the table below.
| Title of the paper | Field of text classification | No. of classes | Dataset | Algorithms | Result | Limitations |
|---|---|---|---|---|---|---|
| Detection of Bangla Fake News using MNB and SVM Classifier | Fake news detection | 2 | Fake news = 993, Real news = 1548 | SVM; MNB | 96.67%; 93% | i. Limited data; ii. Imbalanced dataset; iii. Duplicate data |
| Using Social Networks to Detect Malicious Bangla Text Content | Spam detection | 2 | Spam = 646, Ham = 1319 | MNB | 82.42% | i. Very limited data; ii. Only one ML model employed |
| Hateful Speech Detection in Public Facebook Pages for the Bengali Language | Hate speech detection | 2 | Hate speech = 3127, Non-hateful comments = 1999 | SVC; NB; RF; AB; GRU-based | 48%; 24%; 52%; 39%; 70% | i. Underwhelming performance |
| Evaluating Machine Learning Algorithms For Bengali Fake News Detection | Fake news detection | 2 | Real news = 239, Fake news = 239 | SVM; LR; RF; VEC; GNB | 57.32%; 78.6%; 61.1%; 76.3%; 84.4% | i. Very limited data; ii. Only news headlines; iii. Only ML models |
| FakeDetect: Bangla Fake News Detection Model based on Different Machine Learning Classifiers | Fake news detection | 2 | Real news = 49.5k, Fake news = 2.3k | PAC; MNB; SVM; LR; DT; RF | 93.8%; 86.9%; 93.5%; 92.5%; 86.8%; 93.2% | i. Imbalanced dataset; ii. Duplicate data |
| Bengali Fake News Detection | Fake news detection | 2 | Total = 726 | LR | 82.90% | i. Limited data; ii. No clear dataset description |
| A Study towards Bangla Fake News Detection Using Machine Learning and Deep Learning | Fake news detection | 2 | Real news = 55k, Fake news = 7k | SVM; GRU; CNN; BiLSTM+FastText | 91%; 70.1%; 96%; 96% | i. Imbalanced dataset; ii. Duplicate data; iii. Reusing different versions of the same dataset |
| Bengali text document categorization based on very deep convolution neural network | Text categorization | 13 | 969,000 text documents of 13 categories | GloVe+CNN+LSTM; GloVe+VDCNN; m-BERT | 76.96%; 95.2%; 92.45% | No mentionable limitation |
| Automatic Detection of Satire in Bangla Documents: A CNN Approach Based on Hybrid Feature Extraction Model | Satire detection | 2 | Satire news = 1480, Real news = 1480 | CNN | 96% | i. Limited data |
| Bangla Fake News Detection Based On Multichannel Combined CNN-LSTM | Fake news detection | 2 | Real news = 48k, Fake news = 1.3k | CNN-LSTM | 75.05% | i. Imbalanced dataset; ii. Duplicate data |
| Bangla Text Classification using Transformers | News categorization | 6 | 14106 news of 6 categories | BERT-base; XLM-RoBERTa-base; XLM-RoBERTa-large | 91.28%; 92.7%; 93.4% | No mentionable limitation |
| Abusive Bangla comments detection on Facebook using transformer-based deep learning models | Spam detection | 2 | 44001 Facebook comments in total | BERT; ELECTRA | 85%; 84.92% | i. Data split unknown |
| Bangla-BERT: Transformer-Based Efficient Model for Transfer Learning and Language Understanding | Fake news detection | 2 | Real news = 48k, Fake news = 1.3k | Bangla-BERT | Accuracy = 99.2%; Recall = 94.2% | i. Imbalanced dataset; ii. Duplicate data; iii. Poor recall |
Chapter 3
Data Description and Feature Selection
3.1 Data Collection and Merging
Real news data are abundant, but the scarcity of publicly available Bangla fake news collections makes the work challenging. Moreover, collecting Bangla fake news has not been easy, as all the likely fake news sources are blocked by the government. I had to use the Internet Archive (https://archive.org/) to go back in the timeline and collect fake news data. Then I combined the dataset I created with the two publicly available datasets of Bangla news, both of which contain labeled real and fake news.
3.1.1 Primary collection of fake news [19]:
I have collected 500 Bangla fake news items via the Internet Archive. These news items are mostly from the domains ChannelDhaka, Earki, and Motikontho. These websites contain fake news of two sorts: misinformation and satire.
3.1.2 BanFakeNews [20]:
This dataset contains 48000 real news items from popular Bangla newspapers like Kalerkantho, Prothom Alo, etc., and 1300 fake news items from different news portals and e-papers such as Motikontho, Banglabeats, Bangaliviralnews, Shadhinbangla24, Prothombhor and more. However, among the 1300 fake news items, there are 123 duplicate entries of the same news, which are removed.
3.1.3 Bangla Fake-Real News Small Dataset [6]:
This dataset has an additional 993 fake news items and around 1500 real news items from sources similar to those of the other datasets. This dataset contained 45 duplicate fake news items as well. Lastly, after merging all the datasets, another duplicate check is performed to remove redundancy among the different datasets. The final dataset contains 2339 fake news items. The number of real news items is also adjusted to match the number of fake ones. From the final dataset, 25% of the data is kept aside for testing and the other 75% is used in training. The final dataset has the following structure:
| Category | Training data size | Test data size | Number of attributes | Ratio of training and test data |
|---|---|---|---|---|
| Real News | 1754 | 585 | 3 | 75 : 25 |
| Fake News | 1754 | 585 | | |

Table 3.1: Dataset description
3.2 Data Pre-Processing:
A necessary step when working with text in Natural Language Processing (NLP) is text cleaning or text pre-processing. Noisy text data needs to be cleaned before being fed into a machine learning model, since real-life human-written text often contains misspelled words, short words, special symbols, emojis, etc. Text pre-processing also ensures better results from the classifiers in most cases. There might be unnecessary symbols, emoticons, or stopwords in texts which do not help the models in classification and instead increase noise. To reduce the noise, I removed columns that are unrelated to the study as well as null values. All kinds of punctuation marks such as !, ', and . and special characters like %, (, <, etc. are removed from the texts as well. Stopwords are removed using the BNLP toolkit's collection of Bangla stopwords. Unfortunately, I could not find a stemmer for the Bangla language that works properly, so no stemmer is used on the dataset.
Figure 3.1: Sample Pre-processing
3.2.1 Stop Words Removal
One of the preprocessing techniques most frequently utilized across many NLP applications is stop word removal. The simple idea is to exclude words that appear frequently throughout all of the corpus's documents. Pronouns and articles are typically categorized as stop words. These words are not highly discriminative because they carry little relevance in NLP tasks like information retrieval and classification. I used the BNLP toolkit's Bangla stopwords collection to eliminate these frequently used but not very polarising words.
3.2.2 Punctuation Removal
The removal of punctuation marks, which are used to separate text into sentences, paragraphs, and phrases, is an essential NLP preprocessing step. Since punctuation marks are frequently used in text, their removal has an impact on the outcome of any text processing approach, particularly those that depend on word and phrase occurrence frequencies.
3.2.3 Numeric Digits Removal
Depending on the use case, numbers occasionally don't contain any essential information in the text. A study of fake news detection falls into one of those cases. Therefore, getting rid of them is preferable to keeping them.
3.2.4 Special Character/Emoticons Removal
Non-alphanumeric characters are known as special characters. The most common places to find these characters are in comments, references, currency figures, etc. These characters introduce noise into algorithms and don't improve text comprehension. However, these characters and symbols can be removed using regular expressions (regex). Any possible use of emoticons is also eliminated from the texts in a similar manner.
3.3 Feature Extraction:
The feature extraction process is carried out on the input texts for the Machine Learning classifier algorithms. Count Vectorizer and TF-IDF Vectorizer (Term Frequency-Inverse Document Frequency) are used to extract features and represent them as numerical values. Count Vectorizer converts a given text into a vector based on the number of times each word appears across the full text. A vocabulary of 86795 feature words is created from the unique words in the corpus. TF-IDF, short for Term Frequency-Inverse Document Frequency, is a metric that quantifies the significance of a word in a corpus or collection of documents. The 86k words are used to extract unigram features.
3.3.1 Count Vectorizer
Machines do not understand characters and words. So, in order for a machine to understand text data, it must be represented in numerical form. Text can be transformed into numerical data with the CountVectorizer tool. Thanks to CountVectorizer, text data can be used directly in machine learning and deep learning tasks, including text categorization.
3.3.2 TF-IDF
In information retrieval, tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval searches, text mining, and user modeling. The tf–idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general.
Term frequency, tf(t,d), is the relative frequency of term t within document d,
\[ \mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}} \tag{3.1} \]

where $f_{t,d}$ is the raw count of a term in a document, i.e., the number of times that term $t$ occurs in document $d$. Note the denominator is simply the total number of terms in document $d$ (counting each occurrence of the same term separately).
The inverse document frequency, idf is a measure of how much information the word
provides, i.e., if it is common or rare across all documents. It is the logarithmically
scaled inverse fraction of the documents that contain the word (obtained by dividing the
total number of documents by the number of documents containing the term, and then
taking the logarithm of that quotient):
\[ \mathrm{idf}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|} \tag{3.2} \]
Then tf–idf is calculated as

\[ \mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D) \tag{3.3} \]
A high weight in tf–idf is reached by a high term frequency (in the given document) and a low document frequency of the term in the whole collection of documents.
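In practice, both feature extractors are available ready-made in scikit-learn. A minimal sketch follows; note that scikit-learn's TfidfVectorizer applies a smoothed variant of Eq. (3.2) by default, and `corpus` is a placeholder for the pre-processed texts:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["...", "..."]  # pre-processed Bangla news texts (placeholder)

# Bag-of-words unigram counts; in this study the vocabulary had 86795 feature words
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(corpus)

# TF-IDF weights, tf(t, d) * idf(t, D) as in Eqs. (3.1)-(3.3)
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(corpus)
```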
Chapter 4
Methodology
Figure 4.1: Architecture of the proposed models
4.1 Machine Learning Classifier Algorithms:
Six common Machine Learning classifiers, namely Multinomial Naive Bayes (MNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT) and AdaBoost, are applied to measure performance on the news dataset. The parameters are examined and tuned to get optimized results. alpha=1.0 is set for MNB. LR uses the lbfgs solver with default penalty and C values. KNN shows its best result when n_neighbors is set to 12. SVM uses the rbf kernel and random state 7. DT operates with the gini criterion to measure the quality of a split. AdaBoost performs best when n_estimators is set to 100.
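A minimal sketch of this setup with scikit-learn, assuming `X_train`, `y_train`, `X_test`, and `y_test` hold the vectorized features and labels from Chapter 3:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Hyperparameters as described above
models = {
    "MNB": MultinomialNB(alpha=1.0),
    "LR": LogisticRegression(solver="lbfgs"),        # default penalty and C
    "KNN": KNeighborsClassifier(n_neighbors=12),
    "SVM": SVC(kernel="rbf", random_state=7),
    "DT": DecisionTreeClassifier(criterion="gini"),
    "AdaBoost": AdaBoostClassifier(n_estimators=100),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))         # test-set accuracy
```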
4.1.1 Multinomial Naive Bayes
With a multinomial event model, samples (feature vectors) represent the frequencies with which certain events have been generated by a multinomial distribution $(p_1, \ldots, p_n)$, where $p_i$ is the probability that event $i$ occurs (or $K$ such multinomials in the multiclass case). A feature vector $x = (x_1, \ldots, x_n)$ is then a histogram, with $x_i$ counting the number of times event $i$ was observed in a particular instance. This event model represents the occurrence of a word in a single document and is frequently used for document classification. The likelihood of observing a histogram $x$ is given by

\[ p(x \mid C_k) = \frac{\left(\sum_{i=1}^{n} x_i\right)!}{\prod_{i=1}^{n} x_i!} \prod_{i=1}^{n} p_{ki}^{x_i} \tag{4.1} \]
If a given class and feature value never occur together in the training data, then the
frequency-based probability estimate will be zero, because the probability estimate is
directly proportional to the number of occurrences of a feature’s value. This is problematic because it will wipe out all information in the other probabilities when they
are multiplied. Therefore, it is often desirable to incorporate a small-sample correction,
called pseudocount, in all probability estimates such that no probability is ever set to be
exactly zero. This way of regularizing naive Bayes is called Laplace smoothing when
the pseudocount is one, and Lidstone smoothing in the general case.
4.1.2 Logistic Regression
Predictive analytics and categorization frequently make use of this kind of statistical model, also referred to as a logit model. Based on a given dataset of independent variables, logistic regression calculates the likelihood that an event will occur, such as rain versus no rain. Given that the result is a probability, the dependent variable's range is 0 to 1. In logistic regression, the odds, that is, the probability of success divided by the probability of failure, are transformed using the logit formula. The logistic function, also frequently referred to as the log odds or the natural logarithm of odds, is expressed by the following formula:

\[ p(x) = \frac{1}{1 + e^{-(x-\mu)/s}} \tag{4.2} \]

4.1.3 K-Nearest Neighbors
K-Nearest Neighbors (KNN) is an example of a supervised learning method used for both classification and regression. To predict the proper class for the test data, KNN calculates the distance between the test data and all of the training points. Then, the K points most similar to the test data are selected. The KNN algorithm estimates the likelihood that the test data belongs to each of the K training data classes, and the class with the highest likelihood is chosen. The ideal choice of K depends on the data; typically, higher values of K lessen the impact of noise on classification but blur the boundaries between classes. A good K can be chosen using a variety of heuristic methods.
Figure 4.2: K-Nearest Neighbors
4.1.4 Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning approach used for both classification and regression, although classification is where it fits best. The goal of the SVM method is to find a hyperplane in an N-dimensional space that clearly separates the data points. The number of features determines the hyperplane's dimension. The hyperplane is essentially a line if there are just two input features. The hyperplane turns into a 2-D plane if there are three input features. Imagining something with more than three features gets challenging.
Figure 4.3: Support Vector Machine
4.1.5 Decision Tree
A decision tree is a decision support tool that employs a tree-like model to illustrate options and their potential outcomes, including utility, resource costs, and chance occurrences. It is also one way to display an algorithm that solely uses conditional control statements.
Although they are also a common technique in machine learning, decision trees are
frequently employed in operations research, notably in decision analysis, to help find a
plan that is most likely to succeed. A decision tree is a flowchart-like structure in which
each internal node represents a ”test” on an attribute (e.g. whether a coin flip comes
up heads or tails), each branch represents the outcome of the test, and each leaf node
represents a class label (decision taken after computing all attributes). The paths from
root to leaf represent classification rules.
In decision analysis, a decision tree and the closely related influence diagram are used
as a visual and analytical decision support tool, where the expected values (or expected
utility) of competing alternatives are calculated.
Figure 4.4: Decision Tree
4.1.6 AdaBoost
Adaptive Boosting, often known as AdaBoost, is a statistical classification meta-algorithm. Performance can be enhanced by combining it with a variety of other learning methods. The results of the other learning algorithms, or "weak learners," are merged into a weighted sum that represents the boosted classifier's final result. Although AdaBoost can be applied to multiple classes or bounded intervals on the real line, it is most often seen in binary classification. AdaBoost is adaptive in that it modifies future weak learners in favor of instances that prior classifiers incorrectly classified. It may be less prone to overfitting than other learning algorithms in particular situations. It can be demonstrated that the final model converges to a strong learner even if the performance of each individual learner is only marginally better than random guessing.
Figure 4.5: Decision Tree
4.2 Deep Neural Network Classifiers:
Different Deep Neural Networks (DNN) such as the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and combinations of CNN with LSTM (CNN-LSTM) and CNN with BiLSTM (CNN-BiLSTM) are used to conduct experiments on our data. Before the data is fed to the DNN models, it is transformed into numerical values with the help of one-hot encoding. The maximum news length is set to 300 words, the vocabulary size to 86000, and pre-padding is used while encoding. All the models use the adam optimizer with the default learning rate of 0.001. These models are trained for 12 epochs with a batch size of 64. Early stopping (ES) is incorporated into the models to avoid overfitting. ES monitors validation accuracy with a min_delta of 0 and patience of 3.
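A minimal Keras sketch of this shared setup; `corpus` and `labels` are placeholder names for the cleaned texts and their real/fake labels, and the validation split is an assumption:

```python
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.callbacks import EarlyStopping

VOCAB_SIZE, MAX_LEN = 86000, 300

# One-hot encode each news item into word indices, then pre-pad to 300 words
encoded = [one_hot(text, VOCAB_SIZE) for text in corpus]
padded = pad_sequences(encoded, maxlen=MAX_LEN, padding="pre")

# Stop training when validation accuracy stops improving (min_delta=0, patience=3)
es = EarlyStopping(monitor="val_accuracy", min_delta=0, patience=3)
# model.fit(padded, labels, epochs=12, batch_size=64,
#           validation_split=0.1, callbacks=[es])  # validation split assumed
```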
4.2.1 Convolutional Neural Network
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning method that can take in an input image and assign importance (learnable weights and biases) to different aspects and objects in the image whilst also being able to distinguish between them. CNNs are transformed multilayer perceptrons. Fully connected networks, or multilayer perceptrons, are those in which every neuron in one layer is connected to every neuron in the following layer. Due to their "full connectivity," these networks are vulnerable to overfitting. Regularization and overfitting-prevention methods frequently involve penalizing training parameters (such as weight decay) or cutting connectivity (skipped connections, dropout, etc.). By utilizing the hierarchical structure in the data and assembling patterns of increasing complexity from smaller and simpler patterns embedded in their filters, CNNs adopt a different strategy for regularization. CNNs are therefore at the lower end of the connectivity and complexity spectrum.
Figure 4.6: Convolutional Neural Network
A three-layer Convolutional Neural Network (CNN) is used with three different kernel sizes: 4, 6, and 8. These layers use 32 filters each. A dropout layer with a 0.5 rate is introduced to avoid overfitting. Downsampling of the features is done by a max-pooling layer with pool size 2. The relu activation function adds non-linearity, and the probability distribution of the classes is calculated in the final step using an output layer with sigmoid activation.
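The exact wiring of the three convolutions is not spelled out here; one plausible reading stacks them sequentially, as in the following hedged Keras sketch (the 40-dimensional embedding is assumed to match the LSTM setup of Section 4.2.2):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, MaxPooling1D,
                                     GlobalMaxPooling1D, Dropout, Dense)

model = Sequential([
    Embedding(86000, 40, input_length=300),
    Conv1D(32, 4, activation="relu"),   # 32 filters, kernel size 4
    MaxPooling1D(pool_size=2),
    Conv1D(32, 6, activation="relu"),   # kernel size 6
    MaxPooling1D(pool_size=2),
    Conv1D(32, 8, activation="relu"),   # kernel size 8
    GlobalMaxPooling1D(),
    Dropout(0.5),                       # dropout rate 0.5 against overfitting
    Dense(1, activation="sigmoid"),     # binary real/fake output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```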
4.2.2 Long Short-Term Memory network

The Long Short-Term Memory network (LSTM) is a special kind of Recurrent Neural Network (RNN) capable of learning long-term dependencies. LSTM features feedback connections, as opposed to typical feedforward neural networks. Such a recurrent neural network can analyze whole data sequences (such as speech or video) in addition to single data points (such as images).
A cell, an input gate, an output gate, and a forget gate make up a standard LSTM unit.
The three gates control the flow of data into and out of the cell, and the cell remembers
values across arbitrary time intervals.
Since there may be lags of uncertain length between significant occurrences in a time
series, LSTM networks are well-suited to classifying, processing, and making predictions based on time series data.
Figure 4.7: Long Short-Term Memory unit
In this work, the embedding layer has 40 embedding vector features with a max input length of 300 words. I apply 100 LSTM hidden units, the default dropout of 0, and finally a sigmoid activation function.
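A minimal sketch of that configuration in Keras:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(86000, 40, input_length=300),  # 40 embedding features, 300-word inputs
    LSTM(100),                               # 100 hidden units, default dropout of 0
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```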
4.2.3 Bidirectional LSTM
Bidirectional LSTM (BiLSTM) is a sequence processing model made up of two LSTMs, one of which processes data in the forward direction and the other in the backward direction. It can use data from both sides and, unlike standard LSTM, the input flows in both directions. It is an effective tool for modeling the sequential dependencies between words and phrases in both directions of the sequence.
Figure 4.8: Bidirectional LSTM
In short, BiLSTM reverses the direction of information flow by adding one more LSTM layer. It simply means that in the additional LSTM layer, the input sequence flows backward. The outputs from the two LSTM layers are then combined using one of a variety of methods, including average, sum, multiplication, and concatenation.
The implementation of the BiLSTM model is quite similar to the LSTM model of this study, with the same parameter values.
4.2.4 CNN-LSTM
The embedding layer is followed by a 1D convolutional layer with 32 filters of size three and a 1D max-pool layer of pool size 2. It uses the relu activation function to add non-linearity. After that, an LSTM layer of 100 units is added with the sigmoid activation function.
Figure 4.9: Architecture of CNN-LSTM model
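A hedged Keras sketch of this stack; the CNN-BiLSTM variant of the next subsection would wrap the LSTM layer in `Bidirectional(...)`:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense

model = Sequential([
    Embedding(86000, 40, input_length=300),
    Conv1D(32, 3, activation="relu"),   # 32 filters of size three
    MaxPooling1D(pool_size=2),          # 1D max-pool of pool size 2
    LSTM(100),                          # 100-unit LSTM head
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```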
4.2.5 CNN-BiLSTM

CNN-BiLSTM follows the CNN-LSTM architecture, with the only difference being that the convolutional layer feeds into a BiLSTM instead of an LSTM.
Figure 4.10: Architecture of CNN-BiLSTM model
4.3 Transformer Classifiers:

The NLP Transformer is a new architecture that aims to solve sequence-to-sequence problems while resolving long-distance dependencies with ease. It relies solely on self-attention to compute the input and output representations, employing neither convolutions nor sequence-aligned RNNs.
Figure 4.11: The Transformer – Model Architecture
In the encoder block, a Multi-Head Attention layer is followed by a Feed-Forward Neural Network layer. The decoder, on the other hand, features an additional Masked Multi-Head Attention layer. Several identical encoders and decoders are stacked on top of one another to form the encoder and decoder blocks. Both stacks contain the same number of units, and that number is a hyperparameter.
4.3.1 BERT
Google created Bidirectional Encoder Representations from Transformers (BERT), a transformer-based machine learning approach for pre-training natural language processing (NLP) models. BERT is at its core a transformer language model with a variable number of encoder layers and self-attention heads. There are two models of the original English-language BERT: 12 encoders with 12 bidirectional self-attention heads comprise BERT-BASE, while 24 encoders with 16 bidirectional self-attention heads comprise BERT-LARGE.
4.3.1.1 mBERT [21]

The transformer model m-BERT is pre-trained over 104 languages, taking about 110M parameters into account. bert-base-multilingual-cased is fine-tuned on the news dataset with a batch size of 32 for five epochs.
4.3.1.2 Bangla-BERT [18]

Bangla-BERT is a pre-trained language model for the Bengali language built using masked language modeling. I use the sagorsarker/bangla-bert-base model and fine-tune it to obtain the best performance on our dataset. The batch size is set to 8 while training.
Chapter 5
Experimental Results
5.1 Result on Final Dataset:
We can see the performance of all the models of the three approaches (ML, DNN, and Transformer) in Table 5.1.
| Method | Classifier | Pr | Re | f1-score | Acc |
|---|---|---|---|---|---|
| ML models | SVM | 95.2 | 95.2 | 95.2 | 95.2 |
| | LR | 94.1 | 94.1 | 94.1 | 94.1 |
| | MNB | 91.6 | 91.5 | 91.4 | 91.5 |
| | KNN | 88.9 | 88.5 | 88.5 | 88.5 |
| | DT | 85 | 85 | 85 | 85 |
| | AdaBoost | 84.9 | 84.9 | 84.9 | 84.9 |
| DNN models | CNN | 95.9 | 95.9 | 95.9 | 95.9 |
| | LSTM | 94.5 | 94.4 | 94.4 | 94.4 |
| | BiLSTM | 95.3 | 95.3 | 95.3 | 95.3 |
| | CNN+LSTM | 95.5 | 95.5 | 95.5 | 95.5 |
| | CNN+BiLSTM | 94.9 | 94.9 | 94.9 | 94.9 |
| Transformer models | Bangla-BERT-Base | 94.1 | 94.1 | 94.1 | 93.3 |
| | mBERT | 93.9 | 93.9 | 93.9 | 93.8 |

Table 5.1: Classification report of all the models
It is evident that among all the Machine Learning approaches, SVM achieves better scores on Pr (95.2%), Re (95.2%), f1-score (95.2%), and Acc (95.2%) than all other models. LR is the closest to that score, with an f1-score of 94.1%. MNB, KNN, DT, and AdaBoost have f1-scores of 91.5%, 88.5%, 85%, and 84.9%, respectively.
The DNN models show a significant improvement in results. The CNN model has the best accuracy, 95.9%, among all the models. The CNN-LSTM and BiLSTM models are not far behind the CNN model, with accuracies of 95.5% and 95.3% respectively. LSTM has 94.5% and CNN-BiLSTM offers 94.9% accuracy.
Transformer models perform moderately on the fake news dataset. The Bangla-BERT model scores 94.1% whereas mBERT makes it to 93.9%.
5.2 Recall and Accuracy Comparison on Separate Datasets:
I apply the fake news detection models built for this study to the publicly available datasets separately. This helps me find out the efficiency of my models by comparing them with previous benchmark scores.
| Dataset Description | Criterion | MNB | LR | KNN | SVM | DT | AB | LSTM | BiLSTM | CNN | CNN-LSTM | CNN-BiLSTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BanFakeNews [20] (55k real, 2k fake, with duplicates) | Recall (fake news) | 46.6 | 47.8 | 35.9 | 87.6 | 93 | 92.5 | 91.5 | 99.4 | 94 | 90.3 | 92.9 |
| | Accuracy | 95.8 | 97.8 | 97 | 99.4 | 98.9 | 99 | 99.4 | 92.5 | 99.8 | 99.4 | 99.6 |
| Bangla Fake-Real News Small Dataset [6] (1.5k real, 1k fake, with duplicates) | Recall (fake news) | 77.1 | 88.4 | 88.4 | 91.6 | 97 | 96.4 | 98.8 | 96.8 | 92.5 | 95.6 | 99.6 |
| | Accuracy | 95.4 | 95.3 | 96.8 | 98.1 | 98 | 97.5 | 97.5 | 96.2 | 97.6 | 88.5 | 93.3 |
| Final Dataset (2.3k real, 2.3k fake, no duplicates) | Recall (fake news) | 88.2 | 94.1 | 83.3 | 96 | 86 | 85.6 | 93.8 | 94.2 | 96.3 | 93.5 | 94.9 |
| | Accuracy | 91.5 | 94.1 | 88.5 | 95.2 | 85 | 84.9 | 94.4 | 95.3 | 95.9 | 95.5 | 94.9 |

Table 5.2: Performance of the ML and DNN models on separate datasets
From Table 5.2, we can see that the ML classifier models suffer badly in detecting fake news, which is why their recall on fake news is low. Despite that, these ML models still score a high accuracy because of the massively imbalanced datasets. Thus, accuracy does not represent actual fake news detection efficiency.
However, on the BanFakeNews dataset, the DNN models perform well: the BiLSTM model reaches 99.4% recall on fake news and CNN the highest accuracy of 99.8%.
The Bangla Fake-Real News Small Dataset [6] shows a similar pattern in terms of recall, with LSTM scoring 98.8% on fake news recall, whereas Decision Tree surprisingly has the best accuracy of 98.1%.
The best performing model on the Final Dataset, which combines the three datasets BanFakeNews, Bangla Fake-Real News Small Dataset, and Fake500, is CNN with an accuracy of 95.9%. BiLSTM has a recall of 96.3% on fake news detection on that dataset. Due to computational limitations, the BanFakeNews dataset has not been trained with transformer models, so that comparison is kept out of this table.
5.3 Comparison of the Proposed Models with Previous Best Performing Studies:
| Dataset | Model | Classifier | Accuracy |
|---|---|---|---|
| BanFakeNews [20] | Previous best performing model | Bi-LSTM with FastText | 96 |
| | Best performing model of this study | LSTM | 99.8 |
| Bangla Fake-Real Small Dataset [6] | Previous best performing model | SVM | 96.7 |
| | Best performing model of this study | AdaBoost | 98.1 |

Table 5.3: Comparison with previous two benchmark scores
Table 5.3 shows that, on the biggest news collection, BanFakeNews, [12] gains up to 96% accuracy with GloVe and FastText embeddings on top of a BiLSTM model. My proposed LSTM model has a 3.8% improvement in accuracy on that dataset.
The Bangla Fake-Real Small Dataset showed 96.67% accuracy with the SVM model in [6], which is very much consistent with my finding with the SVM model. However, in my experiment, AdaBoost performs better on this dataset with an accuracy of 98.1%, setting a better benchmark for the dataset.
5.4 ROC curves and AUC of proposed models

A Receiver Operating Characteristic (ROC) curve is a graphical representation of the True Positive Rate plotted against the False Positive Rate, which is used to show the diagnostic ability of binary classifiers. The area underneath the ROC curve is called the Area Under the Curve (AUC). The closer the curve is to the top-left corner, the better the performance. AUC scores are added to the legends of the classifiers.
5.4.1 ROC curves and AUC of ML classifiers
SVM and LR classifiers have the best ROC curves among ML algorithms.
Figure 5.1: ROC and AUC- ML Classifiers
5.4.2 ROC curves and AUC of DNN classifiers

All the DNN classifiers follow a very similar pattern, which makes them look like they lie on top of one another.
Figure 5.2: ROC and AUC- DNN Classifiers
5.4.3 ROC curves and AUC of Transformer models
Bangla-BERT has an AUC score of 0.925.
Figure 5.3: ROC and AUC- Bangla-BERT
5.5 Developing a Web App and Integrating the model
From the experimentation, I have found that the CNN model has the best accuracy and f1-score among all models. Therefore, I designed and implemented a web app and integrated this model into it, so that it gives a real-time verdict on whether a news item is real or fake. The app is deployed on Heroku at the following URL: https://bfake.herokuapp.com
The following technologies were used to develop the web app:
• Flask: A lightweight Python web app framework. I used it to build the backend server.
• HTML, CSS, Bootstrap: To create the web page structure and style it.
• Ajax: Used to show the prediction for the given input on the same page without having to redirect/reload.
The internal workflow of the app is given below:
Figure 5.4: Abstract workflow of the app
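A minimal sketch of the Flask backend behind this workflow; the model path and the `preprocess` helper (cleaning, encoding, and padding as in Chapters 3 and 4) are hypothetical names, not the app's actual code:

```python
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("cnn_fake_news.h5")  # hypothetical path to the saved CNN model

@app.route("/predict", methods=["POST"])
def predict():
    text = request.form["news"]            # raw news text sent by the Ajax call
    x = preprocess(text)                   # hypothetical helper: clean, encode, pad
    score = float(model.predict(x)[0][0])  # sigmoid output; closer to 1 means real
    return jsonify({"score": score, "label": "real" if score >= 0.5 else "fake"})
```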
Output Example:
An authentic news item collected from Prothom Alo at the URL https://www.prothomalo.com/world/pakistan/q5tlwy3fqo is given a prediction score of 0.74 (i.e. real news), which is the correct output.
Figure 5.5: Example of real news prediction in web app
A satire from Earki is predicted as fake news with a score of 0.14.
Figure 5.6: Example of fake news prediction in web app
Chapter 6
Conclusions
6.1 Conclusions
This study combines prior research conducted by other researchers and delivers a proper and better corpus of Bangla fake and real news. Previous works in the field of Bangla fake news detection gravely suffer from poor data quality, such as duplicate entries and greatly imbalanced datasets. I remove duplicate data and fix the ratio of fake to real news. Then I merge them with my collection of fake news data. Of the three approaches taken in this study, the DNN models are the most remarkable, with better performance scores. CNN shows the most promising result. Among the ML models, SVM performs exceptionally well, which is also seen in past works. The transformer models show a good outcome as well. My models produce state-of-the-art results on the previously available fake news datasets, with a 1.4% to 3.8% increase in accuracy and much better recall in detecting fake news.
6.2 Future Prospects of Our Work
To improve upon this study, more data can be added to the corpus from sources different from the ones already in it. In the future, I aim to further develop the web app with the best performing model to predict the authenticity of input news.
References
[1] "Buddhist temples, homes burned, looted in Ramu", bdnews24.com. https://bdnews24.com/bangladesh/buddhist-temples-homes-burned-looted-in-ramu
[2] "4 die in Bhola as violence erupts over FB post", theindependentbd.com. https://m.theindependentbd.com/arcprint/details/220317/2019-10-21
[3] "Mobs beat five dead for 'kidnapping'", thedailystar.net. https://www.thedailystar.net/frontpage/news/mobs-beat-2-dead-kidnapping-1774471
[4] Ahmed, Hadeer, et al. ‘Detecting Opinion Spams and Fake News Using Text
Classification’. Security and Privacy, vol. 1, no. 1, Jan. 2018, p. e9. DOI.org
(Crossref), https://doi.org/10.1002/spy2.9.
[5] Umer, Muhammad, et al. ‘Fake News Stance Detection Using Deep Learning
Architecture (CNN-LSTM)’. IEEE Access, vol. 8, 2020, pp. 156695–706. IEEE
Xplore, https://doi.org/10.1109/ACCESS.2020.3019735.
[6] M. G. Hussain, M. Rashidul Hasan, M. Rahman, J. Protim and S. Al Hasan, ”Detection of Bangla Fake News using MNB and SVM Classifier,” 2020 International
Conference on Computing, Electronics & Communications Engineering (iCCECE),
2020, pp. 81-85, doi: 10.1109/iCCECE49321.2020.9231167.
[7] Islam, Tanvirul, et al. ‘Using Social Networks to Detect Malicious Bangla Text
Content’. 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), 2019, pp. 1–4. IEEE Xplore,
https://doi.org/10.1109/ICASERT.2019.8934841.
[8] Ishmam, Alvi; Sharmin, Sadia. (2019). Hateful Speech Detection in Public Facebook Pages for the Bengali Language. 555-560. 10.1109/ICMLA.2019.00104.
[9] Mugdha, Shafayat Bin Shabbir, et al. ‘Evaluating Machine Learning Algorithms For Bengali Fake News Detection’. 2020 23rd International Conference
on Computer and Information Technology (ICCIT), 2020, pp. 1–6. IEEE Xplore,
https://doi.org/10.1109/ICCIT51783.2020.9392662.
[10] F. Islam et al., ”Bengali Fake News Detection,” 2020 IEEE 10th International Conference on Intelligent Systems (IS), 2020, pp. 281-287, doi:
10.1109/IS48319.2020.9199931.
[11] Sraboni, Tasnuba, et al. FakeDetect: Bangla Fake News Detection Model
Based on Different Machine Learning Classifiers. Brac University, 2021.
dspace.bracu.ac.bd, http://dspace.bracu.ac.bd/xmlui/handle/10361/14979.
[12] Hossain, Elias; Kaysar, Md Jalal Uddin; Joy, Abu Zahid; Rahman, Md. Mizanur; Rahman, Md. (2021). A Study Towards Bangla Fake News Detection Using Machine Learning and Deep Learning. 10.1007/978-981-16-5157-1_7.
[13] M. Z. H. George, N. Hossain, M. R. Bhuiyan, A. K. M. Masum and S. Abujar, "Bangla Fake News Detection Based On Multichannel Combined CNN-LSTM," 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2021, pp. 1-5, doi: 10.1109/ICCCNT51525.2021.9580035.
[14] Sen, Arnab; Mridul, Maruf; Islam, Md Saiful. (2019). Automatic Detection of Satire in Bangla Documents: A CNN Approach Based on Hybrid Feature Extraction Model.
[15] Hossain, Md. Rajib, et al. ‘Bengali Text Document Categorization
Based on Very Deep Convolution Neural Network’. Expert Systems
with Applications, vol. 184, Dec. 2021, p. 115394. ScienceDirect,
https://doi.org/10.1016/j.eswa.2021.115394.
[16] Alam, Tanvirul; Khan, Akib; Alam, Firoj. (2020). Bangla Text Classification Using Transformers.
[17] Aurpa, Tanjim Taharat, et al. ‘Abusive Bangla Comments Detection on
Facebook Using Transformer-Based Deep Learning Models’. Social Network
Analysis and Mining, vol. 12, no. 1, Dec. 2021, p. 24. Springer Link,
https://doi.org/10.1007/s13278-021-00852-x.
[18] M. Kowsher, A. A. Sami, N. J. Prottasha, M. S. Arefin, P. K. Dhar and T. Koshiba,
”Bangla-BERT: Transformer-Based Efficient Model for Transfer Learning and
Language Understanding,” in IEEE Access, vol. 10, pp. 91855-91870, 2022, doi:
10.1109/ACCESS.2022.3197662.
[19] Anower Hossen Zihad. 500 Bangla Fake News. Kaggle. DOI.org (Datacite),
https://doi.org/10.34740/KAGGLE/DSV/4222728. Accessed 19 Sept. 2022.
[20] BanFakeNews. https://www.kaggle.com/datasets/cryptexcode/banfakenews. Accessed 19 Sept. 2022.
[21] Devlin, Jacob, et al. BERT: Pre-Training of Deep Bidirectional
Transformers for Language Understanding. 2018. DOI.org (Datacite),
https://doi.org/10.48550/ARXIV.1810.04805.