Cyberbullying Detection NLP Thesis-Proposal

CYBERBULLYING DETECTION WITH NATURAL LANGUAGE PROCESSING ON SOCIAL MEDIA

by Sonwabile Balite
0109285168083

A research proposal submitted in partial fulfilment of the requirements for the bachelor's degree in Computing

The Belgium Campus
Supervisor: Anila Joy
25/05/2022

Table of Contents

1 INTRODUCTION AND BACKGROUND
1.1 Introduction
1.2 Background
2 RESEARCH PROBLEM
3 AIM, OBJECTIVES AND RESEARCH QUESTION
3.1 AIM
3.2 OBJECTIVE
3.3 RESEARCH QUESTIONS
3.3.1 Main Question
3.3.2 Sub Question
4 PRELIMINARY LITERATURE REVIEW
4.1 Theoretical background/conceptual framework
4.2 Related work/studies
5 DESIGN, METHODOLOGY AND ETHICS
5.1 RESEARCH APPROACH
5.2 METHODOLOGICAL CHOICE
5.3 RESEARCH STRATEGY
5.4 RESEARCH DESIGN
5.4.1 Data collection
5.4.2 Population and sample
5.4.3 Experimentation, Evaluation and Interpretation of the results
5.5 TIME HORIZON
5.6 ETHICS
6 DELINEATIONS, TIMELINE, BUDGET AND LIMITATIONS
7 ASSUMPTIONS
8 OUTCOMES, CONTRIBUTION AND SIGNIFICANCE
9 CONCLUSION
References

Research Proposal

1 INTRODUCTION AND BACKGROUND

1.1 INTRODUCTION

The abundance of electronic devices that can connect to the internet has made it possible for bullying to move into the realm of technology, where it is known as cyberbullying. Cyberbullying negatively affects many people every day, especially children and adolescents, and concerns about it are rising. A lot of research is taking place in an attempt to diminish cyberbullying and the effects it has on its victims. Much previous research has examined the impact of cyberbullying on mental health; more recently, attention has turned to recognising cyberbullying in many different languages and in colloquial forms. There are many ways to identify cyberbullying, but the most effective methods include machine learning techniques, particularly natural language processing (NLP).

This research proposal sets out the research that needs to be done on identifying cyberbullying on social media with NLP. It details the research problem, the aim and objectives of the research, existing research on the issue, the design and methodology of the research that will be carried out and, finally, the contribution and significance of this research.

1.2 BACKGROUND

Technology advances and changes rapidly, and it is changing the interactions and relationships we have with people, with communication now a mere mouse click or swipe of a screen away (Weber & William V. Pelfrey, 2014). The platforms and social networks used for such communication include Skype, email, Facebook, Twitter, and Instagram, amongst many others. Platforms like these allow us to share pictures, recorded videos, and aspects of our daily lives, not only with the people in our own circles but with people all over the world. They have become part of everyday life (Sue North & Snyder, 2008). People, teenagers in particular, use social networks to connect with others and to pass time with different forms of entertainment (Boyd, 2009). Cyberbullying and social network use do not affect only one demographic; however, research suggests that adolescents and young adults are more likely to experience cyberbullying because their exposure to social media is often greater than that of other age groups (Ito, et al., 2008). This extended exposure can unfortunately make social networks a space for spreading rumours, social sabotage, and humiliation, that is, cyberbullying (Hinduja & Patchin, 2008).

Bullying in South African schools has in previous years received a lot of media attention (Smit, 2015). However, studies on bullying in the workplace in South Africa are scarce (Wet, 2011). Nonetheless, bullying often results in emotions of "incompetence, alienation and depression" (Roux, et al., 2010); and in schools, research suggests that cyberbullying might bring about "low confidence, family issues, scholarly issues, school brutality and suicidal thoughts" (Goodno, 2011). This shows that a solution that aims to combat cyberbullying is imperative.
One way to achieve this is to implement machine learning techniques, such as natural language processing, to identify cyberbullying on social media, helping to reduce or eliminate it (Dadvar, et al., 2014) (Hee, et al., 2018); hence research is necessary. This research proposal plans the necessary research.

2 RESEARCH PROBLEM

Cyberbullying can be defined as harassment or bullying that takes place over digital devices such as cell phones, computers, and tablets. The media in which cyberbullying is usually found include SMS, email, text messages and direct messages over the internet, forums and gaming conference rooms such as Reddit, and social media sites such as Facebook, Instagram, Snapchat and TikTok (Stop Bullying.Gov, 2021). Cyberbullying can include behaviours such as sending, posting, or sharing destructive and harmful content about someone else (Stop Bullying.Gov, 2021). Behaviours that fall under cyberbullying include (Gordon, 2022):

• Harassment
• Impersonation
• Inappropriate photographs
• Video shaming
• Website creation

For this research proposal, the words “teenager(s)” and “child(ren)” will be used interchangeably. Focus will be placed on teenagers, because they are the demographic that is most exposed to cyberbullying (Amanda Lenhart, 2007).

In South Africa, on average, a staggering 51.5% of teenagers have been cyberbullied, 54% have accessed inappropriate content via digital platforms, 35% have been victims of cyberstalking and 36.5% have fallen victim to online shaming (Pheto, 2021) (News24, 2021). This indicates that more than half of the teenage population in South Africa has experienced cyberbullying. This is an issue because it affects the teenage population of South Africa negatively, causing distress, humiliation, depression, anxiety, and low self-esteem.
They are also affected by the behaviours that usually follow an occurrence of cyberbullying, which include self-harm, poor academic performance, skipping school, substance abuse and suicidal thoughts (Gordon, 2021) (LopezMeneses, et al., 2020). A cyberbullying study conducted in just one South African province, Limpopo, reported that 31% of victimised students were very or extremely upset, 19% were very or extremely scared and 18% were extremely embarrassed after a cyberbullying episode. This shows that the effects of cyberbullying are harmful, and it suggests that if episodes of cyberbullying are not tackled as soon as possible, children might feel desperate enough to resort to self-harm or suicide (Farhangpour, et al., 2019). The effects of cyberbullying are extremely negative and will continue, and even worsen, should nothing be done to deal with it (Navarro, et al., 2016).

Problem Statement: Since the advent of the internet and the distribution of digital devices amongst the youth, traditional bullying has taken an online form called cyberbullying. Teenagers and children are more likely than other age groups to be exposed to cyberbullying because of their frequent exposure to social media, texting applications and other technologies where cyberbullying can take place, and this has an extremely negative effect on their psychological, physical, emotional, and mental well-being.

3 AIM, OBJECTIVES AND RESEARCH QUESTION

3.1 AIM

The aim of the proposed research is to determine whether the detection of cyberbullying on social media with the use of natural language processing is possible and, if so, how it can be implemented.
It is also important to determine, if such detection can be implemented, how it would change the effects of cyberbullying on the psychological, physical, emotional, and mental well-being of teenagers and of victims of cyberbullying in general.

3.2 OBJECTIVE

The main objective of this research is to develop a systematic approach to automatically detect and classify cyberbullying entries on social media, which could help prevent, detect, and ultimately solve the problem of cyberbullying. Sub-objectives include:

• To contribute to the prevention of the cyberbullying problem by transferring and testing the developed cyberbullying detection methods on mobile devices.
• To identify cyberbullying text on social media using natural language processing and to understand its meaning and context.
• To determine the best implementation of natural language processing to detect cyberbullying, to such an extent that it helps to prevent, reduce, or solve the cyberbullying problem.
• To investigate how the detection, prevention, and reduction of cyberbullying on social media affects the psychological, physical, emotional, and mental well-being of the youth and of every other social media user, and to improve their overall satisfaction levels.
• To identify what qualifies as social media and on which platform an ‘automatic cyberbullying detection system’ would work best.
• To determine whether the chosen social media platform(s) on which the automatic cyberbullying detection system will operate will be sufficient to gauge the effectiveness of the system.

3.3 RESEARCH QUESTIONS

3.3.1 MAIN QUESTION

How can a natural language processing system be successfully implemented to detect cyberbullying in its context on social media, in order to prevent, stop or reduce further cyberbullying and its negative effects on social media users?

3.3.2 SUB QUESTION

Sub-questions:

• How can NLP techniques be implemented in a way that effectively detects and comprehends cyberbullying incidents found online in their context?
• Can the successful implementation of an NLP system that detects cyberbullying on social media be effective enough to reduce the high levels of psychological, physical, emotional, and mental health dissatisfaction seen amongst cyberbullying victims and regular social media users?
• Which social media platform(s) experience cyberbullying incidents regularly enough to be appropriate for testing the NLP system?
• Are the users of the chosen social media platform(s) accessible enough that the change in their psychological, physical, emotional, and mental health satisfaction levels can be used as a measure of the effectiveness of the NLP system?
• Is it ethical to use the change in psychological, physical, emotional, and mental health satisfaction levels of cyberbullying victims and regular social media users as a measure of the performance of the NLP system? If not, what is an accurate, stable, and precise measure of performance for the NLP system?
• How will the NLP cyberbullying detection system handle ‘slang’ or colloquial terms that constitute cyberbullying on social media but are not well known amongst other users?
• How can the NLP cyberbullying detection system remain effective when cyberbullying involves multiple languages or language varieties within a single sentence or conversation, also known as ‘code-switching’?

4 PRELIMINARY LITERATURE REVIEW

Cyberbullying research has concentrated on its causes and on the social and psychological effects it can have on a person; only recently has more consideration been given to the automatic detection of cyberbullying statements and incidents on social media (Capua, et al., 2016). The majority of the existing research on automatic cyberbullying detection on social media makes use of conventional machine learning models. These machine learning models and data modelling techniques include deep neural network-based models (Zhang, et al., 2016), support vector machines (Yin, et al., 2009), language-based approaches (Kontostathis, et al., 2013), and data modelling methods such as Random Forest, Naive Bayes and K-Nearest Neighbour, amongst others.

4.1 THEORETICAL BACKGROUND/CONCEPTUAL FRAMEWORK

Section 2 (Research Problem) detailed the current effects and consequences of cyberbullying on regular social media users. These suggest that any measure to stop, prevent or reduce cyberbullying is necessary, including the automatic detection of cyberbullying using machine learning techniques such as natural language processing. However, the existing datasets and the accurate, efficient machine learning models are limited (Alotaibi, et al., 2021). Before this is discussed further, it is necessary to establish a conceptual framework, or theoretical background, of the common machine learning techniques and data processing models.

4.1.1. Datasets and Data Gathering

A variety of strategies and techniques have been used to gather the datasets required by cyberbullying detection models, the majority of which are centred on textual data (Gao & Huang, 2017).
Some datasets include multimedia, such as GIFs, pictures and voice recordings; however, these are far fewer than the datasets containing text (Hosseinmardi, et al., 2015). Gathering such data and processing it so that the cyberbullying model is accurate is an extremely tedious and inefficient process (Nobata, et al., 2016).

4.1.2. Basic Features

This refers to the ‘baseline’ natural language processing methods such as ‘bag of words’, which here relies on a cluster of known words that are usually associated with cyberbullying incidents. These methods are not as basic as they seem. They can be compared to n-gram and token n-gram frequency-based methods (Gao & Huang, 2017).

4.1.3. Sentiment Analysis

Sentiment analysis is a natural language processing technique used to determine whether data is positive, negative, or neutral. Negative sentiment and hate speech have been shown to be correlated (Schmidt & Wiegand, 2017). Sentiment analysis can be used by the cyberbullying detection system to classify text as negative, which could be linked to cyberbullying, or positive (Gitari, et al., 2015).

4.1.4. Lexical Resources

A lexical resource is a collection of words, such as a list of terms known to be offensive. According to previous research, there is an association between a specific list of words and the existence of obscene or sacrilegious content in bodies of text (Schmidt & Wiegand, 2017). This means that lexical resources can be used to reinforce cyberbullying detection. They are often used in addition to other detection techniques.

4.1.5. Linguistic Features

Linguistic features refer to how words are pronounced and arranged in a sentence, as well as which words are used. The idea behind linguistic features, and the research projects that have used them, is to model the deeper and higher-level semantic relationships between words in sentences.
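
The basic and lexical features described in Sections 4.1.2 and 4.1.4 can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the method of any cited study: the lexicon `OFFENSIVE_LEXICON`, the helper names and the sample sentence are all hypothetical, and the tokenizer is deliberately simple.

```python
from collections import Counter

# Hypothetical mini-lexicon of offensive terms (assumption for illustration);
# a real study would rely on a curated lexical resource.
OFFENSIVE_LEXICON = {"stupid", "ugly", "loser"}


def tokenize(text):
    """Lowercase the text, split on whitespace and strip edge punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [token for token in tokens if token]


def bag_of_words(tokens):
    """Count how often each token occurs; word order is ignored."""
    return Counter(tokens)


def word_ngrams(tokens, n):
    """Return every contiguous sequence of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexicon_hits(tokens, lexicon=OFFENSIVE_LEXICON):
    """Count the tokens that appear in the offensive-word lexicon."""
    return sum(1 for token in tokens if token in lexicon)


tokens = tokenize("You are so stupid, stupid loser!")
print(bag_of_words(tokens))    # token frequencies
print(word_ngrams(tokens, 2))  # bigrams
print(lexicon_hits(tokens))    # number of lexicon matches
```

The bag-of-words counts and n-grams would typically become the input columns of a classifier, while the lexicon count is a single extra feature that can be added alongside them.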
The modelling of semantic relationships between words can be achieved by building dependency rule engines or, more commonly, by using statistical approaches (Gitari, et al., 2015) (Burnap & Williams, 2015).

4.1.6. Beyond the Textual Context

Rather than analysing only the textual posts on social media, researchers have taken an interest in the complementary and meta-information around the body of text in a social media post. Some research has shown that meta-features and multimedia can be extracted from social media content. Such techniques make use of image recognition methods, such as Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs), for classification and prediction (Huang & Sushkov, 2016). Some methods have even combined textual, visual, and audio features of social media content to detect inappropriate content (Soni & Singh, 2018). This is quite helpful in the detection of cyberbullying.

4.2 RELATED WORK/STUDIES

Traditional studies on cyberbullying focused on the act itself, including the statistics of cyberbullying, its definitions and, mainly, its negative impacts (Patchin & Hinduja, 2006). Not a lot of work has been done on the automation of cyberbullying detection on social media. The work that has been done, however, includes the following.

(Hee, et al., 2015) proposed a strategy for distinguishing the more subtle types of cyberbullying, like insults and threats. The authors classified the probable subjects into three classes: harasser, victim, and bystander. The bystander class was separated into two classes: the individuals who defend the victim and the individuals who support the harasser. Support vector machines (SVMs) were then used to classify the comments.

(Sanchez & Kumar, 2011) were among the first to propose a technique to identify cyberbullying on the Twitter social media platform. The authors used the Naïve Bayes classifier to detect tweets that contained harmful behaviour toward a particular gender.
Nonetheless, their strategy achieved an accuracy of only 70%, and it should be noted that the dataset used was relatively small.

(Al-Garadi, et al., 2016) provided an approach for detecting cyberbullying on Twitter that uses many of the unique properties of the Twitter platform. For classification, these attributes were fed into a machine learning algorithm along with their corresponding samples. The authors looked at four machine learning algorithms, namely Random Forest (RF), Naïve Bayes (NB), Support Vector Machines (SVMs), and K-Nearest Neighbours (KNNs), and found that RF was the top performer.

(Kowalski, et al., 2012) took labelled data from a source and applied two models, a language model and a machine learning model, to create a successful query application for efficient detection of cyberbullying incidents. The phrases created by the machine learning model performed better than the language model in terms of recall and precision.

Another related study has shown that initial research on cyberbullying detection algorithms focused mostly on the context of conversations, rather than on the characteristics of the cyberbullying actors. It further revealed that men and women bully each other in different ways, stating that women are more likely to use confrontational communication strategies, while men are more likely to use words and phrases that threaten (Lee & Ma, 2012).

5 DESIGN, METHODOLOGY AND ETHICS

5.1 RESEARCH APPROACH

The research approach will be deductive in nature. The reasoning for this is that the research aims to test an existing theory, which is that cyberbullying has such an impact on social media users because of their exposure to it on social media (Amanda Lenhart, 2007).
The problem statement backing this research approach was given in Section 2 (Research Problem), and the hypothesis backing it is that decreased exposure to cyberbullying on social media could positively impact the overall psychological, mental, emotional, and physical well-being of cyberbullying victims and social media users. Deductive research aims to test an existing theory, and in this case the theory can be deduced from the problem statement and hypothesis: decreased or no exposure to cyberbullying content on social media will, statistically and in practice, improve the emotional, physical, mental, and psychological well-being of regular social media users and cyberbullying victims. Testing this is what the research project aims to do, hence the choice of the deductive research approach.

5.2 METHODOLOGICAL CHOICE

For this research project a mixed-methods methodology has been chosen. Mixed-methods research is an approach that involves collecting both quantitative and qualitative data and integrating the two forms of data. It has been chosen because it is the best approach to gauge the effectiveness of the automatic cyberbullying detection system and to test whether the objectives of the research were achieved.

For the quantitative part of the research, different surveys and experiments will be conducted to test how much cyberbullying content is censored from the regular social media user's experience. This is necessary to see whether the cyberbullying detection system is effective in actively preventing or completely stopping cyberbullying incidents, thus fulfilling one of the research objectives.

For the qualitative part of the research, different interviews, focus groups and participant observation studies will be conducted, focusing on the overall well-being of the social media users for whom the cyberbullying detection system has been implemented. This is necessary to see whether the system is effective in improving the psychological, mental, physical, and emotional well-being and satisfaction levels of social media users, thus fulfilling one of the research objectives.

Therefore, both quantitative and qualitative research approaches are necessary to fulfil the objectives of the research and to test the effectiveness of the system, hence the choice of a mixed-methods methodology.

5.3 RESEARCH STRATEGY

The research strategy had to be chosen carefully to ensure that the research aligns with the aim of the project, which is to determine whether the detection of cyberbullying on social media with natural language processing is possible and, if so, how it can be implemented. Another part of the aim is to determine whether such detection can be implemented in a way that positively changes the effects that cyberbullying has on the psychological, physical, emotional, and mental well-being of regular social media users and of victims of cyberbullying in general.

Considering all of this, the first step in the research strategy will involve a pipeline that extracts suitable data from social media sites and various online sources. The goal with this data will be to classify whether a remark, status or post from a social media user falls into a “cyberbullying” category or not. After data collection and extraction, the second step will involve the pre-processing and categorization of the data. This includes activities such as noise reduction, lowercasing, lemmatization and discarding spam content.
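
The pre-processing activities named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposal's final pipeline: the spam markers, the helper names and the sample posts are hypothetical, and `naive_lemma` is a crude suffix-stripping stand-in for a real lemmatizer from an NLP library.

```python
import re

# Noise patterns: links and @mentions, then anything that is not a letter or space.
URL_OR_MENTION = re.compile(r"https?://\S+|www\.\S+|@\w+", re.IGNORECASE)
NON_LETTERS = re.compile(r"[^a-z\s]")

# Hypothetical spam markers (assumption); a real pipeline might use a trained filter.
SPAM_MARKERS = ("buy now", "click here", "free followers")


def naive_lemma(token):
    """Crude stand-in for lemmatization: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def is_spam(text):
    """Flag a post as spam if it contains any of the hypothetical markers."""
    lowered = text.lower()
    return any(marker in lowered for marker in SPAM_MARKERS)


def preprocess(text):
    """Apply noise reduction, lowercasing and naive lemmatization to one post."""
    text = URL_OR_MENTION.sub(" ", text)  # noise reduction: links and mentions
    text = text.lower()                   # lowercasing
    text = NON_LETTERS.sub(" ", text)     # noise reduction: digits and symbols
    return [naive_lemma(token) for token in text.split()]


posts = [
    "@sam You are SO annoying!!! http://example.com",
    "Click here for FREE followers",
]
# Discard spam first, then clean the remaining posts.
cleaned = [preprocess(post) for post in posts if not is_spam(post)]
print(cleaned)
```

Spam is discarded before cleaning so that the markers are matched against the original wording of the post rather than its stripped tokens.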
“Cyberbullying” and “non-cyberbullying” categories will be created, and additional information about the data in these categories will be fed into the system, because this improves the natural language processing model (Sharma, et al., 2018).

To increase the precision of the model, the next step of the research will be feature engineering, which refers to the process of using existing knowledge to select and transform the relevant variables from raw data when creating a predictive model using machine learning. Examples of possible features for the NLP cyberbullying detection system are the number of offensive words in a sentence and the number of positive words in a sentence.

The final steps of the research project will be to implement the system on social media sites and evaluate its performance. The system will use its features for probabilistic classification and detection of cyberbullying content on social media. For the quantitative part of the research, accuracy metrics such as test accuracy score and cross-validation score will be used to evaluate the performance of the system. For the qualitative part of the research, surveys, interviews and focus groups will be used to evaluate the effect of the cyberbullying detection system on users' psychological, emotional, mental, and physical well-being.

5.4 RESEARCH DESIGN

5.4.1 DATA COLLECTION

For the initial data collection phase of the research project, the system will need raw data sets. Data sets for cyberbullying usually consist of user comments, posts, images, and videos found on social media sites. There are multiple places to retrieve such data, such as the UCI Machine Learning Repository, which hosts a large collection of open datasets for data analysis purposes (Sharma, et al., 2018).
Other places to obtain this initial raw data are Kaggle, where individuals and businesses contribute data for research purposes; FormSpring.Me; MySpace; the Twitter APIs, of which the streaming API gives access to tweets as they are published on Twitter; and comment threads extracted from YouTube videos that are suspected of potentially igniting hate speech. This data is expected to provide the initial information that the cyberbullying detection system will use and, as it is actual data from its sources, it can be deemed reliable.

5.4.2 POPULATION AND SAMPLE

The population for this research will be all social media users. For the data collection and pre-processing stage, all gathered data will be used. For the testing and evaluation stage, the psychological, physical, emotional, and mental well-being of social media users will be evaluated and compared before and after the implementation of the cyberbullying detection system. Only a sample of people in the adolescent age group will be used for this stage, as this seems to be the group most affected by the negative effects of cyberbullying because of its extended exposure to social media compared to other age groups (Amanda Lenhart, 2007).

5.4.3 EXPERIMENTATION, EVALUATION AND INTERPRETATION OF THE RESULTS

All the raw data will then be used to build a machine learning model, specifically a natural language processing model, which will be referred to as the cyberbullying detection system. Four classifiers will be trained to detect cyberbullying content: Logistic Regression, Support Vector Machine, Random Forest Classifier and Gradient Boosting Machine. For the quantitative part of the research, the models will be provided with the training and test datasets from the data collection phase, and their performance will be measured with train accuracy, test accuracy, AUC score and cross-validation scores.
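
The evaluation measures named above can be illustrated independently of the four classifiers. The sketch below is a hand-written approximation under stated assumptions: labels are binary (1 for cyberbullying, 0 otherwise), the scores come from a hypothetical model, and the function names and toy data are illustrative only. In practice a library such as scikit-learn provides equivalents (`accuracy_score`, `roc_auc_score`, `cross_val_score`).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc_score(y_true, scores):
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy data (hypothetical): true labels and the scores of an imagined model.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5

print(accuracy(y_true, y_pred))
print(auc_score(y_true, scores))
print([test for _, test in k_fold_indices(len(y_true), 3)])
```

A cross-validation score would be obtained by training a model on each `train` split, computing its accuracy on the matching `test` split, and averaging the results. Comparing train accuracy with test accuracy gives a rough indication of overfitting.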
These test scores will be analysed to see whether the system has achieved its objectives.

The qualitative part of the research can be tested in real time, as the cyberbullying detection system will have to be deployed on selected social media sites. This will require adoption by the companies behind those sites, such as Twitter and YouTube; alternatively, the system can be deployed locally as a ‘cyberbullying content blocker’ on the personal devices that participants use to browse social media sites. The qualitative part of the research can then be tested by conducting surveys, interviews and focus group meetings to gauge the overall satisfaction level that the sample group has with its psychological, mental, emotional, and physical well-being prior to the system's deployment. The same surveys, interviews and focus group meetings will then be conducted after the system's deployment, and the satisfaction levels will be compared to those recorded before, to test whether the system has achieved its objectives.

5.5 TIME HORIZON

The research project does not require a lot of time for the data collection and detection phases; it is the testing phase that will take the longest. The research will take an estimated 6 months to complete in its entirety. The first month will be used to gather enough data to build the NLP model and its features, and to conduct surveys, interviews and focus group meetings with the adolescent sample group that is active on the social media platforms where the cyberbullying detection and censoring system will be implemented. This will provide data on the group's current psychological, emotional, mental, and physical well-being satisfaction levels. The system will then be deployed in the 2nd month and left to run for the following 4 months.
In the last month of the research, the same surveys, interviews and focus group meetings will be held with the same adolescent sample group. Their psychological, mental, physical, and emotional well-being satisfaction levels will be evaluated again and compared to the data gathered the first time. These results will be weighed against the objectives of the system to see if the experiment was effective, and evaluation reports will be compiled to examine the effectiveness and efficiency of the system and what can potentially be improved.

5.6 ETHICS

The nature of the project presents potential risks with regard to the testing method of the overall system. The data collection process of the cyberbullying detection system does not present any ethical challenges, as the initial data used for feature selection and model training is made available to the public for purposes such as this research project's, namely building, training, and testing machine learning and natural language processing models. The ethical issue arises where adolescents participate in surveys, interviews and focus groups to gauge their psychological, physical, emotional, and mental well-being satisfaction levels. This presents an ethical and moral challenge because the well-being satisfaction levels of human beings are being used as test data. In some respects, this can be deemed inhumane. However, before adolescents participate in this study, they will be made aware of what the testing process is and how it may affect them, and informed consent will be obtained from them. All participants' data and responses to the surveys, interviews and focus group questions will be kept confidential, and participants can choose to remain anonymous, as this process will be voluntary. Participants can also choose to withdraw from the study at any time.
The collected raw data and the final report of this research project will be kept confidential by the Belgium Campus institution for 2 years after its submission.

6 DELINEATIONS, TIMELINE, BUDGET AND LIMITATIONS

1. Delineations

This research project will not cover the exact information asked of the sample group who participate in the surveys, interviews and focus groups, nor will their confidential data be shared or covered. The project will also not go into the details of the sample group, such as when they use social media most frequently or the activities they take part in on social media sites.

2. Timeline

The research will run during 2023, beginning in February and ending in August.

3. Budget

The budget constraint for this research project is that all research and additional processes should be ‘free of charge’. All research, evaluation and testing efforts will be done using open-source or free technologies.

4. Limitations

The limitations of this research project stem from the chosen research method. One limitation is the workload. Because the approach makes use of both quantitative and qualitative research methods, the research results take a lot of time and effort to produce, and this has the potential to extend the given timeline for the project. Another limitation of the chosen research method is differing or conflicting results. Having both quantitative and qualitative research leaves room for the different research methods to conflict with each other.
For example, the quantitative results in this research project could show that a large amount of cyberbullying content is being detected and successfully censored, while the qualitative results could show a further drop in the psychological, mental, emotional, and physical well-being levels of the adolescent group or of regular social media users in general. This would present a challenge because only some of the objectives would be fulfilled, and it would be harder to determine the cause of the observed effect.

7 ASSUMPTIONS

The assumptions made for this research project are that the members of the sample group for the psychological, mental, physical, and emotional well-being testing are all adolescents, and that the vast majority of them do not have other mental illnesses or disorders that may skew the data. Another assumption is that the adolescent group consists of regular social media users, meaning that they actively use one or more social media platforms for the average of 10 hours per week (Chaffey, 2022). A further assumption is that the adolescents in the sample group will be honest about the state of their psychological, mental, physical, and emotional health throughout the duration of the research study, so as not to skew the data.

8 OUTCOMES, CONTRIBUTION AND SIGNIFICANCE

1. The reason the work is worth doing.

This research is worth doing because of the potential positive effect that it may have on the psychological, mental, emotional, and physical well-being of the adolescents in the sample group and, by extension, all regular social media users.

2. Outcomes

If successful, the product of the research will be a cyberbullying detection system that operates on social media and censors offensive and potential cyberbullying content from other users, or prevents a social media user from posting an offensive comment or media item altogether because of its cyberbullying content.
This would positively impact the psychological, mental, emotional, and physical well-being of social media users and previous cyberbullying victims.

3. Contribution

The research study will provide a way in which an NLP system can detect cyberbullying. The methods used might be new, and the implementation will contribute a different methodology for addressing the cyberbullying detection topic, thus providing more valid research to the field of cyberbullying detection and natural language processing techniques.

4. Significance

The significance of the research is that it will provide a way in which a natural language processing system can be successfully implemented to detect cyberbullying in its context on social media, in order to prevent, stop or reduce further cyberbullying and its negative effects on social media users going forward.

9 CONCLUSION

It can be deduced that since the advent of the internet and the distribution of digital devices amongst the youth, traditional bullying has taken an online form called cyberbullying. Adolescents are more likely to be exposed to cyberbullying due to their frequent use of social media, and this has an extremely negative effect on their psychological, physical, emotional, and mental well-being. This research proposal thus saw it necessary to conduct a research study on a system that can automatically detect cyberbullying incidents that happen on social media platforms and then censor such content from other users or completely disallow such content from being posted on these platforms. This research study's approach will be deductive in nature and will use a mixed-method approach to fully gauge the effectiveness and efficiency of the study and the cyberbullying detection system.
In conclusion, this research study will be carried out according to the plans set out above. It can be deemed necessary to help solve or alleviate the problem and the negative effects of cyberbullying on social media, should its research objectives be achieved. These objectives are: contributing to the prevention of cyberbullying by transferring and testing the developed cyberbullying detection methods on mobile devices; identifying cyberbullying text on social media using natural language processing and understanding its meaning and context; and determining the best implementation of NLP to detect cyberbullying, to such an extent that the psychological, mental, emotional, and physical well-being of cyberbullying victims and regular social media users improves.

References

Al-Garadi, M. A., Varathan, K. & Ravana, S. D., 2016. Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network. Computers in Human Behavior, Volume 63, pp. 433-443.
Alotaibi, M., Alotaibi, B. & Razaque, A., 2021. A Multichannel Deep Learning Framework for Cyberbullying Detection on Social Media. Electronics, 10(2664), pp. 1-14.
Amanda Lenhart, M. M. A. M. A. S., 2007. Teens and Social Media, Washington DC: Pew Internet and American Life Project.
Boyd, D., 2009. Why Youth (Heart) Social Network Sites: The Role of Networked Publics in Teenage Social Life, Cambridge: Berkman Center.
Burnap, P. & Williams, M. L., 2015. Cyber Hate Speech on Twitter: An application of machine classification and statistical modelling for policy and decision making. Policy & Internet, 7(2), pp. 223-242.
Capua, M. D., Nardo, E. D. & Petrosino, A., 2016. Unsupervised Cyberbullying Detection in Social Networks. Cancun, International Conference on Pattern Recognition (ICPR).
Chaffey, D., 2022. Global social media statistics research summary 2022. [Online] Available at: https://www.smartinsights.com/social-media-marketing/social-media-strategy/new-global-socialmedia-research/ [Accessed 14 May 2022].
Dadvar, M., Trieschnigg, D. & Jong, F. d., 2014. Experts and Machines against Bullies: A hybrid approach to detect cyberbullies. Springer International Publishing, 8436(1), pp. 275-281.
Farhangpour, P., Maluleke, C. & Mutshaeni, H. N., 2019. Emotional and academic effects of cyberbullying on students in a rural high school in the Limpopo province, South Africa. South African Journal of Information Management, 21(1), pp. 1-8.
Gao, L. & Huang, R., 2017. Detecting online speech using context aware models. arXiv preprint arXiv:0710.07395.
Gitari, N. D., Zuping, Z., Damien, H. & Long, J., 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering, 10(4), pp. 215-230.
Goodno, N. H., 2011. How public schools can constitutionally halt cyberbullying: A model cyberbullying policy that considers first amendment, due process, and fourth amendment challenges. Wake Forest L. Rev., 46(1), p. 641.
Gordon, S., 2021. The Real-Life Effects of Cyberbullying on Children. [Online] Available at: https://www.verywellfamily.com/what-are-the-effects-of-cyberbullying-460558 [Accessed 14 May 2022].
Gordon, S., 2022. What Is Cyberbullying?. [Online] Available at: https://www.verywellfamily.com/types-of-cyberbullying-460549 [Accessed 14 May 2022].
Hee, C. V. et al., 2018. Automatic detection of cyberbullying in social media text. PLoS ONE, 13(10).
Hee, C. V. et al., 2015. Automatic detection and prevention of cyberbullying. s.l., International Conference on Human and Social Analytics.
Hinduja, S. & Patchin, J. W., 2008. Cyberbullying: An Exploratory Analysis of Factors Related to Offending and Victimization. Deviant Behavior, 29(2), pp. 129-156.
Hosseinmardi, H. et al., 2015. Detection of cyberbullying incidents on the Instagram social network. arXiv preprint arXiv, 1503(3909).
Huang, C. & Sushkov, M., 2016. InstaNet: Object Classification Applied to Instagram Image Streams, California: Stanford Computer Science.
Ito, M. et al., 2008. Living and Learning with New Media: Summary of Findings from the Digital Youth Project, Chicago: The MacArthur Foundation.
Kontostathis, A., Reynolds, K., Garron, A. & Edwards, L., 2013. Detecting cyberbullying: query terms and techniques. New York, Proceedings of the 5th Annual ACM Web Science Conference.
Kowalski, R. M., Limber, S. P. & Agatston, P. W., 2012. Cyberbullying: Bullying in the Digital Age. s.l.: John Wiley & Sons.
Lee, C. S. & Ma, L., 2012. News sharing in social media: The effect of gratifications and prior experience. Computers in Human Behavior, 28(2), pp. 331-339.
Lopez-Meneses, E., Vazquez-Cano, E., Gonzalez-Zamar, M.-D. & Abad-Segura, E., 2020. Socioeconomic Effects in Cyberbullying: Global Research Trends in the Educational Context. International Journal of Environmental Research and Public Health, 17(12), p. 4369.
Navarro, R., Yubero, S. & Larranaga, E., 2016. Cyberbullying Across the Globe. 3rd ed. Cuenca, Spain: Springer International Publishing Switzerland.
News24, 2021. Parents share their top five digital concerns in local survey. [Online] Available at: https://www.news24.com/parent/family/parenting/parents-share-their-top-five-digital-concernsin-local-survey-20210325 [Accessed 14 May 2022].
Nobata, C. et al., 2016. Abusive language detection in online user content. s.l., Proceedings of the 25th International Conference on World Wide Web.
Patchin, J. W. & Hinduja, S., 2006. Bullies move beyond the schoolyard: A preliminary look at cyberbullying. Youth Violence and Juvenile Justice, 4(2), pp. 148-169.
Pheto, B., 2021. More than half of SA's children have been cyberbullied, survey finds. [Online] Available at: https://www.timeslive.co.za/news/south-africa/2021-03-10-more-than-half-of-sas-children-havebeen-cyberbullied-survey-finds/ [Accessed 14 May 2022].
Roux, R. l., Rycroft, A. & Orleyn, T., 2010. Harassment in the workplace - Laws, policies and processes. 1 ed. Johannesburg: LexisNexis South Africa.
Sanchez, H. & Kumar, S., 2011. Twitter bullying detection. ser. NSDI, 12(2011), pp. 1-15.
Schmidt, A. & Wiegand, M., 2017. A survey on hate speech detection using natural language processing. Valencia, Spain, Proceedings of the 5th International Workshop on Natural Language Processing for Social Media.
Sharma, H. K., Kshitiz, K. & Shailendra, 2018. NLP and Machine Learning Techniques for Detecting Insulting Comments on Social Networking Platforms. Paris, International Conference on Advances in Computing and Communication Engineering.
Smit, D., 2015. Cyberbullying in South African and American schools: a legal comparative study. Sabinet African Journals, 35(2), pp. 1-11.
Soni, D. & Singh, V. K., 2018. See no evil, hear no evil: Audio-Visual-Textual cyberbullying detection. Proceedings of the ACM on Human-Computer Interaction, Volume 2, pp. 1-26.
Stop Bullying.Gov, 2021. What Is Cyberbullying. [Online] Available at: https://www.stopbullying.gov/cyberbullying/what-is-it [Accessed 14 May 2022].
Sue North, S. B. & Snyder, I., 2008. Digital Tastes: Social Class and Young People's Technology Use. 1 ed. London: Routledge.
Weber, N. L. & William V. Pelfrey, J., 2014. Cyberbullying: Causes, Consequences, and Coping Strategies. 1 ed. United States of America: LFB Scholarly Publishing LLC.
Wet, C. D., 2011. The professional lives of teacher victims of workplace bullying: A narrative analysis. Perspectives in Education, 29(4), pp. 66-76.
Yin, D. et al., 2009. Detection of harassment on web 2.0. Proceedings of the Content Analysis in the WEB, Volume 2, pp. 1-7.
Zhang, X. et al., 2016. Cyberbullying Detection with a Pronunciation Based Convolutional Neural Network. s.l., 15th IEEE International Conference on Machine Learning and Applications, pp. 740-745.
