Mini Project Report
on
Online Conversation Scanner

Submitted to Ajay Kumar Garg Engineering College, Ghaziabad
B.Tech. Information Technology, Sem 5th, 2023-24
CODE: KCS-554
Project: Mini Project or Internship Assessment Report

Submitted To: Mrs. Shikha Aggarwal
Submitted By: Sunakshi Singh (2100270110083), Uzair Ali (2100270110092), Udit Singh (2100270110089)

Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh, Lucknow

CERTIFICATE
This is to certify that the Seminar entitled “ONLINE CONVERSATION SCANNER” has been submitted by SUNAKSHI SINGH, UZAIR ALI AND UDIT SINGH under my guidance in partial fulfillment of the degree of Bachelor of Engineering in COMPUTER SCIENCE in semester 5TH of AJAY KUMAR GARG COLLEGE OF ENGINEERING, GHAZIABAD, UTTAR PRADESH during the year 2023-2024.
Date: 11-12-23
Place: Ghaziabad
Guide: Mrs. SHIKHA AGGRAWAL
HOD: DR. RAHUL SHARMA

ACKNOWLEDGEMENT
We would like to express our deepest thanks to Mrs. Shikha Aggrawal, our mini project advisor, for her cooperative attitude and consistent guidance, thanks to which we were able to complete the project successfully.
We would like to express our sincere gratitude to Dr. Anupama Sharma (H.O.D., CSIT Department), Ajay Kumar Garg Engineering College, Ghaziabad, and Dr. R. K. Aggarwal (Director General), Ajay Kumar Garg Engineering College, for allowing us to pursue the project of our choice. They gave us valuable guidance and support.
We also wish to thank the many people at Ajay Kumar Garg Engineering College who offered their guidance. We gained practical as well as theoretical knowledge and experience through this project. Finally, words on paper are not enough to express our gratitude for the support and guidance we received from them.
Sunakshi Singh (2100270110083)
Uzair Ali (2100270110092)
Udit Singh (2100270110089)
3rd year CSIT

TABLE OF CONTENTS
S.NO  CONTENTS
1  Abstract
2  Objectives
3  Introduction
4  Review of literature
5  Design and implementation
6  Code and output
7  Limitations
8  Conclusion
9  Future Scope
10  References

INTRODUCTION
The Online Conversation Scanner is a tool for uncovering insights from conversational data. By analyzing chat logs, it helps users understand behaviour and preferences. This report describes its key features and the benefits of using it for conversation analysis. Users can view chat frequency, message types, word counts, and more, which lets individuals and businesses examine their messaging patterns and identify areas for improvement in communication. The tool is easy to use, and the data it provides can inform decisions about messaging strategies.

Online conversation has become an indispensable part of daily communication, with billions of users exchanging messages, photos, videos, and voice notes every day. With the increasing use of WhatsApp for personal and professional communication, there is a growing need for tools that can analyze and interpret the data generated on this platform. This is where WhatsApp chat analyzers come in: they help users understand their communication patterns, identify trends, and gain insights into their relationships with others.

An online conversation scanner is a tool designed to monitor and analyze digital dialogues, texts, or discussions across online platforms. It scans text-based interactions, identifying keywords, sentiments, and context to extract meaningful insights. It is a software tool that uses natural language processing (NLP) and machine learning algorithms to analyze the text-based content of WhatsApp chats.
It can identify keywords, phrases, and sentiment (positive, negative, or neutral) in the messages exchanged between users. The tool can also categorize messages by the topics or themes discussed in the chat.

The benefits of using an online conversation scanner are numerous. Firstly, it helps users understand their communication style and identify areas for improvement. By analyzing the frequency and tone of the messages exchanged, users can learn to communicate more effectively and efficiently. For instance, if the tool finds that a user frequently uses negative language, it may suggest ways to improve communication skills and promote a more positive approach.

Secondly, it can help users identify trends in their communication patterns. By analyzing the topics discussed in chats over a period of time, users can gain insights into their interests, preferences, and priorities. This information can be useful for personal growth and development as well as for making informed decisions about career choices or educational opportunities.

Thirdly, it can help users manage their relationships better. By identifying patterns of communication between individuals or groups, users can understand the dynamics of their relationships and take proactive steps to improve them. For instance, if the tool finds that a user communicates less frequently with certain individuals or groups, it may suggest ways to reconnect and strengthen those relationships.

Fourthly, it can help users monitor their online reputation. By analyzing the messages exchanged in a group or community, the tool can identify potential issues or conflicts that may affect the user's reputation. This information can be used to address these issues before they escalate into bigger problems.
By leveraging natural language processing (NLP) and machine learning techniques, these scanners can detect patterns, trends, or potential risks within conversations. They serve diverse purposes, from social media sentiment analysis to cybersecurity threat detection and content moderation. With the ability to sift through large amounts of data quickly, an online conversation scanner aids in understanding user behavior, sentiment trends, and emerging issues. It can help in enhancing customer experiences, identifying potential threats, ensuring compliance, and providing actionable insights for businesses, organizations, or platforms. The scanner's ability to adapt as language evolves makes it a valuable asset in the changing landscape of online communication.

In conclusion, it is a tool that can provide users with valuable insights into their communication patterns, trends, relationships, and online reputation. By using it regularly, users can learn to communicate more effectively and efficiently, manage their relationships better, and monitor their online reputation proactively. As WhatsApp continues to be an integral part of our daily communication, tools like these help users make the most of the platform.

REVIEW OF LITERATURE
1. Literature review on Chat Analysis: A survey of the usage and impact of messaging was reviewed, along with related studies and analyses. The survey found that in the southern part of India, people aged 18 to 23 spend around 8 hours a day using online apps and are sometimes online for 12 to 16 hours a day. Most of them reported using WhatsApp or a similar service to exchange images, audio, and video. The survey also found that WhatsApp was used on smartphones more widely than any other app. This survey was conducted to understand the positive and negative impacts of using WhatsApp.
Since this survey shows that WhatsApp is the app most used by young people and by other generations, our project can give them insights into their chats and surface facts they were not aware of.

2. Literature review on Web Design: The number of Internet users runs into the millions and is expected to increase over the years. Websites are a crucial medium for the transmission and dissemination of information. [5] The paper reviewed here surveys previous studies in the field of web development. The literature has proposed either sets of guidelines or assistive technologies, particularly web interfaces and adaptive systems. The purpose of that paper is to analyse users' perceptions and behaviors in order to achieve a successful e-commerce website. According to a survey (Lee & Kozar, 2012), there is currently no consensus on how to properly operationalize and assess website usability, and there are no agreed guidelines that individuals can follow when designing websites to increase user engagement.
• "Hypertext" refers to the links that connect web pages to one another, either within a single website or between websites. Links are a fundamental aspect of the Web: content is uploaded to the Internet and linked to other pages.
• HTML uses "markup" to annotate text, images, and other content for display in a web browser.
• CSS is one of the core languages of the open web and is standardized across web browsers according to W3C specifications. It describes the presentation of a document written in HTML or XML, that is, how elements should be rendered on screen, on paper, in speech, or on other media. In short, it is the styling part of the web page.

The literature surrounding online conversation scanners presents a multifaceted view of their technological capabilities and societal implications.
Research in this domain focuses on various aspects, including the underlying algorithms, natural language processing techniques, and machine learning models employed to extract insights from digital conversations. Studies examine sentiment analysis, topic modeling, and entity recognition as core functionalities, highlighting their applications in fields such as social media monitoring, cybersecurity, and content moderation. Researchers often explore the ethical considerations surrounding these tools, emphasizing the balance between privacy concerns and the need for effective monitoring to ensure online safety. The evolution of online conversation scanners in response to the changing landscape of online communication, and the challenges posed by multilingualism, slang, and contextual understanding, remains a central theme. Overall, the literature underscores the significance of these scanners in deciphering digital discourse while addressing the ethical, technical, and societal implications of their implementation.

ABSTRACT
Text conversation has become one of the most used modes of communication, and an efficient one too. It takes place in many group and individual chats, which may contain hidden facts. This project takes those chats and provides a deep analysis of the data. Whatever the topic of the chats, it provides the analysis in an efficient and accurate way. The main advantage of this project is that it is built using libraries such as pandas, matplotlib, and emoji, which are used to create data frames and plot graphs efficiently. The proposed chat analyser is a machine learning-based tool designed to extract insights and patterns from chats. The system utilizes natural language processing techniques to understand the context and sentiment of messages, identify key topics and entities, and track user behavior over time.
The analyser can also detect potential issues such as cyberbullying, misinformation, and privacy breaches, making it a valuable resource for parents, educators, and organizations. The system's accuracy and efficiency are enhanced through the use of deep learning algorithms, allowing for real-time analysis of large volumes of data. Overall, this tool has the potential to change the way we interact with social media platforms by providing a more comprehensive understanding of our digital communication habits.

DESIGN & IMPLEMENTATION

TECHNOLOGY USED
1. Streamlit: Streamlit is a free and open-source Python framework. [2] It lets us quickly develop web apps for machine learning and data science. Streamlit integrates easily with other popular Python packages such as NumPy, Pandas, Matplotlib, and Seaborn, and provides a fast way to develop and deploy web apps.
2. Matplotlib: Matplotlib is a popular Python package used for data visualization. It is a cross-platform library for making plots from data in arrays. It helps in creating static, animated, and interactive visualizations in Python.
3. Word cloud: Word Cloud is a data visualization library used for representing the most frequently used words within a given text. The most frequent and important words are shown in a bigger and bolder size.
4. Pandas: Pandas is an open-source Python library. It is used here to convert string data into a data frame, which represents data as a two-dimensional table of rows and columns. Pandas can work with large data sets and has many built-in functions for data analysis, data cleaning, data exploration, and data manipulation.
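The project's own code appears only as figures, so the following is a minimal sketch of how the libraries above might be fed, not the report's implementation. It assumes each exported chat line looks like `12/11/23, 9:15 PM - Sender: message`; that layout, the `TIME_FORMATS` table, and the function names `parse_chat`, `messages_per_user`, and `top_words` are all hypothetical. Only the standard library is used, and the resulting list of dictionaries could then be handed to `pandas.DataFrame(rows)` for further analysis.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout: "dd/mm/yy, h:mm AM - Sender: message" (sender is optional).
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2}), (\d{1,2}:\d{2}(?: [APap][Mm])?) - (?:([^:]+): )?(.*)$"
)

# Hypothetical names for the two time formats the user can select.
TIME_FORMATS = {
    "12-hour": "%d/%m/%y, %I:%M %p",
    "24-hour": "%d/%m/%y, %H:%M",
}


def parse_chat(text, time_format="12-hour"):
    """Turn raw chat text into a list of {timestamp, user, message} rows."""
    fmt = TIME_FORMATS[time_format]
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            # No timestamp prefix: treat as a continuation of the previous message.
            if rows:
                rows[-1]["message"] += "\n" + line
            continue
        date_part, time_part, user, message = match.groups()
        try:
            stamp = datetime.strptime(f"{date_part}, {time_part}", fmt)
        except ValueError as exc:
            # The selected time format does not fit the file.
            raise ValueError(f"time format {time_format!r} does not match: {line!r}") from exc
        if user is None:
            continue  # timestamped line without a sender: skip as a system notice
        rows.append({"timestamp": stamp, "user": user, "message": message})
    return rows


def messages_per_user(rows):
    """Count how many messages each sender wrote."""
    return Counter(row["user"] for row in rows)


def top_words(rows, n=3):
    """Return the n most frequent alphabetic words across all messages."""
    text = " ".join(row["message"] for row in rows).lower()
    return Counter(re.findall(r"[a-z']+", text)).most_common(n)


SAMPLE = """12/11/23, 9:15 PM - Asha: Project demo is tomorrow
12/11/23, 9:16 PM - Ravi: Is the demo at 10?
and where?
12/11/23, 9:20 PM - Asha: Yes, demo at 10 in lab 2"""

rows = parse_chat(SAMPLE)
print(messages_per_user(rows))
print(top_words(rows, 1))
# rows could now be passed to pandas.DataFrame(rows) for plotting and grouping.
```

Raising an error when the selected time format does not fit the file is one possible way to handle a wrong format choice. A real export may use other date orders or separators, in which case the pattern and format strings would need adjusting.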
DESIGN

SOFTWARE REQUIREMENT SPECIFICATION
A software requirement specification (SRS) is a technical specification of the requirements for a software product. An SRS gives an overview of the product and its features and summarises the processing environments for development, operation, and maintenance of the product.
Requirement Specification: Conceptually, every SRS should have the following components:
● Functionality
● Performance
● Design constraints imposed on implementation
● External interfaces

USE CASE MODEL
● In the use case diagram, the actor is the User.
● The user uses the Chat upload use case to give input to the system.
● The Select time format use case lets the user specify the time format of the file.
● The Select user use case selects whose analysis result is desired.
● The user uses the Show analysis use case to see the result of the entire analysis.
Figure 1.1

ACTIVITY DIAGRAM
● In the activity diagram, the initial activity starts when the user uploads the file as input; in the next action, the time format is selected.
● The decision box "check chat format" represents the validity check on the time format of the file.
● If the time format is correct, the analysis is done and the process ends.
● If the time format is wrong, the user has to check again for the correct format.
Figure 1.2

IMPLEMENTATION
Python is a high-level, general-purpose, and very popular programming language. The Python programming language (latest: Python 3) is used in web development and machine learning applications, along with other cutting-edge technology in the software industry. Python is very well suited for beginners.
1. Python is currently one of the most widely used multi-purpose, high-level programming languages.
2. Python allows programming in object-oriented and procedural paradigms.
3. Python programs are generally smaller than equivalent programs in languages like Java.
Programmers have to type relatively less, and the indentation requirement of the language keeps the code readable.
4. Python is used by many large technology companies, such as Google, Amazon, Facebook, Instagram, Dropbox, and Uber.
5. A major strength of Python is its huge collection of standard libraries.

Software requirements for developing the application
• Jupyter Notebook
• VS Code

Technologies
• Python and its libraries (Streamlit)
• ML algorithm
• NLTK

HARDWARE REQUIREMENTS
1. Processor (CPU) with a frequency of 2 gigahertz (GHz) or above
2. Storage: 1 GB
3. Display: any device
4. Monitor resolution: 1024 x 768 or higher
5. Internet connection: broadband (high-speed) connection with a speed of 4 Mbps or higher

Operating System:
• Windows 7, Windows 8 or Windows 10
• Mac OS X 10.8, 10.9, 10.10 or 10.11

CODE
Figure 1.3
Figure 1.4
Figure 1.5
Figure 1.6

OUTPUT
Figure 1.6
Figure 1.7
Figure 1.8
Figure 1.9
Figure 2.0

OBJECTIVE
The primary objective of an online conversation scanner is to systematically analyze and interpret digital dialogues, texts, or discussions occurring across various online platforms. Its core goals include:
1. Insight Extraction: To extract valuable information, sentiments, trends, and patterns from conversations.
2. Risk Detection: To identify potential risks, threats, or problematic content, such as cyber threats, hate speech, or inappropriate behavior.
3. Monitoring and Surveillance: To track and monitor discussions for various purposes, including brand reputation management, customer sentiment analysis, and public opinion monitoring.
4. Enhanced Decision-Making: To provide actionable insights for businesses, organizations, or platforms to make informed decisions regarding customer engagement, content moderation, or security measures.
5.
Adaptability and Evolution: To continuously evolve and adapt to changing linguistic nuances, slang, and emerging online communication trends, ensuring relevance and effectiveness in understanding digital conversations.

The capabilities of the chat application and the power of the Python programming language in implementing the intended data analysis cannot be overemphasized. The system was built with Python, using the libraries Streamlit, Emoji, NumPy, Pandas, and Matplotlib. The intended results were obtained. The project is mainly useful for group organisers: they can find out who the most and least active members of a group are and take decisions accordingly.

LIMITATIONS
• The tool cannot analyze photos and other media, as it focuses only on text.
• It cannot detect sarcasm, humor, or other subtleties in a conversation, which may affect the accuracy of the analysis.
• The tool also lacks a sentiment analysis feature, which could provide insights into the emotions expressed in the chat.
• It may not be suitable for analyzing large group conversations, as it can be challenging to identify specific individuals' sentiments and opinions within the chat.
More generally, online conversation scanners have the following limitations:
• Difficulty in accurately grasping nuanced context, sarcasm, or cultural references within conversations, leading to potential misinterpretations.
• Struggles with multilingual conversations or dialects, impacting the accuracy of analysis and sentiment identification, especially in languages with varying nuances.
• Challenges in detecting coded or encrypted language, especially in cases involving sophisticated methods of communication that aim to bypass scanning algorithms.
• Ethical considerations regarding the monitoring of private conversations, which raise concerns about user privacy and data protection regulations.
• Potential biases within the algorithms used, leading to skewed results or misidentification of sentiments, particularly in sensitive topics or cultural contexts.
• Processing delays when dealing with vast amounts of data in real time, which reduce the scanner's effectiveness in quickly identifying and responding to emerging issues.
• Inability to access or scan private or encrypted conversations, limiting the comprehensiveness of the analysis and potentially missing crucial information.
• Privacy concerns that arise from monitoring customer conversations, as some individuals may not want their personal information or opinions shared with third parties.
Addressing these limitations often requires continual refinement, advancements in machine learning, and an ongoing effort to balance accuracy with privacy and ethical considerations.

CONCLUSION
The advent of the online conversation scanner marks an important milestone in digital communication. By employing advanced algorithms and linguistic analysis, this tool sifts through textual exchanges, identifying nuanced sentiments, intentions, and even potential risks within conversations. Its functionality extends beyond content comprehension to context, tone, and underlying emotions. Consequently, the scanner can serve as a guardian of online spaces, flagging instances of harassment, cyberbullying, or suspicious behavior and fostering a safer digital environment. Its applications range from social media platforms and forums to customer service interactions and educational settings, where it supports moderation efforts and proactive intervention strategies. However, while the technology is empowering, ethical considerations remain crucial.
Balancing privacy concerns with the need to ensure safety and wellbeing is essential. As this technology evolves, collaboration between developers, users, and policymakers becomes necessary to cultivate a digital sphere where expression thrives within a framework of safety and respect.

FUTURE SCOPE
The future scope of the Online Conversation Scanner is vast, as it has the potential to provide valuable insights into group conversations, brand awareness, customer behavior, and sentiment analysis. The tool's features can be expanded to include real-time analysis, integration with other popular messaging platforms, and automated report generation. It can also be used by businesses and organizations to improve their communication strategies and enhance customer experience by understanding customers' needs and requirements. The data derived from the tool can help in making informed decisions and improving overall efficiency.

The future of online conversation scanners appears promising, with several avenues for advancement. Enhancements in natural language processing, coupled with machine learning techniques, are expected to refine these scanners, enabling a more nuanced understanding of contextual cues, colloquialisms, and cultural references within conversations. The integration of advanced AI models will likely improve sentiment analysis accuracy, allowing for better identification and handling of nuanced emotions. Moreover, the application of these scanners could expand into new domains, such as mental health monitoring, by analyzing online discourse patterns for early detection of potential issues. With rising concerns about misinformation and fake news, online conversation scanners could play an important role in combating disinformation by quickly identifying and flagging misleading content.
Additionally, the fusion of these scanners with emerging technologies like blockchain might address privacy concerns by offering secure and transparent ways to analyze conversations without compromising user data. As these tools continue to evolve, their potential applications span across industries, from personalized customer experiences to proactive cybersecurity measures, showcasing a dynamic and impactful future scope for online conversation scanners.

REFERENCES
[1] Ravishankara K, Dhanush, Vaisakh, Srajan I S, “International Journal of Engineering Research & Technology (IJERT)”, ISSN: 2278-0181, Vol. 9 Issue 05, May-2020
[2] https://www.analyticsvidhya.com/blog/2021/06/build-web-app-instantly-for-machinelearningusing-streamlit/
[3] Meng Cai, “PubMed Central”, PMCID: PMC7944036, PMID: 33732917
[4] Dr. D. Lakshminarayanan, S. Prabhakaran, “Dogo Rangsang Research Journal”, UGC Care Group I Journal, Vol-10 Issue-07 No. 12 July 2020
[5] https://www.i