PROBATIONERS CLASSIFICATION IN AGUSAN DEL NORTE USING K-MEANS CLUSTERING ALGORITHM

A research study
Presented to the DIT Faculty of the Graduate School
Technological Institute of the Philippines
Quezon City

In Partial Fulfillment of the Requirements for the Degree
Doctor in Information Technology

MERIAM A. ENALDO
June 2021

ABSTRACT

The increasing number of individuals serving their penalties inside and outside prisons and jails has led to the adoption of modern technology and computer-aided diagnosis systems, tools, methods, and techniques for analyzing and understanding data gathered in relation to the rehabilitation and reformation of offenders. This paper clusters the cities and municipalities of Agusan del Norte using the K-Means algorithm. Clustering is essential for finding similar characteristics, patterns, and values in the number of probationers serving their sentences within these localities from January to June 2021. The results show that Cabadbaran City has the highest number of probationers, while Remedios T. Romualdez (RTR) has the lowest. The results also show that male probationers outnumber female probationers in every city and municipality of Agusan del Norte. Future researchers may apply other data mining techniques suited to studies related to this paper.

CHAPTER 1 INTRODUCTION

Probation is a program under the Correction Pillar of the Philippine Criminal Justice System that was formulated for the sake of decongesting the country's prisons and jails while rehabilitating offenders without putting them behind bars. Many studies have been conducted to understand the nature and origin of crime, but only a few have tried to understand the lives of individuals during and after the years of punishment imposed for those crimes.
That is why further research must be done to formulate a more effective solution to this long-standing problem in society. Data mining is one of the most effective methods of analyzing data and information across all sectors of society in order to arrive at the best alternative solution to the issues presented. By applying different data mining techniques to the gathered data, it becomes possible to identify the most affected sector and to find the best solution. The ultimate goal of this paper is to identify the municipalities with the most probationers in Agusan del Norte, so that the government and the individuals involved can formulate effective programs and pinpoint which areas they should address first to attain the desired result in the Probation Program. Another aim of this paper is to identify which gender accounts for the most probationers, so that rehabilitation and reformation programs can be tailored to their needs and circumstances.

CHAPTER 2 REVIEW OF RELATED LITERATURE

Data mining is also known as Knowledge Discovery in Databases (KDD). It is defined as the process of extracting interesting, interpretable, and useful information from raw data, which is the main reason the applications of data mining are increasing rapidly (Madni, Anwar, and Shah 2017). The K-means clustering algorithm helps find identical traits, patterns, and values when categorizing municipalities with much, more, and most recorded index and non-index crimes in the Surigao City area (Delima 2019). The implementation of the K-means algorithm in analyzing crime is essential for providing safety and security to the civilian population. Using this data mining technique, critical information can be discovered that helps local authorities detect crime and areas of importance.
The study analyzes crime, including theft, homicide, and various drug offenses, as well as suspicious activities, noise complaints, and burglar alarms, using both qualitative and quantitative approaches. Using a K-means clustering approach on a crime dataset from the New South Wales region of Australia, the crime rates of each type of crime and the cities with high crime rates were found (Joshi, Sabitha, and Choudhury 2017). Furthermore, Gupta et al. (2018) implemented data mining techniques to understand trends and patterns of terrorist attacks in India. K-means clustering was used to determine the years in which terrorist groups were most active and which terrorist group had the greatest effect. The experiment was implemented in the RapidMiner tool to determine the active group and the affected year (Gupta et al. 2018).

Data mining applications are used in many banking sectors for client segmentation and productivity, credit scoring and authorization, predicting payment default, advertising, detecting fake transactions, and so on. Lekha and Prakasam (2017) present a general overview of data mining techniques and diverse cyber crimes in banking applications, together with an inclusive survey of competent and valuable data mining techniques for cyber crime data analysis. The objective of cyber crime data mining is to recognize patterns in criminal behavior in order to predict, anticipate, and prevent criminal activity. Their paper implements data mining techniques such as K-Means, the Influenced Association Classifier, and the J48 prediction tree to investigate cyber crime data sets and sort out the accessible problems. The K-Means algorithm is used for unsupervised cluster learning within Influenced Association Classification. K-means selects the initial centroids so that the classifier can mine the records and formulate predictions of cyber crimes with the J48 algorithm.
The combined knowledge of K-Means, the Influenced Association Classifier, and the J48 prediction tree tends to afford an enhanced, integrated, and precise result for cyber crime prediction in the banking sector. Law enforcement organizations need to be adequately equipped to defeat and prevent cyber crime (Lekha and Prakasam 2017).

In addition, criminal justice practitioners increasingly seek efficient means of community supervision, supplanting face-to-face interactions with practices that are less onerous to administrators and clients. One study examined the differential impact of remote supervision for low-risk probationers by race. Remote reporting greatly reduces or eliminates in-person meetings where race would be salient; however, it also creates conditions where an officer may rely more heavily on heuristics. The study found that the program drastically reduced violations but also exacerbated the racial discrepancy in reporting high-discretion violations (Saunders et al. 2021).

Moreover, offender rehabilitation seeks to minimise recidivism. Using their experience and actuarial-type risk assessment tools, probation officers in Singapore make recommendations on sentencing outcomes so as to achieve this objective. However, it is difficult for them to maximise the utility of the large amounts of data collected, a problem that could be resolved by using predictive modelling informed by statistical learning methods. Data on youth offenders referred to the Ministry of Social and Family Development for rehabilitation were used to create a random forests model to predict recidivism. The article shows that analysis of administrative data at the discrete level using statistical learning methods is more accurate in predicting recidivism than conventional statistical methods. This provides an opportunity to direct intervention efforts at individuals who are more likely to reoffend (Ting et al. 2018).
Also, probation agency performance, probationer outcomes, and public safety all depend on the successful implementation of evidence-based practices (EBPs). Yet EBP implementation is short-lived within community corrections agencies. One study focused on interactions between 834 probation officers and their agencies (six probation jurisdictions) by examining alignment between the use of client-centered communication strategies, perceived agency support, and agency climate. Results showed a significant, negative linear relationship between probation officer-agency alignment with regard to EBPs and agency context. Quadratic regression analyses were used to model the level of the outcome (satisfaction with climate). Taken together, the findings suggest agency climate is (1) most at risk when officers are more comfortable with the use of client-centered communication than they feel the agency can support, and (2) not influenced by officers who are uncomfortable with the use of client-centered communication but perceive that the agency supports its use. Failure to recognize these officer differences can complicate effective implementation of EBPs in community supervision agencies. New avenues for implementation research are discussed (Blasko et al. 2019).

Besides, in criminal justice analytics, the widely studied problem of recidivism prediction (forecasting re-offenses after release or parole) is fraught with ethical missteps. In particular, Machine Learning (ML) models rely on historical patterns of behavior to predict future outcomes, engendering a vicious feedback loop of recidivism and incarceration. One paper repurposes ML to instead identify social factors that can serve as levers to prevent recidivism. Its contributions are along three dimensions. (1) Recidivism models typically agglomerate individuals into one dataset, but here unsupervised learning is invoked to extract homogeneous subgroups with similar features.
(2) Subgroup-level supervised learning is then applied to determine factors correlated to recidivism. (3) The focus therefore shifts from predicting which individuals will re-offend to identifying broader underlying factors that explain recidivism, with the goal of informing preventative policy intervention. The approach is shown to guide the ethical application of ML using real-world data (Shirvaikar and Lakshminarayan 2020).

Further, crime analysis and prevention is a systematic approach for identifying and analyzing patterns and trends in crime. One proposed system can predict regions that have a high probability of crime occurrence and can visualize crime-prone areas. With the increasing advent of computerized systems, crime data analysts can help law enforcement officers speed up the process of solving crimes. About 10% of criminals commit about 50% of crimes. Even though it is not possible to predict who the victims of crime may be, it is possible to predict the places where crime is likely to occur. The K-means algorithm partitions data into groups based on their means. It has an extension called the expectation-maximization algorithm, in which the data are partitioned based on their parameters. This easy-to-implement data mining framework works with a geospatial plot of crime and helps improve the productivity of detectives and other law enforcement officers. The system can also be used by Indian crime departments to reduce crime and to solve crimes in less time (Jain et al. 2017).

Conceptual Framework

The study is anchored on the concept of Delima (2019) but differs in many ways. Although the K-Means algorithm was also used, here it was applied to cluster the cities and municipalities in the province of Agusan del Norte to identify areas with higher and lower numbers of recorded probationers.

[Figure panels: PAROLE PROBATION ADMINISTRATION (CARAGA REGION); K-Means Clustering Algorithm; Clustered Statistical Interpretation]
Fig. 1.
Conceptual framework of the study.

CHAPTER 3 METHODS AND PROPOSED SOLUTION

This paper applies the K-Means clustering algorithm to cluster the cities and municipalities of Agusan del Norte. Clustering is essential for finding similar characteristics, patterns, and values in the number of probationers serving their sentences within these localities from January to June 2021. Clustering is an unsupervised data analysis technique that places similar data in the same group and dissimilar data in different groups (Ishaq and Buyukkaya 2017). The probationer datasets of the cities and municipalities of the province of Agusan del Norte for January 2021 to June 2021 were used, as presented in Table I in the next chapter.

CHAPTER 4 SIMULATION AND RESULTS

PROBATIONERS from Agusan del Norte

Male    Female    Municipality
60      4         Butuan
97      9         Cabadbaran
14      0         Carmen
21      2         Jabonga
21      3         Kitcharao
18      2         Las Nieves
28      1         Magallanes
65      5         Nasipit
13      0         RTR
22      3         San Vicente
21      3         Tubay

Table I: Probationers Dataset per City and Municipality in the Province of Agusan del Norte.

In the K-means algorithm, the user specifies k, the number of desired clusters, and therefore the number of centroids. Each cluster has a centroid, which is the mean of the cluster. Each data record is assigned to its nearest centroid. Once all input data records have been assigned, the centroid of each cluster is updated by recalculating the cluster mean. These steps are repeated until the centroids no longer change (Thammano and Kesisung 2013).

The clustering experiment was implemented using WEKA (Nafie Ali and Mohamed Hamed 2018). Fig. 2 shows the result, in which cluster 0 denotes the lowest numbers of probationers and cluster 1 the highest numbers of probationers among the cities and municipalities of Agusan del Norte.

Fig. 2: K-Means clustering output in WEKA.
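The assign-and-update loop described above can be sketched in a few lines of Python on the Table I counts. This is only an illustrative sketch, not the WEKA run used in the study: it assumes raw (unscaled) male and female counts, Euclidean distance, k = 2, and a hypothetical seeding rule that starts from the two localities with the fewest and the most male probationers. WEKA's own initialization and preprocessing may differ, and the names PROBATIONERS and kmeans are introduced here for illustration.

```python
from math import dist

# Table I: locality -> (male, female) probationers, January to June 2021.
PROBATIONERS = {
    "Butuan": (60, 4), "Cabadbaran": (97, 9), "Carmen": (14, 0),
    "Jabonga": (21, 2), "Kitcharao": (21, 3), "Las Nieves": (18, 2),
    "Magallanes": (28, 1), "Nasipit": (65, 5), "RTR": (13, 0),
    "San Vicente": (22, 3), "Tubay": (21, 3),
}


def kmeans(points, centroids, max_iter=100):
    """Assign each point to its nearest centroid, recompute each centroid as
    the mean of its members, and stop when the centroids no longer change."""
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


names = list(PROBATIONERS)
points = [PROBATIONERS[name] for name in names]

# Assumed seeding (not WEKA's): smallest and largest (male, female) pairs.
initial = [min(points), max(points)]
labels, centroids = kmeans(points, initial)
clusters = dict(zip(names, labels))

for name in names:
    print(f"{name:12s} cluster {clusters[name]}")
print("centroids:", centroids)
```

Under these assumptions the loop settles after one update, with Butuan, Cabadbaran, and Nasipit in cluster 1 and the remaining eight localities in cluster 0. A different seeding or distance scaling could in principle give a different partition, which is why the seeding rule is stated explicitly.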
Cabadbaran City, Butuan City, and Nasipit, Agusan del Norte were categorized under Cluster 1, which has the highest numbers of probationers, while the rest were categorized under Cluster 0, which has the lowest numbers of probationers, from January to June 2021.

PROBATIONERS From Agusan del Norte

Male    Female    Municipality    Cluster
60      4         Butuan          1
97      9         Cabadbaran      1
14      0         Carmen          0
21      2         Jabonga         0
21      3         Kitcharao       0
18      2         Las Nieves      0
28      1         Magallanes      0
65      5         Nasipit         1
13      0         RTR             0
22      3         San Vicente     0
21      3         Tubay           0

Table II: Clustering Result in WEKA.

[Chart: Total numbers of Probationers from January to June 2021; series: Male, Female]
Fig. 3: Probationers rates in Agusan del Norte.

Figure 3 shows that among the municipalities and cities of Agusan del Norte, Cabadbaran City had the highest number of probationers, most of them male. On the other hand, Remedios T. Romualdez (RTR) had the lowest number of probationers, with no female probationer.

[Chart: Highest numbers of Probationers from January to June 2021; categories: Butuan, Cabadbaran, Nasipit; series: Male, Female]
Fig. 4: Probationers rates in Nasipit, Cabadbaran, and Butuan from Cluster 1.

Figure 4 shows the localities under Cluster 1, which have the highest numbers of probationers of both genders. Cabadbaran City had the highest number, with 97 male probationers and 9 female probationers. Next is Nasipit, Agusan del Norte, with 65 male probationers and 5 female probationers, followed by Butuan City with 60 male probationers and 4 female probationers.

[Chart: Lowest Probationers from January to June 2021; series: Male, Female]
Fig. 5: Probationers rates in Different Municipalities from Cluster 0.

Figure 5 shows the probationer counts in the municipalities under Cluster 0, which have the lowest numbers of probationers in Agusan del Norte. Remedios T. Romualdez (RTR) had the lowest number of probationers, with 13 male probationers and no female probationer.
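The figures quoted above can be checked directly against Table II. The short Python sketch below is only an illustrative recomputation of the per-locality and per-gender totals from the tabulated counts; the names TABLE_II, totals, highest, and lowest are introduced here and are not part of the original WEKA workflow.

```python
# Table II: locality -> (male, female, cluster), January to June 2021.
TABLE_II = {
    "Butuan": (60, 4, 1), "Cabadbaran": (97, 9, 1), "Carmen": (14, 0, 0),
    "Jabonga": (21, 2, 0), "Kitcharao": (21, 3, 0), "Las Nieves": (18, 2, 0),
    "Magallanes": (28, 1, 0), "Nasipit": (65, 5, 1), "RTR": (13, 0, 0),
    "San Vicente": (22, 3, 0), "Tubay": (21, 3, 0),
}

# Total probationers per locality (male + female).
totals = {name: male + female for name, (male, female, _) in TABLE_II.items()}

# Province-wide totals by gender.
male_total = sum(male for male, _, _ in TABLE_II.values())
female_total = sum(female for _, female, _ in TABLE_II.values())

# Localities with the most and the fewest probationers overall.
highest = max(totals, key=totals.get)
lowest = min(totals, key=totals.get)

# Localities with no recorded female probationer.
no_female = sorted(name for name, (_, female, _) in TABLE_II.items() if female == 0)

# Cluster 1 members ranked by total, largest first.
cluster_1 = sorted(
    (name for name, (_, _, cluster) in TABLE_II.items() if cluster == 1),
    key=totals.get,
    reverse=True,
)

print("male:", male_total, "female:", female_total)
print("highest:", highest, totals[highest])
print("lowest:", lowest, totals[lowest])
print("no female probationers:", no_female)
print("cluster 1 ranking:", cluster_1)
```

On the tabulated data this gives 380 male and 32 female probationers in total, with Cabadbaran (106) highest and RTR (13) lowest, consistent with the description of Figures 3 to 5.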
CHAPTER 5 SUMMARY, CONCLUSION AND RECOMMENDATIONS

Summary

This research study, entitled “PROBATIONERS CLASSIFICATION IN AGUSAN DEL NORTE USING K-MEANS CLUSTERING ALGORITHM”, was conducted in the cities and municipalities of Agusan del Norte. The paper clusters these localities using the K-Means clustering algorithm, which is essential for finding similar characteristics, patterns, and values in the number of probationers serving their sentences there from January to June 2021. The results show that Cabadbaran City has the highest number of probationers, while RTR has the lowest recorded number of probationers. The results also show that male probationers outnumber female probationers in every city and municipality of Agusan del Norte.

Conclusion

The K-Means clustering algorithm made it possible to group the municipalities with identical traits and values. In Cluster 1, Cabadbaran City had the highest number of probationers from January to June 2021, followed by the Municipality of Nasipit and Butuan City. In Cluster 0, Remedios T. Romualdez (RTR) had the lowest number of probationers, along with the other municipalities in Agusan del Norte identified as having low numbers of probationers. Male probationers were the majority in all the municipalities and cities of Agusan del Norte, while female probationers were fewer in every locality. Remedios T. Romualdez (RTR) and Carmen, Agusan del Norte had no recorded female probationers from January to June 2021.
Recommendation

Probationers are individuals who committed offenses for the first time, or were adjudged guilty for the first time, of transgressions that are not considered heinous crimes under the law. The researcher therefore recommends that the government address this issue as quickly as possible to keep these individuals from committing further crimes and becoming recidivists. The following programs are recommended because they may have an impact on the lives of probationers even after they have served their sentences:

1. Rehabilitation and Reformation Programs through education and skills-related activities.
2. Livelihood Programs. Being an offender means a smaller chance of having a decent job, no matter how professional a person may be. A livelihood program that probationers can use in their daily lives can be a huge help in keeping them from committing crimes again.
3. Information drives for the community to understand the lives of the probationers, so that the community learns to accept these people back without judgment and scrutiny.

REFERENCES

Blasko, Brandy L., Jill Viglione, Heather Toronjo, and Faye S. Taxman. 2019. “Probation Officer–Probation Agency Fit: Understanding Disparities in the Use of Motivational Interviewing Techniques.” Corrections 4(1):39–57.
Delima, Allemar Jhone P. 2019. “Applying Data Mining Techniques in Predicting Index and Non-Index Crimes.” International Journal of Machine Learning and Computing 9(4):533–38.
Gupta, Pranjal, A. Sai Sabitha, Tanupriya Choudhury, and Abhay Bansal. 2018. “Terrorist Attacks Analysis Using Clustering Algorithm.” Pp. 317–28 in Smart Computing and Informatics. Springer.
Ishaq, Waqar, and Eliya Buyukkaya. 2017. “Dark Patches in Clustering.” Pp. 806–11 in 2017 International Conference on Computer Science and Engineering (UBMK). IEEE.
Jain, Vineet, Yogesh Sharma, Ayush Bhatia, and Vaibhav Arora. 2017.
“Crime Prediction Using K-Means Algorithm.” GRD Journals-Global Research and Development Journal for Engineering 2(5).
Joshi, A., A. S. Sabitha, and T. Choudhury. 2017. “Crime Analysis Using K-Means Clustering.” Pp. 33–39 in 2017 3rd International Conference on Computational Intelligence and Networks (CINE).
Lekha, K. Chitra, and S. Prakasam. 2017. “Data Mining Techniques in Detecting and Predicting Cyber Crimes in Banking Sector.” Pp. 1639–43 in 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS). IEEE.
Madni, Hussain Ahmad, Zahid Anwar, and Munam Ali Shah. 2017. “Data Mining Techniques and Applications—A Decade Review.” Pp. 1–7 in 2017 23rd International Conference on Automation and Computing (ICAC). IEEE.
Nafie Ali, Faisal Mohammed, and Abdelmoneim Ali Mohamed Hamed. 2018. “Usage Apriori and Clustering Algorithms in WEKA Tools to Mining Dataset of Traffic Accidents.” Journal of Information and Telecommunication 2(3):231–45.
Saunders, Jessica, Greg Midgette, Jirka Taylor, and Sara-Laure Faraji. 2021. “A Hidden Cost of Convenience: Disparate Impacts of a Program to Reduce Burden on Probation Officers and Participants.” Criminology & Public Policy 20(1):71–122.
Shirvaikar, Vik, and Choudur Lakshminarayan. 2020. “Social Determinants of Recidivism: A Machine Learning Solution.” ArXiv Preprint ArXiv:2011.11483.
Thammano, Arit, and Pannee Kesisung. 2013. “Enhancing K-Means Algorithm for Solving Classification Problems.” Pp. 1652–56 in 2013 IEEE International Conference on Mechatronics and Automation. IEEE.
Ting, Ming Hwa, Chi Meng Chu, Gerald Zeng, Dongdong Li, and Grace S. Chng. 2018. “Predicting Recidivism among Youth Offenders: Augmenting Professional Judgement with Machine Learning Algorithms.” Journal of Social Work 18(6):631–49.

APPENDICES

Figures

[Figure panels: PAROLE PROBATION ADMINISTRATION (CARAGA REGION); K-Means Clustering Algorithm; Clustered Statistical Interpretation]
Fig. 1. Conceptual framework of the study.
Fig. 2: K-Means clustering output in WEKA.

[Chart: Total numbers of Probationers from January to June 2021; series: Male, Female]
Fig. 3: Probationers rates in Agusan del Norte.

[Chart: Highest numbers of Probationers from January to June 2021; categories: Butuan, Cabadbaran, Nasipit; series: Male, Female]
Fig. 4: Probationers rates in Nasipit, Cabadbaran, and Butuan from Cluster 1.

[Chart: Lowest Probationers from January to June 2021; series: Male, Female]
Fig. 5: Probationers rates in Different Municipalities from Cluster 0.

Tables

Table I: Probationers Dataset per City and Municipality in the Province of Agusan del Norte (data as presented in Chapter 4).

Table II: Clustering Result in WEKA (data as presented in Chapter 4).

Gantt Chart

Financial Requirement in implementing the Research

Item                        Estimated Amount
Rehabilitation Programs     3,000,000.00
Livelihood Programs         3,000,000.00
Trainings and Seminars      1,000,000.00
Estimated Total Amount      7,000,000.00