PROBATIONERS CLASSIFICATION IN AGUSAN DEL NORTE USING K-MEANS
CLUSTERING ALGORITHM
A research study
Presented to the DIT Faculty of the Graduate School
Technological Institute of the Philippines
Quezon City
In Partial Fulfillment
of the Requirements for the Degree
Doctor in Information Technology
MERIAM A. ENALDO
June 2021
ABSTRACT
An increasing number of individuals serving their penalties inside and outside prisons and jails has led to the adoption of modern technology and computer-aided diagnosis systems, tools, methods, and techniques for analyzing and understanding data on the rehabilitation and reformation of offenders. This paper clusters the municipalities of Agusan del Norte using the K-Means algorithm in order to find similar characteristics, patterns, and values in the number of probationers serving their sentences in these municipalities from January to June 2021. Results showed that Cabadbaran City had the highest number of probationers, while Remedios T. Romualdez (RTR) had the lowest. The results also showed that male probationers outnumbered female probationers in every municipality of Agusan del Norte. Future researchers may apply other data mining techniques in related studies.
CHAPTER 1
INTRODUCTION
Probation is a program under the Correction Pillar of the Philippine Criminal Justice System that was formulated to decongest the country's prisons and jails while rehabilitating offenders without putting them behind bars. Many studies have been conducted to understand the nature and origin of crime, but few have tried to understand the lives of offenders during and after serving their sentences. Further research is therefore needed to formulate more effective solutions to this long-standing problem in society.
Data mining is one of the most effective methods of analyzing data and information in all sectors of society to arrive at the best alternative solutions to the issues at hand. By gathering data and applying different data mining techniques, it becomes possible to identify the most affected sectors and find the best solutions.
The ultimate goal of this paper is to identify the municipalities with the most probationers in Agusan del Norte, so that effective programs can be formulated and the government and other stakeholders can pinpoint which areas to address first to attain the desired results of the Probation Program. Another aim of this paper is to identify which gender has more probationers, so that rehabilitation and reformation programs can be matched to their needs and circumstances.
CHAPTER 2
REVIEW OF RELATED LITERATURE
Data mining is also known as Knowledge Discovery in Databases (KDD). It is defined as the process of extracting interesting, interpretable, and useful information from raw data, which is the main reason the applications of data mining are increasing rapidly (Madni, Anwar, and Shah 2017).
The K-means clustering algorithm helps find similar traits, patterns, and values when categorizing municipalities in the Surigao City area into groups with much, more, and most recorded index and non-index crimes (Delima 2019).
Implementing the K-means algorithm in crime analysis is essential for providing safety and security to the civilian population. This data mining technique can uncover critical information that helps local authorities detect crime and identify areas of importance. One study analyzes crimes including theft, homicide, and various drug offenses, as well as suspicious activities, noise complaints, and burglar alarms, using qualitative and quantitative approaches. By applying K-means clustering to a crime dataset from the New South Wales region of Australia, it identified the crime rate of each type of crime and the cities with high crime rates (Joshi, Sabitha, and Choudhury 2017).
Furthermore, data mining techniques have been implemented to understand trends and patterns of terrorist attacks in India. K-means clustering was used to determine the year in which terrorist groups were most active and which terrorist group had the greatest impact. The experiment was implemented in the RapidMiner tool to determine the most active group and the most affected year (Gupta et al. 2018).
Data mining applications are used in many banking sectors for client segmentation and productivity, credit scoring and authorization, predicting payment default, advertising, detecting fraudulent transactions, and more. One study presents a general overview of data mining techniques and the diverse cyber crimes in banking applications, together with a comprehensive survey of effective data mining techniques for cyber crime data analysis. The objective of cyber crime data mining is to recognize patterns in criminal behavior in order to predict and anticipate criminal activity and prevent it. The study implements novel data mining techniques, namely K-Means, the Influenced Association Classifier, and the J48 prediction tree, to investigate cyber crime datasets and address the problems found. K-Means is used for unsupervised clustering within the Influenced Association Classification: it selects the initial centroids so that the classifier can mine the records, and the J48 algorithm then predicts cyber crimes. Combining K-Means, the Influenced Association Classifier, and the J48 prediction tree yields improved, integrated, and precise results for cyber crime prediction in the banking sector. Law enforcement organizations need to be adequately equipped to defeat and prevent cyber crime (Lekha and Prakasam 2017).
In addition, criminal justice practitioners increasingly seek efficient means of community supervision, replacing face-to-face interactions with practices that are less onerous for administrators and clients. One study examined the differential impact of remote supervision on low-risk probationers by race. Remote reporting greatly reduces or eliminates in-person meetings where race would be salient; however, it also creates conditions in which an officer may rely more heavily on heuristics. The authors found that the program drastically reduced violations but also exacerbated the racial discrepancy in reporting high-discretion violations (Saunders et al. 2021).
Moreover, offender rehabilitation seeks to minimize recidivism. Using their experience and actuarial-type risk assessment tools, probation officers in Singapore make recommendations on sentencing outcomes to achieve this objective. However, it is difficult for them to make full use of the large amounts of data collected, a problem that predictive modeling informed by statistical learning methods could resolve. Ministry of Social and Family Development data on youth offenders in rehabilitation were used to create a random forests model to predict recidivism. The article shows that analyzing administrative data at the discrete level with statistical learning methods predicts recidivism more accurately than conventional statistical methods, which provides an opportunity to direct intervention efforts at individuals who are more likely to reoffend (Ting et al. 2018).
Also, probation agency performance, probationer outcomes, and public safety all depend on the successful implementation of evidence-based practices (EBPs). Yet EBP implementation is short-lived within community corrections agencies. One study focused on interactions between 834 probation officers and their agencies (six probation jurisdictions) by examining the alignment between the use of client-centered communication strategies, perceived agency support, and agency climate. Results showed a significant, negative linear relationship between probation officer-agency alignment with regard to EBPs and agency context. Quadratic regression analyses were used to model the level of the outcome (satisfaction with climate). Taken together, the findings suggest that agency climate is (1) most at risk when officers are more comfortable using client-centered communication than they feel the agency can support, and (2) not influenced by officers who are uncomfortable using client-centered communication but perceive that the agency supports its use. Failure to recognize these officer differences can complicate the effective implementation of EBPs in community supervision agencies. New avenues for implementation research are also discussed (Blasko et al. 2019).
Besides, in criminal justice analytics, the widely studied problem of recidivism prediction (forecasting re-offenses after release or parole) is fraught with ethical missteps. In particular, machine learning (ML) models rely on historical patterns of behavior to predict future outcomes, engendering a vicious feedback loop of recidivism and incarceration. One study repurposes ML to instead identify social factors that can serve as levers to prevent recidivism. Its contributions are along three dimensions: (1) whereas recidivism models typically agglomerate individuals into one dataset, it invokes unsupervised learning to extract homogeneous subgroups with similar features; (2) it then applies subgroup-level supervised learning to determine factors correlated with recidivism; and (3) it therefore shifts the focus from predicting which individuals will re-offend to identifying broader underlying factors that explain recidivism, with the goal of informing preventive policy intervention. The authors demonstrate that this approach can guide the ethical application of ML using real-world data (Shirvaikar and Lakshminarayan 2020).
Further, crime analysis and prevention is a systematic approach for identifying and analyzing patterns and trends in crime. One proposed system can predict regions with a high probability of crime occurrence and visualize crime-prone areas. With the increasing advent of computerized systems, crime data analysts can help law enforcement officers speed up the process of solving crimes. About 10% of criminals commit about 50% of crimes. Although it is not possible to predict who will become a victim of crime, it is possible to predict the places where crime is likely to occur. The K-means algorithm partitions data into groups based on their means, and it has an extension called the expectation-maximization (EM) algorithm, which partitions the data based on their parameters. This easy-to-implement data mining framework works with the geospatial plot of crime and helps improve the productivity of detectives and other law enforcement officers. The system can also be used by Indian crime departments to reduce crime and solve crimes in less time (Jain et al. 2017).
Conceptual Framework
This study is anchored on the concept of Delima (2019) but differs in several ways. Although the K-Means algorithm is also used, it is applied here to cluster the city and municipalities of the province of Agusan del Norte in order to identify areas with higher and lower numbers of recorded probationers.
[Diagram: Parole and Probation Administration (Caraga Region); K-Means Clustering Algorithm; Clustered Statistical Interpretation]
Fig. 1. Conceptual framework of the study.
CHAPTER 3
METHODS AND PROPOSED SOLUTION
This paper applies the K-Means clustering algorithm to cluster the municipalities of Agusan del Norte in order to find similar characteristics, patterns, and values in the number of probationers serving their sentences in these municipalities from January to June 2021.
Clustering is an unsupervised data analysis technique that places similar data in the same group and dissimilar data in different groups (Ishaq and Buyukkaya 2017). Probationer datasets for the city and municipalities of the province of Agusan del Norte from January 2021 to June 2021 were used, as presented in Table I in the next chapter.
CHAPTER 4
SIMULATION AND RESULTS
PROBATIONERS from Agusan del Norte
Municipality    Male  Female
Butuan            60       4
Cabadbaran        97       9
Carmen            14       0
Jabonga           21       2
Kitcharao         21       3
Las Nieves        18       2
Magallanes        28       1
Nasipit           65       5
RTR               13       0
San Vicente       22       3
Tubay             21       3
Table I: Probationers Dataset per City and Municipality in the Province of Agusan del Norte.
In the K-means algorithm, the user specifies k, the number of desired clusters, and k initial centroids are chosen. Each cluster is represented by a centroid, which is the mean of the records in that cluster. Each data record is assigned to its nearest centroid. Once all input records have been assigned, the centroid of each cluster is updated by calculating the mean of its members. These steps are repeated until the centroids no longer change (Thammano and Kesisung 2013).
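The steps above can be sketched in Python. This is a minimal illustration of the standard (Lloyd's) K-means loop, not the WEKA run reported in this study: it assumes plain Euclidean distance on the raw Male and Female counts from Table I, k = 2, and, as a hypothetical choice, the first k records as the starting centroids. WEKA's clusterer may initialize differently and may normalize attributes, so its output can differ.

```python
import math

# Male and Female counts per city/municipality, copied from Table I.
DATA = {
    "Butuan": (60, 4),
    "Cabadbaran": (97, 9),
    "Carmen": (14, 0),
    "Jabonga": (21, 2),
    "Kitcharao": (21, 3),
    "Las Nieves": (18, 2),
    "Magallanes": (28, 1),
    "Nasipit": (65, 5),
    "RTR": (13, 0),
    "San Vicente": (22, 3),
    "Tubay": (21, 3),
}


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    # Assumption: seed the centroids with the first k records (not WEKA's method).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no record changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned records.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


names = list(DATA)
labels, centroids = kmeans(list(DATA.values()), k=2)
for name, label in zip(names, labels):
    print(f"{name:12s} cluster {label}")
print("centroids:", centroids)
```

With this data and initialization the loop stops once no record changes cluster, and it groups Butuan, Cabadbaran, and Nasipit together, the same membership as Cluster 1 in Table II. Because K-means only reaches a local optimum, a different seed could change the cluster numbering or, in less clear-cut data, the membership.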
The clustering experiment was implemented in WEKA (Nafie Ali and Mohamed Hamed 2018). Fig. 2 shows the result, in which cluster 0 represents the municipalities with the lowest numbers of probationers and cluster 1 those with the highest numbers in Agusan del Norte.
Fig. 2: K-Means clustering output in WEKA.
Cabadbaran City, Butuan City, and Nasipit, Agusan del Norte were grouped in Cluster 1, which has the highest numbers of probationers, while the remaining municipalities were grouped in Cluster 0, which has the lowest numbers of probationers from January to June 2021.
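K-means numbers its clusters arbitrarily (the numbering depends on how the centroids were seeded), so reading cluster 0 as “lowest” and cluster 1 as “highest” is an interpretation applied after the run. A minimal sketch of making that reading explicit, assuming clusters are ranked by the total (male + female) of their centroids; `rank_clusters` is a hypothetical helper, and the example is a made-up run in which the numbering came out swapped, using the cluster means implied by Table II.

```python
def rank_clusters(labels, centroids):
    """Renumber clusters so 0 has the smallest centroid total and k-1 the largest."""
    # Order the original cluster ids by the sum of their centroid coordinates
    # (here: male + female probationers).
    order = sorted(range(len(centroids)), key=lambda c: sum(centroids[c]))
    new_id = {old: rank for rank, old in enumerate(order)}
    return [new_id[lab] for lab in labels]


# Hypothetical run in which the high-count group happened to be numbered 0.
# Centroids are the Table II cluster means (Male, Female), swapped on purpose.
centroids = [[74.0, 6.0], [19.75, 1.75]]
labels = [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]  # Butuan ... Tubay, Table I order
print(rank_clusters(labels, centroids))
# → [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Ranking by centroid total is one reasonable convention for count data like this; ranking by a single attribute (for example, male counts only) would give the same order here.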
PROBATIONERS from Agusan del Norte
Municipality    Male  Female  Cluster
Butuan            60       4        1
Cabadbaran        97       9        1
Carmen            14       0        0
Jabonga           21       2        0
Kitcharao         21       3        0
Las Nieves        18       2        0
Magallanes        28       1        0
Nasipit           65       5        1
RTR               13       0        0
San Vicente       22       3        0
Tubay             21       3        0
Table II: Clustering Result in WEKA.
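Since each centroid is the mean of its cluster, the centroids implied by the assignments in Table II can be computed directly as a worked check. This sketch only averages the reported counts per cluster; it does not re-run WEKA, and `cluster_means` is an illustrative helper name.

```python
# (Male, Female, Cluster) per city/municipality, copied from Table II.
TABLE_II = {
    "Butuan": (60, 4, 1),
    "Cabadbaran": (97, 9, 1),
    "Carmen": (14, 0, 0),
    "Jabonga": (21, 2, 0),
    "Kitcharao": (21, 3, 0),
    "Las Nieves": (18, 2, 0),
    "Magallanes": (28, 1, 0),
    "Nasipit": (65, 5, 1),
    "RTR": (13, 0, 0),
    "San Vicente": (22, 3, 0),
    "Tubay": (21, 3, 0),
}


def cluster_means(rows):
    """Average Male and Female counts for each cluster label."""
    sums = {}
    for male, female, cluster in rows.values():
        s = sums.setdefault(cluster, [0, 0, 0])  # [male total, female total, size]
        s[0] += male
        s[1] += female
        s[2] += 1
    return {c: (m / n, f / n, n) for c, (m, f, n) in sorted(sums.items())}


for cluster, (male, female, size) in cluster_means(TABLE_II).items():
    print(f"Cluster {cluster}: {size} areas, mean male {male:.2f}, mean female {female:.2f}")
# prints:
# Cluster 0: 8 areas, mean male 19.75, mean female 1.75
# Cluster 1: 3 areas, mean male 74.00, mean female 6.00
```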
[Chart: “Total numbers of Probationers from January to June 2021”; series: Male, Female]
Fig. 3: Probationers rates in Agusan del Norte.
Figure 3 showed that among the cities and municipalities of Agusan del Norte, Cabadbaran City had the highest number of probationers, most of them male. On the other hand, Remedios T. Romualdez (RTR) had the lowest number of probationers, with no female probationers.
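The comparisons drawn from Table I in this chapter and in Chapter 5 (highest and lowest totals, male versus female counts, areas with no female probationers) are simple arithmetic. A short sketch that recomputes them, offered as a worked check on the published counts rather than as part of the original analysis:

```python
# (Male, Female) per city/municipality, copied from Table I.
TABLE_I = {
    "Butuan": (60, 4),
    "Cabadbaran": (97, 9),
    "Carmen": (14, 0),
    "Jabonga": (21, 2),
    "Kitcharao": (21, 3),
    "Las Nieves": (18, 2),
    "Magallanes": (28, 1),
    "Nasipit": (65, 5),
    "RTR": (13, 0),
    "San Vicente": (22, 3),
    "Tubay": (21, 3),
}

# Total probationers per area, then the areas with the largest and smallest totals.
totals = {name: male + female for name, (male, female) in TABLE_I.items()}
highest = max(totals, key=totals.get)
lowest = min(totals, key=totals.get)

# Province-wide counts by gender and the per-area male/female comparison.
male_total = sum(m for m, _ in TABLE_I.values())
female_total = sum(f for _, f in TABLE_I.values())
male_majority_everywhere = all(m > f for m, f in TABLE_I.values())
no_female = sorted(name for name, (_, f) in TABLE_I.items() if f == 0)

print(highest, totals[highest])  # Cabadbaran 106
print(lowest, totals[lowest])    # RTR 13
print(male_total, female_total)  # 380 32
print(male_majority_everywhere)  # True
print(no_female)                 # ['Carmen', 'RTR']
```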
[Chart: “Highest numbers of Probationers from January to June 2021”; areas: Butuan, Cabadbaran, Nasipit; series: Male, Female]
Fig. 4: Probationers rates in Nasipit, Cabadbaran, and Butuan from Cluster 1.
Figure 4 showed the municipalities in Cluster 1, which had the highest numbers of probationers of both genders. Cabadbaran City had the highest number, with 97 male and 9 female probationers, followed by Nasipit, Agusan del Norte, with 65 male and 5 female probationers, and Butuan City, with 60 male and 4 female probationers.
[Chart: “Lowest Probationers from January to June 2021”; series: Male, Female]
Fig. 5: Probationers rates in Different Municipalities from Cluster 0.
Figure 5 showed the numbers of probationers in the municipalities of Cluster 0, which had the lowest numbers of probationers in Agusan del Norte. Remedios T. Romualdez (RTR) had the lowest, with 13 male probationers and no female probationers.
CHAPTER 5
SUMMARY, CONCLUSION AND RECOMMENDATIONS
Summary
This research study, entitled “PROBATIONERS CLASSIFICATION IN AGUSAN DEL NORTE USING K-MEANS CLUSTERING ALGORITHM,” was conducted in the municipalities of Agusan del Norte.
This paper clusters the municipalities of Agusan del Norte using the K-Means clustering algorithm in order to find similar characteristics, patterns, and values in the number of probationers serving their sentences in these municipalities from January to June 2021. Results showed that Cabadbaran City had the highest number of probationers, while Remedios T. Romualdez (RTR) had the lowest. The results also showed that male probationers outnumbered female probationers in every municipality of Agusan del Norte.
Conclusion
The K-Means clustering algorithm made it possible to group municipalities with similar traits and values. In Cluster 1, Cabadbaran City had the highest number of probationers from January to June 2021, followed by the Municipality of Nasipit and Butuan City. In Cluster 0, Remedios T. Romualdez (RTR) had the lowest number of probationers, together with the other municipalities of Agusan del Norte identified as having low numbers of probationers.
Male probationers outnumbered female probationers in all the municipalities and cities of Agusan del Norte. Remedios T. Romualdez (RTR) and Carmen, Agusan del Norte had no recorded female probationers from January to June 2021.
Recommendation
Probationers are individuals who committed an offense for the first time, or were adjudicated guilty for the first time, of transgressions of the law that are not considered heinous crimes. This researcher therefore recommends that the government address this issue as quickly as possible to prevent these individuals from committing further crimes and becoming recidivists.
The following programs are recommended because they may have an impact on the lives of probationers even after they have served their sentences:
1. Rehabilitation and Reformation Programs through education and skills-related activities.
2. Livelihood Programs. A record as an offender reduces a person's chances of getting a decent job, no matter how qualified that person may be. A livelihood program that probationers can use in their daily lives can greatly help keep them from committing crimes again.
3. Information drives that help the community understand the lives of probationers, so that it learns to accept them back without judgment and scrutiny.
REFERENCES
Blasko, Brandy L., Jill Viglione, Heather Toronjo, and Faye S. Taxman. 2019.
“Probation Officer–Probation Agency Fit: Understanding Disparities in the Use of
Motivational Interviewing Techniques.” Corrections 4(1):39–57.
Delima, Allemar Jhone P. 2019. “Applying Data Mining Techniques in Predicting
Index and Non-Index Crimes.” International Journal of Machine Learning and
Computing 9(4):533–38.
Gupta, Pranjal, A. Sai Sabitha, Tanupriya Choudhury, and Abhay Bansal. 2018.
“Terrorist Attacks Analysis Using Clustering Algorithm.” Pp. 317–28 in Smart
Computing and Informatics. Springer.
Ishaq, Waqar, and Eliya Buyukkaya. 2017. “Dark Patches in Clustering.” Pp. 806–11
in 2017 International Conference on Computer Science and Engineering
(UBMK). IEEE.
Jain, Vineet, Yogesh Sharma, Ayush Bhatia, and Vaibhav Arora. 2017. “Crime
Prediction Using K-Means Algorithm.” GRD Journals-Global Research and
Development Journal for Engineering 2(5).
Joshi, A., A. S. Sabitha, and T. Choudhury. 2017. “Crime Analysis Using K-Means
Clustering.” Pp. 33–39 in 2017 3rd International Conference on Computational
Intelligence and Networks (CINE).
Lekha, K. Chitra, and S. Prakasam. 2017. “Data Mining Techniques in Detecting and
Predicting Cyber Crimes in Banking Sector.” Pp. 1639–43 in 2017 International
Conference on Energy, Communication, Data Analytics and Soft Computing
(ICECDS). IEEE.
Madni, Hussain Ahmad, Zahid Anwar, and Munam Ali Shah. 2017. “Data Mining
Techniques and Applications—A Decade Review.” Pp. 1–7 in 2017 23rd
International Conference on Automation and Computing (ICAC). IEEE.
Nafie Ali, Faisal Mohammed, and Abdelmoneim Ali Mohamed Hamed. 2018. “Usage
Apriori and Clustering Algorithms in WEKA Tools to Mining Dataset of Traffic
Accidents.” Journal of Information and Telecommunication 2(3):231–45.
Saunders, Jessica, Greg Midgette, Jirka Taylor, and Sara‐Laure Faraji. 2021. “A
Hidden Cost of Convenience: Disparate Impacts of a Program to Reduce
Burden on Probation Officers and Participants.” Criminology & Public Policy
20(1):71–122.
Shirvaikar, Vik, and Choudur Lakshminarayan. 2020. “Social Determinants of
Recidivism: A Machine Learning Solution.” ArXiv Preprint ArXiv:2011.11483.
Thammano, Arit, and Pannee Kesisung. 2013. “Enhancing K-Means Algorithm for
Solving Classification Problems.” Pp. 1652–56 in 2013 IEEE International
Conference on Mechatronics and Automation. IEEE.
Ting, Ming Hwa, Chi Meng Chu, Gerald Zeng, Dongdong Li, and Grace S. Chng.
2018. “Predicting Recidivism among Youth Offenders: Augmenting Professional
Judgement with Machine Learning Algorithms.” Journal of Social Work
18(6):631–49.
APPENDICES
Figures
Fig. 1. Conceptual framework of the study.
Fig. 2: K-Means clustering output in WEKA.
Fig. 3: Probationers rates in Agusan del Norte.
Fig. 4: Probationers rates in Nasipit, Cabadbaran, and Butuan from Cluster 1.
Fig. 5: Probationers rates in Different Municipalities from Cluster 0.
Tables
Table I: Probationers Dataset per City and Municipality in the Province of Agusan del Norte.
Table II: Clustering Result in WEKA.
Gantt Chart
Financial Requirement in Implementing the Research
Item                        Estimated Amount
Rehabilitation Programs         3,000,000.00
Livelihood Programs             3,000,000.00
Trainings and Seminars          1,000,000.00
Estimated Total Amount          7,000,000.00