Uploaded by ASIF KHAN SHAKIR

UpdatesPVA (2)

Executive Summary: Business analysis is the process of applying statistical and logical
techniques to meet business needs and determine solutions to specific problems. This
study uses the BIGPVA dataset for the non-profit organization Paralyzed Veterans of America
(PVA). The analysis looks for the factors that may influence donations in the target
promotion. The BIGPVA dataset, analyzed in SAS, contains data on customers' demographic
information, past donation behavior, and response to a current promotion event. The analysis
uses regression models to identify dependencies among the pieces of customer information.
A regression model relates two types of variables, dependent and independent. In this
analysis, two variables are treated as dependent and the other variables as independent.
Regression was chosen because the goal of the study is to identify the factors that may
influence donations. In addition, correlation was used to measure the relationships among
the variables used in the regression. Based on the regression and correlation results, this
report identifies the factors in the PVA data that can influence donations in the target
promotion.
Introduction:
The purpose of business analysis is to help an organization improve value, manage
organizational change, or carry out strategic planning and policy development. Analysis
helps senior management make the decisions needed to improve the business in line with
market demand. Businesses grow through satisfied customers, so customers are a valuable
segment of any business, and customer data analysis is an important part of business
analysis. This analysis is about customer information for one organization. Paralyzed
Veterans of America (PVA) is a not-for-profit organization that provides programs and services
for US veterans with spinal cord injuries or disease. The organization was founded by service
members who returned from World War II with spinal cord injuries and wanted to live with
independence and dignity, and as contributors to society.
PVA wants insight into the factors that may influence donations in the target
promotion. This study used the BIGPVA dataset from SAS Visual Analytics. The dataset contains
data on customers' demographic information, past donation behavior, and response to a current
promotion event. The organization has more than 13 million donors who donate
regularly. This analysis used regression and correlation to reach its conclusions.
Regression analysis is a statistical process that estimates the relationship or dependency
between a dependent variable and one or more independent variables. The most common
regression model is linear regression, which finds the line that best fits the data according to
the regression equation. This study uses two dependent variables, and the rest of the variables
in the dataset are independent. Each dependent variable is regressed separately on most of the
independent variables. The other technique used in this analysis to measure relationships is
correlation. Finally, the study combines these relationships
and describes the factors that are related to donations.
Background Studies:
Methodology: Analyzing organizational data means searching for a pattern or a solution in a
specific area. Business analysis examines the current position of an organization and supports
the decisions that are required. This kind of analysis involves determining how a business
operates, its customers, key stakeholders, marketing strategies, and production techniques.
Every segment of a business is a valuable part of business analytics, but the customer segment
is the most important. This business analysis worked with customer demographic
information, donation behavior, and related data.
The Paralyzed Veterans of America (PVA) is an American donation-funded organization for
service members with spinal cord injuries. The organization has a large dataset named BIGPVA
in SAS Visual Analytics. The dataset contains both category variables and measure variables.
The analysis excluded two category variables, Control Number and Demographic Cluster.
'Target Gift Amount' represents the donation value, and the other dependent variable,
'Target Gift Flag', represents whether a donation was made. These two variables depend on the
other variables in BIGPVA. A linear regression model was applied to understand the relations
between the variables. Regression is a statistical method used to estimate the dependencies
between variables. This analysis used more than one independent variable for each of the
dependent variables 'Target Gift Amount' and 'Target Gift Flag'. Linear regression works with
continuous dependent variables, and logistic regression works with categorical ones.
Fig 1: Linear regression model sample
Linear regression analysis takes x as the independent variable and Y as the dependent
variable, modeled as a function of x.
$$Y = mx + c \tag{1}$$
Here m is the slope of the line and c is the intercept. Initially, the fitting process takes
the values of m and c to be 0.
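As a minimal sketch of how such a line can be fitted, the Python snippet below starts from m = 0 and c = 0 and adjusts both by gradient descent on the mean squared error. The function name `fit_line`, the learning rate, the number of epochs, and the toy data are all assumptions made for illustration; the report itself does not state which fitting procedure SAS Visual Analytics applies.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit Y = m*x + c by gradient descent on the mean squared error."""
    m, c = 0.0, 0.0  # start from zero, as described in the text
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current line against the observed values
        errors = [(m * x + c) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to m and c
        grad_m = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_c = (2.0 / n) * sum(errors)
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


# Hypothetical toy data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(round(m, 4), round(c, 4))
```

On this toy data the fitted slope and intercept approach 2 and 1. Real donation data would need a learning rate chosen to suit the scale of the inputs.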
Fig 2: Logistic regression sample structure
Logistic regression works for categorical data. First, it fits a line as in linear regression.
It then applies the sigmoid function to map the predictions to probabilities, producing a number
between 0 and 1, as shown in Fig 2.
$$S(x) = \frac{1}{1 + e^{-x}} \tag{2}$$
The sigmoid function lets the logistic model predict a fractional value between 0 and 1, which
is then assigned to the closer class, 0 or 1.
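The mapping from a linear score to a class label can be sketched in a few lines of Python. The helper name `predict_flag` and the 0.5 cut-off are assumptions for illustration, not settings taken from the report.

```python
import math


def sigmoid(x):
    """S(x) = 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_flag(score, threshold=0.5):
    """Map a linear score to class 1 or 0 (threshold is an assumed default)."""
    return 1 if sigmoid(score) >= threshold else 0


print(sigmoid(0.0))        # the midpoint of the curve
print(predict_flag(2.0))   # positive score, assigned to class 1
print(predict_flag(-2.0))  # negative score, assigned to class 0
```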
In this PVA analysis, two measure variables serve as the dependent variables. The regression
model is run on the measure variables 'Target Gift Amount' and 'Target Gift Flag' to predict
them. This study therefore uses linear regression to predict both the donation value and the
decision to donate. The analysis combines linear regression with more than one independent
variable and correlation analysis to examine the relations. Almost all the BIGPVA variables,
except Control Number and Demographic Cluster, were used in this analysis.
Result Analysis: Data analysis is defined as the process of processing, cleaning,
transforming, and modeling data. The analysis mainly uncovers possible paths for critical
business decisions [15]. It helps senior management understand how the data actually behaves.
Data analysis sometimes relies on machine learning to build a model in the final steps
of the process. Many machine learning models are used in data analysis. Among them,
regression is very common and useful for decision-making problems. A regression model can
predict an output from the training data because it captures a relation between
variables, where one is dependent and the others are independent. This analysis uses a multiple
regression model, which predicts a dependent variable from more than one independent
variable.
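To make the idea of multiple regression concrete, the sketch below fits an intercept and two coefficients by solving the least-squares normal equations in plain Python. The function names, the two predictor columns, and the donor values are hypothetical and are not taken from BIGPVA; the study itself ran its regressions in SAS Visual Analytics.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_multiple_regression(rows, targets):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical donors: (average past gift, last gift) -> target gift amount
rows = [(10.0, 12.0), (20.0, 18.0), (15.0, 25.0), (30.0, 28.0), (25.0, 20.0)]
targets = [1.0 + 0.5 * a + 0.25 * b for a, b in rows]
coefs = fit_multiple_regression(rows, targets)
print([round(v, 4) for v in coefs])
```

Because the toy targets are generated from known coefficients (1, 0.5, 0.25), the fit recovers them, which is a quick way to check the procedure before applying it to real data.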
This study works with the BIGPVA dataset for Paralyzed Veterans of America (PVA) [16].
The dataset contains demographic and donation information about the customers, including
donation amounts for customers of different ages and genders. The 'Target Gift Amount'
column shows the value of the donation, and the 'Target Gift Flag' column shows whether a
donation was made. The regression model shows the factors that influence
customers to donate and so help meet the target promotion.
Fig 2: Frequency of Target Gift Amount.
Most customers donate around 20-30 dollars. This diagram shows the
number of customers for each donation amount. This variable helps PVA understand
the typical donation value.
Fig 3: Target Gift Flag frequency of donation and non-donation customers.
The Target Gift Flag is '1' when a donation was made and '0' when it was not. The figure
shows two bars: the first represents customers who did not donate and the second represents
customers who did. The numbers of donating and non-donating customers are approximately
equal.
Customers are also categorized by the 'Status Category 96NK' variable. It gives a
basic idea of a customer's donation status, such as golden, silver, and others. Golden means
customers who regularly donate large amounts. Silver means customers who donate less than
golden customers but still a substantial total amount.
Fig 4: Status Category, showing the classification of customers.
The A bar contains the highest number of customers, the S bar contains the second-highest, and
so on. That means many customers fall into the golden and silver categories, and so on.
The first regression analysis takes Target Gift Amount as the dependent variable and
almost all other variables of the BIGPVA dataset as independent variables.
Fig 5: Linear regression analysis for target gift amount variable
This graph shows the linear regression result for Target Gift Amount against all measure
variables and some category variables, since the dependent variable depends on the independent
variables. The regression shows the dependencies between the variables. The third
section of Figure 5 shows the regression graph, where the blue line is the predicted average
and the yellow line is the observed average. Here the two lines run parallel, which
indicates a well-fitting model. The model is reported with a mean squared error of 0.
An error of zero means the regression model performs well, that is, its predictions
almost match the actual output.
Next, the other dependent variable, Target Gift Flag, is run through
the linear regression process with the other variables as independent variables.
Here, the second linear regression model was run for the Target Gift Flag variable.
It also takes almost all variables of the BIGPVA dataset as independent variables and excludes
the two category variables mentioned earlier, as required by PVA.
Fig 6: Linear regression for Target Gift Flag, which indicates whether a donation was made.
This linear regression used most of the variables, focusing on the factors that can influence
the donation. The graph helps PVA understand whether a donation was made.
In the third section of the figure above, the blue line represents the
predicted average and the yellow line represents the observed average. Here the two lines
run parallel, which indicates a well-fitting model, with a reported Mean Squared Error of 0.
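The mean squared error cited for both regressions is the average squared gap between observed and predicted values. The short Python sketch below shows how the measure behaves; the donation values are made up for illustration and are not taken from BIGPVA.

```python
def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical donation amounts
observed = [20.0, 25.0, 0.0, 30.0]
perfect = [20.0, 25.0, 0.0, 30.0]     # predictions that match exactly
off_by_two = [22.0, 23.0, 2.0, 28.0]  # each prediction misses by 2 dollars

print(mean_squared_error(observed, perfect))     # 0.0
print(mean_squared_error(observed, off_by_two))  # 4.0
```

An error of exactly 0 only occurs when every prediction equals its observed value, so a value that small is worth double-checking against the predictors included in the model.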
Fig 7: Correlations between the two target variables and the other variables.
The figure shows the relations between the variables, distinguishing weak from strong ones.
The value of the donation and the decision to donate matter most for reaching the target
promotion, so this analysis uses the relations between the variables to identify which
factors in the PVA data the donation value and the donation decision depend on. The
figure above shows that Target Gift Amount has a strong relation with the variables 'Gift
Amount Average 36 Months', 'Gift Amount Average All Months', 'Gift Amount Average Card 36
Months', 'Gift Amount Last', and 'Target Gift Amount with Zero'. These are the key factors
related to the donation value. On the other hand, Target Gift Flag has a strong relationship
only with the 'Target Gift Amount with Zero' variable, and its other relations are weak. This
study therefore concludes that these variables are the factors in the PVA data that have
strong relationships with the two target variables.
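The report does not state which correlation coefficient the tool computed. Assuming the common Pearson coefficient, the strength of a relation between two measures can be sketched in Python as follows; the function name `pearson` and the sample values are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: the second series doubles the first (strong relation),
# while the third has no linear relation to the first
base = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]
unrelated = [1.0, -1.0, -1.0, 1.0]

print(pearson(base, doubled))
print(pearson(base, unrelated))
```

Values near 1 or -1 indicate a strong linear relation, and values near 0 indicate a weak one.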
Conclusion: Data mining is a process organizations use to transform raw data
into information that supports decision-making, here for the organization PVA. This analysis
works with data from PVA, which collects donations for veterans with spinal cord injuries in
America. PVA wants insight into its data because it wants to influence the donation value in
the target promotion. This analysis therefore shows the dependencies between the variables,
where the donation value depends on the other variables. The analysis used linear regression
to predict the donation value and whether a donation was made. Both regressions, which use
almost all of the BIGPVA data, show good predictions. The study then uses
correlation to identify the relations between the variables, because PVA wants to know the
factors that may influence the donation value. Finally, PVA needs to focus on the strong
relationships found in the correlation analysis to influence donation value in the target
promotion.