CALIFORNIA SCHOOL BOARDS ASSOCIATION BUSINESS INTELLIGENCE PROJECT
Muhammed Ali-Can Davutoglu
B.S., Kocaeli University, Turkey, 2004
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
BUSINESS ADMINISTRATION
(Management Information Systems)
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
SUMMER
2010
CALIFORNIA SCHOOL BOARDS ASSOCIATION BUSINESS INTELLIGENCE PROJECT
A Project
by
Muhammed Ali-Can Davutoglu
Approved by:
__________________________________, Committee Chair
Beom-Jin Choi, Ph.D.
____________________________
Date
Student: Muhammed Ali-Can Davutoglu
I certify that this student has met the requirements for format contained in the University format
manual, and that this project is suitable for shelving in the Library and credit is to be awarded for
the Project.
__________________________, Graduate Coordinator
Monica Lam, Ph.D.
Department of Business Administration
________________
Date
Abstract
of
CALIFORNIA SCHOOL BOARDS ASSOCIATION BUSINESS INTELLIGENCE PROJECT
by
Muhammed Ali-Can Davutoglu
Statement of Problem
The California School Boards Association (CSBA) requires a Business Intelligence system
that can process the data provided by the California Department of Education. Due to limited
resources, CSBA cannot work with advanced data-processing vendors such as SAP
or Oracle. The project's goal is to structure a Business Intelligence foundation with Microsoft
SQL technology.
Sources of Data
The CSBA information technology department was the source of data. It provided the
necessary technical environment and information for this project. The physical data that this
project utilized is available on the California Department of Education's website.
Conclusions Reached
The needs of the organization always take priority. A strong understanding of the
business process will improve the effectiveness of the Business Intelligence foundation. The
master data in this project was complex, random, and disorganized. The results of the project are
applicable to CSBA. According to CSBA, this project reached its objective.
_______________________, Committee Chair
Beom-Jin Choi, Ph.D.
Department of Business Administration
_______________________
Date
TABLE OF CONTENTS
List of Tables
List of Figures
Chapter
1. INTRODUCTION
   Master Data
   Organization Needs
   Technology
2. THE PROJECT
   Online Analytical Processing (OLAP)
   Online Transaction Processing (OLTP)
   Iterative and Incremental Development
   Data Normalization
   Details of Master Data
3. CREATING A BUSINESS INTELLIGENCE STRUCTURE
   Filtering the Master Data and Renaming the Columns
   Altering the Data Type of Columns
   Creating the InfoCubes and Dimensions
4. CONCLUDING REMARKS
   Adventure Works
   SQL Server Technology
   SQL Server Integration Services & SQL Server Analysis Services
Appendix
Bibliography
LIST OF TABLES
Table 1 A Simple Hypothetical Database
Table 2 Simple Database with three years of information
Table 3 Merge Join data after renaming columns
Table 4 Result of Union All function at SSIS
LIST OF FIGURES
Figure 1 Star Schema of AdventureWorks DW
Figure 2 OLTP vs. OLAP
Figure 3 Iterative Development
Figure 4 Creating a new SSIS Package
Figure 5 Data Flow Task at SSIS Package
Figure 6 Adding OLE DB Source at SSIS Package
Figure 7 Creating a new Connection Manager for SSIS
Figure 8 Creating a new connection manager for Visual FoxPro files located on the server
Figure 9 Creating a connection manager for the destination source
Figure 10 Assigning the source to the Object
Figure 11 Code Page Warning at Data Load
Figure 12 “Always Use The Default Code Page” setting at False
Figure 13 Run 64 Bit Runtime
Figure 14 Option to remove any duplicate data from the master data
Figure 15 Input Column names vs. Output Column names
Figure 16 Merge Join for all the API Base databases
Figure 17 Normalized data route
Figure 18 Result of Merging
Figure 19 Derived Column Error
Figure 20 Data Conversion control panel
Figure 21 Data Conversion Errors
Figure 22 Error Fix for Data Conversion
Figure 23 Aggregate Tool
Figure 24 Aggregate destination
Figure 25 The Final ETL Process
Figure 26 Star Schema for our Sample
Figure 27 Primary and Foreign keys
Figure 28 Analysis Service Project
Figure 29 Adding new data source to the ASP
Figure 30 New Data Source view
Figure 31 InfoCube wizard
Figure 32 Default dimension and completing the InfoCube wizard
Figure 33 Browsing an InfoCube
Figure 34 Creating a Data Mining Structure
Figure 35 Deciding columns that are going to be used in data mining
Figure 36 Final step for creating the Data Mining process
Figure 37 Data Mining Tree
Figure 38 Data Mining Generic Tree viewer
Figure 39 64-Bit runtime error
Figure 40 An overview of the SSIS architecture
Chapter 1
INTRODUCTION
The purpose of this project is to evaluate, study, and create a Business Intelligence (BI)
foundation for the California School Boards Association (CSBA). This project used master data
from the California Department of Education (CDE), together with Structured Query Language (SQL)
and Visual Studio technology, according to the needs of CSBA.
Master Data
CSBA wants to use three different master data sources. Two of these master data sets are
publicly accessible through the CDE website: the Adequate Yearly Progress (AYP)
reports and the Academic Performance Index (API) reports. The third is the Integrated Mobile
Information System (IMIS) reports, which had restricted access due to confidential
information.
AYP is a series of annual academic performance goals established for each school, Local
Education Agency (LEA), and the state as a whole. Schools, LEAs, and the state are evaluated
against AYP to see whether they meet or exceed each year's goals (AYP targets and criteria).
API is a single number, ranging from a low of 200 to a high of 1000, that reflects a school's,
an LEA's, or a subgroup's performance level based on the results of statewide testing. Its
purpose is to measure the academic performance and growth of schools. Both of these data sets are
crucially important to the CSBA organization. Despite their different functions, both data sets have
one common column: the County/District/School (CDS) code. This column is a significant element of
this project, and it serves as the primary key.
Organization Needs
Interviews of CSBA staff were conducted to gain a better understanding of the
organization's needs. The interviews were conducted with the following staff members: Mr. Devin
Crosby, Principal Director of Information Technologies; Mr. Irbanjit Sahota, Manager of
Software Development; and Mrs. Brittany Ridley, Public Information Officer. Mr. Crosby
arranged all the interviews and was present at all of them. The first interview was with Mrs.
Ridley in September 2009. The intention of the interview with Mrs. Ridley was to find out if the
master data could be limited to certain related data sets. However, Mrs. Ridley explained the
necessity of the CDE databases for CSBA and stressed the need for all information in these
databases. The interviews with Mrs. Ridley defined the boundaries of the master data sets. The second
interview was conducted with Mr. Sahota, again in September 2009. Mr. Sahota is one of the
managers responsible for database development at CSBA. He assisted in drawing the
blueprints of the data warehouses. The final interview was conducted with Mr. Crosby, who
summarized the five-year goal of CSBA and explained the reasons why they needed a Business
Intelligence foundation for their organization. This final interview was conducted in April 2010. All
the interviews were open-ended. The results from each interview were carried over and
shared during the next interview in order to create an efficient balance. Mrs. Ridley provided the
information about the required data, and Mr. Sahota helped design the data warehouses in order
to decrease the difficulty of data normalization.
CSBA's goal is to set the direction of the organization based on the trends that drive their
districts. CSBA will use the analysis of the master data to provide accurate advice to
governing school boards at the district level. For example, schools have program improvement
situations and face potential sanctions from the state. CSBA needs to look at key performance
indicators, and the data that goes with those indicators, to see what might be the causes of a specific
program improvement situation. This is just one example. CSBA also has a need to evaluate
the demographics of how Asian Pacific, Hispanic, African American, or Caucasian students
perform in the same socioeconomic area, as well as how gender or class may affect
education. Based on the findings, CSBA may recommend effective programs for school districts
to assist disadvantaged groups in achieving better educational outcomes. In addition, CSBA
would like to map the master data to the CENSUS data. CSBA is going to utilize this
information for analyzing classic wealth issues versus overall score issues.
Currently the master data is scattered across one hundred eight databases throughout the
California Department of Education, as well as several other databases at CSBA. The majority of this
data resides in the CSBA core enterprise system, with a few data sets residing in external systems.
Technology
A system that can analyze the master data from previous years is not available to
CSBA in a useful format. The current practice is to manually visit CDE's website, download the data
into their SQL database, and try to create reports according to end-user needs. This
BI project is very important, and it is part of CSBA's strategic planning process as a primary goal
toward becoming a data-driven organization. It should also be noted that, due to the limited budget at
CSBA, acquisition of SAP, IBM Cognos, and/or Oracle Business Intelligence packages is not
feasible. Therefore, the best option for meeting their need is the use of Microsoft Business
Intelligence, SQL Server Management, SQL Server Integration Services (SSIS), and SQL Server
Analysis Services (SSAS). SSIS and SSAS are Visual Studio Business Intelligence templates
with drag-and-drop structures and other custom settings, and together they form a powerful toolset
for a small organization.
At the beginning of this project, I did not possess any significant experience with the Microsoft
Business Intelligence suite. There is valuable information in this project that may not be readily
available in other user guides on the market. In the subsequent chapters, I will present this
knowledge for creating a Business Intelligence foundation with SQL Server
Management and Microsoft Visual Studio.
Chapter 2
THE PROJECT
The reader of this study must be familiar with the basics of Business Intelligence.
Enterprise users may utilize Business Intelligence for gathering, storing, analyzing, and
providing access to a broad category of applications for making better business decisions. BI
applications include the activities of decision support systems, query and reporting, online
analytical processing (OLAP), statistical analysis, forecasting, and data mining. This project will
focus on gathering the master data and using this data to develop a foundation for a Data Mining
environment.
The BI team must start with a complete understanding of the organization's business
concept and the needs of the associated users. This chapter summarizes a few of these important
concepts. [11]
Online Analytical Processing (OLAP)
OLAP is a method for organizing data into multi-dimensional cubes of data. Relatively
low volumes of transactions characterize OLAP. Queries are often very complex and involve
aggregations. For OLAP systems, response time is an effectiveness measure. OLAP
applications are widely used by Data Mining techniques. The recent best practice in OLAP
technology is using a star schema to store aggregated and historical data in a multi-dimensional
schema. This project utilizes a star schema; however, there are other types of schemas as well.
The reason this project uses a star schema is best explained by reference to Ralph Kimball, the creator
of modern data warehousing and business intelligence. According to a March 2004 article by Mr. Tim
Martyn of Rensselaer at Hartford, there are two significant advantages of a
star schema. First, a design with de-normalized tables encounters fewer join operations. Second,
most optimizers are smart enough to recognize a star schema and generate access plans that use
efficient "star join" operations. [7]
Figure 1 Star Schema of AdventureWorks DW
Figure 1 illustrates a star schema example from Adventure Works DW. Adventure
Works DW is a sample data warehousing and business intelligence project created by
Microsoft for educational purposes. Chapter 4 will elaborate more on the advantages and
disadvantages of the Adventure Works DW sample data.
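As an illustration of such a star join, here is a minimal T-SQL sketch against the Adventure Works DW sample shown in Figure 1; it is not part of the original project, and the table and column names follow Microsoft's AdventureWorksDW 2008 sample, so they may differ between versions.

-- A star join: one fact table joined to two dimension tables,
-- with the measure aggregated by dimension attributes.
SELECT d.CalendarYear,
       p.EnglishProductName,
       SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactInternetSales AS f
JOIN dbo.DimDate    AS d ON f.OrderDateKey = d.DateKey
JOIN dbo.DimProduct AS p ON f.ProductKey   = p.ProductKey
GROUP BY d.CalendarYear, p.EnglishProductName;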
At the core of the dimension tables, there is a fact table. Dimensions and fact tables are
two of the key elements of business intelligence. According to Ralph Kimball's Fact Table
article from 2008, fact tables are the foundation of the data warehouse: they contain the
fundamental measurements of the enterprise, and they are the ultimate target of most data
warehouse queries. [6] Chapter 3 will explain these important elements in detail.
Online Transaction Processing (OLTP)
Online Transaction Processing covers information and data transactions in networks,
businesses, and organizations. OLTP starts with data entry. Figure 2 illustrates the difference
between OLTP and OLAP.
Figure 2 OLTP vs. OLAP
It is important to understand these two concepts before proceeding to the technical
application of the system. In this project, OLTP encompasses the operation of CSBA, the location of
the data, the need for that data, and the required results. OLAP is the Business Intelligence
foundation that this project intends to create.
Iterative and Incremental Development
Iterative and Incremental development is a cyclic software development process. The
input and output of this cycle are the Planning process and the Deployment process. Other elements of
this cycle are the Requirements, Analysis & Design, Testing, and Evaluation processes. This
development life cycle is the basis of Agile methodology. Figure 3 illustrates the deployment
cycle for this project.
Figure 3 Iterative Development (Planning, Requirements, Analysis, Design, Implementation, Testing, Deployment)
CSBA has the necessary enterprise environment to deploy a BI system; however, the tools
required to create such a BI project were limited by the budget of the department.
Therefore, the tools could not be complex, and thus the design of the foundation had to be as simple
as possible. CSBA needed every bit of data for their analysis reports. Based on the conducted
interviews, a data schema of the data groups was created. These data groups included different
sets of related data columns. At the time of writing of this project, the BI environment was
residing in the buffer area and had not been deployed to the CSBA SQL system. Currently, this
project is in a loop between design and testing.
Data Normalization
Data Normalization is crucial to the success of Data Warehousing and Business
Intelligence projects. Data Normalization consists of steps that clean the data structure of
duplication and unwanted characters, and that update, filter, sort, and prepare the data for the BI
environment. The API and AYP databases are large and extremely unorganized. The data was
collected between 2000 and 2009. Each database from every year has between eight and two
hundred fifty columns to process. If the column names and data types were stated correctly, the
amount of data would not cause any problems. This project will utilize a simple hypothetical
database with First Name, Last Name, Social Security, Address, Age, Gender, and Income
columns to explain the challenge.
Social Security  First Name  Last Name   Gender  Age  Address   Income
0001             John        Doe         M       32   New York  72000
0002             Jane        Smith       F       28   Folsom    80000
0003             Fred        Flintstone  M       55   Rockbed   55000
Table 1 A Simple Hypothetical Database
As illustrated by Table 1, the Social Security column is an integer. The Social Security column is
fit to be the primary key, unless there are duplications in the database. There were massive
duplications in the AYP and API databases. These duplications were only visible in SQL
Server Management Studio. This challenge required filtering of the data.
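As a minimal sketch of how such duplications can be detected, the following T-SQL query counts the rows per CDS code in a staging table; any count above one indicates a duplicate. The table name ApiBase_2008 is hypothetical, standing in for one year of imported API data.

-- Find CDS codes that appear more than once in a staging table.
SELECT cds, COUNT(*) AS row_count
FROM dbo.ApiBase_2008
GROUP BY cds
HAVING COUNT(*) > 1
ORDER BY row_count DESC;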
Details of Master Data
The master data for this project is available to download from the website of the California
Department of Education. The complete data spans the years 2000 to 2009. In the appendix, a
sample of the record layout for the AYP and API 2009 databases is included. This record layout
changed during the last ten years, and this is the latest form. During the SSIS merging process,
matching columns correctly between different years, despite the different column naming, is very
important.
The column naming of the master data is unclear, disorganized, and almost identical across
data sets. There were no indications of the year inside individual databases. This caused a
problem when comparing data values between different databases, and it could result in
overriding the information from previous years with the recent ones. The challenge occurred
during the merging process of individual years. Since the names of the columns were identical, a
problem was inevitable. In such a case, the master data needs to be simplified as much as possible,
similar to the previous simple table with Social Security, first and last name, etc. If an
organization merges data from previous years with no indication of the year the data
belongs to, conflicts will result. These databases may merge; however, there will be duplications
of the data. As illustrated by Table 2, the sample database has duplicate data. In this case, it is
impossible for a primary key to work properly.
Social Security  First Name  Last Name  Gender  Age  Address      Income
0001             John        Doe        M       32   New York     72000
0001             John        Doe        M       33   New York     75000
0001             John        Doe        M       34   New Jersey   82000
0002             Jane        Smith      F       28   Folsom       80000
0002             Jane        Smith      F       29   El Dorado H  80000
0002             Jane        Smith      F       30   Folsom       79000
Table 2 Simple Database with three years of information
As a reminder, the CDS code acts the same as the Social Security number in the sample. In some cases,
SQL Server Management Studio accepts a primary key even with duplicate data; however, later on in
the SSAS process, an error message will appear.
In this project, the column names were updated according to year. For example, the Value_A
column name from the API Base data for year 2008 was changed to Value_A_Api_Base_08. Table 3
illustrates the final result. The merging process after renaming the columns did not produce
duplicate data.
Social Security  Address_08   Address_09  Income_08  Income_09
0001             New York     New Jersey  75000      82000
0002             El Dorado H  Folsom      80000      79000
Table 3 Merge Join data after renaming columns
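A minimal T-SQL sketch of the same renaming idea, assuming hypothetical yearly staging tables ApiBase_2008 and ApiBase_2009: each year's columns are aliased with a year suffix during extraction, so a later merge on the CDS code can produce one row per school, as in Table 3.

-- Year-suffix the columns of each yearly table before merging,
-- mirroring the Value_A -> Value_A_Api_Base_08 rename described above.
SELECT cds,
       value_a AS Value_A_Api_Base_08,
       value_b AS Value_B_Api_Base_08
FROM dbo.ApiBase_2008;

SELECT cds,
       value_a AS Value_A_Api_Base_09,
       value_b AS Value_B_Api_Base_09
FROM dbo.ApiBase_2009;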
Chapter 3
CREATING A BUSINESS INTELLIGENCE STRUCTURE
This chapter will focus on the challenges of creating a Business Intelligence structure. The main
challenges are:
a. Duplication in the master data, which required filtering and sorting
b. Column names were not organized, which required renaming
c. Data types of the columns were not associated with the actual type of the data; for
example, numeric data was given a character data type
This project utilizes sample portions of the data, rather than including all of the master
data. This allows a better illustration of the results. Using sample data also reduces the
processing time in Visual Studio. The merging process for the entire AYP database would take up to
fifty minutes each time. The sample was taken from the API Base 2007, 2008, and 2009 databases.
The API databases comprise two separate sets: Base and Growth. The API
Base data has the values from the beginning of the year, and the API Growth data has the values from
the end of the year. CSBA needs to compare these two sets of data. These databases are
downloadable from the CDE website in Excel and FoxPro formats. The following sections will explain
the filtering, renaming, and aggregating of the master data.
Filtering the Master Data and Renaming the Columns
The Visual Studio 2008 Business Intelligence program offers a SQL Server Integration Services
template. The process begins with creating a new package for our three sample master data sets.
Figure 4 Creating a new SSIS Package
Figure 4 also illustrates other SSIS packages related to this project.
500_Project_Data_Normalization.dtsx is the SSIS package used for illustrations. Adding the
Data Flow Task was the next step in the process. The function of the Data Flow Task is to provide
the environment for data extraction, transformation, and loading.
Figure 5 Data Flow Task at SSIS Package
The Data Flow Task is where the data warehouses are created. Figure 5 illustrates the inside of the
Data Flow Task. An Object Linking and Embedding Database (OLE DB) source object is required to
create the data source. The flat-file master data needs to be linked to these OLE DB sources via a
connection manager. Connection managers are necessary for data sources: they specify the path
between Visual Studio and the physical location of the source data on the server. Without connection
managers, these objects are useless.
Figure 6 Adding OLE DB Source at SSIS Package
In order to create a connection manager, right click on the "Connection Managers" space
below the main design area. By default, this area is available to the user. As illustrated by
Figure 7, open the control panel for creating a new connection by clicking on "New OLE DB
Connection". Figures 8 and 9 illustrate the settings specific to Visual FoxPro databases located
on the server. This connection manager allows the user to use the API data in the OLE DB
sources.
Figure 7 Creating a new Connection Manager for SSIS
Figure 8 Creating a new connection manager for Visual FoxPro files located on the server
Figure 9 Creating a connection manager for the destination source
The next step was to assign the OLE DB source objects to their sources using the connection
manager. A double click on an OLE DB object opens its control panel. In this control panel,
the connection manager and the desired master database source have to be selected. This project
used the API databases connection manager and "Apibase_2007" for that specific object.
Figure 10 Assigning the source to the Object
By default, the Visual Studio setting to use the default code page is off. As a result, once
the source is assigned to an OLE DB object, a warning message is presented. This warning
may be bypassed with a simple setting customization.
Figure 11 Code Page Warning at Data Load
The function of the code page in this application is to convert the data to
Unicode. The fix is located in the properties of the OLE DB data object: change the
setting "Always Use The Default Code Page" from False to True.
Figure 12 “Always Use The Default Code Page setting” at False
Another note for users of a 64-bit operating system: make sure the
Visual Studio debugging setting "Run64BitRuntime" is set to "False". This setting can be
found in the properties of the SSIS project under the Debugging section. Chapter 4 has more
information about SQL technology.
Figure 13 Run 64 Bit Runtime
After customization of this setting, the OLE DB source databases are ready for sorting and
filtering. The Visual Studio 2008 BI project offers a tool called "Sort". The purpose of this tool is to
sort, rename, and remove duplicates. It was a very helpful tool; yet it does not allow massive
editing of columns. In the case of this project, renaming almost every single column from the API
Base, API Growth, and AYP master data was necessary. In total, there were about four thousand
columns. Figures 14 and 15 illustrate the control panel of the Sort tool. The sample presented in
this project has one hundred fifty-seven columns. There is a check box at the bottom of the Sort
tool for removing rows with duplicate sort values. Once it is checked, this tool will not allow
duplicate data to pass to the output. This tool will not remove the duplicate data in the original
flat file.
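The Sort tool's duplicate removal can be approximated in plain T-SQL. A sketch, again using the hypothetical ApiBase_2008 staging table: ROW_NUMBER numbers the rows that share a CDS code, and only the first row of each group is kept.

-- Keep one row per CDS code, discarding duplicates, similar to the
-- Sort tool's "remove rows with duplicate sort values" option.
WITH numbered AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY cds ORDER BY cds) AS rn
    FROM dbo.ApiBase_2008
)
SELECT *
FROM numbered
WHERE rn = 1;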
Figure 14 Option to remove any duplicate data from the master data
Figure 15 Input Column names vs. Output Column names
The next process is merging the clean, sorted, and filtered data. There are three different
merging tools available in the SSIS toolbox: Merge, Merge Join, and Union All. The necessary tool
is "Merge Join"; the two other methods did not provide the desired results.
The Merge tool combines two databases without any common column. It is best
suited for combining unrelated databases; however, it was not the tool for this project. The Union All
tool is a powerful tool that can combine several databases; however, the way it merges
databases creates duplicate rows. It adds new rows with the same CDS code instead of adding new
columns to the existing CDS code.
CDS Code  Value A 08  Value B 08  Value A 09  Value B 09
0001      123         456         NULL        NULL
0001      NULL        NULL        34          89
0002      53          90          NULL        NULL
0002      NULL        NULL        72          44
Table 4 Result of Union All function at SSIS
The best method was to Merge Join the databases. This method uses a join key; in this
case, the join key was the CDS code. This tool allows the user to leave out any data column and
merge the rest, keeping the merged data intact using the join key. The only disadvantage is that
Merge Join can only merge a maximum of two databases. The process utilized for this project
was to apply Merge Join to two databases at a time and repeat the Merge Join process
with the outcomes of the earlier merges. Figure 16 illustrates the structure of the original Merge Join
process.
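In T-SQL terms, each Merge Join on the CDS code behaves like a join of two year tables, and the pairwise repetition corresponds to chaining joins. A sketch using the hypothetical staging tables from earlier, assuming the year-suffixed column names from the renaming step; a full outer join is shown so that schools present in only some years are kept, though an SSIS Merge Join can also be configured as an inner or left join.

-- Pairwise merge of three yearly tables on the CDS join key.
SELECT COALESCE(b07.cds, b08.cds, b09.cds) AS cds,
       b07.Value_A_Api_Base_07,
       b08.Value_A_Api_Base_08,
       b09.Value_A_Api_Base_09
FROM dbo.ApiBase_2007 AS b07
FULL OUTER JOIN dbo.ApiBase_2008 AS b08
    ON b07.cds = b08.cds
FULL OUTER JOIN dbo.ApiBase_2009 AS b09
    ON COALESCE(b07.cds, b08.cds) = b09.cds;  -- COALESCE handles rows missing from 2008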
Figure 16 Merge Join for all the API Base databases
It needs to be noted that after the very last Merge Join there is a Union All tool. This is
simply good practice. Depending on the size of the database, each of these Merge Join processes
takes about three to five minutes; the final Merge Join took between forty-five and fifty minutes to
complete. This process repeats every time the user clicks the OK button. The Union
All tool allows the administrator to change the column names even after merging, so the user
does not need to go through the time-consuming merging process again. The sample Merge Join took
about five minutes to complete. The resulting data has been normalized and filtered, and is ready for
aggregation.
Figure 17 Normalized data route
Figure 18 Result of Merging
Altering the Data Type of Columns
At this point, the data should be ready for altering. The data type is important and
will be crucial during Analysis Services: SSAS did not allow me to create information cubes
(InfoCubes) without a numeric data type, such as integer. The master data included a significant
amount of numeric information, such as target base, growth base, and the number of students from
different ethnicities. All this information was in VarChar format, which is a character data type.
InfoCubes in the SSAS package need numeric data types to create measurements,
calculate statistical formulas, and process decision trees. This numeric data is called the
"measurements of the cube". If there were no numeric data, there would be nothing to compare.
Ms. Brigette Bucke, Database Director of the Information Resources and Technology (IRT)
Department at Sacramento State University, was very helpful in explaining the structure of an
InfoCube. She underlined that data warehousing is the concept and data mining is how that
concept is used. She stated that the data is actually stored inside the InfoCube under the security
layers. Hence, extracting, transforming, and loading the data correctly is very important to the
structure of an InfoCube.
There are two tools to change the data type of a column: "Derived
Column" and "Data Conversion". The Derived Column tool updates column values with
expressions. Although this tool was very effective for creating new individual columns, it was
unable to alter the data type of existing columns. Any attempt to alter the destination database
table with the correct data type failed with a pipeline error indicating a data type
mismatch between the source and destination. There were no resources indicating the reason for
this error.
Figure 19 Derived Column Error
The second tool is Data Conversion. The purpose of this tool is to create a new copy of
a column with a new data type. Although it is not the most efficient way, it did serve to solve the
problem. Depending on the number of columns, this tool may require a lot of manual
modification work. Figure 20 illustrates the control panel of the Data Conversion tool. Here it was
applied to the enrollment percentages from grade 2 through grade 11 for the years 2007, 2008, and
2009. The Data Conversion tool automatically creates a copy of the chosen column with the default
data type of the source data; a drop-down box enables changing the data type of the copy.
Figure 20 Data Conversion control panel
When the SSIS package runs, this tool will indicate an error in order to protect
the original data. The SSIS user must bypass the safety options in order for this tool to work. Figure
21 shows the safety errors.
[Data Conversion [13832]] Error: Data conversion failed while converting column
"pen_2_api_base_08" (5694) to column "Copy of pen_2_api_base_08" (15217). The conversion
returned status value 2 and status text "The value could not be converted because of a potential loss
of data.".
[Data Conversion [13832]] Error: SSIS Error Code
DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "Copy of
pen_2_api_base_08" (15217)" failed because error code 0xC020907F occurred, and the error row
disposition on "output column "Copy of pen_2_api_base_08" (15217)" specifies failure on error.
An error occurred on the specified object of the specified component. There may be error
messages posted before this with more information about the failure.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput
method on component "Data Conversion" (13832) failed with error code 0xC0209029 while
processing input "Data Conversion Input" (13833). The identified component returned an error
from the ProcessInput method. The error is specific to the component, but the error is fatal and will
cause the Data Flow task to stop running. There may be error messages posted before this with
more information about the failure.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput
method on component "Merge Join 2007 & 2008 & 2009" (4335) failed with error code
0xC0047020 while processing input "Merge Join Left Input" (4336). The identified component
returned an error from the ProcessInput method. The error is specific to the component, but the
error is fatal and will cause the Data Flow task to stop running. There may be error messages
posted before this with more information about the failure.
Figure 21 Data Conversion Errors
Analyzing these errors took a lot of time. Even though they seem meaningless, there are
clues indicating how to work around them. The first error, which states that the value
conversion failed because of a potential loss of data, focuses on a single column: Copy of
pen_2_api_base_08. This means Data Conversion receives the data but refuses to change the
data type, and the error repeats for each column. The second error tells us that the data conversion
cannot be completed because the settings for that column specify failure on error. This is
the first error causing a domino effect: after a certain number of failures, SSIS breaks the
process. The second error also contains a tip: if the problem occurs at a certain column, look for
access to the settings of that column and any other columns that may fail. Right-clicking the Data
Conversion tool gives a menu where access to the Advanced Editor is located. From this
editor, locate the properties of each column. There are two error-handling settings: Error Row
Disposition and Truncation Row Disposition. Change both dispositions to Ignore Failure. Figure 22
illustrates the custom settings of the column. All columns can be selected and the settings changed at
once instead of repeating the change for each column.
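The Ignore Failure behavior can be imitated in plain T-SQL once the staging tables are in SQL Server. A sketch, assuming the hypothetical ApiBase_2008 staging table with a VarChar column pen_2_api_base_08: values that do not parse as numbers become NULL instead of failing the whole load. TRY_CAST would be the natural choice on later SQL Server versions; ISNUMERIC is used here because the project ran on SQL Server 2008, and it is only approximate (it accepts a few edge cases, such as currency symbols).

-- Copy a VarChar column into a numeric column, turning
-- unconvertible values into NULL rather than raising an error.
SELECT cds,
       CASE WHEN ISNUMERIC(pen_2_api_base_08) = 1
            THEN CAST(pen_2_api_base_08 AS FLOAT)
            ELSE NULL
       END AS pen_2_api_base_08_num
FROM dbo.ApiBase_2008;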
Figure 22 Error Fix for Data Conversion
At this point, the sample data is ready for aggregation. The Aggregate tool aggregates
data and creates custom groupings with specific outputs. From the control panel of the Aggregate
tool, an administrator can assign desired columns to custom outputs. In this sample there are three
different aggregations: Enroll_Percent, District, and Fact_Table. Each of these includes only the
columns of choice. Once each of these aggregations is connected to its destination,
only the chosen columns transfer to the destination.
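The effect of routing chosen columns to separate destinations can be sketched in T-SQL as three SELECT ... INTO statements over the merged result. Merged_Api_Base is a hypothetical name for the merge output, and the Enroll_Percent column list is abbreviated; the column groupings follow Figure 26.

-- Split the merged data into the three destination tables.
SELECT cds, pen_2_api_base_07, pen_2_api_base_08, pen_2_api_base_09
INTO dbo.Enroll_Percent
FROM dbo.Merged_Api_Base;

SELECT cds, cname, rtype, stype, sped, charter, sname, dname
INTO dbo.District_Api
FROM dbo.Merged_Api_Base;

SELECT cds
INTO dbo.Fact_table_Api
FROM dbo.Merged_Api_Base;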
Figure 23 Aggregate Tool
Figure 24 Aggregate destination
Figure 25 The Final ETL Process
As illustrated by Figure 25, the result of the ETL process is three database tables:
Fact_table_Api, Enroll_Percent, and District_Api. The final process is to create an information cube
from these database tables. The Enroll_Percent table includes the information CSBA needs to
filter for each school district and/or school name. The District_Api table includes the data for
districts, school names, etc. Finally, Fact_Table is the core of the star schema connecting these tables
together; it includes the measurements of the business. For the next stage, this project creates
a SQL Server Analysis Services (SSAS) project. The material will be the Enroll_Percent,
Fact_table_Api, and District_Api tables.
TABLE - Enroll_Percent
  [cds]
  [pen_2_api_base_07]
  [pen_2_api_base_08]
  [pen_2_api_base_09]
  [pen_35_api_base_07]
  [pen_35_api_base_08]
  [pen_35_api_base_09]
  [pen_6_api_base_07]
  [pen_6_api_base_08]
  [pen_6_api_base_09]
  [pen_78_api_base_07]
  [pen_78_api_base_08]
  [pen_78_api_base_09]
  [pen_91_api_base_07]
  [pen_91_api_base_08]
  [pen_911_api_base_09]
TABLE - Fact_table_api
  [cds]
TABLE - District_Api
  [cds]
  [cname]
  [rtype]
  [stype]
  [sped]
  [charter]
  [sname]
  [dname]
Figure 26 Star Schema for our Sample
Creating the InfoCubes and Dimensions
Before beginning an Analysis Services project, primary and foreign keys must be set in
all the databases/data warehouses that were created at the end of the SSIS process. In the
designated SQL Server Management staging area, the CDS code must be assigned as the primary key
in all tables. Once the primary key is set in all tables, foreign keys must be assigned to the fact
table accordingly. [6] This will create a bridge between District_Api and Enroll_Percent via
Fact_table_api. An important note: the proper way to create a fact table is to populate it within the
SSIS process along with the other tables. If the fact table is created in the SQL server independently,
there will be difficulties later in the analysis project.
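A minimal T-SQL sketch of that keying step, using the three tables from Figure 26 (the constraint names are illustrative, and the cds columns are assumed to be NOT NULL):

-- Primary keys on the CDS code in every table.
ALTER TABLE dbo.Enroll_Percent ADD CONSTRAINT PK_Enroll_Percent PRIMARY KEY (cds);
ALTER TABLE dbo.District_Api   ADD CONSTRAINT PK_District_Api   PRIMARY KEY (cds);
ALTER TABLE dbo.Fact_table_Api ADD CONSTRAINT PK_Fact_table_Api PRIMARY KEY (cds);

-- Foreign keys from the fact table to each dimension table,
-- bridging District_Api and Enroll_Percent via Fact_table_Api.
ALTER TABLE dbo.Fact_table_Api
    ADD CONSTRAINT FK_Fact_Enroll FOREIGN KEY (cds) REFERENCES dbo.Enroll_Percent (cds);
ALTER TABLE dbo.Fact_table_Api
    ADD CONSTRAINT FK_Fact_District FOREIGN KEY (cds) REFERENCES dbo.District_Api (cds);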
Figure 27 Primary and Foreign keys
In order to create an SSAS project, open a new Analysis Services project. In this project,
it has been named ASP_500. The following steps create an InfoCube: [13]
a. Assign the designated SQL Server location where the databases/data warehouses reside
b. Choose the databases necessary for the project; this includes the dimension tables
and fact tables
c. Create the InfoCube and dimensions according to the data in the SSAS environment
In order to add a data source, right click on the first folder under the ASP_500 project, named
Data Sources. The display boxes that follow ask for the location of the data. An important tip: first
the SQL database location has to be selected, and then the correct user. For this project, the user
was "service account"; it may be different in other server environments.
Figure 28 Analysis Service Project
Figure 29 Adding new data source to the ASP
After adding the data source, a data source view needs to be created. This data source view
will include the actual data tables that will be used: Enroll_Percent, District_Api, and
Fact_Table.
Add a new data source view from the Data Source Views folder. By default, all the data
sources will be there. The next window is for choosing the tables. Common practice is to choose
the fact table and then click on Add Related Tables. This decreases the chance of adding an
extra table to the pack or missing one.
Figure 30 New Data Source view
Finally, right click on the Cubes folder and create a new cube. The wizard will ask
for the previously created tables. In this sample, the InfoCube is created from existing tables; this
option lists the data tables that are ready to be used. The measure table will be Enroll_Percent
because it includes all the necessary numeric data. Enroll_Percent is also the dimension table.
This table requires numeric data in order to create a working InfoCube. This part was
challenging during the project. As indicated previously, the data type is important to the
cube: the InfoCube will not recognize rows just because they have numbers inside them; it
requires a proper data type.
Figure 31 InfoCube wizard
As Figure 31 illustrates, the measurements will be displayed in the next
window, and the wizard will ask whether the user wants to keep all the measurements. The next
step is to create a dimension. The wizard will offer the user a default dimension. It is possible to
add new dimensions to the project; only one dimension is required.
Figure 32 Default dimension and completing the InfoCube wizard
Figure 33 Browsing an InfoCube
The InfoCube is ready to be processed. Once it is processed, it will allow the user to filter
the data according to the desired result in the browsing area of the InfoCube.
The user can create a decent InfoCube structure with SSAS. The Data Mining process uses
these InfoCubes and presents data mining models that allow the user to drill through the
information or create maps. For example, this project created a mining structure for the Pen_2
enrollment percent for the years 2007, 2008, and 2009 by selecting the InfoCubes created from the
available data structures.
Figure 34 Creating a Data Mining Structure
Right click on the Data Mining folder and open a new Data Mining wizard. The next step is
to confirm the cube and the dimensions. It is important to select the correct dimensions associated
with the InfoCube; incorrect dimensions will not process the Data Mining query, and an error will
present itself.
An important part is choosing which columns the organization wants to process. The
choices in this project were limited by the SSIS package configuration; SSIS results have a direct
effect on SSAS. The larger the data gets, the longer the Data Mining process takes to build.
Figure 35 Deciding columns that are going to be used in data mining
For the cube slices, this project used the Enroll Percent dimensions. At the final control
box, the "Allow Drill through", "Create mining model dimensions", and "Create cube using mining
dimensions" options must be selected. As a result, the wizard will create a mining dimension and
cube. Figures 37 and 38 illustrate the Data Mining Tree and the Generic Tree viewer.
Figure 36 Final step for creating the Data Mining process
Figure 37 Data Mining Tree
Figure 38 Data Mining Generic Tree viewer
Chapter 4
CONCLUDING REMARKS
The data that CSBA is going to use is very large. The SQL technology is powerful, yet
limited. I tried to prepare this project to be a foundation for CSBA administrators. Unorganized
master data from CDE would be the main problem for the organization. The normalization
process takes time and resources. The specifications of the servers must be powerful enough to
handle merging large amounts of data. This project illustrates creative ways to work around many
problems.
In general, it was a lengthy process, including but not limited to learning a new
technology, applying all the knowledge I had learned, and finally cooperating with a live
organization in order to understand and respond to CSBA's needs. The system is working and
applicable. The Business Intelligence foundation I created is designed for the basic needs of
CSBA. This structure could be customized; for example, instead of merging all the information from
every year, InfoCubes could be created from selected individual years of data. With this type of
customization, CSBA would have a chance to compare different InfoCubes inside one
data mining structure. However, theory is always different from practice, as the primary
key assignment in the data warehouses of this project demonstrated.
In this project, I tried to follow Ralph Kimball's approach to Data Warehousing and
Business Intelligence. The Kimball approach, according to Ms. Margy Ross, recites four key
decisions when designing a dimensional model: identify the business process, the grain, the
dimensions, and the facts. [11]
Adventure Works
Adventure Works is a collection of sample databases for users to study SQL data
warehousing, SQL Server management, etc. It contains mock data for a fictional bicycle company
called Adventure Works Bicycle. It is free of charge and available for download from the Microsoft
SQL Server CodePlex open-source community web page. It is the best sample source for beginners
in the SQL environment, for a couple of reasons.
First of all, every piece of data, every table, and every key is extremely organized; hence, it is
easy on untrained eyes. This helped me a lot when I studied the connections between dimension
tables and measurement tables. Secondly, Adventure Works DW works perfectly: if the user wants to
create InfoCubes using the dimension and fact tables of Adventure Works DW, it will work without
any error. This could give the user a misconception about real-life data. Not all data is
as organized as it is in Adventure Works. As a professional, the user has to spend many hours
creating a data warehouse, filtering necessary columns, etc. In this scenario, Adventure Works
could be misleading.
SQL Server Technology
I used the Windows Server 2008 operating system for this project and was given remote
access to this test machine in order to draft the project. The server has an Intel Core 2 Duo
E6750 CPU at 2.66 GHz per core, with 4 gigabytes of RAM in total. The operating system is 64-bit,
and this may create a challenge in the process. The Integration Services run-time engine controls the
management and execution of packages by implementing the infrastructure that enables
execution order, logging, variables, and event handling. Programming the Integration Services
run-time engine lets developers automate the creation, configuration, and execution of packages
and create custom tasks and other extensions. For instance, the Data Transformation run-time
engine handles package storage, package execution, logging, debugging, event handling, package
deployment, and the management of variables, transactions, and connections. If the master data is
in a format that does not provide a 64-bit provider, as with the FoxPro data in this project, an error
will present itself once the project runs.
Error: SSIS Error Code DTS_E_OLEDB_NOPROVIDER_64BIT_ERROR. The requested
OLE DB provider VFPOLEDB.1 is not registered -- perhaps no 64-bit provider is available.
Error code: 0x00000000.
An OLE DB record is available. Source: "Microsoft OLE DB Service Components" Hresult:
0x80040154 Description: "Class not registered".
Figure 39 64-Bit runtime error
Changing the Run64BitRuntime debugging setting to False solves this problem. This setting
tells Visual Studio to stop using only the 64-bit runtime engine.
SQL Server Integration Services & SQL Server Analysis Services
The SSIS architecture consists of two parts: the Data Transformation Run-time engine, which
handles the control flow of a package; and the Data Flow engine (or Data Transformation Pipeline
engine), which manages the flow of data from data sources, through transformations, and on to
destination targets. Although SSIS provides powerful drag-and-drop features, it is very important
to know the specific settings before placing components. The user may need to locate workarounds
for similar errors. Microsoft has many articles and introductory web pages; however, they lack
detailed samples. Not only Microsoft's documentation, but also many books written on this subject,
could confuse the user. Figure 40 illustrates an overview of the SSIS architecture.
Figure 40 An overview of the SSIS architecture
Similar to SSIS, SSAS is also a very powerful tool. Its main function is to process the
data the user inputs and generate a meaningful result. Again, despite its flexibility and
functionality, the user may get confused. The mistakes that were made could be the best
resource for the project. Finally, understanding the business needs of an organization is the
healthy beginning of a Business Intelligence project.
APPENDIX
Record Layout for the 2009 Base API Data File
Field #  Field Name  Type       Width  Description
1        CDS         Character  14     County/District/School code
2        RTYPE       Character  1      Record Type: D=District, S=School, X=State
3        STYPE       Character  1      Type: 1=Unified, 2=Elementary District, 3=9-12 High District, 4=7-12 High District, E=Elementary School, M=Middle School, H=High School
4        SPED        Character  1      A=Alternative Schools Accountability Model (ASAM), E=Special Education, C=Combination ASAM and Special Education, S=State Special Schools
5        SIZE        Character  1      S=Small (11-99 Valid Scores), T=Under 11 Valid Scores
6        CHARTER     Character  1      Y=Charter, Not Direct Funded, D=Direct Funded Charter, Blank=Not a Charter School
7        SNAME       Character  40     School Name
8        DNAME       Character  30     District Name
9        CNAME       Character  15     County Name
10       FLAG        Character  5      Flag Values
11       VALID       Character  7      Number of Students Included in the 2009 Academic Performance Index (API)
12       API09B      Character  5      2009 API (Base)
13       ST_RANK     Character  5      Statewide Rank (I=Invalid data, B=District or ASAM, C=Special Education School. Note: Should have asterisk if Valid_Num < 100)
14       SIM_RANK    Character  5      Similar Schools Rank (I=Invalid data, B=District or ASAM, C=Special Education School, O=Schools with SCI changed because of data change but similar school ranks not changed, S=Schools whose SCI changed and their Similar School Ranks also changed, Blank=the school did not have an SCI score with p3 data or the school did not have an SCI score change with updated data)
15       GR_TARG     Character  5      2009-10 API Growth Target (A=means the school or subgroup scored at or above the statewide performance target of 800 in 2009, B=means this is either an LEA or an Alternative Schools Accountability Model (ASAM) school, C=means this is a special education school)
16       API_TARG    Character  5      2010 API Target (A=means the school or subgroup scored at or above the statewide performance target of 800 in 2009, B=means this is either an LEA or an Alternative Schools Accountability Model (ASAM) school, C=means this is a special education school)
17       AA_NUM      Character  7      Number of Black or African Americans Included in API
18       AA_SIG      Character  5      Black or African Americans Significant? (Yes/No)
19       AA_API      Character  5      2009 Black or African American API (Base)
20       AA_GT       Character  5      2009-10 Black or African American Subgroup Growth Target
21       AA_TARG     Character  5      2010 Black or African American Subgroup API Target
22       AI_NUM      Character  7      Number of American Indian/Alaska Native Included in API
23       AI_SIG      Character  5      American Indian/Alaska Native Significant? (Yes/No)
24       AI_API      Character  5      2009 American Indian/Alaska Native API (Base)
25       AI_GT       Character  5      2009-10 American Indian/Alaska Native Subgroup Growth Target
26       AI_TARG     Character  5      2010 American Indian/Alaska Native Subgroup API Target
27       AS_NUM      Character  7      Number of Asian Included in API
28       AS_SIG      Character  5      Asian Significant? (Yes/No)
29       AS_API      Character  5      2009 Asian API (Base)
30       AS_GT       Character  5      2009-10 Asian Subgroup Growth Target
31       AS_TARG     Character  5      2010 Asian Subgroup API Target
32       FI_NUM      Character  7      Number of Filipino Included in API
33       FI_SIG      Character  5      Filipino Significant? (Yes/No)
34       FI_API      Character  5      2009 Filipino API (Base)
35       FI_GT       Character  5      2009-10 Filipino Subgroup Growth Target
36       FI_TARG     Character  5      2010 Filipino Subgroup API Target
37       HI_NUM      Character  7      Number of Hispanic or Latino Included in API
38       HI_SIG      Character  5      Hispanic or Latino Significant? (Yes/No)
39       HI_API      Character  5      2009 Hispanic or Latino API (Base)
40       HI_GT       Character  5      2009-10 Hispanic or Latino Subgroup Growth Target
41       HI_TARG     Character  5      2010 Hispanic or Latino Subgroup API Target
42       PI_NUM      Character  7      Number of Native Hawaiian/Pacific Islander Included in API
43       PI_SIG      Character  5      Native Hawaiian/Pacific Islander Significant? (Yes/No)
44       PI_API      Character  5      2009 Native Hawaiian/Pacific Islander API (Base)
45       PI_GT       Character  5      2009-10 Native Hawaiian/Pacific Islander Subgroup Growth Target
46       PI_TARG     Character  5      2010 Subgroup Native Hawaiian/Pacific Islander API Target
47       WH_NUM      Character  7      Number of White Included in API
48       WH_SIG      Character  5      White Significant? (Yes/No)
49       WH_API      Character  5      2009 White API (Base)
50       WH_GT       Character  5      2009-10 White Subgroup Growth Target
51       WH_TARG     Character  5      2010 White API Target
52       MR_NUM      Character  7      Number of students who marked 'Two or More Races' Included in API
53       MR_SIG      Character  5      'Two or More Races' Significant? (Yes/No)
54       MR_API      Character  5      2009 'Two or More Races' API (Base)
55       MR_GT       Character  5      2009-10 'Two or More Races' Growth Target
56       MR_TARG     Character  5      2010 'Two or More Races' API Target
57       SD_NUM      Character  7      Number of Socioeconomically Disadvantaged (SD) Students Included in API
58       SD_SIG      Character  5      Socioeconomically Disadvantaged Significant? (Yes/No)
59       SD_API      Character  5      2009 Socioeconomically Disadvantaged API (Base)
60       SD_GT       Character  5      2009-10 Socioeconomically Disadvantaged Subgroup Growth Target
61       SD_TARG     Character  5      2010 Socioeconomically Disadvantaged API Target
62       EL_NUM      Character  7      Number of English Learner Students Included in API
63       EL_SIG      Character  5      English Learner Significant? (Yes/No)
64       EL_API      Character  5      2009 English Learner API (Base)
65       EL_GT       Character  5      2009-10 English Learner Subgroup Growth Target
66       EL_TARG     Character  5      2010 English Learner API Target
67       DI_NUM      Character  7      Number of Students with Disabilities Included in API
68       DI_SIG      Character  5      Students with Disabilities Significant? (Yes/No)
69       DI_API      Character  5      2009 Students with Disabilities API (Base)
70       DI_GT       Character  5      2009-10 Students with Disabilities Subgroup Growth Target
71       DI_TARG     Character  5      2010 Students with Disabilities API Target
72       PCT_AA      Character  5      Percent Black or African American
73       PCT_AI      Character  5      Percent American Indian
74       PCT_AS      Character  5      Percent Asian
75       PCT_FI      Character  5      Percent Filipino
76       PCT_HI      Character  5      Percent Hispanic or Latino
77       PCT_PI      Character  5      Percent Native Hawaiian/Pacific Islander
78       PCT_WH      Character  5      Percent White
79       PCT_MR      Character  5      Percent 'Two or More Races'
80       MEALS       Character  5      Percentage of Students Tested that are eligible for Free or Reduced Price Lunch Program
81       P_GATE      Character  5      Percent of participants in Gifted and Talented education programs (STAR)
82       P_MIGED     Character  5      Percent of participants in migrant education programs (STAR)
83       P_EL        Character  5      Percent English Learners
84       P_RFEP      Character  5      Percent of Reclassified Fluent-English-Proficient (RFEP) students (STAR)
85       P_DI        Character  5      Percent of Students with Disabilities (STAR)
86       YR_RND      Character  5      Year Round School
87       CBMOB       Character  5      Percentage of Students counted as part of school enrollment in October 2006 CBEDS and continuously enrolled since that date
88       DMOB        Character  5      Percentage of Students counted as part of district enrollment in October 2008 CBEDS and continuously enrolled since that date
89       ACS_K3      Character  5      Average Class Size (Grades K-3)
90       ACS_46      Character  5      Average Class Size (Grades 4-6)
91       ACS_CORE    Character  5      Number of Core Academic Courses
92       PCT_RESP    Character  5      Percent of Student Answer Documents with Parent Education Level Information
93       NOT_HSG     Character  5      Parent Education Level: Percent Not High School Graduate
94       HSG         Character  5      Parent Education Level: Percent High School Graduate
95       SOME_COL    Character  5      Parent Education Level: Percent Some College
96       COL_GRAD    Character  5      Parent Education Level: Percent College Graduate
97       GRAD_SCH    Character  5      Parent Education Level: Percent Graduate School
98       AVG_ED      Character  5      Average Parent Education Level
99       FULL        Character  5      Percent Teachers at this school with Full Credentials
100      EMER        Character  5      Percent Teachers at this school with Emergency Credentials
101      PEN_2       Character  5      Percent of Enrollments in grade 2 (STAR)
102      PEN_35      Character  5      Percent of Enrollments in grades 3-5 (STAR)
103      PEN_6       Character  5      Percent of Enrollments in grade 6 (STAR)
104      PEN_78      Character  5      Percent of Enrollments in grades 7-8 (STAR)
105      PEN_91      Character  5      Percent of Enrollments in grades 9-11 (STAR)
106      ENROLL      Character  7      Number of Students Enrolled on the First Day of Testing for Grades 2-11
107      PARENT_OPT  Character  7      Number of Students Excused from Testing by Parental Written Request
108      TESTED      Character  7      Number of Students Tested on STAR
109      SCI         Character  15     School Characteristic Index
110      VCST_E28    Character  10     Valid Scores for California Standards Test (CST) in English-language arts Grades 2-8
111      PCST_E28    Character  10     Product of Test Weights Multiplied by Valid Scores for CST in English-language arts Grades 2-8
112      VCST_E91    Character  10     Valid Scores for CST in English-language arts Grades 9-11
113      PCST_E91    Character  10     Product of Test Weights Multiplied by Valid Scores for CST in English-language arts Grades 9-11
114      CW_CSTE     Character  5      School Content Area Weights Percentage in CST English-language arts
115      VCST_M28    Character  10     Valid Scores for CST in mathematics Grades 2-8
116      PCST_M28    Character  10     Product of Test Weights Multiplied by Valid Scores for CST in mathematics Grades 2-8
117      VCST_M91    Character  10
118      PCST_M91    Character  10
119      CW_CSTM     Character  5
120      VCST_S28    Character  10
121      PCST_S28    Character  10
122      VCST_S91    Character  10
123      PCST_S91    Character  10
124      CWS_91      Character  5
125      VCST_H28    Character  10
126      PCST_H28    Character  10
127      VCST_H91    Character  10
128      PCST_H91    Character  10
129      CW_CSTH     Character  5
130      VCHS_E91    Character  10
131      PCHS_E91    Character  10
132      CW_CHSE     Character  5
133      VCHS_M91    Character  10
134      PCHS_M91    Character  10
Valid Score for CST in mathematics Grades
9-11
Product of Test Weights Multiplied by Valid
scores for CST in mathematics Grades 9-11
School Content Area Weights Percentage in
CST mathematics
Valid Score for CST in science Grades 2-8
Product of Test Weights Multiplied by Valid
scores for CST in science Grades 2-8
Valid Score for CST in science Grades 9-11
(End of Course, CST )
Product of Test Weights Multiplied by Valid
scores for CST in science Grades 9-11 (End
of Course, CST )
School Content Area Weights Percentage in
CST science (End of Course, CST )
Valid Score for CST in history-social
science Grades 2-8
Product of Test Weights Multiplied by Valid
scores for CST in history-social science
Grades 2-8
Valid Score for CST in history-social
science Grades 9-11
Product of Test Weights Multiplied by Valid
scores for CST in history-social science
Grades 9-11
School Content Area Weights Percentage in
CST history-social science
Valid Score for California High School Exit
Exam (CAHSEE) ELA in Grades 9-11
Product of Test Weights Multiplied by Valid
scores for CAHSEE ELA in Grades 9-11
School Content Area Weights Percentage in
CAHSEE ELA
Valid Score for CAHSEE mathematics in
Grades 9-11
Product of Test Weights Multiplied by Valid
scores for CAHSEE mathematics in Grades
49
135
CW_CHSM
Character
5
136
TOT_28
Character
10
137
TOT_91
Character
10
138
CW_SCI
Character
5
139
VCST_LS10
Character
10
140
PCST_LS10
Character
10
141
CWM2_28
Character
5
142
VCSTM2_28
Character
10
143
PCSTM2_28
Character
10
144
CWM2_91
Character
5
145
VCSTM2_91
Character
10
146
PCSTM2_91
Character
10
147
CWS2_91
Character
5
148
VCSTS2_91
Character
10
149
PCSTS2_91
Character
10
150
IRG5
Character
1
151
CMA_ADJ_ELA
Character
5
152
CMA_ADJ_MATH
Character
5
153
CMA_ADJ_SCI
Character
5
9-11
School Content Area Weights Percentage in
CAHSEE mathematics
Product of Total of Test Weights Multiplied
by Total of Valid scores in Grades 2-8
Product of Total of Test Weights Multiplied
by Total of Valid scores in Grades 9-11
School Content Area Weights Percentage in
CST Life Science Grade 10 and Grades 2-8
Valid Score for CST in Life Science Grade
10 only
Product of Test Weights Multiplied by Valid
scores for CST in Life Science Grade 10
only
School Content Area Weights Percentage for
Mathematics Assignment of 200 CST in
grades 2-8
Valid Score for Mathematics Assignment of
200 CST in Grades 2-8
Product of Test Weights Multiplied by Valid
scores for Mathematics Assignment of 200
CST in Grades 2-8
School Content Area Weights Percentage for
Mathematics Assignment of 200 CST in
grades 9-11
Valid Score for Mathematics Assignment of
200 CST in Grades 9-11
Product of Test Weights Multiplied by Valid
scores for Mathematics Assignment of 200
CST in Grades 9-11
School Content Area Weights Percentage for
Science Assignment of 200 CST in grades 911
Valid Score for Science Assignment of 200
CST in Grades 9-11
Product of Test Weights Multiplied by Valid
scores for Science Assignment of 200 CST
in Grades 9-11
Testing irregularities greater than zero but
less than 5 percent. (Y=Yes)
The number of valid ELA records excluded
for any of the content areas due to the CMA
adjustment, Grade 9
The number of valid Math records excluded
for any of the content areas due to the CMA
adjustment, Grades 7 - 11
The number of valid Science records
excluded for any of the content areas due to
the CMA adjustment, Grade 10
50
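Every field in this layout arrives as fixed-width Character data, so a natural first step is to land the file in a staging table whose columns mirror the layout exactly and defer all type conversion to later ETL steps. The T-SQL below is a minimal sketch, not the project's actual database objects: the table name stg_api_growth_2009 and the column subset are illustrative assumptions, with each CHAR width copied from the Width column above.

    -- Hypothetical staging table mirroring a subset of the 2009 API Growth
    -- record layout. Columns stay CHAR, matching the source file, so the
    -- flat file loads without conversion errors; casting happens downstream.
    CREATE TABLE dbo.stg_api_growth_2009 (
        HI_TARG CHAR(5),   -- Field 41: 2010 Hispanic or Latino Subgroup API Target
        PI_NUM  CHAR(7),   -- Field 42: Native Hawaiian/Pacific Islander count included in API
        ENROLL  CHAR(7),   -- Field 106: Students enrolled on the first day of testing, grades 2-11
        TESTED  CHAR(7),   -- Field 108: Students tested on STAR
        SCI     CHAR(15)   -- Field 109: School Characteristic Index
    );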
Record Layout for the 2009 AYP Data File

Field # | Field Name | Type | Width | Description
1 | cds | Character | 14 | County/District/School code
2 | rtype | Character | 1 | Record Type: D=District, S=School, X=State
3 | type | Character | 1 | Type: 1=Unified, 2=Elementary District, 3=9-12 High District, 4=7-12 High District, E=Elementary School, M=Middle School, H=High School
4 | sped | Character | 1 | A=Alternative Schools Accountability Model (ASAM), E=Special Education, C=Combination ASAM and Special Education
5 | size | Character | 1 | S=Small (11-99 Valid API Scores), T=Under 11 Valid API Scores
6 | charter | Character | 1 | Y=Charter, Not Direct Funded; D=Direct Funded Charter; Blank=Not a Charter School
7 | sname | Character | 40 | School Name
8 | dname | Character | 30 | District Name
9 | cname | Character | 15 | County Name
10 | api08b | Character | 5 | 2008 Base
11 | api09g | Character | 5 | 2009 Growth
12 | apichang | Character | 5 | Change in API
13 | met_all | Character | 5 | Yes=Met all 2009 AYP Criteria, No=Did not Meet all 2009 AYP Criteria
14 | crit1 | Character | 5 | Number of AYP criteria met
15 | crit2 | Character | 5 | Number of AYP criteria possible
16 | capa_ela1 | Character | 5 | ELA (CAPA) Percent proficient and above
17 | capa_ela2 | Character | 5 | ELA (CAPA) Above 1.0
18 | capa_ela3 | Character | 5 | ELA (CAPA) Exception Approved (Blank, Yes, Adj=Adjustment made for districts exceeding ELA CAPA cap)
19 | capa_math1 | Character | 5 | Math (CAPA) Percent proficient and above
20 | capa_math2 | Character | 5 | Math (CAPA) Above 1.0
21 | capa_math3 | Character | 5 | Math (CAPA) Exception Approved (Blank, Yes, Adj=Adjustment made for districts exceeding ELA CAPA cap)
22 | e_enr | Character | 7 | Schoolwide or LEA-wide ELA Enrollment First Day of Testing
23 | e_tst | Character | 7 | Schoolwide or LEA-wide ELA Number of Students Tested
24 | e_prate | Character | 5 | Schoolwide or LEA-wide ELA Participation Rate
25 | e_pr_met | Character | 4 | Schoolwide or LEA-wide ELA Participation Rate Met
26 | m_enr | Character | 7 | Schoolwide or LEA-wide Math Enrollment First Day of Testing
27 | m_tst | Character | 7 | Schoolwide or LEA-wide Math Number of Students Tested
28 | m_prate | Character | 5 | Schoolwide or LEA-wide Math Participation Rate
29 | m_pr_met | Character | 4 | Schoolwide or LEA-wide Math Participation Rate Met
30 | ee_aa | Character | 7 | ELA Enrollment African American
31 | et_aa | Character | 7 | ELA Tested African American
32 | ep_aa | Character | 5 | ELA Participation Rate African American
33 | epm_aa | Character | 4 | ELA Participation Rate Met African American
34 | me_aa | Character | 7 | Math Enrollment African American
35 | mt_aa | Character | 7 | Math Tested African American
36 | mp_aa | Character | 5 | Math Participation Rate African American
37 | mpm_aa | Character | 4 | Math Participation Rate Met African American
38 | ee_ai | Character | 7 | ELA Enrollment American Indian
39 | et_ai | Character | 7 | ELA Tested American Indian
40 | ep_ai | Character | 5 | ELA Participation Rate American Indian
41 | epm_ai | Character | 4 | ELA Participation Rate Met American Indian
42 | me_ai | Character | 7 | Math Enrollment American Indian
43 | mt_ai | Character | 7 | Math Tested American Indian
44 | mp_ai | Character | 5 | Math Participation Rate American Indian
45 | mpm_ai | Character | 4 | Math Participation Rate Met American Indian
46 | ee_as | Character | 7 | ELA Enrollment Asian
47 | et_as | Character | 7 | ELA Tested Asian
48 | ep_as | Character | 5 | ELA Participation Rate Asian
49 | epm_as | Character | 4 | ELA Participation Rate Met Asian
50 | me_as | Character | 7 | Math Enrollment Asian
51 | mt_as | Character | 7 | Math Tested Asian
52 | mp_as | Character | 5 | Math Participation Rate Asian
53 | mpm_as | Character | 4 | Math Participation Rate Met Asian
54 | ee_fi | Character | 7 | ELA Enrollment Filipino
55 | et_fi | Character | 7 | ELA Tested Filipino
56 | ep_fi | Character | 5 | ELA Participation Rate Filipino
57 | epm_fi | Character | 4 | ELA Participation Rate Met Filipino
58 | me_fi | Character | 7 | Math Enrollment Filipino
59 | mt_fi | Character | 7 | Math Tested Filipino
60 | mp_fi | Character | 5 | Math Participation Rate Filipino
61 | mpm_fi | Character | 4 | Math Participation Rate Met Filipino
62 | ee_hi | Character | 7 | ELA Enrollment Hispanic
63 | et_hi | Character | 7 | ELA Tested Hispanic
64 | ep_hi | Character | 5 | ELA Participation Rate Hispanic
65 | epm_hi | Character | 4 | ELA Participation Rate Met Hispanic
66 | me_hi | Character | 7 | Math Enrollment Hispanic
67 | mt_hi | Character | 7 | Math Tested Hispanic
68 | mp_hi | Character | 5 | Math Participation Rate Hispanic
69 | mpm_hi | Character | 4 | Math Participation Rate Met Hispanic
70 | ee_pi | Character | 7 | ELA Enrollment Pacific Islander
71 | et_pi | Character | 7 | ELA Tested Pacific Islander
72 | ep_pi | Character | 5 | ELA Participation Rate Pacific Islander
73 | epm_pi | Character | 4 | ELA Participation Rate Met Pacific Islander
74 | me_pi | Character | 7 | Math Enrollment Pacific Islander
75 | mt_pi | Character | 7 | Math Tested Pacific Islander
76 | mp_pi | Character | 5 | Math Participation Rate Pacific Islander
77 | mpm_pi | Character | 4 | Math Participation Rate Met Pacific Islander
78 | ee_wh | Character | 7 | ELA Enrollment White
79 | et_wh | Character | 7 | ELA Tested White
80 | ep_wh | Character | 5 | ELA Participation Rate White
81 | epm_wh | Character | 4 | ELA Participation Rate Met White
82 | me_wh | Character | 7 | Math Enrollment White
83 | mt_wh | Character | 7 | Math Tested White
84 | mp_wh | Character | 5 | Math Participation Rate White
85 | mpm_wh | Character | 4 | Math Participation Rate Met White
86 | ee_sd | Character | 7 | ELA Enrollment Socioeconomically Disadvantaged
87 | et_sd | Character | 7 | ELA Tested Socioeconomically Disadvantaged
88 | ep_sd | Character | 5 | ELA Participation Rate Socioeconomically Disadvantaged
89 | epm_sd | Character | 4 | ELA Participation Rate Met Socioeconomically Disadvantaged
90 | me_sd | Character | 7 | Math Enrollment Socioeconomically Disadvantaged
91 | mt_sd | Character | 7 | Math Tested Socioeconomically Disadvantaged
92 | mp_sd | Character | 5 | Math Participation Rate Socioeconomically Disadvantaged
93 | mpm_sd | Character | 4 | Math Participation Rate Met Socioeconomically Disadvantaged
94 | ee_el | Character | 7 | ELA Enrollment English Learner
95 | et_el | Character | 7 | ELA Tested English Learner
96 | ep_el | Character | 5 | ELA Participation Rate English Learner
97 | epm_el | Character | 4 | ELA Participation Rate Met English Learner
98 | me_el | Character | 7 | Math Enrollment English Learner
99 | mt_el | Character | 7 | Math Tested English Learner
100 | mp_el | Character | 5 | Math Participation Rate English Learner
101 | mpm_el | Character | 4 | Math Participation Rate Met English Learner
102 | ee_di | Character | 7 | ELA Enrollment Students with Disabilities
103 | et_di | Character | 7 | ELA Tested Students with Disabilities
104 | ep_di | Character | 5 | ELA Participation Rate Students with Disabilities
105 | epm_di | Character | 5 | ELA Participation Rate Met Students with Disabilities
106 | me_di | Character | 7 | Math Enrollment Students with Disabilities
107 | mt_di | Character | 7 | Math Tested Students with Disabilities
108 | mp_di | Character | 5 | Math Participation Rate Students with Disabilities
109 | mpm_di | Character | 4 | Math Participation Rate Met Students with Disabilities
110 | e_val | Character | 7 | Schoolwide or LEA-wide ELA Number of Valid Scores
111 | e_prof | Character | 7 | Schoolwide or LEA-wide ELA Number of Students Scoring Proficient or Above
112 | e_pprof | Character | 5 | Schoolwide or LEA-wide ELA Percent of Students Scoring Proficient or Above
113 | e_ppm | Character | 4 | Schoolwide or LEA-wide ELA Percent Proficient or Above Met
114 | m_val | Character | 7 | Schoolwide or LEA-wide Math Valid Scores
115 | m_prof | Character | 7 | Schoolwide Math Number of Students Scoring Proficient or Above
116 | m_pprof | Character | 5 | Schoolwide Math Percent of Students Scoring Proficient or Above
117 | m_ppm | Character | 4 | Schoolwide Math Percent Proficient or Above Met
118 | ev_aa | Character | 7 | ELA Valid Scores African American
119 | enp_aa | Character | 7 | ELA Number of Students Scoring Proficient or Above African American
120 | epp_aa | Character | 5 | ELA Percent Proficient or Above African American
121 | eppm_aa | Character | 4 | ELA Percent Proficient or Above Met African American
122 | mv_aa | Character | 7 | Math Valid Scores African American
123 | mnp_aa | Character | 7 | Math Students Scoring Proficient or Above African American
124 | mpp_aa | Character | 5 | Math Percent of Students Scoring Proficient or Above African American
125 | mppm_aa | Character | 4 | Math Percent Proficient or Above Met African American
126 | ev_ai | Character | 7 | ELA Valid Scores American Indian
127 | enp_ai | Character | 7 | ELA Number of Students Scoring Proficient or Above American Indian
128 | epp_ai | Character | 5 | ELA Percent of Students Scoring Proficient or Above American Indian
129 | eppm_ai | Character | 4 | ELA Percent Proficient or Above Met American Indian
130 | mv_ai | Character | 7 | Math Valid Scores American Indian
131 | mnp_ai | Character | 7 | Math Students Scoring Proficient or Above American Indian
132 | mpp_ai | Character | 5 | Math Percent of Students Scoring Proficient or Above American Indian
133 | mppm_ai | Character | 4 | Math Percent Proficient or Above Met American Indian
134 | ev_as | Character | 7 | ELA Valid Scores Asian
135 | enp_as | Character | 7 | ELA Number of Students Scoring Proficient or Above Asian
136 | epp_as | Character | 5 | ELA Percent of Students Scoring Proficient or Above Asian
137 | eppm_as | Character | 4 | ELA Percent Proficient or Above Met Asian
138 | mv_as | Character | 7 | Math Valid Scores Asian
139 | mnp_as | Character | 7 | Math Students Scoring Proficient or Above Asian
140 | mpp_as | Character | 5 | Math Percent of Students Scoring Proficient or Above Asian
141 | mppm_as | Character | 4 | Math Percent Proficient or Above Met Asian
142 | ev_fi | Character | 7 | ELA Valid Scores Filipino
143 | enp_fi | Character | 7 | ELA Number of Students Scoring Proficient or Above Filipino
144 | epp_fi | Character | 5 | ELA Percent of Students Scoring Proficient or Above Filipino
145 | eppm_fi | Character | 4 | ELA Percent Proficient or Above Met Filipino
146 | mv_fi | Character | 7 | Math Valid Scores Filipino
147 | mnp_fi | Character | 7 | Math Students Scoring Proficient or Above Filipino
148 | mpp_fi | Character | 5 | Math Percent of Students Scoring Proficient or Above Filipino
149 | mppm_fi | Character | 4 | Math Percent Proficient or Above Met Filipino
150 | ev_hi | Character | 7 | ELA Valid Scores Hispanic
151 | enp_hi | Character | 7 | ELA Number of Students Scoring Proficient or Above Hispanic
152 | epp_hi | Character | 5 | ELA Percent of Students Scoring Proficient or Above Hispanic
153 | eppm_hi | Character | 4 | ELA Percent Proficient or Above Met Hispanic
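The AYP layout is likewise all Character data, so participation rates, percent-proficient figures, and Yes/No flags must be cast to proper types before they can populate fact tables or Analysis Services measures. The query below is a hedged sketch of that conversion step, assuming a staging table named dbo.stg_ayp_2009 loaded one-to-one from the layout above; the staging and output column names are illustrative, not the project's.

    -- Assumed staging table dbo.stg_ayp_2009: trim each CHAR field, turn
    -- blanks into NULL, then cast to a numeric type (SQL Server 2005-era
    -- syntax, which lacks TRY_CAST, hence the NULLIF guard for empty strings).
    SELECT
        cds,  -- 14-character County/District/School code stays CHAR
        CAST(NULLIF(LTRIM(RTRIM(e_prate)), '') AS DECIMAL(5, 1)) AS ela_participation_rate,
        CAST(NULLIF(LTRIM(RTRIM(e_pprof)), '') AS DECIMAL(5, 1)) AS ela_pct_proficient,
        CASE LTRIM(RTRIM(met_all)) WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 END AS met_all_ayp
    FROM dbo.stg_ayp_2009;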
BIBLIOGRAPHY
1. Academic Accountability. (2009). California Department of Education. Retrieved August 13, 2009, from AYP Reports: http://www.cde.ca.gov/ta/ac/ay/aypreports.asp.
2. Academic Accountability. (2009). California Department of Education. Retrieved August 13, 2009, from Record Layout for the 2009 AYP Data File: http://www.cde.ca.gov/ta/ac/ay/reclayout09.asp.
3. MSDN Library. (2009). "Designing Dimensions Microsoft BI Screencast." Business Intelligence 09a Designing Dimensions. Retrieved August 13, 2009, from MSDN: http://channel9.msdn.com/posts/zachskylesowens/business-intelligence-09a-designing-dimensions/.
4. Hammergren, T. C. (2009). Data Warehousing For Dummies (2nd ed.). Hoboken, NJ: Wiley Publishing, Inc.
5. MSDN Library. (2008). Integration Services Data Types. Retrieved August 13, 2010, from MSDN: http://msdn.microsoft.com/en-us/library/ms141036.aspx.
6. Kimball, R. (2007). Fact Tables. Retrieved August 13, 2010, from Information Management Magazine: http://www.information-management.com/issues/2007_54/10002185-1.html?pg=1.
7. Martyn, T. (2004). Reconsidering Multi-Dimensional Schemas. New York, NY: ACM SIGMOD Record.
8. Mundy, J. (2008). Design Tip #99: Staging Areas and ETL Tools. Retrieved August 13, 2010, from Kimball Group: http://www.kimballgroup.com/html/08dt/ku99stagingareasetltools.pdf.
9. Otey, M., & Otey, D. (2010). Managing and Deploying SQL Server Integration Services. Retrieved August 13, 2010, from Microsoft: http://technet.microsoft.com/en-us/library/cc966389.aspx.
10. Ross, M. (2009). The 10 Essential Rules of Dimensional Modeling. Retrieved August 13, 2010, from Kimball Group: http://intelligent-enterprise.informationweek.com/showarticle.jhtml?articleid=217700810.
11. Ross, M. (2005). Design Tip #69: Identifying Business Processes. Retrieved August 13, 2010, from Kimball Group: http://www.rkimball.com/html/designtipspdf/designtips2005/dtku69identifyingbusinessprocesses.pdf.
12. MSDN Microsoft. (2006). SQL Server 2005 Books Online (November 2008): Integration Services Programming Architecture. Retrieved August 13, 2010, from http://msdn.microsoft.com/en-us/library/ms403344%28sql.90%29.aspx.
13. Jacobson, R., & Misner, S. (2006). Microsoft SQL Server 2005 Analysis Services Step by Step [Web Presentation]. Redmond, WA: Hitachi Consulting.
14. Microsoft. (2010). Windows 1250. Retrieved August 13, 2010, from http://msdn.microsoft.com/en-us/goglobal/cc305143.aspx.