Grid Based School Enrollment Forecasting

Richard Lycan, Institute on Aging
Charles Rynerson, Population Research Center
Portland State University, Portland, Oregon

ESRI Education Conference, San Diego, July 2014

You can download the latest PowerPoint file for this presentation at: http://www.pdx.edu/prc/news-and-presentations-from-the-population-research-center

What this paper is about
• The authors have been involved in school enrollment forecasting for a number of years and have experimented with various ways to improve the forecasting process.
• In this paper we show how a simple model that is normally based on data for school attendance areas (elementary, middle, and high school), or perhaps planning areas, can be implemented for small grid areas roughly the size of a city block.
• We are using data for the Portland Public Schools area because
  – We have geocoded student record data for a long time period, 1996 to the present
  – We are familiar with the social demography of Portland
  – But the geographic pattern of changes in the 2000-2010 period was complex
• Evaluating the results of our model
  – We start our forecast in 2003 and forecast enrollment by grade level in 2006 and 2009.
  – We compare the results of the grid based model with those of a model for the 37 middle school attendance areas.
Common Forecast Methods
• Cohort component
  – Informed by age specific rates for deaths, births, migration
  – Most often used for large geographical areas, counties, school districts
  – Often relied upon for long range forecasts
• Housing based
  – Uses estimates of students per household for different housing types
  – Requires knowledge of local housing markets
  – Informed by GIS analysis or local knowledge such as student census
• Grade progression model
  – Informed by recent enrollment history
  – Can be useful for short term forecasts
  – Simple model; we will explore a grid based implementation of the grade progression model

The grade progression model
• Tracks a cohort of students over time, e.g. the students in grades KG-02 in 2000.
• The grade progression ratio (GPR) is the transition ratio from one cohort to the next, e.g. 0.91 = 724/795.
• The forecast begins in 2003 and extends to 2009. The forecast for grades 06-08 in 2006 is 659.3 = 0.91 * 724 (computed with the unrounded ratio).
• Forecast error is the forecast minus the actual value.
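The arithmetic on the grade progression slide can be checked with a few lines of Python. This is a minimal sketch, not the authors' code. It assumes that 795 is the KG-02 count in 2000 and 724 is the same cohort's grade 03-05 count in 2003 (the slide gives the numbers but not these labels), and the variable and function names are made up for illustration.

```python
# Worked example of the grade progression ratio (GPR) with the slide's numbers.
# Assumption: 795 is the KG-02 count in 2000 and 724 is the same cohort's
# grade 03-05 count in 2003; the slides give the numbers but not these labels.
kg02_2000 = 795
g0305_2003 = 724

gpr = g0305_2003 / kg02_2000             # 724 / 795, about 0.91

# As on the slide, the same ratio carries the 2003 grade 03-05 students
# forward three years to grades 06-08 in 2006.
forecast_0608_2006 = gpr * g0305_2003    # about 659.3


def forecast_error(forecast, actual):
    """Forecast minus actual; a negative value means the forecast was too low."""
    return forecast - actual


# District-wide grade 06-08 figures for 2009 quoted in the deck:
# forecast 9,005 students, actual 9,825 students.
error_2009 = forecast_error(9005, 9825)

print(round(gpr, 2), round(forecast_0608_2006, 1), error_2009)
```

Multiplying 724 by the rounded ratio 0.91 gives 658.8, so the slide's 659.3 only appears when the unrounded ratio is carried through, as it is here.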
[Table and maps: middle school attended (rows, listed by school code and name, e.g. 858 Hosford M.S., 888 Sellwood M.S., 898 West Sylvan M.S.) by high school cluster of residence (Cleveland, Franklin, Grant, Jefferson, Lincoln, Madison, Marshall, Roosevelt, Wilson), with summary rows "Total in regular middle schools", "In other schools", "Total Residing", and "Percent in regular schools". Map legend: % of residing population attending, in classes 12.5-24%, 25-49%, 50-100%. Individual cell values are not recoverable.]
• The forecast which we have produced is a by residing forecast. To get a by attending forecast we need to distribute the residing students to the schools they attend.
• One way to do this is to use a table, such as the one below, showing the relationships between where students live and which school they attend.
• The Portland district has many programs that are not geographically based. It also frequently allows parents to choose schools outside of their neighborhood.
[Table: where students reside versus which school they attend. District totals row: 8,329 in regular middle schools, 1,496 in other schools, 9,825 total residing, 84.8 percent in regular schools.]

Caveat: 2000 to 2010 was a turbulent time for PPS
• Recession and slump in housing markets
• Gentrification
  – Affluent 30 somethings move into close in housing
  – Enrollment turnaround in some central area schools
  – Many black families moved to suburbs
• School choice has resulted in race and class size imbalance
• The PPS District closed schools and consolidated programs
• Thus in evaluating the forecast we consider areas where enrollment change was:
  – Constant (10)
  – Turnaround (9)
  – Confused (12)

Examples of enrollment trends
[Charts of example enrollment trends]

How did the forecast perform?
• The 2009 grade 06-08 forecast was 9,005 students compared to an actual 9,825.
• Early downward trends did not predict a turnaround in enrollment.
• The MAPE (mean absolute percent error):
  – 12.0% overall for middle school attendance areas
  – By enrollment trend of the middle school attendance area:
    • 11.9% constant trend
    • 13.4% turnaround
    • 10.9% confused trend

How is this done with a grid based model?
• This map shows the calculation of grade progression ratios: grades 03-05 in 2003 divided by grades KG-02 in 2000.
• The map shows the ratio between the density of students for the two cohorts.
• The orange areas show an increase in the cohort trend, the green a decrease.
• Density is calculated in a bandwidth surrounding each grid cell center for 660' x 660' cells.

Example of Grade Progression Ratios for 03-05 / KG-02
• The grade progression ratios shown were calculated using the CrimeStat IV crime mapping and statistical package. The student data were from geocoded student records for Portland Public Schools from 1999 to 2010.
• Data were averaged over time by using three year periods. For example, the data shown for 2000 are in fact an average of 1999, 2000, and 2001. The data also are smoothed by using three year age groups, KG-02 and 03-05.
• The data were averaged over space using grid density mapping. An adaptive bandwidth of 200 students was used (compared to an average middle school size of 400 students) with a quartic distance decay function and a grid size of 600 feet.

New Columbia
[Map: grade progression ratio, legend classes 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00, 1.11, 1.25, 1.40, 1.70, 2.00]
• The interesting reversal of trend in the Clarendon attendance area was due to the demolition and subsequent redevelopment of a large public housing area.

We replicate the earlier forecast using grid method
• We use the grid map grade progression ratios for
  – GPR.1 = 03-05/KG-02 for 2000-2003
  – GPR.2 = 06-08/03-05 for 2000-2003
• We multiply the GPR.1 grid map times the GPR.2 grid map to get the product map GPR.12.
• Using a point for each KG-02 student in 2003 we add the value for each cell in the GPR.12 map to the student attribute file.
• The student point file contains the geography within which the student resides and the GPR.12 weighting.
• We summarize the GPR.12 weight by the geography, here the code for each middle school area.
• Voila! The resulting table contains the enrollment forecast for grades 06-08 in 2009.
[Map: grade progression ratio, legend classes 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00, 1.11, 1.25, 1.40, 1.70, 2.00]

Choices
• There are a variety of implementations of the grid density model, for example:
  – ESRI Spatial Analyst (SA)
  – The CrimeStat Spatial Statistics program (CS)
• All provide some common options
  – Cell size
  – Distance weighting (quartic used)
  – Band width:
    • Fixed: distance known, sample varies
    • Adaptive: sample known, distance varies; not in SA
• Common advice on options is that they don't matter too much for applications like finding crime hot spots. However, in using them for forecasting the metric may be more important.
[Figure: kernel shapes: Quartic (Spherical), Uniform, Triangular, Normal]

Adaptive band width
• The adaptive band width averages a constant number of points but the range over which it averages the points varies.
• A set number of points, say 300, can be found in a smaller region on the denser east side of Portland than on the west side.
• A fixed band width (as in SA) would summarize fewer points in the west than in the east.

Increasing bandwidth generalizes the data and map
• The following series of maps shows how the grade progression ratio is generalized as the bandwidth in the density mapping is varied.
• The bandwidth of 100, 200, 300, etc. is the number of student points that are included in the computation of density for the two cohorts.
• Is there an optimal bandwidth to use in the grid based forecasting model?

Results of the grid based forecast
• Evaluate the grid based model versus actual enrollment.
• Explore the effects of varying the bandwidth in the grid based model.
• Compare the results for the standard and grid based forecasts.
• Evaluate the performance of the grid based model for middle school attendance areas (MSAAs) where the enrollment trend was: constant, turnaround, confused.
• Evaluate the use of the grid based model to create forecasts for special geographies, here gentrifying zones in the District.

Results of Grid Based Forecast

Compare grid forecast to actual by bandwidth
• The results of the grid based and standard forecasts are quite similar.
• Hosford, George, and Lane are anomalies. George is impacted by enrollment shifts at the New Columbia housing development.
• For some bandwidths the locally high grid values push the value for Sylvan high.
[Scatter charts, one panel per band width (100, 200, 300, 400, 500): forecast versus actual enrollment, grades 06-08, 2009, with MSAAs classed as constant, reverse, or confused and a linear fit through all points. Values extracted from the panels, whose pairing with band widths is not recoverable: MAPE = 10.5, 11.6, 10.6, 10.4; Y = 0.97X, 1.06X, 0.99X, 0.98X. Labeled points: Sylvan, Lane, Hosford, George.]

Compare grid and standard forecasts by bandwidth
• Except for Sylvan the results of the grid and standard forecasts are quite similar at all bandwidths, as shown by the MAPE and R-squared values.
[Scatter charts, one panel per band width (100, 200, 300, 400, 500): grid versus standard forecast, grades 06-08, with the same MSAA classes and a linear fit through all points. Values extracted from the panels, whose pairing with band widths is not recoverable: MAPE = 10.5, 11.6, 10.6, 10.4; RSQ = 0.994, 0.982, 0.991. Labeled point: Sylvan.]

Mean absolute and algebraic error
• For an increase in bandwidth from 100 to 200 students the MAPE for MSAAs:
  – Rises for reversal MSAAs. It may seem counter intuitive, but we should expect a model that follows the early trend more efficiently to increase the error level where that trend later reverses.
  – Drops for confused (other) MSAAs. The forecast for areas which lack a clear trend is improved.
  – Drops only slightly for constant MSAAs.
• For bandwidths greater than 200 students the MAPE does not vary greatly.
• The average number of KG-02 students in an MSAA was about 275.
  A bandwidth roughly the size of the areas for which the point data are re-aggregated appears to produce reasonable results.
• The MAPE values for the grid and standard models appear to order the three growth trend classes in the same way, but the grid model results in more contrast.
[Chart: MAPE for standard model]

Forecast for custom geography
• Top 10% of tracts by 1990-2000 change in percent baccalaureate + education and MTP occupation (after David Ley).
• Gentrified / not gentrified added to student point file.
• Number and percent for students in gentrified areas summarized for actual 2003 and forecast 2009 enrollment.
• Conclusion: Number and percent of grade 06-08 students living in gentrifying areas declined from 2003-2009.

                  Number                    Percent of total enrolled
Gentrified?   Actual 2003  Forecast 2009   Actual 2003  Forecast 2009
No                  9,877          8,075          84.2           86.4
Yes                 1,851          1,275          15.8           13.6
Total              11,728          9,349         100.0          100.0

Deconstructing the Grade Progression Ratio
• Other common measures such as the capture rate can be calculated as well.
• Capture rate is the number enrolled in the district's schools compared to the number age eligible, for example kindergarteners divided by the age 5 population.
• Here is the capture rate for grades KG-02 in 2000, 2010, and a map of the change in the rate.
• And, again, the grade progression ratio using the same classes and colors.

Conclusions
• Neither the standard nor the grid based model produced a good enrollment forecast for the 2003-2009 period. During this time period there were major demographic shifts in the District that confounded forecasts based on early trends.
• The grid based forecast was best for the MSAAs that changed in a confused way. It was worst for turnaround MSAAs. Bandwidth had little effect on the forecast for MSAAs that grew or declined in a constant trend.
• The smallest bandwidth of 100 students produced erratic results.
  Bandwidths of 200 students or more produced reasonable results, with only minor variations in MAPE between 200 and 500 students.
• The effort involved in building the model was considerable, but the final workflow is simple and could easily be scripted.
• We think that the adaptive bandwidth approach is better than a fixed distance bandwidth for this type of application. It would facilitate analysis and scripting if ESRI Spatial Analyst provided an adaptive bandwidth option for its kernel density tool.
• The grid based GPR model may be less useful as a primary forecasting model than as an allocation tool to create forecasts for special areas, such as the example for gentrifying areas.

Richard Lycan - lycand@pdx.edu
Charles Rynerson - rynerson@pdx.edu
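
The grid workflow from the slide "We replicate the earlier forecast using grid method" might be scripted roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' CrimeStat or ArcGIS procedure: the grids, student coordinates, and area codes are made-up illustration data, the function names are hypothetical, and the GPR grids are taken as already computed rather than derived from kernel density surfaces. Each KG-02 student point picks up the GPR.12 value of the cell it falls in, and the weights are summed by area to give the forecast.

```python
import numpy as np

CELL_FT = 660.0  # cell size quoted in the slides (660' x 660')


def sample_grid(grid, x, y, x0=0.0, y0=0.0, cell=CELL_FT):
    """Return the grid value under each point.

    Assumes (x0, y0) is the lower-left corner of the grid and row 0 is the
    bottom row; a real raster would need its own georeferencing.
    """
    cols = np.floor((np.asarray(x) - x0) / cell).astype(int)
    rows = np.floor((np.asarray(y) - y0) / cell).astype(int)
    return grid[rows, cols]


def forecast_by_area(weights, area_codes):
    """Sum the per-student GPR.12 weights within each area code."""
    codes, inverse = np.unique(area_codes, return_inverse=True)
    totals = np.bincount(inverse, weights=weights)
    return dict(zip(codes.tolist(), totals.tolist()))


def mape(forecast, actual):
    """Mean absolute percent error, in percent."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(forecast - actual) / actual) * 100.0)


# Made-up 2 x 2 ratio grids standing in for the GPR.1 and GPR.2 maps.
gpr1 = np.array([[0.9, 1.1],
                 [1.0, 0.8]])    # 03-05 / KG-02
gpr2 = np.array([[1.0, 0.9],
                 [0.5, 1.25]])   # 06-08 / 03-05
gpr12 = gpr1 * gpr2              # cell-by-cell product map

# Made-up KG-02 student points (feet) with hypothetical area codes.
x = np.array([100.0, 200.0, 700.0, 800.0, 100.0])
y = np.array([100.0, 300.0, 100.0, 900.0, 700.0])
area = np.array(["A", "A", "B", "B", "A"])

weights = sample_grid(gpr12, x, y)         # GPR.12 weight per student
forecast = forecast_by_area(weights, area)  # expected 06-08 students per area
print(forecast)
```

Summing weights rather than counting students is what lets the same point file be re-aggregated to any geography carried in the attribute table, such as the gentrified / not gentrified flag used for the custom geography example.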