Energy Market Prediction: Papers from the 2014 AAAI Fall Symposium

Predicting Rooftop Solar Adoption Using Agent-Based Modeling

Haifeng Zhang and Yevgeniy Vorobeychik
Electrical Engineering and Computer Science
Vanderbilt University
Nashville, TN
{haifeng.zhang, yevgeniy.vorobeychik}@vanderbilt.edu

Joshua Letchford and Kiran Lakkaraju
Sandia National Laboratories
Livermore, CA
{jletchf,klakkar}@sandia.gov

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

In this paper we present a novel agent-based modeling methodology to predict rooftop solar adoptions in the residential energy market. We first applied several linear regression models to estimate missing variables for non-adopters, so that the attributes of both non-adopters and adopters could be used to train a logistic regression model. Then, we integrated the logistic regression model, along with the other predictive models, into a multiagent simulation platform and validated our models by comparing the forecast of aggregate adoptions in a typical zip code area with the ground truth. The result shows that the agent-based model can reliably predict future adoptions. Finally, based on the validated agent-based model, we compared the outcome of a hypothesized seeding policy with the original incentive plan, and investigated alternative seeding policies that could lead to more adopters.

Introduction

The rooftop solar market in the US has experienced explosive growth over the last decade. This is partly because of government incentive programs, which effectively reduced solar system costs and successfully created a number of early adopters. Given what we know about residential consumers and how they respond to incentives and other economic and social factors, can we reliably predict solar adoptions in the future?

This problem is compelling because predictions can be a valuable resource for policy makers seeking more efficient policies to promote solar adoption. The predictive models may also reveal important patterns in individuals' decisions about solar products. If some of these patterns can be confirmed, a policy maker could reinforce them and thereby attain higher solar adoption. Our modeling methodology is motivated by these scenarios: it first derives individual decision models and then simulates joint behavior in an energy market.

Predicting solar demand for residential customers is challenging. First, human decision processes are inherently dynamic and nondeterministic. Second, network effects make consumer decisions harder to anticipate. Lastly, environmental variables change over time, which makes reliable prediction even more difficult. Fortunately, a large amount of data has been stored in modern databases, which may contain significant information about individual solar adoptions. Our modeling approach uses machine learning techniques to learn reliable models from real data. Specifically, we rely on a logistic regression model, which defines the conditional probability of adoption for an individual given observations over a set of attributes. However, the training process is not direct, since non-adopters in the data have fewer explicit attributes than adopters do. Thus, we need to estimate these unknown variables for non-adopters. Moreover, to ensure prediction accuracy as well as model interpretability, we apply standard methods of model selection, i.e., step-wise regression and lasso (l1) regularization. The regularized logistic regression model encompasses peer effects, net present values and housing characteristics. The goal of this study is to train an individual behavioral model and compose multiple agents into an environment to simulate and forecast future solar adoptions.

In the following sections, we demonstrate how this methodology is applied to forecast solar demand in a typical zip code area. A remarkable finding is that incentives appear to have little impact on adoption rates. This motivated us to seek alternative policies that could stimulate more solar adoptions. To this end, we considered a free-solar seeding policy relying on peer effects. By comparing this policy with the original incentive outcomes, we demonstrate that it is significantly more efficacious than the incentive scheme with the same budget.

Predictive Modeling

We provide complete details of the modeling approach in this section.

Data

• bedrooms(numeric): The number of bedrooms in the house

We mainly used two datasets: the California Solar Initiative (CSI) dataset (shared by the California Center for Sustainable Energy and restricted to residential customers in San Diego County) and an assessor dataset (San Diego County only). The CSI dataset includes information on various aspects of a typical solar photovoltaic (PV) project, such as system size (usually the CSI rating), price, incentive amount, ownership type (i.e., lease or buy) and several important dates that determine solar adoption and installation. The CSI data covers projects completed from May 2007 through April 2013 (about 6 years, 8500 adopters). The assessor data stores comprehensive housing characteristics for the residential sector (about 440,955 households in San Diego County), including square footage, acreage, number of bathrooms, car storage, etc. The CSI data and assessor data were merged to fit our modeling framework. Using the final adoption information, we were able to generate historical observations for each individual.
In detail, for a given month m, if an individual's adoption month is after m, we construct an observation with a negative label (i.e., "did not adopt in this month"). If the adoption month is m itself, we construct a positive observation labeled "adopt" instead. The positive and negative cases are encoded as a binary class, i.e., 1 for "adopted" and 0 for "did not adopt", in order to fit a logistic regression model. Once we expand individual entries by month, some of the original information, such as incentive and cost, is no longer valid, since those variables change over time. We assume, however, that home characteristics do not change. We used simple linear regression models to estimate those unknown variables from the data. The underlying assumption is that non-adopters make decisions over the same set of variables as adopters do, i.e., cost, economic benefits, peer effect measures, etc. Once those unknown variables are estimated, we are ready to train the logistic regression model. Note that all models were trained on data spanning the first four years, while empirical adoptions in the last two years were used only for evaluation. Our previous modeling efforts revealed that system size is critical in determining system cost. As a result, our first step is to estimate this variable.
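The month-by-month expansion described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `expand_observations`, the integer month index, and the decision to emit a negative observation in every month for households that never adopt are assumptions made for the example.

```python
from typing import List, Optional, Tuple

def expand_observations(
    household_id: str,
    adoption_month: Optional[int],  # None for a household that never adopted (assumption)
    first_month: int,
    last_month: int,
) -> List[Tuple[str, int, int]]:
    """Return (household_id, month, label) rows; label 1 = adopted, 0 = did not adopt."""
    rows = []
    for m in range(first_month, last_month + 1):
        if adoption_month is not None and m > adoption_month:
            break  # no observations are generated after the adoption month
        label = 1 if adoption_month == m else 0
        rows.append((household_id, m, label))
    return rows

# Hypothetical households: "a" adopts in month 3, "b" never adopts.
rows_a = expand_observations("a", 3, 1, 5)
rows_b = expand_observations("b", None, 1, 5)
```

Each row would then be joined with the month-specific predictors (cost, incentive, peer effect measures) before fitting the logistic regression.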
CSI Rating

CSI rating (Commission 2013) is a key parameter that measures the size and efficiency of the PV system to be installed. Choosing the system size is usually the first step in going solar, but for a non-adopter this value is unknown. We estimated the CSI rating from home characteristics, such as living area, pool, bathrooms, car storage, etc. Specifically, the candidate variables include all available assessor features, such as:

• totalVal(numeric): Total property value, including both land and building values
• ownerocc(binary): 1 if the property is occupied by the owner, otherwise 0
• baths(numeric): The number of bathrooms in the house
• pool(binary): 1 if the property has a private pool, otherwise 0
• numCarStorage(numeric): The sum of garage spaces and private parking spaces
• totalLvg(numeric): The total number of livable square feet
• parView(binary): 1 if the property is designated as having a particularly pleasant (valuable) view, otherwise 0
• acreage(binary): 1 if the property has at least 0.25 acres, otherwise 0
• aveKwh(numeric): Average electricity usage of the zip code that the house belongs to, measured in kilowatt-hours

We fitted a linear model to the adopter sub-dataset with these variables and eliminated unimportant variables step by step, i.e., step-wise regression in statistics. The final features and their coefficients are listed in Table 1.

Table 1: CSI Rating Linear Model
Predictor      Estimate      Std. Error    t value    Pr(>|t|)
(Intercept)    1.592e+00     1.553e-01     10.255     < 2e-16
ownerocc       -2.547e-01    1.149e-01     -2.217     0.0267
pool           6.315e-01     7.658e-02     8.247      < 2e-16
totalLvg       7.582e-04     3.385e-05     22.403     < 2e-16
acreage        1.319e+00     8.507e-02     15.505     < 2e-16
aveKwh         8.249e-04     1.913e-04     4.311      1.66e-05

The adjusted R-squared is about 0.27, which is fairly low. Given the data available to us, this is the highest we could achieve using linear regression. Although a solar system is typically sized to offset one's electricity usage in the higher tiers, there are many reasons why a system may be sized smaller or larger, depending on how an individual maximizes the internal rate of return (McAllister 2012). Fortunately, the subsequent modeling shows that the low R-squared does not dramatically affect our final predictions.

Ownership Cost

The two primary ways to have a solar system installed on one's house are buying and leasing. People who purchase a solar system usually face a large down payment and eventually own the system. Those who choose to lease may have zero down payment, but they do not own the system. The choice between leasing and buying is an interesting research question in econometrics, and it is common in many markets, such as automobiles and housing (Rai and Sigrin 2013). Ownership cost is the actual payment made by a solar PV owner. This information is available in the CSI program database. Note that, in practice, some owners may choose to finance a solar PV system through bank loans. For now, we do not deal with this layer of complication.

Initial candidate variables used to train the ownership cost model include all assessor features, aveKwh, CSI rating and peer effect measures, which are described as follows:

• numAdopt: The number of completed CSI applications by the current month in the zip code the house belongs to
• numInstall: The number of completed PV installations by the current month in the zip code the house belongs to
• fracAdopt: The fraction of houses that have completed CSI applications in the zip code the house belongs to
• fracInstall: The fraction of houses that have completed PV installations in the zip code the house belongs to
• num8Mile: The number of installations within an eight-mile radius of the house
• num4Mile: The number of installations within a four-mile radius of the house
• num2Mile: The number of installations within a two-mile radius of the house
• numMile: The number of installations within a one-mile radius of the house
• numHalfMile: The number of installations within a half-mile radius of the house
• numFourthMile: The number of installations within a quarter-mile radius of the house
• numEighthMile: The number of installations within a one-eighth-mile radius of the house
• baseline: The electricity baseline, which is determined by the utility company, i.e., SDG&E
• aveKwhExcessTier2: Average kilowatt-hours in excess of the tier 2 threshold
• totalInstallSD: Total number of PV installations by the current month in San Diego County
• totalAdoptSD: Total number of CSI program applications by the current month in San Diego County

Intuitively, the above variables are highly correlated, which can be problematic for linear regression, so appropriate feature selection methods need to be applied. We applied l1 regularization, also known as the lasso penalty (Friedman, Hastie, and Tibshirani 2001), to the linear ownership cost model. The parameter lambda, the weight of the regularization term in the objective function, is determined by standard cross validation. We followed the one standard error rule (Friedman, Hastie, and Tibshirani 2001) to pick lambda: we choose lambda as large as possible such that the cross validation error is no more than one standard error above that of the best model. Then, we fitted the full training data with the selected lambda and obtained a linear model with the coefficients shown in Table 2.

Table 2: Ownership Cost Linear Model
Predictor       Coefficient
(Intercept)     1.138391e+04
totalVal        7.377731e-04
totalLvg        1.518842e-01
csirating       6.213036e+03
totalAdoptSD    -1.062339e+00

Lease Cost

The solar energy market has changed dramatically since early 2008, when solar leases first came into being. Empirical data for 2012 reveal that more than 50 percent of new adopters chose to lease PV systems. Since the choice between leasing and buying involves complicated financial decisions, which are beyond the scope of this paper, we focus only on a more general type of adoption. However, because leasing is becoming an appealing option in today's solar energy market, we included relevant variables in our individual model. In particular, we calculated the total leasing cost as follows:

$$\mathrm{LeaseCost} = c_0 + \frac{c_{1m} \cdot 12 \cdot \left(1 - (\beta (1 + \xi))^{L}\right)}{1 - \beta (1 + \xi)} \tag{1}$$

where
• $c_0$: upfront cost or lease down payment
• $c_{1m}$: monthly payment in the first year
• $\xi$: annual escalation rate
• $L$: lease contract length in years
• $\beta$: discount factor; we use 0.95.

These variables are stated clearly in the solar lease contracts. We extracted this information from a sample (about 70 individuals) of lease contracts and fitted a linear regression model with the same set of features as used for the ownership cost model, plus incentive. Moreover, the linear model used l1 regularization in order to pull out the most significant features from among correlated features. This also helped to avoid over-fitting and improve predictive efficacy. We followed the one standard error rule discussed in the previous section, and finally chose a model with only one feature, namely CSI rating. The coefficients of this model are given in Table 3.

Table 3: Lease Cost Linear Model
Predictor      Coefficient
(Intercept)    10446.832
csirating      1658.389

Logistic Regression

Since many variables are correlated, using all of these features directly can yield a model with misleading coefficients. A logistic regression model with lasso regularization was therefore trained on a sample (30% of the entire training data, around 6,841,501 data entries) of the training data. Due to the size of the training data, we applied only 5 folds in cross-validation. The cross validation error is shown in Figure 1. By the one standard error rule, we chose the largest lambda with cross validation error within one standard error of the best model. The coefficients of this model are listed in Table 4.

Figure 1: Logistic Regression CV errors

Estimation of Missing Variables

After training the CSI rating, ownership cost and lease cost models, we estimated the CSI rating, ownership cost and lease cost for all non-adopters. To be consistent with consumer decision theory in the literature, we calculated the net present values of lease and ownership separately. The net present value of ownership (denoted by NPV.own) is computed as follows.

$$\mathrm{NPV.own} = I - C^{o} + B \tag{2}$$

where $I$ is the incentive, $C^{o}$ is the ownership cost and $B$ is the solar economic benefit. The net present value of lease (denoted by NPV.lease) is computed as follows.

$$\mathrm{NPV.lease} = -C^{l} + B \tag{3}$$

where $C^{l}$ is the lease cost and $B$ is the solar economic benefit. Notice that, since lease customers do not receive solar incentives (the installers who purchase the systems do instead), incentives are not included in the calculation of the lease net present value.

In addition, solar benefits are computed as follows. For months before 2009, we have

$$\mathrm{SolarBenefits} = \frac{(e^{T1} + e^{T2}) \cdot 12 \cdot 0.95}{1 - 0.95} + \frac{(e^{T3} + e^{T4} + e^{T5}) \cdot 12 \cdot 0.95 \cdot 1.035}{1 - 0.95 \cdot 1.035} \tag{4}$$

For months in and after 2009, we have

$$\mathrm{SolarBenefits} = \frac{(e^{T1} + e^{T2}) \cdot 12 \cdot 0.95 \cdot 1.01}{1 - 0.95 \cdot 1.01} + \frac{(e^{T3} + e^{T4} + e^{T5}) \cdot 12 \cdot 0.95 \cdot 1.035}{1 - 0.95 \cdot 1.035} \tag{5}$$

Both equations assume that the solar economic benefit lasts for an infinite number of years, with a 3.5% annual growth rate for the tier 3, 4 and 5 consumption groups, which is approximately the average growth rate for these tiers over the last 10 years.¹
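As a worked illustration of Equations (1) through (5), the following sketch computes the lease cost, the discounted solar benefits, and the two net present values. It is not the authors' implementation. It assumes that the fraction in Equation (1) is the finite geometric sum of discounted, escalating annual payments, and that each e term is a monthly tiered benefit in dollars; all function names and sample figures are hypothetical.

```python
BETA = 0.95  # annual discount factor stated in the paper

def lease_cost(c0: float, c1m: float, xi: float, years: int, beta: float = BETA) -> float:
    """Eq. (1), read as a geometric sum over `years` annual payments (assumes beta*(1+xi) != 1)."""
    r = beta * (1.0 + xi)
    return c0 + c1m * 12.0 * (1.0 - r ** years) / (1.0 - r)

def perpetuity(monthly_benefit: float, growth: float, beta: float = BETA) -> float:
    """Discounted value of a monthly benefit growing at `growth` per year, forever."""
    r = beta * (1.0 + growth)
    return monthly_benefit * 12.0 * r / (1.0 - r)

def solar_benefits(e: dict, year: int) -> float:
    """Eq. (4) for months before 2009, Eq. (5) otherwise; `e` maps tier name to monthly benefit."""
    low_growth = 0.0 if year < 2009 else 0.01   # tier 1 and 2 growth rate
    low = perpetuity(e["T1"] + e["T2"], low_growth)
    high = perpetuity(e["T3"] + e["T4"] + e["T5"], 0.035)  # tier 3 to 5 growth rate
    return low + high

def npv_own(incentive: float, ownership_cost: float, benefits: float) -> float:
    """Eq. (2)."""
    return incentive - ownership_cost + benefits

def npv_lease(lease_cost_value: float, benefits: float) -> float:
    """Eq. (3); lease customers receive no incentive."""
    return -lease_cost_value + benefits

# Hypothetical household: monthly benefit of 10 dollars in each tier, evaluated in 2010.
tiers = {"T1": 10.0, "T2": 10.0, "T3": 10.0, "T4": 10.0, "T5": 10.0}
b = solar_benefits(tiers, 2010)
own = npv_own(incentive=2000.0, ownership_cost=30000.0, benefits=b)
lease = npv_lease(lease_cost(c0=0.0, c1m=100.0, xi=0.03, years=20), b)
```

Note that the perpetuity terms discount from the first year onward, matching the 0.95 factor in the numerators of Equations (4) and (5).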
However, Equations (4) and (5) differ in their first term: we assume no growth for the tier 1 and 2 electricity consumption categories prior to 2009, but 1% annual growth for months since 2009. We then used the actual electricity rates to calculate solar benefits according to the following equation:

$$\mathrm{MonthlySolarTieredBenefits}(e^{t}) = r_t \cdot \mathrm{Seo}_t, \quad t \in \{T1, T2, T3, T4, T5\} \tag{6}$$

where $r_t$ is the tiered electricity rate and $\mathrm{Seo}_t$ is the part of the solar electricity output that falls into tier $t$. The total solar electricity output is computed as follows:

$$\mathrm{Seo} = \mathrm{CSIrating} \cdot \mathrm{Hrs}_{sun} \tag{7}$$

where $\mathrm{Hrs}_{sun}$ denotes the full sun hours (that is, the yearly average amount of solar insolation; 5 hours is used for San Diego County).²

Finally, in addition to the net present values for ownership and lease, we also added a few dummy variables, such as season indicators and a lease availability indicator. Our final logistic regression model was trained on the extended set of features.

Table 4: Logistic Regression Model
Predictor      Coefficient
(Intercept)    -8.814706e+00
ownerocc       2.503317e-01
fracInstall    1.627659e+01
NPV.own        2.994175e-06
NPV.lease      7.020925e-06

The lasso regularization provides us with a sparse model. Notice first that the most important feature in the group of peer effects is at the zip code level, not the mile-based radius measures (weak measures are shrunk to zero coefficients). Since the cost of computing mile-based peer effect measures usually grows with the size of the population, the benefit of shrinkage regularization is also apparent. Finally, NPV.lease seems stronger than NPV.own, which is informative since a large portion of the solar market involves leased systems.

¹ Some of this information is available at the SDG&E website, http://www.sdge.com/. Note that the actual tiers may vary over the years; we have converted non-5-tier rates into 5-tier rates.
² Sun Hours/Day Zone Solar Insolation Map, see http://www.wholesalesolar.com/InformationSolarFolder/SunHoursUSMap.html

Agent-based Simulation

The models of CSI rating, ownership cost and lease cost, and the logistic regression model of solar adoption propensity, were composed in a widely used, open-source agent-based simulation toolkit, Repast.³

Agents

The main type of agent is named "household", which represents a household entity in the residential solar market. It has two sub-types, "adopter" and "non-adopter". In addition, in order to flexibly control the execution of the simulation, we defined a special type of agent called "updater", which is responsible for updating the attributes of household agents at each time step.

Time Step

At each tick of the simulation, the updater agent first updates the predictors, i.e., ownership and lease cost, incentive, NPVs, etc., for all agents based on the state of the world. Lease and ownership costs are estimated by the lease and ownership cost models respectively, and incentives follow the original CSI program rate schedule, i.e., a step function of the total solar kilowatts that have been reserved in San Diego County. Next, non-adopter agents compute their adoption likelihood given the set of attributes in the logistic regression model. Each non-adopter agent then draws a random number between 0 and 1 and compares it with its own adoption likelihood to decide whether or not to adopt. This exactly simulates a random event (in this case, solar adoption) occurring with a given probability. In detail, if the number is less than the probability given by the logistic regression model, the agent chooses to adopt a solar system. If an agent chooses to adopt, we remove the non-adopter and add a new adopter to the environment.
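The adoption decision made in each time step can be sketched in a few lines. This is an illustrative stand-in and not the Repast implementation used in the study: the coefficients are those reported in Table 4, while the function names, the dictionary-based household representation, and the sample feature values are assumptions. The seasonal and lease-availability dummies mentioned in the text are omitted because their coefficients are not reported.

```python
import math
import random

# Coefficients of the lasso-regularized logistic regression, as reported in Table 4.
COEF = {
    "intercept": -8.814706e+00,
    "ownerocc": 2.503317e-01,
    "fracInstall": 1.627659e+01,
    "NPV.own": 2.994175e-06,
    "NPV.lease": 7.020925e-06,
}

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def adoption_probability(features: dict) -> float:
    """Probability of adopting this month given the household's current predictors."""
    z = COEF["intercept"] + sum(COEF[name] * value for name, value in features.items())
    return sigmoid(z)

def step(non_adopters: dict, rng: random.Random) -> list:
    """One tick: a non-adopter adopts if its uniform draw falls below its probability."""
    adopted = [hid for hid, feats in non_adopters.items()
               if rng.random() < adoption_probability(feats)]
    for hid in adopted:
        del non_adopters[hid]  # the household leaves the non-adopter pool
    return adopted

# Hypothetical household in a zip code where 2% of houses have installed PV.
household = {"ownerocc": 1, "fracInstall": 0.02, "NPV.own": 5000.0, "NPV.lease": 8000.0}
p = adoption_probability(household)
```

In a full simulation the updater would recompute `fracInstall` and the NPVs before every call to `step`, so that new adoptions feed back into the peer effect.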
Moreover, when we create a new adopter, we also assign an installation period for the solar system; adopter agents do not update the number of installations until the installation is completed. The number of months it takes for a solar adoption to become a visible system is uniformly distributed between 1 and 6, reflecting the typical installation range in the CSI data. At the end of every time step, data can be collected and output to a user-specified file. The data we typically collected were aggregate adoptions and system cost.

Experiment Results

We present the final ABM results in this section. To validate our modeling approach, we compared it with a baseline model, which is a non-regularized logistic regression model with only three features. Its coefficients are shown in Table 5.

Table 5: Baseline Logistic Regression Model
Predictor      Coefficient
(Intercept)    -8.858e+00
fracInstall    1.313e+02
NPV.own        1.142e-05

We took the individuals from a representative zip code in San Diego County (approximately 13,000 households) and initialized the simulation with their assessor features as well as adoption states, i.e., who has adopted and who has not. We started the simulation from a point in time after the period used to train the predictive models. Since our agent-based simulation is highly stochastic by nature, we averaged the results of 1000 runs for both the lasso and baseline models. The average adoptions by month are illustrated in Figure 2. The results generated by the lasso model are closer to the real path than those of the baseline model. Formally, we traced the likelihood of each model in each run, and computed the average likelihood ratio (lasso/baseline) over all sample runs for each month. As shown in Figure 3, the ratio is generally much larger than 1, suggesting that the lasso model significantly outperforms the baseline.

Figure 2: Average Adoptions
Figure 3: Likelihood Ratio

We varied the original CSI incentive rates, i.e., multiplying the original rates by 2, 4, 8, 16, etc., while holding the same amount of targeted megawatts in each step, and compared the expected adoptions with the outcomes of the original CSI incentive rate structure. Although a single simulation run takes a few seconds, a large number (say 1000) of sample runs would take a couple of hours. To cope with this computational difficulty, we introduce a heuristic to approximate the expected number of adopters over a very large number of sample runs. We call the heuristic "average step". As its name suggests, the simulation proceeds in a special manner: at each time step it generates multiple (typically 1000) instances of one-step runs simultaneously, but only the one closest to the average number of adopters becomes the "true" outcome. If a simulation is run in "average step" mode, any step can be considered an average outcome of the previous step. Using this heuristic, the time needed to compute the utility of a policy has been reduced from hours to minutes. The "average step" function is enabled by the "updater" agent mentioned earlier.

We applied the "average step" heuristic to gain insight into how the predictive model reacts to incentive changes. The average adoptions over 10 sample ("average step") runs for each month, compared with the original incentive plan, are shown in Figure 4. The impact of incentives seems very limited, which suggests that incentives may not be a major concern when people consider installing solar PV systems. This interesting result also encouraged us to think of alternative policies that could be more efficacious than current incentive schemes. This is discussed in the next section.

Figure 4: Varying Incentive Rates
Figure 5: Expected Adoptions Incentives vs. Seeding

³ Repast home page, http://repast.sourceforge.net/

to give away more free systems under a fixed budget.
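The "average step" heuristic used in the incentive experiments can be illustrated with a short sketch. This is one plausible reading of the description, not the code run in the study: the function `average_step`, its callback-based interface, and the toy simulator are hypothetical, and ties between equally close instances are broken by taking the first one.

```python
import random
from typing import Callable, List, Sequence

def average_step(
    one_step: Callable[[random.Random], Sequence[str]],
    n_instances: int,
    rng: random.Random,
) -> List[str]:
    """Run `one_step` n_instances times from the same state and keep the outcome
    whose number of adopters is closest to the mean across all instances."""
    outcomes = [list(one_step(rng)) for _ in range(n_instances)]
    mean_count = sum(len(o) for o in outcomes) / n_instances
    return min(outcomes, key=lambda o: abs(len(o) - mean_count))

# Hypothetical one-step simulator: 100 households, each adopting with probability 0.05.
def toy_step(rng: random.Random) -> List[str]:
    return [f"h{i}" for i in range(100) if rng.random() < 0.05]

chosen = average_step(toy_step, n_instances=1000, rng=random.Random(42))
```

The selected outcome is then committed as the state for the next time step, so one pass through the months stands in for the average of many independent runs.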
We compared incentives with a seeding policy that gives away all free solar systems at the very beginning, subject to an identical budget constraint. A comparison of average adoptions between the seeding policy and incentives can be seen in Figure 5. The plot shows that a seeding policy with the same budget as that spent on the original incentives can generate more adopters. Also, as we increase the budget available to the seeding policy, more adopters can be induced.

An interesting question is whether it is optimal to spend the entire budget at time zero, in the period right before the last period, or on some fractional policy in between. We ran our simulation with different splits of a fixed budget between time periods t=0 and t=T-1, and searched for the optimal split, which led to the maximum number of adopters. We repeated this for different budgets. The total adoptions, i.e., seeded and unseeded adopters, are illustrated in Figure 6, where alpha is the fraction of the budget used in period t=T-1. For smaller budgets, seeding in period t=T-1 is more efficacious than seeding at t=0. However, for larger budgets we get an interior optimal solution (some fraction of the budget should be used at time 0, and the rest in the penultimate period). The intuition behind this interior optimum is that when seeding early, peer effects last longer, whereas seeding later can give away more free systems, since cost decreases over time. The two effects work in opposite directions, which implies that the optimal alpha could lie strictly between 0 and 1. When the budget is small, peer effects are not strong enough, since only a few individuals can get free solar PV systems. A large budget, however, can create much stronger peer effects, which makes a fractional alpha optimal. A further examination of adoptions, not including individuals who were seeded, is shown in Figure 7.
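The budget split and the lowest-cost-first eligibility rule can be sketched as below. This is a simplified illustration under stated assumptions, not the experimental code: the helper names `split_budget` and `seed_lowest_cost` are hypothetical, costs are taken as given per household, and seeding stops at the first household whose system no longer fits in the remaining budget.

```python
from typing import Dict, List, Tuple

def split_budget(total: float, alpha: float) -> Tuple[float, float]:
    """Return (budget at t=0, budget at t=T-1); alpha is the fraction spent at t=T-1."""
    return (1.0 - alpha) * total, alpha * total

def seed_lowest_cost(costs: Dict[str, float], budget: float) -> Tuple[List[str], float]:
    """Give free systems in order of increasing system cost until the next one
    is unaffordable. Returns (seeded household ids, unspent budget)."""
    seeded = []
    for hid, cost in sorted(costs.items(), key=lambda kv: kv[1]):
        if cost > budget:
            break
        seeded.append(hid)
        budget -= cost
    return seeded, budget

# Hypothetical system costs (in dollars) for three non-adopters at t=0.
early_budget, late_budget = split_budget(100000.0, alpha=0.25)
seeded_early, leftover = seed_lowest_cost(
    {"h1": 30000.0, "h2": 25000.0, "h3": 28000.0}, early_budget)
```

A search over alpha would run the simulation once per candidate split, seeding with `early_budget` at t=0 and `late_budget` at t=T-1 using the costs prevailing in each period, and keep the split with the most adopters.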
Figure 7 suggests that, for the same budget, the more of the expenditure one allocates to t=0, the stronger the peer effects are, and therefore the more adopters are eventually induced. It also indicates that peer effects increase as the available budget increases. These initial findings, based on our validated agent-based model, provide insights into timing strategies for seeding policies. However, the search space of the optimal seeding policy problem is still large, e.g., what about seeding in the months between t=0 and t=T-1? Does seeding the lowest-cost individuals guarantee an optimal solution? These are all interesting questions that deserve deeper investigation in the future.

Figure 6: Expected Adoptions
Figure 7: Expected Adoptions Non-seeded Individuals

Seeding Strategy

A seeding policy assumes that we can give away free solar PV systems to qualified individuals, which is expected to stimulate further adoption through peer effects. This is motivated by the fact that the peer effect is a stronger predictor of adoption than the net present value. Eligibility for a free solar system is assumed to be determined by system cost. We implemented a scheme that gives free solar systems to the households with the lowest system costs. This makes sense because it tends

Related Work

Our predictive modeling methodology is highly influenced by methods in the fields of machine learning and statistics. Feature selection (Guyon and Elisseeff 2003) is a common problem in almost any data analysis with a large set of variables. Stepwise regression and lasso regularization with a parameter tuned by cross validation are widely employed standard techniques (Friedman, Hastie, and Tibshirani 2001). Logistic regression (Bishop and others 2006), an extension of linear regression, has been successfully deployed in a variety of applications.

Agent-based modeling (ABM) is recognized as a powerful tool to understand and analyze phenomena in complex systems (Bonabeau 2002). It is a micro-scale, bottom-up modeling approach that captures the simultaneous interactions of multiple agents in an attempt to recreate and predict macro system-level events (Gustafsson and Sternad 2010). An "agent" in ABM is any autonomous entity with its own properties and behaviors. By this definition, ABM is considered the most natural way to model realistic systems such as markets. Unfortunately, acceptance of this method in traditional research communities has been slow. Critics argue that ABM handles toy problems rather than real data. Nevertheless, if agent-based models are properly validated, they can add a layer of realism that is not captured by many analytical models (Rand and Rust 2011). The highlight of our work is embedding statistical models learned from real data into a multi-agent simulation and empirically validating the agent-based model. To our knowledge, such a rigorous use of ABM in marketing research is rarely seen.

Marketing research on the solar market typically focuses on modeling diffusion processes. It often relies on diffusion models (Bass 2004; Geroski 2000; Rao and Kishore 2010) with aggregate data to make predictions about the adoption of a new product or technology. These models provide intuition but are seldom applicable to real-world data. In addition, econometric research provides insights into what affects individual decisions on solar products. In particular, peer effects have been extensively studied in the literature to model interactions among individual decisions from a social perspective. For the solar PV market, researchers developed a method to identify peer effects (Bollinger and Gillingham 2012). Moreover, individual decisions on solar technology can have an impact on system variables, e.g., system cost. Learning by doing (Arrow 1962; Harmon 2000; McDonald and Schrattenholzer 2001) was introduced to account for the externalities in how system cost evolves over time. The inclusion of net present value in our individual model is due to the rationality assumption that individuals treat a solar purchase as a type of investment.

A few closely related works are worth mentioning. For example, Zhai and Williams (2012) developed a fuzzy logic model, which related purchasing probability to variables such as perceived cost, perceived maintenance requirement, and environmental concern. A deterministic consumer choice model was developed by Lobel and Perakis (2011) to forecast aggregate solar adoptions. Robinson et al. (2013) developed an agent-based model which explicitly modeled social interactions in a spatial context using GIS data. Our work differs from this previous work in several aspects. First, our models are built on a richer and larger data set, which includes up-to-date characteristics of the solar market. Second, we target prediction rather than descriptive data analysis. Third, the agent-based model we developed is highly stochastic and non-deterministic, which we expect to account for uncertainties in individual decisions. Finally, our experiments on optimal seeding policies compared with the original incentives are rarely seen in the literature so far.

Conclusion

In summary, we claim two major contributions in this paper. First, we developed a reliable agent-based model to predict solar demand in the residential market. Second, we proposed an alternative policy which could potentially outperform the current CSI incentive scheme. Despite these accomplishments, we still need to address a few problems in the future. First, we have demonstrated that ABM successfully forecasts adoptions in a zip code area with thousands of individuals. More convincing work would require running ABM for the entire studied area, San Diego County, which has about 80 times the current population. Second, our model can be further improved by adding more meaningful features. For example, human decisions on solar products are also influenced by environmental, social, economic and political variables (Kwan 2012; Gromet, Kunreuther, and Larrick 2013). Also, better prediction of CSI rating could dramatically improve the overall modeling process. Finally, the minor impact of incentives suggests that the relationship between individual adoptions and subsidies needs further investigation.

References

Arrow, K. J. 1962. The economic implications of learning by doing. The Review of Economic Studies 155–173.
Bass, F. M. 2004. Comments on "A new product growth for model consumer durables": The Bass model. Management Science 50(12 supplement):1833–1840.
Bishop, C. M., et al. 2006. Pattern recognition and machine learning, volume 1. Springer New York.
Bollinger, B., and Gillingham, K. 2012. Peer effects in the diffusion of solar photovoltaic panels. Marketing Science 31(6):900–912.
Bonabeau, E. 2002. Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences of the United States of America 99(Suppl 3):7280–7287.
Commission, C. P. U. 2013. California solar initiative program handbook.
Friedman, J.; Hastie, T.; and Tibshirani, R. 2001. The elements of statistical learning, volume 1. Springer Series in Statistics New York.
Geroski, P. A. 2000. Models of technology diffusion. Research Policy 29(4):603–625.
Gromet, D. M.; Kunreuther, H.; and Larrick, R. P. 2013. Political ideology affects energy-efficiency attitudes and choices. Proceedings of the National Academy of Sciences 110(23):9314–9319.
Gustafsson, L., and Sternad, M. 2010. Consistent micro, macro and state-based population modelling. Mathematical Biosciences 225(2):94–107.
Guyon, I., and Elisseeff, A. 2003. An introduction to variable and feature selection. The Journal of Machine Learning Research 3:1157–1182.
Harmon, C. 2000. Experience curves of photovoltaic technology. Laxenburg, IIASA 17.
Kwan, C. L. 2012. Influence of local environmental, social, economic and political variables on the spatial distribution of residential solar PV arrays across the United States. Energy Policy 47:332–344.
Lobel, R., and Perakis, G. 2011. Consumer choice model for forecasting demand and designing incentives for solar technology. Social Science Research Network, MIT, Cambridge.
McAllister, J. A. 2012. Solar adoption and energy consumption in the residential sector.
McDonald, A., and Schrattenholzer, L. 2001. Learning rates for energy technologies. Energy Policy 29(4):255–261.
Rai, V., and Sigrin, B. 2013. Diffusion of environmentally-friendly energy technologies: buy versus lease differences in residential PV markets. Environmental Research Letters 8(1):014022.
Rand, W., and Rust, R. T. 2011. Agent-based modeling in marketing: Guidelines for rigor. International Journal of Research in Marketing 28(3):181–193.
Rao, K. U., and Kishore, V. 2010. A review of technology diffusion models with special reference to renewable energy technologies. Renewable and Sustainable Energy Reviews 14(3):1070–1078.
Robinson, S. A.; Stringer, M.; Rai, V.; and Tondon, A. 2013. GIS-integrated agent-based model of residential solar PV diffusion.
Zhai, P., and Williams, E. D. 2012. Analyzing consumer acceptance of photovoltaics (PV) using fuzzy logic model. Renewable Energy 41:350–357.