Predicting Rooftop Solar Adoption Using Agent-Based Modeling

Energy Market Prediction: Papers from the 2014 AAAI Fall Symposium
Haifeng Zhang and Yevgeniy Vorobeychik
Electrical Engineering and Computer Science
Vanderbilt University
Nashville, TN
{haifeng.zhang, yevgeniy.vorobeychik}@vanderbilt.edu
Joshua Letchford and Kiran Lakkaraju
Sandia National Laboratories
Livermore, CA
{jletchf,klakkar}@sandia.gov
Abstract
In this paper we present a novel agent-based modeling methodology to predict rooftop solar adoptions in the residential energy market. We first applied several linear regression models to estimate missing variables for non-adopters, so that the attributes of both non-adopters and adopters could be used to train a logistic regression model. Then, we integrated the logistic regression model, along with the other predictive models, into a multi-agent simulation platform and validated our models by comparing the forecast of aggregate adoptions in a typical zip code area with its ground truth. This result shows that the agent-based model can reliably predict future adoptions. Finally, based on the validated agent-based model, we compared the outcome of a hypothesized seeding policy with the original incentive plan, and investigated alternative seeding policies that could lead to more adopters.
Introduction
The rooftop solar market in the US has experienced explosive growth over the last decade. This is partly because of the government’s incentive programs, which effectively reduced the cost of solar systems and successfully created a number of early adopters. Given what we know about residential consumers and how they respond to incentives and other economic and social factors, can we reliably predict solar adoptions in the future? The problem is compelling, since predictions could become a valuable resource for policy makers who are seeking more efficient policies to promote solar adoption. The predictive models may also reveal important patterns in individuals’ decisions on solar products. If some of these patterns can be confirmed, a policy maker could consider reinforcing them and thereby attain higher solar adoption. Our modeling methodology is motivated by these scenarios: it first derives individual models of decisions, and then simulates joint behavior in an energy market.
Predicting solar demand for residential customers is challenging. First, human decision processes are essentially dynamic and nondeterministic. Second, network effects make consumer decisions harder to anticipate. Lastly, environmental variables change over time, which makes reliable prediction even more difficult. Fortunately, a large amount of data is now stored in modern databases, which may contain much useful information about individual solar adoptions. Our modeling approach uses machine learning techniques to learn reliable models from this real data. Specifically, we rely on a logistic regression model, which defines the conditional probability of adoption for an individual given observations over a set of attributes. However, the training process is not direct, since non-adopters in the data have fewer explicit attributes than adopters do. Thus, we need to estimate these unknown variables for non-adopters. Moreover, to ensure prediction accuracy as well as model interpretability, we apply standard methods of model selection, i.e., step-wise regression and lasso (l1) regularization. The regularized logistic regression model encompasses peer effects, net present values and housing characteristics.
The goal of this study is to train an individual behavioral model and compose multiple agents into an environment to simulate and forecast future solar adoptions. In the following sections, we demonstrate how this methodology is applied to forecast solar demand in a typical zip code area. A remarkable finding is that incentives appear to have little impact on adoption rates. This motivated us to seek alternative policies that could stimulate more solar adoptions. To this end we considered a free-solar seeding policy relying on peer effects. By comparing this policy with the original incentive outcomes, we demonstrate that it is significantly more efficacious than the incentive scheme with the same budget.
Predictive Modeling
We provide complete details of the modeling approach in
this section.
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Data
• bedrooms(numeric): The number of bedrooms in the
house
We mainly used two datasets: the California Solar Initiative (CSI) dataset (shared by the California Center for Sustainable Energy and restricted to residential customers in San Diego County) and an assessor dataset (San Diego County only). The CSI dataset includes information on various aspects of a typical solar photovoltaic (PV) project, such as system size (usually the CSI rating), price, incentive amount, ownership type (i.e., lease or buy) and several important dates used to determine solar adoption and installation. The CSI data covers completed projects from May 2007 through April 2013 (about 6 years, 8500 adopters). The assessor data stores comprehensive housing characteristics for the residential sector (about 440955 households in San Diego County), including square footage, acreage, number of bathrooms, car storage, etc.
The CSI data and assessor data were merged to fit our modeling framework. Using the final adoption information, we were able to generate historical observations for each individual. In detail, for a given month m, if someone's adoption month was after m, we construct an observation with a negative label (i.e., "did not adopt in this month"). If the adoption month is the month m itself, we instead construct a positive observation labeled "adopt". The positive and negative cases are encoded as a binary class, i.e., 1 for "adopted" and 0 for "did not adopt", in order to fit a logistic regression model. Once we expand individual entries by month, some of the original information, such as incentive and cost, is no longer valid, since those variables change over time; home characteristics, however, are assumed not to change. We used simple linear regression models to estimate those unknown variables from the data. The underlying assumption is that non-adopters make decisions over the same set of variables as adopters do, i.e., cost, economic benefits, peer effect measures, etc. Once those unknown variables are estimated, we are ready to train the logistic regression model. Note that all models were trained on data spanning the first four years, while empirical adoptions in the last two years were used only for evaluation.
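The month-by-month labeling described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the names `Observation` and `expand_by_month`, the integer month index, and the choice to emit no further rows once a household has adopted are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    household_id: str
    month: int    # hypothetical integer month index within the data window
    adopted: int  # 1 = "adopted" in this month, 0 = "did not adopt" in this month


def expand_by_month(household_id: str,
                    adoption_month: Optional[int],
                    first_month: int,
                    last_month: int) -> List[Observation]:
    """Expand one household into one labeled observation per month.

    adoption_month is None for households that never adopt in the window.
    Assumption: no observations are generated after the adoption month.
    """
    rows: List[Observation] = []
    for m in range(first_month, last_month + 1):
        if adoption_month is not None and m > adoption_month:
            break  # household has already adopted; stop generating rows
        label = 1 if adoption_month == m else 0
        rows.append(Observation(household_id, m, label))
    return rows
```

For a household adopting in month 3 of a six-month window, this yields two negative observations followed by one positive observation; a household that never adopts contributes one negative observation per month.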
Our previous modeling efforts revealed that one variable
involving system size is critical in determination of system
cost. As a result, our first step is to estimate this variable.
• baths(numeric): The number of bathrooms in the house
• pool(binary): 1 if the property has a private pool, otherwise 0
• numCarStorage(numeric): The sum of garage space and
private parking spaces
• totalLvg(numeric): The total number of livable square
feet
• parView(binary): 1 if the property is designated as having
a particularly pleasant (valuable) view, otherwise 0
• acreage(binary): 1 if the property has at least 0.25 acres, otherwise 0
• aveKwh(numeric): Average electricity usage of the zip code that a house belongs to, measured in kilowatt-hours
We fitted a linear model to the adopter subset of the data with these variables and eliminated unimportant variables step by step, i.e., step-wise regression in statistics. The final features and their coefficients are listed in Table 1.
Table 1: CSI Rating Linear Model
Predictor    Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  1.592e+00   1.553e-01   10.255   < 2e-16
ownerocc     -2.547e-01  1.149e-01   -2.217   0.0267
pool         6.315e-01   7.658e-02   8.247    < 2e-16
totalLvg     7.582e-04   3.385e-05   22.403   < 2e-16
acreage      1.319e+00   8.507e-02   15.505   < 2e-16
aveKwh       8.249e-04   1.913e-04   4.311    1.66e-05
The adjusted R-squared is about 0.27, which is rather low. Given the data available to us, this is the best fit we could obtain using linear regression. Although a solar system is typically sized to offset a household's electricity usage in the higher tiers, there are many reasons why a system may be sized smaller or larger, depending on how an individual maximizes his or her internal rate of return (McAllister 2012). Fortunately, the subsequent modeling efforts show that the low R-squared does not dramatically affect our final predictions.
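As an illustration of how the fitted CSI rating model could be applied to a non-adopter, the sketch below evaluates the linear model using the estimates reported in Table 1. The function name `predict_csi_rating` and the example household values are hypothetical and are not taken from the data.

```python
# Coefficient estimates copied from Table 1 (CSI Rating Linear Model).
CSI_RATING_COEF = {
    "(Intercept)": 1.592e+00,
    "ownerocc": -2.547e-01,
    "pool": 6.315e-01,
    "totalLvg": 7.582e-04,
    "acreage": 1.319e+00,
    "aveKwh": 8.249e-04,
}


def predict_csi_rating(features: dict) -> float:
    """Linear prediction: intercept plus the sum of coefficient * feature value."""
    rating = CSI_RATING_COEF["(Intercept)"]
    for name, coef in CSI_RATING_COEF.items():
        if name != "(Intercept)":
            rating += coef * features[name]
    return rating


# Hypothetical household: owner-occupied, no pool, 2000 sq ft of living area,
# lot under 0.25 acres, zip-level average usage value of 500.
example_house = {"ownerocc": 1, "pool": 0, "totalLvg": 2000, "acreage": 0, "aveKwh": 500}
example_rating = predict_csi_rating(example_house)  # about 3.27
```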
Ownership Cost
CSI Rating
The two primary ways to have a solar system installed on one's house are buying and leasing. People who purchase a solar system usually face a large down payment and eventually own the system. People who choose to lease may have zero down payment, but they do not own the system. The choice between leasing and buying is an interesting research question in econometrics and is common in many markets, such as automobiles and housing (Rai and Sigrin 2013).
Ownership cost is the actual payment made by a solar PV owner. This information is available in the CSI program database. Note that, in practice, some owners may choose to finance a solar PV system through bank loans. For now, however, we do not deal with this layer of complication.
CSI rating (Commission 2013) is a key parameter that measures the size and efficiency of the PV system to be installed. The choice of system size is usually the first step in going solar, but for a non-adopter this value is unknown. We estimated the CSI rating based on home characteristics, such as living area, pool, bathrooms, car storage, etc. Specifically, the candidate variables include all available assessor features, such as,
• totalVal(numeric): Total property value including both
land and building values
• ownerocc(binary): 1 if the property is occupied by the
owner, otherwise 0
Table 2: Ownership Cost Linear Model
Predictor Coefficient
(Intercept) 1.138391e+04
totalVal 7.377731e-04
totalLvg 1.518842e-01
csirating 6.213036e+03
totalAdoptSD -1.062339e+00
The initial candidate variables used to train the ownership cost model include all assessor features, aveKwh, CSI rating and the peer effect measures described as follows:
• numAdopt: The number of completed CSI applications
by the current month in the zip code the house belongs to
• numInstall: The number of completed PV installations by
the current month in the zip code the house belongs to
• fracAdopt: The fraction of houses that have completed
CSI applications in the zip code the house belongs to
50 percent of new adopters who chose to lease PV systems. Since the choice between leasing and buying involves complicated financial decisions, which are beyond the scope of this paper, we only focus on a more general notion of adoption. However, because leasing is becoming an appealing option in today's solar energy market, we included relevant variables in our individual model. In particular, we calculated the total leasing cost as follows:
• fracInstall: The fraction of houses that have completed
PV installations in the zip code the house belongs to
• num8Mile: The number of installations within eight mile
radius around the house
• num4Mile: The number of installations within four mile
radius around the house
• num2Mile: The number of installations within two mile
radius around the house
• numMile: The number of installations within one mile radius around the house
LeaseCost = c_0 + \frac{c_{1m} \cdot 12 \cdot (1 - \beta (1 + \xi))^{L}}{1 - \beta (1 + \xi)}    (1)
where,
• c0 : upfront cost or lease down payment
• c1m : monthly payment in first year
• ξ: annual escalation rate
• L: lease contract length in years
• β: discounting factor, use 0.95.
These variables are stated clearly in solar lease contracts. We extracted this information from a sample (about 70 individuals) of lease contracts and fitted a linear regression model with the same set of features used for the ownership cost model, plus incentive. Moreover, the linear model used l1 regularization in order to pull out the most significant features from among correlated features. This also helped to avoid over-fitting and improve predictive efficacy. We followed the one standard error rule discussed in the previous section, and finally chose a model with only one feature, the CSI rating. The coefficients of this model are given in Table 3.
• numHalfMile: The number of installations within half mile radius around the house
• numFourthMile: The number of installations within quarter mile radius around the house
• numEighthMile: The number of installations within eighth mile radius around the house
• baseline: The electricity baseline, which is determined by the utility company, i.e., SDG&E
• aveKwhExcessTier2: Average kilowatt-hours in excess of the tier 2 threshold
• totalInstallSD: Total number of PV installations by the current month in San Diego County
• totalAdoptSD: Total number of CSI program applications by the current month in San Diego County
Intuitively, the above variables are highly correlated, which can be problematic for linear regression, so an appropriate feature selection method needs to be applied. We applied l1 regularization, also known as the lasso penalty (Friedman, Hastie, and Tibshirani 2001), to the linear ownership cost model. The parameter lambda, the weight of the regularization term in the objective function, is determined by standard cross validation.
We followed the one standard error rule (Friedman, Hastie, and Tibshirani 2001) to pick lambda: we choose the largest lambda whose cross validation error is no more than one standard error above that of the best model. Then, we fitted our full training data with the selected lambda and obtained a linear model with the coefficients shown in Table 2.
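The one standard error rule can be sketched in a few lines. The example below assumes that cross validation has already produced, for each candidate lambda, a mean error and a standard error; the function name `one_standard_error_lambda` and the numbers in the example are hypothetical.

```python
from typing import Sequence


def one_standard_error_lambda(lambdas: Sequence[float],
                              cv_mean: Sequence[float],
                              cv_se: Sequence[float]) -> float:
    """Return the largest lambda whose mean CV error is within one standard
    error of the lowest mean CV error."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return max(eligible)


# Hypothetical cross validation results for four candidate values of lambda.
candidate_lambdas = [0.01, 0.1, 1.0, 10.0]
mean_errors = [1.00, 0.98, 1.03, 1.40]
standard_errors = [0.05, 0.06, 0.07, 0.08]
chosen = one_standard_error_lambda(candidate_lambdas, mean_errors, standard_errors)
```

Here the lowest error (0.98) occurs at lambda = 0.1, but lambda = 1.0 is still within one standard error of it (1.03 <= 0.98 + 0.06), so the rule prefers the larger, more strongly regularized value.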
Table 3: Lease Cost Linear Model
Predictor Coefficient
(Intercept) 10446.832
csirating 1658.389
Estimation of Missing Variables
After we trained the CSI rating, ownership cost and lease cost models, we estimated the CSI rating, ownership cost and lease cost for all non-adopters. To be consistent with consumer decision theory in the literature, we calculated the net present values of lease and ownership separately. The net present value of ownership (denoted by NPV.own) is computed as follows.
NPV.own = I - C_o + B    (2)
Lease Cost
The solar energy market has changed dramatically since
early 2008, when solar leases first came into being. Empirical data of year 2012 reveal that there had been more than
where I is the incentive, C_o is the ownership cost and B is the solar economic benefit. The net present value of lease (denoted by NPV.lease) is computed as follows.
NPV.lease = -C_l + B    (3)
where C_l is the lease cost and B is the solar economic benefit. Notice that, since lease customers do not receive solar incentives (the installers who purchase the system do instead), incentives are not included in the calculation of the lease net present value. In addition, solar benefits are computed as follows.
For months before year 2009, we have
SolarBenefits = \frac{(e^{T1} + e^{T2}) \cdot 12 \cdot 0.95}{1 - 0.95} + \frac{(e^{T3} + e^{T4} + e^{T5}) \cdot 12 \cdot 0.95 \cdot 1.035}{1 - 0.95 \cdot 1.035}    (4)
For months in and after year 2009, we have
SolarBenefits = \frac{(e^{T1} + e^{T2}) \cdot 12 \cdot 0.95 \cdot 1.01}{1 - 0.95 \cdot 1.01} + \frac{(e^{T3} + e^{T4} + e^{T5}) \cdot 12 \cdot 0.95 \cdot 1.035}{1 - 0.95 \cdot 1.035}    (5)
Figure 1: Logistic Regression CV errors
Logistic Regression
Since many of the variables are correlated, using all of these features directly could give us a model with misleading coefficients. A logistic regression model with lasso regularization was therefore trained on a sample of the training data (30% of the entire training data, around 6841501 data entries). Due to the size of the training data, we only used 5 folds in cross-validation. The cross validation error is shown in Figure 1.
By the one standard error rule, we chose the largest lambda with cross validation error within one standard error of the best model. The coefficients of this model are listed in Table 4.
Table 4: Logistic Regression Model
Predictor Coefficient
(Intercept) -8.814706e+00
ownerocc 2.503317e-01
fracInstall 1.627659e+01
NPV.own 2.994175e-06
NPV.lease 7.020925e-06
Both solar benefit equations, (4) and (5), assume that the solar economic benefit lasts for an infinite number of years, with a 3.5% annual growth rate for the tier 3, 4 and 5 consumption groups, which is approximately the average growth rate for these tiers over the last 10 years (see footnote 1). They differ in the first term: we assume no change for the tier 1 and 2 electricity consumption categories prior to year 2009, but 1% annual growth for months since year 2009. We then used the actual electricity rates to calculate solar benefits according to the following equation:
MonthlySolarTieredBenefits(e^{t}) = r_t \cdot Seo_t,  t \in \{T1, T2, T3, T4, T5\}    (6)
where r_t is the tiered electricity rate and Seo_t is the part of the solar electricity output that falls into tier t. The total solar electricity output is computed as follows,
Seo = CSIrating \cdot Hrs_{sun}    (7)
where Hrs_{sun} denotes the full sun hours (that is, the yearly average amount of solar insolation; 5 hours is used for San Diego County) (see footnote 2).
Finally, in addition to the net present values for ownership and lease, we also added a few dummy variables, such as season indicators and a lease availability indicator. Our final logistic regression model was trained on the extended set of features.
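To make the net present value calculation concrete, the sketch below implements the two solar benefit equations (4) and (5) and the NPV equations (2) and (3). It takes the monthly tiered benefits e^{T1}, ..., e^{T5} of equation (6) as given inputs; the function names and the dictionary layout keyed by tier name are assumptions made for this example.

```python
from typing import Dict

BETA = 0.95          # discounting factor used in equations (4) and (5)
HIGH_GROWTH = 1.035  # 3.5% annual growth for tiers 3, 4 and 5


def solar_benefits(monthly_tier_benefit: Dict[str, float], year: int) -> float:
    """Discounted infinite-horizon solar benefit, equations (4) and (5)."""
    # Tiers 1 and 2: no growth before 2009, 1% annual growth from 2009 on.
    low_growth = 1.0 if year < 2009 else 1.01
    e = monthly_tier_benefit
    low = (e["T1"] + e["T2"]) * 12 * BETA * low_growth / (1 - BETA * low_growth)
    high = ((e["T3"] + e["T4"] + e["T5"]) * 12 * BETA * HIGH_GROWTH
            / (1 - BETA * HIGH_GROWTH))
    return low + high


def npv_own(incentive: float, ownership_cost: float, benefits: float) -> float:
    """Equation (2): NPV.own = I - C_o + B."""
    return incentive - ownership_cost + benefits


def npv_lease(lease_cost: float, benefits: float) -> float:
    """Equation (3): NPV.lease = -C_l + B (no incentive for lease customers)."""
    return -lease_cost + benefits
```

For example, a household with monthly benefits of 10 and 5 in tiers 1 and 2 and nothing in the higher tiers has a pre-2009 benefit of 15 * 12 * 0.95 / 0.05 = 3420.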
The lasso regularization provides us with a sparse model. Notice first that the most important peer effect feature is at the zip code level rather than any of the mile-based radius measures (weak measures are shrunk to zero coefficients). As the cost of computing mile-based peer effect measures usually increases with the size of the population, the benefit of shrinkage regularization is also apparent. Finally, the coefficient of NPV.lease is larger than that of NPV.own, which is informative since a large portion of the solar market involves leased systems.
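The adoption probability implied by the regularized model can be illustrated with the coefficients reported in Table 4. This is a sketch only: it assumes Table 4 lists every nonzero coefficient, so the dummy variables mentioned earlier are left out, and the function name `adoption_probability` is hypothetical.

```python
import math
from typing import Dict

# Coefficients copied from Table 4 (Logistic Regression Model).
LOGIT_COEF = {
    "(Intercept)": -8.814706e+00,
    "ownerocc": 2.503317e-01,
    "fracInstall": 1.627659e+01,
    "NPV.own": 2.994175e-06,
    "NPV.lease": 7.020925e-06,
}


def adoption_probability(features: Dict[str, float]) -> float:
    """Monthly adoption probability: logistic function of the linear score."""
    score = LOGIT_COEF["(Intercept)"]
    for name, coef in LOGIT_COEF.items():
        if name != "(Intercept)":
            score += coef * features.get(name, 0.0)  # missing features count as 0
    return 1.0 / (1.0 + math.exp(-score))
```

With all features at zero the intercept alone gives a monthly probability on the order of 1e-4, and the probability rises as fracInstall, the zip-code-level peer effect measure, increases.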
1. Some of this data is available at the SDG&E website, http://www.sdge.com/. Note that the actual number of tiers may vary over the years; we have converted non-5-tier rates into 5-tier rates.
2. Sun Hours/Day Zone Solar Insolation Map, see http://www.wholesalesolar.com/InformationSolarFolder/SunHoursUSMap.html
Agent-based Simulation
The models of CSI rating, ownership cost and lease cost and the logistic regression model of solar adoption propensity were composed in a widely-used, open-source agent-based simulation toolkit, Repast (see footnote 3).
Agents
The main type of agent is named "household", which represents a household entity in the residential solar market. It has two sub-types, "adopter" and "non-adopter". In addition, in order to flexibly control the execution of the simulation, we defined a special type of agent called "updater", which is responsible for updating the attributes of household agents at each time step.
Time Step
At each tick of the simulation, the updater agent first updates the predictors, i.e., ownership and lease cost, incentive, NPVs, etc., for all agents based on the state of the world. Lease and ownership costs are estimated by the lease and ownership cost models respectively, and incentives follow the original CSI program rate schedule, i.e., a step function of the total solar kilowatts that have been reserved in San Diego County. Next, each non-adopter agent computes its adoption likelihood from the set of attributes used in the logistic regression model. The agent then draws a random number between 0 and 1 and compares it with its own adoption likelihood to decide whether or not to adopt. This exactly simulates a random event (in this case solar adoption) that occurs with a given probability. In detail, if the number is less than the probability given by the logistic regression model, the agent chooses to adopt a solar system. If an agent chooses to adopt, we remove the non-adopter and add a new adopter to the environment. Moreover, when we create a new adopter, we also assign an installation period for the solar system; adopter agents do not update the number of installations until installation is completed. The number of months needed for the solar adoption to become a visible system is uniformly distributed between 1 and 6, reflecting the typical installation range in the CSI data. At the end of every time step, data can be collected and output to a user-specified file. The data we typically collected were aggregate adoptions and system cost.
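The stochastic adoption decision within one time step can be sketched as below. The simulation in this paper was built in Repast; this stand-alone Python version only illustrates the draw-and-compare logic, and the function name `adoption_step` and the dictionary of per-household probabilities are assumptions.

```python
import random
from typing import Dict, List, Tuple


def adoption_step(probabilities: Dict[str, float],
                  rng: random.Random) -> List[Tuple[str, int]]:
    """Run one tick for all non-adopters.

    probabilities maps a household id to its adoption likelihood for this tick.
    Returns (household id, installation delay in months) for each new adopter.
    """
    new_adopters: List[Tuple[str, int]] = []
    for household_id, p in probabilities.items():
        # Adopt if a uniform draw from [0, 1) falls below the likelihood.
        if rng.random() < p:
            # Installation becomes visible after 1 to 6 months (uniform).
            months_until_visible = rng.randint(1, 6)
            new_adopters.append((household_id, months_until_visible))
    return new_adopters
```

Passing an explicitly seeded `random.Random` instance keeps individual runs reproducible, which is convenient when averaging over many sample runs.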
3. Repast home page, http://repast.sourceforge.net/
Experiment Results
We present the final ABM results in this section. To validate our modeling approach, we compared it with a baseline model, which is a non-regularized logistic regression model with only three features. Its coefficients are shown in Table 5.
Table 5: Baseline Logistic Regression Model
Predictor Coefficient
(Intercept) -8.858e+00
fracInstall 1.313e+02
NPV.own 1.142e-05
We took the individuals from a representative zip code in San Diego County (approximately 13000 households) and initialized the simulation with their assessor features as well as their adoption states, i.e., who has adopted and who has not. We started the simulation after the period on which we trained the predictive models. As our agent-based simulation is highly stochastic in nature, we averaged the results of 1000 runs for both the lasso and baseline models. The resulting average adoptions by month are illustrated in Figure 2.
Figure 2: Average Adoptions
The results generated by the lasso model are closer to the real path than those of the baseline model. Formally, we traced the likelihood of each model in each run, and computed the average likelihood ratio (lasso/baseline) over all sample runs for each month. As shown in Figure 3, the ratio is generally much larger than 1, suggesting that the lasso model significantly outperforms the baseline.
Figure 3: Likelihood Ratio
Figure 4: Varying Incentive Rates
Figure 5: Expected Adoptions Incentives vs. Seeding
We varied the original CSI incentive rates, i.e., multiplying the original rates by 2, 4, 8, 16, etc., while holding the same amount of targeted megawatts in each step, and compared
the expected adoptions with the outcomes of the original CSI incentive rate structure. Although a single simulation run takes a few seconds, a large number (say 1000) of sample runs would take a couple of hours. To cope with this computational difficulty, we introduce a heuristic to approximate the expected number of adopters over a very large number of sample runs. We call the heuristic "average step". As its name suggests, the simulation proceeds in a special manner: it generates multiple (typically 1000) instances of one-step runs simultaneously at each time step, but only the one closest to the average number of adopters becomes the "true" outcome. If a simulation is run in "average step" mode, each step can be considered an average outcome given the previous step. Using this heuristic, the time needed to compute the utility of a policy was reduced from hours to minutes. The "average step" function is enabled by the "updater" agent mentioned earlier.
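A minimal sketch of the "average step" idea follows. It assumes a user-supplied `one_step` function that simulates a single tick from a given state and returns the resulting state together with the number of new adopters; that interface, like the name `average_step`, is an assumption of this example rather than the Repast implementation.

```python
import random
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def average_step(one_step: Callable[[State, random.Random], Tuple[State, int]],
                 state: State,
                 n_instances: int,
                 rng: random.Random) -> Tuple[State, int]:
    """Simulate n_instances independent one-step runs from the same state and
    keep the one whose adopter count is closest to the average count."""
    candidates = [one_step(state, rng) for _ in range(n_instances)]
    mean_adopters = sum(n for _, n in candidates) / n_instances
    # Ties are broken in favor of the earliest candidate.
    return min(candidates, key=lambda c: abs(c[1] - mean_adopters))
```

Calling `average_step` once per tick and feeding the returned state into the next call yields a single trajectory that tracks the per-step average, instead of averaging over many complete runs.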
We applied the simple "average step" heuristic to gain insight into how the predictive model reacts to incentive changes. The average adoptions over 10 sample ("average step") runs for each month, for the scaled and original incentive plans, are shown in Figure 4.
The impact of incentives seems very limited, which suggests that incentives may not be a major concern when people consider installing solar PV systems. This result also encouraged us to think of alternative policies that could be more efficacious than current incentive schemes. This is discussed in the next section.
to give away more free systems under a fixed budget.
We compared incentives with a seeding policy that seeds all free solar systems at the very beginning, subject to an identical budget constraint. A comparison of average adoptions between the seeding policy and incentives can be seen in Figure 5.
The plot shows that a seeding policy with the same budget as spent on the original incentives can generate more adopters. Also, as we increase the available budget of the seeding policy, more adopters can be induced.
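The seeding scheme considered in this paper gives free systems to the households with the lowest system costs under a fixed budget. One way such a selection could be written is sketched below; the function name `choose_seeds`, the dictionary of estimated system costs, and the rule of stopping at the first household the remaining budget cannot cover are assumptions of the example.

```python
from typing import Dict, List


def choose_seeds(system_costs: Dict[str, float], budget: float) -> List[str]:
    """Pick households to receive free systems, cheapest first, within budget."""
    seeds: List[str] = []
    remaining = budget
    for household_id, cost in sorted(system_costs.items(), key=lambda kv: kv[1]):
        if cost > remaining:
            break  # every later household costs at least as much
        seeds.append(household_id)
        remaining -= cost
    return seeds
```

For instance, with hypothetical system costs of 30000, 10000 and 20000 for households "a", "b" and "c" and a budget of 35000, the selection is "b" then "c", leaving 5000 unspent.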
An interesting question is whether it is optimal to spend the entire budget at time zero, in the time period right before the last period, or under some fractional policy in between. We ran our simulation with different splits of a fixed budget between time periods t=0 and t=T-1, and searched for the optimal split, i.e., the one that led to the maximum number of adopters. We repeated this for different budgets. The total adoptions, i.e., seeded and unseeded adopters, are illustrated in Figure 6, where alpha is the fraction of the budget used in period t=T-1.
For smaller budgets, seeding at period t=T-1 is more efficacious than at t=0. However, for larger budgets we get an interior optimal solution (some fraction of the budget should be used at time 0, and the rest in the penultimate period). The intuition behind this interior optimum is that when seeding early, peer effects last longer, whereas seeding later can give away more free systems as cost decreases over time. The two effects work in opposite directions, which implies that the optimal alpha could lie strictly between 0 and 1. When the budget is small, peer effects are not strong enough, since only a few individuals can get free solar PV systems. A large budget, however, can create much stronger peer effects, which makes a fractional alpha optimal.
A further examination of adoptions excluding the individuals who were seeded is shown in Figure 7. It suggests that for the same budget, the more one shifts expenditure to t=0, the stronger the peer effects will be and therefore the more
Seeding Strategy
A seeding policy assumes that we can give away free solar PV systems to qualified individuals, which is expected to stimulate further adoption through peer effects. This is motivated by the fact that the peer effect is a stronger predictor of adoption than net present value. Eligibility for a free solar system is assumed to be determined by system cost. We implemented a scheme that gives free solar systems to the households with the lowest system costs. This makes sense because it tends
niques (Friedman, Hastie, and Tibshirani 2001). Logistic
regression (Bishop and others 2006), an extension of linear regression, has been successfully deployed in a variety of applications.
Agent-based modeling (ABM) is recognized as a powerful tool to understand and analyze phenomena in complex systems (Bonabeau 2002). It is a micro-scale, bottom-up modeling approach that captures the simultaneous interactions of multiple agents in an attempt to recreate and predict macro, system-level events (Gustafsson and Sternad 2010). An "agent" in ABM is any autonomous entity with its own properties and behaviors. By this definition, ABM is considered the most natural way to model realistic systems such as markets. Unfortunately, acceptance of this method in traditional research communities has been slow. Critics argue that ABM handles only toy problems rather than real data. Nevertheless, if agent-based models are properly validated, they can add a layer of realism that is not captured by many analytical models (Rand and Rust 2011). The highlight of our work is that we embed statistical models learned from real data into a multi-agent simulation and empirically validate the agent-based model. To our knowledge, such a rigorous use of ABM in marketing research is rarely seen.
Marketing research on the solar market typically focuses on modeling diffusion processes. It often relies on diffusion models (Bass 2004; Geroski 2000; Rao and Kishore 2010) with aggregate data to make predictions about the adoption of a new product or technology. These models do provide intuition but are seldom applicable to real-world data. Econometrics research, in turn, provides insight into what affects individual decisions on solar products. In particular, peer effects have been extensively studied in the literature to model interactions among individual decisions from a social perspective. For the solar PV market, researchers have developed a method to identify peer effects (Bollinger and Gillingham 2012). Moreover, individual decisions on solar technology can have an impact on system-level variables, e.g., system cost. Learning by doing (Arrow 1962; Harmon 2000; McDonald and Schrattenholzer 2001) was introduced to account for such externalities in terms of how system cost evolves over time. The inclusion of net present value in our individual model follows from the assumption that rational individuals treat a solar purchase as a type of investment.
A few closely related works are worth mentioning. For example, Zhai and Williams developed a fuzzy logic model, which related purchasing probability to variables such as perceived cost, perceived maintenance requirement, and environmental concern. A deterministic consumer choice model was developed by Lobel and Perakis to forecast aggregate solar adoptions. Robinson et al. developed an agent-based model which explicitly modeled social interactions in a spatial context using GIS data.
Our work differs from this previous work in several aspects. First, our models are built on a richer and larger data set, which includes up-to-date characteristics of the solar market. Second, we target prediction rather than descriptive data analysis. Third, the agent-based model we developed is highly stochastic and non-deterministic, which we expect to account for uncertainties in individual decisions. Finally,
Figure 6: Expected Adoptions
Figure 7: Expected Adoptions Non-seeded Individuals
adopters are eventually induced. It also indicates that peer effects increase as the available budget increases. These initial findings based on our validated agent-based model provide insight into timing strategies for seeding policies. However, the search space of the optimal seeding policy problem is still large: for instance, what about seeding in the months between t=0 and t=T-1? Does seeding the lowest-cost individuals guarantee an optimal solution? These are all interesting questions that merit deeper investigation in the future.
Related Work
Our predictive modeling methodology is highly influenced
by methods in fields of machine learning and statistics. Feature selection (Guyon and Elisseeff 2003) is a common problem in almost any data analysis with a large set of variables.
Stepwise regression, lasso regularization with tuned parameter by cross validation are widely-employed standard tech-
our experiments comparing optimal seeding policies with the original incentives have rarely been seen in the literature so far.
Conclusion
In summary, we claim two major contributions in this paper. First, we developed a reliable agent-based model to predict solar demand in the residential market. Second, we proposed an alternative policy which could potentially outperform the current CSI incentive scheme.
Despite these accomplishments, we still need to address a few problems in the future. First, we have demonstrated in this paper that the ABM successfully forecasts adoptions in a zip code area with thousands of individuals. More convincing work would require us to run the ABM for the entire studied area, San Diego County, which has about 80 times the current population. Second, our model can be further improved by adding more meaningful features. For example, human decisions on solar products are also influenced by environmental, social, economic and political variables (Kwan 2012; Gromet, Kunreuther, and Larrick 2013). Also, better prediction of the CSI rating could substantially improve the overall modeling process. Finally, the minor impact of incentives suggests that the relationship between individual adoptions and subsidies might need further investigation.
References
Arrow, K. J. 1962. The economic implications of learning by doing. The Review of Economic Studies 155–173.
Bass, F. M. 2004. Comments on "a new product growth for model consumer durables the bass model". Management Science 50(12 supplement):1833–1840.
Bishop, C. M., et al. 2006. Pattern recognition and machine learning, volume 1. Springer New York.
Bollinger, B., and Gillingham, K. 2012. Peer effects in the diffusion of solar photovoltaic panels. Marketing Science 31(6):900–912.
Bonabeau, E. 2002. Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences of the United States of America 99(Suppl 3):7280–7287.
Commission, C. P. U. 2013. California solar initiative program handbook.
Friedman, J.; Hastie, T.; and Tibshirani, R. 2001. The elements of statistical learning, volume 1. Springer Series in Statistics New York.
Geroski, P. A. 2000. Models of technology diffusion. Research Policy 29(4):603–625.
Gromet, D. M.; Kunreuther, H.; and Larrick, R. P. 2013. Political ideology affects energy-efficiency attitudes and choices. Proceedings of the National Academy of Sciences 110(23):9314–9319.
Gustafsson, L., and Sternad, M. 2010. Consistent micro, macro and state-based population modelling. Mathematical Biosciences 225(2):94–107.
Guyon, I., and Elisseeff, A. 2003. An introduction to variable and feature selection. The Journal of Machine Learning Research 3:1157–1182.
Harmon, C. 2000. Experience curves of photovoltaic technology. Laxenburg, IIASA 17.
Kwan, C. L. 2012. Influence of local environmental, social, economic and political variables on the spatial distribution of residential solar PV arrays across the United States. Energy Policy 47:332–344.
Lobel, R., and Perakis, G. 2011. Consumer choice model for forecasting demand and designing incentives for solar technology. Social Science Research Network, MIT, Cambridge.
McAllister, J. A. 2012. Solar adoption and energy consumption in the residential sector.
McDonald, A., and Schrattenholzer, L. 2001. Learning rates for energy technologies. Energy Policy 29(4):255–261.
Rai, V., and Sigrin, B. 2013. Diffusion of environmentally-friendly energy technologies: buy versus lease differences in residential PV markets. Environmental Research Letters 8(1):014022.
Rand, W., and Rust, R. T. 2011. Agent-based modeling in marketing: Guidelines for rigor. International Journal of Research in Marketing 28(3):181–193.
Rao, K. U., and Kishore, V. 2010. A review of technology diffusion models with special reference to renewable energy technologies. Renewable and Sustainable Energy Reviews 14(3):1070–1078.
Robinson, S. A.; Stringer, M.; Rai, V.; and Tondon, A. 2013. GIS-integrated agent-based model of residential solar PV diffusion.
Zhai, P., and Williams, E. D. 2012. Analyzing consumer acceptance of photovoltaics (PV) using fuzzy logic model. Renewable Energy 41:350–357.