MORNING SESSION
Base setup
1.
Introduction and importing data: GD
2.
Creation of objects: GD
3.
The presentation of a rudimentary model: Ph
4.
Essential output: Ph
5.
Links: GD
6.
Static and dynamic modelling: GD
Extended
1. Introduction and importing data
1.1.
Get started (2.1.)
a. Copy the LIAM2 bundle on a directory on your computer
b. Open notepad++portable
c. Open a model, press F6
d. Check out for help; PRESENT HERE the
i. Documentation
ii. Mailing list
iii. Email address
iv. Google groups
1.2.
An introduction: what happens when you launch LIAM2 in Notepad?
Open notepad; show /macro/import F5, run, run with console F6, explore
results without simulating
Explain colors in the code: green=comment, blue= header of the process,
usually the endogenous variable,
Show user interface
Ctrl-q: put in comments; Ctrl-d: duplicate; Ctrl-z: undo
View/move/clone current document
Note that “tabs” do not exist in this pre-configured version of Notepad
Demonstrate VITABLES (http://vitables.org/)
!!! note that LIAM2 automatically saves the code at each launch!!
1.3.
A glossary of terms
1.3.1. Globals: are parameters that do not relate to an entity or object level in
the model. Globals can be tables or multi-dimensional arrays and they
need to be imported and declared.
1.3.2. Entities: these are the “object levels”. Each entity has a unique
identifier. E.g. individuals, households, firms
1.3.3. Fields aka global variables: entities are described by a set of attributes.
Each attribute is a field. E.g. age, gender. Fields thus describe the
entities. 3 types: Booleans, integers or floats. Fields are global
variables; they have to be declared (only the identifier id and period
are implicit) and each process in the simulation block can use and
change their value. Note that fields are not necessarily present in the
input dataset!
1.3.4. Links: objects or individuals can be linked with other objects on the
same entity (mother, child, spouses, …) or different entity (households)
1.3.5. Macros: CAPITAL letters. Are baskets of code; evaluated wherever
they appear. Useful for making code easier to read.
1.3.6. Processes: the processes describe how the entities behave i.e. under
what circumstances objects are created or removed, and how and when
their attributes (fields) change. The order in which the processes are
defined here is irrelevant. It is the order in the simulation block that
matters.
2. A rudimentary model and some output (Philippe)
2.1. A typical model in LIAM2 consists of several blocks
2.1.1. Open and discuss demo01.yml
-
globals, entities (household, person), simulation
-
within each entity: fields, links (see later), processes
-
simulation block consists of processes, input, output, start_period, periods
o Endogenous variables: fields not present in the input dataset
2.1.2. Output
•
Deterministic simulation: demo01.yml and conditions
to the interactive log-file
option “skip_shows: False”
2.1.3. checking your results via the interactive console
•
option “default_entity: person”
•
mean, min, max, groupby,
•
groupby(workstate, gender, expr=grpavg(age)) is average age for each combination
of workstate and gender
- qshow(groupby(lefthander, filter=FEMALE))
•
qshow(groupby(lefthander, filter=FEMALE, percent=True))
dump(… filter=(…))
!! you can check out the starting dataset after having run the model if you set period 2001!!
2.1.4. producing simple output tables through LIAM2
-
Dumps
-
Year-specific outputs
2.1.5. Importing data: start without a hdf5 file; only with a csv.
THIS PART HAS CHANGED PLACE –
LIAM2 discerns
-
Entities: objects of different levels, each with a unique identifier and
described by fields,
-
fields (variables):
both global and local, i.e. within a procedure;
fields can be floats, integers or Booleans
fields can be observed in the initial dataset, or not
observed fields by definition are global fields, because they
have to be defined.
-
periodic globals (or parameters)
-
Links (see later)
-
Macro’s (see later)
2.1.5. Check the data in csv
Each object level (ind/hh): separately introduce each object number in
the starting year: household.csv, person.csv, firm.csv, …
For each object level, the dataset contains at least two columns, “id” and
“period”. Household.csv
Imputed – no missings (a value may represent a missing e.g. -9)
2.1.6. Check parameter tables in csv
These are the “periodic globals” in LIAM2: section 3.1.
Show that both transposed or not transposed are possible
Parameter table: transposed = False. So each parameter is a column.
Show that import of globals_transposed.csv is possible when
“transposed: True”
2.1.7. Regular import
Demo_import_canberra.yml
Discuss this. HAVE THEM DO THIS AS AN EXERCISE
a. Run C:\usr\LIAM2-course_CANBERRA\examples\ demo_import_canberra.yml
b. Old name is “male”: show that this variable occurs in person.csv
c. Person 0 is false (female). Example invert: [gender]. Person 0 is now true
2.2. working with LIAM2
2.2.1. oops – a mistake!
- breakpoints
Discuss breakpoints, always check local and global variables, discuss the log-file
and discuss macro/run the model with debugging information. Traditional mistakes include:
mixing tabs and dummies, open and close brackets, local and global variables, for a procedure
no SPACE between '-' and the dependent variable, nan’s (
2.2.2. Using parameters (YML FILE)
Task: the variable agegroup takes age in 5 or 10 year groups, with 50 as the pivotal
age. Now we are going to take this 50 and replace it by a parameter called WEMRA. Why?
No idea …
2.2.3. Local and global fields
Why locals? Save memory, use temporary variables. Easier, because you do not
need to declare them.
Why globals? Output, lag(x), duration(x), tsum(x), tavg(x), information transfer
between modules,
2.2.4. Parameters
2.2.5. Make life easier – use macros ! (YML file)
Take demo01.yml
Add
macros:
MALE: gender
FEMALE: not gender
On the level of the individual
Then Take
agegroup: if((age < 50) and (gender),
5 * trunc(age / 5),
10 * trunc(age / 10))
And replace by
agegroup: if((age < 50) and (MALE),
5 * trunc(age / 5),
10 * trunc(age / 10))
Watch out : macro’s are evaluated when they appear. Discuss manual page 7
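To check the agegroup expression by hand, the rule can be replayed outside LIAM2. This is a minimal Python sketch, not LIAM2 code; the function name agegroup, the constant PIVOT_AGE and the argument male (standing in for the macro MALE) are illustrative assumptions.

```python
import math

PIVOT_AGE = 50  # the pivotal age used in the agegroup expression


def agegroup(age, male):
    """5-year groups for men below the pivotal age, 10-year groups otherwise."""
    if age < PIVOT_AGE and male:
        return 5 * math.trunc(age / 5)
    return 10 * math.trunc(age / 10)


print(agegroup(37, True))   # 35: man below 50, 5-year group
print(agegroup(37, False))  # 30: woman, 10-year group
print(agegroup(63, True))   # 60: man of 50 or older, 10-year group
```

Replacing PIVOT_AGE by another value mirrors what the parameter exercise in 2.2.2 does with the constant 50.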
3. essential simulations we have not divided this
3.1. Stochastic simulation (ph?)
-
Choice: see demo01_stochastic.yml, but without discussing the init-phase of simulation.
o
EXERCISE: Have them open demo01 and simulate a process personhasblueeyes
where 35% of the sample have blue eyes.
o
Run this and ask why “personhasblueeyes” is not in the log? (“stochastsimulation”
not in process list)
o
Personhasblueeyes: replace .35 by 0.349999 and show that LIAM2 does not complain:
it adapts to (minimum!) rounding errors
o
Show that “- personhasblueeyes2: uniform() < 0.35” is equivalent
o
!!! never do boolean: if (x, True, False) !!!
o
Have them copy the “groupby” to the procedure “show_demography”, why doesn’t
it work? (because “personhasblueeyes” is a local field)
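The equivalence between a two-outcome choice and "uniform() < 0.35" can be illustrated outside LIAM2. The sketch below is plain Python, not LIAM2 code; the helper choice only mimics the idea of a weighted draw and is an assumption, as are the seed and the sample size.

```python
import random


def choice(values, probabilities, rng):
    """Draw one value according to the given probabilities (illustrative helper)."""
    return rng.choices(values, weights=probabilities, k=1)[0]


rng = random.Random(12345)  # fixed seed so the run is reproducible
n = 100_000

# Variant 1: explicit choice between True and False
blue_a = [choice([True, False], [0.35, 0.65], rng) for _ in range(n)]
# Variant 2: compare a uniform draw on [0, 1) with the probability
blue_b = [rng.random() < 0.35 for _ in range(n)]

share_a = sum(blue_a) / n
share_b = sum(blue_b) / n
print(round(share_a, 2), round(share_b, 2))
```

Both shares should be close to 0.35; they differ per individual because the two variants use different random draws.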
-
Logits: see among others demo_stochast_2.yml procedure lefthander
Very simple and basic logit: p=probability.
logit(p) = log(p / (1 - p)).
The inverse, the logistic function, can convert any real number into a probability:
logistic(a) = 1 / (1 + exp(-a))
logit_regr takes a logit with a random part, logistic(a - logit(u)) with u drawn from a uniform
distribution on [0,1), and compares it with 0.5 to evaluate whether the event actually happens.
- lefthander: logit_regr(0.7106698 * if(agegroup < 60, 1, 0)- 3.336578, filter = FEMALE)
is equivalent to
- lefthander_score: if(FEMALE, logit_score(0.7106698 * if(agegroup < 60, 1, 0) - 3.336578), -1)
- lefthander_2: if(FEMALE, lefthander_score > 0.5, False)
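The relation between the score and the event can be checked numerically. The following Python sketch re-implements the formulas above for illustration only; it is not the LIAM2 implementation, and the helper draw_u, the seed and the sample size are assumptions. The value of a is the lefthander expression for a woman with agegroup < 60.

```python
import math
import random


def logit(p):
    return math.log(p / (1 - p))


def logistic(a):
    return 1 / (1 + math.exp(-a))


def draw_u(rng):
    """Uniform draw on (0, 1); 0 is rejected because logit(0) is undefined."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def logit_score(a, u):
    # deterministic part a combined with a random part logit(u)
    return logistic(a - logit(u))


def logit_regr(a, u):
    # the event happens when the score exceeds 0.5
    return logit_score(a, u) > 0.5


rng = random.Random(2001)
a = 0.7106698 * 1 - 3.336578
n = 100_000
events = sum(logit_regr(a, draw_u(rng)) for _ in range(n))
share = events / n
print(round(logistic(a), 4), round(share, 4))
```

The simulated share of events should be close to logistic(a), because the score exceeds 0.5 exactly when u < logistic(a).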
At this point LIAM2 allows for logits, continuous regressions, clipped or log-continuous regressions.
Ambitions include: multinomial logits,
3.2. Alignment in the base setup (GD?)
o
Alignment in one logit
Open demo01_stochast_2.yml again
a.
Simplest form
- lefthander_al: align(lefthander_score, 0.3, filter= FEMALE)
b.
Check out al_p_lefthander_f.csv, which is actually a COPIED version of al_p_inwork_f.csv,
and which contains proportions by agegroup_work. For agegroup_work 15, this is 11%
c.
- agegroup_work: if(age < 70, 5 * trunc(age / 5), 70)
d.
- lefthander_al: align(lefthander_score, 'al_p_lefthander_f.csv', filter= FEMALE)
e.
- qshow(grpavg(lefthander_al, filter=FEMALE and agegroup_work==15)), which is
11%.
f.
Later on we will see what happens if you want, say, ALL single women to be or become
left-handed, while still conforming to the 11%
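The idea behind the simplest form of alignment can be sketched as follows: within the filter, select the individuals with the highest scores until the target proportion is reached. This is a hedged Python illustration, not the LIAM2 algorithm itself; the function name mirrors align only for readability, the scores are made up, and the rounding rule is an assumption.

```python
def align(scores, proportion, filter_mask):
    """Select the highest-scoring individuals within the filter (sketch)."""
    eligible = [i for i, ok in enumerate(filter_mask) if ok]
    n_take = round(proportion * len(eligible))  # assumed rounding rule
    ranked = sorted(eligible, key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:n_take])
    return [i in selected for i in range(len(scores))]


# 8 women followed by 2 men; scores are invented for the example
scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.95, 0.4]
female = [True] * 8 + [False] * 2

lefthander_al = align(scores, 0.25, female)
print(lefthander_al)
```

With 8 women and a proportion of 0.25, the two women with the highest scores are selected; the men are never selected, whatever their score.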
o
Alignment for multiple logits: scores via logits (demo06.yml)
Discuss demo06.yml
o
Order and alignment: deterministically set the risks

Example Fertility of married and single women
- to_give_birth: logit_regr(0.0,
filter=ISFEMALE and (age >= 15) and (age <= 50),
align='al_p_birth.csv')
Now we want married women to have a higher probability to give birth than cohabiting women,
who have a higher probability than women who are neither
birth:
- birth_score: if(MARRIED, logit_score(2), if(COHAB, logit_score(1), logit_score(0)))
- to_give_birth: if(FEMALE and (age >= 15) and (age <= 50),
align(birth_score, fname='al_p_birth.csv'),
False)
Finally, a simple application of alignment with “take” subconditions
Open demo01_stochastic.yml, go to process stochastsimulation
Suppose we want to simulate which WOMEN have blond hair, and we want 50% of them to be blond,
and ALL women with blue eyes to have blond hair
- blondie : align(uniform(), 0.5, filter=FEMALE, take=(personhasblueeyes))
- qshow(groupby(blondie, percent=True, filter=FEMALE))
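The effect of "take" can be sketched in Python: individuals satisfying the take condition are selected first, and the remaining places are filled by score. This is an illustration under assumptions, not the LIAM2 implementation; the function name align_with_take, the fixed scores (the example above uses uniform()) and the rule that all forced individuals are kept even if they exceed the target are all hypothetical.

```python
def align_with_take(scores, proportion, filter_mask, take_mask):
    """Alignment where 'take' individuals are always selected first (sketch)."""
    eligible = [i for i, ok in enumerate(filter_mask) if ok]
    target = round(proportion * len(eligible))
    forced = [i for i in eligible if take_mask[i]]
    others = sorted((i for i in eligible if not take_mask[i]),
                    key=lambda i: scores[i], reverse=True)
    n_free = max(target - len(forced), 0)  # places left after the forced ones
    selected = set(forced) | set(others[:n_free])
    return [i in selected for i in range(len(scores))]


# 8 women followed by 2 men; scores and eye colours are invented
scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.95, 0.4]
female = [True] * 8 + [False] * 2
blue_eyes = [False, True, False, False, False, False, True, False, True, False]

blondie = align_with_take(scores, 0.5, female, blue_eyes)
print(blondie)
```

Half of the 8 women are selected: the two blue-eyed women regardless of their low scores, plus the two highest-scoring other women. The blue-eyed man is outside the filter and stays unselected.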
TO DO SIMPLIFY AND EXPLAINS
3.3. Life cycle functions (PH?)
-
Life, death: demo03.yml, demo04.yml
-
Matching: aka marriage market: demo04.yml
-
Divorce (demo05.yml)
3.4. Static simulation: the ‘init’ phase of simulation: see demo06.yml (GD)
A dynamic model that is based on a dataset at t will simulate from t+1 on. Suppose that you want to
add or modify a variable in your starting dataset at t, so OUTSIDE the prospective model. Or suppose
that you want to perform some actions before the model actually begins.
e.g. at t+1 (so in simulation), you need a lagged value of a variable which is currently unavailable in
the starting dataset => Derive or simulate it using available observed variables in the dataset
e.g. create headers for the output datasets that will be filled during the simulation.
Open demo01_init.yml
Run for x periods, set “period 2001” for starting dataset and groupby(agegroup). This will not work.
But for period > 2001, it will. Next, put agegroup in the “init” block.
Right below the simulation block,
simulation:
init:
- person: [agegroup]
processes: ...
and remove “agegroup” from the simulation block. Run for a number of periods and then show that
agegroup is now NO longer updated.
4. linking your objects (Gijs)
links must be supported by available data. Take a look at the original datasets. In the
individual dataset person.csv, there is a variable hh_id which is the household identification
number. These numbers must coincide with the identification numbers (id) in household.csv.
open demo06.yml
a link has the following form
name: {type: ..., target: ..., field: ...}
-
name e.g. persons (on the hh-level), household, mother, father, child (on the person
level)
-
type:
1. many2one: links an individual to ONE other individual in the
same entity (person -> father, mother), or another entity (person
-> household)
2. one2many: links the individual to AT LEAST one other
individual in the same entity (person to his children) or another
entity (household to its members)
-
target: the entity the link points to.
-
field: integer that contains the id number of the linked individual at entity
level “target”. For example partner_id must be a variable for each individual.
Then the link must be established within the model. There are two possibilities
1. Many2one: link of the entity to ONE other entity (another person, a household)
Open demo01.yml
1. Many2one (one person 2 one person) within the individual level
On the individual level, include
links:
partner: {type: many2one, target: person, field: partner_id}
include partner_id as integer in the field list
include a separate procedure olderthanpartner: partner.age > age in the
process block and in the simulation block
2. One2many (one household 2 many individuals): gathering information from
individuals on the household level
On the household level, include
links:
persons: {type: one2many, target: person, field: hh_id}
!!! hh_id must be defined as a variable on the INDIVIDUAL LEVEL
Add on the household level
fields:
- numchildren: {type: int, initialdata: false}
processes:
numchildren:
- numchildren: persons.count(age < 18)
- qshow(groupby(numchildren, percent=True))
3. (one household 2 one person) take household information back to the level of the
individual
On the individual level, include
links:
household: {type: many2one, target: household, field: hh_id}
processes:
bigfamily:
- bigfamily: household.numchildren > 1
- alternatief: household.get(persons.count(age < 18))
THIS IS BETTER
- qshow(groupby(bigfamily, alternatief, percent=True))
The “get” can also be used in the case of macro’s! E.g. retirement status of the partner:
"ps.get(RETIRED)"
open demo01_links.yml and discuss the whole lot
1. gather personal information on the household level (one2many)
i. on hh level, show link “persons”
a.
Use linkname.fieldname (e.g. partner.age, household.nb_children; see [177], or
household.get(persons.count(age < 18)))
b.
Compound links:
grand_parents_income: mother.mother.income +
mother.father.income + father.mother.income +
father.father.income
c.
Another option to get values in the linked individual is to use the form:
link_name.get(expr), see alternatief
this syntax is a bit more verbose in the simple case, but is much more powerful
For example, in demo06.yml, on the household level, there is a routine that set the number of
individuals in the household
household_composition: (HOUSEHOLD LEVEL)
- nb_persons: persons.count()
- nb_children: persons.count(age < 18)
Now on the INDIVIDUAL level, you can use this to establish whether an individual lives
alone. The classical way is
- alone: household.nb_persons == 1
or, using get:
- alone: household.get(persons.count() == 1)
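What the two link types do can be mimicked with plain data structures. The Python sketch below is only an analogy for the one2many link "persons" and the many2one link "household"; the tiny dataset and the dictionary layout are invented for the example.

```python
# Invented mini dataset: 2 households, 4 persons
households = {0: {}, 1: {}}
persons = [
    {"id": 0, "hh_id": 0, "age": 40},
    {"id": 1, "hh_id": 0, "age": 10},
    {"id": 2, "hh_id": 0, "age": 7},
    {"id": 3, "hh_id": 1, "age": 66},
]

# one2many (household -> persons): gather the members via their hh_id
members = {hh_id: [p for p in persons if p["hh_id"] == hh_id]
           for hh_id in households}
for hh_id, hh in households.items():
    hh["nb_persons"] = len(members[hh_id])                          # persons.count()
    hh["nb_children"] = sum(p["age"] < 18 for p in members[hh_id])  # persons.count(age < 18)

# many2one (person -> household): read household information back
for p in persons:
    hh = households[p["hh_id"]]
    p["alone"] = hh["nb_persons"] == 1      # household.nb_persons == 1
    p["bigfamily"] = hh["nb_children"] > 1  # household.nb_children > 1

print([(p["id"], p["alone"], p["bigfamily"]) for p in persons])
```

The field hh_id on the person is what makes both directions possible, which is why it must exist on the individual level.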
3.
changing and breaking links (demo04.yml)
- justcoupled: to_couple and (partner_id != UNSET)
- hh_id: if(justcoupled,
if(ISMALE, partner.newhousehold, newhousehold),
hh_id)
Afternoon session: advanced stuff in LIAM2
-
Run LIAM2 in batch
Discuss and run c:\usr\liam2\“run_liam2_canberra_gd.bat”
-
cloning
This is very similar to “new” (see the birth routine), but specific to cases where the variables
describing the new individual should be copied from a source instead of being missing. The
entity created is always the same as the source entity.
Open demo04.yml, discuss briefly the new routine in the “birth process”.
- new('person', filter=to_give_birth,
mother_id = id,
hh_id = hh_id,
age = 0,
partner_id = UNSET,
civilstate = SINGLE,
gender = choice([MALE, FEMALE], [0.51, 0.49]))
Suppose that we want to clone person id == 29 (woman of 29 years old)
1. The difference or equivalence between “new” and “clone”
Open demo01.yml
clonetest:
- new('person', filter=(id == 29), source_id = id, age=age, gender=gender,
civilstate=civilstate)
# source_id will result in an error message when it is not defined as a global field!!
- qshow(dump(id, age, gender, civilstate, filter=(id == 29)))
- qshow(dump(id, age, gender, civilstate, filter=(source_id == 29)))
# all undefined variables are MISSING
- clone(filter=(id == 29), source_id = id)
- qshow(dump(id, age, gender, civilstate, filter=(source_id == 29)))
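The difference between "new" and "clone" can be summarised in a small Python analogy: "new" starts from missing values and fills only what is passed, while "clone" starts from a copy of the source. The helpers below are hypothetical and only mimic that behaviour; the population, the field list and the use of None for MISSING are assumptions.

```python
MISSING = None
FIELDS = ("age", "gender", "civilstate", "source_id")


def new(population, **values):
    """Create a person whose fields are MISSING unless given explicitly."""
    person = {field: MISSING for field in FIELDS}
    person.update(values)
    person["id"] = len(population)
    population.append(person)
    return person["id"]


def clone(population, source, **overrides):
    """Create a person as a copy of the source, then apply overrides."""
    person = dict(source)
    person.update(overrides)
    person["id"] = len(population)
    population.append(person)
    return person["id"]


# Invented population of 30 persons; gender False stands for female
population = [
    {"id": i, "age": i, "gender": i % 2 == 0, "civilstate": 1, "source_id": MISSING}
    for i in range(30)
]
source = population[29]

new_id = new(population, source_id=source["id"], age=source["age"])
clone_id = clone(population, source, source_id=source["id"])
print(population[new_id])
print(population[clone_id])
```

The person created with new has MISSING gender and civilstate because they were not passed; the clone carries all values of person 29.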
application: expanding a dataset using frequency weights
Discuss the “expand routine”
o
Importing and using multiple tables
1. importing multiple tables: discuss “demo_import_tables.yml”
# Immigration foreigners
MIG_FO:
path: input\MIG_FO.csv
type: float
Discuss input table C:\usr\LIAM2-course_CANBERRA\examples\input
Open simple2001.h5 and show that the table is now included. Cell(1,1) contains ROW 1
(women, age=0, ALL YEARS), Cell(1,2) = ROW 2, in the subtable for WOMEN. Cell(2,1)
contains ROW 1 (men, age=0, all years) in the table for MEN. So the order is (GENDER,
AGE, PERIOD)
Open demo01.yml. First, DEFINE MIG_FO in the globals-block (NOT periodic, because
those are the parameters)
globals:
MIG_FO:
type: float
next, CALL the array, for example on the individual level
- MIG_PERIOD: MIG_FO[:, :, period - base_period]
# For each period separately, this is a matrix of two rows (gender False/True) and 105 (age)
columns. All values are taken, and the matrix MIG_FO of 3 dimensions is thus reduced to 2
dimensions
#MIG_FO[False, 1, period - base_period] = 429
- idx: gender * 1
# turn boolean into integer
- MIG_PERIOD: MIG_FO[idx, age, period - base_period]
# now the array becomes a scalar which is different for each gender and age within the period.
NOTE that the indices of an array MUST be integers!
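The two ways of indexing a (gender, age, period) array can be tried out in plain Python. The sketch below uses nested lists filled with made-up numbers, not the contents of MIG_FO.csv; BASE_PERIOD, N_PERIODS and the helper names are assumptions for illustration.

```python
BASE_PERIOD = 2002   # assumed first period covered by the table
N_AGES = 105         # number of age columns mentioned in the text
N_PERIODS = 3        # assumed number of periods in the table

# Made-up content: value encodes its own position (gender, age, period offset)
MIG_FO = [[[float(g * 10000 + a * 10 + t) for t in range(N_PERIODS)]
           for a in range(N_AGES)]
          for g in range(2)]


def mig_period(period):
    """Slice [:, :, period - base]: a 2 x 105 matrix for one period."""
    t = period - BASE_PERIOD
    return [[MIG_FO[g][a][t] for a in range(N_AGES)] for g in range(2)]


def mig_for(gender, age, period):
    """Index [idx, age, period - base]: one scalar per individual."""
    idx = int(gender)  # turn the boolean into an integer index
    return MIG_FO[idx][age][period - BASE_PERIOD]


matrix = mig_period(2003)
print(len(matrix), len(matrix[0]))
print(mig_for(True, 30, 2003))
```

Fixing only the period leaves a two-dimensional table; fixing gender, age and period leaves a single number.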
o
Alignment of absolute numbers, using linked objects (IN CASE OF SELECTION
ON EMPTY BASKETS OF INDIVIDUALS THIS BECOMES THE Chenard algorithm);
also note that align_abs only SELECTS households.
In the household entity
# A. select 38 random households from the available 14700
- aligned: align_abs(0.0, 38)
# B. select 38 households with largest household first
- aligned: align_abs(persons.count(), 38)
#- C. select 38 households with largest household first from households that have
more than two persons in it
- aligned: align_abs(persons.count(), 38, filter = persons.count() > 2)
# show in the output that now the total sample has decreased from 14700 (total Nr
households) to 1200!!
# D. select 38 households from a 50% of the total sample of households from
households in descending order of size
- aligned: align_abs(persons.count(), 38, filter = uniform() < 0.5)
NOW CHENARD
BEFORE DOING THIS SOME PREPARATIONS NEED TO BE MADE
define MIG_FO (which is individual data) on the level of the HOUSEHOLD, because the same
array can of course be used on ANY entity level!
MIG_FO:
path: input\MIG_FO.csv
type: float
The field “need” is the array MIG_FO
need: MIG_FO[:,:, period - 2002]
Next, establish one2many link from household to persons:
# links:
# persons: {type: one2many, target: person, field: hh_id}
NOTE that for this LINK to work, the field “hh_id” MUST be defined on the level of the
individual: - hh_id: {type: int, initialdata: false}
And define the local field num_persons as the number of individuals in each household
num_persons: persons.count()
now we have all the information required for CHENARD
# E. Chenard select "need" households from a 50% of the sample of households.
"need" is information pertaining to individuals (link=persons)
# bin is full if 1. the number per age and sex is filled, or 2. if the aggregate within
gender (secondary_axis) is filled
# errors are taken to the next period. Alternative: "default" means forget
- need: MIG_FO[:,:, period - 2002] / 4.59
- aligned_FO: align_abs(persons.count(), need, filter = uniform() < 0.5, link =
persons, secondary_axis = gender, errors='carry')
#
- breakpoint()
o
updating the links between cloned objects, and between clones and their sources
Suppose that I select 38 households and I clone these
On the HOUSEHOLD level
- aligned: align_abs(0.0, 38)
- clone_id: clone(aligned, source_id=id)
# show that there are now 14738 instead of 14700 households
On the INDIVIDUAL level: # 1. clone the individuals whose household has been cloned
For this to work, some information needs to be available
1. Define the “target link”, the link between a cloned household and the original
household it comes from.
target: {type: many2one, target: household, field: clone_id}
2. For this to work, the field clone_id must be global on the household level
- aligned: {type: bool, initialdata: false} !!! the variable must be a BOOLEAN
#- clone_iid: clone(household.aligned, source_iid = id, hh_id = household.target.id)
- hh_aligned: household.aligned
# because when 38 households with "aligned" are cloned, there are 2 * 38
households that have "tobealigned"
# so select ONLY the clones
- tobealigned: hh_aligned & household.get(clone_id > 0) & household.get(clone_id
== clone_id)
- groupby(tobealigned)
- clone_iid: clone(tobealigned, hh_id=household.id, yearofclone=period)
# now adapt the idnumbers of the partner BUT ONLY in the year of cloning. SEE
THE EXAMPLE
o
application How to model immigration in LIAM2? (GD)
-
State alignment, hard and soft take and leave conditions (PH)
-
Output and reporting (GD)
1. produce a simple output file for each year
- csv(dump(id, hh_id, age, MALE, married, just_matched, civilstate, dur_in_couple),
suffix='mmkt_dump')
csv(groupby(civilstate,
gender,
lag(civilstate),
percent=True),
suffix='mmkt')
but watch out; if you do this with a very large sample, you will have time to go for coffee!
- csv(groupby(workstate, lag(workstate), filter=MALE), suffix='whatever')
2. produce a file with output for multiple years: suppose that we want to produce the average
age, median age and percentage 65+ for all simulation years.
Step 1: produce the header of the file in the init block
- csv(['period', 'average age', 'median age', '% 65+'], fname='output.csv')
You can also have a two-line header
- csv(['age information'], ['period', 'average age', 'median age'], fname='output.csv')
Step 2: provide the information in the simulation part of the model
- csv(period, avg(age), median(age), avg(age >= 65) * 100, fname='output.csv', mode='a')
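The "header once, then append one row per period" pattern can be sketched with the Python standard library. This is an analogy, not LIAM2 code: the helper names, the invented ages and the temporary output path are assumptions, and 65+ is taken to mean age >= 65.

```python
import csv
import os
import statistics
import tempfile


def write_header(path):
    """Step 1: create the file and write the header (done once, like init)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["period", "average age", "median age", "% 65+"])


def append_period(path, period, ages):
    """Step 2: append one row per simulated period (like mode='a')."""
    row = [period,
           statistics.mean(ages),
           statistics.median(ages),
           100 * sum(a >= 65 for a in ages) / len(ages)]
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)


path = os.path.join(tempfile.mkdtemp(), "output.csv")
write_header(path)

ages = [10, 30, 50, 70]  # invented sample
for period in (2002, 2003):
    append_period(path, period, ages)
    ages = [a + 1 for a in ages]  # everybody ages one year

with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Opening the file in write mode inside the period loop would overwrite earlier periods, which is why the header is written separately and the rows are appended.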
-
Importing a LIAM2 model in your model (GD; see demo07.yml)
!! Additional arrays, periodic globals (parameters), links and variables can be defined!
-
Tips and tricks (GD)
1. demonstrate graphs and stuff from the user interface
2. Dealing with missings
If var is a variable which contains some nan values:
if(var == nan, x, y) ---> all y EVEN where var is nan
if(var != nan, x, y) ---> all x EVEN where var is nan
if(var > 0, x, y) ---> y where var is nan
if(var < 0, x, y) ---> y where var is nan
In short, comparing a nan with anything (including another nan) always returns False
for any kind of comparison, except != which always returns True EVEN if the value compared to is also
nan.
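The same comparison rules hold for floating-point nan in Python, so they can be verified quickly outside the model. In this sketch if_ is a hypothetical stand-in for the if(cond, x, y) expression, and math.isnan is the Python way to test for nan explicitly.

```python
import math

nan = float("nan")
values = [nan, -1.0, 2.0]


def if_(cond, x, y):
    """Stand-in for if(cond, x, y)."""
    return x if cond else y


eq = [if_(v == nan, "x", "y") for v in values]  # never true, even for nan
ne = [if_(v != nan, "x", "y") for v in values]  # always true, even for nan
gt = [if_(v > 0, "x", "y") for v in values]     # nan falls in the y branch
lt = [if_(v < 0, "x", "y") for v in values]     # nan falls in the y branch
is_missing = [math.isnan(v) for v in values]    # explicit test for nan

print(eq)          # ['y', 'y', 'y']
print(ne)          # ['x', 'x', 'x']
print(gt)          # ['y', 'y', 'x']
print(lt)          # ['y', 'x', 'y']
print(is_missing)  # [True, False, False]
```

Because nan never satisfies == or the ordering comparisons, an explicit nan test is needed whenever missing values must be handled separately.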
3. Tips for checking data
>>> any(a == b)
>>> all(a == b)
4. ASSERTS
assertTrue(1 == 1)
assertEqual(1, 1)