MORNING SESSION

Base setup
1. Introduction and importing data: GD
2. Creation of objects: GD
3. The presentation of a rudimentary model: Ph
4. Essential output: Ph
5. Links: GD
6. Static and dynamic modelling: GD

Extended

1. Introduction and importing data

1.1. Get started (2.1.)
  a. Copy the LIAM2 bundle to a directory on your computer
  b. Open notepad++portable
  c. Open a model, press F6
  d. Check out the help; PRESENT HERE the
     i. Documentation
     ii. Mailing list
     iii. Email address
     iv. Google groups

1.2. An introduction: what happens when you launch LIAM2 in Notepad?
  Open Notepad; show /macro/import F5, run, run with console F6, explore results without simulating
  Explain the colors in the code: green = comment, blue = header of the process, usually the endogenous variable
  Show the user interface
    Ctrl-q: put in comments; Ctrl-d: duplicate; Ctrl-z: undo
    View/move/clone current document
  Note that “tabs” do not exist in this pre-configured version of Notepad
  Demonstrate ViTables (http://vitables.org/)
  !!! Note that LIAM2 automatically saves the code at each launch !!!

1.3. A glossary of terms
1.3.1. Globals: parameters that do not relate to an entity or object level in the model. Globals can be tables or multi-dimensional arrays, and they need to be imported and declared.
1.3.2. Entities: these are the “object levels”. Each entity has a unique identifier. E.g. individuals, households, firms.
1.3.3. Fields, aka global variables: entities are described by a set of attributes. Each attribute is a field, e.g. age, gender. Fields thus describe the entities. There are 3 types: Booleans, integers or floats. Fields are global variables; they have to be declared (only the identifier id and period are implicit), and each process in the simulation block can use and change their value. Note that fields are not necessarily present in the input dataset!
1.3.4. Links: objects or individuals can be linked with other objects of the same entity (mother, child, spouses, …) or of a different entity (households).
1.3.5. Macros: written in CAPITAL letters. They are baskets of code, evaluated wherever they appear. Useful for making code easier to read.
1.3.6. Processes: the processes describe how the entities behave, i.e. under what circumstances objects are created or removed, and how and when their attributes (fields) change. The order in which the processes are defined here is irrelevant. It is the order in the simulation block that matters.

2. A rudimentary model and some output (Philippe)

2.1. A typical model in LIAM2 consists of several blocks
2.1.1. Open and discuss demo01.yml
  - globals, entities (household, person), simulation
  - within each entity: fields, links (see later), processes
  - the simulation block consists of processes, input, output, start_period, periods
    o Endogenous variables: fields not present in the input dataset
2.1.2. Output
  • Deterministic simulation: demo01.yml, and the conditions attached to the interactive log-file option “skip_shows: False”
2.1.3. Checking your results via the interactive console
  • option “default_entity: person”
  • mean, min, max, groupby
  • groupby(workstate, gender, expr=grpavg(age)) is the average age for each combination of workstate and gender
  • qshow(groupby(lefthander, filter=FEMALE))
  • qshow(groupby(lefthander, filter=FEMALE, percent=True))
  • dump(… filter=(…))
  !! You can check out the starting dataset after having run the model if you set period 2001 !!
2.1.4. Producing simple output tables through LIAM2
  - Dumps
  - Year-specific outputs
2.1.5. Importing data: start without an hdf5 file; only with a csv. THIS PART HAS CHANGED PLACE
  LIAM2 discerns
  - Entities: objects of different levels, each with a unique identifier and described by fields
  - Fields (variables, both global and local, i.e. within a procedure): fields can be floats, integers or Booleans; fields can be observed in the initial dataset, or not observed; fields are by definition global, because they have to be defined
  - Periodic globals (or parameters)
  - Links (see later)
  - Macros (see later)
2.1.6. Check the data in csv
  Each object level (ind/hh): separately introduce each object number in the starting year: household.csv, person.csv, firm.csv, …
  For each object level, the dataset contains at least two columns, “id” and “period”.
  household.csv: imputed, no missings (a value may represent a missing, e.g. -9)
2.1.7. Check parameter tables in csv
  These are the “periodic globals” in LIAM2: section 3.1.
  Show that both transposed and not transposed are possible
  Parameter table: transposed = False, so each parameter is a column.
  Show that the import of globals_transposed.csv is possible when “transposed: True”
2.1.8. Regular import
  demo_import_canberra.yml
  Discuss this. HAVE THEM DO THIS AS AN EXERCISE
  a. Run C:\usr\LIAM2-course_CANBERRA\examples\demo_import_canberra.yml
  b. Old name is “male”: show that this variable occurs in person.csv
  c. Person 0 is false (female). Example: invert: [gender]. Person 0 is now true.

2.2. Working with LIAM2
2.2.1. Oops, a mistake! Breakpoints
  Discuss breakpoints; always check local and global variables; discuss the log file; discuss macro/run the model with debugging information.
  Traditional mistakes include: mixing tabs and dummies, opening and closing brackets, local and global variables, no SPACE between ‘-’ and the dependent variable of a procedure, NaNs
2.2.2. Using parameters (YML FILE)
  Task: the variable agegroup takes age in 5- or 10-year groups, with 50 as the pivotal age. Now we are going to take this 50 and replace it by a parameter called WEMRA. Why? No idea …
2.2.3. Local and global fields
  Why locals? Save memory, use temporary variables. Easier, because you do not need to declare them.
  Why globals?
  Output, lag(x), duration(x), tsum(x), tavg(x), information transfer between modules.
2.2.4. Parameters
2.2.5. Make life easier: use macros! (YML file)
  Take demo01.yml
  Add macros, on the level of the individual:
    MALE: gender
    FEMALE: not gender
  Then take
    agegroup: if((age < 50) and (gender), 5 * trunc(age / 5), 10 * trunc(age / 10))
  and replace it by
    agegroup: if((age < 50) and (MALE), 5 * trunc(age / 5), 10 * trunc(age / 10))
  Watch out: macros are evaluated wherever they appear. Discuss manual page 7.

3. Essential simulations (we have not divided this)

3.1. Stochastic simulation (Ph?)
  - Choice: see demo01_stochastic.yml, but without discussing the init phase of the simulation.
    o EXERCISE: have them open demo01 and simulate a process personhasblueeyes where 35% of the sample have blue eyes.
    o Run this and ask why “personhasblueeyes” is not in the log (“stochastsimulation” is not in the process list)
    o Personhasblueeyes: replace .35 by 0.349999 and show that LIAM2 does not complain: it adapts to (minimal!) rounding errors
    o Show that “- personhasblueeyes2: uniform() < 0.35” is equivalent
    o !!! Never do boolean: if(x, True, False) !!!
    o Have them copy the “groupby” to the procedure “show_demography”. Why doesn’t it work? (Because “personhasblueeyes” is a local field.)
  - Logits: see, among others, demo_stochast_2.yml, procedure lefthander
    Very simple and basic logit: p = probability. logit(p) = log(p / (1 - p)). The inverse, the logistic function, converts any real number into a probability: logistic(a) = 1 / (1 + exp(-a)).
    logit_regr takes a logit with a random part, logistic(a - logit(u)), with u a random draw from a uniform distribution on [0, 1), and compares it with 0.5 to evaluate whether the event actually happens.
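The relation between logit, logistic and logit_regr described above can be sketched in plain Python. This is not LIAM2 code: the function bodies are an assumption based only on the formulas in these notes, and the uniform draw u is passed in explicitly (a hypothetical choice) so that the result is reproducible. The coefficients are those of the lefthander example.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(a):
    """Inverse of logit: maps any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-a))


def logit_score(a, u):
    """Score with a random part; u stands for a draw from a uniform distribution."""
    return logistic(a - logit(u))


def logit_regr(a, u):
    """The event happens when the score exceeds 0.5."""
    return logit_score(a, u) > 0.5


# Linear predictor of the lefthander example for a woman with agegroup < 60.
a = 0.7106698 * 1 - 3.336578

# Comparing the score with 0.5 amounts to checking u < logistic(a),
# so the event happens with probability logistic(a).
p_event = logistic(a)
draws = [0.01, 0.05, 0.5, 0.9]  # illustrative values for u, not random
outcomes = [logit_regr(a, u) for u in draws]
```

With these illustrative draws, only the values of u below logistic(a) (a probability of roughly 7%) lead to the event.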
    - lefthander: logit_regr(0.7106698 * if(agegroup < 60, 1, 0) - 3.336578, filter=FEMALE)
    is equivalent to
    - lefthander_score: if(FEMALE, logit_score(0.7106698 * if(agegroup < 60, 1, 0) - 3.336578), -1)
    - lefthander_2: if(FEMALE, lefthander_score > 0.5, False)
    At this point, LIAM2 allows for logits, continuous regressions, clipped or log-continuous regressions. Ambitions include: multinomial logits.

3.2. Alignment in the base setup (GD?)
  o Alignment in one logit
    Open demo01_stochast_2.yml again
    a. Simplest form
       - lefthander_al: align(lefthander_score, 0.3, filter=FEMALE)
    b. Check out al_p_lefthander_f.csv, which is actually a COPIED version of al_p_inwork_f.csv, and which contains proportions by agegroup_work. For agegroup_work == 15, this is 11%
    c. - agegroup_work: if(age < 70, 5 * trunc(age / 5), 70)
    d. - lefthander_al: align(lefthander_score, 'al_p_lefthander_f.csv', filter=FEMALE)
    e. - qshow(grpavg(lefthander_al, filter=FEMALE and agegroup_work == 15)), which is 11%.
    f. Later on we will see what happens if you want, say, ALL single women to be or become lefthanded, while still accommodating the 11%
  o Alignment for multiple logits: scores via logits (demo06.yml)
    Discuss demo06.yml
  o Order and alignment: deterministically set the risks
    Example: fertility of married and single women
    - to_give_birth: logit_regr(0.0, filter=ISFEMALE and (age >= 15) and (age <= 50), align='al_p_birth.csv')
    Now we want married women to have a higher probability to give birth than cohabiting women, who in turn have a higher probability than women who are neither:
    - birth_score: if(MARRIED, logit_score(2), if(COHAB, logit_score(1), logit_score(0)))
    - to_give_birth: if(FEMALE and (age >= 15) and (age <= 50), align(birth_score, fname='al_p_birth.csv'), False)
  o Finally, a simple application of alignment with take subconditions
    Open demo01_stochastic.yml, go to process stochastsimulation
    Suppose we want to simulate which WOMEN have blond hair, and we want 50% of them to be blond, and ALL women with blue eyes to have blond hair
    - blondie: align(uniform(), 0.5, filter=FEMALE, take=(personhasblueeyes))
    - qshow(groupby(blondie, percent=True, filter=FEMALE))
    TO DO: SIMPLIFY AND EXPLAIN

3.3. Life cycle functions (PH?)
  - Life, death: demo03.yml, demo04.yml
  - Matching, aka marriage market: demo04.yml
  - Divorce (demo05.yml)

3.4. Static simulation: the ‘init’ phase of simulation: see demo06.yml (GD)
  A dynamic model that is based on a dataset at t will simulate from t+1 on. Suppose that you want to add or modify a variable in your starting dataset at t, so OUTSIDE the prospective model. Or suppose that you want to perform some actions before the model actually begins.
  e.g. at t+1 (so in simulation), you need a lagged value of a variable which is currently unavailable in the starting dataset => derive or simulate it using available observed variables in the dataset
  e.g. create headers for the output datasets that will be filled during the simulation.
  Open demo01_init.yml
  Run for x periods, set “period 2001” for the starting dataset and groupby(agegroup). This will not work. But for period > 2001, it will.
  Next, put agegroup in the “init” block, just below the simulation block header:
    simulation:
        init:
            - person: [agegroup]
        processes:
            ...
  and remove “agegroup” from the simulation block. Run for a number of periods and then show that agegroup is now NOT updated anymore.

4. Linking your objects (Gijs)
  Links must be supported by available data. Take a look at the original datasets. In the individual dataset person.csv, there is a variable hh_id, which is the household identification number. These numbers must coincide with the identification numbers (id) in household.csv.
  Open demo06.yml
  A link has the following form
    name: {type: ..., target: ..., field: ...}
  - name: e.g. persons (on the hh level), household, mother, father, child (on the person level)
  - type:
    1. many2one: links an individual to ONE other individual in the same entity (person -> father, mother), or in another entity (person -> household)
    2. one2many: links the individual to AT LEAST one other individual in the same entity (person to his children) or in another entity (household to its members)
  - target: the entity the link points to.
  - field: integer that contains the id number of the linked individual at entity level “target”. For example, partner_id must be a variable for each individual.
  Then the link must be established within the model. There are two possibilities:
  1. Many2one: link of the entity to ONE other entity (another person, a household)
  Open demo01.yml
  1. Many2one (one person 2 one person) within the individual level
     On the individual level, include
       links:
           partner: {type: many2one, target: person, field: partner_id}
     Include partner_id as an integer in the field list
     Include a separate procedure
       olderthanpartner: if((partner.age > age), True, False)
     in the process block and in the simulation block
  2. One2many (one household 2 many individuals): gathering information from individuals on the household level
     On the household level, include
       links:
           persons: {type: one2many, target: person, field: hh_id}
     !!! hh_id must be defined as a variable at the INDIVIDUAL LEVEL
     Add at the household level
       fields:
           - numchildren: {type: int, initialdata: false}
       processes:
           numchildren:
               - numchildren: persons.count(age < 18)
               - qshow(groupby(numchildren, percent=True))
  3. (one household 2 one person) Take household information back to the level of the individual
     On the individual level, include
       links:
           household: {type: many2one, target: household, field: hh_id}
       processes:
           bigfamily:
               - bigfamily: household.numchildren > 1
               - alternatief: household.get(persons.count(age < 18))    # THIS IS BETTER
               - qshow(groupby(bigfamily, alternatief, percent=True))
     The “get” can also be used in the case of macros! E.g. retirement status of the partner: "ps.get(RETIRED)"
  Open demo01_links.yml and discuss the whole lot
  1. Gather personal information on the household level (one2many)
     i. On hh level, show link “persons”
     a. Use linkname.fieldname (e.g. partner.age, household.nb_children; see [177], or household.get(persons.count(age < 18)))
     b. Compound links:
        grand_parents_income: mother.mother.income + mother.father.income + father.mother.income + father.father.income
     c. Another option to get values from the linked individual is to use the form link_name.get(expr) (see alternatief). This syntax is a bit more verbose in the simple case, but is much more powerful.
     For example, in demo06.yml, on the household level, there is a routine that sets the number of individuals in the household
       household_composition: (HOUSEHOLD LEVEL)
           - nb_persons: persons.count()
           - nb_children: persons.count(age < 18)
     Now on the INDIVIDUAL level, you can use this to establish whether an individual lives alone. The classical way is
       - alone: household.nb_persons == 1
     and the alternative with get is
       - alone: household.get(persons.count() == 1)
  3. Changing and breaking links (demo04.yml)
     - justcoupled: to_couple and (partner_id != UNSET)
     - hh_id: if(justcoupled, if(ISMALE, partner.newhousehold, newhousehold), hh_id)

Afternoon session: advanced stuff in LIAM2
  - Run LIAM2 in batch
    Discuss and run c:\usr\liam2\“run_liam2_canberra_gd.bat”
  - Cloning
    This is very similar to “new” (see the birth routine), but specific to cases where the variables describing the new individual should be copied from a source instead of being missing. The entity created is always the same as the source entity.
    Open demo04.yml, discuss briefly the new routine in the “birth process”.
    - new('person', filter=to_give_birth, mother_id=id, hh_id=hh_id, age=0, partner_id=UNSET, civilstate=SINGLE, gender=choice([MALE, FEMALE], [0.51, 0.49]))
    Suppose that we want to clone person id == 29 (woman of 29 years old)
    1. The difference or equivalence between “new” and “clone”
       Open demo01.yml
       clonetest:
           - new('person', filter=(id == 29), source_id=id, age=age, gender=gender, civilstate=civilstate)
           # source_id will result in an error message when it is not defined as a global field!!
           - qshow(dump(id, age, gender, civilstate, filter=(id == 29)))
           - qshow(dump(id, age, gender, civilstate, filter=(source_id == 29)))
           # all undefined variables are MISSING
           - clone(filter=(id == 29), source_id=id)
           - qshow(dump(id, age, gender, civilstate, filter=(source_id == 29)))
       Application: expanding a dataset using frequency weights
       Discuss the “expand routine”
  o Importing and using multiple tables
    1. Importing multiple tables: discuss “demo_import_tables.yml”
       # Immigration foreigners
       MIG_FO:
           path: input\MIG_FO.csv
           type: float
       Discuss input table C:\usr\LIAM2-course_CANBERRA\examples\input
       Open simple2001.h5 and show that the table is now included.
       Cell(1,1) contains ROW 1 (women, age=0, ALL YEARS); Cell(1,2) = ROW 2, in the subtable for WOMEN. Cell(2,1) contains ROW 1 (men, age=0, all years) in the table for MALES. So the order is (GENDER, AGE, PERIOD).
       Open demo01.yml. First, DEFINE MIG_FO in the globals block (NOT periodic, because those are the parameters)
         globals:
             MIG_FO:
                 type: float
       Next, CALL the array, for example on the individual level
         - MIG_PERIOD: MIG_FO[:, :, period - base_period]
         # For each period separately, this is a matrix of two rows (gender False/True) and 105 (age) columns. All values are taken, and the 3-dimensional matrix MIG_FO is thus reduced to 2 dimensions
         # MIG_FO[False, 1, period - base_period] = 429
         - idx: gender * 1    # turn boolean into integer
         - MIG_PERIOD: MIG_FO[idx, age, period - base_period]
         # now the array becomes a scalar, which differs for each gender and age within a period.
       NOTE that the arguments of an array MUST be integers!
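The two ways of indexing the three-dimensional table can be sketched in plain Python with nested lists (this is not LIAM2 code). The numbers, the table size (2 genders, 3 ages, 2 periods), the value base_period = 2002 and the helper names slice_period and lookup are all made up for illustration; only the axis order (gender, age, period) comes from the notes above.

```python
# Toy stand-in for MIG_FO: axes are (gender, age, period); all numbers are made up.
MIG_FO = [
    [[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]],  # gender False: ages 0, 1, 2
    [[40.0, 41.0], [50.0, 51.0], [60.0, 61.0]],  # gender True: ages 0, 1, 2
]
base_period = 2002  # assumed first period covered by the table


def slice_period(table, period):
    """Equivalent of table[:, :, period - base_period]: a gender x age matrix."""
    k = period - base_period
    return [[cell[k] for cell in age_row] for age_row in table]


def lookup(table, gender, age, period):
    """Equivalent of table[idx, age, period - base_period]: a single value."""
    idx = int(gender)  # turn the boolean into an integer index
    return table[idx][age][period - base_period]


matrix_2003 = slice_period(MIG_FO, 2003)
value = lookup(MIG_FO, True, 1, 2002)
```

Taking the full range on the first two axes keeps a matrix with one row per gender, while three integer indices return one number per individual.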
  o Alignment of absolute numbers, using linked objects (IN CASE OF SELECTION ON EMPTY BASKETS OF INDIVIDUALS THIS BECOMES THE Chenard algorithm); also note that align_abs only SELECTS households.
    In the household entity
    # A. select 38 random households from the available 14700
    - aligned: align_abs(0.0, 38)
    # B. select 38 households, with the largest households first
    - aligned: align_abs(persons.count(), 38)
    # C. select 38 households, largest first, from households that have more than two persons in them
    - aligned: align_abs(persons.count(), 38, filter=persons.count() > 2)
    # show in the output that the total sample has now decreased from 14700 (total number of households) to 1200!!
    # D. select 38 households, in descending order of size, from 50% of the total sample of households
    - aligned: align_abs(persons.count(), 38, filter=uniform() < 0.5)
    NOW CHENARD
    BEFORE DOING THIS, SOME PREPARATIONS NEED TO BE MADE
    Define MIG_FO (which is individual data) on the level of the HOUSEHOLD, because the same array can of course be used on ANY entity level!
      MIG_FO:
          path: input\MIG_FO.csv
          type: float
    The field “need” is the array MIG_FO
      need: MIG_FO[:, :, period - 2002]
    Next, establish a one2many link from household to persons:
      # links:
      #     persons: {type: one2many, target: person, field: hh_id}
    NOTE that for this LINK to work, the field “hh_id” MUST be defined on the level of the individual:
      - hh_id: {type: int, initialdata: false}
    And define the local field num_persons as the number of individuals in each household
      num_persons: persons.count()
    Now we have all the information required for CHENARD
    # E. Chenard: select "need" households from 50% of the sample of households. "need" is information pertaining to individuals (link=persons)
    # a bin is full if 1. the number per age and sex is filled, or 2. the aggregate within gender (secondary_axis) is filled
    # errors are taken to the next period.
    # Alternative: "default" means forget the errors
    - need: MIG_FO[:, :, period - 2002] / 4.59
    - aligned_FO: align_abs(persons.count(), need, filter=uniform() < 0.5, link=persons, secondary_axis=gender, errors='carry')
    # - breakpoint()
  o Updating the links between cloned objects, and between clones and their sources
    Suppose that I select 38 households and I clone these
    On the HOUSEHOLD level
    - aligned: align_abs(0.0, 38)
    - clone_id: clone(aligned, source_id=id)
    # show that there are now 14738 instead of 14700 households
    On the INDIVIDUAL level:
    # 1. clone the individuals whose household has been cloned
    For this to work, some information needs to be available
    1. Define the “target link”, the link between a cloned household and the original household it comes from.
       target: {type: many2one, target: household, field: clone_id}
    2. For this to work, the field clone_id must be global on the household level
       - aligned: {type: bool, initialdata: false}    !!! the variable must be a BOOLEAN
    #- clone_iid: clone(household.aligned, source_iid=id, hh_id=household.target.id)
    - hh_aligned: household.aligned
    # because when 38 households with "aligned" are cloned, there are 2 * 38 households that have "tobealigned"
    # so select ONLY the clones
    - tobealigned: hh_aligned & household.get(clone_id > 0) & household.get(clone_id == clone_id)
    - groupby(tobealigned)
    - clone_iid: clone(tobealigned, hh_id=household.id, yearofclone=period)
    # now adapt the id numbers of the partner BUT ONLY in the year of cloning. SEE THE EXAMPLE
  o Application: how to model immigration in LIAM2? (GD)
  - State alignment, hard and soft take and leave conditions (PH)
  - Output and reporting (GD)
    1. Produce a simple output file for each year
       - csv(dump(id, hh_id, age, MALE, married, just_matched, civilstate, dur_in_couple), suffix='mmkt_dump')
       - csv(groupby(civilstate, gender, lag(civilstate), percent=True), suffix='mmkt')
       But watch out: if you do this with a very large sample, you will have time to go for coffee!
       - csv(groupby(workstate, lag(workstate), filter=MALE), suffix='whatever')
    2. Produce a file with output for multiple years: suppose that we want to produce the average age, median age and percentage 65+ for all simulation years.
       Step 1: produce the header of the file in the init block
       - csv(['period', 'average age', 'median age', '% 65+'], fname='output.csv')
       You can also have a two-line header
       - csv(['age information'], ['period', 'average age', 'median age'], fname='output.csv')
       Step 2: provide the information in the simulation part of the model, appending to the same file
       - csv(period, avg(age), median(age), avg(age > 65) * 100, fname='output.csv', mode='a')
  - Importing a LIAM2 model in your model (GD; see demo07.yml)
    !! Additional arrays, periodic globals (parameters), links and variables can be defined!
  - Tips and tricks (GD)
    1. Demonstrate graphs and stuff from the user interface
    2. Dealing with missings
       If var is a variable which contains some nan values:
         if(var == nan, x, y) ---> all y EVEN where var is nan
         if(var != nan, x, y) ---> all x EVEN where var is nan
         if(var > 0, x, y) ---> y where var is nan
         if(var < 0, x, y) ---> y where var is nan
       In short, comparing a nan with anything always returns False (for all possible values, including "nan") for any kind of comparison, except != which always returns True, EVEN if the value compared to is also nan.
    3. Tips for checking data
       >>> any(a == b)
       >>> all(a == b)
    4. Asserts
       assertTrue(1 == 1)
       assertEqual(1, 1)
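The comparison rules for missings listed under “Dealing with missings” follow ordinary floating-point nan behaviour, which can be checked in plain Python (this is not LIAM2 code). The helper if_ is a hypothetical stand-in for the LIAM2 if(cond, x, y) expression, applied here to one value at a time, and the three sample values are made up.

```python
import math

nan = float("nan")


def if_(cond, x, y):
    """Hypothetical scalar stand-in for if(cond, x, y)."""
    return x if cond else y


values = [nan, -1.0, 2.0]  # one missing value and two ordinary ones

eq = [if_(v == nan, "x", "y") for v in values]  # == nan is never True
ne = [if_(v != nan, "x", "y") for v in values]  # != nan is always True
gt = [if_(v > 0, "x", "y") for v in values]     # nan > 0 is False
lt = [if_(v < 0, "x", "y") for v in values]     # nan < 0 is False

# An explicit test such as math.isnan does identify the missing value.
missing = [math.isnan(v) for v in values]
```

Because neither == nor != can single out the nan, an explicit missing-value test is needed whenever the missing cases must be treated separately.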