Working with GDX files

Information paper
20 March 2014
Market Performance

Version control

Version   Date amended   Comments
0.0.1     19/3/2014      Initial draft
0.0.2     20/3/2014      Finalised

20 March 2014 1.56 p.m.

Contents

1 Introduction
    GAMS and GDX files
    GAMS license arrangements
2 Using GDX files
    Write some GAMS code
    Use GDXdump to create CSV or GMS files
    The py-gdx (Python) utilities
    Interfacing GAMS with R
    Writing to Excel from the GAMS IDE
    Passing data between GAMS programs using merged GDX files
    Worked examples
    A more sophisticated approach
Glossary of abbreviations and terms

1 Introduction

1.1 Several of the models that the Electricity Authority (Authority) makes publicly available are formulated using the GAMS software [1] and therefore make extensive use of GDX files. We often get asked for advice on how to work with GDX files; specifically, how to:

(a) extract data from a GDX file
(b) modify data in a GDX file
(c) translate from the GDX file format to some other file format.

1.2 The purpose of this paper is to provide a few useful tips on working with GDX files.

GAMS and GDX files

1.3 The General Algebraic Modeling System (GAMS) is a high-level modelling system for mathematical programming and optimization. The GAMS software consists of an integrated development environment (IDE), a language compiler, and a stable of integrated high-performance solvers. GAMS is particularly well-suited to formulating and solving complex, large-scale modelling applications such as vSPD, HSS and GEM.

1.4 GDX stands for GAMS data exchange. A GDX file is a binary file for use with GAMS. GDX files are a convenient way to:

(a) store the input data for a GAMS-based model and easily read it into GAMS
(b) pass data from one GAMS-based program to another
(c) collect the output from a GAMS-based model for further processing, either with GAMS or some other software.
1.5 While users of EMI tools need no experience with GAMS, some familiarity with GAMS would be useful. For users wishing to learn the basics of GAMS, the guided tour and the GAMS tutorial in chapter 2 of the GAMS Users Guide are sensible places to start. Both of these resources can be found under the GAMS IDE help menu.

GAMS license arrangements

1.6 A GAMS license is required to solve the Authority's GAMS-based models. Contact GAMS directly to purchase a GAMS license. A minimal requirement is a runtime license for the GAMS base module and a GAMS open source solver. Users who already own a license for a solver that is suitable for solving the Authority's models may be able to purchase a GAMS solver link at less cost than a new GAMS solver of the same type. Again, consult GAMS directly for information regarding the details of all available options.

1.7 Feel free to email us at emi@ea.govt.nz if you wish to learn of our experience with the various solvers required to solve the Authority's models.

[1] See www.gams.com.

2 Using GDX files

2.1 There are many tools for working with GDX files; some of them are listed under the Contributed software section on the GAMS website. It is necessary to have GAMS installed on your computer in order to exploit the following tips, but it is not necessary to have a GAMS license. Six topics on working with GDX files are now discussed:

(a) Write some GAMS code
(b) Use GDXdump to create CSV or GMS files
(c) The py-gdx (Python) utilities
(d) Interfacing GAMS with R
(e) Writing to Excel from the GAMS IDE
(f) Passing data between GAMS programs using merged GDX files

Write some GAMS code

2.2 An obvious way to interact with a GDX file is to write a little GAMS code.

2.3 For example, the snippet of GAMS code shown below is taken from GEM and accomplishes the following:

(a) Five sets called y, f, r, t, and lb are declared.
(b) Two parameters called i_fuelPrices and i_NrgDemand are then declared.
Note that i_fuelPrices has as its domain the sets f and y, while energy demand (i_NrgDemand) is indexed on sets r, y, t and lb.
(c) The GDX file called GEMinputData.gdx is then designated as the GDX file to be read from, and the $load statements tell GAMS to reach into the GDX file and extract the nominated data. In GAMS parlance, the $load statements are said to initialise the previously declared symbols using the data read from the GDX file.

Sets
  y   'Modelled calendar years'
  f   'Fuels'
  r   'Regions'
  t   'Time periods (within a year)'
  lb  'Load blocks' ;

Parameters
  i_fuelPrices(f,y)      'Fuel prices by fuel type and year, $/GJ'
  i_NrgDemand(r,y,t,lb)  'Load by r, y, t and lb, GWh' ;

$gdxin "GEMinputData.gdx"
$load y f r t lb
$load i_fuelPrices i_NrgDemand

2.4 In a realistic setting, the program would then presumably continue with statements that make use of the loaded data. A similar syntax is adopted to write data from a GAMS program to a GDX file. For example, the following statement would write the symbols a, b and c to a GDX file called test.gdx.

execute_unload "test.gdx" a b c ;

2.5 As an aside, it is worth highlighting that in the examples given above, the $load statement is effected during the compile phase whereas the execute_unload statement is effected during the execution phase. Running a GAMS job first compiles the code and then executes it. Both actions – reading from and writing to a GDX file – are able to be implemented during either phase.

2.6 Additional information on using GAMS to interact with GDX files can be found in the document entitled GAMS GDX facilities and tools, which is available from the help menu of the GAMS IDE – see Help > docs > tools > GDXutils.pdf. Excel users in particular may find the tools GDXXRW, XLSDump and XLSTalk helpful.

Use GDXdump to create CSV or GMS files

2.7 An easy-to-use tool, which is described in GAMS GDX facilities and tools, is GDXdump.
GDXdump can be executed from within a GAMS program or, more conveniently, directly from the command line. Among other things, the GDXdump utility can be used to extract a symbol from a GDX file and write it to a CSV file.

2.8 By way of example, consider the parameter noted above called i_fuelPrices from the GDX file called GEMinputData.gdx. The following instruction entered at the command prompt will extract i_fuelPrices and write it to a CSV file called fuelPrices.csv:

gdxdump geminputdata.gdx symb=i_fuelPrices format=csv cdim=y > fuelPrices.csv

2.9 The symb argument names the symbol to be extracted, the format argument specifies that the output is to be in CSV format, and the cdim argument (set equal to y for yes) says to use the last dimension in the domain of i_fuelPrices as the columns. The default option, or cdim=n, would write i_fuelPrices to the CSV file as a list rather than a table.

2.10 The first few rows and columns of the resulting CSV file look like this:

"Dim1","2012","2013","2014","2015","2016"
"Coal",4.14,4.14,4.13,4.13,4.13
"Lig",1.05,1.07,1.1,1.12,1.14
"Gas",4.95,4.68,4.98,7.28,7.89

2.11 Another useful function of the GDXdump utility is the ability to inspect the contents of a GDX file by dumping out a list of all symbol names along with their dimensions, type and any explanatory text.
Executing the command:

gdxdump geminputdata.gdx symbols

2.12 yields something like this on the screen (the complete symbol list has been truncated for presentation purposes):

    Symbol      Dim  Type  Explanatory text
  1 Benmore       1  Set   Benmore substation
  2 coal          1  Set   Coal fuel
  3 cogen         1  Set   Cogeneration technologies
  4 demandGen     1  Set   Demand side technologies as generation
  5 diesel        1  Set   Diesel fuel
  6 e             1  Set   Zones

2.13 Similarly, the following command would direct the output to a file called symbolList.txt rather than the console:

gdxdump geminputdata.gdx symbols > symbolList.txt

2.14 Of course, while the GDXdump tool is useful for getting data out of a GDX file, it can't be used to put data back into a GDX file. But there is a simple enough way to do just that. If GDXdump is executed with no arguments other than the name of the GDX file, it will write the contents of the GDX file (i.e. sets, scalars, parameters, etc) to standard output formatted as a GAMS program with data statements. A GAMS program by convention has the .gms suffix. The GDX file called GEMinputData.gdx can be written to a file called, say, GEMinputData.gms by issuing the following instruction at the command prompt:

gdxdump geminputdata.gdx > geminputdata.gms

2.15 The resulting file is a legitimate GAMS program that can be edited using the GAMS IDE. Code can then be appended to the end of geminputdata.gms to manipulate the data, and an execute_unload statement can be used to write the result back to a GDX file. For example, if geminputdata.gms was created as just illustrated, and the following two lines were added to the end of geminputdata.gms, and the file was then processed or submitted as a GAMS job, a GDX file called highFuelPrices.gdx would be created.
i_fuelPrices(f,y) = 5.0 * i_fuelPrices(f,y) ;
execute_unload "highFuelPrices.gdx" i_fuelPrices ;

2.16 Yet another option is to use GDXdump to create geminputdata.gms and append the line multiplying fuel prices by five to the file, as shown above. Then, rather than use the execute_unload statement to write selected (or all) data symbols to a new GDX file, run the appended geminputdata.gms file as a GAMS program to create a new GDX file containing all symbols. For example, issuing the following command:

gams geminputdata.gms gdx=newGEMinputData

will yield a GDX file called newGEMinputData.gdx.

The py-gdx (Python) utilities

2.17 A collection of Python utilities called py-gdx is available for manipulating GDX files. Python is an open-source, interpreted, high-level programming language [2]. The py-gdx utilities are available for download from GitHub [3].

2.18 The py-gdx utilities are designed to be executed from a command line. While the py-gdx package contains several utilities, the two main ones are:

(a) gdx_insert_csv.py – inserts or replaces symbols in the GDX file with data from a CSV file. If required, it can be used to create a new GDX file from scratch.
(b) gdx_extract_csv.py – produces a CSV file of symbol (parameter or variable) values in one or more GDX files. If parameter domains are missing from the GDX file, it guesses them. It can act as if several GDX files were one, or it can compare the same parameters in several GDX files. It can export a selected set of parameters, it can export all parameters that are defined over a selected set of domains, or it can export all parameters and variables from a GDX file.

2.19 The CSV files produced by gdx_extract_csv.py are readable by gdx_insert_csv.py. Hence, it is quite straightforward to export an entire GDX file to CSV, edit it, and rebuild a revised copy of the GDX file. Help on all of the utilities can be seen by typing the name of the utility followed by --help.
For example, type the following, where <example> is replaced with the name of the utility of interest:

python gdx_<example>.py --help

[2] https://www.python.org/.
[3] https://github.com/geoffleyland/py-gdx/.

Interfacing GAMS with R

2.20 Users of R, an open-source statistics application, are able to use the GDXRRW package to transfer data back and forth between R [4] and GDX files. A simple one-line statement in R will quickly transfer all or selected symbols from a GDX file to an R dataframe, and vice versa. This approach will suit users who prefer to use R as their primary tool for data work.

[4] http://www.r-project.org/.

Writing to Excel from the GAMS IDE

2.21 As illustrated in the image below, it is a straightforward matter to write, or export, from a GDX file to an Excel file from within the GAMS IDE. The steps are:

(a) open a GDX file in the GAMS IDE
(b) select or highlight the symbol of interest
(c) right-click the symbol and choose Write followed by Write symbol to Excel file.

2.22 If the option to Write ALL symbols to Excel file is chosen, then the resulting Excel file will contain many worksheets, one per symbol from the GDX file. Data from a GDX file can be similarly written to HTML files or copied to the clipboard.

2.23 While this method is useful for getting data out of a GDX file and into Excel, whereupon it can be edited, it doesn't provide a means of getting the modified data back into the GDX file. However, the GDXXRW tool noted earlier in paragraph 2.6 can be used to easily do that.

Passing data between GAMS programs using merged GDX files

2.24 It is very common to run a model many times, once for each of many scenarios, and write the results from each model run to a GDX file. The GDX files can then be merged into a single GDX file that can be used as the input into a GAMS-based report writing program.
Such a report writer would then generate reports on all scenarios in a single step.

2.25 However, this seemingly simple process can be complicated if the set membership changes with each scenario. For example, consider running vSPD, say, for 30 days in a row. It is easy to imagine that in each trading period in each of the 30 days, the set of nodes or branches or offered units might be different. For report writing purposes, a ‘super set’ is desirable; that is, a set containing a unique listing of the entire union of set elements defined over all trading periods and days. The super set can then be used as the domain for parameters read into GAMS (into a report writer, say) from the merged GDX file, ensuring all data for all scenarios is loaded from the merged GDX file.

2.26 A collection of small GAMS programs that demonstrate the worked examples to follow should accompany this report – see combiningSetsFromMultipleGDXFiles.zip.

Worked examples

2.27 The following GAMS programs should be executed in the order in which they are described below to demonstrate how to prepare reports based on results from multiple runs of a GAMS-based model. Comments explaining what is being done at each step are included in each of the programs.

Example1.gms

2.28 This program creates two sets, i and j, and three parameters, a, b and c. The domain of a is set i, the domain of b is set j, and the domain of c is both sets i and j. All five symbols (sets and parameters) are written to a GDX file called ex1_everything.gdx and a subset of symbols is written to a GDX file called ex1.gdx.

2.29 In practical modelling applications, it is helpful to collect all output from a model run in a single GDX file for archive purposes, e.g. ex1_everything.gdx. At some future date, data can be extracted from this file without the need to re-run the model. At a minimum, the archive GDX should contain all sets and parameters used in the model, all variable levels and all equation marginal values.
Note that many symbols in a GAMS program are used as an intermediate step to create the parameters used in the model – these symbols probably don’t need to be archived.

2.30 One or more additional GDX files should be created to collect the information used to generate reports, e.g. ex1.gdx in the present case. The advantage of this approach is that the report writing process makes use of much smaller files. Similarly, if the files need to be distributed by email or published on the web, it is convenient to be working only with those symbols actually needed. It is nearly always the case in any realistic modelling application that report writing requires only a tiny fraction of all the symbols created to formulate, parameterize and solve the model.

Example2.gms

2.31 This program is very similar to example1.gms; the key difference is that membership of the sets i and j is different, although there is some overlap in the set membership across the two programs. As with example1.gms, example2.gms creates two GDX files – ex2_everything.gdx and ex2.gdx.

combineSets.gms

2.32 This program reads in sets i and j from both ex1.gdx and ex2.gdx. In the first instance, it reassigns the sets to new symbols, ii and jj. It combines the elements from each case to form a new ‘super set’ and then writes these combined super sets to a GDX file called combinedSets.gdx. In the process of writing the GDX file, it changes the set names back to i and j.

mergeGDXfiles.gms

2.33 This program creates and executes a batch file that copies ex1.gdx and ex2.gdx to sc1.gdx and sc2.gdx, respectively, and then merges sc1.gdx with sc2.gdx to create yet another GDX file called mergedResults.gdx.

2.34 This step may appear trivial. However, it is a crucial step in being able to combine results from all scenarios into a single GDX file containing the same number of symbols as each individual scenario GDX file.
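
As an illustration only, the copy-and-merge step described in paragraph 2.33 could also be scripted outside GAMS. The Python sketch below is not taken from combiningSetsFromMultipleGDXFiles.zip; the function names plan_merge and run_merge are hypothetical, and the sketch assumes that the gdxmerge utility shipped with GAMS is on the PATH and accepts a list of input files followed by output=<file>. Only the planning function is called here, so nothing is copied or merged until run_merge is invoked.

```python
import shutil
import subprocess
from pathlib import Path


def plan_merge(results, output="mergedResults.gdx"):
    """Return the file copies and the gdxmerge command for a set of scenarios.

    `results` maps a scenario label (e.g. "sc1") to the GDX file holding that
    scenario's results (e.g. "ex1.gdx"). Each file is copied to <label>.gdx
    because the merged symbols take their extra index from the file names.
    """
    copies = [(Path(src), Path(f"{label}.gdx")) for label, src in results.items()]
    command = ["gdxmerge", *(str(dst) for _, dst in copies), f"output={output}"]
    return copies, command


def run_merge(results, output="mergedResults.gdx"):
    """Carry out the plan (hypothetical helper; needs gdxmerge on the PATH)."""
    copies, command = plan_merge(results, output)
    for src, dst in copies:
        shutil.copyfile(src, dst)
    subprocess.run(command, check=True)


# Plan the merge for the two worked examples without touching any files.
copies, command = plan_merge({"sc1": "ex1.gdx", "sc2": "ex2.gdx"})
print(command)
```

Calling run_merge with the same dictionary would copy ex1.gdx and ex2.gdx to sc1.gdx and sc2.gdx and then run the merge, mirroring what the batch file created by mergeGDXfiles.gms is described as doing.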
2.35 While in the case of just two scenarios this may seem insignificant, it is very powerful when there are many scenarios. Furthermore, every assignment statement in a GAMS report writing program can now be applied to all scenarios at once. In other words, there is no need to execute the report writing code once per scenario and then somehow (Excel vlookup, perhaps, if you’re very patient?) join all the results together into a single table.

2.36 The key to this process is the creation of the scenario set, sc, in mergeGDXfiles.gms. The choice of element labels in this set is deliberate – the first element is sc1 and the second is sc2. These labels are in fact the names given to the GDX files to be merged. The reason for this is that all symbols in the merged GDX file acquire a new domain index, and the elements of that index are taken from the names of the files being merged.

2.37 For example, ex1.gdx (copied to sc1.gdx) and ex2.gdx (copied to sc2.gdx) each contained a symbol called c, which was defined on sets i and j. The merged file, mergedResults.gdx, also contains the symbol called c. But note how it is defined not only on sets i and j, but also on sc.

Example3.gms

2.38 This program reads in the relevant data from combinedSets.gdx and mergedResults.gdx. It can be used as the beginning of a report writing program.

A more sophisticated approach

2.39 All of the above can be made more sophisticated if the model is solved inside a loop on the scenario set. The GAMS put_utility can be used to write the GDX files each time around the loop and the GDX files would take their name from the text labels (.tl) of the elements assigned to set sc. The syntax would be something like this:

Set sc / sc1, sc2 / ;
* Extend the list above with a label for each further scenario.
file dummy ;

loop(sc,

* ... code to assign parameter values for this solve of the model ...

  Solve vSPD minimising TOTALCOSTS using lp ;

* ... more code (maybe) to do post-solve computations on model output ...

* Name the GDX file for this pass after the current scenario label.
  put dummy ;
  put_utility 'gdxout' / sc.tl ;
* Unload i, j and c, or whatever needs to be merged in the GDX files.
  execute_unload i, j, c ;
) ;

execute 'gdxmerge sc*.gdx output=mergedResults.gdx' ;

Glossary of abbreviations and terms

Authority   Electricity Authority
CSV         A file type that contains so-called comma-separated values, which is nothing more than a comma-delimited text file with a .csv suffix
GAMS        General Algebraic Modeling System
GDX         GAMS data exchange - a binary file format with a .gdx suffix
GEM         Generation expansion model
GMS         A file containing GAMS code - a text file with a .gms suffix
HSS         Hydro supply security test
IDE         Integrated development environment
SPD         Scheduling, Pricing and Dispatch
vSPD        Vectorised Scheduling, Pricing and Dispatch