High Level Overview of IBM Mainframe Environment vs. Microsoft Windows Platform

Howard E. Hinman
hhinman@microsoft.com
Enterprise Architect, Microsoft Consulting Services
July 4, 2009
Version 2.0

Table of Contents

Introduction
Mainframe vs. Windows “Lingo” and Technology Breakdown
Mainframe Hardware vs. Windows Hardware
Mainframe Physical Resource Allocation
Mainframe Systems Software vs. Windows Systems Software
Operating Systems
    z/OS
    z/VSE
    z/VM
    Windows Server
Security in the Mainframe z/OS (MVS) Environment
On-Line Transaction Monitors/Application Management
    Screen I/O Managers
    Screen Presentation Languages
    Batch Application Management
        IBM Mainframe Job Control Language (JCL)
        IBM REXX
        Job Scheduler
        Job Entry Subsystem (JES or JES2 or JES3)
        IBM’s POWER (z/VSE Environment)
Mainframe Data vs. Windows Data
    The Mainframe System Catalog
    Mainframe EBCDIC vs. ASCII Data Representation
    Mainframe EBCDIC vs. ASCII Collating Sequence Issues
    Mainframe Data Types
    Variable Length Data Files
    Data Bases
    Indexed File System (VSAM)
    Partitioned Data Sets (PDS)
    Flat Files (QSAM)
    Generation Data Groups (GDG’s)
    Data Tools
Report Writers and Managers
Source Code Management Systems
Programming Languages
    COBOL
    IBM Mainframe Assembler
    PL/1
    4th Generation Languages
Development Management Technologies
    TSO and ISPF
    ICCF
    Rational Developer for System Z
    Microsoft .NET and Visual Studio
Summary
Resources

Introduction

The purpose of this paper is to compare the IBM mainframe environment to the Microsoft Windows platform, primarily for Windows professionals who are attempting to provide solutions to IBM mainframe customers. The similarities from a technology standpoint are sometimes striking, and it is good to understand them when considering leveraging Microsoft Windows in an enterprise mainframe organization.

Mainframe vs. Windows “Lingo” and Technology Breakdown

Below is a high level and very simplistic breakdown of the “vanilla” IBM environment, only lightly touching on some of the extensive third party software available.

| Function | Mainframe | Windows |
| --- | --- | --- |
| Operating Systems | MVS, z/OS, z/VSE, VM | Windows Server |
| On-Line Transaction Monitor/Application Management | CICS, IMS/TM | Internet Information Services (IIS), Windows Communication Foundation (WCF), COM+, BizTalk |
| Screen I/O Manager | CICS, IMS/TM, VTAM | IIS (ASP.NET), WinForms, WPF, Silverlight |
| Screen Presentation “Languages” | BMS (CICS), MFS (IMS/TM) | HTML, JavaScript, others |
| Batch Application Management | JES, Batch Regions, JCL, Job Schedulers | Windows Batch, Windows Processes, PowerShell, Job Schedulers; COBOL vendors have JCL emulators |
| Databases | DB/2, IMS (DL/I) | SQL Server |
| Indexed File System (key accessed flat files) | VSAM | None (use SQL Server); COBOL vendors have VSAM on Windows; Pervasive has Btrieve |
| Flat Files | QSAM, PDS, GDG | Windows flat files, SQL Server |
| Report Writers/Managers | Easytrieve | SQL Server Reporting Services, Crystal Reports |
| Source Code Management Systems | Panvalet, Librarian, Endevor, SCCS | Team Foundation Server (TFS), Visual SourceSafe, PVCS, Subversion |
| Programming Languages | COBOL, Assembler, PL/1, Java | C#, VB.NET, COBOL, Java, numerous other languages |
| Development Management Technologies | TSO, ISPF, ICCF | Visual Studio Team System |

Let’s discuss what the functional areas above really mean and get into a bit more realistic detail.

Mainframe Hardware vs. Windows Hardware

IBM mainframes come in a wide variety of models that you may encounter. The good news is that little focus is needed on the differences between models and sizes, with the exception of estimating the horsepower in play (generally measured in MIPS, millions of instructions per second). A somewhat older paper (from 2005) was written on this subject and is available here. IBM has done a fairly good job of upward compatibility over the years.

Mainframe computers do not have peripherals built in like many Windows servers, although Windows servers using SANs (Storage Area Networks) are a closer match to the mainframe hardware environment. IBM mainframes are typically just CPUs and RAM with high speed I/O channels that connect to various types of peripherals (e.g. disk drives, tape drives, terminal controllers, etc.). In many cases, an older IBM mainframe can be upgraded to a newer model by simply unplugging the old mainframe from the peripherals, removing it, moving the newer mainframe into place, reconnecting the peripherals, and rebooting (known as “IPLing”, for Initial Program Load, in mainframe lingo). While some configuration afterwards may be required, or even desired in order to reallocate resources, this can be far simpler than replacing a Windows server with a newer type of hardware, which typically requires reinstalling the Windows Server operating system and all of the application software on top of reconfiguring.

Because of this, the main concerns regarding mainframe hardware vs. Windows hardware are typically capacity, performance and reliability. In effect: how big a Windows server do I need to replace a mainframe, or how many Windows servers do I need to scale out to replace a mainframe? There is nothing magic about mainframe hardware vs.
Windows hardware: each is capable of doing what the other does, namely executing instructions and moving data. At the end of the day, this is typically all that computer platforms do. There are white papers on hardware performance characteristics and sizing at http://www.microsoft.com/mainframe that assist in determining how much Windows hardware you may need to replace mainframe resources. This paper focuses on the more complicated software environment.

Mainframe Physical Resource Allocation

Modern IBM mainframes are typically viewed as some number of processors (CPUs), RAM (memory) and I/O channels leading out to peripheral resources such as disk drives, terminals, tape drives, etc. An IBM mainframe (or set of mainframes in a datacenter) is typically “carved up” resource-wise into one or more logical partitions known as “LPARs”. This is not the same as virtualization of resources, as an LPAR is typically associated on a one to one basis with physical resources (e.g. CPUs and RAM). A single modern IBM mainframe will generally have at least one LPAR (holding all of the resources), or may instead be divvied up into multiple LPARs. This is done for management and security purposes. Multiple systems running in separate LPARs can be tied together into what is known as a Sysplex, which can span one or more machines.

Within a single LPAR, multiple regions can be defined. I like to think of a region as being somewhat equivalent to a Windows process. Separate regions typically exist on a mainframe for CICS, DB/2, and one or more batch regions for batch applications. There is often an additional level of granularity, with separate regions defined for development, test and production versions of applications.

I don’t believe that Windows itself provides an LPAR-like capability; in the Windows world, this is more of a hardware driven capability.
Some manufacturers of larger Intel or AMD based Windows servers allow the servers to be partitioned into a set of logical servers, assigning CPUs and memory to these logical servers much like an LPAR approach. Each logical server then has its own instance of a Windows operating system (and/or any other operating system that can run on the platform). Windows Server does come in a Datacenter Edition that can also manage multiple servers for purposes of clustering and high availability.

Additionally, in the Windows world it is much easier to distribute physical resources by using multiple individual servers or blade servers, which I believe explains the lack of a need for an LPAR capability in the Windows world. The heavy use of virtualization in the Windows world (virtualization also exists separately on the mainframe) further reduces the need for logical partitioning.

Mainframe Systems Software vs. Windows Systems Software

I like to distinguish between “systems software” and “application software”. Systems software means software that is provided by a vendor like IBM or Microsoft and supports the system environment, such as an operating system or a transaction monitor like CICS or IIS. Application software typically means customer developed programs, such as COBOL applications on a mainframe or user written C# programs on Windows, or COTS (commercial off-the-shelf) packages, which are vendor developed application software. While this paper will discuss application software somewhat, the main focus is on the systems software environment that supports applications and application development.

Operating Systems

In the IBM mainframe environment there are three main operating systems typically encountered.

z/OS

z/OS is IBM’s current flagship large scale mainframe operating system. It is the replacement for the older MVS operating system, although you may often find these terms used interchangeably.
This is where the majority of computing takes place in the IBM mainframe world.

z/VSE

z/VSE (formerly known as “DOS/VSE” and just “VSE”) is IBM’s smaller scale mainframe operating system, found on smaller and less powerful mainframes.

z/VM

z/VM (Virtual Machine) is IBM’s operating system that allows for virtualizing the above two systems. You will rarely encounter it, but it’s useful to know that virtualization has been alive and well in the IBM mainframe environment since the 1970s.

Windows Server

In the Microsoft Windows environment, Windows Server 2008 is the current flagship Microsoft server operating system. There are multiple editions, targeted at smaller (Standard Edition), larger (Enterprise Edition) and very large scale (Datacenter Edition) deployments. Windows Server 2008 is an extremely advanced operating system and stacks up well against z/OS functionality. In many cases, significantly cheaper Intel based servers running Windows Server operating systems provide significantly more capacity and power than IBM mainframes at a fraction of the cost.

Security in the Mainframe z/OS (MVS) Environment

One of the areas in which Windows has been unfairly criticized is security vs. the mainframe. The criticism is unfair because Windows is both a server and a client desktop operating system, which means that millions of Windows based machines have been exposed directly to the Internet and its potential dangers: both servers, which often act as Web servers (or are exposed for other purposes), and client desktop machines, which are used by the masses to access the Internet. Very few mainframes, on the other hand, are exposed directly to the dangers of the Internet. Windows actually offers many separate layers of security that are not found in a mainframe environment. Determining the options and their proper configuration can be challenging, however.
The main security facility on a z/OS mainframe (formerly “MVS”) is typically an IBM product called RACF (Resource Access Control Facility), although some third party products such as “Top Secret” and “ACF/2” are in use as alternatives. RACF generally implements access policies against resources, and mainframe facilities such as CICS, DB/2 and even batch jobs running under JES interact with RACF for the security rights to access resources. RACF also integrates with LDAP to assist in integrating heterogeneous environments, including with Microsoft’s Identity Management Server (see article here). There is an article discussing CICS using RACF security vs. Windows 2003 Server security here for more information.

On-Line Transaction Monitors/Application Management

IBM provides two main transaction monitors for running on-line applications (applications that typically interact with users) on the mainframe. These are system software level facilities, although they run as applications under the main operating system, much like IIS runs under Windows. These are:

CICS (Customer Information Control System) - CICS, known affectionately as “KICKS”, is IBM’s most popular and most common on-line transaction monitor. It is available for both z/OS and z/VSE. This is where on-line applications typically run in an IBM shop. CICS manages transactional integrity, with the ability to commit or roll back data at a transactional level, and also interfaces with a number of external devices such as terminals, ATMs, printers, etc. If you see a character mode green screen application running off an IBM mainframe, chances are it is running under CICS.

CICS uses a macro assembler language known as “BMS” (Basic Mapping Support) to externalize the interface to devices such as terminals from applications.
When a programmer needs to create a screen or set of screens in CICS, they write a BMS macro or use any of a number of available BMS screen painters that generate BMS macros. The macro is “generated” (assembled), and this process produces a load module and also creates a COBOL COPY file that contains the layout of the data fields on the screen (an application interface block between the program and the screen). COBOL programs then use this data structure to pass data to and from the screen at run time using CICS programming APIs.

Application programmers typically access CICS services using a command level programming interface known as “EXEC CICS” (similar in nature to SQL’s “EXEC SQL” interface). A pre-compiler is run before the program compiler; it comments out the EXEC CICS statements and inserts a lower level CALL interface to the CICS API that is compatible with the language in use (e.g. COBOL). CICS services that are callable from application programs include file I/O, screen I/O, transactional COMMIT/ROLLBACK, program flow through transactions, exception condition handling, some base services like date formatting, and others.

Additionally, IBM has enhanced CICS over the years to work with Java and has provided the ability to wrap existing mainframe based CICS applications with Web Service interfaces and/or HTML front ends so they can be exposed to the Internet. Some people “affectionately” refer to this as “lipstick on the pig”, as large COBOL applications (for example) running on the back end are simply given an HTML front end with little or no change.

On the systems side, CICS uses Resource Definition Tables (RDT), including the Program Control Table (PCT), where programs are associated with transaction codes; the File Control Table (FCT), where data files under the control of CICS are defined and exposed to programs via symbolic names; and a few others (e.g. the Terminal Control Table, TCT).
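As a rough mental model for Windows developers, the two tables just described behave like lookup tables owned by the system rather than by the application. The sketch below is a toy Python illustration only, not CICS configuration syntax; the transaction codes, program names, file name and data set name are all hypothetical.

```python
# Toy model of two CICS resource tables. Real tables are defined by systems
# programmers inside CICS; every entry below is a hypothetical example.

# Program Control Table: transaction code -> application program name.
PCT = {
    "INQ1": "CUSTINQ",
    "UPD1": "CUSTUPD",
}

# File Control Table: symbolic file name used by programs -> physical data set.
FCT = {
    "CUSTFILE": "PROD.CUSTOMER.MASTER",
}


def program_for(transaction_code: str) -> str:
    """Return the program associated with a transaction code."""
    try:
        return PCT[transaction_code.upper()]
    except KeyError:
        raise LookupError(f"unknown transaction code: {transaction_code}") from None


def dataset_for(symbolic_name: str) -> str:
    """Resolve the symbolic file name a program uses to its data set."""
    try:
        return FCT[symbolic_name]
    except KeyError:
        raise LookupError(f"file not defined: {symbolic_name}") from None
```

The point of the indirection is that an application only ever refers to a transaction code or a symbolic file name; which program or data set stands behind it can be changed in the tables without touching the application.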
To begin executing a CICS transaction, an end user typically logs on to CICS and enters a transaction id, which is looked up in the PCT to tell CICS which application program to launch, and away you go. When an application program exits, it typically executes a CICS RETURN statement and specifies a transaction id to be executed when the user hits an attention key on the keyboard (sometimes its own transaction id, other times that of another program; if no transaction id is specified, the overall transaction is considered complete).

When a CICS program wants to invoke another program, it typically has two choices. It can execute a CICS LINK command to call the other program, in which case control returns to the original program that issued the LINK when the called program exits. Or it can do a “transfer of control” (CICS XCTL), which effectively drops the entire call stack; control returns to the CICS executive if the called program returns without specifying a transaction id to invoke upon return.

There are actually a number of CICS emulators for Windows available from various sources, including the COBOL vendors. It is possible to move mainframe CICS/COBOL applications to Windows using these COBOL vendors’ products, where they can execute under CICS emulators with little or no change, or they can be migrated to run under Microsoft’s IIS as web applications. Some COBOL vendors even offer products that convert CICS applications into IIS applications, including migrating the screen definitions (BMS macros) into ASP.NET.

IMS/TM (Information Management System/Transaction Manager) - IMS/TM is far less common than CICS today. It was previously known as IMS/DC (Information Management System/Data Communications). The IMS product family also includes an older hierarchical database system known as IMS/DB.
Much like the SQL language is used to communicate with relational databases, a language (actually a callable API) known as DL/I (Data Language/I, where the “I” is the Roman numeral one, so it is sometimes written DL/1) is used to access IMS databases.

As a side note, the IMS/DB product (the older hierarchical database) can be installed on IBM operating systems without IMS/TM and accessed from mainframe applications, and even from CICS acting as the transaction monitor instead of IMS/TM. When doing this under CICS, a macro language known as “EXEC DLI” is used, with a pre-compiler later converting this into a lower level CALL interface.

IMS/TM is not coupled to the IMS database, however. On-line applications running under IMS/TM may access IBM’s DB/2 relational database using SQL and, conversely, on-line applications running under CICS may access IMS/DB databases using DL/I. While IMS/TM is only available under the z/OS operating system (not on VSE), IMS/DB is available on both z/OS and VSE.

Like CICS with BMS, IMS/TM provides its own macro assembler language, known as “MFS” (Message Format Service). MFS macros can be written by hand or generated using a screen painter and, as with CICS, this process produces a load module and a COBOL COPY file that is used as the data interface between a screen and a program.

I am aware of one IMS TM/DB emulator available for Windows from one of the COBOL vendors.

Microsoft’s Internet Information Services (IIS)

Microsoft’s primary transaction monitor for on-line applications is IIS (Internet Information Services, formerly Internet Information Server), which is a Web based transaction monitor providing capabilities similar to mainframe transaction monitors. If you examine the characteristics of typical mainframe on-line applications such as those running under CICS, there is a striking architectural similarity to IIS on Windows.
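One way to see the similarity is to model the CICS flow described earlier, where a program sends a screen, returns with the transaction id to run next, and is only started again when the user presses an attention key. The Python sketch below is a toy model under that description, not CICS or IIS code; the Monitor class, the menu handler and the "visits" counter are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# A handler gets the screen input and the state saved for the terminal, and
# returns (screen to display, next transaction id or None, state to save).
Handler = Callable[[dict, dict], Tuple[str, Optional[str], dict]]


def menu(screen_input: dict, state: dict) -> Tuple[str, Optional[str], dict]:
    """Hypothetical application program: counts visits until the user quits."""
    if screen_input.get("key") == "PF3":
        return "GOODBYE", None, {}           # no next transaction: all done
    visits = state.get("visits", 0) + 1
    return f"MENU (visit {visits})", "MENU", {"visits": visits}


TRANSACTIONS: Dict[str, Handler] = {"MENU": menu}


class Monitor:
    """Toy transaction monitor. It, not the program, remembers what runs next."""

    def __init__(self) -> None:
        self.next_tran: Dict[str, str] = {}   # terminal -> transaction id
        self.saved: Dict[str, dict] = {}      # terminal -> saved state

    def attention(self, terminal: str, screen_input: dict,
                  tran_id: Optional[str] = None) -> str:
        """Handle one attention key: run one program to completion."""
        tran = self.next_tran.pop(terminal, None) or tran_id
        if tran is None:
            raise ValueError("no transaction id entered or pending")
        state = self.saved.pop(terminal, {})
        screen, nxt, new_state = TRANSACTIONS[tran](screen_input, state)
        if nxt is not None:                   # remember where to resume
            self.next_tran[terminal] = nxt
            self.saved[terminal] = new_state
        return screen
```

No program is running between two calls to attention; everything needed to continue lives in the monitor. A stateless web request handler that relies on server-side saved state has the same shape, which is the comparison this section goes on to make.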
The CICS programming paradigm is known as “pseudo conversational”, which means that programs typically display a screen, save a scratch pad area (known as a “COMMAREA”, or Communications Area) and exit, and then CICS waits for a user to hit an “attention key” (e.g. the “Enter” key). Once an attention key is hit, CICS takes the data from the screen, determines which program is supposed to be called to receive the screen data (often the same program that sent the screen out in the first place), calls that program and passes the COMMAREA back in.

This maps directly to IIS and Internet server behavior in general. A typical web application pushes a screen to a client machine (via the web browser), and no persistent connection between server and client is maintained. Instead, the server technology (IIS, and ASP.NET running under it) provides a scratch pad area known as “session state”. When a user hits an “attention key” (in mainframe terminology) in the browser, such as a Submit button, data from the screen is returned to the server, IIS determines which program to call (send an event to), and the session state is available if needed. This maps exactly onto the way that CICS pseudo conversational programs work.

In the meantime, IIS and other Windows technologies (e.g. SQL Server) provide transactional capabilities with full data COMMIT/ROLLBACK. Again, this is a very close match to current mainframe transactional technologies.

COM+ - COM+ (Component Object Model +) is a Microsoft technology used heavily for distributed transactions. It is also used heavily by Microsoft’s IIS to provide transactional capabilities. COM itself encompasses a number of Microsoft object component technologies including DCOM, OLE, and ActiveX. COM+ is actually a later generation of Microsoft Transaction Server (MTS). COM+ is not a .NET technology, although the .NET Framework includes a wrapper for it that allows .NET applications to interact with COM+.
Windows Communication Foundation (WCF) - While WCF may not be a traditional “On-Line Transaction Monitor”, in that not all use of it is necessarily transaction oriented, it does support a transactional model in many cases and is worth mentioning here. It additionally supports a number of messaging technologies.

BizTalk - BizTalk is Microsoft’s integration and connectivity server product. It provides the ability to connect disparate systems running on a variety of platforms. It includes an orchestration capability with workflow, data transformation services and an adapter framework that allows for connection to almost any type of system accessible via a variety of protocols on a network (including IBM mainframes).

Screen I/O Managers

The vast majority of screens (known affectionately as “green screens”) that are presented to end users by an IBM mainframe are created and managed using CICS and its related technologies. Screens can also be created on the mainframe using IMS/TM related technologies and under ISPF Dialog Manager (discussed below). While mainframe screens are usually created using Assembler language macros (e.g. “BMS” for CICS, and “MFS” for IMS/TM), there are screen painters (albeit character mode only) that allow programmers to lay out screens and generate these Assembler macros.

On the mainframe, the systems software facility that typically interacts with terminals, including PCs with 3270 terminal emulators, is known as VTAM (Virtual Telecommunications Access Method). VTAM is actually rarely discussed, and most mainframers think of CICS as the screen I/O manager, even though CICS interacts with VTAM at a low level to perform screen I/O.

ISPF (Interactive System Productivity Facility) is the development environment on a z/OS mainframe, running under TSO (Time Sharing Option). It includes a facility known as Dialog Manager that allows users to create character mode interactive screens that run under TSO.
Dialog Manager is rarely used for production, but rather to enhance the mainframe development and configuration management environment. There are some exceptions, however.

On the Windows side, there is a wealth of screen manager technologies, with Microsoft’s primarily under the .NET Framework. For GUI desktop applications, there is Windows Forms and the newer Windows Presentation Foundation (WPF). For Internet applications, Web Forms leveraging ASP.NET is the general workhorse, similar in usage to CICS, along with newer and more advanced technologies usable on the Web such as Silverlight (which actually leverages WPF on the Web), and even XAML.

Screen Presentation Languages

On the mainframe, applications typically interface to screen I/O using a special API that depends on the transaction monitor they are running under (e.g. CICS). Screens themselves are developed outside of the actual application programs, with a standard data interface defined between each screen and the application program working with it. The screens are generally written in a non application programming language such as Macro Assembler. For CICS, Basic Mapping Support (BMS) is used to create screens, and in IMS/TM, Message Format Service (MFS) is used. While BMS and MFS are complex in that Assembler code is used to develop the screens, the screens are typically character mode (text based and non-graphical), so the layouts are pretty much as simple as writing code indicating where on the screen each field should appear, what the length of the field is, what attributes should be turned on (e.g. highlighting), whether the field is read only (e.g. a label) or read/write, and a few other simple details.

Screens are typically requested to be displayed by a program with “attention keys” defined that create a sort of event.
The program then exits immediately, the system waits for the user to hit one of these attention keys, and a program is fired up which then requests that the screen data be read into the program, and execution proceeds. Typically, validation of screen input is part of the application, not the screen management system itself. This contrasts greatly with GUI screen development and the event driven model of many non mainframe platforms, although emulators are available for Windows that mimic this mainframe behavior in great detail if desired.

On Windows, there are several different screen presentation languages (ASP.NET, Windows Forms, WPF, etc.), which typically adhere to event driven programming.

Batch Application Management

One of the most misunderstood (and I believe underappreciated by Windows people) environments on the mainframe is the batch environment. A batch job is a program or set of related programs that executes without any user interaction. For example, a batch job might run once a month to print bills that are mailed to customers. It simply kicks off at a predetermined day and time, determines all of the charges that have accumulated by reading files or a database, computes the charges, taxes and penalties, and prints a bill for each customer found in the system.

A scripting language may be used to string together multiple steps of a batch process if more than one application must be executed to complete the overall process. This scripting language may point the application dynamically at external resources such as data files, provide logic to check return codes for success or error, and provide conditional branching logic for proceeding to the other job steps if multiple steps are required to complete the process. Windows now offers the very powerful PowerShell scripting language.

Many mainframe shops are more batch oriented than on-line oriented.
What this means is that instead of transactions that perform instantaneous updates and return immediate results as we are used to in the Windows world, work is often batched up and accumulated over time, and then an application is executed that processes all of the updates later en masse. Batch applications may run once an hour, once a day, once a week or even less often depending on their purpose and the processing power, state of data and business processes in a shop.

It should be noted, however, that the mainframe batch environment is generally far more complex and rigid than running simple .bat or .cmd files (or even PowerShell scripts) on a Windows machine. On the mainframe, for example, information about resource consumption by batch jobs may be recorded using add-on packages that analyze it for purposes of capacity planning or charge backs to departments sharing the mainframe. While there are capacity planning tools for Windows, imagine an entire company moving from mainframe to Windows that has for the past 30+ years assigned IT budgets to individual departments based upon the actual cost of resource consumption of their specific applications running on a shared mainframe, including CPU consumption, disk consumption, I/O consumption, print job consumption, etc. Imagine trying to gather up these statistics in a multi Windows Server environment.

The good news is that Windows is an excellent batch platform, particularly because it is easier to separate batch from on-line processing needs across multiple physical or virtual servers. The cost of adding horsepower or scaling out is far cheaper and much more realistic than in the IBM mainframe environment, which is a major plus for Windows. Many mainframe shops cannot afford to run both on-line CICS and batch applications concurrently, and often the model is that they take down CICS at night and run batch applications during off hours when on-line may not be required.
To appear to be more modern, many of these shops sport web applications that appear to be 24x7 but in effect do not do updates immediately. Instead they batch requests up and run them at night during batch processing, or have specific periods during the day when they do batch runs, to give the appearance of 24x7 real time processing.

In order to better understand and appreciate the mainframe batch environment, it is best to break the discussion out into a couple of different areas.

IBM Mainframe Job Control Language (JCL)

Although most computing platforms offer some sort of batch scripting language or even more fully fledged programming languages with systems extensions (e.g. the use of ALGOL in the Unisys ECL environment), I’ve found IBM’s JCL to be the most complex. It’s worth mentioning that IBM’s JCL differs in syntax and support between operating systems. MVS and z/OS JCL differ greatly from VSE JCL, for example, and are not portable across systems. This complexity is not confined to the language or syntax itself, but exists because of the way in which JCL ties into the IBM mainframe operating system facilities and the set of utilities that are associated with it. Additionally, there are a large number of powerful and complex add-on tools such as job schedulers, logging and restart facilities and report management systems that create extremely powerful environments in the batch realm.

Mainframe JCL interacts with the system catalog to allocate disk space, create and delete data files and even manage GDG’s (discussed below). Additionally, a set of utilities is made available via JCL, such as IDCAMS for VSAM utilities, a very powerful SORT (something else not provided by Windows), and others. The SORT utility may be leveraged as a complete applications environment in itself in some shops.
There are third party packages available for Windows to fill in some of these needs, and it is also worth remembering that the COBOL vendors typically provide JCL emulation environments on Windows, complete with a system catalog and these utilities. When moving away from COBOL, however, a great deal of thought is needed to determine how to migrate the business requirements around batch processing to a new Windows environment. This might include replacing some batch operations with real time updates, for example. For more information on JCL, click here.

IBM REXX

IBM offers another, more generalized scripting language known as REXX. Some people consider REXX to be closer to a high level programming language, but I classify it as scripting because it is typically an interpreted language and has been used primarily to write utilities rather than production applications. There have been some REXX interpreters/compilers offered for Windows in the past. REXX may be embedded in IBM JCL as well.

Job Scheduler

Almost every mainframe shop will have a job scheduler software package that manages when batch jobs (known as “JCL Streams”) will be submitted to the Job Entry Subsystem (JES) and executed. Job scheduling is typically a very complex process on the mainframe. The entire yearly batch application schedule and the dependencies which affect scheduling are typically programmed into the Job Scheduler. The Job Scheduler will also interact with other, sometimes optional, systems software applications that might hook in and deal with application exceptions (known as “abends” on the mainframe), automate restoring data files and managing automated restarts of job steps or entire job streams, and log information regarding resource consumption and job metrics.
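The dependency side of job scheduling can be pictured with a small sketch. The Python below is illustrative only: the job names and their dependencies are hypothetical, and it simply orders jobs with the standard library's topological sort rather than reproducing how any particular mainframe scheduler works.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical nightly job streams: each job maps to the jobs that must finish first.
DEPENDENCIES = {
    "EXTRACT": set(),
    "SORT": {"EXTRACT"},
    "BILLING": {"SORT"},
    "REPORTS": {"BILLING"},
    "ARCHIVE": {"BILLING"},
}


def submission_order(dependencies):
    """Return one order in which jobs can be submitted without violating a dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = submission_order(DEPENDENCIES)
print(order)
```

A production scheduler layers calendars, abend handling and restart logic on top of this kind of ordering, which is where most of the real complexity lives.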
Microsoft provides a job scheduler in Windows® HPC Server 2008, which brings high performance computing (HPC) to industry standard, low-cost servers that support larger and heterogeneous clusters. Jobs, which are discrete activities scheduled to run on the HPC cluster, are the key to operating Windows HPC Server 2008. Cluster jobs can be as simple as a single task or can include many tasks. Each task can be serial, running one after another, or parallel, running across multiple processors. Tasks can also run interactively as Service-Oriented Architecture (SOA) applications. The structure of the tasks in a job is determined by the dependencies among tasks and the type of application being run. In addition, jobs and tasks can be targeted to specific nodes within the cluster. Nodes can be reserved exclusively for jobs or can be shared between jobs and tasks. There are additionally third party job schedulers available for Windows such as ActiveBatch and CA’s AutoSys.

Job Entry Subsystem (JES or JES2 or JES3)

The main IBM operating system z/OS (previously MVS) provides a Job Entry Subsystem (JES) that manages batch JCL streams and their output (e.g. “SYSOUTs”, or output files including reports). JCL jobs are loaded into JES, which manages their execution and tracks the output artifacts. Print spooling and management is also a major facility provided. SYSOUTs are output files that include an execution summary report of the JCL stream they are associated with, as well as reports generated by the application programs. These are typically static text based reports, but may also include IBM AFP graphics reports (kind of the PCL of the mainframe). JES also provides an interface for managing and finding these SYSOUT reports, which can easily accumulate into the tens or hundreds of thousands or more over a year in very large shops.
IBM’s POWER (z/VSE Environment)

The job entry subsystem on IBM’s smaller mainframe operating system, z/VSE, is known as POWER and provides similar facilities to JES on a somewhat cut down basis.

Mainframe Data vs. Windows Data

Data in the IBM mainframe environment is physically quite different from data in the Windows environment. However, data may typically be migrated from mainframe to Windows and converted into Windows formats, and the opposite is true as well. The Windows COBOL vendors go to great lengths to support mainframe data types natively in COBOL on Windows and even provide intelligent data conversion tools. When not using COBOL on Windows, however, special considerations come into play. Data files on the IBM mainframe are typically referred to as data sets.

Microsoft has a facility in Host Integration Server (HIS – now part of the BizTalk family) that can interrogate mainframe file and database definitions in COBOL and create automated data conversion templates, to be used for data migration or real time data translation for cooperative applications between Windows and the mainframe. BizTalk/HIS have adapters available that allow for direct access to DB/2 databases, VSAM and QSAM files.

The Mainframe System Catalog

One significant difference between the mainframe and Windows is that the mainframe implements a system catalog where data files are typically registered (“cataloged” in mainframe terminology) and which includes attributes about the files not kept in the Windows file allocation table. This includes, for example, file type, record length, record type (e.g. fixed or variable), and some additional details about each file. This makes it easy for writers of mainframe based system software facilities and tools to perform generalized file activities, as they can query and interact with the system catalog to find out and record information about any file.
The COBOL vendors typically implement the mainframe system catalog concept on Windows using their JCL (Job Control Language) emulation facilities. For more information search on “catalog” under the MVS File System section in the article here.

Mainframe EBCDIC vs. ASCII Data Representation

The IBM mainframe environment relies on the EBCDIC format to store data, while Windows relies on ASCII. Additionally, there are data types in the IBM mainframe environment that either are not native to Windows or are represented in a different physical format. This presents a couple of challenges when migrating mainframe data to Windows, which are discussed below. It is important to note, however, that mainframe data can be transformed (migrated) to Windows using a variety of techniques and tools. If you choose to use COBOL on Windows, this is greatly simplified because the Windows COBOL vendors support mainframe data types and provide migration tools for them. If you are moving away from COBOL but need to migrate mainframe data, then selecting the appropriate data model is key in determining what approach to take. For example, if data is going to be migrated from DB/2, VSAM and/or QSAM (discussed below) to SQL Server, there are good approaches for doing so. A combination of SQL Server and flat files on Windows may also be a desirable approach. For a more detailed discussion on ASCII vs. EBCDIC in general, go here.

Mainframe EBCDIC vs. ASCII Collating Sequence Issues

Issues that can adversely affect applications being migrated from the IBM mainframe environment to Windows can arise because of the difference in the collating sequence between EBCDIC and ASCII. In ASCII, the numeric digits 0-9 are represented in sequence by hex values x"30" through x"39", with alpha characters coming afterwards, starting with x"41" for the letter “A” and up.
In EBCDIC, the same numeric digits 0-9 are represented by hex values x"F0" through x"F9", while the alpha characters come before, starting with x"C1" for the letter “A” and up. This means that on Windows, numeric text when compared to alpha text is less than alpha, while exactly the opposite is true on the mainframe. So consider programming logic such as:

    IF "1" < "A"
        DISPLAY "1 is less than A"
    ELSE
        DISPLAY "1 is greater than A"
    END-IF

If you execute this identical code on the IBM mainframe and on Windows you will get different results. The same holds true for SQL statements such as:

    SELECT * FROM CUSTOMERS WHERE ACCOUNT-NUMBER > "ZZZZZZZZ"

If you have account numbers that are alphanumeric for some reason, you may receive different results from the above SELECT statement when running on the IBM mainframe vs. Windows. You can, however, turn on the EBCDIC collating sequence in SQL Server, which is aware of it (although the data physically remains in ASCII (UNICODE) in the SQL Server database). This type of issue will only surface if you have alphanumeric fields in your data that you perform conditional logic upon. It is typically easy to fix programmatically once recognized. For a more detailed discussion of EBCDIC vs. ASCII collating sequence issues go here.

Mainframe Data Types

Migrating data from the mainframe can sometimes be more complex than migrating COBOL programs because of the ASCII vs. EBCDIC format differences, coupled with mainframe data types that don’t exist natively on Windows or exist in a different physical format. One of the Windows COBOL vendors supports all mainframe EBCDIC data types in native EBCDIC on Windows, by the way. The upside of this is easy migration of COBOL applications and data to Windows (when using COBOL). The downside is that almost nothing else in the Windows environment (e.g. SQL Server) can work with EBCDIC data.
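The EBCDIC vs. ASCII differences discussed above, including the collating sequence issue, can be observed directly. The sketch below is in Python rather than COBOL and assumes code page cp037 (US/Canada EBCDIC) as a stand-in for whatever code page a given shop uses; the account numbers are made up for illustration.

```python
# cp037 is one EBCDIC code page (US/Canada) that ships with Python;
# it is an assumption here, since a given shop may use a different code page.
EBCDIC = "cp037"


def compare(left, right, encoding):
    """Compare two strings by their encoded byte values, as a binary collation would."""
    a, b = left.encode(encoding), right.encode(encoding)
    if a < b:
        return "less"
    return "greater" if a > b else "equal"


def sort_as(keys, encoding):
    """Sort keys in the byte order of the given encoding."""
    return sorted(keys, key=lambda k: k.encode(encoding))


print("1".encode("ascii").hex(), "A".encode("ascii").hex())  # 31 41
print("1".encode(EBCDIC).hex(), "A".encode(EBCDIC).hex())    # f1 c1

print(compare("1", "A", "ascii"))  # less
print(compare("1", "A", EBCDIC))   # greater

accounts = ["A100", "9ZZZ", "B200", "1000"]  # hypothetical alphanumeric account numbers
print(sort_as(accounts, "ascii"))  # ['1000', '9ZZZ', 'A100', 'B200']
print(sort_as(accounts, EBCDIC))   # ['A100', 'B200', '1000', '9ZZZ']
```

Sorting by the EBCDIC byte values, as in `sort_as`, is one simple way to reproduce mainframe ordering on data that has already been converted to ASCII.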
You will often find combinations of the four most common mainframe data types in mainframe data files and databases:

o EBCDIC Text – text fields containing EBCDIC text, special characters and/or spaces. This can typically be easily converted to ASCII equivalents.

o Zoned Decimal – this is effectively a text field containing a number (unsigned zoned decimal), with the exception that if it is defined as signed, one byte (usually the trailing one) uses a special character to combine a digit and the sign. Mainframe EBCDIC unsigned zoned decimal fields may be easily converted to ASCII because it is just a text conversion of the numeric digits 0-9. Signed zoned decimal fields, however, are a bit trickier to convert. For more information on this data type go here.

o Packed Decimal (known in COBOL as “COMP-3”) – this is a specialized numeric format that is well suited to mathematical operations on the IBM mainframe hardware platform. There is no direct equivalent on Intel or natively in Windows, although the Windows COBOL vendors support this format in the same physical format as the mainframe. For a detailed description of this format go here. Unless using COBOL on Windows, this format must be converted to something else, such as an integer or decimal format. It requires special consideration for conversion to something like SQL Server.

o Binary (known in COBOL as “COMP”) – this is effectively an integer in mainframe format. The main physical difference is that mainframe binary is in reverse byte order from Intel binary (the “Big Endian/Little Endian” format difference). The Windows COBOL vendors typically support both the mainframe and the Intel variations of this format. Again, it requires special consideration for conversion to something like SQL Server. For more information on this format, go here.

Variable Length Data Files

Both Windows and the mainframe support variable length data files.
However, it may be dangerous to use the native Windows mechanism for storing variable length files if the files contain data types other than text. The reason for this has to do with the approach taken to physically store variable length files. Windows’ native format for storing variable length files derives from the storage of simple text files to save disk space. Windows uses a record delimiter, typically a carriage return and line feed character pair at the end of each line, to signify the end of a record. This translates into hex as x"0D0A" (two bytes added to the end of each record). So the system will read a file on disk from its current position until it hits the x"0D0A" bytes, which tell it where the end of the record occurs, return the data up to this point, and position itself just after the x"0D0A" bytes to begin the next record. This works quite well for text files. This format is also known in COBOL terminology as a “LINE SEQUENTIAL” file.

The mainframe takes a different approach, which is to prefix each record with a couple of bytes that indicate the length of the record that follows. This is known in mainframe terms as the “LLZZ” field. So each variable length record in a mainframe flat file will have the LLZZ bytes on the front of the record, containing a binary number indicating how long the following record is. The mainframe will read that number of bytes and stop, which should set its position onto the start of the next record’s LLZZ field.

The problem with the Windows approach for mainframe oriented variable length data files is that if they contain binary data fields, the record delimiter x"0D0A" is a valid byte combination in certain binary numbers and thus could be mistaken for a premature end of record, cutting off the record being read and leaving the remainder of the current record as the start of the next record.
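The two approaches can be compared with a short Python sketch. The record contents are invented for illustration, and the prefix layout is an assumption: a 2-byte big-endian length (LL) that counts the four prefix bytes as well as the data, followed by two zero bytes (ZZ).

```python
import struct


def write_llzz(records):
    """Prefix each record with LLZZ: a 2-byte big-endian length plus 2 zero bytes.

    Assumption: LL counts the 4 prefix bytes as well as the data that follows.
    """
    out = bytearray()
    for record in records:
        out += struct.pack(">HH", len(record) + 4, 0) + record
    return bytes(out)


def read_llzz(blob):
    """Walk the file by length prefix; record content is never inspected."""
    records, pos = [], 0
    while pos < len(blob):
        length, _zz = struct.unpack_from(">HH", blob, pos)
        records.append(blob[pos + 4:pos + length])
        pos += length
    return records


def read_delimited(blob):
    """Split on CR LF the way a line sequential reader would."""
    parts = blob.split(b"\r\n")
    return parts[:-1] if parts and parts[-1] == b"" else parts


# The second record starts with the 4-byte binary integer 3338, which is x"00000D0A".
records = [b"ALPHA", struct.pack(">i", 3338) + b"BETA", b"GAMMA"]

by_length = read_llzz(write_llzz(records))
by_delimiter = read_delimited(b"".join(r + b"\r\n" for r in records))

print(len(by_length))     # 3
print(len(by_delimiter))  # 4
```

The length-prefixed reader returns the three original records, while the delimiter-based reader sees four because the binary field happens to contain the delimiter bytes.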
While Windows does not natively support variable length flat files containing binary data (strictly speaking, the native format works as long as no x"0D0A" byte pair occurs within your data), there are multiple solutions. First of all, remember that the Windows COBOL vendors support this just fine if you are using COBOL. Secondly, if you are not using COBOL, consider just where you want this data to live once it is migrated to Windows. You may want to convert it to a relational database using SQL Server. This provides many benefits, including field level selection and reporting services. Variable length data storage using the mainframe approach (prefixing each record with a record length) is easy to recreate programmatically in .NET using C# or VB.NET. Additionally, the SQL Server FILESTREAM facility can be used, which adds transactional integrity if needed. XML may be another viable approach if there are additional benefits in having the data in XML format. For more information regarding how COBOL deals with variable length records, go here.

Data Bases

The most common database found on IBM mainframes today is IBM’s DB/2, which is a relational database system similar in functionality to Microsoft’s SQL Server. You might sometimes hear reference to “UDB” (Universal Database), the family name for IBM’s relational database offerings on all platforms, of which DB/2 is a member. Interestingly enough, mainframe DB/2 is a completely different product from the rest of the UDB family, having been written in an entirely different language, although many mainframers believe that UDB on another platform (e.g. IBM’s AIX system) is exactly the same as DB/2 on the mainframe.

Microsoft’s enterprise relational database offering is SQL Server, which is regarded by many as being as powerful and scalable as DB/2 and provides very similar functionality. I have moved many applications myself from the mainframe DB/2 environment to SQL Server with no problems.
My experience has typically been that these applications execute significantly faster on the Windows platform. Additionally, facilities like SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) can be brought to bear to greatly enhance functionality and agility at a much lower cost than similar mainframe add-ons to DB/2.

I was involved in a customer driven proof of concept in May of 2008 wherein the customer’s DB/2 on a large mainframe was performing about 470 transactions per second (a transaction in their terms was a unit of work: a COBOL application reading a sequential file and performing approximately 9-13 database accesses trying to do a match on the record). I was able to achieve over 1800 transactions per second on Windows by running multiple instances of the application in parallel against a SQL Server using the customer’s data imported from DB/2 to SQL Server.

There are other databases in use on IBM mainframes, including but not limited to IBM’s IMS/DB and several 3rd party vendor databases like Adabas, Datacom/DB, IDMS and others.

Indexed File System (VSAM)

IBM mainframes offer an indexed file system known as VSAM (Virtual Storage Access Method). VSAM seems to confuse a lot of non-mainframers. It is actually a very straightforward concept. Data is stored in record format (a VSAM file consists of one or more similarly defined data records). One part of the record, however, is defined as a primary key. Optional alternate indexes may be defined on other parts of a record as well. For example, a record containing all of a customer’s information may consist of a number of individual fields, and the primary key might be the customer number, which might exist in columns 1-10 of each record. Records can be accessed directly by requesting the appropriate customer number as the key.
An alternate index might be created on the customer name field which exists somewhere else in the record, allowing customers to be looked up by either customer number (the primary key) or by customer name (an alternate key).

VSAM actually offers multiple types of file organizations. It is primarily used as Key Sequenced (KSDS) with keyed access. It can also be used where a relative record number is the key (RRDS) or in the sequence in which the records were written (Entry Sequenced, ESDS). VSAM ESDS data sets are logically essentially the same as a sequential flat file. With a KSDS, records can be read sequentially like a typical flat file, or they can be accessed by one or more keys.

It is important to note that a keyed VSAM file consists of a data portion which holds the actual records (the primary key must be part of each record) and an index with pointers into the data portion (it’s actually slightly more complicated than this, but this view is a good one). The combination of the index and the data portions is known as a “cluster” in VSAM terminology. One or more alternate indexes may be created as well, with their own set(s) of key(s) and pointers into the data portion.

VSAM is not transactional in nature itself; it does not provide COMMIT/ROLLBACK capabilities natively. CICS adds transactional integrity to VSAM files by journaling file I/O and using the journal to provide rollback capabilities.

It’s important to understand that VSAM knows almost nothing about the data it stores except for key values. A 10,000 byte record containing numerous fields with a primary key means that VSAM knows about the key but has no idea what’s in the rest of the record. It’s just a big glob of data it picks up and delivers to an application requesting the record, or writes to disk when requested to write a record. VSAM is not a database system – it is an indexed file system.
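The idea of an index over otherwise opaque records can be sketched as a toy model in Python. This is not how VSAM is implemented; the class, the column positions for the customer name, and the sample records are all hypothetical, chosen only to mirror the customer number and customer name example above.

```python
class KeyedFile:
    """Toy model of a key sequenced file: opaque records plus key -> position indexes."""

    def __init__(self, key_slice, alt_slice):
        self.key_slice, self.alt_slice = key_slice, alt_slice
        self.data = []       # data portion: raw record bytes, contents never interpreted
        self.primary = {}    # primary index: unique key -> position in the data portion
        self.alternate = {}  # alternate index: key -> list of positions (duplicates allowed)

    def write(self, record):
        key = record[self.key_slice]
        if key in self.primary:
            raise KeyError("duplicate primary key")
        self.data.append(record)
        position = len(self.data) - 1
        self.primary[key] = position
        self.alternate.setdefault(record[self.alt_slice], []).append(position)

    def read(self, key):
        return self.data[self.primary[key]]

    def read_alt(self, alt_key):
        return [self.data[p] for p in self.alternate.get(alt_key, [])]

    def browse(self, start_key):
        """Position at the first key >= start_key, then read sequentially in key order."""
        for key in sorted(k for k in self.primary if k >= start_key):
            yield self.data[self.primary[key]]


def make_record(number, name, rest):
    # Customer number in columns 1-10, name in columns 11-30 (hypothetical layout).
    return number.encode() + name.ljust(20).encode() + rest


customers = KeyedFile(key_slice=slice(0, 10), alt_slice=slice(10, 30))
customers.write(make_record("0000000003", "SMITH", b"...other fields..."))
customers.write(make_record("0000000001", "JONES", b"...other fields..."))
customers.write(make_record("0000000002", "SMITH", b"...other fields..."))

print(customers.read(b"0000000001"))
print(len(customers.read_alt(b"SMITH".ljust(20))))
print([r[:10] for r in customers.browse(b"0000000002")])
```

Everything outside the two key slices is just bytes to the file, which is the sense in which it is an indexed file system rather than a database.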
IBM’s DB/2 relational database itself actually uses VSAM extensively as its physical storage mechanism.

VSAM files can be accessed by using keys, read in sequential order, or a combination of both (commands are available to position somewhere within a VSAM file and then start reading sequentially). Mainframe batch applications (e.g. COBOL programs) access VSAM with straightforward READ/WRITE/DELETE/UPDATE statements, along with START statements that position in various places within a VSAM file given parameters included with the START command.

CICS applications must request records from a VSAM file using CICS service requests. The reason for this is that CICS adds transactional capabilities to VSAM files by forcing all requests for I/O through CICS services so that it may journal (keep track of) all operations in order to allow CICS to COMMIT or ROLLBACK transactions that involve VSAM I/O.

While Microsoft does not offer an indexed file system on Windows, there are multiple third party packages like Pervasive’s Btrieve, C-ISAM, etc. The COBOL vendors offer their own VSAM equivalents on Windows as well. In many cases, mainframe VSAM data is converted to SQL Server, which provides a wealth of new capability such as reporting and easier field level data access. VSAM files are typically converted to and from flat QSAM files for movement to and from Windows using a VSAM utility known as IDCAMS REPRO.

For more in depth information there is an excellent website at www.simotime.com. Also remember that VSAM files often contain multiple data types and may have fixed or variable length records.

Partitioned Data Sets (PDS)

Partitioned data sets are actually collections of flat files which are generally text files. They are generally used to store source files such as COBOL and Assembler programs (often under the control of a source code control system), as well as control cards.
Control cards are text files used by batch JCL streams (similar to .bat or .cmd files in Windows in some regards) as input to one or more steps. For example, a SORT utility being called to sort data may accept instructions (e.g. which fields to sort by and in what order) via control cards. A close Windows equivalent of a PDS is the use of folders or directories to organize files.

Flat Files (QSAM)

Many mainframe shops rely heavily on batch processing. While batch COBOL applications can access VSAM files (when not locked out by the on-line CICS system accessing them), flat files (typically known as “QSAM” – Queued Sequential Access Method) are often a major component in many IBM mainframe shops. Flat files are record oriented files with no keys (i.e. not unlike a text file in Windows). They may have LLZZ fields on the front of each record signifying the length of that record, or no LLZZ or other record delimiter at all for fixed length record files.

In the case of fixed length records, you can think of one contiguous block of data on disk with the fixed length records following one after another. So, for example, if you have records that are 100 bytes in fixed length, there will simply be x number of 100 byte records one after another straight out to disk. If you had a total of 1000 records of fixed 50 byte length, the file would take exactly 50,000 bytes of storage on disk – no more, no less (unless some compression tool was in use). It is absolutely possible to store the same physical format on a Windows system. However, if you try to edit it in something like the Windows Notepad editor, you will see one line of data going to the right for the entire length of the file because there are no record delimiters in place.
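Reading a delimiter-free, fixed length file on Windows is mostly a matter of slicing by record length and then applying a record layout field by field. The Python sketch below assumes a made-up 22-byte layout (EBCDIC name, signed zoned decimal with a trailing sign, COMP-3 amount with two decimal places, big-endian binary count) and code page cp037; none of these come from a real file.

```python
import struct
from decimal import Decimal

EBCDIC = "cp037"    # assumed code page
RECORD_LENGTH = 22  # hypothetical layout: 10 + 5 + 3 + 4 bytes


def split_fixed(blob, record_length):
    """Cut a delimiter-free file into records of identical length."""
    if len(blob) % record_length:
        raise ValueError("file size is not a multiple of the record length")
    return [blob[i:i + record_length] for i in range(0, len(blob), record_length)]


def unpack_zoned(field):
    """Zoned decimal: one digit per byte; the zone of the last byte carries the sign."""
    value = int("".join(str(byte & 0x0F) for byte in field))
    return -value if field[-1] >> 4 == 0x0D else value


def unpack_comp3(field, scale=0):
    """Packed decimal: two digits per byte, with the sign in the final half byte."""
    nibbles = []
    for byte in field:
        nibbles += [byte >> 4, byte & 0x0F]
    sign = nibbles.pop()
    value = int("".join(str(n) for n in nibbles))
    if sign == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


def decode(record):
    """Apply the layout field by field instead of translating every byte as text."""
    return {
        "name": record[0:10].decode(EBCDIC).rstrip(),
        "balance": unpack_zoned(record[10:15]),
        "amount": unpack_comp3(record[15:18], scale=2),
        "count": struct.unpack(">i", record[18:22])[0],  # big-endian mainframe binary
    }


# Two sample records built by hand for the hypothetical layout.
blob = (
    "SMITH".ljust(10).encode(EBCDIC) + bytes.fromhex("f0f0f1f2d3")
    + bytes.fromhex("12345d") + struct.pack(">i", 3338)
    + "JONES".ljust(10).encode(EBCDIC) + bytes.fromhex("f0f0f0f4c2")
    + bytes.fromhex("00100c") + struct.pack(">i", 7)
)
rows = [decode(record) for record in split_fixed(blob, RECORD_LENGTH)]
for row in rows:
    print(row)
```

Only the name field is run through a character conversion here; the numeric fields are decoded from their raw bytes, which is why the file has to be transferred in binary mode for a sketch like this to work.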
Two major differences may exist, however, when comparing with Windows text files:

Multiple data types – unlike Windows text files where every byte contains an ASCII character, this is often not the case for a mainframe QSAM file (the same holds true for VSAM files). A common error made by many is to simply FTP (or transfer through some other means) QSAM files containing multiple data types (e.g. text and binary) from the mainframe to Windows and allow FTP to convert to ASCII. This will convert the EBCDIC text fields appropriately, but it will absolutely trash the binary fields, as the FTP facility will simply look at each byte and try to convert it to an ASCII text character (as opposed to recognizing an entire field as binary numeric and converting it to Intel format). One of the simple rules I use is that if someone claims they’ve transferred a real data file from IBM mainframe to Windows, I open it in Notepad and look for two things:

o If each record is broken into a separate line and aligned nicely, I know that a problem could occur if there is a binary field containing the value x"0D0A" (the carriage return and line feed characters), because they have allowed the file transfer process to add record delimiters onto each record.

o If I see characters other than alphanumerics or common special characters (e.g. graphics characters or non-displayable characters, maybe represented by a dot), I know that binary and/or packed decimal fields are present and most likely were corrupted by the file transfer process trying to convert each byte in the file to an ASCII display equivalent.

There is no way that a generalized utility can intelligently convert a mainframe data file containing mixed data types properly unless it has the intelligence to read a record layout definition of some sort (e.g. a COBOL definition of the data) and apply it on a field by field basis.
Microsoft’s Host Integration Server (part of BizTalk now) can read a record layout and convert on a field by field basis, as can the file conversion utilities from the Windows COBOL vendors. If you do not have such a tool, however, one of the best approaches is to convert the data file to all text (binary and packed decimal fields to zoned decimal or simply text display) on the mainframe first, and then transfer it down, letting FTP or the equivalent transfer utility convert to ASCII text and place record delimiters on the end of each record (which makes the data easily viewable in a simple editor on Windows like Notepad). You may still have a challenge with signed numeric fields, as typically the last byte of the field will have the last digit and sign combined into a single special character. These are well defined, however, so it is possible to write a program to read, convert and load them into SQL Server (for example).

Variable length files – variable length files can present a special challenge in bringing data into the Windows environment if you are not using COBOL on Windows. When you FTP a variable length file to Windows, the system drops the LLZZ record length information, as it is not part of the logical record, and you effectively get a big glob of data without any record length information, which makes it useless. If you let FTP add carriage return and line feed characters as record delimiters, you can avert this problem, although you may still have the issue of carriage return and line feed characters (x"0D0A") appearing as part of valid integer fields and being inadvertently recognized as a premature end of record. Also, if you allow FTP to convert to ASCII you will corrupt packed decimal and binary fields as well. You need to plan carefully how to migrate variable length files to Windows. This can get complex if you are trying to get the data into SQL Server on a field by field basis (how do you create a variable length table in SQL Server?).
Solutions often include converting the variable length file to fixed length on the mainframe first (e.g. every record is the maximum record length, with space filling to the right), or writing a program on the mainframe to extract the LLZZ field from the system and place it physically onto the front of each record as part of the record, and then having a program on Windows that knows how to look for this.

Generation Data Groups (GDG’s)

Generation Data Groups (known in mainframe lingo as “GDG’s”) are actually QSAM files that are being versioned automatically by IBM mainframe system facilities. The versioning consists of maintaining prior versions of the file for some given number of occurrences. When the specified limit is reached, the versioning wraps around, and the oldest iteration of the file is replaced. A GDG will have a base name such as “PROD.PAYROLL.JULY2009”, and each version of the file will have a generation number and version number appended onto the end (known in mainframe lingo as a “Goo-Voo”). So for example, the first version of this file might be named “PROD.PAYROLL.JULY2009.G0001V00”. The next version would be named “PROD.PAYROLL.JULY2009.G0002V00” and so on until the specified limit is reached. I believe the maximum number of generations (the limit) you can specify is 255.

There is no equivalent in the native Windows file system for this, although the COBOL vendors support this capability. It is not difficult at all, however, to implement this sort of data versioning using a relational database, which may also provide additional advantages from an applications and business perspective.

When faced with migrating GDG’s from the mainframe, remember that at the end of the day each GDG version is simply a QSAM file with a similar name. You can migrate them as groups of related QSAM files. Another plus is that mainframe data set names are valid Windows file names as well. For more detailed information, go here.
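The naming and rollover behavior is simple enough to model in a few lines. The Python sketch below is an illustration of the concept only, not a replacement for the mainframe facility: the class is hypothetical, it uses the GnnnnVnn suffix form (for example G0001V00), and it interprets "wrapping around" as dropping the oldest generation once the limit is exceeded.

```python
from collections import deque


class GenerationGroup:
    """Toy model of a generation data group with a fixed retention limit."""

    def __init__(self, base, limit):
        self.base = base
        self.limit = limit
        self.generations = deque()  # oldest first, newest last
        self.next_number = 1

    def new_generation(self):
        """Create the next generation name; return it with any name rolled off."""
        name = f"{self.base}.G{self.next_number:04d}V00"
        self.next_number += 1
        self.generations.append(name)
        rolled_off = None
        if len(self.generations) > self.limit:
            rolled_off = self.generations.popleft()
        return name, rolled_off


group = GenerationGroup("PROD.PAYROLL.JULY2009", limit=3)
history = [group.new_generation() for _ in range(4)]
for created, rolled_off in history:
    print(created, "rolled off:", rolled_off)
print(list(group.generations))
```

On Windows the same bookkeeping could sit in a database table or a small manifest file next to the migrated QSAM files, depending on how the applications locate the current generation.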
Data Tools

There are a variety of data tools available for the mainframe that some shops may use in a production-like way to create data extracts and the like. DB/2 has a variety of utilities that perform database loading, unloading, extraction, etc. SQL Server has similar but different tools to accomplish these types of tasks as well. Another type of tool is one that works on non-database files. Compuware’s FILE-AID is an excellent example of a commonly found tool in this category. The Windows COBOL vendors supply tools and facilities that allow for this sort of functionality, but there is no directly FILE-AID compatible tool on Windows. I have a personal criticism of Visual Studio in that it offers a very poor set of data tools for working with non-database files. Basically, you can edit data files as text files or look at them in a dump sort of format, and that’s about it. The COBOL vendors have very elegant tools that can read COBOL record layouts and display and translate mixed data types. Overall, however, the toolset available on Windows to work with data is vastly superior to the toolset on the mainframe, given the advent of GUI interfaces and the fact that data is typically more easily accessible on Windows.

Report Writers and Managers

Many mainframe shops have little more than a static reporting approach, which means that they write programs to produce static text based reports. IBM offers AFP (Advanced Function Printing) graphics for creating forms with graphical elements, but this is not in use in many shops. AFP requires specialized IBM printers, by the way. The COBOL language itself used to have a facility called Report Writer with special syntax for creating reports, but this feature was pulled from the COBOL language many years ago and is not supported in newer compilers (although some of the Windows COBOL vendors still support it for compatibility purposes).
Writing reports in COBOL is thus tedious and time consuming, although many shops still do this today. A number of reporting oriented “languages” became available on the mainframe as a result, and the most popular is CA’s Easytrieve (often noted as “EZTRIEVE”). This is a kind of shortcut high level language for creating what are once again static reports. There is no EZTRIEVE facility on Windows, so these programs must be rewritten (often into COBOL during a COBOL migration scenario), or a Windows based reporting facility like Crystal Reports or SQL Server Reporting Services can be used (which provides new capabilities as well) if the data is migrated into a format that these facilities can read. Some of the S.I.’s also have tools on Windows and/or the mainframe to convert EZTRIEVE to other languages such as COBOL. Mainframe shops that do mostly static reporting carry a burden, because IT is responsible for report creation rather than the business users who need the reports. Moving to a Windows based platform with dynamic, easy to use reporting tools may greatly benefit the business overall by shifting at least some of this burden off of IT.

Source Code Management Systems

The heart and soul of many mainframe COBOL development shops is the source code management system, where all programs and related resources are typically stored and versioned. A few mainframe shops may not use one at all and instead simply store programs in a series of PDS files. The primary mission of the source code management system is to store all versions of programs and related artifacts in a library system that performs versioning and provides auditing capabilities. The most popular source code management systems found on the IBM mainframe include CA’s Panvalet, CA’s Librarian, CA’s Endevor, and IBM’s Software Configuration and Library Manager (SCLM).
IBM does offer other library management products for other platforms, such as Rational ClearQuest, but this is not a mainframe source code management system. Microsoft offers both Visual SourceSafe and Team Foundation Server (TFS). Of the two products, Visual SourceSafe is closest in base functionality to a mainframe source code control system, while TFS is a significant superset, offering additional functionality such as reporting and project management capabilities for team collaboration. While it is possible to store COBOL source code in Visual SourceSafe and Team Foundation Server, COBOL is treated the same way as a text file; there is no special syntactical support as there is for VB and C#. My understanding is that one of the Windows COBOL vendors is currently building higher level support for COBOL into TFS, however.

Programming Languages

There are a number of third and fourth generation programming languages in use on IBM mainframes today. Some are supported in the Windows environment and some are not.

COBOL

COBOL still ranks today as the most common programming language found on IBM mainframes. There are multiple COBOL compilers available for Windows, both .NET and non-.NET, that are highly compatible with mainframe COBOL. Some of these vendors also provide mainframe data file formats (e.g. VSAM, QSAM, GDG) on Windows along with emulators for CICS, IMS TM/DB, JCL, and DB/2 (in some cases on top of SQL Server). COBOL applications typically exist as a set of programs and a set of copy files (sometimes called “copy books”), which are simply include files: partial bits of COBOL code used across more than one program that can be copied in at compile time. It is common, for example, to maintain data file record layouts and DB/2 table layouts in external COBOL copy files that are shared by many programs.
Migrating COBOL applications and data to Windows can still be a labor intensive task, although the benefits (increased performance, a significantly improved development environment, and cost reduction) typically outweigh the initial investment. Issues that may arise include the ASCII vs. EBCDIC collating sequence (if not using the Micro Focus EBCDIC emulation system), data migration, and unsupported third party software in use on the mainframe that has no Windows equivalent. Because there are mainframe compatible COBOL compilers for Windows, including .NET, along with a multitude of tools and emulation facilities, moving COBOL applications to Windows is very feasible.

There are typically four separate scenarios which may be suitable for dealing with legacy mainframe COBOL applications by leveraging Windows:

1. Move the COBOL applications to a COBOL vendor supported mainframe emulation environment on Windows. Increased performance, scalability and lower cost are typically the drivers in this scenario. The first question I ask customers interested in this is “Are your COBOL applications today doing a good job supporting your business needs?”. The second question is “Do you foresee having the resources to maintain your COBOL applications into the future?”. If the answer to both is “Yes”, then this might be a feasible scenario.

2. Move parts of the COBOL applications to .NET, namely the COBOL programs that work well and/or are overly complex and would be difficult to rewrite. Write new and/or replacement code in a .NET language and create a hybrid of COBOL and .NET languages working together. COBOL programs may then be replaced in a piecemeal fashion over time, creating less risk.

3. Look for a COTS package (commercial off the shelf software) on Windows to replace part or all of the COBOL systems. Rewrite or develop the missing pieces to tie it together on Windows.

4. Rewrite the COBOL applications, taking advantage of newer technologies on the Windows platform.
This option, while the riskiest and sometimes the most expensive, can deliver the greatest business benefit for the following reasons:
o A chance to weed out the unneeded code that many shops have in production because it’s always been there and they have never had the opportunity to go in and clean up their existing COBOL systems.
o A chance to modernize look and feel completely. For example, not only updating screens but also replacing static text based reports with dynamic graphical reports, moving reporting to the business users, etc.
o A chance to create a new data model that better fits today’s enterprise needs. An opportunity to take VSAM and QSAM data and migrate it to a relational database environment.
o A chance to take advantage of lower cost programming resources on the market and retool into technologies that are more easily supportable.
o A piecemeal approach can be taken wherein some COBOL remains on the backend mainframe for some period of time as new front ends are written on Windows, and business processes are moved from the mainframe to Windows over time in a less risky approach.

IBM Mainframe Assembler

IBM Assembler code may be in use in some shops. Typically, it is relegated to subroutines called by COBOL applications to perform tasks that were not easy in COBOL many years ago (e.g. date conversion, dynamic file I/O, etc.). Many assembler routines can today be easily replaced by modern COBOL routines, as the COBOL language has been enhanced greatly over the past 30 years, but many shops did not take advantage of this. Micro Focus is the only vendor I am aware of that offers a mainframe Assembler language facility on Windows, but this is generally used more for development compatibility (for working on COBOL applications on Windows that are moved back to the mainframe for production execution) than for production on Windows. Typically, Assembler routines will have to be replaced on Windows.

PL/1

PL/1 is another language that some shops use on the IBM mainframe.
While there are one or more PL/1 compilers that will run on native Windows, there are no PL/1 compilers for .NET, and I’m not aware of any large scale PL/1 migration to Windows. I believe the best option for PL/1 applications is to rewrite them in a .NET language on Windows.

4th Generation Languages

There may be some Java on the mainframe, and there are also a number of 4th generation languages (4GL’s) such as NATURAL, FOCUS, etc. that will simply need to be rewritten in a .NET language. I’ve even run into Smalltalk on an IBM mainframe.

Development Management Technologies

The main development systems on IBM mainframes are TSO, ISPF and ICCF. ISPF runs under TSO on IBM’s MVS and z/OS operating systems, while ICCF runs on IBM’s smaller z/VSE operating system.

TSO and ISPF

In the MVS and z/OS mainframe environments, the vast majority of programmers use TSO (Time Sharing Option) and specifically ISPF (Interactive System Productivity Facility) running under TSO. The development environment on the mainframe is by far the worst of any major platform today. It typically consists of character mode, menu driven screens which provide access to a character mode editor, a set of utilities, the ability to submit programs to be compiled and linked, and very little on-line debugging. Testing on an IBM mainframe is thus often done in black box mode, which means that programmers write programs, compile and link them, execute them, and afterwards manually examine the inputs and outputs without actually debugging and watching the source code execution paths. While debuggers such as Compuware’s Xpediter and IBM’s Debug Tool allow programmers to step through source code execution on the mainframe, these products are very expensive to use resource wise, complex to set up, and lock certain resources (e.g. data files), which makes them unrealistic to use on a mass scale in a mainframe COBOL shop.
Tools like Compuware’s Abend-Aid are thus common on the mainframe. After an exception is encountered, they help programmers find the portion of the COBOL program where the exception occurred and provide some clues about the cause.

ICCF

IBM’s ICCF (Interactive Computing and Control Facility) is a very rudimentary development environment, basically providing an editor and some tools to compile and link programs in IBM’s z/VSE environment.

Rational Developer for System Z

IBM offers a more modern development extension system for the mainframe that actually runs on Windows, known as Rational Developer for System Z or “RDz”. This is a graphical IDE that provides an editor, syntax checking, remote compile submission and remote debugging tied back to the IBM mainframe for COBOL development. While a significant enhancement over ISPF, it is in very limited use in the marketplace.

Microsoft .NET and Visual Studio

Microsoft .NET has a much richer exception handling model than the IBM mainframe, one which allows for the programmatic catching and handling of exceptions. On an IBM mainframe, an exception typically knocks the application down and the program has little or no chance to intervene. Visual Studio also provides world class graphical debugging capabilities that are light years ahead of any such product on the mainframe. One of the greatest benefits of doing COBOL development on Windows, whether targeting the results back at the mainframe or at production on Windows, is the wealth of sophisticated COBOL debugging tools available on Windows. Microsoft’s Visual Studio itself may be used for COBOL development when used in conjunction with a .NET COBOL compiler. Graphical editors with language intelligence features (e.g. Microsoft’s IntelliSense) are a major productivity boon for COBOL development on Windows, or for simply learning and programming in other languages on .NET.
When moving away from COBOL, Windows has a massive advantage in offering the .NET Framework and languages like VB.NET and C# along with a wealth of development and deployment tools. Additionally, the Internet is packed full of programming examples and advice on developing .NET applications, which empowers developers to find solutions quickly.

Summary

At the end of the day, almost all computing platforms read and write data, perform computations on it, create reports and provide on-line facilities to interact with users. I’ve never run into any type of application on an IBM mainframe that cannot be duplicated on Windows, where it will typically run faster and at lower cost. It is as simple as that. Moving workloads from IBM mainframes to Windows is not always a piece of cake, however, so understanding a bit better how things work on the mainframe should make this process a little more predictable. The intent of this paper has been to educate Windows professionals a bit on the IBM mainframe platform and on some of the terminology and processes that may be encountered when devising Windows solutions for mainframe workloads, so that they can discuss these intelligently with mainframe staff. Please feel free to send comments, suggestions and questions to me at hhinman@microsoft.com.

Resources

A number of white papers related to this topic can be found here. The main Microsoft Web portal for Mainframe Migration is found here. Microsoft sponsors a Web portal of partner companies that provide products and services in this marketplace here.