1999 Systems Engineering Capstone Conference • University of Virginia

INTEGRATION OF A FINANCIAL DATA SERVICE AND THE INTRODUCTION TO DISTRIBUTED COMPUTING SYSTEMS

Student team: Andrew Brix, Andrea Brotto, John DeGuenther, Ryan Lentell
Faculty Advisor: James W. Lark, III, Department of Systems Engineering
Client Advisor: Kevin Cromer, Shadwell Capital LC, Charlottesville, VA
E-mail: kcromer@shadwell.com

KEYWORDS: Database Management System, Distributed Computing Network (DCN), CORBA, Microsoft SQL Server 7.0

ABSTRACT

Shadwell Capital LC is a hedge fund located in Charlottesville, VA. This project helped the firm optimize its computer resources to meet its needs. The work comprised two major tasks: integrating a new financial data service and developing a distributed computing network. The integration of the financial data provider entailed linking the Bridge Stock Data service to a database to store stock prices. A comparison of Microsoft SQL Server 7.0 and Oracle 8.0 determined that SQL Server 7.0 better fit Shadwell Capital’s needs. The second aspect of the project involved the creation of a distributed computing network for Shadwell. Distributed computing involves running parallel processes across a network of computers and is implemented based on a client/server model. The implemented system was built as a framework for the future development of a fully functional system. It gives Shadwell a good understanding of the possibilities available with distributed computing and will help the firm design a more functional system in the future.

INTRODUCTION

Stock prices change at every instant of the day. Companies trading stocks need to be able to monitor what is happening as precisely as possible. To keep track of stock prices and to forecast future changes in stock value, companies need to store information on stock prices over specific time intervals.
These time intervals need to be as small as possible so that the stored data will show all trends over time. A company can use the data to perform analyses that support tasks such as calibrating existing forecasting models. Modern, more sophisticated methods are needed to accomplish this job effectively. The integration of a financial data service is one way in which companies are accomplishing this task. This capstone project addresses performing this task for Shadwell Capital LC.

To analyze the data collected through the financial data service, Shadwell will use neural network models in the future. These models are complex and burdensome to a single processor. To alleviate this burden, Shadwell wishes to distribute the workload across its Windows NT network. Distributed applications are developed using object-oriented technology, whereby an application is broken into self-contained modules that are located on different processors. At run-time, the modules can interact with each other to produce the desired output. By distributing an application, the modules can run simultaneously, as opposed to sequentially, which will reduce processing time. This capstone project addresses installing a distributed computing network at Shadwell Capital LC and provides information on distributed application development.

INTEGRATION OF A FINANCIAL DATA SERVICE

Overview of System. Data enters Shadwell through the Bridge Stock Data System. Our task was to store this data on the Shadwell server for use by Shadwell employees. The data is downloaded from the Bridge System into Microsoft Excel through a dynamic data link. The data is then saved on the Shadwell server in Excel format, from which it can be uploaded to Microsoft SQL Server using the Data Transformation Services that SQL Server provides.
Finally, Shadwell employees can access the information through an ODBC¹ link to their models. See Figure 1 below for a visual representation of the system.

[Data Enters via Bridge → Imported to Excel → Saved on Shadwell Server → Imported to SQL Server → Imported to Models via ODBC]
Figure 1. Diagram of Data Flow through Shadwell Computing Network

Comparison of Databases. Our analysis showed that Microsoft SQL Server 7.0 is the optimal database for Shadwell Capital LC’s needs. The main basis for this recommendation was testing of the life expectancy of the database on the two architectures. As can be seen in Figure 2, Microsoft SQL Server 7.0 provides nearly twice the life expectancy of Oracle 8.0. Additionally, it provides superior tools for database creation and management. Price is not a differentiating factor, as both packages cost nearly $1,400. One major drawback of Microsoft SQL Server 7.0 is its incompatibility with any operating system other than a Microsoft product. This should not be a problem in the Shadwell Capital LC environment because the firm is a completely Microsoft-operated workplace and, along with SNL Securities, has no intention to change. For these reasons, we believe Microsoft SQL Server 7.0 is the optimal choice for Shadwell Capital LC.

Figure 2. Life Expectancy of Shadwell Database
Years of Use of Shadwell Server Database while Tracking 2000 Companies

Time Interval Between         Oracle 8.0    Microsoft SQL
Storing Information (min.)                  Server 7.0
78                            31.87         58.76
39                            17.62         33.16
26                            12.18         23.10
20                             9.31         17.72
16                             7.53         14.38
13                             6.32         12.09
11                             5.45         10.43
10                             4.79          9.18
9                              4.27          8.19
8                              3.85          7.39
7                              3.22          6.19
6                              2.77          5.33
5                              2.29          4.40
4                              1.95          3.75
3                              1.30          2.51
2                              0.98          1.89
1                              0.65          1.26

Database Design.
Shadwell’s staff required the database to perform the following functions²:
• Import data nightly from Bridge
• Migrate data from the old database into the new database
• Back up data
• Use stored data to run existing models
• Query the database for specific periods or companies
• Add a company to, or delete a company from, the list
• Edit company attributes
• Adjust for stock splits

The class diagram, which describes the classes and their relationships within the database, lists the classes needed for the database as: Company, Stock Price At Time t, Stock Daily Statistics, and Stock Quarterly Statistics. Stock prices are constantly changing. For this reason, every company has many Stock Prices, one for each time at which the price is measured. In addition to intraday stock prices, daily summary statistics will be stored in the Stock Daily Statistics table, and quarterly data will be stored in the Stock Quarterly Statistics table.

Database Implementation. The SQL Server 7.0 tables were developed manually. Data types were chosen to minimize storage space. The data types for each attribute are shown in Figure 3.

¹ ODBC Driver – (Open Database Connectivity Driver) a cross-platform Application Programming Interface (API) that can be used to access any DBMS or DBMS Server that has an ODBC Driver. This enables a software developer to build and distribute a client/server application without targeting a specific DBMS.
² The last four use-cases are operations enabled from an appropriate and functional graphical user interface. This is not in place in the present system.

Figure 3. Attributes and Data Types

ATTRIBUTE             MS SQL Server 7.0 Data Type
Quarter Number        Small int
Volume                Real
Key Funding           Char(7)
DayAndTime            Date-time
Price                 Real
Company Name          Char(50)
Ticker Symbol         Char(6)
Exchange              Char(1)
Shares Outstanding    Real

Further, we needed to take into consideration the necessity of backing up the data at a regular frequency and storing the backup files in a different place from the one where the system is located. This task will be accomplished through the use of SQL Server 7.0’s Database Maintenance Plan Wizard. The wizard is very similar to the one used to import files into and export files from SQL Server 7.0. Every night at midnight the plan backs up the data as appropriate. Finally, every time a plan is executed, e-mail alerts are sent to Shadwell employees to keep them updated about data entry and any errors that may have occurred.

Database Integration. Shadwell decided in October 1998 to acquire Bridge as its financial data provider. A Bridge tool is added to Excel, allowing the user to operate some Bridge functions from the Excel Tools menu³. A Dynamic Data Link (DDL) between Telerate and Excel allowed us to import data into Excel. We then needed to integrate the transfer of data from Telerate into SQL Server 7.0 through Excel. To perform this integration, we solved two different tasks:
• We created a task to transfer the data from Telerate into the Excel table.
• We developed, using the Data Transformation Services in SQL Server, a plan that grabs the desired data from the Excel file at appropriate times and stores it in the appropriate fields of the database.

These two tasks and the export of data from SQL Server into other applications are described in the following three sections.

³ Note: Telerate and Excel are said to be linked through a Dynamic Data Link.

DISTRIBUTED COMPUTING SYSTEMS

Since the development of the computer earlier in the twentieth century, computed processes have become more complex.
Initially, the computer was an elaborate tool that was used for relatively simple calculations and operations, yet modern computers perform tasks that are far more complicated than those of the past. Even with the increase in computing power, these tasks are often so intensive that they require long periods of time to execute. There are ways to improve the performance of these systems: some are costly, requiring users to purchase expensive new equipment, while others can be implemented without any additional equipment. Distributed computing is a cost-effective means to improve the efficiency of time-consuming computed processes, because it has evolved as a way to leverage the power of existing computer networks.

The distributed computing architecture is similar to client/server computing. Each component of the working system, called an object, can be called upon by another program to perform certain tasks (Lewis, 1997). The application called upon is the server, while the process performing the request is called the client. Upon completion, server objects send results back to the client program. The client program then compiles the results and proceeds with its process. This method is more efficient than previous computing methods because it allows multiple components of the system to run simultaneously across the network, while the results of each can be compiled and viewed on a single computer.

One advantage of distributed computing is that each object can reside on a different computer, which allows the computing process to spread the workload over a network of computers (Lewis, 1997). Distributed computing allows several networked computers to work together as if they were one machine.

There are several technologies currently supporting the distributed architecture that our project team intends to design, yet the specifications for each are rather different.
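The client/server pattern described above can be sketched in a few lines of C++. This is a minimal illustration and not the system built for Shadwell: threads on a single machine stand in for server objects on networked computers, and the names `sum_slice` and `distributed_sum` are hypothetical. The client splits a data set, issues one request per worker through `std::async`, and then compiles the partial results into one answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical "server object": sums one slice of the data.
// In a real distributed system this call would execute on another machine.
double sum_slice(const std::vector<double>& data, std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= data.size());
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
    const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
    return std::accumulate(first, last, 0.0);
}

// Hypothetical "client": splits the work, sends one request per slice,
// then compiles the partial results into a single total.
double distributed_sum(const std::vector<double>& data, std::size_t workers) {
    if (workers == 0) workers = 1;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    std::vector<std::future<double>> replies;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        // std::launch::async asks for each request to run on its own thread.
        replies.push_back(std::async(std::launch::async, sum_slice,
                                     std::cref(data), begin, end));
    }

    double total = 0.0;
    for (auto& reply : replies) {
        total += reply.get();  // wait for each server's result
    }
    return total;
}
```

Calling `distributed_sum(prices, 4)` on a vector of prices would issue up to four concurrent requests. Only the transport differs in a networked system; the split, request, and compile steps keep the same shape.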
Microsoft and the Object Management Group, a consortium of over 800 companies, have developed competing architecture standards that allow for the remote request of computed processes; software components built to these standards must meet strict guidelines regarding their design and interfacing. Each of these components, called objects, implements a set of functions and encapsulates data, yet the requirements of each standard architecture are quite different (Foody, 1996, 43-45). Because of this, a large portion of the project was devoted to the comparison of Microsoft’s Distributed Component Object Model (DCOM) and the Object Management Group’s Common Object Request Broker Architecture (CORBA). These two architectures will be discussed later in more detail.

Shadwell wanted to find the most cost-effective solution to its problem. For this reason we proposed the development of a distributed computing network, which allows us to solve Shadwell’s problems using existing hardware.

CORBA vs. DCOM. Based on our discussions and reading, our capstone team chose to work with CORBA. Beyond the major differences between DCOM and CORBA, DCOM has a variety of weaknesses that were relevant to this project:
• Complexity - DCOM and its interfaces are complex, perhaps unnecessarily so (Rock-Evans, 1998, 271). Learning all the aspects of developing the code for a DCOM application is much more involved than learning the same aspects in CORBA.
• Legacy system integration difficult - Microsoft is having a lot of difficulty in integrating “legacy” technology. Integration in DCOM has been achieved via a very complex set of drivers, translators, proxy objects, and gateways (Rock-Evans, 1998, 272).
• New and possibly unstable - Microsoft’s middleware products have taken some time to come to final realization and are only now beginning to take shape (Rock-Evans, 1998, 273).
• Other platform support is weak - Microsoft’s entire middleware strategy and services are entirely dependent on the user having Windows NT as their strategic platform (Rock-Evans, 1998, 274).

Our team had a variety of reasons for choosing CORBA over DCOM, related both to the weaknesses of DCOM and to the differences between CORBA and DCOM. These reasons are as follows:
• Minimize complexity - Through its use of interface inheritance, CORBA was most similar to what we had learned in a course on object-oriented programming with C++. Also, CORBA’s use of inheritance from CORBA::Object reduces the number of complex details about the inner workings of the Distributed Computing Network (DCN) that we would have to learn. Minimizing complexity was important because we were learning a new technology with little guidance and had to work within a limited time frame.
• Interference with the legacy system at SNL - The application would be installed on the network at SNL/Shadwell, where there is currently a legacy system. We did not want any of our work to interfere with this legacy system. CORBA does not interfere with such systems, whereas DCOM may pose a problem.
• Possible system change - If SNL/Shadwell ever chose to change to a different operating system such as Unix, integration with a DCOM application would be difficult. CORBA is blind to the operating system, so a system change would have minimal effect.

What was developed for Shadwell. Once CORBA had been chosen, we were able to set up a CORBA ORB on the Shadwell network.

What is an ORB? An Object Request Broker, or ORB, is a software component that acts as middleware for other software components. ORBs allow objects to “talk” to each other as if they were part of the same program; in essence, ORBs permit software components to work together as a larger program (Katiyar, 1995).
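The broker role can be illustrated with a toy, in-process registry written in C++. This sketch is not the Orbix or CORBA API and performs no networking; `ToyBroker`, `registerObject`, `invoke`, and `brokerDemo` are hypothetical names chosen for illustration. It shows only the core idea: a client names the object it wants, and the broker locates that object and forwards the request on the client's behalf.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy broker: maps an object name to a callable that services requests.
// A real ORB would also locate objects on other machines and marshal data.
class ToyBroker {
public:
    using Handler = std::function<std::string(const std::string&)>;

    // A server object announces itself to the broker under a name.
    void registerObject(const std::string& name, Handler handler) {
        assert(handler && "handler must be callable");
        objects_[name] = std::move(handler);
    }

    // A client asks the broker to forward a request to a named object.
    std::string invoke(const std::string& name, const std::string& request) const {
        const auto it = objects_.find(name);
        if (it == objects_.end()) {
            throw std::out_of_range("no object registered as " + name);
        }
        return it->second(request);
    }

private:
    std::map<std::string, Handler> objects_;
};

// Hypothetical usage: the client never touches the server object directly.
std::string brokerDemo() {
    ToyBroker broker;
    broker.registerObject("QuoteService", [](const std::string& ticker) {
        return "quote requested for " + ticker;
    });
    return broker.invoke("QuoteService", "MSFT");
}
```

Because the client depends only on the broker and a name, the serving object could be replaced or relocated without changing client code, which is the property the ORB provides across machines.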
Software components written in any language can communicate with components written in different languages via ORBs; they can even communicate with components running on other platforms. For example, one component may be implemented in a Windows 95 environment using the C++ language and another component may be running on a Unix system and developed using a programming language called COBOL, yet by using an ORB these two programs may be able to communicate with each other.

Another advantage of ORBs is that they allow software to be distributed across a network. In the past, software developers designed programs that could only run on a single machine, yet these programs often required tremendous computing capabilities. Running such a program would require a large mainframe or supercomputer, which may be quite expensive. Developers recognized that, by using an ORB, they could exploit the vast computing power of existing networks, such as a company’s Local Area Network (LAN). ORBs have the ability to bridge the gap between networked computers, thus allowing separate pieces of a single software application to communicate as if they were physically located on the same machine (Orfali, 1998).

The ORB technology provides the ability to create the distributed computing system for Shadwell Capital. Using an ORB, the system can link Shadwell’s existing software components, allowing separate components to communicate via SNL’s LAN and act as one program. The specific ORB we chose was the Iona Technologies Orbix ORB, which is supplied on the supplementary CD for the book CORBA for Dummies. For the ORB to be used across the Windows NT network at Shadwell, it had to be installed on every computer that would be used for distributing applications.
For the purposes of our project, we installed the ORB on two specific computers to test the ORB and generate a simple distributed application. The actual distributed computing network is just the series of computers on which the ORB is installed. The ORB enables applications to be distributed among those computers.

With the ORB installed on several computers, we needed to understand the basics of distributed application development using the given ORB. The most efficient way to gain such an understanding was to develop a sample application. The application we chose to develop is called “Messenger.” The main idea behind Messenger was to build an application that could send messages from a client on one workstation to a server on another workstation, which would then display the message. The basic structure of the application is client/server, where the server stores the message (phrase) and the client issues commands associated with it. We determined that the following functionality was needed for the program:
• Ability to store a phrase at the server.
• Ability of client to set a new phrase.
• Ability of client to print the phrase at the server workstation.
• Ability of client to check the current stored phrase.

The completed application encompasses all of the desired abilities. The server stores a phrase, which is initialized to a default value when the server is started. A client can then set the stored phrase on the server to whatever a user enters at the client workstation when prompted to do so. The client can then check the stored phrase or print it on the screen of the server workstation. Another client can also access the same server and perform operations on the stored phrase. The server remains running as long as at least one client is in contact with the server.

An account of the development of the Messenger application was recorded as the application was produced.
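The four Messenger abilities listed above can be sketched as a small C++ class pair. This is an in-process illustration under stated assumptions, not the team's Orbix code: the paper does not show its interface, so `MessengerServer`, `MessengerClient`, and every method name here are hypothetical, and an output stream stands in for the screen of the server workstation. In the real application the client would reach the server through the ORB rather than through a direct reference.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the Messenger server object.
class MessengerServer {
public:
    // The phrase is initialized to a default value when the server starts.
    explicit MessengerServer(std::ostream& screen, std::string initial = "Default phrase")
        : screen_(screen), phrase_(std::move(initial)) {}

    void setPhrase(const std::string& phrase) { phrase_ = phrase; }  // set a new phrase
    std::string checkPhrase() const { return phrase_; }              // check the stored phrase
    void printPhrase() const { screen_ << phrase_ << '\n'; }         // print at the server

private:
    std::ostream& screen_;  // stands in for the server workstation's display
    std::string phrase_;    // the phrase stored at the server
};

// Hypothetical client: forwards each command to the server it is connected to.
class MessengerClient {
public:
    explicit MessengerClient(MessengerServer& server) : server_(server) {}

    void setPhrase(const std::string& phrase) { server_.setPhrase(phrase); }
    std::string checkPhrase() const { return server_.checkPhrase(); }
    void printAtServer() const { server_.printPhrase(); }

private:
    MessengerServer& server_;
};

// Two clients share one server, as in the completed application.
std::string messengerDemo() {
    std::ostringstream serverScreen;
    MessengerServer server(serverScreen);
    MessengerClient first(server);
    MessengerClient second(server);

    first.setPhrase("Data received from Bridge");
    assert(second.checkPhrase() == "Data received from Bridge");
    second.printAtServer();
    return serverScreen.str();
}
```

Keeping the phrase inside the server object and exposing it only through these three operations mirrors the client/server split described above: clients issue commands, and the server owns the state.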
This recorded account allows a reader to acquire a grasp of how to develop distributed applications. Although “Messenger” is a simple application, it allows one to understand the basic underlying principles of the complex technology of distributed application development. The account was formulated for the purpose of educating our client. Ultimately, Shadwell will be using distributed technology to develop distributed applications, and the account will help in this process. The first application to be developed, partially with our aid, will allow data to be passed from the Bridge workstation to another workstation.

Future Development. The system implemented at Shadwell does not contain the full functionality that was originally proposed, as the project’s scope was scaled back to make it more reasonable. The system is a simple distributed client/server program that allows a client to start a server application, and then allows the client and server to communicate across the network. The implemented design provides the elementary framework for a fully functional system.

Shadwell currently has several applications, such as financial models, that could be enhanced by incorporating them into the distributed system. Shadwell employees have stated that in the near future they will be developing new financial models and other processes that would be better executed if they could be run across the SNL network. One particular process under consideration is the transfer of data from a server program to an Excel spreadsheet. Processes such as these could be incorporated into the distributed system in the future as well.

Because the project is expected to undergo further development, it is important that Shadwell employees understand the specifics of the implementation. If Shadwell decides to further the development of the distributed system, then at least one of its employees must have a working knowledge of both CORBA and C++.
For this reason a significant effort will be placed on the explanation of CORBA and the specifics of the developed application. Educating Shadwell’s employees will reduce the costs associated with future development.

CONCLUSIONS

The results of this project are extremely beneficial to Shadwell. The working database gives Shadwell employees permanent access to valuable data, which they can manipulate in complex financial models to help them make important investment decisions. The introduction of distributed computing may also aid Shadwell’s employees by making their computed processes more efficient, and it should permit Shadwell to create more complex financial models in support of those decisions.

REFERENCES

Foody, Michael A. “OLE and COM vs. CORBA.” Unix Review. April 1996: 43-45.

Katiyar, Dinesh. “Notes on OLE/CORBA.” 1995. Online. Internet. Available: http:\\cui.unige.ch\OSG\people\jvitek\Resources\Languages\Year95\msg00106.html

Lewis, Bob. “The Distinction Between Distributed Computing and Client Server is Essential.” InfoWorld. July 21, 1997: 80.

O’Hara, Liz and John Schettino. CORBA for Dummies. New York, NY: IDG Books Worldwide, Inc., 1998.

Orfali, Robert and Dan Harkey. Client Server Programming with JAVA and CORBA, Second Edition. Wiley Computer Publishing, 1998.

Rock-Evans, Rosemary. DCOM Explained. Boston, MA: Digital Press, 1998.

BIOGRAPHIES

Andrea Brotto – Mr. Brotto is a fourth-year Systems Engineering major from Carimate (Italy). He is the captain of the UVA men’s golf team and the scholar athlete of the year for 1998 (Ralph Sampson Scholarship Award).

Andrew Brix – Mr. Brix is a fourth-year Systems Engineering major with a minor in Economics from North Caldwell, NJ. Next year he plans on joining the Management Consulting team at Ernst & Young LLP.

John DeGuenther – Mr. DeGuenther is a fourth-year Systems Engineering major from Atlanta, GA. This past summer he worked as a Systems Developer and Programmer for two University of Virginia professors on a joint VDOT/UVA project. Mr. DeGuenther will be joining American Management Systems, Inc. next year.

Ryan Lentell – Mr. Lentell comes to the University from Cape Cod, MA. For the past two summers he interned with PaineWebber, working on building their new trading floors. He is majoring in Systems Engineering and minoring in economics at the University of Virginia.