Management Information Systems
Rutgers Business School / Undergraduate New Brunswick
Professor Eckstein, Fall 2005 Class Notes

Class 1 — Overview, Course Rules, General Definitions and Trends

Overview
- Topic: using computer and network technology to help run businesses and other organizations
- Won't focus especially on "managers"
- Will combine "top-down" descriptive learning (the TRP book) with "bottom-up" learning by example (Microsoft Access and the GB book)

Rules and Procedures – see the syllabus and schedule

Data, Information and Knowledge
- Datum is singular, data is plural
- Information is data structured and organized to be useful in making a decision or performing some task
- Knowledge implies "understanding" of information
  o Knowledge representation in computers is called "artificial intelligence" (AI). It got a lot of hype in the 1980's, then went somewhat out of fashion, but it is still growing gradually. We will not discuss it much, and will stick to information instead.

Information systems
- The ways that organizations
  o Store
  o Move
  o Organize
  o Manipulate/process
  their information
- Components that implement information systems – in other words, information technology:
  o Hardware – physical tools: computer and network hardware, but also low-tech things like pens and paper
  o Software – (changeable) instructions for the hardware
  o People
  o Procedures – instructions for the people
  o Data/databases
- Information systems existed before computers and networks – they just used very simple hardware that usually didn't need software (at least as we know it today).
- Impact of electronic hardware:
  o Greatly reduces the cost and increases the speed of storing, moving (etc.) information
  o Information doesn't have to be stuck with particular things, locations, or people
  o Can increase the efficiency of things you already do
  o Can permit new things:
    - Combine the scale efficiencies of a large firm with the responsiveness of a small one – for example, produce at mass scale, but customize each item
    - Remove middlemen or levels of inventory that shielded you from handling information
    - Make physical location less important – for example, one can now browse for obscure products from rural North Dakota
    - Make it easier for parties to know one another exist and to transact with one another (for example, eBay)
  o Can have a downside:
    - Small glitches can have much wider impact (for example, "Bugs Ground Planes in Japan", TRP p. 23)
    - Fewer people in the organization understand exactly how information is processed
    - Sometimes malfunctions may go unnoticed (American Airlines yield management story)

Waves of technology
- The first wave of electronic technology replaced manual/paper systems with isolated or tightly coupled computers (often "mainframes")
- These systems were gradually interconnected across organizations' functional units
- The second wave of technology has been toward networked/distributed computing – many systems tied together by large networks, and networks tied together into "internetworks", that is, "the internet"
- Networked systems have provided many new ways to operate businesses and perform transactions (what Chapter 1 of TRP calls the "new economy")
- Networked electronic information systems encourage a trend to reduce the degree to which each transaction/operation is tied to a physical place or thing.
Examples (not all equally successful!):
  o Shopping online
  o MetroCards instead of tokens
  o Digital images instead of film
  o Electronic payments instead of cash
  o Online college courses

How do trends in IT affect existing industries? Consider Porter's "5-forces" conceptualization of market pressures on firms (Figure 1.3, TRP p. 16):
  o New entrants: typically IT makes it easier/cheaper for new competitors to "set up shop". For many businesses, "turn-key" website software may be available for competitors to get started easily.
  o Suppliers: supplier bargaining power may be reduced; it's easier for you to find alternative suppliers and switch. However, IT has also encouraged a trend toward suppliers and their customers sharing information to improve planning and make more efficient use of inventory. These trends may make a firm more tightly coupled to its suppliers and discourage switching.
  o Customers: in retail businesses, customers can shop around more easily and exert more pressure. For commercial customers, there can be similar effects, but also supplier/customer coupling effects from sharing information.
  o Substitution pressure: can be intensified, especially in industries where the real product is information that can be digitized and decoupled from traditional media (printed books, videotapes, CDs). In some industries, this trend may undermine the entire business model.
  o Rivalry among established competitors: information technology may make it easier for competitors to infer your internal practices. New approaches to websites etc. are easily copied.

Class 2 – Hardware Basics

Electronic computing equipment is constructed from
- Wires
- Transistors and the like
- Storage devices (such as tiny magnets) that can be in one of two possible states

Although technically possible, we do not want to think about complex systems as being made out of transistors and tiny magnets. If somebody said "make an accounts payable system, and here is a pile of transistors and tiny magnets", you would probably not get very far!

The keys to organizing information systems (and other computer-based systems) are
- Layering – provide foundations that do simple tasks and then build on them without worrying about how they work internally
- Modularity – divide each layer into pieces that have well-defined tasks and communicate with one another in some standardized way

The most basic layering distinction is hardware and software:
- Hardware consists of physical devices (like PC's) that are capable of doing many different things – often generic devices suitable for all kinds of tasks
- Software consists of instructions that tell hardware what to do (for example: word processing, games, database applications, ...)

Kinds of hardware
- Processors (CPU's = central processing units; like "Pentium IV"); typical processor subcomponents:
  o Control unit/instruction decoder
  o Arithmetic logic unit (ALU)
  o Registers (a small amount of very fast memory inside the processor)
  o Memory controller/cache memory
  o A "microprocessor" means all these subcomponents on a single "chip" that is manufactured as a single part. This only became possible in the 1970's. Before that, a CPU consisted of many chips, or even (before that) many individual transistors or vacuum tubes!
- Primary storage
  o RAM
  o ROM (read only)
  o Variations on ROM (EPROM – can be changed, but not in normal operation)
- "Peripherals" that move information in and out of primary storage and the CPU:
  o Things that can remember data: secondary storage
    - Should be "non-volatile" – remembers data even if electrical power is off
    - Generally slower to use than primary storage
    - Most ubiquitous example – the "hard" disk
    - Removable – "floppy" disks, optical CD/DVD disks, memory sticks; these may be read/write, write once (like CD-R and DVD-R), or read only
  o Other input/output ("I/O") devices – screens, mice, keyboards, etc.
  o Network hardware
- The wires that move data between hardware components are often called "buses" (much faster than Rutgers buses!)
- Cache memory is fast memory that is usually part of the processor chip. The processor tries to keep the most frequently used instructions in the cache. It also tries to use the cache to keep the most frequently used data that will not fit in the registers. The more cache, the less the processor has to "talk" to the primary storage memory, and generally the faster it runs.

If you look inside each hardware module, you'll find layers and modules within it. For example, a CPU will have modules inside like the ALU, control unit, registers, memory controller, etc. Within each of these, you will in turn find modules and structure.

Standard way of arranging hardware (like PC's and laptops)
- One processor and one bank of memory, with everything else attached to them.
  o A key innovation in the design of modern computers was to use the same main memory to hold instructions and data. This innovation is generally credited to Hungarian-born mathematician John Von Neumann (who spent the last 12 or so years of his life in New Jersey at the Institute for Advanced Study in Princeton) and his EDVAC research team. It was critical to modern computing because it allowed computers to manipulate their own programs, and made software (at least as we now conceive it) possible.
- Variations on this basic theme that are common today:
  o Desktops are regular PC's
  o Laptops are similar but portable
  o Servers are similar to desktops, but with higher quality components – intended to run websites, central databases, etc.
  o Mainframes are like PC's, but designed to do very fast I/O to a lot of places at the same time (they used to compute faster as well). Mainframes can perform much better than PC's in applications involving moving lots of data simultaneously between many different peripherals (for example, an airline reservation system).
  o Supercomputer can mean several things. At first it meant a single very fast processor (see the picture at top left of TRP p. 410) designed for scientific calculations. This approach gradually lost out to parallel processing supercomputers (see below), and is now fairly rare.
- More recent things:
  o Thin client systems – a cheap screen/processor/network interface/keyboard/mouse combination without secondary storage or very much memory. Typically, the software inside the thin client is fairly rudimentary and is not changed very often; the "real" application resides on a server with secondary storage and more RAM. Thin clients are basically more powerful, graphics-oriented revivals of the old concept of "terminals". These can be cost-effective in some corporate settings.
  o 2 to 16 processors sharing memory – "symmetric multiprocessors" or "SMP's" (servers and fast workstations). Most larger servers and even high-end desktops are now of this form.
  o Parallel processing involves multiple memory/CPU units communicating via a (possibly specialized) network. Such systems can contain from a few standard (or nearly standard) microprocessor modules up to tens of thousands.
    - Large websites, and now large database systems, are often implemented this way: each processor handles some of the many users retrieving data from or sending data into the system
    - Large scientific/mathematical supercomputers are now constructed this way
    - Depending on the application, each processing module might or might not have its own secondary storage
    - Blade or rack servers: one way of setting up parallel/distributed processing capabilities. The "blades" are similar to PC's, but fit on a single card that can be slotted into a rack to share its power supply with other "blades".
  o Enterprise storage systems or "disk farms" that put together 100's-1000's of disks and connect them to a network as a shared storage device
  o Mobile devices such as web-enabled cell phones, wireless PDA's, or Blackberries. Although not as powerful as PC's, these can already be an integral part of an organization's information system, and are very useful because of their mobility. Their processing power is already quite significant, but battery life, small screen size, and small keyboard size are problems.

Basically, network technology has "shaken up" the various ways that hardware modules are connected, although the basic PC style is one of the most common patterns. Nowadays, only a one-person company has only one computer, so all companies are doing a certain amount of "parallel" or "distributed" computing.

Specialized and embedded systems: microprocessors, typically with their programs ("firmware") burned into ROM, are "embedded" in all kinds of other products, including music players, cars, refrigerators, ...

Data representation – introduction

Computers store numbers in base 2, or binary. In the decimal number system we ordinarily use, the rightmost digit is the "1's place"; as you move left in the number, each digit position represents 10 times more than the previous one, so next we have a 10's place, then a 100's place, then a 1000's place, and so forth. Thus, 4892 denotes

  (2 × 1) + (9 × 10) + (8 × 100) + (4 × 1000),

or equivalently

  (2 × 10^0) + (9 × 10^1) + (8 × 10^2) + (4 × 10^3).

In the binary system, we also start with a 1's place, but, as we move left, each digit represents 2 times more than the previous one. For example,

  100101 (base 2) = (1 × 2^0) + (0 × 2^1) + (1 × 2^2) + (0 × 2^3) + (0 × 2^4) + (1 × 2^5)
                  = (1 × 1) + (0 × 2) + (1 × 4) + (0 × 8) + (0 × 16) + (1 × 32)
                  = 37 (base 10)

When bits are combined to represent a number, sometimes one bit – often called a "sign bit" – is set aside to indicate + or –. (Most computers today use a system called "two's complement" to represent negative numbers; I will not go into detail, but it essentially means the first bit is the sign bit.)

There are also formats that are the binary equivalent of "scientific notation". Instead of 3.478 × 10^5, you have things like 1.00101011 × 2^13. These are called "floating point". They are usually printed and entered in decimal notation like 3.478 × 10^5, but represented internally in binary floating-point notation (note: this can occasionally cause non-intuitive rounding errors, like adding 1000 numbers all equal to 0.001 and not getting exactly 1).

Some common amounts of memory for computers to manipulate at one time:
- A single bit – 1 means "yes" and 0 means "no"
- 8 bits, also called a "byte" – can hold 2^8 = 256 possible values.
  These can represent a single character of text, or a whole number from 0 to 255. If one bit is used to indicate + or –, a byte can hold a whole number from –128 to +127.
- 16 bits, or two bytes. Can hold a single character from a large Asian character set, a whole number between 0 and about 65,000, or (with a sign bit) a whole number between about –32,000 and +32,000.
- 32 bits, or four bytes. Can hold an integer in the range 0 to about 4 billion, or roughly –2 billion to +2 billion. Can also hold a "single precision" floating-point number with the equivalent of about 6 decimal digits of accuracy.
- 64 bits. Can hold a floating-point number with the equivalent of about 15 digits of accuracy, or some really massive whole numbers (in the range of + or – 9 quintillion).

Performance and memory measures for processors:
- Clock speed – the number of hardware cycles per second. A "megahertz" is a million cycles per second and a "gigahertz" is a billion cycles per second. But a "cycle" is hard to define, and what can be accomplished in a cycle varies, so don't try to compare clock rates of different kinds of processors. For example, a "Pentium M" does a lot more per cycle than a "Pentium 4".
  o Note: it is very uncommon nowadays, but there are such things as CPU's with no clock, called asynchronous CPU's. There are some rumors they could be revived.
  o An alternative measure is MIPS (millions of instructions per second); this measure is less sensitive to details of the processor design, but different processors can still do differing amounts of work in a single "instruction".
  o Another alternative measure is FLOPS (floating-point operations per second). This is less processor-sensitive than clock speed and MIPS, but measures only certain kinds of operations. The values usually quoted are "peak" values that are hard to achieve, and how close you can get to peak depends on the kind of processor.
  o Another alternative is performance on a set of "benchmark" programs. This is probably the best measure, but is rarely publicized.
- Word length – the number of bits that a processor manipulates in one cycle.
  o Early microprocessors in the late 70's had an 8-bit (or even a 4-bit) word length. However, some of the specialized registers were longer than 8 bits (otherwise you could only have "seen" 256 bytes of memory!)
  o The next generation of microprocessors, such as the one used in the IBM PC, had a 16-bit word length
  o Most microprocessors in use today have a 32-bit word length
  o However, 64-bit processors are gradually taking over
  o Other sizes (most typically 12, 18, 36, and 60 bits) used to be common before microprocessors, but started disappearing in the 1970's in favor of today's "multiple of 8" scheme, introduced by IBM in the 1960's.
- Bus speed – the number of cycles per second for the bus that moves data between the processor and primary storage. In recent years, this is generally slower than the processor clock speed.
- Bus width – the number of bits the CPU-to-primary-storage bus moves in one memory cycle. This is typically the same as the processor word length, but does not have to be.
- Note: the TRP text conflates the concepts of bus width and bus speed!
- Bottom line: processors are extremely complicated devices with many capabilities, and it's hard to boil down their performance into a single number.

Memory measures:
- Kilobyte or KB. Typically used in the binary form, 2^10 = 1,024 bytes. This is about 10^3 = 1,000, hence the prefix "kilo", meaning "1,000".
  Just to confuse things, in some other contexts the "kilo" prefix is sometimes used in its decimal form, meaning exactly 1,000.
- Megabyte or MB. In binary form, 1 MB = 1,024 kilobytes = 2^10 × 2^10 = 2^20 = 1,048,576 bytes. In the decimal form it means precisely 1 million bytes.
- Gigabyte or GB. In binary form, 1 GB = 1,024 megabytes = 2^10 × 2^20 = 2^30 = 1,073,741,824 bytes. In the decimal form it means precisely 1 billion bytes.
- Terabyte or TB: 2^10 = 1,024 gigabytes
- Petabyte: 2^10 = 1,024 terabytes
- Exabyte: 2^10 = 1,024 petabytes

Today, primary storage is typically in the hundreds of megabytes to small numbers of gigabytes per processor. Secondary storage is usually tens to hundreds of gigabytes per hard disk, with one to four hard disks per processor. Terabytes are currently the realm of enterprise storage systems, but single hard disks storing a terabyte should appear soon. Petabytes and exabytes are big.

Performance trends:
- Moore's law: the number of transistors that can be placed on a single chip – roughly equivalent to the computing power of a single-chip processor – doubles approximately every 18 to 24 months. This "law" has held approximately true for about 25 years. (Gordon Moore was a co-founder of Intel Corporation.)
- Doubling in a fixed time period is a form of exponential growth. Physics dictates that this process cannot continue indefinitely, but so far it has not slowed down.
- Primary storage sizes are also essentially proportional to the number of transistors per chip, and roughly follow Moore's law.
- Hard disks, measured in bits of storage per dollar of purchase price, have grown at an exponential rate even faster than Moore's law.
  o This leads to problems where disk storage outstrips processing power and the capacity of other media one might want to use for backup purposes (like magnetic tape).

Class 3 – Software Basics

In the context of information systems, hardware is the physical equipment that makes up a computer system. Software consists of the instructions that tell the equipment how to perform (presumably) useful tasks.

Software is possible because of the Von Neumann architecture, called the stored program concept in the textbook. This innovation, dating back to fundamental research from World War II through the early 1950's, allows computers to manipulate their software using the same processors, memory, secondary storage, and peripherals that they use for things we would more normally think of as "data", such as accounting transaction information.

The original term for software was program, meaning a plan of action for the computer. People who designed programs were called programmers. Now they are also called developers, software engineers, and several other things.

At the most fundamental level, all computer software ends up as a pile of 0's and 1's in the computer's memory. The hardware in the processor's control unit implements a way of understanding numbers stored in the computer's memory as instructions telling it to do something. <Review binary representation>

For example, the processor might retrieve the number "0000000100110100" from memory and interpret it as follows:

  00000001   0011   0100
   "Add"     "R3"   "R4"

meaning that the processor should add the contents of register 3 (containing a 32- or 64-bit number) to the contents of register 4 (recall that registers are very fast memory locations inside the processor).

The scheme the processor uses to interpret numbers in this way is called its machine language. The dialect of machine language depends on the kind of processor.
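As a side note, here is a small Python sketch (mine, not from the notes) that decodes the hypothetical 16-bit instruction word above by converting each binary field back to decimal, using the same review-of-binary idea:

  # Decode the example instruction word: an 8-bit operation code followed
  # by two 4-bit register numbers. The opcode meaning ("Add" = 1) is the
  # made-up assignment from the example above, not any real machine language.
  word = "0000000100110100"

  opcode = int(word[0:8], 2)     # first 8 bits as a base-2 number -> 1
  reg_a = int(word[8:12], 2)     # next 4 bits -> 3
  reg_b = int(word[12:16], 2)    # last 4 bits -> 4

  names = {1: "Add"}
  print(names.get(opcode, "?"), f"R{reg_a}, R{reg_b}")   # prints: Add R3, R4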
For example, the PowerPC processors in Macintosh computers use a totally different machine language from Intel Pentium processors – a sensible program for one would be nonsense to the other. However, many internally different processors implement essentially identical machine languages: for example, all Intel and AMD 32-bit processors implement virtually the same machine language, so most machine-language programs can be moved freely from one to the other.

Writing programs in machine language is horrible: only very smart people can do it at all, and even for them it becomes very unpleasant if the program is longer than a few dozen instructions. Fortunately, the stored program concept allows people to express programs in more convenient, abstract ways.

The next level of abstraction after machine language is called assembler or assembly language. Assembly language allows you to specify exactly what a machine language program should look like, but in a form more intelligible to a person. A fragment of a program in assembly language might look like

  Add   R3, R4      // Add register 3 to register 4
  Comp  R4, R7      // Compare register 4 to register 7
  Ble   cleanup     // If register 4 was <= register 7, go to "cleanup"
  Load  total, R3   // Otherwise, load a memory location called "total" into
                    // register 3
  Etc.

Note that the text after the "//" markers consists of explanatory comments and is not part of the program. People can actually write amazingly long and powerful programs this way, but it is very inefficient and error prone. Furthermore, the program is still tied to a particular machine language and thus to a particular family of processors.

After assembly language, higher-level languages evolved. These languages allow programs to be expressed in a form closer to human forms of communication, like English and algebra. For example,

  PROFIT = REVENUE – COST
  IF PROFIT < 0 THEN PRINT "We lost money!"

Examples of higher-level languages:
- BASIC (numerous dialects)
- C (allows a lot of low-level control)
- C++ (similar to C, but "object-oriented")
- Java (looks a bit like C++, but simplified in many ways)
- JavaScript (web-oriented, minimal version of Java)
- FORTRAN (most often used for science and engineering applications)
- COBOL (used for old-fashioned mainframe-style business applications)
- ... many, many more ...

The classic way of implementing a high-level language is called a compiler. Compilers translate a program in a higher-level language into a machine-language program (in Windows, a ".exe" file).
- Internally, they may actually translate the program into various levels of internal intermediate representation, then into assembly language, and finally into machine language.
- Pieces of the program can be compiled separately into little chunks of machine language called object modules ("object" here has a different meaning than in "object-oriented"), which are then combined into a single machine-language program by a program called a linker.
- Commonly used object modules can be grouped into "libraries" so they don't have to be recompiled all the time.
- The result of the compiler/linker combination is a "raw" machine language program. So, to run on computers with different machine languages (or even different operating systems; see below), a program may have to be compiled multiple times in different ways.

Interpreters are an alternative approach to using compilers. With classic interpreters, the program is stored in essentially the same textual form the programmer used to write it.
When you "run" the program, the interpreter essentially translates each line as it is encountered. This can be inefficient: if a line is executed many times (in a loop, for example), it could get translated many times. There are ways of reducing this penalty, but fundamentally interpreted programs run slower than compiled ones.
- Interpreters tend to make writing and debugging the program easier.
- The program can often be easily moved between different kinds of computers, so long as each has an interpreter for the language being used.
- In many applications, the difference in speed isn't noticeable (especially as processors keep getting faster), so the portability and easier debugging of interpreters may make them a better choice.

There are intermediate approaches between compilers and interpreters; one example is the Java language. A compiler can turn the program into some intermediate form (still not tied to a particular machine language); at run time, a simpler, faster interpreter "translates" the intermediate form.

Software structure:

Only computers with very simple jobs (like running a thermostat) have just one layer of program. Most software systems, like hardware, are constructed of layers and modules so that the complexity of the system is manageable. The most popular current arrangement is:
- BIOS (Basic I/O System) is the lowest level.
  o The BIOS typically comes with the computer in "read-only" memory, but can be updated by "flashing" that memory. Updating the BIOS is typically rare.
  o The BIOS provides low-level "subroutines" (program modules) to send data to and from peripherals, etc.
- Operating system (OS) on top of the BIOS.
  o The OS implements basic user-understandable capabilities:
    - Arbitrates between tasks
    - Tracks files
    - Structures the contents of the screen
  o You can run different operating systems on top of the same BIOS – for example, either Windows or Linux on the same kind of computer.
  o Because the BIOS manages the details of communicating with the computer's peripherals etc., the same operating system can run in an outwardly identical manner on computers with different motherboards etc.
  o Within an operating system, you will find modules and layers, for example:
    - The kernel manages the basic scheduling of tasks
    - Device drivers send and receive data from particular peripherals
    - The file system understands how secondary storage is organized
    - The GUI (Graphical User Interface) manages the screen, mouse, etc.
    - Compilers and linkers
  o Some of these modules and layers may have their own internal modules and layers, and so forth.
  o Modern operating systems are written (mostly, anyway) in high-level languages, so they can be moved between computers with different machine languages (once everything is compiled properly).
- Application software on top of the operating system. Application software makes the computer perform a specific, presumably useful task. You may run totally different software on top of the same operating system – for example, a video game or accounting software.
  o Application software often has layers. A very common situation:
    - Relational database engine (MS Access, Oracle, Sybase, DB2, etc.)
    - Specific business application built on top of the engine (we will study how to do that!)
  o As always, these layers may contain their own modules and layers.

The term system software refers to lower-level software like the BIOS or the operating system, whereas application software is more oriented to specific tasks.
As software gets more complicated and varied, it is sometimes hard to draw the line exactly between "systems" and "applications" software; it can depend on your viewpoint.

The situation with languages is also blurring. In addition to classic procedural languages like assembly language, C, and JavaScript, which express lists of explicit instructions, other languages have evolved for expressing other sorts of information. For example:
- HTML (HyperText Markup Language) expresses the appearance of textual information, with "hypertext" links between elements, graphics, and "fill-in" boxes or "form" elements.
- SQL (Structured Query Language) describes information to be retrieved from a database. It doesn't necessarily say exactly how to assemble the information, just what kind of information is needed.
Such nonprocedural languages are often a critical element of modern software.

Classes 4-5: See Chapters 2-3 of the GB Book

Class 6: See the Handout on Memory Storage Calculations

Partial Notes for Class 7 (TRP Sections 3.1-3.4)

We have now seen a little bit about the tables in which business data are stored, and how to calculate the amount of storage they might consume. Managing data in large modern organizations can be very challenging. Particular challenges:
- The volume of data increases as new data is added. New technologies mean that gathering new data is easier and faster. So, not only is the total volume of data increasing, but the rate at which it is increasing is also increasing!
- Data tend to be scattered throughout the organization. It is often desirable to centralize data storage, but by no means always – it may be better to leave departments or working groups "in charge" of the data they use the most.
- It is costly and risky to replace older "legacy" information subsystems that are working smoothly. Sometimes it may be better to create "federated" systems that combine information from constituent systems.
- We may also want to use data from outside the organization (either public-domain or purchased).
- It may also be advantageous to share some information with suppliers or vendors (for example, sharing information about inventories can reduce inventory fluctuations and costs throughout a "supply chain").
- Data security and quality are important, but are more easily jeopardized the larger an information system becomes.

The text distinguishes between two main modes of using data:
- Transaction processing (sometimes called TPS): keeping track of day-to-day events, such as logging orders and shipments, and posting entries to accounting ledgers. In terms of a data table, transaction processing means an ongoing process of adding rows (for example, to reflect a new order), modifying table cells here and there (for example, if a customer changes their telephone number), and perhaps deleting rows.
- Analytical processing: using multiple table rows to obtain "higher-level" information. Entering a row into a table to reflect a newly received order would be transaction processing; an example of analytical processing would be computing the number of orders and total dollar value of orders for this month, and comparing them to last month.
  o Analytical processing can be as simple as sorting, grouping, and summary calculations in an Access query or report – for example, providing all group managers with a summary of their groups' costs for the month, broken down by cost category. This kind of application can be called "classic" MIS (Management Information Systems). (A small SQL sketch contrasting the two modes appears below.)
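To make the contrast concrete, here is a minimal sketch (mine, not from the notes or the GB book) using Python's built-in sqlite3 engine; the ORDERS table and its columns are hypothetical stand-ins for the kind of table we have been designing in Access:

  import sqlite3

  con = sqlite3.connect("sales.db")          # illustrative database file
  cur = con.cursor()
  cur.execute(
      "CREATE TABLE IF NOT EXISTS ORDERS "
      "(OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT, Amount REAL)"
  )

  # Transaction processing: record one new business event (one row changes).
  cur.execute(
      "INSERT INTO ORDERS (OrderID, CustomerID, OrderDate, Amount) VALUES (?, ?, ?, ?)",
      (1001, 42, "2005-11-01", 259.00),
  )
  con.commit()

  # Analytical processing: aggregate many rows into a "bigger picture" figure,
  # e.g. the order count and total dollar value per month.
  for month, n_orders, total in cur.execute(
      "SELECT strftime('%Y-%m', OrderDate) AS month, COUNT(*), SUM(Amount) "
      "FROM ORDERS GROUP BY month ORDER BY month"
  ):
      print(month, n_orders, total)

  con.close()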
  o Analytical processing can get a lot more sophisticated. For example, data mining refers to using sophisticated statistical or related techniques to discover patterns that might not be obvious in classical reports.
  o Decision support can involve using the data to help managers make complex decisions – for example, how to route 400 shipments from 20 warehouses to 100 customers.
  o Database systems like Access don't ordinarily do data mining or decision support by themselves. For such uses, they usually need to be connected to other pieces of software.

Sometimes it can be a mistake to do transaction processing and analytical processing on the same database, especially if the analytical processing is very time consuming or complex.
- The analytical processing may make the transaction system run slowly
- Conversely, the transactions may interfere with the analytical processing and make it run too slowly
- If an analytical processing step takes too long, the data it is using may change in the middle of its calculation. "Locking" the data to avoid this can block transaction processing.
- It may be better to make a copy or "snapshot" of the database used for the transaction system. This is often called a "data warehouse".
  o You can do a lot of analysis on the data warehouse without disrupting the transaction system, and a lot of transactions without disrupting data analysis.
  o The data warehouse will not reflect the very latest transactions, but for large-scale aggregate analysis, that may not be a big problem.

Remainder of Class 7: roughly follows GB pages 210-212 (however, I also introduced the notion of a repeating group)

Class 8: Followed Database Design Handout

Class 9: Video Store Database Design Example; Subtypes

Class 10: Lab Class

Class 11: Lab Class, Personnel Database Design

Class 12: Information System Application Areas and Classification

Figure 2.2 on TRP p. 36 is a nice overview of the interaction of IT and organization structure.
- The base of the "pyramid" is the IT infrastructure (hardware, system/basic software)
- IT services constitute the human base of the pyramid
- Transaction processing systems manage day-to-day record keeping, and may or may not cut across functional divisions
- Above the transaction base are various functional divisions:
  o Accounting
  o Finance
  o Human Resources
  o Operations/Production
  o Marketing
  o Etc.
  These may have their own systems/applications fed by data from the transaction base
- Decision-making levels above the transaction base are often roughly categorized as
  o Operational
  o Tactical
  o Strategic
- The operational/tactical/strategic levels might or might not require specialized information systems or applications

Enterprise Resource Planning (ERP) systems are conceived as a means of integrating transaction processing and most higher-level IT applications.
  o The idea is to fulfill most of your IS software needs by purchasing a massive product from a single vendor, then configuring it to your needs (and ignoring unneeded parts).
  o Advantages:
    - Speeds up information flow between functional areas, since everybody is using the same, up-to-date information. Without ERP, you need to build "bridges" between systems used by different functional areas:
      "As needed" bridges may be slow and cumbersome
      Periodic bridges may result in out-of-date information; for example, accounting's information system may be a few days behind sales' information system.
    - Benefit from the enormous design investment by ERP vendors
    - Less in-house programming
  o Disadvantages:
    - Massive complexity – typical ERP databases have many thousands of tables
    - Configuration/customization may be tricky and require specialized skills
    - Loss of flexibility: it's hard for ERP to anticipate everything every customer might want. You may have to do things "their way".
    - High purchase cost.

Material on TRP pp. 37-39 was covered in earlier lectures.

Information systems may be classified by organizational scope (TRP p. 41):
- Functional – supports a single traditional department, such as marketing or HR.
  o Example: the personnel database we designed last time supports HR
- Enterprise-wide – serves several traditional departments or the entire enterprise.
  o ERP systems are an example
- Interorganizational – connects to the information systems of suppliers, customers, or partners. Examples:
  o Supply chain management systems are a currently trendy example. The idea is to increase efficiency by sharing some information with suppliers/customers, and somehow sharing the monetary gains from sharing the information.
  o When you order a used book through Amazon.com, their information systems automatically pass the order to an appropriate used book dealer.

Information systems may be classified by their degree of processing sophistication (not mentioned in chapter 2 of TRP, but important):
- Transaction Processing Systems (TPS): recording day-to-day business events; displaying results of very specific queries. Examples:
  o Record an order for customer X
  o Record that shipment Y was loaded onto truck Z
  o Record that truck Z left the warehouse to make deliveries
  o Display recent orders placed by customer X
  In Access terms, we can think of such systems as functioning on the level of tables and forms.
- Management Information Systems (classic MIS): distill/aggregate information gathered by TPS into a form (usually reports) that gives some sort of "bigger picture". MIS systems typically perform relatively simple processing operations like sorting, grouping, adding up, and averaging, much as one can do in a Microsoft Access query. Examples:
  o Weekly, monthly, or quarterly sales reports broken down by region and product
  o Quarterly inventory summary
  o Annual report on employee sick time, broken down by location and department
  Examples of tasks performed by classic MIS systems (TRP table 2.2, p. 44):
  o Periodic reports – example: weekly sales figures
  o Exception reports – example: when production defects exceed some limit
  o Ad hoc reports – generated on demand (essentially an arbitrary query)
  o "Drill down" – an interactive "report" allowing the user to focus detail on particular areas
  In Access terms, we can think of such systems as functioning on the level of queries and reports.
- Decision Support Systems (DSS): try to go beyond simple reports, combining information from a database with relatively sophisticated analytical techniques, with a view to helping make some potentially complex or difficult decision. Examples:
  o Take today's list of deliveries, suggest how they should be split up among delivery trucks, and suggest how each truck should be routed.
  o Statistically forecast next quarter's demand for 20 different lines of shoes, and suggest an order quantity for each kind of shoe.
  DSS systems typically include a database, but also require special programming. On its own, a DBMS (Database Management System) like Access or Oracle will not have sufficient functionality to implement a DSS.
  To implement a DSS, you need to extract information from the DBMS into another kind of program, or add special code to the DBMS; for example, Access allows users to add specialized code in Visual Basic.
  o Sometimes a DSS can be entrusted to make certain routine decisions, in which case I guess it would become a "DS" (Decision System) rather than a Decision Support System.

Information systems may be classified by the management/decision level they support (TRP pp. 42-45):
- Clerical/operational: customer service, claims processing, package delivery, etc.
  o Supported by TPS, and occasionally DSS (example: routes for a delivery driver)
- Operational decision making: first-line operational managers such as dispatchers
  o Supported by TPS, plus MIS and/or DSS
- Tactical management
  o Supported by MIS and possibly DSS
- Strategic management
  o Supported by MIS and possibly DSS
All of the above may also be supported by basic office automation tools: e-mail, word processing, spreadsheets, etc.

No categorization approach is perfect or useful for all purposes. Real systems may cut across boundaries – for example, have both TPS and DSS aspects, or be used by both operational and tactical managers.

Class 13: Networks

Note: this class covers roughly the material in TRP chapters TG4, TG5, and 4. However, my emphasis will be different, and I'll give some details not in the text.

We now discuss transmitting data between physically separated computers.
- Something that connects two computers is a link
- Many computers connected by many links comprise a network. Each computer on the network is called a node.
- Generally speaking, data should be able to get from any network node to any other node.
- There are many different shapes (or topologies) that can be used to build a network
  o Today the predominant network topology is a collection of interconnected "stars".
  o At one time, interconnected "rings" were also popular, and they are still in use.
- Some nodes on the network serve primarily as connection points or to make sure data gets sent to the right place:
  o Switches
  o Hubs
  o Routers

Kinds of links
- Link speed is usually measured in bits per second (b/s), with the usual (decimal) prefixes K (kilo/1,000), M (mega/1,000,000), G (giga/1,000,000,000), etc. (A rough transfer-time comparison appears after this list.)
- Wires (usually copper) – these can be used in many ways.
  o Twisted-pair wires, such as
    - Regular telephone wiring
    - Ethernet wires, which currently come in three flavors: 10 Mb/s, 100 Mb/s, and 1 Gb/s
  o Coaxial cables, like those used for cable TV. These have higher theoretical capacity, but are harder to work with.
  o Wires can carry a lot of data over short distances, but slow down over longer distances.
- Optical fiber (carries light pulses)
  o Invented about 30 years ago
  o More economical than wire for high data rates and long distances. Links can have capacities of many Tb/s.
  o More difficult to work with than either twisted-pair wires or coaxial cables. In particular, it's hard to "splice" two of them together (however, that also makes them more secure).
- Broadcast electromagnetic waves (radio/infrared/microwave) – "wireless"
  o Microwave links (directional – don't broadcast equally in all directions)
  o Satellite links
  o Within-building ("wi-fi") broadcast: capacities typically about 11 Mb/s right now
  o Wider-area wireless: slower than wi-fi right now (cell-phone modems); this technology is just emerging, but I believe that demand will make it successful.
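For a rough feel for these speeds, here is a small Python sketch (mine, not from the notes) that compares how long a 700 MB file (binary-style MB) would take over some of the link speeds quoted above; it ignores protocol headers and congestion, so real transfers would be slower:

  # File sizes use binary prefixes and bytes; link speeds use decimal prefixes and bits.
  file_bits = 700 * 1024 * 1024 * 8          # 700 MB expressed in bits

  links_bps = {
      "10 Mb/s Ethernet": 10e6,
      "11 Mb/s wi-fi": 11e6,
      "100 Mb/s Ethernet": 100e6,
      "1 Gb/s Ethernet": 1e9,
  }
  for name, bps in links_bps.items():
      minutes = file_bits / bps / 60         # idealized transfer time
      print(f"{name}: about {minutes:.1f} minutes")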
A history of computer communications:
- The first large-scale electronic networks built were telephone networks. But they were not used by computers initially (because computers didn't exist! In fact, "computer" was originally a job title for a person who did mathematical calculations for engineers and scientists).
- When computers started, each organization had its own computer in its own room. Data got in and out of the computer room by being physically carried as punched cards, printed reports, magnetic tape, etc. (eventually floppy disks, too) – later called "sneakernet".
- People began placing I/O devices outside the computer room, connected by wires: printers, card readers, terminals (= printer + keyboard, or screen + keyboard), etc.
- Technology was invented to encode (modulate) data into sounds the telephone network could carry. The data would be "demodulated" back into bits at the other end (thus the term "modem" – modulator/demodulator); see TRP p. 461.
  o This allowed people to have terminals at home and work over telephone lines
  o Many other business applications involved sending or receiving data from remote locations
  o Early modems were slow (100 b/s = 0.1 Kb/s in the 1960's). This gradually increased to about 56 Kb/s today.
  o The technology is still widely used, but in decline
- In the late 1960's, interest was growing in large general-purpose data networks independent of the telephone network.
  o Before, these existed only for specialized applications (mostly military)
  o ARPANET – the (defense department) Advanced Research Projects Agency NETwork – was built in the early 70's
  o This became the "internet"
  o The internet had a fairly small user base until the mid 80's. Then it began to gather momentum.
  o In the early 90's, the "world wide web" became very popular and drove a massive expansion of the internet (along with the ".com" boom)
  o In the 90's there was a general telecommunications boom, of which the internet boom was a big part. Numerous firms tried to secure their place in the boom by building lots of network links, especially in North America.
  o A lot of network capacity was built. About the same time, technology appeared that greatly increased the amount of data an optical fiber could carry (by simultaneously sending multiple information streams using light beams of several different colors). Things cooled off a lot, but internet use continues to climb.

How networks work: LAYERING is very important
- Bottom: physical layer – the physical workings of the links (wire, fiber, wireless, etc.)
- Network layer (typically "IP", standing for "internet protocol"): lets the network figure out which computer the data is meant for.
  o Currently, each computer has a 32-bit "IP address" (usually split into four bytes printed in decimal, like 128.6.59.202).
  o The addresses have structure – for example, "128.6" in the first two bytes of the address means somewhere at Rutgers (although 165.230 could also be at Rutgers), the 59 designates a particular "subnet" (roughly the same as a building), and the 202 identifies which computer on the subnet.
  o Note that most computers also have a "hostname" and "domain name" that are easier for humans to remember, like "business.rutgers.edu" or www.amazon.com. While these are related to IP addresses, they aren't exactly the same. Special computers called "name servers" provide the translation. Small organizations may not have a name server, relying on a name server elsewhere. Large organizations like Rutgers may have dozens of name servers.
  o 32 bits no longer provide enough space for IP addresses, and we will gradually move from IPv4 (32-bit addresses) to IPv6 (128-bit addresses).
  o Various workarounds suffice for now:
    - Dynamically allocating IP addresses only when computers are connected to the network ("DHCP" is a common way of doing this), or
    - Grouping small sets of computers to share a single IP address (network address translation, or "NAT")
- Transport layer (typically "TCP"). Specifies how data is split up and logically moved in the network.
  o TCP specifies about 65,000 logical "ports" for each computer on the network. Each port can be used for a different application.
  o For each port, there can be more than one "session" or logical connection between two computers (for example, you could have two independent web browser windows connected to the same website from your own PC)
  o For each session, there may be a sequence of messages in each direction
  o TCP is a "packet switched" protocol – messages are cut up into "packets" that might take different paths through the network and are reassembled at the destination. Telephone networks are "circuit switched" – the whole conversation uses the same route through the network.
- Application layer: specifies different protocols for moving data in different ways. These constitute an "alphabet soup":
  o TELNET (old) – run a terminal session (a text-based interaction between a person and a computer)
  o FTP (old) – move files back and forth (still in some use when security isn't an issue)
  o SSH – encrypted terminal sessions and file transfers. This is how you connect to the "Eden" system to do text-based interactions. It works much the same way as TELNET and FTP, but is far more secure.
  o HTTP/HTTPS – hypertext transfer. This appeared in the early 1990's and rapidly evolved into a way of projecting a wide range of graphical user interfaces across the internet. The "S" means secure/encrypted. HTTP is a much easier and more secure way to do arbitrary things on a remote user's screen than making them run custom software.
  o SMB, NFS – file sharing: making disks on a distant computer look like they're on yours
  o SMTP – sending e-mail to and between mail servers (computers that can route e-mail). This is a "push" protocol: the computer initiating the connection sends the messages.
  o POP3, IMAP – retrieving mail from e-mail servers. These are "pull" protocols: the computer initiating the connection receives the messages (if there are any)
  o And many, many more...
  Typically, each protocol uses a single TCP port (or perhaps a few). For example, HTTP usually uses port 80, and SSH usually uses port 22. (A small sketch of an HTTP request over a TCP connection appears below.)

Some more notes on layers and protocols
- As you move downwards in the protocol layer "stack", more and more "bookkeeping" data – also called "headers" – get appended around the data you are sending. This means the actual number of bits transmitted can be substantially more than you might think. Header information may get appended to each packet, if the message is divided into packets.
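To make the port and layering ideas concrete, here is a minimal Python sketch (mine, not from the notes) that opens a TCP session to port 80 of a web server and speaks the HTTP application protocol by hand; the host name is just an example, and a real browser does all of this for you:

  import socket

  host = "www.example.com"                       # any public web server
  with socket.create_connection((host, 80)) as s:  # TCP session to port 80
      request = (
          "GET / HTTP/1.1\r\n"                   # application-layer request line
          f"Host: {host}\r\n"
          "Connection: close\r\n"
          "\r\n"
      )
      s.sendall(request.encode("ascii"))
      reply = b""
      while True:
          chunk = s.recv(4096)                   # TCP reassembles the reply from packets
          if not chunk:
              break
          reply += chunk

  print(reply.decode("ascii", errors="replace")[:300])  # starts with the HTTP headers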
- TCP and IP usually go together and are known as "TCP/IP"
- You can run more than one network layer on top of a physical layer on the same link (for example, IP and AppleTalk)
- You can run several transport layers on top of a network layer (for example, TCP and UDP on top of IP)
- And, of course, you can run many application layers on top of a transport layer (SSH and HTTP on top of TCP)

Kinds of networks
- LAN – "Local Area Network": on the level of a single building
- WAN – "Wide Area Network": a vague term for something larger than a LAN
- Enterprise network – a larger-than-LAN network dedicated to a particular company or organization
- Internet – multiple networks networked together
  o The idea of an internet preceded the current notion of the internet – "the" internet happened when most things got connected!
  o The "IP" network layer was specifically designed to make it easy to create internets. That is why "the" internet could grow so quickly in the 1980's and 1990's, and conversely why TCP/IP is now the dominant network layer.
- VPN – "Virtual Private Network" – runs over the internet, but encrypted in such a way that it looks like a private WAN that outsiders can't snoop on (we hope!)

Current network technology
- Most firms now have LANs implemented with copper wire, usually Ethernet, and now also building-level wireless
- Many larger firms have WANs and/or enterprise networks containing wire and/or fiber and maybe some satellite/microwave links (depending on the firm's size). The longer links in these networks are typically leased from ISP's (see the next item)
- Internet service providers (ISP's) maintain interconnected, overlapping networks made primarily of fiber (examples: AOL, ATT, Sprint, etc.). ISP's also lease capacity for use in enterprise networks. Large and medium firms connect directly to ISP's.
  o Also, there are some non-profit alternatives to ISP's, like "Internet2", which serves large universities like Rutgers
  o Large firms can afford to lease dedicated high-speed connections to ISP's, like "T3" lines
- The dreaded "last mile": smaller firms and individual households connect to the ISP's in various non-ideal ways
  o By phone and modem (sometimes directly to an employer instead of an ISP)
  o Cable modem – signals carried over the same coaxial cable that distributes TV signals. Capacity is usually 0.5-5 Mb/s, but may be shared with other users in the neighborhood
  o DSL – signals carried over regular phone lines, but not at audible frequencies. About 0.5-1 Mb/s, but occasionally faster. Only works if you are within about 2 miles of a telephone switching center, but does not have capacity sharing problems.
- Most network connections carry a fixed charge per month, without tracking the exact number of bits sent – one reason we have so much "spam"!

Uses for networks
- Sending messages and data between people by "push": e-mail (possibly with attachments), instant messaging, voice over IP (VoIP) telephone
- Sharing/disseminating information by "pull" (basic web, FTP). Computers that are "pulled" from are usually called "servers"
- Other modes of sharing data. Some (or all) computers hold data that other computers can share.
  o Computers that share data on their disks are often called "servers" or "file servers".
  o An example: "network drives" – disks that are not on your computer, but act like they are (if a little slowly)
- Sharing other hardware like printers and scanners (these actually contain processors)
- Combination push/pull messaging and sharing: chat rooms, newsgroups
- Specific teamwork applications
  o Calendar/scheduling applications
  o Joint authorship systems (Lotus Notes?)
- Gathering data and interacting with customers
  o Websites that gather data or take orders
  o Sensors and scanners
- Offsite backup (this used to be done with tapes, but they are now so slow compared to hard disks)
- And many, many more...

Data transfer calculations
- Calculate the size of the data to be moved
- Divide by the speed of the transmission line
- Remember:
  o File sizes are commonly in bytes, with binary-style K, M, G, etc.
  o Transmission line speeds are usually in bits per second, with decimal-style K, M, G, etc.
  o This mismatch is annoying, but unfortunately it is the common convention.
  o It's easiest to convert the file size to decimal bits, and then divide.
  o The transmission protocol will probably add header information that will cause the real download to take longer
  o Network congestion or malfunctions could cause even more delays

Sample file transfer calculation: suppose we want to do "video-on-demand" downloads of 4 GB movies in DVD format (binary-style GB). How long would that take over a 1 Mb/s DSL line, or a 50 Mb/s cable modem connection?

  Size of movie = (4 GB)(1024 MB/GB)(1024 KB/MB)(1024 B/KB)(8 bits/B) = 3.44 × 10^10 bits

  Time to transfer with DSL = (3.44 × 10^10 bits)/(1 × 10^6 bits/sec) = 3.44 × 10^4 sec
                            = (3.44 × 10^4 sec)/(60 sec/min × 60 min/hr) = 9.54 hours – probably not acceptable!

  Time to transfer with a fast cable modem = (3.44 × 10^10 bits)/(50 × 10^6 bits/sec) = 687 sec
                                           = (687 sec)/(60 sec/min) = 11.5 minutes – probably OK

Note that actual transfer times would be somewhat longer due to overhead (headers) added by the application, transport, network, and physical network layers.

Class 14 – Brief Discussion of E-Commerce

This mini-lecture relates to Chapter 5 of the text, which attempts to categorize all manner of e-commerce. I will take a different, quicker approach, and try to categorize what I view as the key aspects of successful e-commerce.

In the 90's, there was a "dot-com boom". During this boom, many people assumed that the internet would take over the entire economy and that people trying to sell products like dog food over the web would become billionaires. We have since returned to reality. But the internet has had, and continues to have, a huge impact. A few companies, like Amazon, eBay, and Google, have been spectacularly successful, and many smaller companies have done well.

I believe the key aspects of successful e-commerce are:
- Making it easier to locate other parties with which to have an economic transaction
- Making it easier to identify desired products and services
- Decoupling the information aspects of a transaction from the physical transfer of goods
- Reducing the costs of communication between economic parties
- Promoting pull-based communication between economic parties
Note that these categories are interrelated and overlapping.

Making it easier to locate parties (firms or individuals) with whom to conduct transactions:
- If I am looking for obscure cross-country ski accessories, I can now locate them quickly even if I live in Alabama. A simple web search will probably identify dozens of firms selling the sort of thing I am looking for.
- Before the internet, I would have had to go to the library and search through national business directories, or try my luck with 800 directory assistance.
- eBay's core business is helping buyers and sellers of possibly obscure used items find one another easily. If I have a very specialized item, services like eBay allow prospective buyers from all over the world to find me, instead of my hoping that one of them will drive by my yard sale.

Making it easier to identify desired products and services
- Suppose we don't know what particular product/service we want. It is now relatively easy to
  o Search the entire e-commerce world using a search engine like Google
  o Search within particular suppliers' websites

Decoupling the information and physical aspects of a transaction
- A bookstore or music store stocking all the titles carried by Amazon would be impossibly large.
- We do not have to depend on having goods automatically distributed to our local area before we make a purchase decision. It is now possible to purchase goods not popular in one's geographical area, and to have a much wider choice of products and suppliers than if we depended wholly on local stores.
- For physical goods, online commerce depends on an efficient parcel delivery system. Parcel delivery underwent a revolution in the 1980's with the growth of Federal Express. It has grown symbiotically with web commerce.
- Physical stores and showrooms are not going away, though:
  o Physical examination of goods is very important for some products
  o Immediate delivery can have enormous value

Reducing communication costs
- A supplier can now update pricing and availability information and transmit it into our firm's information system as frequently as desired, without human intervention
- Our customers can query flight times, inventory availability, pricing, etc. without our having to dedicate so many people to physically talk to them
- People can custom-configure products (like computers) at length without consuming a lot of a firm's labor

Pull-based communication
- Pull-based communication is communication initiated by the potential consumer of a good or service
- Internet technologies greatly reduce the cost and human effort required for pull-based communication
- Successful e-commerce tends to contain a large "pull" element. Even effective internet advertising (as on Google) is based on determining a level of customer interest, based on their query.
- The internet is also tempting for push-based communication because of its very low costs. But the results are frequently irritating: spam and pop-up ads.

I believe e-commerce has the potential to make it possible for market economies to function efficiently with smaller, more entrepreneurial, less internally politicized economic units. But it is not clear whether that will happen.

Classes 16-21

In these classes we covered
- Many-to-many relationships
- Normal forms and normalizing a database
- Designing more complicated databases
- Multiple relationships within a single pair of tables
- Setting up more complicated relationships in Access
- More advanced query techniques
- Query chaining: having one query use the results of another
Please refer to
- The in-class database design example handouts
- The long database design handout in PowerPoint format
- Access examples on the class website.

Classes 22-23 – More on Transactions, Functional-Level Systems, Integration, and Decision Support

This material is based loosely on TRP chapters 7, 8, and 10.
Basics and history:
Functional-level information systems are those that function at the level of a single functional department of a firm, or part of such a department. In the early days of IS, these kinds of systems were the only feasible ones.
Early systems of this kind were report-driven, and had the original name “MIS” – Management Information Systems. They produced routine scheduled reports (for example, quarterly sales broken down by various standard categories), and exception reports – alerts about noteworthy situations like significant budget overruns or unusually high or low inventories.
On-demand, more customizable reporting facilities arose later, as computing systems became more interactive. “Drill down” is one common approach here, where the user can selectively focus on parts of the report in increasing detail.
Transaction Processing Systems (TPS): keeping track of the basic activities of the organization
As discussed above, this involves changes to a limited number of table rows at a time
Batch versus online processing
o Batch processing means grouping business events/transactions and processing them together. In early systems, it was the only feasible approach
o Online processing means updating the organization’s master files as soon as each transaction occurs
o Pure batch processing is totally antiquated at this point
o However, many organizations maintain some limited batch processing under the surface: credit card transactions may not be visible in your online account until they are “posted” in a batch operation that may occur several days after the actual transaction. Such procedures reduce the computational load on the system, or may be present because of older “legacy” software in the system.
Characteristics of transaction systems:
o High level of detail
o Low complexity of computation
o May have to process large amounts of information if the organization is large
o Must be highly reliable; loss or damage to data can do critical damage
o High speed is desirable to be responsive to users and to keep up with transaction volume
o Must be able to handle inquiries about specific transactions, customers, etc.
o Must support a level of concurrency suitable to the size of the organization
Concurrency: except for very small firms/organizations, more than one user may want to use a database at the same time. If more than one user wants to make modifications at a given time, that is called concurrency.
Most computer operating systems have a file locking mechanism whereby only one user can modify a file at a given time
For databases, this is usually relaxed to record locking or page locking, in which individual table rows or groups of rows can be locked. For example, if you are entering information in a row of the CUSTOMER table, that row is locked to other users.
Access supports record locking and thus supports a minimal amount of concurrency
Another useful technique is to group record modifications into a “transaction” that must be executed in its entirety or “rolled back”. Example:
o Add a row to ORDER
o Add a matching row to ORDERDETAIL, specifying an order of 10 light fixtures
o In the light fixture record of PRODUCT, decrease InventoryAvailable by 10 units, and increase InventoryCommitted by 10 units
o Locks are obtained on all the relevant table rows, all the modifications are made, and then all locks are released. If any modification is invalid (for example, InventoryAvailable becomes negative), then all the modifications are “rolled back” and removed together.
Grouping the transaction as above ensures that no other user will see the database in an inconsistent state, for example with inconsistent order and inventory information.
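This all-or-nothing grouping can be written directly in code. Here is a minimal sketch using Python’s built-in sqlite3 module (not Access; it assumes tables named ORDERS, ORDERDETAIL, and PRODUCT with the columns used below already exist – ORDER itself is a reserved word in SQL, so the table is called ORDERS here):

    import sqlite3

    con = sqlite3.connect("demo.db")

    def enter_order(con, customer_id, product_id, qty):
        try:
            with con:  # one transaction: commits on success, rolls back on any exception
                cur = con.execute(
                    "INSERT INTO ORDERS (CustomerID) VALUES (?)", (customer_id,))
                order_id = cur.lastrowid
                con.execute(
                    "INSERT INTO ORDERDETAIL (OrderID, ProductID, Quantity) VALUES (?, ?, ?)",
                    (order_id, product_id, qty))
                con.execute(
                    "UPDATE PRODUCT SET InventoryAvailable = InventoryAvailable - ?, "
                    "InventoryCommitted = InventoryCommitted + ? WHERE ProductID = ?",
                    (qty, qty, product_id))
                # Business rule from the example: available inventory must not go negative.
                (avail,) = con.execute(
                    "SELECT InventoryAvailable FROM PRODUCT WHERE ProductID = ?",
                    (product_id,)).fetchone()
                if avail < 0:
                    raise ValueError("insufficient inventory")  # undoes all three changes
        except Exception as err:
            print("Order rejected; nothing was changed:", err)

In Access/VBA, the same kind of grouping can be done with DAO’s BeginTrans/CommitTrans/Rollback methods on a Workspace object.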
Concurrency in MS Access:
Access has limited ability to handle concurrency; it is really designed as a relatively small-scale desktop database tool.
A brief search on the web indicates that the maximum number of users an Access application can handle will typically be about 10, and could be lower.
The database tables need to be stored on a system supporting Microsoft (SMB) file sharing.
MS Access allows you to store some objects in one .mdb file, and other objects in another. Typically, you would store all changeable tables in one .mdb file on a file server, and all other objects (such as forms, tables that don’t change, and macros) in duplicated .mdb files on each user’s hard drive. This technique will reduce network traffic.
MS Access supports the concept of grouping record modifications into a transaction
If you anticipate needing more than 10 users or need rapid transaction processing, you should consider using a more “industrial strength” database engine. There are a lot of these (Microsoft SQL Server, Oracle, Sybase, FoxPro, DB2, …), but they are generally more expensive than MS Access.
The general hierarchy that applications go through is:
o Ad hoc record keeping in spreadsheets or word processor files (plus macros and/or manual procedures)
o A system based on a desktop relational database like Access
o A system based on a concurrent relational database like Oracle
Because of their limited scope, functional-level systems are relatively easy to design, build, understand, and maintain. That means they are relatively easy to create “in-house”, cheaper to buy from outside, and cheaper/easier to customize.
Integration is a key issue with functional-level systems. At some point, they must inevitably retrieve data from or provide data to other functional-level systems. Integration is hard to do smoothly.
For each particular kind of data, one system must be declared the “master”
Data can be periodically copied or “mirrored” from one system to another. For example, sales data may be periodically batch-loaded from sales to the accounting and production departments. This practice means that some information may be out-of-date and there may be no consistent “big picture” snapshot of the entire operation
Systems can query one another for information on an as-needed basis. This can result in tighter integration, but also slow performance and excessive network traffic
In an extension of the above technique, one might create a “federated” portal that makes the entire organization’s data look like one giant database, but would actually decompose queries into subqueries, one for each functional system, and then combine the results. Such solutions can be effective but are subject to slow performance.
The alternative is to replace multiple functional systems with one larger integrated system. This approach, called Enterprise Systems, is becoming the most common and effective approach, but it has its drawbacks.
Curiously, ERP evolved from systems designed for the production/manufacturing function. Originally, such systems were called MRP (Material Requirements Planning). These systems kept track of how many of each kind of part or subassembly go into each other part/subassembly. Thus, an order for six airplanes could be fairly easily translated into an order for 6072 screws, 88 miles of wire, etc. (a small sketch of this “bill of materials” expansion follows below).
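Here is a minimal sketch of that expansion in Python; the parts list and quantities are invented purely for illustration:

    # Bill-of-materials "explosion": expand an order into total quantities of basic parts.
    # These BOM data are invented for illustration only.
    BOM = {
        "airplane": {"wing": 2, "fuselage": 1},
        "wing":     {"screw": 400, "wire_ft": 10000},
        "fuselage": {"screw": 212, "wire_ft": 15000},
    }

    def explode(item, qty, totals=None):
        totals = totals if totals is not None else {}
        for part, per_unit in BOM.get(item, {}).items():
            needed = qty * per_unit
            if part in BOM:                    # a subassembly: keep expanding
                explode(part, needed, totals)
            else:                              # a basic purchased part: accumulate it
                totals[part] = totals.get(part, 0) + needed
        return totals

    print(explode("airplane", 6))   # -> {'screw': 6072, 'wire_ft': 210000}

MRP systems do essentially this, but against live inventory data and production schedules rather than a hard-coded dictionary.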
MRP was mainly concerned with inventory and fairly crude management of production schedules
“MRP II” added the ability to estimate financing, labor, and energy requirements
Finally, “ERP” (Enterprise Resource Planning) evolved to include a transaction base for other functional departments
Some related acronyms:
o CRM: Customer Relationship Management
o PRM: Partner Relationship Management
o SCM: Supply Chain Management. SCM concerns the related flows of material, information, and money, both within a firm’s production process and to/from suppliers and customers. It is particularly fashionable to concentrate on material inventory issues. A claim made by SCM adherents is that automated sharing of information can reduce inventories, improve forecasting, and reduce inventory variability along the entire supply chain; that is, improved information flow, facilitated by information systems, can reduce the need for expensive physical inventory “buffers”.
Leading ERP vendors:
SAP (R/3)
Oracle
Computer Associates
PeopleSoft (in the process of being taken over by Oracle)
ERP systems typically consist of “modules” roughly comparable to functional-level systems, but integrated in a single giant database design scheme (picture an entity-relationship diagram with 9000 tables!).
ERP drawbacks
Very high cost
Having to buy modules you will never use
Loss of flexibility – you may not be able to do it exactly “your way”
Modules may not have all the functions of more specialized software
Configuring/customizing ERP is a very specialized skill
ERP tends to be marketed as a way to wipe away all the problems of legacy systems. But the difficulty of “crossing over” to ERP is frequently underestimated. Sometimes the budget is grossly exceeded and the ERP project may be killed before tangible results are obtained (see TRP pp. 260-261).
To get around some of these limitations, ERP systems can be interfaced with custom or “best of breed” systems with specialized functions. In this case, the ERP system is typically the keeper of the transaction-based “master” data. Such interfacing has many of the same drawbacks as trying to integrate individual functional-level systems. ERP vendors keep adding capabilities and modules to reduce the need for interfacing and capture even more of the business software market.
Personal observation: many business decision makers don’t know (any longer, at least) as much about designing information systems as you do now. Even if they do, the wider the scope of a system, the more complex it tends to be, and the harder it becomes for non-specialists to understand. Therefore there is a natural tendency for very wide-scope information systems like ERP to be outsourced to firms that specialize in them. There are very strong economic arguments for this arrangement, but it implies a certain loss of control and understanding by the users of the system. Thus, instead of deeply understanding how one’s information systems are working, one can be reduced to throwing around acronyms and trying to interface between gigantic, monolithic, and perhaps poorly understood applications like R/3. Perhaps we need a more modular approach to building systems out of component parts, somewhat akin to the old functional-level systems, but designed for easier integration with one another.
Decision Support Systems (DSS): systems to help with planning and decision making. DSS goes beyond just presenting reports and leaving the decision process itself entirely up to people.
Such systems may be needed because:
The decision may be too complex for unaided human decision makers
It may be manageable for human decision makers, but not in the time available
Sophisticated (mathematical) analysis may be required, which is much easier and more reliable if automated.
Decision support systems require modeling of the system to be decided about. There are two kinds of modeling:
Explicit, often using “management science” and/or statistics tools. This type of modeling typically requires mathematical skills and makes the model assumptions explicit. The rudiments of such skills are covered in the “Operations Management” and “Statistical Methods for Business” classes. Such modeling can also be more logic-oriented than mathematical, based on systems of rules and deductions (see the topic of expert and “intelligent” systems below).
Implicit/automated – we may use some kind of “canned” methodology to construct the model. In this case there is generally some sort of “learning” or “fitting” procedure that adapts the model to the problem at hand. The model assumptions may not be immediately discernible, and model behavior in unusual situations may be hard to predict. This sort of “magic box” approach is therefore risky. Example mentioned in the book: artificial neural networks.
DSS systems typically need to “harvest” data from TPS or other systems. This data is then processed and plugged into the model.
Some standard categories of decision support:
Logistical/operational planning systems – these tend to use mathematical “management science” models and analytical techniques (although more complicated ones) akin to what’s in the “Operations Management” course, augmented by an interface to the organization’s databases and a suitable user interface.
Data mining – systems to detect patterns in large volumes of data. These may use a combination of analytical techniques, including statistical methods
Expert systems – systems that try to mimic the abilities of human experts who may be in short supply.
o A common approach here is to build a “rule base” or “knowledge base” of rules or facts that you believe to be true about your business. These rule/knowledge bases are far less structured than relational databases. Examples:
A defective power supply 20-volt output causes an overheating finisher unit
Overheating finisher units cause paper jams
o The rule/knowledge base can be processed by an inference engine that tries to combine the elements of the rule base in an intelligent way. A very simple example: based on the above information, a defective power supply 20-volt output causes paper jams (a tiny code sketch of this kind of chaining appears at the end of this section).
o Such systems have been applied to
Medical diagnosis
Equipment troubleshooting
Online decisions whether to extend credit to customers
Allocating gates to flights at airline terminals
o An advantage of these systems is that they are very flexible in terms of the kinds of data and constraints they can handle. They often are the best solution to decision problems that have very complex constraints and few possible solutions.
o Dangers:
A contradiction within the knowledge base can cause the system to go haywire
Due to the complexity of the interactions within a large rule base, the system’s behavior in unusual situations may be hard to predict
If there are too many possible solutions, these methods may not be very efficient at finding the “best” one. This is in contrast to “management science” methods, which are less flexible in handling complicated constraints, but better at determining the best of many possible alternatives.
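The power-supply example above amounts to a two-rule knowledge base. Here is a minimal forward-chaining sketch in Python (my own toy illustration, not any particular expert-system product), which keeps applying rules until no new facts can be derived:

    # Toy forward-chaining inference engine: each rule is a (cause, effect) pair.
    RULES = [
        ("defective 20-volt power supply output", "overheating finisher unit"),
        ("overheating finisher unit", "paper jams"),
    ]

    def infer(observed):
        facts = set(observed)
        changed = True
        while changed:                     # repeat until no rule adds anything new
            changed = False
            for cause, effect in RULES:
                if cause in facts and effect not in facts:
                    facts.add(effect)
                    changed = True
        return facts

    print(infer({"defective 20-volt power supply output"}))
    # derives both "overheating finisher unit" and "paper jams"

Real inference engines also handle rules with several conditions, uncertainty, and contradiction checking, which is where the “haywire” behavior mentioned above can come from.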
Classes 23-25
Remainder of class 23: labor mediation review example
Class 24: second midterm
Class 25: results of midterm, self-relationships (employees and course catalog examples)
Class 26 – Acquisition and Development
We now have a rough idea of how standard business information systems are typically constructed, with a focus on the relational database elements. Now, we’ll briefly discuss the management and “business process” issues of creating or replacing information systems. These topics are covered (somewhat differently) in TRP chapter 11 and TG6. The discussion here is written from the standpoint of software, but the principles also apply when system development involves acquiring hardware or developing custom hardware.
We will organize the discussion around four main issues:
The structure of the process by which the information system gets created; here, most experienced people favor various forms of the SDLC (System Development Life Cycle) framework
The technical aspects of how the system is implemented
Who does the implementation
How the services are delivered
Common problems with developing information systems, especially software:
Schedules slip
Budgets are exceeded
The final product may not meet stakeholders’ expectations
It is hard to make systems reliable; problems can be very disruptive
Extra features tend to creep in along the way (FAA story)
Systems Development Life Cycle (SDLC) framework:
1. A 5- to 8-stage process. There are many variations in the exact number of steps and their names; see Figure TG6.2 (p. 490) for one example.
2. Each step has a “deliverable” on which all interested parties “sign off”. In the early stages this is a document. Later, it may be a system.
3. If problems are found at any stage, you go back to the previous stage, or perhaps back more than one stage. But the idea is to plan ahead at each stage to reduce the probability of having to go back later, and the severity of issues that might have to be revisited.
A 6-stage version:
1. Feasibility and planning (called investigation in the book)
2. System analysis
3. System design
4. Implementation
5. Cutover (sometimes called “implementation”, just to confuse things)
6. Maintenance
Sometimes there is an additional “evaluation” stage at the end, but perhaps that is best considered something that should be done throughout the other stages: How effective is our solution? Did we meet budget and deadline goals? (Once the system is operating) Is it reliable? What improvements can we make?
Step 1: Feasibility and Planning – identify the problem and the general form of the solution
Identify the problem to be solved
Determine goals
Evaluate alternatives
Examine feasibility
o Technical: do the necessary technology and skills exist? Do we have access to them?
o Economic: will it be cost effective to develop/acquire the system? Will the system be cost effective in practice?
o Organizational: will the system be compatible with the organization’s legal and political constraints (both internal and external)?
o Behavioral: will the people in the system accept the system? Will they be likely to sabotage, override, or ignore it? What kind of training and orientation will be necessary? Are we attempting a technical fix to an organizational problem that would be best addressed another way?
Step 2: Systems Analysis – specify exactly what the system will do
Define inputs, outputs, and general methodology
Create the basic conceptual structure
Specify in detail how the system will look to users and how it should behave
Can construct dummy screens/forms and reports, or prototype systems
Leads to a requirements or specification document. This document should be “signed off” by the parties involved.
Step 3: Systems Design – specify how you will meet the specification
Describe the system as a collection of modules or subsystems
Each module may be given to a different programmer or team
The design specifies how modules will communicate (inputs, outputs, etc.)
Can use specialized/automated design tools
Can build prototypes
Leads to a design document – a description of how you will create the system. Managers and programmers sign off on this document.
o Many “computer people” like writing code but not documents, so they may resist this phase
o But it is much cheaper and easier to catch big mistakes in a design document than after you’ve started writing a huge program or bought an off-the-shelf product that can’t do what you want.
Step 4: Implementation – build the system!
Test things thoroughly as you create them
Make unit tests to exhaustively test each module before connecting modules together
Some firms have separate QA developers to test things again
Step 5: Changeover or cutover – start using the new system
Crucial: final testing before cutover
Cutover can be really painful, especially if the old system was already automated
Options:
o “Cold turkey” – do it all at once; very risky
o Parallel – use both systems at once
o Phased – gradual: by part of the system, or by part of the organization (regions, departments); can be difficult to implement
Not unusual for an organization to “roll back” to the old system (and maybe try again)
Cutover is much easier if users were already “on board” in specifying the new system
Preparation/training might be crucial in some cases
Step 6: Maintenance – fixing problems, adding features
Except in emergencies, it is best to collect changes into a release which can be carefully tested
Install new releases periodically; not too often
Develop a “QA suite” or regression test to check that bug fixes don’t create more problems or revive old bugs (“rebugging”)
o Expand the QA tests as features are added
It is critical to involve users in decision making in most stages (exceptions: implementation and design). Try to avoid having additional features and capabilities creep in at each stage (“scope creep”): decide what you are going to do, how you’ll do it, and then “just do it”.
Overall benefits of SDLC as opposed to less structured approaches:
Easier to estimate time and effort for the project
Easier to monitor progress
More control over scope creep
Can stay closer to budget and deadline
Easier to integrate the work of different contributors
More communication between users and developers, less disappointment in final results
Main drawbacks of SDLC:
Can be cumbersome and slow
Inflates the cost of making small changes or adjustments
Alternative or complementary approaches:
Prototyping: rapidly build test/demonstration systems. Users can interact with these systems and suggest changes.
o Iterate (repeat) until the system looks acceptable
o Users may produce much more specific and helpful suggestions from interacting with a prototype than from reading a specification document or attending a meeting
o But – may tempt you to skip necessary analysis
o Tends to lack documentation and a paper trail
o Can result in too many iterations
o Can be combined with SDLC to try to get the benefits of both approaches (use in the planning and analysis stages)
JAD – Joint Application Development: try to pack a lot of the process into one gigantic meeting (or a few of them).
o I don’t have experience with this approach
o Gigantic meetings can be horrible
RAD – Rapid Application Development: use specialized software tools to speed up the process.
o Example: software tools for the design phase that then automatically do the implementation
o Usually best applicable to narrow, well-defined applications or portions of the application. For example, there are very efficient tools for building user interfaces, so long as those user interfaces stay within certain rules
o CASE (Computer Aided Software Engineering) tools are an example
o Can lose design flexibility
o You can think of MS Access as a simple form of RAD. For database applications that conform to certain restrictions, you can just hook up some tables, queries, forms, and reports, and get something reasonable without any activity that seems like “classical” programming.
Bottom line: SDLC has proven itself in practice.
How to Build Information Systems: technically, how do you design and implement?
A spectrum of options, running across:
Custom programming
Assemble from components
Buy an external solution and configure or modify it to meet your needs
Buy “off the shelf” or “turnkey” external solutions
Combinations of the above are possible, for example: buy one part of the system, program another part yourself, and then combine them.
Custom programming:
Advantages:
o Lots of control
o Easy to include special features and capabilities
Disadvantages:
o Programming is difficult, time-consuming, hard to control, and expensive
o Need to keep programmers around for the maintenance phase
Assemble applications from components:
Examples of common parts to buy:
o Specialized math/analysis components
o Security components: encryption/decryption
o Credit card interfaces; shopping carts
Advantages:
o Reduce development time
o Leverage the expertise of others
o Hide the complexity of things like credit card approval – a classic “layering” technique
Disadvantages:
o If you only need a few features, a component’s expense and interface complexity may not be worthwhile
o Some loss of control
Note that virtually nothing is “totally” custom any more. Even if you develop the “whole” system yourself and do a lot of programming, you are still probably using
o Standard operating systems
o Standard programming languages and supporting software libraries
o Standard database engines (like Access, Oracle, etc.)
Purchasing an external solution: buy something off-the-shelf and configure it for your needs
o Extreme example: ERP software like SAP
o The amount of customization depends on the generality of the software
o There are lots of “turnkey” systems for “standard” businesses like auto repair, doctors’ offices, and landscaping. You just enter some basic data and use it.
o More complex, general software needs more configuration; it can potentially get very complicated and specialized
o Allows you to take advantage of outside expertise
o May lose flexibility – you may have to modify your business processes to conform to the system
o General turnkey solutions like SAP can be very expensive
o For simple, standard applications, turnkey systems will be hard to beat.
For very large or specialized systems, the economics are not necessarily so favorable.
Recent development: open source software like Linux. The source code is freely available. Users can understand what the code is doing, make modifications, and submit upgrades back to the central repository.
The general trend in the industry has been for systems to become gradually more interoperable. Examples:
Adoption of standard networking protocols (TCP/IP)
The Apple Macintosh idea (partially based on earlier developments at Xerox) of a standard user interface and cutting/pasting between different desktop applications (from different vendors!). This idea spread to Windows and other operating systems.
Perhaps this trend will continue and it will become easier to piece together systems from components.
Who does the development work?
Whether you are custom programming, assembling components, or just configuring an off-the-shelf product, some development work needs to be done. Who will do it? Choices:
Your own employees
Contract or temporary employees working for salary
An external vendor, as part of a contract to develop/deliver the system
Own employees:
Advantages:
Easiest to integrate with the users and managers they’ll need to interact with
Understand your problem best
Commitment to your firm
Cheapest over the long term if their skills are fully utilized
Disadvantages:
May be difficult to find and keep
May not have the specialized skills needed for every project
If they do have specialized skills, you may not be able to use them very efficiently
Contract/temporary employees:
Advantages:
o Can take advantage of specialized skills that are inefficient to maintain in-house
o Pay them only when you need their specialized skills; then they move on to another client
Disadvantages:
o Possible communication problems
o Might be less committed to solving problems
o May not understand your business as well as permanent employees
o Could be more expensive if you need them more than you originally estimated
Outside vendors/outsourcing:
Advantages and disadvantages are similar to contract/temporary employees, only more so. Additionally:
If the outside vendor promises to develop the system under a fixed-fee contract, you can shift development risk to them
But disagreements could arise over whether the vendor has fulfilled the contract requirements
If you use outside vendors to build the system, or buy an off-the-shelf solution, you may have to invest far less effort in the SDLC process. But you are not totally excused from it! For example, you still need to perform planning and analysis to determine what your needs are, and whether an outside vendor’s product will meet those needs.
How the system’s services are delivered:
Traditional approach: you own the hardware, you have non-expiring licenses for all external software components (at least for the release you’re using), and your own employees operate the system
ASP (Application Service Provider) approach: you get the service the system provides, but you don’t possess the hardware or software. Instead, an external vendor maintains the system, and perhaps supplies some of the labor to operate it. You may simply access it via your employees’ web browsers.
o This is a very well-established approach in a few areas, like payroll processing
o When the internet and web got “hot” in the 90’s, there was a lot of hype about expanding these kinds of services
o Supposedly this would let firms cut costs dramatically by focusing on core competencies and cutting back massively on their own IT staffing needs
o Drawbacks:
You need a lot of confidence in the ASP. They have your data and they are operating a key aspect of your business
Many firms are reluctant to give up control/ownership of their data
Firms are reluctant to be so dependent on an external vendor for their basic operations. Suppose the vendor starts having problems, or raises fees once you’re “locked in” and have lost key IT staff?
o For these reasons, I don’t expect to see massive growth in ASP’s. ASP activity will be in focused areas which present low risk and where ASP’s are well established and can be cost effective. Examples: payroll, processing rebates.
Class 27 – Ethics and Security (TRP 12)
Ethics…
…is the branch of philosophy concerned with right and wrong (which I won’t try to define here!)
Ethics and legality should not be confused with one another, although they are (we hope!) related
The existence of modern information technology raises some ethical issues (p. 364)
Accessibility
o Who has access to information? Is technological change benefiting or hurting particular groups by changing their (relative) degree of access?
Accuracy
o Who is responsible for making sure information is accurate?
o Is new technology making it easier to distribute false or distorted information?
Privacy
Property
Privacy and property are probably the most critical.
Privacy:
Evolving technology is making it easier to amass ever-growing amounts of data about individuals
This information is often sold and exchanged by organizations without our knowledge, sometimes intentionally and sometimes unintentionally (JetBlue example, pp. 367-377)
Is a reasonable degree of personal privacy being eroded?
Corporations are now required to have privacy policies that are distributed to their customers
o These are dense, long, legalistic documents. Does anybody read them?
o How about regulatory standards?
Monitoring: evolving technology means that we may be monitored more than we would have expected in the past
o Security cameras (for example, in the UK)
o If you carry a mobile phone, your movements can be tracked quite accurately as long as it is on; you do not have to be making a call
o Keystroke monitors – can capture every action an employee makes on their computer/terminal
o It is legal (though perhaps not expected by employees) for companies to monitor/read all their employees’ e-mail. For some people, it may be important to maintain separate work and personal e-mail accounts. Is there a good reason e-mail communication should be less private than telephone and postal communication?
Property: issues regarding ownership and distribution of information
Trade secrets: information that firms intend to keep confidential or share only with business partners who have agreed not to disclose it
o Modern information technology makes trade secrets easier to steal; however, this is primarily a security issue (see below)
Copyright and distribution issues: concern information-based products that firms sell to their customers
o Examples:
Text and graphics material (books etc.)
Films and videos
Music
Database resources
Software
o In the past:
Such information was more strongly tied to physical media
Physical media were relatively expensive, slow, and/or difficult to copy
The quality of copies might be poor
It might be hard to make large numbers of copies
Copying equipment required major capital investment
Copyright laws, dating back to 1662, were instituted to protect books from massive copying, and were augmented at various points in the 18th, 19th, and 20th centuries
Since printing presses are fairly easy to trace, such laws were mostly adequate for about 300 years.
o Modern information technology has altered the situation:
Information is more easily separated from the physical medium
It can be stored on hard disks etc. and transmitted over high-bandwidth networks
Modern input and output devices make high-quality, hard-to-trace physical reproduction feasible at low cost
CD and DVD burners
Laser printers
Scanners
o Result: massive “piracy” of copyrighted material in some areas
Music
Film/video
Software
o Copy protection technology is only partially effective. Information that reaches the user in unencrypted form can always be copied.
o Piracy uses both physical media and networks (sharing sites like Kazaa, Napster, etc.)
o Music and text/graphics can now be distributed very effectively without the blessing of a mainstream “publisher”. Video will reach this point soon. This raises the issue of whether publishers remain necessary. Publishers:
Act as gatekeepers or certifiers of content quality. But how reliable?
Give access to a marketing/promotion machine
Still control physical distribution channels, but there are now effective competing channels
o But how to ensure that musicians, authors, filmmakers etc. are paid for their work? Creating “content” still requires talent and a lot of labor.
o High content prices may not be sustainable (examples: textbooks, some music)
TRP 12 also has an interesting section on “impacts”, including some good information about ergonomics. I will not dwell on these topics in class, however.
Security: modern information technology has made it much faster, easier, and cheaper to
Store
Move
Organize
Manipulate/process
… information than with older “manual” technology. Unfortunately, the same technology can also make it faster, easier, and cheaper to
Steal
Destroy
Corrupt
Abuse
… that same information!
There is no such thing as total security
Don’t think of security issues as “one-time” problems; security is an ongoing process and a portion of the workforce needs to be dedicated to it
Need to consider these costs when looking at the cost-effectiveness of computer technology
With awareness and effective countermeasures, security can usually be manageable
Unintentional threats (accidents):
Accidents always were a threat to organizations’ data. Fires and hurricanes can destroy paper files just as easily as computer files
Centralized systems can be vulnerable to problems at the central site
Distributed systems can be vulnerable to problems at just one site (depending on their design)
Power failures can do a lot more damage than they used to
With the introduction of computers, there are a lot of new ways for things to go wrong
o Hard disk “crashes”
o Software “crashes”
o Software “bugs”
o Etc.
Countermeasures:
o Backup, backup, backup, backup, backup
Can back up data to external media (CD-R, DVD-R, tapes) – protect the media! Unfortunately, hard disks have been growing much faster than backup options!
Back up data to another site over a network
Power backup devices (generators, UPS, etc.)
Backup of software
Have a backup plan for hardware (rent replacement hardware, for example)
o For software developed in-house: proper development, maintenance, and life-cycle procedures to contain the damage from bugs (see above)
The remaining threats are intentional – caused deliberately by people
Internal to your organization
External to your organization
o People/organizations you would ordinarily have contact with: partners, vendors, customers
o Thieves and vandals
Internal threats and problems – employees and consultants
The larger the organization, the greater the frequency of
o Employee mistakes or failure to follow procedures
o Dishonest employees (rarer, but still a concern)
Shortcuts or dishonesty by MIS employees may have a lot of “leverage” and may be hard to detect (trap doors, skimming, “time bombs”, …)
Countermeasures:
o Separate functions: for example, most programmers shouldn’t have access to real customer data
o Use data access hierarchies and rules
o Encryption?
o Monitoring (ugh – this can take many forms, and has ethical drawbacks)
o Support your employees – make it easy (or automatic) for them to do backups, install security software, etc.
External threats – business partner, vendor, and customer issues
If you interact electronically with vendors, customers, and partners, you may be exposed to their security problems as well as yours
Exacerbated by recent “outsourcing” and cooperation trends like
o EDI (Electronic Data Interchange): firms automatically share data they believe are relevant. For example, we may let our suppliers see our parts inventories so they can plan better
o ASP (Application Service Providers; see above)
Web commerce technology can make improper/questionable monitoring of customers practical/profitable (cookies)
In an e-business environment, it may be harder to tell legitimate vendors, customers, and partners from crooks masquerading as such
Countermeasures?
o Limit access
o Investigate partners
o Try to use reputable vendors/partners
o Encryption
o Monitoring (but how much is acceptable?)
o Consumer awareness
Other external threats
Two motivations
o Personal gain – thieves
o Malice/troublemaking – hackers etc. (I find this harder to understand)
These threats always existed, but computer technology – and especially network technology – makes attacks much cheaper, faster, and easier
Snooping and sniffing: monitoring networks as others’ data passes by (especially passwords)
o Wireless networks are especially vulnerable
Hacking: gaining access to private systems and data (and possibly abusing/damaging them)
o Port scans
o Bug exploitation (usually in operating systems, browsers, and e-mail programs)
o “Social engineering” and “phishing” – faking messages from tempting or official sources to induce people to run booby-trapped software, reveal passwords, or disclose other confidential information
New development: “spear phishing” – selecting a specific person so that you can send a more convincing phishing attack
Spam
o Time-wasting
o Nowadays, usually dishonest/fraudulent
o Unintended consequences: would the inventors of e-mail have guessed that 50% of all e-mail would consist of unsolicited offers for fraudulent loans, get-rich-quick schemes, and impossible anatomical “enhancement”? Unfortunately, the cost of sending spam is too low.
o Legislation (“CAN-SPAM”) has not been effective in reducing spam
Annoyance/vandal attacks – denial of service (DoS)
o For example, bombard a server computer with messages so it has no time to do its real job
Self-replicating attacks: viruses and worms
o May move via e-mail and have a social engineering aspect (like much spam)
o But may exploit a security hole (like a forgotten trap door) and not require any human participation
o Can reproduce very quickly
o The more powerful software is, the more vulnerable it is (for example, MS Office macros)
Many attacks combine categories
Hierarchy among hackers and spammers
o “Script kiddies”
o Spam pyramid schemes?
Security Technologies/Techniques
User identification
o Passwords
Make sure they are not vulnerable to guessing
Have a change schedule
Problems:
You get too many of them
Have to write them down or use one password for several systems
Vulnerable to snooping/interception with some older protocols like TELNET and FTP
o Password generators: a small electronic card that combines
A fixed user password
An internal passcode
The time
… to produce a password with a very limited lifetime. Example: “SecurID”
o Biometrics: promising, but: Expense? Reliability? Ready yet?
Access control within a computer system (server)
o Read, write, execute, (delete) privileges for files or parts of files
o Basic levels: user/group/all (a tiny code sketch of this idea appears at the end of these notes)
o More advanced: hierarchies and “access control lists” (ACL’s)
Restrict physical access
o Example – US Government systems with classified data are supposed to have no physical connection to any unclassified system
o If a computer seems compromised by hackers or viruses, physically detach it from the network immediately
Audits/verification
o Example – user-verified paper voting records
o IT audits
Scanning software and hardware
o Virus scanners: scan received disk files and arriving e-mail for suspicious patterns
o Spam filters
o Watch the network for suspicious packet patterns
o Other forms of monitoring (again, how much is acceptable?)
o Firewalls (hardware and software): block traffic into your network or computer
Example – for a home, do not allow any connections initiated from outside
Example – for a medium-sized business, block all incoming connections except SMTP and HTTP into your mail/web server
Virtual private networks – use encryption to simulate a private network even if parts are carried over the internet (or some less secure private net)
Encryption!
o Encode network traffic so bystanders cannot snoop
o Can also be used for files
o Unintended consequences: also very useful for criminals
o Will not discuss technical details here – discussions in most MIS textbooks are oversimplified.
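To illustrate the user/group/all permission idea mentioned above, here is a minimal sketch in the style of classic file systems; the file name, users, and group are invented for the example:

    # Minimal user/group/all permission check, in the style of classic file systems.
    # The file entry, users, and group below are invented for illustration.
    FILES = {
        "payroll.mdb": {
            "owner": "alice", "group": "accounting",
            "owner_perms": {"read", "write"},
            "group_perms": {"read"},
            "other_perms": set(),
        }
    }
    GROUP_MEMBERS = {"accounting": {"alice", "bob"}}

    def allowed(user, filename, action):
        entry = FILES[filename]
        if user == entry["owner"]:
            return action in entry["owner_perms"]
        if user in GROUP_MEMBERS.get(entry["group"], set()):
            return action in entry["group_perms"]
        return action in entry["other_perms"]

    print(allowed("bob", "payroll.mdb", "read"))   # True: group members may read
    print(allowed("bob", "payroll.mdb", "write"))  # False: only the owner may write
    print(allowed("eve", "payroll.mdb", "read"))   # False: everyone else gets no access

Full access control lists generalize this by attaching an arbitrary list of (user or group, permissions) pairs to each object instead of just the three fixed levels.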