CS 330 Management Information Systems (MIS)
Spring 2019
Kevin Lanctot
Based on slides and lecture notes prepared by Michael Liu and Anne Banks Pidduck and the course text Management Information Systems: Managing the Digital Firm, 7th Canadian Edition by Laudon, Laudon and Brabston (2013)

Table of Contents
Topic
0  CS 330 Course Overview
1  IT Infrastructure
2  Databases
3  Networking
4  Management Information Systems
5  Business Processes and Types of Info Systems
6  Organizations and IS
7  Social, Ethical, and Legal Issues
8  Security
9  Managing Knowledge

Welcome to CS 330!
Topic 0 – Course Overview
Four Questions
• Who am I? Who are we? → Staff
• Who are you? → Intended Audience
• What are we doing? → Course Overview
• How will we do it? → Course Delivery
Reference
• CS 330-S19-Outline.pdf available on Learn.
CS 330 Spring 2019

Who are we? → Staff
Instructor
Kevin Lanctot, kevin.lanctot@uwaterloo.ca
• Office: DC 2131 (near the skywalk to the M3/MC buildings)
• Office hours: Tues and Thurs 12:30 – 1:30 pm
• Things to know about me:
  - I’m a talker not a typer, i.e. it is best to ask me questions about course content in person rather than through email or Piazza.
  - I only check my email once or twice a day.
  - I typically reply to emails within 48 hours.
  - Last name is pronounced long-k toe, i.e. “long toe” with the “k” sound after long.

Who are we? → Staff
TAs
• Rishav Agarwal
• Kashif Khan
• Mustafa Korkmaz
• Mohamed Mhedhbi
• Ke Nian
Role
• Some will have office hours before the assignment is due.
• Others will have office hours after assignments or the midterm have been handed back in case you have any questions.
• These office hours will be posted in Piazza once they have been finalized.

Who are you?
Course Objective
• Interested in learning more about computer science from the perspective of a manager who has to make informed decisions about information technology.
Intended Audience
• This course is most suitable for students interested in the application of computers to business (i.e. no programming).
• Prereq: One of CS 106, 116, 136, 138, 146; Level at least 2B; Not open to Computer Science students.
• Antireq: AFM 241, BUS 415W, 486W, CS 480/490, MSCI 441
• BBA/BMath double degree students interested in CPA should NOT take CS 330 because there is an antireq problem.

What are we doing?
Major Topics
Foundations of Management Information Systems
• What are they? What is their role? How are they used?
• Applications, types and their impact on business and society
• Ethical and security concerns
Technical Foundations of Information Systems
• Hardware, software, files, databases, telecommunications, connectivity, standards.
Building Information Systems
• Tools and techniques to analyze and design information systems. The systems development life cycle.

What are we doing?
Course Objectives
At the end of the course students should be able to:
1. Assess trade-offs in technological solutions, such as build, rent or buy decisions
2. Make informed decisions about using technology in a business environment
3. Define and explain the characteristics of a variety of business information systems
4. Describe strategic roles and management usage for information systems in business
5. Develop a Disaster Recovery Plan
6. Develop and understand website security and privacy policies
7. Understand the relationships among various information systems
8. Understand the implications of wireless technology

What are we doing?
Types of Questions We’ll Deal With
• What do we need to secure our system?
• Should I use a router, switch or hub to build an office network?
• Who owns the pictures you posted on Facebook?
• What is the big deal about 5G and IoT?
• How much does it cost to have an IT infrastructure? How long does it last?
• What do you need for your IT infrastructure?
• Why do we need a data warehouse?
• How much is the lifetime cost of a PC?

What are we doing?
Types of Questions We’ll Deal With
• Can the police search my work computer without a warrant?
• Can border officials search my phone without a warrant?
• Is it illegal to hack an iPhone?
• What is cloud computing?
• What are the pros and cons of cloud computing?
• What are LTE and RFID?
• Is the internet the same as the world wide web?
• Is a 3.1GHz processor necessarily faster than a 2.2GHz one?

How will we do it?
Course Delivery
• Lectures will include
  - slides
  - some class discussions
  - some notes made on blackboard / whiteboard
• Slides only contain key points
  - must supplement slides with
    ▪ course text and
    ▪ notes taken in class
  - key points on a slide will be written in blue italics
  - key technical terms that you should learn are written in red, such as DRAM

How will we do it?
Attendance
• In Spring 2013, attendance was taken at random 9 times during the term for about 220 students

  Final Grade   Attended 7 or more classes   Attended fewer than 7 classes
  90s           7.1%                         0%
  80s           38.5%                        12.4%
  Failed        3.1%                         10.3%

Conclusion: Regular attendance helps.

How will we do it?
Grade Calculation
• 20% 4 assignments, each worth 5%
• 25% Midterm Exam
• 55% Final Exam
• In order to pass CS 330, students are required to
  1. pass the entire course (get at least a 50% overall grade) and
  2. pass the weighted average of the midterm + final
• Otherwise, the maximum course grade is 46%
• Grades (will be) viewable on Learn.
• Midterm and final are closed-book.

How will we do it?
Course Work
• Assignment 1: Thursday June 6th at 5:00 pm
• Assignment 2: Thursday June 20th at 5:00 pm
• Midterm: Thursday June 27th 7:00-8:20 pm
• Assignment 3: Thursday July 11th at 5:00 pm
• Assignment 4: Thursday July 25th at 5:00 pm
• Final Exam: to be scheduled by the Registrar’s Office

How will we do it?
Assignments
• 4 assignments
• Can be submitted up to 24 hours late with a 10% penalty (even if just a part of it is late).
• They can be submitted in class or in the assignment boxes on the 4th floor of the MC near the tutorial centre.
• We will set up a way of submitting electronically on Learn (for proof of submission) and possibly as a preferred method of submission.
• Assignments will be available for pick-up after class or during office hours.
• Assignments must be well organized and easy to read.

How will we do it?
Lost Assignments
• Softcopy submission to Learn is the only proof we will accept that the hardcopy of the assignment has been lost.
Retention of Assignments
• Unclaimed assignments will be retained for one month after the term grades become official in Quest.
• After that time, they will be destroyed in compliance with UW’s confidential shredding procedures.

How will we do it?
Regrading Request
• Requests for regrading will be accepted up to 14 days after students have the opportunity to pick up their assignments or midterm.
• Details of how to request a regrade will be posted in Piazza after the first assignment is due.
Regrading Policy
• Grades are posted in Learn.
• It is your responsibility to verify that the posted grade corresponds to the grade actually received and to notify the instructor of any error.

How will we do it?
In Case of Illness
• Accommodations for a missed assignment, midterm or final exam require a valid Verification of Illness form (VIF).
  - Submit the VIF to the MUO
  - MUO verifies it then notifies all your instructors
  - https://uwaterloo.ca/math/vif
• If you miss the final exam, you will be given an INC if
  - there is a strong reason for missing the exam (generally a serious medical issue verified by a doctor's note)
  - AND a satisfactory performance during the term.

How will we do it?
Course Textbook
• Management Information Systems: Managing the Digital Firm (2014), 7th Canadian Edition, Toronto, Pearson Prentice Hall.
• A copy will be placed on 3-hour reserve in the Davis Library.
Learn
• lecture slides
• assignments
• marks
• drop box for your assignments
• we will have an official pinned post for each assignment, for the midterm and for the final exam.

How will we do it?
Piazza
• For general questions about assignments
• One [official] thread per assignment
• Use a private post if hinting about your approach, how you might solve it, or implementation details
• When asking about an assignment put the following in your title: AxxxQyyy.
  - E.g. if you are asking about Question 2 on Assignment 1, include A1Q2 in the heading

How will we do it?
Piazza
• You will receive an invitation to join the class discussion for CS 330 via your uwaterloo.ca email this week.
• For questions about lecture material it is best to see me or a TA, but you are welcome to use Piazza for questions that require a brief answer.
• We will have extra office hours just before the midterm and final exams.

How will we do it?
How to succeed on assignments and exams
What are some good studying habits?

How will we do it?
UW Values
• honesty, trust, fairness, respect and responsibility
Avoid Cheating
• Do your own work.
• Do not try to look up answers to assignment questions on the web (unless the question states that it is ok to do so).
• Midterm and final are worth 80% of your final mark.

How will we do it?
UW Values
• honesty, trust, fairness, respect and responsibility
Avoid Plagiarism
• Use a proper reference and citation
  - Without them, you might be accused of plagiarism
• Write in your own words. Don’t copy verbatim!
  - Even with a proper reference and citation, if you copy most of the material verbatim you will not be charged with plagiarism, but you might get 0 for the assignment.

How will we do it?
Academic Integrity  www.uwaterloo.ca/academicintegrity
Grievance  http://www.adm.uwaterloo.ca/infosec/Policies/policy70.htm
Discipline  http://www.adm.uwaterloo.ca/infosec/Policies/policy71.htm
Appeals  http://www.adm.uwaterloo.ca/infosec/Policies/policy72.htm

Topic 1 – IT Infrastructure and Emerging Technologies
Key Questions
• Why should you care about information technology (IT)?
• What are the basic components of IT infrastructure?
References
• Course Text Chapter 5, IT Infrastructure and Emerging Technologies
Acknowledgements
• partially based on the lecture “Geek Speak” developed by John Finnson and John Doucette

Why Understanding IT is Important
Class Exercise
Distinguish gibberish from genuine technical vocabulary.
• Don’t be “cut out of the loop”!
• Maintaining employees’ respect
Exercise
1. I will show you a sentence
2. A student will answer whether it is genuine or gibberish
3. Discuss
4. Class vote

Why Understanding IT is Important
Class Exercise
• Hackers are using snorters to cause cyber-lightening on our internal network.
  - Genuine or gibberish?
• Hackers have used Trojan horses to introduce bots into our network.
  - Genuine or gibberish?

Why Understanding IT is Important
Class Exercise
• Marketing wants a new server to support an in-house data mart.
  - Genuine or gibberish?
• The head of IT suggests pinning business intelligence tools to our data cabinets to improve network protocols.
  - Genuine or gibberish?

Why Understanding IT is Important
Class Exercise
• The security department suggests using stronger encryption for our wireless network to protect from war driving and cyber vandalism.
  - Genuine or gibberish?
• Marketing wants to register our RFID tags with the domain name system for faster HTTP access.
  - Genuine or gibberish?

Why Understanding IT is Important
Class Exercise
• We need to replace our existing 1 TB SSD with a 1,000 GB SSD because 1 TB is too small.
  - Genuine or gibberish?
• If you install a duo core 32-bit CPU, you can have 64-bit computation power.
  - Genuine or gibberish?

Why Understanding IT is Important
Dr. Evil has discovered that Donald Trump has set the nuclear launch code to match his Twitter account password. So Dr. Evil plans to hack into Twitter to obtain Trump’s password to gain control of the US nuclear arsenal. Is this plan technically feasible?
a) Yes
b) No

Why Understanding IT is Important
A friend runs an e-commerce company. Should they buy a DSL, ADSL, or T1 line?
a) DSL
b) ADSL
c) T1 line
d) This question is gibberish

Why Understanding IT is Important
Your IT expert informs you of employees falling victim to phishing and identity theft. She advises that a social engineering expert should be brought to the company to instruct the employees on how to avoid these attacks.
a) Phoney / Gibberish
b) Good idea
c) Bad idea

Why Understanding IT is Important
Know Your Options
• You will make better decisions if you understand the options and their trade-offs.
• It is important to understand security
  - Social engineering breaches can damage your company’s reputation and brand.
  - e.g. leaving a disk in the washroom that is labelled “Executive Salary Summary 2018” but really contains malicious software (malware)
  - Understand the structure of your company’s security.

Hardware Components
Goal: Understand the basic components of a computer
1. Processor (a.k.a. central processing unit or CPU) is where symbols, characters, and numbers are manipulated
2. Primary Memory is where data and program instructions are stored temporarily during processing
   - e.g. registers, cache, RAM
3. Secondary Storage stores data and programs even when the computer is turned off
   - e.g. magnetic disks (HD) and optical disks (DVDs, Blu-ray), flash drives, solid state drives (SSD), magnetic tape
4. Input Devices: convert data and instructions from the outside world into electronic form
   - e.g. keyboard, mouse, touchpad, touchscreen, microphone, camera
5. Output Devices: convert electronic data produced by the computer into a form understood by humans or the outside world
   - e.g. printer, speaker, monitor
6. Communication Devices: provide connections between the computer and communications networks
   - e.g. network interface card (ethernet or Wi-Fi), Bluetooth

Measuring the Amount of Data
Capacity: KB, MB, GB, TB, etc.
• Small b is a bit or single binary digit
  - there are only 2 possible bit values: 0 or 1
• Big B is a byte (8 bits), enough info to specify one English letter
  - there are 2^8 = 256 possible values: 00000000 to 11111111
• In general with n bits, 2^n different values are possible
• When dealing with data storage and data transfer rates the units refer to multiples of 1024 (or 2^10) and you should use large letters like K, M, G, T.
• When dealing with the metric system or frequencies they refer to multiples of a thousand, like distance (1 km = 1000 metres) or weight (1 kg = 1000 grams).

Measuring the Amount of Data
Capacity: KB, MB, GB, TB, etc.
• Common measures of size (typically use bytes)
  - KB = 1024 bytes
  - MB = 1024^2 (or roughly a million) bytes
  - GB = 1024^3 (or roughly a billion) bytes
  - TB = 1024^4 (or roughly a trillion) bytes
• Common measures of speed (typically use bits)
  - Kb = 1024 bits
  - Mb = 1024^2 (or roughly a million) bits
  - Gb = 1024^3 (or roughly a billion) bits
• Occasionally manufacturers will use multiples of 1000 rather than 1024 but they will mention this in a footnote somewhere.

The Processor
Word Size
• When talking about processors, the word size is a measure of how many bits a processor can transfer or manipulate in parallel (i.e. at the same time).
• Recently processors for servers, laptops, tablets and cell phones come in two varieties.
  - 32-bit architecture has a word size of 32 bits.
  - 64-bit architecture has a word size of 64 bits.
• Premium smart phones, recent laptops and servers would all use 64-bit architectures.
• Older and inexpensive smart phones would still be using 32-bit architectures.

The Processor
64-bit Architecture
• When processor companies like Intel and AMD moved from 32-bit to 64-bit architecture for their processors, they made sure that computer programs that worked on their old (32-bit) processors would also work on their newer 64-bit ones.
• This feature is called backwards compatibility, i.e. the new version will still work with the old system.
• Programs optimized for 64-bit architectures will run faster.
• Programs created for 32-bit architectures will still run on the newer processors.

The Processor
Processors Come in Two Varieties
• Built for efficiency: typically ...
  - used in smart phones and tablets which are designed to run a long time with just a small battery.
  - these processors do not need a fan to keep cool
  - they try to minimize the number of transistors they use
• Built for speed: typically …
  - used in laptops, desktops and servers which either have a large battery or are plugged in
  - these processors occasionally need a fan to keep cool
  - they are complex (i.e. use a lot of transistors)

The Processor
• The Processor is the part of a computer that does the computation. It only executes simple instructions like …
  - Arithmetic and Logic: add, sub, mult, div, and, or
  - Comparisons: less than, greater than, equals, not equals
  - Accessing data: lw (load word from RAM), sw (store word in RAM)
  - Flow control: used to implement function calls, for loops, while loops etc.
• The goal is to have simple instructions that can be executed very quickly by the processor.
• Programs written in high level languages like Racket, C or C++ get converted to these simple instructions.

The Processor
The Components of a Processor
• Program Counter (PC): holds the address of the current (or next) instruction
• Instruction Register (IR): holds the instruction that is being (or is about to be) executed
• Arithmetic Logic Unit (ALU): performs arithmetic and logic operations (add, sub, mult, div, and, or)
• General Purpose Registers: a small amount of temporary (and very fast) storage within the data path
• Control Unit: reads the instruction in the instruction register and turns on and off the other components of the processor to execute the instruction.

The Processor
[Diagram: the processor, containing the Control Unit, PC, IR, Registers ($0, $1, ..., $31) and ALU, connected to Random Access Memory (RAM)]

The Processor
The steps of executing an instruction are
1. Fetch: get the next instruction from memory and load it into the instruction register.
2. Decode: get the source values from the registers
3. Execute: perform an ALU operation (if required)
4. Memory: access (i.e. read from or write to) RAM (if required)
5. Write Back: write the results back to a register (if required)
Not all steps are used for each instruction.
E.g. The instruction add $1, $2, $3 means
• get the data from registers 2 and 3 (i.e. step 2)
• add them together using the ALU (i.e. step 3)
• store the result in register 1 (i.e. skip step 4 but do step 5)

The Processor
Processor Caches
• The speed at which you can access memory depends on the size of the memory, so the processor has a small amount of memory on the processor chip (called a cache) in (typically) three sizes
  1. Level 1 Cache: 32 KB
  2. Level 2 Cache: 256 KB
  3. Level 3 Cache: 2 MB
• The cache sizes vary from processor to processor.
• The most frequently used data and instructions would be in the smallest cache and could be accessed very quickly.

The Processor
Multicore Processors
• Each core acts like a separate processor on the same chip.
• They may share some resources (e.g. L2 or L3 caches) and have shared access to the rest of the computer.
• Can also have multicore processors which do not share any caches.
• The two most common multicore processors you would see in a laptop are
  - a dual-core processor (i.e. 2 cores) which can execute 2 instructions at the same time,
  - a quad-core processor (i.e. 4 cores) which can execute 4 instructions at the same time.

The Processor
Processing Power
• Processor performance is typically reported as clock speed (frequency).
• Its processing power is actually based on:
  - the number of bits that can be processed simultaneously (word size)
  - the speed at which data can be moved between the processor, primary storage, and other devices (data bus width and speed)
  - how complex the instruction is
  - i.e. you could have an instruction that reads a value from RAM and adds it to another value, which you can think of as two instructions: 1) read value 2) add

Where to Store Data?
Varieties of Storage Devices
• A computer has many storage devices, including
  - static random access memory (SRAM) used for registers
  - dynamic random access memory (DRAM or just RAM),
  - hard disk drive (HDD or HD) or solid state drive (SSD)
  - USB flash drive, secure digital (SD) card, mini SD, micro SD
  - digital versatile disk (DVD), Blu-ray disk (BD)
• Why make it so complicated?
• Why not just have one type of storage device?

Gap between CPU and Memory Performance
[Figure: performance versus year for Processor and Memory. Source: Computer Architecture: A Quantitative Approach by Hennessy, Patterson and Arpaci-Dusseau]
• Processor performance has been increasing much faster than memory performance.
• Accessing (reading from and writing to) memory is the bottleneck.

Clock Speed
Measuring Clock Speed (Frequency)
• The speed at which a clock “ticks” (really a square wave) is typically measured in
  - MHz: 1 million clock ticks per second or
  - GHz: 1 billion clock ticks per second, or a clock tick every billionth of a second.
Measuring Time

  Unit           Symbol    Fraction of a second
  milliseconds   ms        1/1000 s
  microseconds   μs, us    1/1,000,000 s
  nanoseconds    ns        1/1,000,000,000 s
  picoseconds    ps        1/1,000,000,000,000 s

Memory Technology
Typical performance and cost figures as of 2012

  Technology      Typical Access Time          $/GB
  SRAM            0.2 - 1 ns                   $500-$1000
  DRAM            50 - 70 ns                   $10-$20
  Flash Memory    5,000 - 50,000 ns            $0.75-$1.00
  Magnetic Disk   5,000,000 - 7,000,000 ns     $0.05-$0.10

credit: Computer Organization and Design 5th ed. by Patterson and Hennessy pg. 378
• Faster memory is more expensive.
• A 1 ns access time (i.e. 10^-9 seconds) means you can access memory 1 billion (i.e. 10^9) times per second.
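
The last point above, turning an access time into a number of accesses per second, can be checked with a short calculation. The following is a minimal sketch in Python rather than anything from the course text: the function name accesses_per_second is made up for this illustration, and each access time is a single representative value read from the table above, not a measurement.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000  # 1 s = 10^9 ns


def accesses_per_second(access_time_ns):
    """Return how many back-to-back accesses fit into one second."""
    return NANOSECONDS_PER_SECOND / access_time_ns


# Representative access times in ns (assumed: one value per row of the table).
technologies = {
    "SRAM": 1,
    "DRAM": 70,
    "Flash memory": 5_000,
    "Magnetic disk": 5_000_000,
}

for name, ns in technologies.items():
    print(f"{name}: about {accesses_per_second(ns):,.0f} accesses per second")
```

With a 1 ns access time the function returns 1,000,000,000, which matches the 1 billion figure on the slide. At 5,000,000 ns a magnetic disk manages only about 200 accesses per second, which is why memory access is described as the bottleneck.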

Types of Memory
Two Types of Random Access Memory
• Static RAM (SRAM)
  - expensive, but faster
  - used for registers (typically 128-256 B for a processor)
• Dynamic RAM (DRAM)
  - less expensive, but slower
  - used for RAM (typically 4-8 GB for a laptop)
Goal
• Make it seem like you have large amounts of fast memory.
• Approach: store commonly used data and instructions in fast memory and store rarely used data and instructions in slow memory.

Types of Memory
Registers
• a small number that are directly manipulated by the processor
Caches (L1, L2, L3)
• store the most commonly used data and instructions
Primary Storage / Main Memory (a.k.a. RAM)
• when you click on a program or file, you load it from the hard disk into main memory in order to access it
Secondary Storage / Hard Disk (or even a network drive)
• where programs or files are stored when they are not being used

Memory Hierarchy

  Type          Size in Bytes          Access Time
  Registers     100s                   less than 1
  L1 Cache      10,000s                1s
  L2 Cache      100,000s               10s
  L3 Cache      1,000,000s             10s
  Main Memory   1,000,000,000s         100s
  Hard Drive    1,000,000,000,000s     100,000s
  Network       virtually unlimited    100,000,000s

Access time is measured in clock cycles, i.e. it takes less than 1 clock cycle to access data from a register.

Memory Hierarchy
fastest, most expensive, smallest capacity, closest
• registers
• cache L1, L2, L3
• main memory
• disk
• network
• off-site archive (tape, optical, etc.)
slowest, least expensive, largest capacity, farthest away
If memory was like sheets of paper and clock ticks were like inches...
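
The inches analogy can be worked out numerically. The sketch below is only an illustration in Python: it assumes one clock cycle corresponds to one inch, uses the low end of each order-of-magnitude access time from the Memory Hierarchy table, and the helper name inches_to_miles is made up for this example.

```python
INCHES_PER_FOOT = 12
INCHES_PER_MILE = 63_360  # 5,280 feet x 12 inches


def inches_to_miles(inches):
    """Convert a distance in inches to miles."""
    return inches / INCHES_PER_MILE


# Order-of-magnitude access times in clock cycles (assumed: low end of each
# range in the Memory Hierarchy table).
access_cycles = {
    "L1 cache": 1,
    "L2 cache": 10,
    "Main memory": 100,
    "Hard drive": 100_000,
    "Network": 100_000_000,
}

for level, cycles in access_cycles.items():
    inches = cycles  # the analogy: one clock cycle = one inch
    feet = inches / INCHES_PER_FOOT
    print(f"{level}: {inches:,} inches = {feet:,.1f} feet = {inches_to_miles(inches):,.2f} miles")
```

On this scale main memory is a few steps away (about 8 feet), the hard drive is roughly a mile and a half away, and the network is more than 1,500 miles away.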

Primary Memory
• Includes registers, caches, and RAM
• Often called RAM (Random Access Memory) because it can directly access any randomly chosen address in roughly the same amount of time
• Characteristics: faster, expensive and volatile (contents disappear when there is no power)
• Stores:
  - all or part of the software program being executed
  - the operating system programs that manage the operation of the computer
  - the data that the program is using

Secondary Storage
• Includes
  - Hard drive (HD, or HDD)
  - Optical drive (CD/DVD drive, Blu-ray drive)
  - Flash drive (SSD, SD, USB flash drive)
• Often called external memory or external storage because it is not directly accessible by the processor.
• Characteristics: slower, cheaper and non-volatile (permanent)
• Data and programs must be copied into primary storage before the processor is able to access them directly.

Secondary Storage: Parts of a Hard Drive
source: http://www.quora.com/Why-is-the-physical-sizeof-a-hard-disk-drive-larger-compared-to-memory-cards
This video (0:00-2:15) shows the parts of a hard drive: https://www.youtube.com/watch?v=kdmLvl1n82U

Secondary Storage: Hard Drives Basics
How it works
• Platter
  - A set of disks stacked on top of each other, each with a smooth magnetic coating on both sides of the disk.
  - RPM: rotations per minute, i.e. how fast the disk is spinning (5400 rpm and 7200 rpm are common)
  - A higher RPM means the data can be accessed faster.
• An actuator arm moves across the disk to position the read/write heads.
• The read/write head changes the orientation of the magnetic field at a particular location to represent 0 or 1.

Secondary Storage: Hard Drives Basics
How it Works
• This video (0:00-0:55) shows a hard drive in action, e.g. booting up, deleting a folder, etc: https://www.youtube.com/watch?v=9eMWG3fwiEU
Some Parameters
• Mean Time Between Failures (MTBF)
  - approximately one hundred thousand hours
• Follows a bathtub curve
  - more likely to fail initially due to manufacturer error
  - more likely to fail later due to wearing out

Secondary Storage: Hard Drive Reliability
Annualized Failure Rate (AFR)
• 0.7% – 0.8% for enterprise drives (what UW buys)
• 1.25% for consumer drives (what is in your laptop) if it is replaced every 4 years (as of 2018).
• It has been reduced from 1.95% in the past few years.
• An annualized failure rate of 1.25% means on average roughly (1.0 - 0.0125)^4 x 100% ≈ 95% of the drives would still be working after 4 years.
• For some recent data see https://www.backblaze.com/blog/hard-drive-stats-for-2018/
• The company has over 100,000 hard drives.
• It tracks failures and makes the data public.

Secondary Storage: Solid State Drives
Some Parameters
• An alternative to a hard disk drive.
• Pros
  - It is typically around 10x faster to access data.
  - It typically lasts longer.
  - It has no moving parts that can wear out.
• Cons
  - It is more expensive.
  - It can wear out sooner than a hard disk drive when writing a lot of data.
  - The data can fade over time.
E.g. compare the Western Digital 1 TB hard disk drive vs. solid state drive at bestbuy.ca

Secondary Storage: Optical Drives Basics
How it works
• E.g. CDs, DVDs, Blu-ray Disks.
• Very similar to a magnetic hard drive, except that it uses only one surface (the bottom of the disc).
• It uses a laser and a mirror rather than an actuator arm and a read/write head to read and write the data.
• The smooth aluminum surface reflects light very well to represent a 0.
• The laser creates pits on the surface (which scatter light) to represent a 1.
• Slower and with less capacity than a hard drive, but they are inexpensive and durable.
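
Returning to the annualized failure rate arithmetic from the Hard Drive Reliability slide: the 95% figure comes from compounding the yearly survival rate over four years. The following Python sketch reproduces that calculation. The function name survival_fraction is made up for this illustration, and the model assumes the same failure rate applies independently every year, which is a simplification given the bathtub curve mentioned earlier.

```python
def survival_fraction(annual_failure_rate, years):
    """Fraction of drives expected to still work after the given number of
    years, assuming a constant annualized failure rate (a simplification)."""
    return (1.0 - annual_failure_rate) ** years


# Rates quoted on the slide: 1.25% (consumer, as of 2018) and the older 1.95%.
for afr in (0.0125, 0.0195):
    pct = survival_fraction(afr, 4) * 100
    print(f"AFR {afr:.2%}: about {pct:.1f}% of drives still working after 4 years")
```

For an AFR of 1.25% this gives about 95.1%, in line with the slide's rounded 95%.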
CS 330 Spring 2019 68 Secondary Storage: Hybrid Drives How it works • combine a - smaller SSD (which offers speed) with a - larger HDD (which offers large capacity at a small price) • Software on the hybrid drive tracks which files are used often and puts then on the SSD to achieve faster access for commonly use files - common strategy: optimize for the common case • Price and performance between that of an HDD and an SSD. E.g. Seagate Firecude CS 330 Spring 2019 69 Secondary Storage: Assessing Performance Some Key Measures • Price per gigabyte ⇒ hard disk drives - getting cheaper • Capacity ⇒ hard disk drives - getting larger • Speed: typically measured in MB/s (megabytes per second) or GB/s (gigabytes per second) ⇒ solid state drives • Durability (look at how long the warranty period is) ⇒ answers vary: some would say DVDs others solid state drives. CS 330 Spring 2019 70 Improving Performance Will Adding RAM Improve Computer Performance? • Answer: It depends. • If there is not enough RAM (primary memory) to hold all the program (and some of the data) then the OS will use secondary storage (the HDD or SSD). • Secondary storage is much slower to access so this strategy (if needed) will degrade system performance. • If the OS never has to use this strategy (because there is sufficient RAM) then adding more RAM will not help. • Solution: Monitor how much RAM you are using. If you are using near the limit (especially when you have a lot of programs running / windows open / tabs open / documents open) then adding more RAM will help. CS 330 Spring 2019 71 Specialty Computers Mainframes The main characteristics of a mainframe computer are • reliability (often with redundant parts) • ability to hot swap, e.g. replace a failing hard drive while the computer is still running and processing other transactions • ability to support many users (e.g. 100,000 users) and process their requests very quickly - e.g. 
processing bank transactions, processing credit card transactions, airline reservations • ref: https://en.wikipedia.org/wiki/Mainframe_computer E.g. IBM zSystems, Unisys ClearPath Libra, Hewlett-Packard NonStop, Groupe Bull's GCOS, Fujitsu BS2000. CS 330 Spring 2019 72 Specialty Computers Supercomputers • Main characteristic: fast floating point computations • Main Use: For complex calculations like simulations, weather forecasting and scientific computations • Speed measures in how many floating point operations (FLOPS) they can do per second. • The (currently) fastest supercomputer can do 200 PFLOPS, i.e. 200,000,000,000,000,000 FLOPS (more than a million times faster than our podium computer). • Use on the order of 100,000s of cores (processors). • The challenge is managing the data on all these cores. • ref: https://en.wikipedia.org/wiki/TOP500 CS 330 Spring 2019 73 Specialty Computers Microcontrollers • Main Characteristic: Simple processors with RAM and I/O capabilities that cost as little as $0.25. • Used in embedded systems, i.e. as part of a home appliances, office equipment, digital watches, traffic lights, robots, cars. • Today’s (i.e. 2014) car has the computing power of 20 personal computers, features about 100 million lines of programming code, and processes up to 25 gigabytes of data an hour. Source: http://www.mckinsey.com/insights/manufacturing/whats_driving_the_connected_car E.g. for some current examples of microcontrollers https://www.digikey.com/products/en/integrated-circuits-ics/embedded-microcontrollers/685 CS 330 Spring 2019 74 New Subtopic: Evolution of IT Infrastructure What is IT Infrastructure? • Definition: The shared technology resources that provide the platform for the firm’s information system applications. - It includes investment in hardware, software, and services, such as consulting, education, and training. • It has evolved in five stages since the 1950s. 1. Mainframe / Minicomputer 2. Personal computer 3. Client/server 4. 
Enterprise computing
5. Cloud and Mobile Computing
• Each configuration is still around today in some form.
Ref: Section 5.1 in the course text.

Evolution of IT Infrastructure
Stage 1: Mainframe / Minicomputer
• Very expensive.
• One centralized system.
• Controlled by operators.
• Owned by large corporations
  - e.g. banks, insurance companies
• Later, users interacted with the mainframe directly via terminals.
• Minicomputers were cheaper and came along later.
• A large university could have several minicomputers.
Course text Figure 5-2

Evolution of IT Infrastructure
Stage 2: Personal Computers
• The computer is used by one person.
• Initially cost roughly $4,000 (after adjusting for inflation).
• Could do simple word processing, accounting and game playing.
• Users were technically sophisticated.
• Started off text-based but eventually evolved to a graphical user interface and a mouse.
• The software market was eventually dominated by Microsoft.
Course text Figure 5-2

Evolution of IT Infrastructure
Stage 3: Client/Server
• Two types of machines: clients (typically inexpensive) and servers (typically more expensive).
• Clients: request and use services provided by the servers
  - e.g. students in this course
• Servers: run an application and provide it to others over a network, e.g.
  - Google searches,
  - streaming music on Spotify,
  - streaming video on YouTube,
  - lecture slides on Learn,
  - course selection on Quest.
Course text Figure 5-2

Evolution of IT Infrastructure
Stage 4: Enterprise Computing
• Link together different networks and applications throughout the firm. Sometimes called integration.
• Link different types of hardware.
• Link different types of data formats.
• Use internet protocols for the network.
• Create standards for the data format.
• Use software to translate between the various formats.
Course text Figure 5-2

Evolution of IT Infrastructure
Stage 5: Cloud and Mobile Computing
• Extension of client/server, but rather than a single server there is a shared pool of resources.
• The resources include:
  - a cluster of computers
  - software (e.g. Gmail, Google Docs)
  - storage
• Can sell software applications as a service delivered over the internet
  - e.g. Microsoft’s Office 365
Course text Figure 5-2

Evolution of IT Infrastructure
Client/Server Architecture
• This is the most common form of distributed computing architecture but it is not the only form.
Peer to Peer (P2P)
• Every machine in the network consumes and provides service(s) at the same time.
• E.g. with torrent sites, you can download files from other people’s computers and they can download files from yours.
• Hard to control: there is no central computer “in charge.”
• Started out in software/game/music/video piracy but can also be used to download updates.

New Subtopic: Drivers of Technology
Drivers of Technology
• The evolution in IT infrastructure has been driven by the following five drivers of technology.
1. Moore’s Law
2. The Law of Mass Digital Storage
3. Metcalfe’s Law
4. Declining Communications Costs
5. The Creation of Technology Standards
• Why is this important: When designing a product that will be available in 18 months, consider what the hardware performance will be like in 18 months.
Ref: Section 5.1 in the course text.

Drivers of Technology
1. Moore’s Law
• The number of transistors that can fit on a chip doubles every 18 months.
• This law has been interpreted as:
  - the power of microprocessors doubles every 18 months,
  - computing power doubles every 18 months,
  - the price of computing falls by half every 18 months.
• The trend has been true since 1959 but as of 2010-2013 it looks to be slowing down.
• The graph on the next slide shows the number of transistors and the millions of instructions per second (MIPS) a processor can execute.
• The trend also causes the cost of a single transistor to decrease.

Drivers of Technology
1. Moore’s Law
Source: course text Figure 5-4

Drivers of Technology
1. Moore’s Law has Contributed to Decreasing Costs
Source: course text Figure 5-5

Drivers of Technology
2. Law of Mass Digital Storage
• Observation: The amount of digital information is roughly doubling every year.
• The growth is exponential.
• Since 1990, the storage capacity of hard drives has increased at a rate of 65% per year.
• The cost of storing a gigabyte is falling at an exponential rate, being cut in half every 15 months.
• The textbook literally says “falling at an exponential rate of 100% per year”.

Drivers of Technology
2. Hard Disk Drive Capacity
Observation: storage capacity grows exponentially.
Source: 5th edition of course text

Drivers of Technology
2. Data Storage per Dollar
The amount of data that can be stored per dollar doubles every 15 months.
Source: course text Figure 5-6

Drivers of Technology
3. Metcalfe’s Law
Observation: The value of a network grows much faster than the number of network members. The course text describes the growth as exponential; Metcalfe’s Law is more commonly stated as value proportional to the square of the number of members.
Image Source: http://www.collabworks.com/Main_WhatIsOpenIT/Metcalfe.htm

Drivers of Technology
4. Declining Communication Costs
Communication costs have been declining.
The lower the cost of communication ⇒ the more businesses rely on it.
Source: course text Figure 5-7

Drivers of Technology
5. The Creation of Standards
The creation of technology standards enables competition, increases interoperability and reduces costs.
Some examples
• ASCII and Unicode standards for representing alphabets
• The Portable Operating System Interface (POSIX, for Unix and Linux)
• TCP/IP to interconnect different networks (i.e. the internet)
• Ethernet and Wi-Fi to connect devices to the internet
• HTML and the World Wide Web for the formatting and displaying of text, pictures and video

New Subtopic: Infrastructure Components
Drivers of Technology ⇒ Infrastructure Components
• There are seven (major) components of IT infrastructure.
• The choices must be coordinated
  - i.e. a choice in one component affects the options available in the other components.
1. Computer Hardware Platforms
2. Operating System (OS) Platforms
3. Enterprise Applications
4. Data Management and Storage
5. Network and Telecom Platforms
6. Internet Platforms
7. Service Platform
Ref: Section 5.2 in the course text.

1. Computer Hardware Platforms
Two Varieties of Machines
• Client machines: desktops, laptops, tablets and smart phones
• Server machines (i.e. specialized high-end computers)
  - could be a single mainframe or
  - could be a large number of rack servers or blade servers (thin, modular computers without a dedicated keyboard or monitor)
image source: https://www.dell.com/en-ca/work/shop/povw/poweredge-r230

1. Computer Hardware Platforms
• Companies like Google and Facebook have server farms, collections of 100,000s of blade servers stored in racks in large, windowless, air-conditioned rooms.
• This design takes up the least amount of space.
image source: https://www.computerhope.com/jargon/s/servfarm.htm

2.
Operating System (OS) Platforms
• Definition: The OS manages a computer’s hardware and software resources: processor, memory, peripherals, files, apps
• For laptops and desktops (in Q1 2013)
  - 91% of PCs ran Microsoft Windows
  - 6.5% ran macOS
• For smart phones (in Q1 2014)
  - 71% ran Android (bought by Google, based on Linux)
  - 19% ran iOS
  - 8.1% ran Windows Phone
• For servers (in Q1 2013)
  - 65% of servers in the US ran Unix or Linux
  - 35% ran Windows

3. Enterprise Applications (EA)
• Role: Computer programs used by organizations that integrate business applications and services across the many different departments.
• E.g. a central database and programs used by Sales and Marketing, Finance and Accounting, Human Resources, Manufacturing and Production.
• E.g. Quest at UW
• Previously, departments had their own databases and it was hard to combine the data from all of them.
Currently, the largest suppliers of enterprise software are SAP, Oracle, IBM and Microsoft.

4. Data Management and Storage
Database Management System (DBMS)
• Role: organize and store the company’s data
• Open source MySQL is available free of charge, and is now supported by HP and most consulting firms.
• Database server: you might need a server to run your DBMS, particularly if it is to be accessible by several machines or even through the Internet.
Currently, the leading database software providers are Oracle, IBM (DB2), Microsoft (SQL Server), and Sybase.
• More on databases in Chapter 6 of course text.

4. Data Management and Storage
Data storage
• Major types: hard disk drives, tape drives, cloud-based storage
• Can use Redundant Array of Independent Disks (RAID) to improve hard disk performance ...
Currently, the market is dominated by Western Digital, Seagate and Toshiba.
• Tape drives are good for remote offsite backup (archiving) due to their portability.
Currently, the market is dominated by IBM, HP, and Sony.
• Cloud-based storage will be discussed later ...

4. Data Management and Storage
RAID Storage Architecture
• Using many hard drives to achieve improvements in 1) reliability, 2) availability, 3) performance and 4) capacity
• Currently 7 different types: RAID 0 - RAID 6.
• Each achieves a different balance of reliability, availability, performance, and capacity.
• There are tradeoffs: e.g. having multiple copies of a file increases reliability (if one copy gets damaged) but decreases overall capacity.

4. Data Management and Storage
RAID Technique: Disk Mirroring
• Store a copy of the data on another disk
• Improved Reliability: if one disk fails, use the other
• Improved Read Performance: if one disk is busy, read the data from the other disk
• Decreased Capacity: using twice as much space to store a file

4. Data Management and Storage
RAID Technique: Disk Striping
• Store sequential data on alternating disks, e.g. block 1 on disk 1, block 2 on disk 2, block 3 on disk 1, block 4 on disk 2, block 5 on disk 1, block 6 on disk 2, ...
• Improved Performance: up to twice the bandwidth of a single disk
• Decreased Reliability: a file is corrupted if even one of the two disks fails.
For RAID 0: only striping is used.
For RAID 1: use mirroring (and possibly some striping)

4. Data Management and Storage
RAID Techniques: Parity
• many different types of parity
• even parity: add either an extra 0 or an extra 1 at the end of a sequence of bits in order to ensure that the number of 1’s in the sequence is even.
• 1001000 has an even number of 1’s so add a 0
  EvenParity(1001000) = 10010000
• 1001001 has an odd number of 1’s so add a 1
  EvenParity(1001001) = 10010011
• 1001011 has an even number of 1’s so add a 0
  EvenParity(1001011) = 10010110
• Parity can detect if a single error (or an odd number of errors) has occurred in the storage of the data

4.
Data Management and Storage
Data Backup
• Online backup (hot backup)
  - Instant real-time backup
  - Protects against one HD failure
  - Examples: RAID 1 and RAID 5
• Offline backup (archive)
  - Done at the end of the day, copy and ship to a different location
  - Example: backup to tape drive
  - Protects against complete failure, but can only recover data from one day ago (or more).
  - Full vs. incremental backup

5. Network and Telecom Platforms
Network Hardware
• network: a group of computers linked together to share resources
• hub: any data sent to a hub is sent to all connected devices
• bridge: only one input and one output, looks at data and decides whether to forward it across the bridge
• switch: has many ports, looks at data and decides which port to send it out on
• router: like a switch but forwards data between different networks and works with many more network protocols
• firewall: hardware or software (or both) put between the internal network and the internet to prevent outsiders from obtaining unauthorized access

5. Network and Telecom Platforms
Network Hardware
• Computers have a Network Interface Card (NIC)
  - e.g. typically Ethernet, Wi-Fi or Bluetooth
Leading network hardware providers are Cisco, Alcatel-Lucent, and Juniper Networks
• Network Operating Systems (NOS)
  - manage features such as users, groups, file sharing, printer access, security
NOS include Microsoft Windows Server, Linux, Cisco IOS and Novell NetWare

5. Network and Telecom Platforms
Network Hardware
• Also includes telephone and cell phone services, telephones, cell phones, telephone systems (PBXs, i.e. the telephone equipment that sets up the extensions and voicemail for the campus), automated attendants, call centre software, fax machines (might be combined with the photocopier and scanner)
Telecom service vendors include Rogers, Bell, Telus and Shaw, plus regional carriers.

6.
Internet Platforms
Internet Service Provider (ISP)
• provides the link from your home or company network to the rest of the internet
• the major ISPs own the telephone line and cable that runs to your home or office (i.e. the last mile)
• many smaller regional ISPs lease the network from the major ISPs and provide their own customer service, tech support etc.
Major Canadian ISPs are Rogers, Bell and Shaw

6. Internet Platforms
Website Development
• Can hire others or create and maintain it yourself.
• Simple websites use languages like HTML (hypertext markup language) and JavaScript.
• Simple websites are typically static (i.e. the site does not change unless a person edits the web page files).
• More sophisticated websites are dynamic (i.e. when a client makes a query, a web page is created using a combination of scripts and database queries in order to get the most recent and relevant information).
• Many of the big players are using artificial intelligence to learn what to present to you.

6. Internet Platforms
Website Development
• E.g. when you click on a YouTube video, besides providing the video, the webpage also lists how many views it has had, how many likes, how many dislikes, the latest comments, etc.
• Check back later and these values will have changed (for a popular video), i.e. they were created dynamically.
Programming languages for dynamic web pages include: PHP by Rasmus Lerdorf, ASP.NET (Active Server Pages) by Microsoft, JSP (JavaServer Pages) and Java by Oracle.

6. Internet Platforms
Web Hosting
You can create your own or use a web hosting service.
In order to create your own you need...
• a server, i.e. powerful computer(s)
• a domain name and an IP address for your website (e.g. see online tools nslookup and whois)
• a web server, i.e. software that the server runs to accept the requests that web browsers make.
The two most common web servers are
1.
Apache HTTP Server by the Apache Software Foundation, originally based on the web server from the National Center for Supercomputing Applications (NCSA) (roughly 60% market share)
2. Internet Information Services (IIS) by Microsoft (roughly 20% market share)

6. Internet Platforms
More Information about Setting up your own Website
• Get a domain name and map it to your website: http://www.thesitewizard.com/archive/registerdomain.shtml
• Set up your own website: http://www.thesitewizard.com/gettingstarted/startwebsite.shtml

7. Service Platform
• A Service Platform is a collection of services that enable the information system to function, i.e. consulting and system integration services
• Most firms cannot develop their systems without significant outside help including
  - identifying which parts of the business can be improved by using IT
  - ensuring new systems integrate with legacy systems
  - maintenance
  - training
  - security

A Few Comments About IT Infrastructure
• The term server could refer to hardware (the server machine) or software (the server application) or both
• Some IT components may be bundled
  - Machines might come with a preinstalled OS.
  - Server OS = NOS.
  - The Enterprise Application platform (coordinates activities across many departments), Data Management platform (database and storage) and Internet platform may dictate the server machines needed
  - Some EAs are bundled with their own DBMS
  - Some EAs need to run on a server machine
  - Some EAs are bundled with a service package (integration, maintenance and training)

New Subtopic: Contemporary H/W Trends
Ref: Course text section 5.3
Key Topics
Eight contemporary hardware trends and two future ones
1. The mobile digital platform
2. Consumerization of IT and BYOD
3. Grid computing
4. Virtualization
5. Cloud computing
6. Green computing
7. High-performance and Power-saving processors
8. Autonomic Computing
9.
Near Future: Nanotechnology & Quantum Computing

Hardware Technology Trends
Trend 1: Mobile Digital Platform
• Increasingly, internet access happens via highly portable devices: smartphones and tablets
• Smart phones are taking over the functions of many other electronic devices, e.g. GPS.
• Compare this old Radio Shack ad to a smart phone http://www.trendingbuffalo.com/life/uncle-stevesbuffalo/everything-from-1991-radio-shack-ad-now/
• The integration of voice (the telephone network) and data (computers) brings together two historically distinct global networks.

Hardware Technology Trends
Trend 2: Consumerization of IT and BYOD
• BYOD = Bring Your Own Device (to work)
• Allow employees to bring their own device.
• Allow employees to use software services, such as Gmail, Google, Facebook and Twitter.
• Key trend: consumerization of IT: technology that was meant for the consumer moves into the business world.
• Companies must consider
  - what can be used and what cannot be used
  - security
  - software availability
  - ownership
  - privacy

Hardware Technology Trends
Trend 3: Grid Computing
• Key Observation: processors are idle most of the time
  - e.g. System Idle Process in Windows
• Idea: simulate a supercomputer by organizing the computational power of a network of computers
• The computers may be geographically remote, have different OSs, etc.
• Benefit: capable of working on problems that require short-term access to large computational capacity
• This is called grid computing.
• Requires software to control and allocate resources on the grid
• E.g. SETI: http://setiathome.berkeley.edu/

Hardware Technology Trends
Trend 3: Grid Computing Limitations
• Key Observation: some tasks can be broken up into smaller independent tasks (parallelized)
  - e.g. find all the occurrences of a keyword in a collection of documents
• Key Observation: other tasks cannot be parallelized.
  - e.g.
calculate the Fibonacci number F(n) for some large n, where F(n) = F(n-1) + F(n-2), since each value depends on the two values before it
• Limitation: only tasks that can be parallelized can take advantage of grid computing.

Hardware Technology Trends
Trend 4: Virtualization
• Virtualization: the creation of a virtual (rather than actual) version of something, such as a hardware platform, operating system, a storage device or network resources.
• Many look like one: e.g. many smaller hard drives can be configured to look like one large one.
• One looks like many: e.g. a single powerful server can be configured to look like many smaller computers
  - hook up dozens of displays and keyboards to it
  - the different virtual machines can even be running different OSs (e.g. Windows 7, Windows 10, macOS and Linux)

Hardware Technology Trends
Trend 4: Virtualization
In a computer
• the hardware is managed by the operating system (which is software),
• and the application software interacts with the hardware through the operating system (rather than interacting directly with the hardware).
Layers (top to bottom):
Application Software (e.g. Chrome)
Operating System (e.g. Windows 10)
Hardware (e.g. Dell Laptop)

Hardware Technology Trends
Trend 4: Virtualization
What if you wanted to run a Linux app on a Windows computer?
• Idea: Create software that simulates hardware which the Linux OS could run on.
• Run the Linux app in this environment.
• This setup would be running the Linux app in a virtual Linux environment which would be running in an actual Windows environment.
• Example: VMware, VirtualBox
Layers (top to bottom): Linux App, Linux OS, Virtual Hardware, Windows 10, Hardware

Hardware Technology Trends
Trend 4: Virtualization Benefits
• Better resource management (when using one resource to look like many) by using more of the processor’s capacity, less space, less expense, less energy
• Support legacy applications by running older versions of the OS
• Testing: can test software on a variety of virtual configurations

Hardware Technology Trends
Scenario: Meeting Peak Demand
• Example: imagine an accounting system that handles 10,000 transactions per day with a peak demand of 20,000 during tax season.
• Three technical options are available for this business.
5.1 Load balancing
5.2 Cloud computing
5.3 On-demand computing

Hardware Technology Trends
5.1 Load Balancing
• The work load is evenly distributed across many servers
• Creates a high availability computing system
• e.g. 4 servers, each able to handle 6,000 transactions per day, each operating at between 40% and 85% capacity
• Can deal gracefully with
  - crashes
  - upgrades
  - seasonal peak demands
• Average down time is drastically reduced.
• Downside: must purchase and maintain hardware (the two extra servers) that is rarely used

Hardware Technology Trends
Trend 5.2: Cloud Computing
• The purchase, as a service from another company, of hardware / programming tools / software that is accessed over the internet.
• Examples
  - hardware: Amazon Web Services (AWS)
  - software: Microsoft 365 (cloud version of Microsoft Office)
• for Microsoft 365 you pay a monthly subscription fee
• a particular form of cloud computing is ...
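The capacity range quoted in the load-balancing example (4 servers, 6,000 transactions per day each, 10,000 transactions on a normal day and 20,000 at peak) can be checked with a short calculation. This is only a sketch: the function name `utilization` is hypothetical, and it assumes the load is split perfectly evenly across the servers.

```python
def utilization(transactions_per_day, servers, capacity_per_server):
    """Fraction of each server's capacity in use, assuming an even split."""
    return transactions_per_day / (servers * capacity_per_server)


SERVERS = 4
CAPACITY = 6_000  # transactions per day that one server can handle

normal = utilization(10_000, SERVERS, CAPACITY)        # typical day
peak = utilization(20_000, SERVERS, CAPACITY)          # tax season
one_down = utilization(10_000, SERVERS - 1, CAPACITY)  # typical day, one server crashed

print(f"normal day:      {normal:.0%}")
print(f"tax season:      {peak:.0%}")
print(f"one server down: {one_down:.0%}")
```

Under these assumptions each server runs at roughly 42% on a normal day and 83% at peak, which is consistent with the 40% to 85% range on the slide. Two servers could carry the normal load on their own, which is why the slide calls the other two "extra".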
Hardware Technology Trends
Trend 5.3: On-Demand (Utility) Computing
• A form of cloud computing
• Firms off-load peak demand for computing power to remote, large-scale data processing centers
• Firms pay only for the computing power they use, as with an electrical utility
• Excellent for firms with spiked demand curves caused by seasonal variations in demand, e.g. an on-line shopping website on Black Friday
• Saves firms from purchasing excessive levels of infrastructure

Hardware Technology Trends
Trend 5: Pros and Cons of the Cloud
• Pros:
  - Cost: a less expensive way to cover peak demand
  - Convenient: use as needed
  - Flexible: not fixed to one brand of computers, usage may easily increase or decrease
• Cons:
  - Privacy: less control over it
  - Liability: Google Cloud went down 6 times in 1 year
  - Legal: must comply with Canadian privacy laws
  - Loss of control
• Not for mission critical systems

Hardware Technology Trends
Trend 6: Green Computing
• Green computing is the design and use of computer systems in a way that minimizes their impact on the environment.
  - reduce power consumption
  - reduce e-waste (old cell phones, old laptops)
  - In Canada: https://www.recyclemyelectronics.ca/
• But must sanitize (i.e. erase data):
  - https://dban.org/ (for HDDs)
  - https://www.bleachbit.org/
Trend 7: High-performance and Power-saving processors
• Multicore processors where cores can disconnect from power when not in use
• Energy efficient designs (fewer transistors)

Hardware Technology Trends
Trend 8: Autonomic Computing
• Computer systems have become so complex that the cost of managing them has risen
  - a significant portion of a company’s IT budget is spent preventing or recovering from system crashes
  - the most common cause is operator error
• Autonomic computing is an industry-wide effort to develop systems that are capable of self-management: i.e.
self-configure, self-protect, self-optimize and self-heal
• e.g. P2P (peer to peer) systems like Skype or the internet.
  - If nodes go down, the network still functions.

Hardware Technology Trends
Scenario: Business Rising
• Chris needs to set up 3 different servers:
  - An Apache web server on Linux
  - A MySQL DBMS server on Windows 10
  - A very old accounting application on MS-DOS
• It is estimated that a small tower server ($5000) can handle the workload of one category of service, while a medium size tower server ($10,000) can handle all the workload
• How should this situation be handled?

Future Hardware Technology: Nanotechnology
What is it?
• Nanotechnology: Science of using nanostructures to build devices.
• A nanometer is a billionth of a meter and is the size of a few atoms or a small molecule.
• Nanotechnology uses individual atoms and molecules to create computer chips and other devices
• Presently a transistor is about 14 nanometers wide and made mostly of silicon (roughly 70 silicon atoms wide).
• The limit with this approach seems to be 5 nanometers.
• Researchers are looking for new materials and ideas to make smaller transistors.

Future Hardware Technology: Quantum Computing
What is it?
• Classical bit vs. qubit
• A qubit is a superposition of 0 and 1
  - 2 qubits need four numbers to specify the state.
  - 3 qubits need eight numbers to specify the state.
  ⁞
  - n qubits need 2^n numbers to specify the state.
• Not a universal replacement for classical computers. It reduces the number of steps needed to arrive at a result for some problems, e.g. factoring.
• How Does a Quantum Computer Work? by Veritasium. https://www.youtube.com/watch?v=g_IaVepNDT4

New Subtopic: Contemporary S/W Trends
Ref: Course text section 5.4
Key Topics
We will look at four contemporary software platform trends
1. Linux and open-source software
2. HTML and HTML5
  ▪ Java not covered in this course
3.
Web services and service-oriented architecture
4. Software outsourcing and cloud services
  ▪ mashups and apps not covered in this course

Software Technology Trends
Trend 1: Open-Source Software
• Open-source software is software whose source code is publicly available and can be modified and redistributed by anyone for any purpose.
• Different standards exist for open-source software, e.g.
  - Free Software Foundation (FSF) started in 1985
  - Open Source Initiative (OSI) started in 1998
• Originally “free” meant free to inspect and modify; now it more likely means available at no cost.
• Often developed and maintained by a worldwide network of programmers and designers under the management of user communities.

Software Technology Trends
Trend 1: Open-Source Software
• A company (e.g. Google) may fund an open source challenger (e.g. Firefox) to another company’s product (e.g. Microsoft’s Internet Explorer).
• A company (Sun Microsystems) may make a product they no longer support (StarOffice) open source (now called OpenOffice, which is a competitor to Microsoft Office).
Examples
Linux is the most widely used open-source operating system. Other examples include Apache HTTP Web server, MySQL database, and the programming language Python.

Software Technology Trends
Trend 1: Open-Source Software: Costs and Benefits
What are the benefits of open-source software?
• lower cost
• more security, fewer bugs - many people inspect the code
• flexibility - may modify the code
• transparency - know exactly what the code does
• not reliant on a single vendor
What are the drawbacks of open-source software? It is less likely to
• be easy to use
• meet customer needs
• be compatible with your particular hardware
• have support

Software Technology Trends
Trend 2: HTML and HTML5
The format for displaying information on the web.
• HTML stands for hypertext markup language
  - hypertext refers to text that contains links to other text that you can access quickly.
  - markup language refers to a way of annotating and presenting text, i.e. bold, italics, titles, subtitles etc.
• HTML originally did not support audio and video, so you needed third party plugins
• The latest version (HTML5) supports audio and video.
The book mentions Java, which is a common programming language that we will not discuss in this course.

Software Technology Trends
Trend 3: Web Services and SOA
• Web services are software components that exchange information with each other using web communication standards and languages.
• The web provides well-known and well-supported standards for presenting information.
• Web browsers use Hypertext Markup Language (HTML), which specifies how text, graphics etc. are displayed in a browser.
• A generalization of HTML is eXtensible Markup Language (XML), which can also specify what the data means.

Software Technology Trends
Trend 3: Web Services and SOA
• XML provides a format for (possibly different) programs to exchange information. E.g. it could specify that $16,800 represents the price in Canadian dollars.
• Two different systems (possibly at different companies, with different operating systems and different programs) can speak a common language.
Ref: Section 5.4 in the course text.

Software Technology Trends
Trend 3: Web Services and SOA
• The use of web services to achieve integration among different applications and platforms is referred to as service-oriented architecture (SOA)
• SOA is a cost effective way to adapt to new technology and to integrate different applications
• E.g. a car rental company (say Dollar Rent A Car) can interact with other companies’ web sites (such as an airline, a tour company etc.) by converting its information to the language of the web.
• Now customers can book a flight, rent a car and book a tour all at the same website.

Software Technology Trends
Source: course text Figure 5-10

Software Technology Trends
Trend 4: Software Outsourcing
Changing sources of outsourced software:
• Purchase a customizable generic software package, e.g. SAP and Oracle-PeopleSoft
• Contract custom software development or maintenance to a third party, which could even be located in another country.
  - started off as maintenance and data entry
  - now also includes developing new software
• Use software available from the cloud, called software as a service (SaaS), e.g. Salesforce.com for customer relations management

New Subtopic: Management Issues
Ref: Course text section 5.5
Subtopics
We will look at managing IT infrastructure
1. Dealing with change
2. Management and governance
3. Infrastructure investments
  a) Total cost of ownership
  b) Competitive forces model

Management Issues
1. Dealing with Change
• firms need to be able to grow (or shrink)
• scalability: ability to expand to serve a larger (or smaller) number of users without breaking down
2. Management and Governance
Who is responsible for the IT infrastructure?
• each department (decentralized)
• one overall IT department (centralized)
• mixture of both

Management Issues - TCO
3a) Infrastructure investments: Total cost of ownership
• There are different ways to estimate the total cost of ownership (TCO).
• We will use the following: the acquisition costs for hardware and software represent 20% of the TCO.
  - It could range from 20-35% depending on what is bought.
  - TCO is like an iceberg (only see part of it).
• Can break down TCO into
  - Capital expenditure: fixed, one-time cost to acquire the system.
  - Operational expenditure: ongoing expenses for running it.
• The table on the next slide lists the various components that contribute to the TCO.
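As a rough illustration of the 20% rule of thumb above, the sketch below works backwards from an acquisition cost to an estimated TCO. The function name `estimate_tco` and the $50,000 purchase are assumptions made for illustration only; they do not come from the course text.

```python
def estimate_tco(acquisition_cost, acquisition_share=0.20):
    """Estimate total cost of ownership (TCO) from the acquisition cost.

    acquisition_share is the fraction of the TCO that goes to acquiring
    hardware and software (the slides use 20%, with a range of 20-35%).
    """
    if not 0 < acquisition_share <= 1:
        raise ValueError("acquisition_share must be in (0, 1]")
    return acquisition_cost / acquisition_share


# Hypothetical purchase: $50,000 of hardware and software.
purchase = 50_000
tco_at_20 = estimate_tco(purchase, 0.20)  # acquisition is 20% of the TCO
tco_at_35 = estimate_tco(purchase, 0.35)  # acquisition is 35% of the TCO

print(f"TCO if acquisition is 20%: ${tco_at_20:,.0f}")
print(f"TCO if acquisition is 35%: ${tco_at_35:,.0f}")
print(f"Hidden part of the iceberg at 20%: ${tco_at_20 - purchase:,.0f}")
```

With these made-up numbers the estimated TCO falls between about $143,000 and $250,000, so most of the cost sits in the components other than the purchase itself.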
Management Issues - TCO
Component          Cost
Hardware           Computers, cables, terminals, storage, printers
Software           Operating systems, applications
Installation       Staff to install computers and software
Training           Time and people for both developers and end users
Support            Ongoing technical support and help desks
Maintenance        Upgrades for hardware and software
Infrastructure     Networks and backup units
Downtime           Lost productivity during system failures
Space and Energy   Real estate, computer furniture and utility costs for housing and powering the technology
Source: course text Table 5-3

Management Issues - TCO
3a) Infrastructure investments: Total cost of ownership
To get a sense of actual costs we will look at this report
http://www.nashnetworks.ca/pdf/TCOofIT.pdf
from Nash Networks http://www.nashnetworks.ca/index.php
The report is 10 years old but gives a good sense of the issues.
There are two types of costs
1. direct costs which include hardware, software, printer paper, ink, internet costs
2.
indirect (or hidden) costs which include downtime, poorly trained users, user mistakes, using computers for non-business purposes, users installing accessories

Management Issues - TCO
3a) Infrastructure investments: Total cost of ownership
Direct Costs for a PC over a 3 - 4 Year Lifetime
Phase of Lifecycle                                                     Cost
Purchase (computer; printer/scanner/fax; cables, printer ink; paper)   $3,090
Deployment (setup, staff downtime)                                     $500
Operations (admin, downtime)                                           $1,040
Support                                                                $1,680
Retirement                                                             $630
Total Cost                                                             $6,940
Approximate Annual Costs                                               $2,000

Management Issues - TCO
3a) Infrastructure investments for Large Companies in 2007
IT budget
Average IT operating budget as % of revenue        5.5%
Average IT capital budget as % of revenue          2.5%
Average IT operating budget per employee           $9,100
IT spending by category
Hardware                                           26%
Software                                           20%
Support (staff, external providers, contractors)   41%
Telecommunications                                 13%
These costs are for companies with more than 2,500 employees.

Management Issues - TCO
3a) Infrastructure investments for Large Companies in 2007
• One way to reduce costs is to change how the computers are managed.
• There are two extremes
  1. Unmanaged: users can install any application and change any setting.
  2. Locked and well-managed: users cannot install software or change critical settings. There are policies in place to restrict what an employee can do.
• The more a computer is managed, the smaller the indirect costs.
• See the graph on the next slide (again for large companies in 2007) for details.

Management Issues - TCO
[Graph]

Management Issues
3b) Infrastructure investments: Competitive forces
Consider six factors when deciding how much to spend on IT.
1. Demand for services: What services do you provide (to customers, suppliers and employees)? Are their needs being met?
2. Business strategy: What new capabilities will be needed to achieve these goals?
3.
IT strategy: How will IT help achieve these goals?
4. IT assessment: Is your IT infrastructure too old or too new?
5. Competitor’s Services: What do your competitor firms offer customers, suppliers and employees?
6. Competitor’s IT Investments: How much have they spent?

Topic 2 – Databases
Key Concepts
• flat files vs. relational databases
• attributes, records and tables
• primary keys, candidate keys, foreign keys
• schema and data independence
• database design and normalization
• data warehouses, data marts, online analytical processing, data mining
References
• Course text, Chapter 6 Databases and Information Management

Flat files vs. Databases
How to Store Data Digitally
• Two ways to store data digitally
  - in a flat file (i.e. as one large table)
  - in a relational database (i.e. many smaller tables)
• Key Question: What are the problems with storing data in a traditional file environment? I.e. why doesn’t UW store all its student information in one gigantic Excel spreadsheet?
First Name   Last Name   Student ID   Course   Grade
Chris        Lee         20158888     CS115    83
Chris        Lee         20158888     CS116    78
Chris        Lee         20158888     CS230    81
Chris        Lee         20158888     CS234    80

Storing Data in a Flat File
Benefits of a Flat File(s)
• simple to create and administer, can use a spreadsheet
• easy to understand
• all the data is stored in one place
• easy to sort or filter information
• good for one person processing a small amount of data
Key Question: how are the needs of storing data for one person different from the needs of storing data for a large company?
Goal: accurate, timely, relevant information

Storing Data in a Flat File
Limitations of a Flat File
• lack of security: each person has access to the whole file or none of it
  - cannot give different people different views of the data
  - cannot control who has access to what data
    ▪ want payroll info to be visible only to Payroll Dept
  - do not know what they have changed
• lack of concurrent access: only one person can modify the same file at a time
  - concurrent access means multiple people (or programs) can access the file at the same time

Storing Data in a Flat File
Limitations of a Flat File
• lack of data integrity
  - redundancy: the same data is stored in many places
  - e.g. if there is an update or an error is discovered ⇒ have to search the whole file(s) and change it in many places
  - it is easy to miss a place, therefore …
  - redundancy leads to inconsistency: different values for the same attribute
• lack of scalability: the file can become very large and then searching the whole file becomes much slower

Storing Data in a Flat File
Limitations of a Flat File
• program-data dependence: the file format and the program that processes the data are tied together (strongly coupled)
  - change one (i.e. program or file format) and you must change the other
  - difficult to handle different user preferences
    ▪ Chris uses MS Word
    ▪ Kelly uses Adobe Acrobat
• lack of custom formats: cannot display info in different formats for different people, e.g. for sales data
  - Regional Directors want it organized by region
  - Product Managers want it organized by product

Storing Data in a Flat File
Limitations of a Flat File(s)
If each department has its own copy of the data and its own programs, there are more challenges.
• Each department will be tempted to develop its own processes which use its own subset of data files (see next slide).
• The company will not have
  - a single solution for security, backup/crash recovery
  - centralized data administration
  - a high amount of data sharing and availability
• E.g. on the next slide, different departments…
  - use different programs and different subsets of data.
  - duplicate data (e.g. A, B) from a master file (i.e. they are using derivative files)

Storing Data in Many Derivative Files
Limitations of a Flat File(s)
[Figure] Source: Course text Figure 6-2

Databases and DBMS
What are a Database and a DBMS?
• Database: a collection of related information stored in a structured form
  - The structure (think column headings of the tables) is described by a schema
• Database Management System (DBMS): a collection of programs that manipulate a database
  - set up the storage structures
  - perform updates on the data
  - process queries (requests for data retrieval) from applications and users
• The DBMS provides a central point of access to the data

Databases and DBMS
Why Use Databases and DBMSs?
• They provide data integrity
  - reduce data redundancy and inconsistency
• They provide data independence from the program
  - i.e. the data is stored in a standard format
• They provide security, concurrent access and crash recovery
  - enable data sharing and high availability
• They provide centralized data administration
  - for backing up and for access
• They reduce application development time because standard software packages exist in the market

Data Models
Common Types of Databases
• There are different types of database models based on how you view (or structure) the data
  - Network model: model data as a network
  - Hierarchy model: model data as a tree
  - Relational model: model data as a table
  - Object-oriented model: model data as objects
• The most popular model is the relational database.
Relational Databases
Overview
• First developed in the 1970s
• The most widely used type of database
  - especially in business-oriented transaction processing
  - most businesses use them in some form or another
• Key Observation: information is related
  - e.g. for customers, purchases, products, suppliers
    ▪ customers make purchases
    ▪ purchases list products
    ▪ products have suppliers

Relational Databases
Structure
• Attribute (property an entity might have) or field: a column
  - E.g. student number, given name, family name
  - Attribute values must be atomic
    ▪ atomic: a single value such as a number, a character, a string, a date, etc., e.g. CS330
    ▪ non-atomic: a list of values, e.g. a list of completed courses (could be many)
  - Domain: set of allowed values for an attribute
    ▪ e.g. {CS115, CS116, CS230, CS234, CS330, CS338 ...}
    ▪ e.g. positive integer (for Student ID)
• Record or tuple: a row, i.e. a collection of attribute values
  - all rows (in a table) have the same number of values
  - each row is distinguishable from the other rows

Relational Databases
Student ID   First Name   Last Name
20158888     Chris        Lee
20158889     Terry        Lee
20158890     Terry        Dodd

Student ID   Course   Grade
20158888     CS115    83
20158888     CS116    78
20158888     CS230    81
row: record or tuple
column: attribute or field
Intro to Relational Databases Video
http://www.youtube.com/watch?v=eXiCza050ug
It talks about MS Access but what it says is true of all relational databases.

Relational Databases
Structure
• On the previous slide the fields are: Student ID, First Name, Last Name, Course, and Grade.
• A table (typically stored in a file) is a group of records
  - it relates (connects) rows to columns
• A relation is a set of rows (tuples or records)
  - i.e. it is a set of related entities.
  - In mathematics a relation is a connection between two entities,
  - e.g. the Student ID, First Name and Last Name in a row are all related (i.e.
associated with a particular student),
  - e.g. the Student ID, Course and Grade are all related in the second table.
• A relational database is a collection of tables (relations)

Relational Databases
Structure
[Table] Course text, Figure 6-4
In the table above,
• the attributes are: Supplier_Number, Supplier_Name, Supplier_Street, Supplier_City, Supplier_Province, Supplier_PC (i.e. the column headings).
• The records are the rows, i.e. the attributes of each supplier.
• Each row represents an entity, i.e. a person, place, thing or event about which information is maintained.

Relational Databases
Structure
[Table] Course text, Figure 6-4
In the table above,
• the attributes are: Part_Number, Part_Name, Unit_Price, Supplier_Number (i.e. the column headings).
• The records are the rows, i.e. the attributes of each entity.
• In this table, the entities are parts.

Relational Databases: Keys
Primary Keys
• Primary key: a minimal set of attribute(s) whose values are unique in each row of a table
• Used to uniquely identify (and retrieve) individual entities (rows), i.e. no two rows in a table have the same primary key.
• In the table below Student ID uniquely identifies a student whereas First Name does not.
• Sometimes a primary key is created and assigned in order to ensure that each entity has a unique key (e.g. Student ID).
Student ID   First Name   Last Name
20158888     Chris        Lee
20158889     Terry        Lee
20158890     Terry        Dodd
⁞            ⁞            ⁞

Relational Databases: Keys
Primary Keys
Both tables are from the course text, Figure 6-4
In the table above, the primary key is the Part_Number.
In the table below, the primary key is the Supplier_Number.

Relational Databases: Keys
Primary Keys
• Sometimes it takes two or more attributes to uniquely identify a row, i.e. to create a primary key.
• This combination of attributes is called a composite key.
• E.g.
in the table below, the pair (Student ID, Course) uniquely identifies each row whereas Student ID by itself does not.
Student ID   Course   Grade
20158888     CS115    83
20158888     CS116    78
20158888     CS230    81
⁞            ⁞        ⁞

Relational Databases: Keys
Primary Keys
• Sometimes there may be more than one key which could be used as the primary key.
• These keys are called candidate keys and one of them is designated as the primary key.
  - E.g. at UWaterloo both your student number and your UW userid are unique.
Student ID   First Name   Last Name   userid
20158888     Chris        Lee         c17lee
20158889     Terry        Lee         t47lee
20158890     Terry        Dodd        tdodd
⁞            ⁞            ⁞           ⁞

Relational Databases: Keys
Foreign Keys
• One of the goals of good database design is to minimize redundancy.
• E.g. in the table below “Chris”, “Lee” and “20158888” are repeated many times.
First Name   Last Name   Student ID   Course   Grade
Chris        Lee         20158888     CS115    85
Chris        Lee         20158888     CS116    76
Chris        Lee         20158888     CS230    80
Chris        Lee         20158888     CS234    84
Chris        Lee         20158888     CS330    80
⁞            ⁞           ⁞            ⁞        ⁞

Relational Databases: Keys
Foreign Keys
• Foreign key: a field in a table that is a primary key in another table
• The foreign key is used to link different tables together and avoid redundancy.
Student ID   First Name   Last Name
20158888     Chris        Lee
20158889     Terry        Lee
20158890     Terry        Dodd

Student ID   Course   Grade
20158888     CS115    83
20158888     CS116    78
20158888     CS230    81

Relational Databases: Keys
Primary and Foreign Keys Example
• SID (i.e. Student ID) is the primary key for the Students table
  Students (SID, First Name, Last Name)
• CID (i.e. Course ID) is the primary key for the Courses table
  Courses (CID, Instructor, Term, Building, Room, Time)
• How do we express the courses students take and the grade they receive for each course?
• We create a Completed table with two foreign keys: CID and SID.
• The Completed table represents the relationship “students complete courses”, and each row in the Students or Courses table represents one complete item in that relation.

Relational Databases: Keys
Completed
CID     SID        Grade
CS115   20158890   83
CS116   20158888   78
⁞       ⁞          ⁞

Students
SID        First Name   Last Name
20158888   Chris        Lee
20158889   Terry        Lee
⁞          ⁞            ⁞

Courses
CID     Instructor   Term   Bldg   Room   Time
CS116   C. Smith     S19    MC     4040   1:00-2:20 TTh
CS135   J. Doe       S19    MC     4041   2:30-3:50 TTh
⁞       ⁞            ⁞      ⁞      ⁞      ⁞
SID links the completed course to the student’s information.
CID links the completed course to the course details.

Relational Databases
Another Example
Both tables are from the course text, Figure 6-4
In the table above, Supplier_Number is a foreign key because it is the primary key in the table below.
⇒ It links information about a supplier to information about a part.

Relational Databases - Keys
Exercise: What are possible Primary Keys?
Student (SID, email, address, phone, birthday, social insurance #)
a) SID?
b) email?
c) Phone?
d) (SID, Address)?
e) social insurance number?
S19 Courses (CID, Instructor, Term, Bldg, Room, Time, Cap)
• What are the possible primary key(s)?

Database Management Systems
Problem 1
• What if your database is growing so large that you need to split it over multiple hard drives?
• Ideally: you would want to avoid modifying all your programs when this happens.
Solution
• Separate how the data is stored (the Physical Schema or Physical View) from how the data is used (the External Schema or the Logical View).

Database Management Systems
Problem 2
• What if different users (say the Payroll Clerk and Benefits Clerk) are interested in different parts of the database?
Solution
• Create a single global view of the data (the Global View a.k.a. the Conceptual Schema) that feeds into many individual views of the data (the External Schema a.k.a.
the Logical View) for the different user groups.

Database Management Systems
[Figure: Conceptual Schema (Global View) and External Schema (Logical View)]

Database Management Systems
Three Schema Architecture
• External Schema (or Logical View)
  - how the data is displayed to a particular user
  - different views for different user groups
  - the rest of the database is hidden from that user
  - e.g. Payroll sees net pay, the IT department does not.
• Conceptual Schema (or Global View)
  - a global description of the whole database (all the data)
  - unbiased towards any particular group of users
  - we focus on this level
• Physical Schema (or Physical View)
  - how the data is physically stored and organized
  - what data is in which file on which disk

Database Management Systems
Example of the Three Views
Consider the attribute “birthday” with the value “June 20 1994”
• Physical Schema (or Physical View)
  - a pattern of 0’s and 1’s located on a magnetic disk
• Conceptual Schema (or Global View)
  - the date would be located in a particular row of a particular table, under the column heading Birthday, in a particular file
• External Schema (or Logical View)
  - to display someone’s age, the DBMS could subtract their birthday from the current date

Database Management Systems
Three Schema Architecture
• Why have these three layers?
• Answer: Data independence, i.e. the separation of logic, storage and presentation.
• Can change software without changing the data and vice versa.
• Just like file systems: regardless of where or how the file is stored, you can open it.
• Easier to manage and control (e.g. want some users to only see age but not exact date of birth).
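The “age but not exact date of birth” idea can be sketched with a database view. This is a minimal illustration using Python’s built-in sqlite3 module, not something taken from the course text: the table layout, the view name `StudentAges` and the pairing of the student with the birthday are assumptions made for the example.

```python
import sqlite3

# In-memory database; the stored table plays the role of the conceptual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " SID INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, birthday TEXT)"
)
# Hypothetical row: the birthday is stored as an ISO date string.
conn.execute("INSERT INTO Students VALUES (20158888, 'Chris', 'Lee', '1994-06-20')")

# External schema (logical view): exposes an age, hides the birthday.
# Age = difference in years, minus 1 if this year's birthday has not happened yet.
conn.execute("""
    CREATE VIEW StudentAges AS
    SELECT SID, first_name, last_name,
           CAST(strftime('%Y', 'now') AS INTEGER)
           - CAST(strftime('%Y', birthday) AS INTEGER)
           - (strftime('%m-%d', 'now') < strftime('%m-%d', birthday)) AS age
    FROM Students
""")

cursor = conn.execute("SELECT * FROM StudentAges")
view_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(view_columns)  # the birthday column is not part of the view
print(rows)
```

A user group that is only given access to the view never sees the stored birthday, and the way the birthday is stored could change without affecting programs that read the view.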
Database Management Systems
Data Independence
• Key Idea: remove details related to data storage and access from application programs
• Concentrate those functions in a single subsystem: the Database Management System (DBMS).
• Have all applications access data through the DBMS.
• Make applications independent of data storage and make its display independent of data logic.

Database Management Systems
SQL
• Virtually all modern relational databases support SQL.
• It is the most commonly used language to create, manage and query a database.
• SQL statements can be embedded in other programming languages (C/C++, Java, Python etc.)
• The SQL command to access a database is often generated on the fly, behind the scenes
  - i.e. users specify what they want and click the search button.
  - e.g. http://www.lib.uwaterloo.ca
  - e.g. https://cs.uwaterloo.ca/cscf/teaching/schedule/

Database Management Systems
Operation: Select
Many operations on a table involve obtaining information from particular rows or columns
• Select finds the rows that match certain criteria
  - e.g. select parts with part_number 137 or 150
Part_Number   Part_Name       Unit_Price   Supplier_Number
137           Door latch      22.00        8259
150           Door moulding   6.00         8263

Database Management Systems
Operation: Join
• Join adds relevant columns from another table, say Suppliers
Part_Number   Part_Name       Unit_Price   Supplier_Number
137           Door latch      22.00        8259
150           Door moulding   6.00         8263

Part_Number   Part_Name       Unit_Price   Supplier_Number   Supplier_Name        ∙∙∙
137           Door latch      22.00        8259              CBM Inc.             ∙∙∙
150           Door moulding   6.00         8263              Jackson Composites   ∙∙∙

Database Management Systems
Operation: Project
• Project would only include certain columns …
Part_Number   Part_Name       Unit_Price   Supplier_Number   Supplier_Name        ∙∙∙
137           Door latch      22.00        8259              CBM Inc.             ∙∙∙
150           Door moulding   6.00         8263              Jackson Composites   ∙∙∙
• e.g.
project the columns Part_Number, Part_Name, Supplier_Number and Supplier_Name
Part_Number   Part_Name       Supplier_Number   Supplier_Name
137           Door latch      8259              CBM Inc.
150           Door moulding   8263              Jackson Composites

Database Management Systems
Data Manipulation
• The contents of a database can be accessed using a data manipulation language which specifies the contents to extract.
• e.g. The following MySQL query generated the table below.
Part_Number   Part_Name       Supplier_Number   Supplier_Name
137           Door latch      8259              CBM Inc.
150           Door moulding   8263              Jackson Composites

Database Management Systems
Data Definition
• The contents of a database must be clearly defined using a data definition language which specifies the type of each attribute / field / column heading. E.g.
  CREATE TABLE Parts (
      Part_Number number,
      Part_Name text,
      Unit_Price currency,
      Supplier_Number number,
      PRIMARY KEY (Part_Number));
• It could also specify valid ranges (for numbers) and whether duplicate values are allowed.

Database Management Systems
Limitations of Relational Databases
• Multimedia data: graphics, audio, video
  - Tables (i.e. rows and columns of data) don’t handle multimedia data well
• Arrays of data (all the same type of data indexed with a natural number)
• Unstructured text: e-mail, text messages, tweets, user comments
• Hierarchical data
  - Example: Taxonomy of Organisms
  - Hierarchy of categories: kingdom, phylum, class, order, family, genus, species

Database Management Systems
Hierarchical Database
animals
  chordates
    vertebrates
      birds
      reptiles
      mammals
  arthropods
    insects
    spiders
    crustaceans
• A tree captures the relationship among the data: a parent (e.g. vertebrates) can have many children (e.g. birds, reptiles, …).
• How would you design a relational schema for this?
• Not as common as relational databases.
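One possible answer to the relational-schema question above is to store each node with a reference to its parent (an "adjacency list"). The sketch below uses Python’s built-in sqlite3 module; the table name `Taxon`, the helper `descendants`, and the exact parent/child arrangement of the tree are assumptions made for this illustration, not a design given in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each row names one node and its parent; the root has no parent (NULL).
conn.execute("""
    CREATE TABLE Taxon (
        name   TEXT PRIMARY KEY,
        parent TEXT REFERENCES Taxon(name)
    )
""")
conn.executemany("INSERT INTO Taxon VALUES (?, ?)", [
    ("animals", None),
    ("chordates", "animals"),
    ("arthropods", "animals"),
    ("vertebrates", "chordates"),
    ("birds", "vertebrates"),
    ("reptiles", "vertebrates"),
    ("mammals", "vertebrates"),
    ("insects", "arthropods"),
    ("spiders", "arthropods"),
    ("crustaceans", "arthropods"),
])


def descendants(connection, root):
    """Return the names of all nodes below root, in alphabetical order."""
    rows = connection.execute("""
        WITH RECURSIVE sub(name) AS (
            SELECT name FROM Taxon WHERE parent = ?
            UNION ALL
            SELECT t.name FROM Taxon AS t JOIN sub ON t.parent = sub.name
        )
        SELECT name FROM sub ORDER BY name
    """, (root,)).fetchall()
    return [row[0] for row in rows]


print(descendants(conn, "chordates"))
```

The design works, but walking the tree needs a recursive query, which hints at why hierarchical data is listed as an awkward fit for plain tables.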
Database Management Systems
Network Database
[Figure: network of the nodes University, Department, Student, Course, Section, Completed]
• Unlike a hierarchical database, a child (i.e. Completed) can have multiple parents (i.e. Section and Student).
• These databases can be faster than relational databases.
• But they are not as common as relational databases.

Database Management Systems
Object-oriented (OO) Database
• Many applications need to store and retrieve text, graphics, audio and video (i.e. multimedia)
• Organizing the database as tables with rows and columns does not handle multimedia very well.
• OO Databases store both
  - the types of data and
  - the procedures that manipulate the data
• Relatively slow because of complexity.
• Support many OO concepts like inheritance and polymorphism
  - a grad student is a type of student with additional attributes (fields)

Database Management Systems
Object-oriented (OO) Database
• Inheritance: a Grad_Student includes all the attributes of a Student plus possibly some additional ones (e.g. thesis_supervisor, office, office_phone)
• Polymorphism: A Student and a Grad_Student can respond to many of the same operations, e.g. get_student_number
• Object-oriented databases are becoming more popular.
Student
• student_number
• name
• userID
• major
Grad Student
• thesis_supervisor
• office
• office_phone

Database Design
Criteria for a Good Design
What are the criteria for a good database design?
  - Correctness
  - Completeness: it characterizes all the data
  - Minimum redundancy: it cannot be completely eliminated in all cases.

Database Design
Steps in Database Design
1) Identify what data to store and the relationships between the entities. Use an entity-relationship (ER) diagram to capture this data.
2) Convert the ER diagram into tables
  - Use a set of mapping rules which we will cover briefly.
3) Fine-tune your design
  - Apply the normalization process to remove redundancy.

Example: Company Database
Step 1a: Identify the Data
• An employee has a name, sex, address, salary, SIN, birthday, works for a department, works on projects, might have a supervisor (who is also an employee).
• A department has a name, a department number, a manager (who is also an employee), and is located in one or more cities.
• A manager has a starting date.
• A project has a name, a project number and a location. It is controlled by a department.
• The number of hours an employee works on a project should be recorded for performance evaluation.

Example: Company Database
Step 1b: Create an ER Diagram
[ER diagram] Source: Fundamentals of Database Systems by Ramez Elmasri

Example: Company Database
Step 1b: Create an ER Diagram: Entities and Relationships
• The rectangles are entities (things, nouns) that we store data about, e.g. EMPLOYEE, DEPARTMENT, PROJECT
• The diamonds are relationships between the entities.
  - E.g. WORKS_FOR, MANAGES, SUPERVISION, CONTROLS, WORKS_ON
• Sometimes we store data about the relationship.
  - E.g. StartDate, Hours
• There are different formats for ER diagrams.
  - We will use Chen notation but not cover all its details.

Example: Company Database
Step 1b: Create an ER Diagram: Attributes
• The ovals are the pieces of data (attributes) that we store about entities or their relationships,
  - e.g. Salary, Address, Sex, Name, SIN, …
• The primary keys for each entity are underlined in the ovals.
• Double ovals are pieces of data that we store that can have more than one value (multivalued attributes),
  - e.g. location of the department (e.g. Waterloo and Toronto).
• Dashed ovals are pieces of data that can be derived from other attributes,
  - e.g. NumberOfEmployees.
Example: Company Database
Step 1b: Create an ER Diagram
Relationships can be …
• 1:1 (one to one): A department has only one employee that manages it and a manager only manages one department.
• 1:N (one to many): One department has many employees that work for it (works_for).
• N:M (many to many): An employee works on many projects (works_on) and a project has many employees.

Example: Company Database
Step 1b: Create an ER Diagram
Relationship R can be connected to an entity E by a
• single line meaning not every entity E participates in the relationship R, called partial participation, e.g.
  - not every employee is a supervisor
  - not every employee is a supervisee (e.g. the CEO)
  - not every department controls a project
• double line meaning every entity participates in the relationship, called total participation, e.g.
  - every employee works on 1 or more projects
  - every project has employees working on it
  - every department has employees and a manager

Example: Company Database
Step 2: Map the ER Diagram to DB Tables
• There are about a dozen different rules for the mapping.
• We only introduce a few simple ones.
• An entity from an ER diagram is represented as a table.
• Relationships are represented either as
  1. foreign keys in one of the entities’ tables
  2. or they get their own table.
• For each table you create, you must find a primary key (P-key) to uniquely identify a single record (i.e. row) in the table.

Example: Company Database
Step 2: Map the ER Diagram to DB Tables
a) Entities get mapped to tables (just the headings of each table are shown)
   EMPLOYEE (Bdate, SIN, Fname, Minit, Lname, Sex, Address, Salary)
   DEPARTMENT(Dname, Dnumber, Location)
   PROJECT(Pname, Pnumber, Location)
b) For 1:1 relationships: place the P-key from one entity into the other entity’s table
   For EMPLOYEE manages DEPARTMENT add MgrSIN to the DEPARTMENT table (i.e. it is a foreign key in DEPARTMENT).
   DEPARTMENT(Dname, Dnumber, Location, MgrSIN)

Example: Company Database
Step 2: Map the ER Diagram to DB Tables
c) For 1:N relationships: place the P-key from the “1” entity into the “N” entity’s table.
   For EMPLOYEE works_for DEPARTMENT add the DEPARTMENT’s Dnumber to the EMPLOYEE table.
   EMPLOYEE (Bdate, SIN, Fname, Minit, Lname, Sex, Address, Salary, Dnumber)
   For DEPARTMENT controls PROJECT add Dnumber to the PROJECT table.
   PROJECT(Pname, Pnumber, Location, Dnumber)

Example: Company Database
Step 2: Map the ER Diagram to DB Tables
c) For 1:N relationships: place the P-key from the “1” entity into the “N” entity’s table
   For EMPLOYEE supervises EMPLOYEE add SupSIN to the EMPLOYEE table
   EMPLOYEE (Bdate, SIN, Fname, Minit, Lname, Sex, Address, Salary, Dnumber, SupSIN)

Example: Company Database
Step 2: Map the ER Diagram to DB Tables
d) For N:M relationships: create a new table with a composite P-key (composed of the P-keys from both entities) and include any associated data, e.g. hours.
  - For EMPLOYEE works_on PROJECT create a WORKS_ON table
   WORKS_ON (SIN, Pnumber, Hours)

Example: Company Database
Step 2: Map the ER Diagram to DB Tables
Note we have greatly simplified things here. I.e. we have not talked about how to deal with
• derived attributes like: NumberOfEmployees
• multivalued attributes like: Locations
• entities connected to a relationship by a single line vs. a double line

Example: Company Database
Step 3: Normalize the design
• The last step in database design is to normalize it.
• Normalization is a process to minimize the redundancy in a design.
• There are different levels of strictness for reducing redundancy.
• We will cover this subject only briefly by learning one way to check for and remove some redundancy.
Normalization
Functional Dependency
• Boyce-Codd Normal Form (BCNF): every attribute for an entity depends only on the candidate key(s) (and not some other attributes as well).
• What do we mean by depends?
• Functional Dependency (FD): A → (B, C) means...
  - B and C depend on A
  - i.e. the value of A determines the values of B and C

Normalization
Functional Dependency
• Observations
  - There may be many different students with the same last name.
  - But each student has a unique student number.
• Conclusion
  - The last name depends on student number.
  - Student number does not depend on last name.
• Given your student number, I can look up your last name.
• But given (only) your last name, I cannot find out (for sure) what your student number is.
• Write this dependency as: student number → last name

Normalization
More Examples
Examples of functional dependencies:
• employee-number → employee-name
• course-number, section-number, term → lecture-room
• course-number, section-number, term → instructor
Examples that are not functional dependencies:
• employee-name ↛ employee-number
• lecture-room ↛ course-number
• instructor ↛ course-number
• last-name ↛ colour-of-socks-you-wore-today

Normalization
Looking for Functional Dependencies
What are the functional dependencies in the following Emp (employee) table?
Emp (EName, SIN, BDate, Address, DNum, DName, MgrSIN)
where
  EName – employee name
  SIN – social insurance number
  BDate – birthday
  DNum – department number
  DName – department name
  MgrSIN – department manager’s social insurance number

Normalization
Looking for Functional Dependencies
What are the functional dependencies?
• social insurance number (SIN) determines: employee name (EName), birthday (BDate), Address, DNum (Department number) i.e.
  SIN → EName, BDate, Address, DNum
• department number (DNum) determines: department name (DName) and department manager social insurance number (MgrSIN) i.e.
  DNum → DName, MgrSIN
The table is not in BCNF: DName depends on SIN (the primary key) and also on DNum (not a key; its values are repeated many times in the table).

Normalization
Looking for Functional Dependencies
Recall: Boyce-Codd Normal Form (BCNF): every attribute for an entity only depends on the candidate key(s)
Solution
• Break the table into two tables Emp (Employee) and Dept (Department), each with its own primary key:
  1. Emp (EName, SIN, BDate, Address, DNum)
  2. Dept (DNum, DName, MgrSIN)
Now all the attributes of
• Emp are determined by SIN and
• Dept are determined by DNum

Another Example: Ordering Parts
Sample Order
Determining the functional dependencies first is another way to build an ER Diagram. E.g.
• An order consists of an Order Number, a Date and a list of parts and their suppliers (called line items).
Order: 19330   Date: June 10, 2019
Part #   Part Name       Quantity   Unit Price   Supplier
137      Door latch      200        $22.00       8259: CBM Inc. 74 5th Ave, Saint John, NB, E2M 5T3
150      Door moulding   300        $6.00        8263: Jackson Components 82 Micklin St, Hamilton, ON, L9H 7M4
152      Door lock       300        $31.00       8259: CBM Inc. 74 5th Ave, Saint John, NB, E2M 5T3

Another Example: Ordering Parts
Look for Functional Dependencies
First consider the dependencies.
Supplier:   Supplier_Number → Supplier_Name, Supplier_Street, Supplier_City, Supplier_Province, Supplier_PC
Part:       Part_Number → Part_Name, Unit_Price, Supplier_Number
Line_Item:  Order_Number, Part_Number → Part_Quantity
Order:      Order_Number → Order_Date

Another Example: Ordering Parts
Look for Functional Dependencies
Use the dependencies to build the tables:
  Supplier (Supplier_Number, Supplier_Name, Supplier_Street, Supplier_City, Supplier_Province, Supplier_PC)
  Part (Part_Number, Part_Name, Unit_Price, Supplier_Number)
  Line_Item (Order_Number, Part_Number, Part_Quantity)
  Order (Order_Number, Order_Date)
From these tables you can build the ER Diagram.
Example taken from section 6.2 of the course text.

Abilities of a Database
Can a DBMS find the following?
• Filter: List the names of students who scored in the 90s in both CS 330 and STAT 371.
• Predict: What would the monthly sales figure be if we raised prices by 10%? Lowered them by 5%?
• Filter: Find all the professors that taught/are teaching the student Terry Lee.
• Predict: Find out how likely students are to pass CS 330 if they get 80+ in CS 115.
• Summarize: Plot the CS 330 grade distribution according to the programs its students are enrolled in.

Abilities of a Database
Can a DBMS find the following?
• A Database can...
  - record data,
  - search for an item,
  - filter and project (i.e. select certain records and attributes),
  - group information,
  - summarize (max, min, average, count)
• A Database cannot perform
  - statistical analysis: what-if, forecasting, correlation
• This limitation is why we need Data Warehouses, Data Marts, Online Analytical Processing and Data Mining

Business Intelligence and Analytic Tools
Data Warehouse
• Definition: A decision support database that is maintained separately from the organization’s operational database.
• I.e. it provides information to help make decisions.
• How: It stores data (both current and historical) that could be of interest to a decision maker.
• A data warehouse is
  - integrated (i.e. connected to corporate databases)
  - time-variant (takes into account data that changes over time)
  - non-volatile (does not delete old entries)

Business Intelligence and Analytic Tools
Data Warehouse
• Use a database to keep track of day-to-day transactions (e.g. ordering from suppliers, making products, selling to customers).
• Use a data warehouse to find patterns in the data and to provide insights. E.g.
  - What products cost the most to maintain?
  - What products cost the most to develop?
  - What products have the lowest defect rate?
  - Did changing suppliers impact our defect rate?
• Then make decisions based on the patterns in the data.

Business Intelligence and Analytic Tools
Data Warehouse Components
Course text, Figure 6-12

Business Intelligence and Analytic Tools
Why have a Separate Data Warehouse?
• Performance
  - Operational databases (which are not data warehouses) are tuned for day-to-day workloads, i.e. processing daily transactions.
  - Complex queries (which take a long time to process) would degrade performance for processing daily transactions.
  - Special data organization, access and implementation methods are needed for complex queries.

Business Intelligence and Analytic Tools
Why have a Separate Data Warehouse?
• Function
  - Decision support requires historical data (up to 5 to 10 years of data).
  - Consolidates data from many operational systems as well as external sources (GDP, foreign exchange rates, inflation)
  - Data quality considerations (how trustworthy is the data)

Business Intelligence and Analytic Tools
Benefits of Data Warehouses
http://www.youtube.com/watch?v=KGHbY_Sales
Examples
• https://www.ibm.com/analytics/data-warehouse
• https://azure.microsoft.com/en-us/services/sql-data-warehouse/
• https://cloud.google.com/bigquery/
• https://aws.amazon.com/redshift/

Business Intelligence and Analytic Tools
Data Warehouse vs. Data Marts
Data Warehouse
• collects information about multiple subjects that span the entire organization
• requires extensive business modeling
• may take years to design and build
Data Marts
• departmental subsets that focus on selected subjects
  - E.g. marketing data mart that focusses on customers, products and sales
• faster roll out (compared to a data warehouse)
• more complex to integrate all the data marts in the long run.

Business Intelligence and Analytic Tools
Online analytical processing (OLAP)
• Traditional database queries look for answers in (two-dimensional) tables.
• Online analytical processing (OLAP) supports multidimensional data analysis.
• This feature enables users to view the same data broken down in different ways along different dimensions, e.g.
  - by product, by region, by time period, by cost, by price.
  - E.g. How do actual sales compare with predicted sales in each region and for each product since June?

Business Intelligence and Analytic Tools
Online analytical processing
• Here data is being considered along three dimensions
  1. product
  2. region
  3. actual vs.
predicted
Dimensions, Measures, Hierarchy and Grain
Course text, Figure 6-13
https://www.youtube.com/watch?v=qkJOace9FZg
OLAP: https://www.youtube.com/watch?v=2ryG3Jy6eIY

Business Intelligence and Analytic Tools
Data Mining
• Instead of making a query, tools automatically analyze large pools of data to find hidden patterns, infer rules and predict trends, e.g.
  - Associations: customers who buy X will likely buy Y if it is on sale.
  - Sequences: customers who buy X will typically buy Y within two months.
  - Classifications: types of customers who are likely to stop using your product.
  - Clusters: group together similar customers.
  - Forecasts: predict what some future values will be based on current trends.

Managing Data Resources
Information Policy
• A database stores information but organizations also need policies for how it is used.
• An information policy specifies organizational rules for sharing, disseminating, acquiring, standardizing, classifying and inventorying information.
  - What data and information to store
  - How to store, manage and use it
  - Who can access what
    ▪ E.g. who can access and change an employee’s salary.
• Database administration manages the structure and content of corporate databases as well as access rules and security.

Managing Data Resources
Ensuring Data Quality
Try to find and correct errors in the data, e.g. different versions of someone’s name.
• A data quality audit is a structured survey of the accuracy and completeness of data in an information system, i.e. check the data.
  - Look at all (or a sample of) the data.
  - Ask end users for their opinion of the data.
• Data cleansing consists of activities for detecting and correcting incorrect data in an information system
  - E.g. is the postal code accurate for the address?
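A postal code check of this kind might be sketched as follows in Python. This is only an illustration under stated assumptions: the function name postal_code_problems is hypothetical, the format pattern is simplified, and the letter-to-province table covers only the two provinces that appear in the sample order from the normalization example (E for New Brunswick; K, L, M, N and P for Ontario).

```python
import re

# Simplified Canadian postal code pattern: letter-digit-letter, optional
# space, digit-letter-digit. (Assumption: the real rules also exclude
# some letters; that refinement is left out of this sketch.)
POSTAL_CODE = re.compile(r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$")

# First letter of the postal code -> province. Only the provinces from
# the sample order are listed; a real table would cover all of Canada.
FIRST_LETTER_TO_PROVINCE = {
    "E": "NB",
    "K": "ON", "L": "ON", "M": "ON", "N": "ON", "P": "ON",
}


def postal_code_problems(province, postal_code):
    """Return a list of problems; an empty list means none were detected."""
    code = postal_code.strip().upper()
    if not POSTAL_CODE.match(code):
        return ["badly formatted postal code"]
    expected = FIRST_LETTER_TO_PROVINCE.get(code[0])
    if expected is not None and expected != province:
        return [f"postal code belongs to {expected}, not {province}"]
    return []


print(postal_code_problems("NB", "E2M 5T3"))  # supplier 8259's address
print(postal_code_problems("ON", "E2M 5T3"))  # province and code disagree
print(postal_code_problems("ON", "L9H7M"))    # one character is missing
```

A check like this only detects suspicious records. Deciding what the correct value should be usually still requires a trusted reference source or a person.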
  - Data cleansing also enforces consistency.

Topic 3 – Networking
Key Concepts
• principal components of a network
• common types of networks, transmission media and internet connections
• principal technologies and standards for networking
References
• Course text, Chapter 7 Telecommunications, the Internet, and Wireless Technology

Overview of Computer Networks
Computer Networks (a Review)
• A computer network is two or more computers connected together so that they can share resources
• Network components (on each machine):
  - a Network Interface Card (NIC): allows a computer to be connected to the network
    ▪ e.g. Google image search “ethernet card”, “Wi-Fi card” or “Bluetooth card”
  - a Network Operating System (NOS): routes and manages communications on the network and coordinates network resources

Overview of Computer Networks
Other Network Components (mostly a review)
• Connection medium: could be wire, fiber optic cable, radio waves (more on this topic soon).
• Dedicated servers: e.g. file server, e-mail server, database server, web server.
• Hubs, bridges and switches connect machines on the same network and forward data from one to another
  - typically only see these in larger networks (i.e. 16+ computers)
• Routers connect two or more different networks
  - e.g. your home network (typically Wi-Fi) to the Internet (typically DSL on telephone lines or HFC on cable TV lines)

Overview of Computer Networks
Other Network Components (mostly a review)
• Firewall: hardware or software (or both) put between the internal network (or individual computer) and the internet to prevent outsiders from obtaining unauthorized access
• Key Question: How does it block unauthorized network access but allow authorized access?
  - The firewall keeps track of which websites you have contacted recently and are still waiting on for a reply.
  - e.g.
    if you initiate a search using Google, your firewall will accept a reply from Google but not from any other website.
• My home firewall gets dozens of unauthorized attempts to access my home network every hour.

Overview of Computer Networks
A Network for a Large Company would include
• internet
• public telephone network
• internal wired network
• internal wireless network
• cell phone network
• video conferencing system
• extranet (a private network that partners, suppliers and vendors can access).
Course text, Figure 7-2

Key Trend: Packet Switching
Circuit Switching
This was the first method of switching, dating back to the first telephone systems.
• Originally a connection between two devices was achieved by creating a circuit, a temporary dedicated path between the source and the destination.
• Think of a telephone call: you set up the circuit (dial the number) and the circuit exists until either party hangs up, even if there is no one talking.
• This approach wastes network resources when no (or little) talking is taking place.

Key Trend: Packet Switching
Packet Switching
Later on, an alternative to circuit switching was developed: packet switching.
1. Data (such as a webpage) is broken down into small parts (called packets) roughly 1 KB in size.
2. Packets are sent from the source to the destination (possibly along different communication paths).
3. Packets are reassembled in their original order once they reach their destination.
Key Advantage (compared to circuit switching): Only use the network when you have information to send ⇒ more people can share the network.

Key Trend: Packet Switching
Course text, Figure 7-3
The network consists of many nodes and there are multiple routes to the destination.

Key Trend: Packet Switching
An Analogy
• Say, you are having a party in Toronto.
• Twenty of your friends are attending.
• How is packet switching different from circuit switching?
  - Hint: train vs. cars
• Train: a single route
• Cars:
  - group splits up with some people in each car
  - cars may take different routes
  - cars may mix in with other cars (not going to party)
  - group reassembles when at the destination

Different Types of Networks
• Topology: How are the nodes connected to each other?
• Geographic scale: How big is the network?
• Protocol: What are the rules for communication?
  - how to initiate and terminate communication,
  - message format, handle errors, control messages,
  - route messages, voltage levels
• Transmission media: Wired, wireless or fiber
• Services: E-mail, printing, file transfer, remote terminal, teleconference, database access, file sharing etc.

Topologies
Popular Topologies
• The most common topologies for wired computer networks are
  - the star,
  - the ring, and
  - the bus.
• Ethernet uses either a star or a bus.
Source: Course text, 5th edition, Figure 7-6

Geographical Scale
Popular Geographical Scales
• NFC (near field communication) up to 4 cm, e.g. mobile payment systems
• PAN (personal area network) up to 10 metres,
  - e.g. Bluetooth connecting a laptop to a wireless mouse.
• LAN (local area network) within a small building or a single floor of a large building,
  - e.g. Ethernet in campus offices
  - e.g. Wi-Fi in your house or apartment, which can also be called WLAN (wireless local area network).
• WAN (wide area network) typically means the internet but it could be any network spanning regions or countries.

Protocols
Internet Protocol Suite
• A network protocol is a set of rules governing how data is exchanged in a network
• The Internet Protocol Suite is the standard for most networks including the internet.
• At its core are two protocols: the Transmission Control Protocol (TCP) and the Internet Protocol (IP).
• Each computer is assigned, and identified by, an IP address
  - like a telephone number for a cell phone
  - It consists of four 8-bit numbers (each from 0 to 255) separated by dots (for IP version 4).
  - Example: 192.3.15.1

Protocols
TCP/IP
• You can check your IP address by typing ipconfig in a DOS shell or by asking “what is my ip address” in Google
• You can find out who is responsible for that address with the nslookup command (or a whois server)
• UWaterloo’s range is 129.97.0.0 to 129.97.255.255
• You can see what sites your computer is connected to with the netstat command
• TCP and IP are part of a four-layer protocol suite …

Internet Protocol Suite
Course text, Figure 7-4

Internet Protocol Suite
Application Layer
• defines protocols for applications to exchange data
• e.g. the HTTP protocol for web browsers and web servers
• sends the data (e.g. a webpage) to the transport layer to be transported
Transport Layer
• sets up and manages the connection with the destination
• breaks up data into packets at the source and reassembles them at the destination
• also handles flow control and congestion and optionally reliability (i.e. request retransmission if a packet is lost or corrupt)

Internet Protocol Suite
Internet Layer
• addressing and routing a packet through the network
• gets a packet from source to destination based only on its address
Network Interface (e.g. Ethernet, Wi-Fi, DSL)
• transporting a bit (or a packet) in the network medium
• i.e. placing the packet (a sequence of bits) on the network medium (at the source) and receiving it from the medium (at a neighbouring node)
• e.g.
  deals with how to represent and recognize a 0 or a 1 on the medium (wire, fibre optic cable, radio waves)

Internet Protocol Suite
Layers: Application Layer, Transport Layer, Internet Layer, Network Interface
Network nodes: A, B, C, D, E, F, G, H, I
Course text, Figure 7-3

Internet Protocol Suite
Source Application Layer
• Sends an email with a large attachment from home (A) to UWaterloo (I).
• Calls a transport level function to send the message to the destination.
Source Transport Layer
• Sets up and manages the connection between A and I.
• Breaks the email up into packets (tracking their order) and calls an internet layer function to send each packet.
• Makes sure that they are not sent too quickly (flow control) and that they arrive without error.

Internet Protocol Suite
Source Internet Layer
• Finds a path from A to I.
• Has 3 choices: go through B, C, or D.
• Figures out the best choice and then calls the appropriate network interface function to send the packet on that link.
Source Network Interface (e.g. Ethernet, Wi-Fi, DSL)
• Just concerned with getting a single packet to the next node.
• A to C could be Wi-Fi.
• C to G could be DSL.
• G to I could be Ethernet.

Internet Protocol Suite
Destination Network Interface (e.g. Ethernet, Wi-Fi, DSL)
• Receives the packet and sends it up to the internet layer.
Destination Internet Layer
• If the packet is for this address, sends it up to the transport layer.
• If not, figures out which link to send it out on and calls the appropriate network interface function.
Destination Transport Layer
• Receives the packets and reassembles them.
• When they have all arrived, it lets the application level know that a message has arrived.
Destination Application Layer
• Lets the user know they’ve got a new email.

An Analogy: Mailing a Letter

Physical Transmission Media
Four Common Media
1. Twisted pair (of wires) – e.g.
   telephone lines, category 5 networking cable (CAT5), Ethernet cable
2. Coaxial cable – e.g. cable TV
3. Fiber optic cable – fast, massive bandwidth
4. Wireless transmission media and devices – more and more popular in LANs

Physical Transmission Media
Speed and Responsiveness
• One way of characterizing network performance is by bandwidth, i.e. the number of bits that can be transmitted per second.
  - typical units are Kbps, Mbps, Gbps
  - recall small b = bit not byte
  - for data rates, K, M, G are (typically) powers of 1000, not 1024
  - the larger the bandwidth, the faster the network, i.e. the more data that can be transferred in one second
• Network responsiveness is measured by latency, i.e. how long it takes to receive the first byte of data or the time between a request and a response.

Wireless Communication
Wireless
• Bluetooth
  - for PAN (Personal Area Network), i.e. 10 metres or less.
  - e.g. cell phone to headset; computer to wireless mouse, keyboard, printer; cell phone tethering (computer uses smart phone’s internet connection)
• Wi-Fi
  - some say it stands for Wireless Fidelity, but it was a meaningless word meant to be similar to hi-fi.
  - WLAN (wireless LAN), i.e.
    within a home or office
  - comes in various versions with different speeds: 802.11a, b, g, n, ac

Wireless Communication
Wireless
• WiMAX
  - secure remote wireless access for longer distances (up to 50 kilometres)
  - stands for Worldwide Interoperability for Microwave Access
  - based on microwaves
  - typically used in rural settings that do not have cable access
  - needs a base station to connect with a remote tower

Wireless Communication
3rd, 4th and 5th Generation Cellular Networks
• Newer generations of networks have faster data speeds
  - 3G (1-2 Mbps typical) since the mid 2000s
  - 4G (4-200 Mbps typical) current in most areas of Canada
  - 5G (1 Gbps or greater) about to happen
• 4G comes in many flavours with different speeds: HSPA+, LTE and LTE Advanced
• Cell phone companies provide maps of their coverage
  - https://www.bell.ca/Mobility/Our_network_coverage
  - https://www.rogers.com/consumer/wireless/network-coverage
  - https://www.telus.com/en/bc/mobility/network/coverage-map
• You can find out about other companies at http://en.wikipedia.org/wiki/List_of_Canadian_mobile_phone_companies

Wireless Communication
Cell Towers
• The range of a single cell phone tower (in ideal circumstances, i.e. flat terrain) can be 35 km.
• In cities, the towers are much closer, e.g. 1 km apart.
• The location of cell phone towers is public information.
• You can get a map of the ones in your area if you want to choose a provider that has a tower close by. https://www.ertyu.org/steven_nikkel/cancellsites.html
• Most towers support multiple technologies, e.g. 3G and 4G.
• Most cell phones support multiple technologies, so if a 4G phone cannot find 4G, it will connect to 3G service.

Wireless Communication
5th Generation Cellular Networks
• 5G will allow for smaller antennas with a shorter range and so (besides extra bandwidth) it can support more devices.
  - 4G can support 100,000 devices per square km.
  - 5G can support 1,000,000 devices per square km.
• This density means more support for the internet of things
  - i.e. the extension of the internet to devices like smart thermostats, lighting and home security systems.
• 5G systems also have faster response times (lower latency) so they can support devices that require more stringent timing.
• 5G is compatible with 4G so the 5G network can grow incrementally in an area with 4G service.

Radio Frequency Identification (RFID)
What is it?
Course text, Figure 7-19
• Similar to a bar code, i.e. the tag stores a unique number that identifies an item (or type of item).
• When the tag is placed close to an RFID reader, the identifier is read off of the tag and sent to a computer.
• Unlike a bar code scanner at a grocery store checkout, it does not need line-of-sight; the reader just has to be close to the tag.

Radio Frequency Identification (RFID)
Types and Uses of RFIDs
• RFID tags come in many sizes and shapes but are generally fairly small and flat
• There are two types
  - passive: cost a few pennies, don’t need a battery, can only be read from within a few feet
  - active: cost a few dollars, need a battery, can be read from over 100 feet away
• Great for inventory management: How many xyz’s do we have in inventory and where are they?
• Similar to tracking a package with Canada Post/UPS/FedEx.

The Internet ≠ The Web
What is the Internet?
• An internet: a network of networks
• The Internet: a collection of local, regional, national and international computer networks linked together
• It evolved from the late 60’s to mid 80’s as a way to link different networks together.
• Most homes and businesses connect to the internet by subscribing to an Internet Service Provider (ISP), e.g. Bell Internet or Rogers High Speed Internet.
• The first major application was email (then file transfer, electronic bulletin board services, etc.)

The Internet ≠ The Web
What is the World Wide Web?
• Recall: The Internet is a large number of networks connected together.
• The World Wide Web, created in 1989, is just one of the services available over the Internet.
• The Web is a collection of interconnected documents and other resources, linked by hyperlinks
• When you use a web browser (e.g. Chrome, Firefox, Safari) to request a webpage you are using the web.

The Internet ≠ The Web
What is the World Wide Web?
• hypertext transfer protocol (HTTP): the protocol that structures the communication between the web browser (the client) and the web server
  - usually placed at the beginning of the web address
  - it has commands to get a webpage, see if a webpage has changed recently, etc.
• hypertext markup language (HTML): the file type (i.e. format) that a browser understands
  - placed at the end of a web address
• e.g. http://www.uwaterloo.ca/index.html

The Internet ≠ The Web
What is the World Wide Web?
• Every webpage has a unique address, called a uniform resource locator (URL)
  - e.g. http://www.uwaterloo.ca is a URL
• Typically webpages reference other webpages via their URL, which is typically shown in blue and underlined.
• If you can click on the URL and the program jumps to another page, it is called a hyperlink
• Hypertext is just a file (or software system) that contains/implements hyperlinks.

The Internet ≠ The Web
What is not part of the World Wide Web?
• Any program that needs to be connected to the internet to run, but does not use a browser.
• Typically you download it as a separate program
• Examples
  - Spotify: listen to streaming music
  - iTunes: purchase (and listen to) music, videos, etc.
  - some multiplayer on-line games
  - some utility programs, typically Unix/Linux based, like secure shell, secure copy

IP Addresses and the DNS
IP Addresses
• Every device connected to the Internet has a unique identifier called its IP address
  - e.g. 129.97.208.24 (IPv4)
  - e.g. fe29::1725:c216:85fc:100d (IPv6)
  - currently use IPv4 but we are running out of addresses.
  - numbers are difficult to remember
• The domain name is the English-like name that corresponds to the IP address
  - e.g. uwaterloo.ca
• You need to register (pay) to get a domain name
• The Domain Name System (DNS) translates domain names into IP addresses

IP Addresses and the DNS
1. The computer sends the domain name (uwaterloo.ca) to a DNS server.
2. The DNS server responds with the IP address (129.97.208.24).
3. The pair (uwaterloo.ca = 129.97.208.24) is stored in a DNS cache.
4. To see what is in your cache, type ipconfig /displaydns
Image Source: http://www.windowsnetworking.com/img/gifs/tcpipdns.gif

IP Addresses and the DNS
For a domain name like sales.google.com, the top level domain name is “com” and the second level is “google”.

Internet Services
Voice Over IP (VoIP)
• Voice over IP is a way of making telephone calls using the internet, as opposed to using the telephone system.
• e.g. Skype, Google Voice plug-in for Gmail
• Can use your computer (if it has speakers and a microphone) or buy a VoIP telephone.
• Used as a way of cutting down communication costs.

Internet Services
Voice Over IP (VoIP)
• Sound is digitized (sampled 8,000 times per second), broken up into packets, transported through the internet via IP, then reassembled at the other end.

Internet Services
VPN Motivation
• Goal: provide the ability to work remotely and securely access files, e-mails and business data from your company’s internal network
• Challenge: the Internet is not safe!
• Malicious people can intercept IP traffic (called packet sniffing)
• Need a way of securing data.
• Idea: create a virtual private network (VPN)

Internet Services
VPN
• A virtual private network (VPN) is a computer network that provides secure access using a public infrastructure such as the Internet
• Avoids the need for many leased lines that individually connect remote offices (or remote users) to a private intranet.
• A VPN creates a secure virtual tunnel to transport the data
• The original packet is encrypted before being transmitted through the public network and then decrypted after reaching its destination.

Topic 4 – Management Information Systems
Key Concepts
• data vs. information
• information system (IS), management information system (MIS), business intelligence (BI)
• objectives of an information system
• contemporary approach to MIS
References
• course text, Chapter 1 Information Systems in Business Today

What is MIS?
Some Key Definitions
• Data: raw facts (course text, pg. 13)
  e.g. a list of items scanned at a supermarket checkout scanner
• Information: data shaped into a form that is meaningful ... to human beings (course text, pg. 13)
  e.g. which items are selling well, which aren’t, which need reordering
• Information Technology: all the hardware and software that a firm needs to use in order to achieve its business objectives (course text, pg. 12)
  e.g. desktops, laptops, servers, smart phones, MS Office, custom software.

What is MIS?
Class Task: What data does YouTube track?
• who uploaded the video, number of views, likes, dislikes, number of subscribers, comments, the region you are in, what videos you have seen before
What information could we obtain from this data?
• most popular videos, videos people view again and again, most popular YouTube channels, most popular YouTube channels in a certain region, YouTube channels that are popular in many regions

What is MIS?
Class Task: What are YouTube’s goals?
• to sell advertising, which means
  - to be a popular website
  - to keep you on their website for a long time (so you will see more ads)
How could we use this information to further YouTube’s goals?
• to easily identify the most popular videos in different genres (e.g. comedy, gaming, pet videos) or in a particular region or across many regions

What is MIS?
More Key Definitions
• Information System: A set of interrelated components that collect, process, store and distribute information to support decision making and control in an organization (course text, pg. 12)
• Information System Literacy: understanding the
  - technical
  - organization
  - management
  dimensions of an information system (course text, pg. 14)

What is MIS?
Dimensions of an Information System
• technical: you’ve seen this already
  e.g. processors, secondary storage, servers, databases, networks, etc.
• organization: (next topic)
  - different groups in a firm have different information needs, e.g. senior management does long range planning vs. operational workers deal with day-to-day transactions.
  - rules (such as course prerequisites) are embedded in the information system (such as Quest)

What is MIS?
Dimensions of an Information System
• management: (future topic): make decisions, formulate action plans, design and deliver new products
• The goal of studying Management Information Systems is to develop broader information systems literacy (course text, pg. 15)
All definitions from course text, 7th Canadian edition.

Why have MIS?
The Mission of MIS
• To improve the performance of people in organizations through the use of information technology
• To (fully or partially) automate data gathering, processing, storage and information distribution with the help of information technology (IT)
• To convert business data into information and business intelligence (information technology to help make better decisions)

Why have MIS?
MIS vs. IS
• Wikipedia and some other online resources consider an MIS to be a part of an IS that is designed to support or automate decision making
• The textbook considers an MIS to be a broader IS where both technical and behavioral issues are considered.

Strategic Objectives of an IS
Issues to Consider
• Why consider an MIS?
  - cost benefit analysis
  - adapt to (internal and external) change
  - create/maintain a competitive advantage
• How to develop and manage an IS?
  - design, implement, and integrate
  - training, new business practices
  - privacy and security

Strategic Objectives of an IS
Issues to Consider
• Operational excellence, improved efficiency
  - Overhead (costs other than labour and materials) as a percentage of sales revenue
    ▪ Walmart spends 16.6%
    ▪ Sears spends 24.9% (went bankrupt in 2017 in Canada, 2018 in the US)
    ▪ Industry average in retail is 20.7%
  - Monthly sales per square foot
    ▪ Walmart $28 US
    ▪ Target $23 US
    ▪ industry average in retail is $12 US
  - Walmart links suppliers to every Walmart store, see https://www.youtube.com/watch?v=SUe-tSabKag&t=131s

Strategic Objectives of an IS
Issues to Consider
• Help develop new products, services, and business models
  - e.g. iTunes, Spotify, Netflix.
• Understand customers and suppliers better
  - to enhance customer loyalty, know what the customers want
  - e.g. a high-end hotel keeping track of a guest’s preferred room temperature, etc.
  - keep suppliers informed
• Improved decision making by basing decisions on the most recent and relevant information
  - avoid overproduction and underproduction
  - know the effectiveness of a tool or person

Strategic Objectives of an IS
Why
• Survival: companies have to respond to
  - customers’ desire to use new technology
  - e.g. banking machines
  - new legislation in information gathering and reporting
• These factors lead to a competitive advantage.
  - doing things better or cheaper than the competition.

Strategic Objectives of an IS
Change for Survival
“As C.E.O., it’s also superimportant to keep focused on the future,” Mr. Page said. “Companies can tend to get comfortable doing what they’ve always done, with a few minor tweaks. It’s only natural to want to work on things you know. But incremental improvement is guaranteed to make you obsolete over time, especially in tech.”
- Larry Page, CEO, Google
Source: http://www.nytimes.com/2013/04/19/technology/googles-earnings-beat-expections-but-revenue-does-not.html

Strategic Objectives of an IS
Creative Destruction
• new technology threatens existing business
• a term coined by Joseph Schumpeter in his work Capitalism, Socialism and Democracy (1942) to denote a "process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one."
Source: http://www.investopedia.com/terms/c/creativedestruction.asp

Strategic Objectives of an IS
Why invest in an IS?
• provides real value to the company
• provides a better return on investment than other options such as buildings, machines, etc.
What helps achieve a better return?
• need complementary assets: assets required to derive value from a primary investment (pg 21), e.g. new business models, new business methods, training, management behaviour etc.
CS 330 Spring 2019 294

Strategic Objectives of an IS
Course text, Figure 1-8
• Some companies get a bigger productivity boost from their investment in IT than others do.
CS 330 Spring 2019 295

Contemporary Approaches to MIS
Course text, Figure 1-9
CS 330 Spring 2019 296

Contemporary Approaches to MIS
Technical Approaches
• Computer Science: methods of computation, storage, and access.
• Operations Research: optimizing selected parameters such as transportation costs, inventory levels.
• Management Science: models for decision making and management practices.
CS 330 Spring 2019 297

Contemporary Approaches to MIS
Behavioural Approaches
• Sociology: how information systems affect individuals, groups and organizations.
• Economics: production of digital goods, dynamics of digital markets.
• Psychology: how human decision makers use formal information.
CS 330 Spring 2019 298

Contemporary Approaches to MIS
A Sociotechnical Approach
• Optimal organizational performance is achieved by jointly optimizing both the social and technical systems used in production (pg 24)
• Both behavioural and technical aspects need to be considered (pg 24)
• To illustrate the importance of including behavioural approaches along with technical issues, consider something simple like the colours used in PowerPoint slides...
CS 330 Spring 2019 299

Contemporary Approaches to MIS
Example: Colours for text and background
• Yellow print has very low contrast on a white background. Try to read the following.
Offer expires 07/31/13. Offer available to new High Speed Internet subscribers only. May not be used in conjunction with any other offer. Service is not available in all areas. Certain taxes and fees may apply. DSL: Offer requires a 12 month subscription.
CS 330 Spring 2019 300

Contemporary Approaches to MIS
Example: Colours for text and background
• Blue light is near the short-wavelength end of the visible spectrum and red is at the long-wavelength end. Blue is refracted more strongly than red in our lenses.
• Result: our eyes can’t focus on red and blue at the same time, resulting in eye strain.
CS 330 Spring 2019 301

Contemporary Approaches to MIS
Case Study
• A university decides to adopt Learn as its platform for course delivery (as opposed to creating course web sites using HTML).
• The new system requires instructors to learn how to use it.
• Many senior profs refuse to learn the new system.
• If you are the VP in charge of the project, what would you do?
  - Before purchasing a system like Learn, survey the users of the existing system to get a sense of what they like and dislike about the current system.
  - As a group, do an assessment of the current system and the available options to see which best meets their needs.
  - People are more likely to accept a solution if they have had their say and as many as possible of their needs are being met.
CS 330 Spring 2019 302

Topic 5 – Business Processes and Types of Information Systems
Key Concepts
• Business Processes (BP)
• Customer Relationship Management (CRM)
• Supply Chain Management (SCM)
• Accounting Information System (AIS)
• Human Resources Information System (HRIS)
• Transaction Processing Systems (TPS)
• Decision-support Systems (DSS)
• Management Information Systems (MIS)
• Executive Support Systems (ESS)
References
• course text, Chapter 2 How Businesses Use Information
CS 330 Spring 2019 303

Business Processes
An Experiment in Behavioral Economics
• Ask people for help getting a car out of a pothole.
• Three Versions
  1. randomly asked people passing by ⇒ many were happy to help
  2. told people if they helped, they would get $10 ⇒ only a few helped
  3. after helping, he gave the volunteer a gift worth $1 ⇒ all happily accepted the gift and thanked him
• What is going on here?
CS 330 Spring 2019 304

Business Processes
An Experiment in Behavioral Economics
• He concluded:
  - We live in a capital market and a social market. Each has its own rules and value system.
  - Different markets, different rules, different returns, different focus and different value systems.
• Our goal:
  - Understand the role of an IS both in the capital market and in the social market (i.e. as opposed to personal use of an information system).
CS 330 Spring 2019 305

Business Processes
The Essence of a Business
• The basic operation of a business is to convert resources into products and services.
• A business can also be seen as a collection of business processes.
• Business processes are the collection of activities required to produce a product or service (pg 32).
  - i.e. how work is organized, coordinated, and focused
  - e.g. a customer places an order; what happens next?
• Goal: examine business processes with a view to understanding how they might be improved by using information technology to achieve greater efficiency, innovation and customer service (pg 34).
CS 330 Spring 2019 306

Business Processes
Major Business Functions for an Organization
• Each business is a collection of business functions, e.g.
  Function | Purpose
  Manufacturing and Production | Producing and delivering products and services
  Sales and Marketing | Selling the organization’s products and services
  Finance and Accounting | Managing the organization’s financial assets and records
  Human Resources | Attracting, developing and maintaining the organization’s labor force
  (Course text, Table 1-2)
• Each business function is a collection of business processes ...
CS 330 Spring 2019 307

Business Processes
Examples of Business Processes (BP)
Course text, Table 2-1
CS 330 Spring 2019 308

Business Processes
Business Processes and Functions
• A business process might run across several business functions, e.g. order fulfillment
Course text, Figure 2-1
CS 330 Spring 2019 309

Business Processes
How does an Information System (IS) fit in?
Since we are taking a sociotechnical approach, consider
1. Technical aspect (i.e. can we?)
  - First understand how the existing business process works.
  - Which steps of the business process can be automated?
  - Can we modify the existing process to enable more automation?
  - What changes need to be made?
2. Behavioral aspect (i.e. should we?)
  - What is its impact on people and the organization?
  - What is the impact on the organization’s structure and culture?
CS 330 Spring 2019 310

Business Processes
How does an Information System (IS) fit in?
• Once a process is automated, ask what business intelligence (BI) we can obtain.
• Use the information to enhance business processes through (partial or full) automation. Ask…
  - Can we increase the efficiency of existing processes?
  - Can we enable a new product or service that can transform the business?
  - Can we enforce policy and regulation better?
• e.g.
  - Which parts of a hiring process can be automated?
  - Which parts of an order fulfillment process can be automated?
CS 330 Spring 2019 311

Class Exercise: Managing Travel Expenses
A Business Process Example
• Travelling is expensive.
• Processing expense claims adds to the cost.
• On average, it costs $48 US to process one claim.
• What work needs to be done to process a claim?
  1. manager approval
  2. confirm budget
  3. collect receipts
  4. itemize
  5. manager approval
  6. expense clerk approval
  7. notify payroll
  8. get money via direct deposit
CS 330 Spring 2019 312

Class Exercise: Managing Travel Expenses
BP Automation
• What part of the process can be automated?
  - pretty well every step, to a certain extent
• What changes need to be made?
  - Possibly scanning in receipts
  - Use credit card companies that itemize receipts for us
  - Linking up systems
  - Create software
CS 330 Spring 2019 313

Class Exercise: Managing Travel Expenses
Business Intelligence (BI)
Once a process is automated, what BI can we obtain?
1. Collect data
   What kind of business data can you obtain?
2. Extract information to support decision making
   What kind of information can you extract, and what would you like to extract?
3. Derive business intelligence (using information technology to help make better decisions)
   How can you use this information to improve performance?
CS 330 Spring 2019 314

Class Exercise: Managing Travel Expenses
Deriving Business Intelligence (BI)
• After automating the processing of travel claims, you observe the following facts:
  - Lots of business travel to California during the winter
  - Lots of taxi fares to and from Pearson Airport
  - Lots of international calls during business trips
• How would this information help you improve your business?
CS 330 Spring 2019 315

Class Exercise: Managing Travel Expenses
Deriving Information
• Now focus on efficiency or enabling new business processes
  - Can the BP be improved by this technology?
    ▪ Negotiate better rates, e.g. taxis
  - Will this new technology bring a new product/service?
    ▪ Someone to negotiate prices
CS 330 Spring 2019 316

Types of Information Systems
Approach for the Next Few Slides
• For each of these business functions (from 307-308)
  A. Sales and Marketing
  B. Manufacturing and Production
  C. Finance and Accounting
  D. Human Resources
• Ask the following questions
  1) Which business processes can be automated?
  2) What data can be gathered?
  3) What information can help improve business?
CS 330 Spring 2019 317

Types of Information Systems
A. Sales and Marketing Systems
1) Which business processes can be automated?
  - Ordering process, order fulfillment, order inquiry, advertising and promotion, etc.
2) What data can be gathered?
  - Individual orders: who ordered what, when, where, when it was filled, whether there were any issues
  - Customer data: name, contact info, purchases, returns
CS 330 Spring 2019 318

Types of Information Systems
A. Sales and Marketing Systems
3) What information can help improve business?
  - Purchase habits: who likes what, when and where
  - Sales trends: what is popular, when and where
  - Efficiency of fulfillment, return rates
  - Promotion strategy, production schedule, inventory level, etc.
• Terminology: called a Customer Relationship Management (CRM) system
  - Provides: customer support, sales, marketing
CS 330 Spring 2019 319

Types of Information Systems
A. Sales and Marketing Systems
• A point-of-sale system (partially) automates the in-store checkout process.
• It often produces standard sales-related information (sales performance and sales trends) and customer-related information (like customer base).
• Can you think of other useful information it might produce?
  - Say, employee-related information …
CS 330 Spring 2019 320

Types of Information Systems
B. Manufacturing and Production Systems
1) Which business processes can be automated?
  - Making of individual parts, assembling, testing, stocking/shelving, etc.
2) What data can be gathered?
  - What is produced, when, where, how many, whether there are any issues
CS 330 Spring 2019 321

Types of Information Systems
B. Manufacturing and Production Systems
3) What information can help improve business?
  - Efficiency of the production process, defect rate, production/schedule status
  - Suppliers of defective parts, sources of error in the process
• Terminology: called a Supply Chain Management (SCM) system
  - Links with suppliers to ensure materials and parts are available when needed
CS 330 Spring 2019 322

Types of Information Systems
B. Manufacturing and Production Systems
• Based on the data shown in the above systems, can you answer the following questions:
  - If there is something wrong with an item sold to a customer, can you trace where it was sold?
  - Who else bought the same product?
  - Can you link it to a particular shipment?
  - Can you find out which sales representative handled the transaction?
CS 330 Spring 2019 323

Types of Information Systems
C. Finance and Accounting Systems
1) Which business processes can be automated?
  - Everything to do with money
2) What data can be gathered? Just a sample...
  - Accounts receivable (A/R): amounts owed to the company by customers, and payments received
  - Accounts payable (A/P): bills owed by the company
  - Billing: produces invoices for clients/customers
  - Purchase Order: records the company’s orders of inventory
CS 330 Spring 2019 324

Types of Information Systems
C. Finance and Accounting Systems
  - Sales Order: records customers’ orders
  - Cash Book: records collections and payments
3) What information can help improve the business?
  - match purchase order, goods receipt, pay invoice
  - match customer order, invoice, receive payment
  - cash flow, financial status of the firm …
• Terminology: called an Accounting Information System (AIS)
  - Collects, stores and processes accounting information
CS 330 Spring 2019 325

Types of Information Systems
D. Human Resource System
1) Which business processes can be automated?
  - automatic deposits (pay cheque), tax forms, pay slips
2) What data can be gathered?
  - payroll
  - time and attendance
  - performance appraisal
  - benefits administration
  - recruiting
  - Learning Management (CPR, hazardous materials)
CS 330 Spring 2019 326

Types of Information Systems
D. Human Resource System
3) What information can help improve the business?
  - high employee turnover in a certain area
  - high absenteeism
  - cost of overtime vs. hiring more employees
  - trouble filling certain positions
• Terminology: called a Human Resources Management System (HRMS) or a Human Resources Information System (HRIS).
CS 330 Spring 2019 327

Types of Information Systems
The IS Challenge
• Can you name a BP that cannot be automated or has absolutely nothing to do with IT?
  - Can you foresee that it might be automated in the future?
• Can you think of a job that has absolutely nothing to do with IT?
  - Can you foresee that it might be automated in the future?
• This avenue of thinking can lead to new business opportunities!
CS 330 Spring 2019 328

Types of Management
Three Levels of Management
• For the previous dozen slides we looked at different business functions and their IS needs.
• Now we will look at different levels of management and their IS needs.
• Senior Management: concerned with
  - long-range strategic (of great importance) decisions
  - financial success of the company as a whole
• Middle Management: concerned with implementing the plans of senior management
• Operational Management: concerned with monitoring the day-to-day activities of the company
CS 330 Spring 2019 329

Types of Management
Different IS Needs
CS 330 Spring 2019 330

Types of Management
Different IS Needs
Tables from 4th edition of course text.
CS 330 Spring 2019 331

Types of Management
Different IS Needs
• Which level of management would be most interested in the following questions?
  - Is an order filled properly?
  - What percentage of orders are filled properly?
  - What is the status of an order?
  - How many tons of candy should we stock for Halloween?
  - How should we promote our Halloween party package?
  - Should we open a branch in Guelph?
  - Should we be concerned about 5G?
CS 330 Spring 2019 332

IS for Different Management Levels
Different IS Needs
• Different levels of management use different information systems
• Senior Management
  - Executive Support System (ESS)
• Middle Management
  - Management Information System (MIS)
  - Decision Support System (DSS)
• Operational Management
  - Transaction Processing System (TPS)
• Let’s go from the bottom up.
CS 330 Spring 2019 333

Types of Management
Different Levels of Management Have Different Concerns
Course text, 4th ed., Figure 1-6 (levels shown: ESS, DSS, MIS, TPS)
CS 330 Spring 2019 334

IS for Different Management Levels
Transaction Processing System (TPS)
• Automates business processes
• Records routine transactions necessary to conduct day-to-day business
• E.g. process sales order, fulfillment, billing
• Allows frontline workers and managers to monitor the status of operations and relations with the external environment
CS 330 Spring 2019 335

IS for Different Management Levels
Management Information System (MIS)
• Provides routine reports on a department’s current performance to middle management
• Confusingly, the same term is also used to refer to the subject of this whole course
• Based on data from the TPS
• Summarizes TPS data
• Typically has little analytic capability
• E.g. sales and marketing summaries, actual vs. predicted sales of items by region
CS 330 Spring 2019 336

IS for Different Management Levels
Decision Support System (DSS)
• Supports non-routine decision making by middle management
  - Example: What is the impact on the production schedule if December sales double?
• Often uses external information as well as information from the TPS and MIS
• E.g. create a statistical model of how sales relate to other factors, such as vacations abroad and the value of the Canadian dollar, or car sales and the price of gasoline
CS 330 Spring 2019 337

IS for Different Management Levels
Decision Support System (DSS)
• Example in book: a DSS uses information about
  - ship speed and capacity
  - port distance
  - fuel consumption, fuel cost
  - cost to hire crew for that ship
  - expense to dock at port
• to create competitive bids on transporting goods by ship
CS 330 Spring 2019 338

IS for Different Management Levels
Executive Support System (ESS)
• Supports non-routine decisions requiring judgment, evaluation, and insight by senior management
• Specialized version of a DSS
• Graphical displays, friendly user interface
• Ability to drill down to info from the MIS and DSS
• E.g. an ESS that provides a minute-to-minute view of the firm’s financial performance as measured by working capital, accounts receivable, accounts payable, cash flow, and inventory
CS 330 Spring 2019 339

IS for Different Management Levels
System Relationships
Figure from 4th edition of course text
CS 330 Spring 2019 340

Topic 6 – Organizations and IS
Key Concepts
• The behavioural view of organizations
• The impact of IS on organizations
• Two Ways of Creating a Competitive Advantage
  1. Porter’s Competitive Forces Model
  2. The Value Chain Model
References
• course text, Chapter 3.1 – 3.3 Information Systems, Organizations, and Strategy
CS 330 Spring 2019 341

Overview of Organizations
Motivation
1. In a high tech company, other than the senior executives, which position/job pays the most?
  - technical sales: i.e. people who have both technical knowledge and the ability to influence other people
2. What is office politics?
  - the strategies people use to gain advantage in the workplace
3. Has anyone ever observed office politics taking place?
CS 330 Spring 2019 342

Overview of Organizations
Office Politics Examples
1. In a group meeting, your boss tells a joke. You’ve heard it before and don’t think it is funny. Do you laugh at the joke anyway? Why or why not?
2. Your department head is proposing a project which you think is doomed to fail. What do you do?
CS 330 Spring 2019 343

Overview of Organizations
Office Politics Skills
What are some strategies for succeeding in an organization other than (or in addition to) excellent technical skills?
• Give and receive feedback in an effective manner
• Be unconditionally cooperative
• Develop good communication skills
• Develop good interpersonal skills
• Don’t pass on gossip
• Seek advice from knowledgeable people
• Consult with the people who will be affected by a decision you are making
CS 330 Spring 2019 344

Overview of Organizations
Quick Review
• Recall (from slide 299, Topic 4 Management Information Systems) that this course is taking a Sociotechnical Approach, i.e.
  - optimal organizational performance is achieved by jointly optimizing both the social and technical systems used in production (pg 24)
  - both behavioural and technical aspects need to be considered (pg 24)
• Why? Because organizations have both of these components.
CS 330 Spring 2019 345

Overview of Organizations
Technical Microeconomic View
• The technical view: An organization is a stable, formal social structure that uses capital and labour from the environment as input and processes them to produce products and services (course text pg 66).
course text, Figure 3-2
CS 330 Spring 2019 346

Overview of Organizations
Behavioural View
The behavioural view of an organization looks at the structures and processes within the organization.
course text, Figure 3-2 (figure labels: Environmental Resources, Environmental Outputs)
CS 330 Spring 2019 347

Features of Organizations
Behavioural View
In order to introduce an IS into an organization you would have to take the following into account...
• Routines and Business Processes: organizations become very efficient over time because they develop routines (or standard operating procedures) to deal with (almost all) situations
• Organizational Politics: people with different positions and backgrounds will have different points of view and will struggle for limited company resources.
  - Many will resist change they do not agree with.
CS 330 Spring 2019 348

Features of Organizations
Behavioural View
• Organizational Culture: the unquestioned assumptions that organizations make about their goals and products.
  - Anything that challenges these assumptions will be met with resistance.
• Organizational Environment: government (i.e. regulations), competitors, customers, financial institutions, culture, technology, knowledge.
  - IS can help identify changes that the company should respond to.
CS 330 Spring 2019 349

Features of Organizations
Behavioural View
• Organizational Structure: different organizational structures would have different ISs, e.g. an entrepreneurial structure (simple flat structure) might have a single IS whereas a professional bureaucracy (many independent departments, such as UW) may have several independent systems.
• Other Organizational Features
  - democratic vs. authoritarian leadership
  - benefit stock holders (for profit) vs. benefit society (nonprofit)
CS 330 Spring 2019 350

Impact of IS on Organizations
IS Reduces the Cost of Information
• IS helps reduce transaction costs
  - i.e. the costs associated with an organization buying a product or service
  - e.g. the cost of communicating with suppliers, obtaining information about products, monitoring contract compliance
• IS helps reduce agency costs
  - i.e. the costs associated with managing agents (employees) so that they will act in the interests of the company rather than in their own self-interest
CS 330 Spring 2019 351

Impact of IS on Organizations
IS Reduces the Cost of Information
• IT flattens organizations
  - managers become more efficient ⇒ fewer of them are needed
  - lower levels have easier access to relevant information
• IT innovations cause resistance because they affect
  - the organizational structure
  - the job tasks
  - the people
  - the information technology
• The most common reason for IT innovation failure is the organization’s resistance to change
CS 330 Spring 2019 352

Competitive Advantage
Recall Case Study: IT in Walmart
• Walmart is the leader in retail sales, largely because it is also among the leaders in utilizing information technology.
• They have a competitive advantage, i.e. they use commonly available resources more efficiently.
• How can a company create a competitive advantage?
Answer: we will consider two models ...
1. Porter’s Competitive Forces Model
2. The Business Value Chain Model
CS 330 Spring 2019 353

Competitive Advantage
#1: Porter’s Competitive Forces Model
The strategies a firm uses are determined by five factors.
course text, Figure 3-8
CS 330 Spring 2019 354

Competitive Advantage
#1: Porter’s Competitive Forces Model
• Traditional competitors try to attract your customers.
• New market entrants are more likely when the cost of entry is low.
• If your prices get too high, customers may seek substitute products.
• The power of customers increases if they can easily switch to a competitor’s products, if prices are transparent, and if products are undifferentiated.
• The more suppliers the company has for an item, the more control it has over prices.
CS 330 Spring 2019 355

Competitive Advantage
#1: Four Basic Competitive Strategies
Use information systems to...
• decrease costs (e.g. Walmart) or increase quality (e.g. smart phones in general)
• differentiate products and enable new products and services (e.g. Apple)
• focus on a market niche, i.e. specialize (e.g. high end hotels)
• develop strong ties with suppliers (e.g. Chrysler) or customers (e.g. Amazon, Chapters)
CS 330 Spring 2019 356

Competitive Advantage
Case Study: UWaterloo
• Which of the competitive forces in Porter’s Model is the biggest threat to UWaterloo? Why?
• Recently, Maclean’s ranked UW on top in Best Overall, Most Innovative, and Leaders of Tomorrow.
• However, UW received only an average rating for students’ experience with their education. Based on that, which of the competitive forces in Porter’s Model is the biggest threat to UWaterloo?
• How does MIS help? Which IS strategy would you recommend?
CS 330 Spring 2019 357

Competitive Advantage
#2: The Business Value Chain Model
Course text, Figure 3-9
CS 330 Spring 2019 358

Competitive Advantage
#2: The Business Value Chain Model
• Identifies where information systems are particularly helpful in creating a competitive advantage
• Two broad areas to consider
  - primary activities: directly related to creating the product or service
  - support activities: make the primary activities possible
CS 330 Spring 2019 359

Competitive Advantage
#2: The Business Value Chain Model
• Information systems for Primary Activities (directly related to creating the product or service) include
  - automated warehouse systems
  - computer controlled manufacturing
  - computerized ordering systems
  - equipment maintenance systems
  - automated shipping
CS 330 Spring 2019 360

Competitive Advantage
#2: The Business Value Chain Model
• Information systems for Support Activities (which make the primary activities possible) include
  - electronic scheduling and messaging systems
  - workforce planning systems
  - computer-aided design (CAD) systems
  - computerized ordering systems
CS 330 Spring 2019 361

Topic 7 – Social, Ethical, and Legal Issues
Key Concepts
• The social, ethical and legal issues raised by information technology.
• Ethical principles that may help us make decisions to deal with these issues.
References
• course text, Chapter 4 Social, Ethical, and Legal Issues in the Digital Firm
CS 330 Spring 2019 362

Moral Dimensions of Information Age
Technology Trends
• Computing Power Increases ⇒ more dependence on computers
• Storage Costs Decreasing ⇒ cheaper to store information about individuals
• Big Data Techniques ⇒ can develop (mostly accurate) profiles of individuals
• Growth of Internet ⇒ easy to access and copy personal data
• Growth of Mobile Phone Usage ⇒ location may be tracked without user knowledge or consent
  - e.g. turn off location and you can still be tracked
CS 330 Spring 2019 363

Moral Dimensions of Information Age
Implications
The rise of computers and the Internet has raised five areas of ethical, social and political concern
1. Personal information rights and obligations
   e.g. what rights do we have to protect ourselves from others tracking our personal information?
2. Digital property rights and obligations
   e.g. music/video/software piracy
3. Data and system quality
   e.g. is the data about me correct and secure?
CS 330 Spring 2019 364

Moral Dimensions of Information Age
Implications
4. Accountability, liability and control
   e.g. who is held accountable for any harm done when customers’ data is stolen?
5. Quality of life
   e.g. maintaining boundaries between work and home life
Example: Privacy and Social Networks, produced by the Office of the Privacy Commissioner of Canada
https://www.youtube.com/watch?v=X7gWEgHeXcA
CS 330 Spring 2019 365

Cautionary Tales
Loss of Control
• For information posted on the web or sent through email, you generally have no control over ...
  - how it is used: prank, ridicule, spam, identity theft
  - how it is interpreted: humorous vs. insulting, intentional vs. accidental
• Often websites are able to use the text, pictures, or videos you post for whatever purpose they want.
CS 330 Spring 2019 366

Moral Dimensions of Information Age
Ethical, Social, Legal Aspects
• Ethical: principles of right and wrong that individuals use to make choices to guide their behaviors
• Social: affecting people and communication, i.e. etiquette, expectations, social responsibility (acting for the benefit of society), changing social institutions (family, education, organizations)
• Legal/Political: knowing the law and working within its limits, i.e. changing old laws, creating new laws, and understanding existing laws
CS 330 Spring 2019 367

Ethics
Key Concepts
• Responsibility: accepting the potential costs, duties, and obligations for decisions
• Accountability: providing mechanisms to identify who is responsible
• Liability: laws exist that permit individuals to recover damages done to them
• Due process: laws are well known and understood, and one can appeal to a higher authority to ensure that the laws are applied correctly
CS 330 Spring 2019 368

Cautionary Tales
Use of Company Computers
• Generally companies own the information on their computers, tablets and cell phones.
• In the past, UW has allowed police access to the email of UW students caught running a meth lab.
• Police have obtained IP addresses of company computers used to “anonymously” create malicious posts about someone else. They then approached the company and asked who used that computer.
• Exception: if the company allows you to use a company laptop for personal use.
CS 330 Spring 2019 369

Ethics
Ethical Principles
• Golden Rule: Do unto others as you would have them do unto you.
• Kant’s Categorical Imperative: If an action is not right for everyone to take, then it is not right for anyone.
• Descartes’ Rule of Change: If an action cannot be taken repeatedly, then it is not right to be taken at any time (e.g. using $1 worth of office supplies for personal use).
• Utilitarian Principle: Take the action that achieves the higher or greatest value for all concerned.
• Risk Aversion Principle: Take the action that produces the least harm or incurs the least cost to all concerned.
CS 330 Spring 2019 370

Ethics
An Ethical Decision
To what extent should companies monitor their employees at work?
Monitor everything? Monitor nothing? Is there middle ground?
CS 330 Spring 2019 371

Concern 1: Personal Information
Privacy
• Privacy is the claim of individuals to be left alone, free from surveillance or interference from other individuals, organizations, or the state.
• In Canada we have the Personal Information Protection and Electronic Documents Act (PIPEDA).
• It establishes principles for the collection, use, and disclosure of personal information.
• Organizations need informed consent to collect and use customer data.
• Our law is stricter than that of the US but less strict than that of Europe.
CS 330 Spring 2019 372

Concern 1: Personal Information
What is it?
According to PIPEDA, personal information (PI) includes
• demographics: age, income, ethnic origin, religion, marital status
• internet: e-mail, e-mail address, IP address
• physical: age, height, weight, medical records, blood type, fingerprints
• financial: purchases, spending habits, banking information, credit/debit card data, loan or credit reports, tax returns, Social Insurance Number
source: https://www.priv.gc.ca/information/pub/guide_ind_e.asp
CS 330 Spring 2019 373

Concern 1: Personal Information
How is your PI Protected?
PIPEDA’s Principles for the Treatment of PI
• Accountability: appoint someone to be responsible
• Consent: inform you of the purpose of collecting that info
• Limiting use: only use it for purposes you consent to
• Safeguards: your PI must be protected
• Individual access: you have the right to access your PI
source: https://www.priv.gc.ca/information/pub/guide_ind_e.asp
CS 330 Spring 2019 374

Concern 1: Personal Information
How is your PI Protected?
PIPEDA’s Principles for the Treatment of PI • Identifying purposes: the reason for collecting your PI must be identified • Limiting collection: only gather information that is necessary • Accuracy: should keep your info accurate • Openness: privacy policy should be easy to find and understand • Recourse: you should be provided with a complaint procedure source: https://www.priv.gc.ca/information/pub/guide_ind_e.asp CS 330 Spring 2019 375 Concern 1: Personal Information Concerns • Terms of service are often all-or-nothing, if you use the website or app you must agree to give up your privacy. • Often companies will provide your PI to “affiliates” or “trusted partners” ⇒ Who are they? • Often companies say they keep information needed for business purposes ⇒ What PI and what purposes? • Often companies can keep your PI for as long as they want i.e. your PI has dual ownership e.g. Facebook. https://www.youtube.com/watch?v=Gb29Rcycjv0 CS 330 Spring 2019 376 Concern 1: Personal Information Internet Challenges to Privacy • cookies - a website stores a unique bit of data (like an account number) on your device - think of the cookie as a primary key identifying you in their database - use this data to track your activity on the site • third party cookies - companies like Facebook, Google, Amazon, track your activity across many websites, not just their own - even if you do not have a Facebook account, Facebook tracks you - use this technology to get a more complete picture of you CS 330 Spring 2019 377 Concern 1: Personal Information Internet Challenges to Privacy – Cookies source: course text, Figure 4-3 CS 330 Spring 2019 378 Concern 1: Personal Information Internet Challenges to Privacy • web beacons – websites can tell that you’ve viewed a certain item, say an ad in your email - typically a small picture the same colour as the background (so you don’t see it) - could be on a website or in an email • spyware – software that tracks where you have surfed, typically spotted 
by virus protection programs • each smartphone has a unique International Mobile (Station) Equipment Identity (IMEI) associated with it (try dialing *#06#) that tracks that device and can be used to blacklist a phone in case of theft CS 330 Spring 2019 379 Concern 1: Personal Information Internet Challenges to Privacy • browser fingerprinting - Each computer/cell phone has many ▪ settings (e.g. has “do not track” activated) ▪ hardware specs (e.g. screen dimensions) - The combination of these properties that browsers can report makes each cell phone/laptop rare (or unique), e.g. https://panopticlick.eff.org/ https://amiunique.org/ - This rareness provides a way for companies to track you even if many other tracking methods have been blocked. - Currently Firefox has the best support to limit this approach. CS 330 Spring 2019 380 Concern 1: Personal Information What Information is Collected • Your posts, posts you started but then deleted • Sites visited, posts read, videos viewed • Searches, location, relationships • Items you have purchased, items you have looked at • Images from computer camera • Sounds overheard by personal assistant (e.g. Alexa) • Health, medical data and financial data • E.g. Facebook and Google https://gizmodo.com/all-the-ways-facebook-tracks-you-that-you-might-notkno-1795604150 https://www.nytimes.com/interactive/2019/07/10/opinion/google-privacypolicy.html CS 330 Spring 2019 381 Concern 1: Personal Information How is that Information Used? • Advertising products and services you may be interested in • Tailoring the content that you see - suggesting articles / videos you may be interested in - limitation: the echo-chamber effect • Hiring decisions • Insurance coverage/premiums • Preferential offers/pricing/etc. • Identifying security risks • Solving crimes • Combined with other information … CS 330 Spring 2019 382 Concern 1: Personal Information How is that Information Used?
NORA: nonobvious relationship awareness NORA combines info from various sources (telephone listings, lists of customers) to create a more detailed profile of each person. E.g. https://www.youtube.com/watch?v=V7M_FOhXXKM CS 330 Spring 2019 383 Concern 1: Personal Information Strategies Chris is concerned about privacy so deletes the browser’s cookies once a week. What are the limitations of this strategy? • If at some point Chris had logged into website XYZ with the previous cookies and then logs in again to the same account with the new cookies... then website XYZ can match up (i.e. associate) both sets of cookies to the same person and continue to accumulate information about Chris. • The website can also use fingerprinting to tentatively associate the old and the new version of the cookie. CS 330 Spring 2019 384 Concern 1: Personal Information Strategies Is there a better strategy that is not too much of a hassle? • Use two browsers, one for day-to-day access (of accounts you have to sign in to) and one for more private access. • Delete any cookies from the one you use for private access quite frequently and don’t log into any accounts with it. • Consider using a computer in a lab or library for certain types of searches. - Do not log into any of your regular websites here. - Generally, identity can be tracked to the organization but not to you personally, but the police could still discover your identity by contacting the university / library. CS 330 Spring 2019 385 Concern 1: Personal Information Viewing your Cookies in Chrome • While browsing, click on the lock (on the LHS of the address bar) to view the cookies that site is using. • To see all your cookies go to chrome://settings/siteData - Click on the triangle to see the cookies for each site. - Click on the chevron (i.e. ‘v’) to view a particular cookie. Viewing your Cookies in Firefox • While browsing, click on the ‘i’ (on the LHS of the address bar) to view the cookies that site is using.
• To see all your cookies, go to about:preferences#privacy • In the Cookies and Site Data section select Manage Data… CS 330 Spring 2019 386 Concern 2: Digital Property Rights What is IP • Intellectual property (IP) is intangible property (a recipe, a song, an invention, software) created by individuals or corporations • Depending on what it is, it can be protected by one of the following legal traditions: a) Trade secret b) Copyright c) Patent CS 330 Spring 2019 387 Concern 2: Digital Property Rights Trade Secret • A trade secret is intellectual work or product belonging to a business, provided it is not in the public domain, that confers economic advantage, and reasonable attempts have been made to keep it secret. - e.g. recipe for Coke or the layout of a chemical plant • the risk is that there is a breach of confidentiality - e.g. publishing the recipe for Coke • most End User License Agreements (EULAs) prohibit the reverse engineering of a computer program CS 330 Spring 2019 388 Concern 2: Digital Property Rights Copyright • A copyright protects original literary, musical, artistic, dramatic works and computer software. • Prohibits copying of entire work or parts for at least 50 years • Copyrighting the look and feel of a device is still a murky issue - Apple v. Microsoft (1994): look and feel of Mac OS vs. MS Windows 2.0 - Apple v. 
Samsung (2011): look and feel of smart phones and tablets CS 330 Spring 2019 389 Concern 2: Digital Property Rights Patent • A patent grants the owner an exclusive monopoly on the ideas behind an invention for between 17 and 20 years • intended to promote innovation by protecting investments made to commercialize inventions • originality, novelty, and invention are key concepts • can offer protection in all 160 countries that are members of the World Trade Organization (WTO) • cannot patent software in Canada, can in the US CS 330 Spring 2019 390 Concern 2: Digital Property Rights Challenges to IP Rights • the internet has made it easy to copy and distribute intellectual property • perfect digital copies cost almost nothing • sharing of digital content over the Internet costs almost nothing • a web page may present data from many sources • sites and software for file sharing are hard to regulate CS 330 Spring 2019 391 Concern 2: Digital Property Rights Canada’s Response • the Copyright Modernization Act (2011) • cannot circumvent digital locks • time shifting, format shifting, and backup copies are OK as long as there are no digital locks • fair use provisions for education, satire, parodies • damages for non-commercial infringement (i.e.
illegally downloading music and videos) limited to between $100 and $5000 CS 330 Spring 2019 392 Concern 2: Digital Property Rights Canada’s Response • includes a notice-and-notice provision (in effect as of Jan 1, 2015) • copyright holders notify ISP about infringement • ISP notifies customer (without revealing customer’s identity to copyright holder) • copyright holder still has to get a court order for an ISP to reveal a customer’s identity http://www.theglobeandmail.com/technology/digital-culture/canadiandownloaders-should-expect-a-copyright-notice-in-the-mail/article22336673/ CS 330 Spring 2019 393 Concern 3: Data Quality and System Errors The Issue • No large program is error-free: errors exist with a low probability • it is impossible to test every combination of inputs • software producers knowingly ship products with bugs • the number of bugs can reach a steady state: in the process of fixing existing bugs, new bugs are created • the largest source of error is poor data quality rather than faulty hardware or software CS 330 Spring 2019 394 Concern 3: Data Quality and System Errors Example: Design Flaw to Cost Intel $1 Billion • In 2011, Intel temporarily halted shipments of a new chip platform due to a design flaw that may cause 5% of chips to fail over the next three to five years. • It's estimated the move will cost Intel $1 billion. • Costs include having to fix nearly half a million desktops and laptops already out there.
source: New York Times http://www.nytimes.com/2011/02/01/technology/01chip.html CS 330 Spring 2019 395 Concern 4: Accountability and Liability Software Company’s Liability • software is typically licensed not sold • most End User License Agreements (EULAs) limit liability • in law, publishers of books and magazines are not legally liable for their content, to allow for freedom of expression • when software acts more like a book (an information provider) the producer is not liable • when software acts more like a machine controller (a service provider) the producer can be held liable CS 330 Spring 2019 396 Concern 5. Quality of Life IS Have Negative Social Costs • Blurring work-home boundaries employees are expected to do more work at home with company laptops and cell phones • Centralized control structure companies such as Google, Facebook, Amazon and Microsoft dominate the collection of personal information • Rapidity of change because of globalization, companies must respond very quickly to any changes in the environment CS 330 Spring 2019 397 Concern 5. Quality of Life IS Have Negative Social Costs • Dependency on IS many companies are vulnerable to any failure in their IS, yet these systems are not regulated • Cybercrime whole new areas of crime have opened up and institutions have been slow to respond: e.g. 
malware infection, phishing fraud, hardware theft, attacks by botnets • Job Loss • Repetitive Stress Injury / Carpal Tunnel Syndrome CS 330 Spring 2019 398 Topic 8 – Security Key Concepts • Secure Communication • The Problem • Common Malware • Computer Security • Tools for Protecting IS • Wireless Security • Securing Your System • Security and Control Framework References • course text, Chapter 8, Securing Information Systems CS 330 Spring 2019 399 Secure Communication Basic Idea • encryption: render a message unreadable so anyone seeing it will not be able to determine the original message • decryption: retrieve the original message • The strength of an encryption depends on the number of possible keys ⇒ it takes longer to try all possible keys • e.g. pick a key of length one, add 3 to each letter
plain text:  meet me after the toga party
key:         3333 33 33333 333 3333 33333
cypher text: phhw ph diwhu wkh wrjd sduwb
• ‘e’ and ‘t’ are common in the plain text, ‘h’ and ‘w’ are common in the cypher text. CS 330 Spring 2019 400 Secure Communication Basic Idea • The number of possible keys is a function of the length of the key, e.g. a longer key means more possible key values. • E.g. pick a key of length five, say 3, 6, 5, 2, 4. • Add 3, 6, 5, 2, 4 respectively to each sequence of five letters
plain text:  meet me after the toga party
key:         3652 43 65243 652 4365 24365
cypher text: pkjv qh gkviu zmg xrmf reuzd
• The longer key makes it harder to use statistics to find out which letters correspond to ‘e’ or ‘t’. • Called symmetric key encryption: the same key is used to encrypt and decrypt the message.
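The two shift-cipher examples above can be reproduced with a few lines of Python. This is a minimal sketch, not anything from the course text: the function names `shift_encrypt` and `shift_decrypt` are made up for illustration, and it assumes lowercase letters only, with spaces passed through without using up a key digit (as in the slides).

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase


def shift_encrypt(plain_text, key):
    """Add the key values, in rotation, to each letter (wrapping z -> a)."""
    shifts = cycle(key)
    out = []
    for ch in plain_text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + next(shifts)) % 26])
        else:
            out.append(ch)  # spaces and punctuation are left unchanged
    return "".join(out)


def shift_decrypt(cypher_text, key):
    """Symmetric key: the same key, subtracted, recovers the message."""
    return shift_encrypt(cypher_text, [-k for k in key])


print(shift_encrypt("meet me after the toga party", [3]))
# -> phhw ph diwhu wkh wrjd sduwb
print(shift_encrypt("meet me after the toga party", [3, 6, 5, 2, 4]))
print(shift_decrypt("phhw ph diwhu wkh wrjd sduwb", [3]))
```

With the length-five key the last word comes out as `reuzd`, since r + 3 = u. A key of length one has only 26 possible values; a key of length five has 26^5 (about 11.9 million), which is the sense in which a longer key means more possible keys.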
CS 330 Spring 2019 401 Secure Communication Brute Force Search • To use brute force search means to try every possible key to find the actual key • the difficulty grows exponentially with key size
Key Size (bits)   Number of Possible Keys   Time required at 10^12 attempts/sec
32                2^32 = 4.3 x 10^9         2.15 milliseconds
56                2^56 = 7.2 x 10^16        10 hours
128               2^128 = 3.4 x 10^38       5.4 x 10^18 years
168               2^168 = 3.7 x 10^50       5.9 x 10^30 years
CS 330 Spring 2019 402 Secure Communication Computationally Secure • computationally secure: an encryption method is computationally secure if it will take the attacker a very long time to crack the message using the best existing technology • what is secure today may not be secure years from now - implication of Moore’s Law - novel methods, e.g. quantum computation • for an example of a method that is computationally secure consider secure hashing… CS 330 Spring 2019 403 Secure Communication Secure Hashing Example: SHA256 • A hash function is a computer function that maps input of any size onto an output of a fixed size. • Secure Hash Algorithms (SHA) are a family of hashing functions. • SHA256 maps any message to a 32 byte (256 bit) number - i.e. there are 2^256 ≈ 1.16 x 10^77 different output values • Change the input even slightly and the hash value (i.e. the output) changes considerably. • Given a value x, it is computationally hard to come up with a message m such that SHA256(m) = x (typically you’d use brute force) • For an online SHA256 calculator see https://www.tools4noobs.com/online_php_functions/sha256/ CS 330 Spring 2019 404 Secure Communication Key Distribution Problem • Recall that in symmetric key encryption both parties must know the key. • How do both parties get the symmetric key when you want to buy something from a web site for the first time? • How do you and the web site agree on a key? • This challenge is called the key distribution problem • The solution that is currently used is called public key encryption...
CS 330 Spring 2019 405 Secure Communication Public Key Encryption • Idea: use a pair of keys: a public key and a private key • The two keys are mathematically related so that when you encrypt with either one, the only way to decrypt (other than brute force) is using the other one. • It is generally used to exchange a shared key or a digital signature, rather than a whole message. • For an example of how encryption and decryption are done see https://www.cemc.uwaterloo.ca/resources/real-world/RSA.pdf CS 330 Spring 2019 406 Secure Communication Digital Signature • Goal: to show that the message came from the sender rather than an imposter (it is authentic) and has not been tampered with (it has data integrity). • A digital signature uses a hash function to convert the message m into a number. Call the hash function, h( ). • E.g. for the message “buy apple stock”, associate a number with each letter (a = 1, b = 2, ..., z = 26) and add them up (mod 1000).
m    = b u y   a p p l e   s t o c k
h(m) = 2+21+25 + 1+16+16+12+5 + 19+20+15+3+11 = 166
CS 330 Spring 2019 407 Secure Communication Digital Signature • The sender and receiver agree on a hash function, e.g. SHA256 • If the sender wants to send message ms - calculate the hash function of the message, h(ms) - encrypt h(ms) with the sender’s private key encrypt(h(ms)) - send ms and encrypt(h(ms)) • Receiver - receives mr and calculates the hash function of mr, i.e. h(mr) - decrypts encrypt(h(ms)) using the sender’s public key and checks to see if it equals h(mr) - when encrypt(h(ms)) is decrypted using the sender’s public key, if it equals h(mr) then it means that mr came from the sender and has not been altered CS 330 Spring 2019 408 Secure Communication Digital Signature • the message is not secret • these steps only guarantee that the message - came from the sender and - has not been tampered with • only the sender could encrypt h(ms) with the sender’s private key • anyone can decrypt it with the sender’s public key and verify that it did come from sender.
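The times in the brute force table can be checked with a short calculation. This is a sketch under one assumption that the slide does not state: the listed times match searching half of the key space, i.e. the average case, at 10^12 attempts per second. The function name `average_crack_seconds` is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
ATTEMPTS_PER_SECOND = 10**12  # rate used in the table


def average_crack_seconds(key_bits, rate=ATTEMPTS_PER_SECOND):
    """Seconds needed to try half of the 2**key_bits possible keys."""
    return (2**key_bits / 2) / rate


for bits in (32, 56, 128, 168):
    secs = average_crack_seconds(bits)
    print(f"{bits:>3} bits: {2**bits:.1e} keys, "
          f"{secs:.3g} seconds = {secs / SECONDS_PER_YEAR:.3g} years")
```

Each extra bit doubles the search time, which is why going from 56 to 128 bits moves the answer from hours to far longer than the age of the universe.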
• But how do you find out the sender’s public key in a reliable way ? ⇒ need a certificate authority CS 330 Spring 2019 409 Secure Communication Digital Signature image source: http://en.wikipedia.org/wiki/Digital_signature CS 330 Spring 2019 410 Secure Communication Certificates • The certificate has the digital signature of a known Certificate Authority (CA). • These are a small number of trusted organizations. • A list of them and their public keys are included with a browser. • The browser can verify - the legitimacy of the digital signature, - hence the legitimacy of the certificate, - hence the public key of the certificate holder. • https is based on using CA’s and certificates ... CS 330 Spring 2019 411 Secure Communication Secure Browsing – Part 1 • Say Pat wants to buy a book from Amazon for the first time. • When Amazon first started, it created a pair of keys, one public and one private. • It submitted the public key to a Certificate Authority (CA), say DigiCert, to get a certificate. • The CA verifies that this is the public key of Amazon offline (e.g. through the mail). • Once verified, the CA then creates a certificate for Amazon (digitally signed by the CA). • The certificate contains information about Amazon and its public key. CS 330 Spring 2019 412 Secure Communication Secure Browsing – Part 2 • When Pat signs up for an account, Amazon presents its certificate to Pat’s browser. • The process to verify the certificate is done by the browser. • The browser verifies that the certificate has been signed by a recognized CA (checked using that CA’s public key). • If the certificate is valid then its contents (which includes the public key of Amazon) are also valid. • The browser then extracts Amazon’s public key from the certificate and can now send Amazon an encrypted message that only Amazon can decrypt. 
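The check the browser performs on a certificate is the sign-and-verify procedure from the Digital Signature slides: hash the content, then compare it with the signature decrypted using the signer's public key. A minimal sketch follows. The RSA key pair here is a tiny textbook-sized one chosen purely for illustration (an assumption, not a real CA key, and far too small to be secure), and the SHA256 value is reduced modulo n only so that it fits this toy key.

```python
import hashlib

# Toy RSA key pair, assumed for illustration only.
N = 61 * 53          # modulus n = 3233
PUBLIC_EXP = 17      # public key is (17, 3233)
PRIVATE_EXP = 2753   # private key; 17 * 2753 leaves remainder 1 mod 3120


def h(message):
    """Hash the message with SHA256, shrunk to fit the toy modulus."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def sign(message):
    """Sender: encrypt h(m) with the private key."""
    return pow(h(message), PRIVATE_EXP, N)


def verify(message, signature):
    """Receiver: decrypt with the public key and compare with h(m)."""
    return pow(signature, PUBLIC_EXP, N) == h(message)


signature = sign("buy apple stock")
print(verify("buy apple stock", signature))   # True
print(verify("buy pear stock", signature))    # almost certainly False
```

With a modulus this small, a tampered message could by chance hash to the same value (roughly 1 time in 3233), which is one reason real keys are thousands of bits long.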
CS 330 Spring 2019 413 Secure Communication Secure Browsing – Part 3 • The browser then randomly generates a symmetric key and encrypts it using Amazon’s public key and sends it back to Amazon. • Since it is encrypted with Amazon’s public key, it can only be decrypted by Amazon’s private key. • Amazon decrypts the key. • Now Pat and Amazon share a symmetric key and all subsequent conversation can be encrypted using this key. CS 330 Spring 2019 414 Secure Communication Secure Browsing – Summary DigiCert Inc Amazon Pat Image source: course text, Figure 8-7 CS 330 Spring 2019 415 The Problem One Source of Problems - People • People are careless and make mistakes • People can be tricked (recall social engineering, slide 36-37) into divulging confidential information • E.g. IT professionals are discouraged from having LinkedIn accounts. Why? - If Chris’s LinkedIn profile says he works in the IT Dept of XYZ Inc., then hackers will send e-mails to employees of XYZ pretending to be Chris asking them to click a link, download a file, or reveal some confidential information CS 330 Spring 2019 416 The Problem Source of Data Breaches (in 2011) – Part 1
Stolen laptop                             7%
Fraud or scam                            10%
Document found in trash or unattended     7%
Stolen computer                           6%
Snail mail exposed or intercepted         5%
Stolen document                           3%
Lost media found                          3%
Lost document found                       3%
Lost computer drive found                 2%
Stolen computer drive                     2%
source: http://www.scientificamerican.com/article/data-breach-howthieves-steal-your-identity-and-information/ CS 330 Spring 2019 417 The Problem Another Source of Problems: Bugs • Any complex piece of hardware or software contains bugs • A computer processor (billions of transistors) or an operating system (100 million lines of code) is very complex • For even a moderately complex enterprise system there are many points of vulnerability ... CS 330 Spring 2019 418 The Problem Another Source of Problems: Bugs Some possible points include ...
source: course text, Figure 8-1 CS 330 Spring 2019 419 The Problem Source of Data Breaches (in 2011) – Part 2
Email exposed or intercepted    4%
Virus                           2%
Hacked computer or server      16%
Scraped from the Web           12%
source: http://www.scientificamerican.com/article/data-breach-howthieves-steal-your-identity-and-information/ • note: web scraping is when a computer program rather than a person surfs the web • Sometimes companies are pressured to create backdoors (secret ways) for governments to access private data CS 330 Spring 2019 420 The Problem Another Source of Problems • In 2013 Edward Snowden revealed that the NSA could breach many security protocols • These included - Encrypted chat - Encrypted VoIP (Voice over IP) - VPN (Virtual Private Network) - SSH (Secure Shell) - HTTPS (Hypertext Transfer Protocol using SSL, where SSL means Secure Sockets Layer): developed by the predecessor of Firefox to implement secure browsing CS 330 Spring 2019 421 The Problem How big is the problem? • In June 2014 McAfee estimated that the global cost of cybercrime was between $375 billion and $500 billion per year.
Activity       Cost as a % of GDP
Car Crashes    1.0%
Narcotics      0.9%
Cybercrime     0.8%
source: http://www.mcafee.com/ca/resources/reports/rp-economicimpact-cybercrime2.pdf CS 330 Spring 2019 422 Classes of Threats Common Malware Malware: malicious software, i.e. software designed to cause damage to or loss of control of a computer or a computer network. We will look at nine common types of malware. • Computer virus: software that attaches to other programs or data in order to be executed, - copy itself from file to file - can harm data, programs, machines, the network or open a backdoor to hackers CS 330 Spring 2019 423 Classes of Threats Common Malware • Worm: similar to a virus but runs on its own (i.e. does not need to attach to other programs) - can cause the same damage as a virus - uses a computer network to spread - e.g.
many computers come with a default password; a worm might try to remotely log on to other computers using the default names and passwords for a variety of operating systems • Trojan horse: a software program that appears to be benign, but then does something unexpected behind the scenes - the user has to launch it - it cannot replicate on its own CS 330 Spring 2019 424 Classes of Threats Common Malware • Trojan horse: (continued) - can cause the same damage as a virus - e.g. an Android app that supplies weather reports could also allow a hacker to download any files on that phone • Phishing: an email or text message that 1. pretends to come from a trusted authority 2. asks for confidential information e.g. please log into your account to verify some information CS 330 Spring 2019 425 Classes of Threats Common Malware source: http://en.wikipedia.org/wiki/Phishing CS 330 Spring 2019 426 Classes of Threats Common Malware • Denial of Service Attack - many computers overwhelm a website requesting service in an attempt to block others from using the website - no data is lost, only potential business is lost • Sniffing: eavesdropping on network communication in order to obtain proprietary information, e.g. email, confidential reports, company files, etc. • Spam: junk email (usually sent in bulk), less of it now - there are laws against spam - Gmail, Hotmail, Outlook have excellent spam filters CS 330 Spring 2019 427 Classes of Threats Common Malware • Botnet: a collection of computers (usually ones that have been compromised) that are used together for a common purpose (i.e. a robot network) such as a denial of service attack.
- The largest botnet that has been found and removed so far controlled over 12 million computers - it has been estimated that as much as 10% of computers around the world may be part of one or another botnet • Ransomware: software that threatens to publish the victim’s files or prevents the victim from accessing their files unless a ransom is paid (usually in Bitcoin so the victim cannot trace the person they paid). CS 330 Spring 2019 428 Computer Security Definition • Computer security is the policies, procedures and technical measures used to prevent unauthorized access, alteration, theft, interruption or physical damage to information systems • There is more to computer security than just password protection and encryption ... CS 330 Spring 2019 429 Computer Security What Services are Needed? A customer wants to order an item online. What might be some concerns? The customer ... • is who he says he is (i.e. authentication) • can only access certain parts of the system (i.e. access control) • cannot view another customer’s order (i.e. data confidentiality) • cannot modify another customer’s data (i.e. data integrity) CS 330 Spring 2019 430 Computer Security What Services are Needed? A customer wants to order an item online. What might be some concerns? The customer ... • can place an order if so desired (i.e. availability) • keeps his word after placing the order (i.e. non-repudiation).
There are two types of repudiation 1) the sender denies sending the data 2) the receiver denies receiving the data CS 330 Spring 2019 431 Computer Security Six Security Service Definitions • Authentication: assurance that the communicating entity is the one claimed • Access Control: prevention of the unauthorized use of a resource • Data Confidentiality: protection of data from unauthorized disclosure • Data Integrity: assurance that data received is as sent by an authorized entity • Non-Repudiation: protection against denial by one of the parties in a communication • Availability: assurance that services are available when needed CS 330 Spring 2019 432 Computer Security Which Service? Captain Jack Sparrow redecorates the Black Pearl and wants to open it up to the public with these prices - for $10, a tourist can visit the 1st deck, - for $20, a tourist can visit the whole ship. Question: Which of the following security services can be implemented to enforce these rules? Authentication, Access Control, Data Confidentiality, Data Integrity, Availability, Non-repudiation Answer: access control CS 330 Spring 2019 433 Computer Security Which Service? Captain Jack Sparrow decides to auction off the Black Pearl on eBay but is not sure if the website that he logs into is in fact eBay. Question: Which of the following security services could be implemented to ease his anxiety? Authentication, Access Control, Data Confidentiality, Data Integrity, Availability, Non-repudiation Answer: Authentication CS 330 Spring 2019 434 Computer Security Which Service? Captain Jack Sparrow receives an email from the Smurfs offering $10 million to buy the Black Pearl. Jack thinks this is a sweet deal but he is afraid that the Smurfs might back down later on. Question: What security service can be used to prevent the Smurfs from denying they sent the e-mail?
Authentication, Access Control, Data Confidentiality, Data Integrity, Availability, Non-repudiation Answer: Non-repudiation CS 330 Spring 2019 435 Computer Security Which Service? Captain Jack Sparrow wants to make an announcement that he sold his ship and officially retires from piracy. Question: What security service (or services) can be used to assure the public that the message is genuine? Authentication, Access Control, Data Confidentiality, Data Integrity, Availability, Non-repudiation Answer: Authentication and Data Integrity CS 330 Spring 2019 436 Tools for Protecting IS Access Control • Passwords - security professionals prefer long, mixed case, alphanumeric combinations that are not words in any language - people prefer short, lowercase, meaningful words • Two factor authentication: token / smart card / phone app - a second physical device which is often used in conjunction with a password • Biometrics - fingerprint, retinal image, face CS 330 Spring 2019 437 Tools for Protecting IS 25 Most Commonly Hacked Passwords password 123456 qwerty abc123 1234567 letmein dragon baseball iloveyou master ashley bailey shadow 123123 superman qazwsx football 12345678 monkey trustno1 111111 sunshine passw0rd 654321 michael source: http://www.theglobeandmail.com/news/technology/tech-news/top25-most-hacked-passwords-revealed/article2244739/ CS 330 Spring 2019 438 Tools for Protecting IS How do Password Crackers Work • Try common passwords, i.e. previous slide. • Try common passwords with a suffix of 2 or 3 characters. • Try dictionary words, with variations in capitalization or spelling (like ‘$’ for ‘s’, ‘1’ for ‘l’, ‘@’ for ‘a’). • Try combinations of 2 or 3 dictionary words. • To target a specific person, gather info about them (from the web) e.g. names (of partner, children, pets) favourites (sports, food, musicians, actors) and use these instead of common passwords in the strategy above.
CS 330 Spring 2019 439 Tools for Protecting IS How to Foil Them • Best Method: use a password manager. These programs pick a different random sequence of characters for each web site. • Alternative Method: convert a phrase meaningful only to you - don’t pick a pet’s name, e.g. Bailey or b@i1ey - do pick a phrase that describes the pet ▪ I was 14 when Bailey arrived. ▪ Convert it to a password, typically by picking the first letter for words, keeping capitalization, punctuation and numbers. ▪ Iw14wBa. CS 330 Spring 2019 440 Tools for Protecting IS Firewalls • Mentioned back on slide 239. • Both Mac OS X and Windows have had software firewalls for the last 10+ years. • Many cable and DSL modems have hardware firewalls built in. Intrusion Detection Systems • Looks for unusual patterns, e.g. - Chris normally works weekdays 8:30 am–4:30 pm and typically only logs into his desktop computer, email and Learn. - Why is he trying to remotely log into every other computer on the network at 3 am on a Saturday? CS 330 Spring 2019 441 Tools for Protecting IS Antivirus software • Avast and AVG (among others) are free • Windows Defender (part of Windows 10) and File Quarantine (part of Mac OS) are supplied for free with their respective OS. • These programs look for bit patterns in programs, called a signature, to recognize known viruses, worms, Trojan horses • They can also look for “unusual behaviour” to detect new ones - e.g. a program accessing the internet a lot. • Downsides of antivirus software - can slow the launch or running of programs a bit - can slow the opening of a file or the mounting of a USB thumb drive CS 330 Spring 2019 442 Wireless Security Setting Up Wi-Fi • Setting up the most secure Wi-Fi connection possible can involve knowing about a lot of acronyms.
• All you have to understand are these abbreviations: 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, WEP, WEP-40, WEP-104, WPA, WPA-personal, WPA/PSK, WPA2, WPA2-personal, WPA2/PSK, WPA2-Enterprise, WPA3, TKIP, AES, WPS, EAP, LEAP, PEAP + approximately 100 other authentication protocols! • As time goes on ... there will be more! CS 330 Spring 2019 443 Wireless Security Three Parameters: #1 Speed • Wi-Fi comes in a variety of bandwidths. • The newer versions have the fastest bandwidths. • From slowest to fastest: 802.11b (11 Mb/s), 802.11a/802.11g (54 Mb/s), 802.11n (300 Mb/s), 802.11ac (1200 Mb/s) • the newest is 802.11ac • generally there is some backward compatibility - n generally works with g and b - ac generally works with all the rest CS 330 Spring 2019 444 Wireless Security Three Parameters: #2 Security • Wi-Fi comes with a variety of security protocols. • The newer versions are the most secure. • From least secure to most secure: WEP, WPA, WPA2/AES, WPA3 • WEP (Wired Equivalent Privacy) can be easily cracked • WPA (Wi-Fi Protected Access) was a temporary replacement for WEP • WPA2/AES is newer • WPA3 is the newest (2018) CS 330 Spring 2019 445 Wireless Security Three Parameters: #3 Authentication Authentication – Personal • There are methods specifically for home or small companies. • From least secure to most secure: WPS, Personal or PSK • WPS is Wi-Fi Protected Setup • PSK is Pre-Shared Key Authentication – Enterprise • you typically have no choice, just do what the company or university tells you to do • e.g. Eduroam at UW CS 330 Spring 2019 446 Wireless Security The Fundamental Problem • ISPs want to make the default set-up something that practically everyone has (i.e. the oldest), WEP or WPA • but you want the most secure option that is available. The Solution • Check each device to see if it supports the most recent (currently WPA3). - If some don’t, then make it as secure as you can. - Typically Wi-Fi routers can be set to try WPA3 first (then WPA2/AES and then fall back to WPA).
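Turning the cracker strategies around gives a simple way to test whether a password would fall to them. The sketch below is illustrative only: the word list is a small sample from the slide, the substitution table covers just the swaps mentioned above plus '0' for 'o', and the name `is_guessable` is hypothetical. It checks for a common or personal word, optionally followed by up to three extra characters.

```python
# A small sample of the common passwords listed on the slide.
COMMON = {"password", "123456", "qwerty", "abc123", "letmein",
          "dragon", "bailey", "monkey", "sunshine"}

# Undo the usual character swaps: '$' for 's', '1' for 'l', and so on.
UNSWAP = str.maketrans({"$": "s", "1": "l", "@": "a", "0": "o"})


def is_guessable(password, personal_words=()):
    """True if the password is a known word plus a suffix of 0 to 3 characters."""
    words = COMMON | {w.lower() for w in personal_words}
    candidate = password.lower()
    for cut in range(4):  # try removing 0, 1, 2 or 3 trailing characters
        stem = candidate[:len(candidate) - cut]
        if stem in words or stem.translate(UNSWAP) in words:
            return True
    return False


print(is_guessable("passw0rd"))                        # True
print(is_guessable("b@i1ey"))                          # True
print(is_guessable("Bailey99"))                        # True
print(is_guessable("Rex19", personal_words=["Rex"]))   # True
print(is_guessable("Iw14wBa."))                        # False
```

A real cracker tries far more variations than this, so a `False` result here is not evidence that a password is strong, only that it avoids the most obvious patterns.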
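The rule in "The Solution" amounts to picking the strongest protocol that every device supports. A minimal sketch, assuming a hypothetical household inventory (the device names and their supported protocols below are made up for illustration):

```python
# Ordered from least secure to most secure, as on the slide.
SECURITY_ORDER = ["WEP", "WPA", "WPA2/AES", "WPA3"]


def best_common_protocol(devices):
    """Return the most secure protocol supported by every device, or None."""
    for protocol in reversed(SECURITY_ORDER):
        if all(protocol in supported for supported in devices.values()):
            return protocol
    return None


# Hypothetical inventory: which protocols each device supports.
home = {
    "laptop":  {"WPA", "WPA2/AES", "WPA3"},
    "phone":   {"WPA2/AES", "WPA3"},
    "printer": {"WEP", "WPA", "WPA2/AES"},
}
print(best_common_protocol(home))  # WPA2/AES
```

Here the older printer is what holds the network back from WPA3, so replacing or isolating that one device would raise the security of the whole network.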
CS 330 Spring 2019 447

System Security
Software Vulnerability
• Recall from slides 418-419 that software usually contains bugs.
• Bugs can create security vulnerabilities, opening up the system to intruders.
• Eliminating all bugs is not technically or economically possible with large programs.
• Vendors release small pieces of software (called patches or updates) to repair significant flaws.
• Many programs now, by default, automatically download and install updates. If a program does not, configure it so that it gets updated.
• Caution: the discovery of bugs outpaces the ability of even big companies to fix them all.
CS 330 Spring 2019 448

Securing Your System
Barest Minimum
• Use strong passwords (slides 437 – 440).
• Use antivirus / malware protection (slide 442).
• Activate automatic updates for OS, browser, and anything else that uses the internet (slide 448).
Best Practices
• Isolate and encrypt sensitive data.
• Minimize your attack surface: the different places in your system where a hacker can try to add or extract data.
CS 330 Spring 2019 449

Securing Your System
Isolate and Encrypt Sensitive Data
• The NSA cannot (currently) crack AES-256 encrypted documents.
• macOS, Linux and Windows 10 Professional all have the ability to encrypt hard drives.
  - but Windows Home edition does not have this feature.
• Use AES-256 based encryption software, e.g. VeraCrypt for Windows / macOS / Linux, to encrypt your drives.
• Use AES-256 based flash drives, e.g. Kingston DataTraveler Vault Privacy.
• Have a separate user account on your computer for your banking and financial activities and files.
CS 330 Spring 2019 450

Securing Your System
Minimize Your Attack Surface
• Use WPA3 (or WPA2 + AES) for Wi-Fi (slide 447)
• Configure the firewall in your OS and your modem/router (slide 441)
  - Google the terms configure or setup + firewall + your OS or your modem/router manufacturer and model
    ▪ e.g. 1: configure firewall macOS
    ▪ e.g. 2: setup firewall Windows 10
    ▪ e.g. 3: configure firewall 2wire 2701
CS 330 Spring 2019 451

Securing Your System
Minimize Your Attack Surface
• When not in use, disconnect from the internet, i.e. turn off Wi-Fi (on computer) or turn off modem
  - e.g. when you are sleeping, at school, at work
• More advanced: remove unnecessary browser plug-ins, remove unnecessary software, don’t run unnecessary services, modify unnecessary default features.
source: https://www.us-cert.gov/sites/default/files/publications/TenWaystoImproveNewComputerSecurity.pdf
CS 330 Spring 2019 452

Security and Control Framework
Business Value of Security and Control
• Inadequate security and control can result in loss of business and may create serious legal liabilities.
• Businesses must protect the information assets of
  - their own company, their own employees
  - their customers, and their business partners.
• Failure to do so can lead to costly litigation for data exposure or theft.
• A sound security and control framework that protects business information assets can thus produce a high return on investment.
CS 330 Spring 2019 453

Security and Control Framework
Legal and Regulatory Requirements
• Canada: recall slides 372-375, Personal Information Protection and Electronic Documents Act (PIPEDA)
  - It establishes principles for the collection, use, disclosure and safeguarding of personal information.
• Canada: companies must be able to respond to legal requests for electronic documents relevant to a civil case (a discovery request).
• Ontario: Canadian version of the Sarbanes-Oxley Act (C-SOX)
  - Internal controls must be put in place to govern the accuracy of information in financial statements (similar to what the Sarbanes-Oxley Act requires in the US).
  - Other provinces have done the same.
CS 330 Spring 2019 454

Security and Control Framework
Tool #1: Risk Assessment
• To do a risk assessment is to determine the level of risk to the firm for various classes of risks e.g.
  - Type of risk: power failure
  - Probability of occurrence in a year: 30%
  - Loss Range (low, average, high) = ($5k, $100k, $200k)
  - Expected Annual Loss = probability × average loss = 0.3 × $100,000 = $30,000.
  - Conclusion: spending $20,000 on a backup system is a reasonable expense.
CS 330 Spring 2019 455

Security and Control Framework
Tool #2: Security Policy
• A security policy identifies
  - main risks (say power failures),
  - goals (a maximum downtime of 3 minutes per year),
  - mechanisms to achieve these goals (uninterruptible power supplies + diesel generator backup).
Tool #3: Acceptable Use Policy
• An Acceptable Use Policy (AUP) states the acceptable uses and users of information and computers,
  - e.g. privacy, user responsibility, personal use of devices, access rules for different employees
  - technical measures used to enforce the policies
CS 330 Spring 2019 456

Security and Control Framework
Tool #3 continued: Sample Access Rule for an HR Clerk
• This document identifies the information employees have access to and the type of access (read-only vs. update) based on their role in the organization.
source: course text, Figure 8-3
CS 330 Spring 2019 457

Security and Control Framework
Tool #4: Disaster Recovery Planning
• Getting IT systems up and running after a disruption
  - e.g. back up files and maintain backup systems.
Tool #5: Business Continuity Planning
• Getting the business up and running after a disaster
  - safeguarding people as well as machines.
• Identify and document critical business processes
  - not relying on people who may be unavailable.
• Create action plans for these processes.
• Line up offsite resources, e.g. the cloud.
CS 330 Spring 2019 458

Security and Control Framework
Tool #6: Security Auditing
• A security audit investigates whether the current security and control framework is adequate.
• Create a comprehensive assessment of a company’s computer security policies, procedures, technical measures, personnel, training, documentation
  - may even simulate an attack.
• The risk assessment is done before security implementation, while auditing is done after implementation and repeated from time to time.
CS 330 Spring 2019 459

Security and Control Framework
Tool #6 continued: A Sample Audit
source: course text, Figure 8-4
CS 330 Spring 2019 460

Security and Control Framework
Bottom Line
• Many companies assume that a disaster is too improbable and so security and control are not worth the investment in time and money.
• Lack of knowledge or lack of motivation are the greatest causes of computer security breaches.
CS 330 Spring 2019 461

Topic 9 – Managing Knowledge
Key Concepts
• why is knowledge management needed
• knowledge and wisdom
• explicit and tacit knowledge
• implementing a KM system
References
• course text, Chapter 11.1 Managing Knowledge
CS 330 Spring 2019 462

Why Knowledge Management
The Knowledge Economy
For several decades the world's best-known forecasters of societal change have predicted the emergence of a new economy in which brainpower, not machine power, is the critical resource. But the future has already turned into the present, and the era of knowledge has arrived.
The Learning Organization by Economist Intelligence Unit and IBM (1996)
• Note the year of the quote: 1996
• Most of you have lived your entire lives in the era of the knowledge economy.
CS 330 Spring 2019 463

Why Knowledge Management
The Increasing Demand for Knowledge Workers
[Figure: share of the workforce (0% to 100%) by occupation, 1900 to 2000. Categories: farmworkers, service, managerial & admin., labourers & operators, clerical, prof. & tech., crafts, sales.]
Source: T.A. Stewart, Intellectual capital. (1997)
CS 330 Spring 2019 464

Why Knowledge Management
Cost of Mismanagement
Each year, poor documentation and communications cost the Canadian economy more than $50 billion.
Peter Richardson, Coping with the Crisis in the office: Canada’s $50 Billion Challenge
Loss of System Knowledge
• What is the impact on an organization when people leave?
• Do they leave with years of knowledge?
CS 330 Spring 2019 465

Why Knowledge Management
The Exponential Growth of Digital Media
• Recall (from slide 86) that one of our drivers of technology was the fact that the amount of data stored is roughly doubling every year.
• Most of it is stored digitally.
• Printed documents only account for 0.003% of information growth.
CS 330 Spring 2019 466

Why Knowledge Management
The Challenge
We need a better way to ...
• manage the data and information randomly floating inside an organization
• extract, store, and share the knowledge stored inside the minds of the employees
• harness the external data and information freely floating around an organization
CS 330 Spring 2019 467

Knowledge Management (KM)
Key Concepts
Recall (from slide 280) we distinguished
• Data: raw facts (course text, pg. 13)
  e.g. a list of items scanned at a supermarket checkout
• Information: data shaped into a form that is meaningful ... to human beings (course text, pg. 13)
  e.g. which items are selling well and which aren’t
• now we will add ...
CS 330 Spring 2019 468

Knowledge Management (KM)
Key Concepts
• Knowledge: to discover patterns, rules and contexts where the information is useful (course text, pg. 342)
  - e.g. customers are more likely to buy an item that is at eye level on a grocery store shelf
• Wisdom: when, where and how to apply knowledge to get a solution to a problem (course text, pg. 343)
  - e.g. how to maximize the amount of money you make per square foot in a grocery store
CS 330 Spring 2019 469

Knowledge Management (KM)
Two Types of Knowledge
20% is Explicit Knowledge
• Knowledge that has been documented somewhere
• reports, policies, manuals, emails
• formal or codified
• databases
• books, magazines, journals
80% is Tacit Knowledge
• What employees know that has not been documented
• knowledge held in the minds of the employees
• informal and uncodified
• values, perspectives and culture
• memories of staff, suppliers and vendors
CS 330 Spring 2019 470

Knowledge Management (KM)
What is KM?
• Knowledge management is the task of acquiring, storing, disseminating, and applying an organization's explicit and tacit knowledge to meet mission objectives.
• The objective of KM is to
  - connect those who know to those who need to know
  - leverage knowledge transfer from one to many
  - know-how, know-why and know-who
CS 330 Spring 2019 471

Knowledge Management (KM)
What is KM’s Role?
• KM is one of the fastest growing areas of software investment in companies.
• Knowledge is a source of wealth for an organization, just like labor, land, or financial capital.
• The key challenge of the knowledge-based economy is to foster innovation.
• A substantial part of a company's stock value is related to its intangible assets.
• These intangible, intellectual assets must be properly managed.
CS 330 Spring 2019 472

Knowledge Management (KM)
Which Areas of KM? (US Data)
[Figure: bar chart (0% to 90%) of KM areas: capture & share best practices, corporate learning strategies, customer relationship mgmt, competitive intelligence.]
Source: J. Milan, KM: A revolution waiting for IR (2001) Paper presented at the 41st Annual AIR Forum.
CS 330 Spring 2019 473

Implementing a KM System
Stage #1: Create a Knowledge Network
• Develop a sharing environment
  - Mentor program, virtual team, expert panel, seminars and conferences, communities of practice
• Use collaboration tools to encourage information sharing
  - Shared drives, wikis and blogs, groupware like SharePoint, creation of FAQs
• Ideally, everything you do, say and know is properly documented and stored in digital form
• Challenges ...
CS 330 Spring 2019 474

Implementing a KM System
Typical Concern: Resistance to Sharing
• The more I share, the less valuable I am to the company and others
• If you are a ...
  - student, would you share your studying strategies?
  - professor, would you share your lectures?
  - machine operator, would you share your knowledge of operations?
  - stock broker, would you share trade information?
• But if you have a reputation in the company for being knowledgeable and helpful, your job is safer.
CS 330 Spring 2019 475

Implementing a KM System
Stage #2: Implement a Search Engine
• Provide relevant information for decision making using a text-based search engine.
  - Internal sources: everything stored in digital form: e-mails, internal online forum, meeting minutes, reports, memos, database systems
  - External sources: everything publicly available on the Internet
  - Search engine: a program that decides what information is relevant, e.g. https://cloud.google.com/products/search/
CS 330 Spring 2019 476

Implementing a KM System
Typical Concern: Relevancy
• Relevancy, from a human standpoint, is:
  - user-dependent
    ▪ depends upon a specific user’s judgment;
    ▪ situational, relates to user’s current needs
  - time-dependent
    ▪ changes over time
  - geographically dependent
    ▪ an approach that works in one part of the country will not work in another part
    ▪ municipal and provincial laws may be different
CS 330 Spring 2019 477

Implementing a KM System
Stage #3: Build an Intelligent System
• The ultimate goal of knowledge management
• Build on the search engine with the addition of an inference engine or machine learning
  - system is capable of making suggestions or computing solutions, e.g. automated medical diagnosis
  - might use a neural net to detect suspicious (possibly fraudulent) credit card transactions or suspicious tax returns
That is the end of the official course material!
CS 330 Spring 2019 478

Final Exam and Final Thoughts
Preparing for the Final
• Stay tuned to Piazza for an official post with details about the final exam: format, excluded material, weighting of material from 1st and 2nd half, etc.
  - The post will be up by Aug 4th.
  - I will create a single file that contains all the slides.
  - I will have extra office hours for the 3 business days before the exam (9th, 12th, 13th).
• I hope this course has helped you to become more informed users of computer technology and better able to use it in a business environment.
• Good luck on the final!
CS 330 Spring 2019 479