From Physics to .com

Next Generation Information Systems
Avi Silberschatz
Department of Computer Science
Yale University
URL: www.cs.yale.edu/~avi
The Digital Age
 Digital information forms the glue for blending the fields of computing,
communication and entertainment.
 At the center of this revolution is data that is stored, accessed and
delivered in digital format. Some of the major issues surrounding this
type of data are:

Data is to be available to the users anytime and anywhere and
with the desired QoS.

Data access must adhere to privacy and security policies.

Data Interoperability.

Fast access to data, which implies support for queries with
approximate answers.

Data analysis and mining capabilities over very large datasets.
 Many of the advances in information systems are due to the
development of new technologies. These advances, in turn, are
driving the development of even newer technologies.
Next Generation Information Systems
Silberschatz
Research Challenges
 Storage, retrieval, and delivery of multimedia data

Storage System Issues

QoS issues of continuous media data (e.g., video and audio)
 Approximate answers

useful for very large data sets

useful for Web searching
 Data mining

Discovering “interesting” patterns in very large data sets

Discovering “interesting” patterns from incomplete information
 Data Interoperability
 Privacy and security
 Next generation Networks

Converged networks

Network Management
Multimedia Data
 Regular Data

text, binary, image
 Database Data

tuples, objects
 Continuous Media Data


Video Data

The display (playback) of the data must be continuous with a
fixed rate, which is typically 30 frames/second.

A viewer may wish to control the way the data is to be displayed
by applying various VCR-type operations to the video data.
Audio Data

The playback must be continuous with a fixed rate, which is
dependent on the sample rate.

A listener may wish to control the way the data is played back.
Storage System Issues

Rapid growth in storage capacity demand
 world-wide installed storage:
 738 PetaByte in 2000
 over 75% per year storage capacity increase over the next 5 years
 reaches ZettaByte in 2009
 data stored at Global 2500 companies doubles every 18 months
 data stored at e-commerce companies grows at 400% a year
 Management
 40-50% of company IT budget is spent on storage
 fraction of IT budget spent on storage is expected to grow



cost for storage management exceeds cost of storage equipment
 management: $300 per GB per year
 low-end storage: $14 - $50 per GB (packaged, powered,
networked)
management cost is expected to grow
Storage Requirement
 24 x 7
 Disaster recovery
Storage is Moving Into the Network
 Motivation

Use commodity IP based networks
 IT staff know-how
 Distance and universal access
 Applications

Disaster recovery
 Archiving
 Backups

Content Distribution
 Managed storage
 Value added storage services
 Consolidation of storage
IP-Based Network Storage
 Storage is managed, possibly by different domains
 Storage devices are connected over networking infrastructure
[Figure: client sites #1 and #2, each on a LAN, reach file servers on LANs and SANs across a Metro/WAN]
IP-based Network Storage (Cont.)
 IETF standards are being drafted

Most popular: iSCSI and FCIP
 Almost all networking and storage companies are participating in these
standards
 Issues

Performance
 Reliability
 Future
 end-to-end iSCSI;
end-to-end IP storage networking?
 demise of FC?
 Hybrid?
 FC (InfiniBand) SAN islands connected over IP networks


FC SANs in data centers accessed by IP networks
Network Storage Security
 Customers may not trust the storage service provider (SSP)
 Storage consolidation over different customers is essential to make
storage outsourcing viable. However, customers may not trust each
other
 Threat model

Disclosure of data to an eavesdropper intercepting communication

Disclosure of data to storage service provider (SSP) and to other
customers of the SSP

Manipulation of communication by an attacker

Manipulation of data by the SSP or other customers of the SSP
 Challenges

high throughput encryption (e.g., 1Gbps, 10 Gbps)

security without hindering performance
Multimedia Storage and Delivery Issues
 The size of some databases is enormous, especially those that
are used for data mining (e.g., cash register transactions).
30 terabytes largest commercial database
 Some information sources generate data at an astonishing rate
(e.g., satellite images).
EOS – 1-2 terabytes per day
 The BBC is planning to digitize the last 50 years of programming.
 Continuous media data is voluminous:

100 minute MPEG-1 video requires 1.125GB

100 minute HDTV video requires 15GB
 Continuous media data require support for QoS.
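The two video sizes above follow from clip length times bit rate. A minimal sketch of that arithmetic, assuming constant bit rates of 1.5 Mbps for MPEG-1 and 20 Mbps for HDTV (the slides give only the resulting sizes, so these rates are assumptions chosen to reproduce them):

```python
def clip_size_gb(minutes, mbps):
    """Size in GB (10^9 bytes) of a clip played at a constant bit rate.

    minutes: clip length in minutes
    mbps:    assumed constant bit rate in megabits per second
    """
    bits = minutes * 60 * mbps * 1e6   # total bits in the clip
    return bits / 8 / 1e9              # bits -> bytes -> GB


mpeg1_gb = clip_size_gb(100, 1.5)   # assumed MPEG-1 rate
hdtv_gb = clip_size_gb(100, 20)     # assumed HDTV rate
```

Under these assumed rates the function yields 1.125 GB and 15 GB for the 100-minute clips.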
System Resources to be Managed for QoS
[Figure: storage server resources (tertiary storage, secondary storage, the I/O buses between them, buffer space, and processor(s)), plus the network]
Research Issues
 Admission control
 Disk Scheduling
 Buffer Management
 Storage Management

data layout
 varying disk transfer rates
 disk striping
 meta data

fault-tolerance
 Tertiary storage
Cycle-based Scheduling
 Let $T$ be the length of a service cycle.
 Maintain a queue of requests $R_1, R_2, \ldots, R_n$, each $R_i$
corresponding to a request to view a CM clip. Each request has
an associated rate $r_i$.
 For each request, a buffer of size $2 \cdot T \cdot r_i$ is allocated.
 Requests in the queue are served in a cyclic order using double
buffering. In each cycle $I$:

get data from disk to buffer $(I \bmod 2)$

transfer data from buffer $((I + 1) \bmod 2)$ to the client
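The double-buffering step can be sketched as a small simulation for a single request. This is an illustration only: the "disk" is a byte string, `chunk` stands in for $T \cdot r_i$, and the function name is hypothetical.

```python
def run_cycles(clip, chunk, num_cycles):
    """Simulate double buffering for one request.

    clip:       bytes of the stored clip (stands in for the disk)
    chunk:      bytes consumed per service cycle, i.e. T * r_i
    num_cycles: number of service cycles to simulate
    Returns the bytes delivered to the client, in order.
    """
    buffers = [b"", b""]          # two halves of the 2 * T * r_i buffer
    delivered = []
    offset = 0
    for cycle in range(num_cycles):
        # Fill buffer (cycle mod 2) from disk.
        buffers[cycle % 2] = clip[offset:offset + chunk]
        offset += chunk
        # Send the other buffer, which was filled in the previous cycle.
        out = buffers[(cycle + 1) % 2]
        if out:
            delivered.append(out)
    return b"".join(delivered)
```

Data read in one cycle is delivered in the next, so playback starts one cycle after the first read.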
Disk Scheduling

Requests are serviced in service cycles (rounds).

At the beginning of a service cycle, requests are ordered in
C-SCAN order.

At the beginning of every service cycle, it is ensured that

$$\sum_i 2 \cdot T \cdot r_i \le B$$

$$\sum_i \left( \frac{T \cdot r_i}{r_{disk}} + t_{rot} + t_{settle} \right) + 2 \cdot t_{seek} \le T$$

hold (where $t_{rot}$, $t_{settle}$, $t_{seek}$ are the rotational delay, settle time, and seek time,
respectively, $r_{disk}$ is the disk transfer rate, and $B$ is the buffer pool size).

The value of $T$ is adjusted depending on the workload.

In every service cycle,
$$\min\left( T \cdot r_i,\; 2 \cdot T \cdot r_i - (\text{offset of last retrieved} - \text{offset of last consumed}) \right)$$
bits of data are retrieved for each request.
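The two per-cycle conditions and the retrieval amount translate directly into code. A minimal sketch, with hypothetical function names and all quantities in consistent units (bits, bits/second, seconds):

```python
def cycle_is_feasible(rates, T, B, r_disk, t_rot, t_settle, t_seek):
    """Check the buffer and disk-time conditions for one service cycle."""
    # Buffer condition: every request holds a double buffer of 2 * T * r_i.
    buffer_ok = sum(2 * T * r for r in rates) <= B
    # Disk condition: transfer time plus rotational and settle delay per
    # request, plus two full seeks for the C-SCAN sweep and return.
    disk_time = sum(T * r / r_disk + t_rot + t_settle for r in rates)
    disk_ok = disk_time + 2 * t_seek <= T
    return buffer_ok and disk_ok


def bits_to_retrieve(T, r, last_retrieved, last_consumed):
    """Bits to fetch for one request in this cycle."""
    buffered = last_retrieved - last_consumed   # data still in the buffer
    return min(T * r, 2 * T * r - buffered)     # never overflow the buffer
```

The `min` caps each read at the free space left in the request's buffer.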
Admission Control
The queue is bounded by an admission control scheme.
 For each request, the service time is estimated.
 A request is admitted only if the sum of the estimated service times for
all admitted requests does not exceed the duration of service cycle T.
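A minimal sketch of this admission test. The slide does not say how the service time is estimated, so the estimate below (transfer time plus rotational and settle delay) is an assumption, and the function names are hypothetical.

```python
def estimate_service_time(T, r, r_disk, t_rot, t_settle):
    """Assumed estimate: time to read T * r bits plus positioning delays."""
    return T * r / r_disk + t_rot + t_settle


def admit(admitted_times, new_time, T):
    """Admit the new request only if all service times still fit in T.

    admitted_times: estimated service times of already admitted requests;
                    the new time is appended when the request is admitted.
    """
    if sum(admitted_times) + new_time <= T:
        admitted_times.append(new_time)
        return True
    return False
```

A rejected request leaves the list of admitted requests unchanged.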
Admission Control (cont.)
 Reserve a fraction of service cycle $T$, say $\alpha \cdot T$ ($0 < \alpha < 1$), for continuous
media requests.
 A request (real-time, non-real-time) is admitted if

$$\sum_{i \in \text{real-time}} \left( \frac{T \cdot r_i}{r_{disk}} + t_{rot} + t_{settle} \right) + \sum_{i \in \text{non-real-time}} \left( \frac{n_i}{r_{disk}} + t_{rot} + t_{settle} \right) + 2 \cdot t_{seek} \le T$$

 A real-time request is admitted if

$$\sum_{i \in \text{real-time}} \left( \frac{T \cdot r_i}{r_{disk}} + t_{rot} + t_{settle} \right) + 2 \cdot t_{seek} \le \alpha \cdot T$$

 The above scheme ensures that

both continuous and non-continuous media requests are allocated time
during a service cycle.

any time during a service cycle unused by continuous media requests is
allocated to non-continuous media requests.
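The two admission tests can be sketched as follows. Two assumptions are made here: $n_i$ is taken to be the number of bits requested by non-real-time request $i$ (the slide does not define it), and a real-time request is required to pass both tests. The `Disk` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    r_disk: float    # transfer rate, bits/second
    t_rot: float     # rotational delay, seconds
    t_settle: float  # settle time, seconds
    t_seek: float    # seek time, seconds


def rt_time(rates, T, d):
    """Disk time per cycle used by real-time requests with the given rates."""
    return sum(T * r / d.r_disk + d.t_rot + d.t_settle for r in rates)


def nrt_time(sizes, d):
    """Disk time used by non-real-time requests of the given sizes in bits."""
    return sum(n / d.r_disk + d.t_rot + d.t_settle for n in sizes)


def admit_non_real_time(rt_rates, nrt_sizes, new_size, T, d):
    """First condition: everything, including the new request, fits in T."""
    total = rt_time(rt_rates, T, d) + nrt_time(nrt_sizes + [new_size], d)
    return total + 2 * d.t_seek <= T


def admit_real_time(rt_rates, nrt_sizes, new_rate, T, alpha, d):
    """Real-time load must fit in alpha * T and the whole load in T."""
    rt = rt_time(rt_rates + [new_rate], T, d)
    within_reserve = rt + 2 * d.t_seek <= alpha * T
    within_cycle = rt + nrt_time(nrt_sizes, d) + 2 * d.t_seek <= T
    return within_reserve and within_cycle
```

Because non-real-time requests are only tested against the full cycle, they can use whatever part of the reserved fraction the continuous media requests leave idle.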
Length of T
What about the length of T?
Buffer Space Constraints
 Let B be the available buffer size
 Let N be the number of admitted clients
 Assume infinite disk bandwidth
 Requirements:
$$\sum_{i=1}^{N} 2 \cdot T \cdot r_i \le B$$

[Figure: N plotted against T]
For a given buffer size B, the larger T is, the fewer clients can be admitted.
Disk Bandwidth Constraints
 Assume infinite buffer space
 Use C-SCAN disk scheduling
 Requirements:
$$2 \cdot t_{seek} + N \cdot (t_{rot} + t_{settle}) + \sum_{i=1}^{N} \frac{T \cdot r_i}{r_{disk}} \le T$$

[Figure: N plotted against T]
The larger T is, the larger N is.
Combining Disk & Buffer Constraints
[Figure: N plotted against T, showing the disk constraint and buffer constraint curves]
The optimal T is obtained by solving a quadratic equation derived from the disk
and buffer space constraints.
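A worked sketch of that quadratic, under the simplifying assumption that all $N$ clients share one rate $r$ (the slides allow per-request rates). The buffer bound $N \le B / (2 T r)$ falls with $T$ and the disk bound $N \le (T - 2 t_{seek}) / (t_{rot} + t_{settle} + T r / r_{disk})$ rises with $T$, so the largest $N$ is reached where the two meet. Function names are hypothetical.

```python
import math


def optimal_cycle_length(r, B, r_disk, t_rot, t_settle, t_seek):
    """T at which the buffer and disk constraints admit the same N."""
    t_lat = t_rot + t_settle
    # Equating B / (2*T*r) with (T - 2*t_seek) / (t_lat + T*r/r_disk) gives
    #   2*r*T^2 - (4*r*t_seek + B*r/r_disk)*T - B*t_lat = 0
    a = 2 * r
    b = -(4 * r * t_seek + B * r / r_disk)
    c = -B * t_lat
    # a > 0 and c < 0, so exactly one root is positive.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def max_clients(T, r, B, r_disk, t_rot, t_settle, t_seek):
    """Largest N satisfying both constraints for a given T."""
    by_buffer = B / (2 * T * r)
    by_disk = (T - 2 * t_seek) / (t_rot + t_settle + T * r / r_disk)
    return math.floor(min(by_buffer, by_disk))
```

Evaluating `max_clients` on either side of the returned `T` shows the trade-off: shorter cycles are disk-bound, longer cycles are buffer-bound.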
Minimizing Response Time
 Under some workloads (e.g., requests with small $r_i$'s, such as 64
Kbps), the value of $T$ that maximizes throughput can be high
(e.g., 20 secs.).
 This might yield high response times.
 Solution:

maintain small $T$ values

in order not to degrade throughput, for each request $R_i$ data
is prefetched from disk once every $k_i$ service cycles (instead of
in every service cycle)

The maximum amount of data prefetched is $k_i \cdot T \cdot r_i$

buffer space allocated to $R_i$ is $(k_i + 1) \cdot T \cdot r_i$
Minimizing Response Time (contd.)
 Issues:

Calculation of the $k_i$'s

Admission control:
 $\mathrm{lcm}(k_1, k_2, \ldots, k_n)$ service cycles to manage
 For a request $R_i$, finding the least loaded service cycles
$u_i + k_i \cdot l$, $0 \le l < \mathrm{lcm}(k_1, k_2, \ldots, k_n) / k_i$
 In order to reduce response time, start a new request $R_i$ in
the first possible service cycle and then move it
incrementally to the selected least loaded service cycle.

 This solution also provides higher throughput for workloads with
small $r_i$'s
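One way to pick the least loaded service cycles is sketched below. It assumes "least loaded" means the offset $u_i$ whose cycles $u_i, u_i + k_i, u_i + 2 k_i, \ldots$ have the smallest peak load, and that each request adds a fixed cost to every cycle in which it is served. Both are assumptions, and the function names are hypothetical. `math.lcm` requires Python 3.9 or later.

```python
import math


def least_loaded_offset(load, k):
    """Offset u in [0, k) whose cycles u, u+k, u+2k, ... have the lowest peak.

    load: per-cycle load over lcm(k_1..k_n) cycles; len(load) is a multiple of k.
    """
    return min(range(k), key=lambda u: max(load[u::k]))


def schedule(ks, costs):
    """Place requests one by one, each at its least loaded offset.

    ks:    prefetch period k_i of each request, in service cycles
    costs: disk time each request adds to a cycle in which it is served
    Returns (offsets, load), where load covers lcm(k_1..k_n) cycles.
    """
    hyper = math.lcm(*ks)            # number of service cycles to manage
    load = [0] * hyper
    offsets = []
    for k, cost in zip(ks, costs):
        u = least_loaded_offset(load, k)
        for cycle in range(u, hyper, k):
            load[cycle] += cost
        offsets.append(u)
    return offsets, load
```

Two requests with $k_i = 2$ end up in alternating cycles, which is what spreads the disk load evenly.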
Querying Huge Data Sets
 Give me all objects (e.g., images) that look like this.
 If we are dealing with PetaBytes of data, this may take days or
weeks.
 One solution is to capture “meta data” information about the
stored objects as the objects are stored in the database.

Querying is done against the “meta data”.

Major issue – nature of the meta data.
 Another solution is to provide support for “approximate
answers”.
Providing Approximate Answers
 Traditional databases provide exact answers to queries, but...
 In massive data environments, can take minutes to hours due to
disk I/Os
 In distributed environments, data may be remote or currently
unavailable
 In real-time environments, even single I/O may be too slow
Providing Approximate Answers (Cont.)
 Trade-off accuracy for performance: e.g., 30 minutes for exact
answer vs. 3 seconds for an approximate answer with 5% error
 Examples where fast approximate answers are preferred:

drill-down query sequence in data mining: searching for the
“interesting” queries

tentative answer when base data unavailable

leading digits suffice (e.g., 3.5 million vs. 3.512 million)
 Can proceed to the exact answer, if desired
The AQUA System
Approximate Query Engine for data warehousing
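[Figure: Aqua architecture. A client (browser, Excel) sends SQL query Q over the network; Aqua rewrites it into SQL query Q' for the DBMS of the large data warehouse, which also holds the Aqua synopses. The (fast) query on the Aqua synopses takes the place of the (slow) query on the warehouse data, and the result (with error bounds) is returned as HTML or XML.]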

Aqua precomputes and maintains small synopses of the data

Aqua provides approximate answers with accuracy guarantees, by
rewriting user queries as depicted above
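To illustrate answering from a synopsis with an error bound, the sketch below estimates a SUM from a uniform random sample and attaches a normal-approximation confidence bound. A uniform sample is only one simple kind of synopsis; the slides do not specify how Aqua builds its synopses or computes its guarantees, so this is an assumed stand-in, and the function name is hypothetical.

```python
import math
import statistics


def approx_sum(sample, population_size, z=1.96):
    """Estimate SUM over the full table from a uniform random sample.

    sample:          sampled values of the summed column (at least two)
    population_size: number of rows in the full table
    z:               normal quantile (1.96 gives roughly 95% confidence)
    Returns (estimate, bound): the exact sum is expected to lie within
    estimate +/- bound with roughly that confidence.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    estimate = population_size * mean            # scale the sample mean up
    std_err = population_size * statistics.stdev(sample) / math.sqrt(n)
    return estimate, z * std_err
```

The query runs over the sample instead of the warehouse table, which is where the speedup comes from; a larger sample tightens the bound.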
Aqua Synopses: The Key Ingredient
 (Small) Surrogate for the actual data.
 Must accurately estimate the exact answers from the synopses.
 As data is updated, must keep synopses up-to-date.
We developed new techniques for summarizing data,
and for adapting these summaries to changes in
both the data and the query mix.
First system to provide fast, highly-accurate approximate answers
for a broad class of queries arising in data warehousing scenarios
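As one illustration of keeping a synopsis current while data arrives, the sketch below maintains a fixed-size uniform sample with reservoir sampling. This is a standard textbook technique swapped in for illustration, not the technique the slides refer to, and it handles insertions only (no deletions and no adaptation to the query mix). The class name is hypothetical.

```python
import random


class Reservoir:
    """Fixed-size uniform random sample of all rows inserted so far."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.seen = 0                 # rows inserted so far
        self.items = []               # the current sample
        self.rng = rng or random.Random()

    def insert(self, row):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(row)    # still filling the reservoir
        else:
            # Keep the new row with probability capacity / seen,
            # replacing a uniformly chosen current member.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = row
```

Each insert costs constant time, so the sample can be updated as the warehouse is loaded.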
Private, Public, and Sensitive Information in a Wired World
 Private information

Only the data subject has a right to it.
 Public information

Everyone has a right to it.
 Sensitive information

“Legitimate users” have a right to it.

It can harm data subjects, data owners, or data users if it is
misused.
Erosion of Privacy

“You have zero privacy. Get over it.” – Scott McNealy, 1999
 Changes in technology are making privacy harder to protect.


increased use of computers and networks

reduced cost for data storage

increased ability to process large amounts of data
Privacy is becoming more critical as public awareness, potential misuse, and
conflicting goals increase.
“Public Records” in the Internet Age
 Depending on State and Federal law, “public records” can include:

Birth, death, marriage, and divorce records

Court documents and arrest warrants (including those of people who were acquitted)

Property ownership and tax-compliance records

Driver’s license information

Occupational certification
They are, by definition, “open to inspection by any person.”
 Traditionally: Many public records were “practically obscure.”

Stored at the local level on hard-to-search media, e.g., paper, microfiche, or
offline computer disks.

Not often accurately and usefully indexed.
Now: More and more public records, especially Federal records, are being put on
public web pages in standard, searchable formats.
 Issues

Should some Internet-accessible public records be only conditionally
accessible?

Should data subjects have more control?

Should data collectors be legally obligated to correct mistakes?
Examples of Sensitive Information
 Copyrighted works
 Certain financial information
 Health Information
 Question: Should some information now in “public records” be
reclassified as “sensitive”?
State of Technology
 We have the ability (if not always the will) to prevent improper
access to private information. Encryption is very helpful here.
 We have little or no ability to prevent improper use of sensitive
information. Encryption is less helpful here.
The PORTIA Project
 PORTIA: Privacy, Obligations, and Rights in Technologies of Information
Assessment
 Large ITR grant from NSF. It is a five-year, multi-institutional,
multi-disciplinary, multi-modal research project on end-to-end handling of
sensitive information in a wired world
 Researchers from:

Stanford: Dan Boneh, Hector Garcia-Molina, John Mitchell, Rajeev Motwani

Yale: Joan Feigenbaum, Ravi Kannan, Avi Silberschatz

University of NM: Stephanie Forrest

Stevens Institute: Rebecca Wright

NYU: Helen Nissenbaum

Plus participation by software industry, key user communities,
advocacy organizations, and non-CS academics.
 http://crypto.stanford.edu/portia
PORTIA Goals
 Produce a next generation of technology for handling sensitive
information that is qualitatively better than the current generation’s.
 Enable end-to-end handling of sensitive information over the course of
its lifetime.
 Formulate an effective conceptual framework for policy making and
philosophical inquiry into the rights and responsibilities of data
subjects, data owners, and data users.
Five Major Research Themes
 Privacy-preserving data mining and privacy-preserving surveillance
 Database policy enforcement tools
 Sensitive data in P2P systems
 Policy-enforcement tools for database systems
 Identity theft and identity privacy
Privacy and Security on the Web
An increasing number of web sites require user registration, which enables
personalized services. This, however, raises some concerns.
 Privacy concerns: providing the same user name (or e-mail) allows
creation of comprehensive dossiers; providing your e-mail address
reveals your true identity
 Security concerns: using the same user name and password at
multiple web sites enables passwords from insecure sites to be used to
help determine passwords at secure sites
 Junk e-mail: giving your e-mail address makes you susceptible to junk
e-mail
 Inconvenience: people have to invent and remember multiple user
names and passwords
The LPWA system
A tool for combining privacy, security, and convenience. Enables personalized
services by generating consistent, untraceable aliases for use on the web.
[Figure: LPWA sits between the user (Arun Netravali) and the web sites quote.com, my.yahoo.com, and expedia, presenting a different alias pair to each site (axyz, x45t; Czar, 4rt5; Boss, 56yh)]
The LPWA Proxy
Properties
 Privacy: web sites cannot collude to create dossiers
 Security: different passwords for different web sites
 Convenience: no need to remember multiple user names and
passwords
 Alias e-mail addresses support communication from web sites back
to users and allow control of junk e-mail
Generation of Aliases
 At the first invocation of the LPWA proxy

User provides:

user’s e-mail address id

a secret S (random string)
 Registering

User types \u, \p, \@ for username, password, and e-mail
address, respectively.

LPWA uses id, S, and the domain name of the web site
being visited to compute the user's alias
 Repeat Visits

User again types \u and \p for username and password

LPWA computes the same alias-username/password.
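The alias computation can be sketched with a keyed hash. The slides do not give LPWA's actual function, so everything below is an assumption for illustration: HMAC-SHA256 keyed with the secret S, the field lengths, the label strings, and the `alias.example` mail domain. The function name is hypothetical.

```python
import hashlib
import hmac


def make_alias(user_id, secret, domain):
    """Derive a per-site (username, password, e-mail) alias.

    user_id: the user's e-mail address (id)
    secret:  the user's secret string S
    domain:  domain name of the web site being visited
    """
    def derive(label, length):
        # A separate label per field keeps the three outputs unrelated.
        msg = f"{label}|{user_id}|{domain}".encode()
        digest = hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()
        return digest[:length]

    username = "u" + derive("username", 8)
    password = derive("password", 12)
    email = derive("email", 10) + "@alias.example"   # hypothetical mail domain
    return username, password, email
```

The function is deterministic, so a repeat visit to the same site reproduces the same alias without storing anything, while aliases for different domains share no visible relation.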
Network System Challenges
 Next-generation network -- will be simpler, lower cost, and will provide
customized services for consumers and businesses
 Converged networks -- will incorporate the best features of today’s
voice and data networks
 Network management – automate many of the functions that are
currently done by people.
Next-generation networks
Yesterday's Networks
 Point-to-point optical links
 Circuit switched, centrally managed
 Separate networks for voice, data, video
 Fixed, closed
Next-Generation Networks
 All-optical mesh backbone
 Packet switched, distributed
 Unified network for customized multimedia services
 Open APIs for ISV services
[Figure: both networks drawn as service, electronic, and optical layers, with elements labeled NM, 5E, Local ISP, CLEC, PSTN, ADM, and DCS]
Next generation converged networks
[Figure: the data network and the voice network merge into the next generation network, which supports converged applications]
Network Management Challenges
 Managing today’s networks is extremely challenging due to their increased
complexity

Networks contain hundreds of network elements and thousands of
physical links

Network elements follow a multitude of protocols (e.g., BGP, OSPF,
ISIS, RIP)

Networks are heterogeneous and contain equipment from multiple
different vendors
 Manually managing networks

is tedious, labor-intensive, time-consuming and error-prone

is not cost-effective due to the severe shortage and high cost of
skilled labor
 Critical need for software tools that automate network management tasks
Next-Generation Network Management
 Next-Generation network management software functionality includes

Keeping track of network inventory and topology

Monitoring network link bandwidth and latency

Storing, analyzing and reporting network performance data

Load balancing by appropriately configuring network parameters

Automating and simplifying network configuration tasks (e.g., VPNs)
 Value Proposition:

Ease management and configuration of ISP networks

Optimize utilization of network resources
 Goal: Make networks self-administering and self-tuning
There are many approaches to predicting the future
 I think there is a world market
for maybe five computers.
(Thomas Watson, 1943)
 Video won’t be able to hold onto
any market it captures after the
first six months. People will soon
get tired of staring at a plywood
box every night.
(Darryl F. Zanuck, head of 20th
Century Fox, 1946)
 640K ought to be enough for
anybody. (Bill Gates, 1981)
“How do you want it – the crystal
mumbo-jumbo or statistical probability?”
Five predictions for the new millennium
1
A mega-network of networks will
enfold the earth in a communications
“skin” with ubiquitous connectivity and
enormous bandwidth.
2
By 2010, there will be so many
interconnected devices that the
volume of “infrachatter” among
communicating machines will
surpass communications among
humans.
3
Bandwidth will be too
cheap to meter.
4
Consumers and businesses will
have a vast variety of
individualized, custom services, written by countless programmers
on an open mega-network.
5
Virtual reality will become a reality
and will transform the way people
live and conduct their business.
This lecture will be given from the
comfort of my office without me
having to travel.