Reliable Distributed Systems Web Services

Today


Web Services – Introduction
“Remote Procedure Call” in WS


Binding, Marshalling…
Using TCP as the transport for RPCs

Connectivity Issues: NAT, Firewall
What are Web Services?

Today, we normally use Web browsers
to talk to Web sites



Browser names document via URL (lots of
fun and games can happen here)
Reply is encoded in HTML, and HTTP is used to
issue the request to the site
Web Services generalize this model so
that computers can talk to computers
What are Web Services?
[Figure: client system, SOAP router, Web Service, backend processes]
What are Web Services?

“Web Services are software
components described via
WSDL which are capable of
being accessed via standard
network protocols such as SOAP
over HTTP.”
Today, SOAP is the primary standard.
SOAP provides rules for encoding the
request and its arguments.
Similarly, the architecture doesn’t assume
that all access will employ HTTP over TCP.
In fact, .NET uses Web Services “internally”
even on a single machine. But in that case,
communication is over COM
WSDL documents are used to drive object assembly, code
generation, and other development tools.
[Figure: SOAP router, Web Service plus its WSDL document, backend processes]
Web Services are often Front Ends
[Figure. Client platform: COM, C#, and CORBA apps using a Web Service invoker.
Server platform: Web server (e.g., IBM WebSphere, BEA WebLogic) hosting a
WSDL-described Web Service, with SOAP messaging, SAP, a Web app server,
and a DB2 server behind it.]
The Web Services “stack”
(layers listed from top to bottom)
Business Processes: BPEL4WS (IBM only, for now)
Quality of Service: Transactions, Reliable Messaging, Security, Coordination
Description: WSDL, UDDI, Inspection
Messaging: SOAP; XML, Encoding; Other Protocols
Transport: TCP/IP or other network transport protocols
What are Web Services?


Amazon would hand out
“serverlets” for 3rd party
developers to use
This connects their applications
directly to Amazon’s system
[Figure: serverlet, SOAP router, Web Service, backend processes]
Advantages of web services?*

Web services provide interoperability between various
software applications running on various platforms.


Web services leverage open standards and protocols.
Protocols and data formats are text based where possible


“vendor, platform, and language agnostic”
Easy for developers to understand what is going on.
By piggybacking on HTTP, web services can work through
many common firewall security measures without requiring
changes to their filtering rules.
*: From Wikipedia
How Web Services work

First the client discovers the service.


More in next lecture!
Typically, client then binds to the server.


By setting up a TCP connection to the
discovered address.
But binding is not always needed.
How it works…

Next, build the SOAP request (marshalling):



Fill in what service is needed, and the arguments.
Send it to server side.
XML is the standard for encoding the data (but is
very verbose and results in HUGE overheads)
SOAP router routes the request to the
appropriate server (assuming more than one
server is available)

Can do load balancing here.
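The marshalling step above can be sketched in a few lines of Python. This is only an illustration: the operation name GetTemperature, the zipcode argument, and the namespace urn:example:temperature are made up for the example. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:temperature"  # hypothetical service namespace

def build_request(operation, args):
    """Marshal an operation name and its arguments into a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # "Fill in what service is needed, and the arguments."
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in args.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")

request = build_request("GetTemperature", {"zipcode": "14850"})
print(request.decode("utf-8"))
print(len(request), "bytes of XML to carry a 5-character argument")
```

The byte count printed at the end gives a feel for the overhead mentioned above: the envelope is many times larger than the argument it carries.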
How it works…


Server unpacks the request
(demarshalling), handles it, and computes
the result.
Result sent back in the reverse
direction: from the server to the SOAP
router back to the client.
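A minimal sketch of the server side, under the same assumptions as a toy temperature service: the handler table, the GetTemperature operation, the reading it returns, and the "Response" suffix on the reply element are all illustrative choices, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:temperature"  # hypothetical service namespace

REQUEST = f"""<soap:Envelope xmlns:soap="{SOAP_ENV}" xmlns:t="{SERVICE_NS}">
  <soap:Body><t:GetTemperature><zipcode>14850</zipcode></t:GetTemperature></soap:Body>
</soap:Envelope>"""

def get_temperature(zipcode):
    # Stand-in for real backend logic; the reading is made up.
    return {"14850": "21.5"}.get(zipcode, "unknown")

HANDLERS = {"GetTemperature": get_temperature}

def handle(request_xml):
    """Demarshal the request, call the handler, and marshal the result."""
    body = ET.fromstring(request_xml).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]                               # first element inside the Body
    operation = call.tag.split("}")[1]           # strip the "{namespace}" prefix
    args = {child.tag: child.text for child in call}
    result = HANDLERS[operation](**args)

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    reply_body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    reply = ET.SubElement(reply_body, f"{{{SERVICE_NS}}}{operation}Response")
    ET.SubElement(reply, "result").text = str(result)
    return ET.tostring(envelope, encoding="unicode")

print(handle(REQUEST))
```

The reply travels back along the same path as the request, so the client needs the mirror-image demarshalling code to read the result element.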
Marshalling Issues

Data exchanged between client and
server needs to be in a platform
independent format.





Endianness differs between machines.
Data alignment issues (16/32/64 bits)
Multiple floating point representations.
Pointers
(Have to support legacy systems too)
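The endianness and alignment points can be made concrete with Python's struct module. The values below are arbitrary. The sketch only shows why both sides must agree on one wire format. It does not cover pointers, which have no meaning on the remote machine and have to be replaced by the data they refer to.

```python
import struct

value = 0x01020304

little = struct.pack("<I", value)  # little-endian layout
big = struct.pack(">I", value)     # big-endian ("network order") layout
print(little.hex())  # 04030201
print(big.hex())     # 01020304

# Reading big-endian bytes as if they were little-endian silently gives the wrong number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x4030201

# Agreeing on one wire format avoids the problem. "!" means network byte order
# with standard sizes and no padding, so alignment differences disappear too.
record = struct.pack("!id", 42, 3.5)   # 4-byte int followed by 8-byte double
print(len(record))                     # 12
print(struct.unpack("!id", record))    # (42, 3.5)
```

Text encodings such as XML sidestep these binary layout problems entirely, at the cost of the size overhead noted earlier.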
Discovery

This is the problem of finding the
“right” service



In our example, we saw one way to do it –
with a URL
Web Services community favors what they
call a URN: Uniform Resource Name
But the more general approach is to use
an intermediary: a discovery service
Example of a repository
Name | Type | Publisher | Toolkit | Language | OS
Web Services Performance and Load Tester | Application | LisaWu | | N/A | Cross-Platform
Temperature Service Client | Application | vinuk | Glue | Java | Cross-Platform
Weather Buddy | Application | rdmgh724890 | MS .NET | C# | Windows
DreamFactory Client | Application | billappleton | DreamFactory | Javascript | Cross-Platform
Temperature Perl Client | Example Source | gfinke13 | | Perl | Cross-Platform
Apache SOAP sample source | Example Source | xmethods.net | Apache SOAP | Java | Cross-Platform
ASS 4 | Example Source | TVG | SOAPLite | N/A | Cross-Platform
PocketSOAP demo | Example Source | simonfell | PocketSOAP | C++ | Windows
easysoap temperature | Example Source | a00 | EasySoap++ | C++ | Windows
Weather Service Client with MS-Visual Basic | Example Source | oglimmer | MS SOAP | Visual Basic | Windows
TemperatureClient | Example Source | jgalyan | MS .NET | C# | Windows
Repository summary


A database listing servers
Each is described using the UDDI language,
which is defined over XML


Hence can be searched with XML queries
An extensible standard


Defines some required information about
interfaces available and argument types, etc
But services can provide extra information too.
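To illustrate what "searched with XML queries" might look like, here is a small sketch over a few rows of the example repository. The entry format is a simplified, hypothetical one invented for this example, not the real UDDI schema. The query uses the limited XPath subset built into Python's ElementTree.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical entry format; NOT the real UDDI schema.
REPOSITORY = ET.fromstring("""
<repository>
  <entry><name>Temperature Service Client</name><toolkit>Glue</toolkit>
         <language>Java</language><os>Cross-Platform</os></entry>
  <entry><name>PocketSOAP demo</name><toolkit>PocketSOAP</toolkit>
         <language>C++</language><os>Windows</os></entry>
  <entry><name>TemperatureClient</name><toolkit>MS .NET</toolkit>
         <language>C#</language><os>Windows</os>
         <note>extra, publisher-specific information</note></entry>
</repository>
""")

def find_names(field, value):
    """Return the names of entries whose child element `field` equals `value`."""
    matches = REPOSITORY.findall(f"./entry[{field}='{value}']")
    return [entry.findtext("name") for entry in matches]

print(find_names("os", "Windows"))     # ['PocketSOAP demo', 'TemperatureClient']
print(find_names("language", "Java"))  # ['Temperature Service Client']
```

The optional note element on the last entry stands in for the "extra information" a service may add: queries that do not know about it simply ignore it.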
Roles?



UDDI is used to write down the
information that becomes a “row” in the
repository (“I have a temperature
service…”)
WSDL documents the interfaces and
data types used by the service
But this isn’t the whole story…
Discovery and naming

The topic raises some tough questions




Many settings, like the big data centers run
by large corporations, have rather standard
structure. Can we automate discovery?
How to debug if applications might
sometimes bind to the wrong service?
Delegation and migration are very tricky
Should a system automatically launch
services on demand?
Example: Why discovery is tricky

Client has opinions

“I want current map data for Disneyland showing
line-lengths for the rides right now”
Service has opinions

Amazon.com would like requests from Ithaca to
go to the NJ-3 datacenter, and if possible, to the
same server instance within each clustered service
DNS has opinions

Many systems play with name -> IP bindings
Internet has opinions (routing)
So, what’s tricky?



Web Services doesn’t standardize these
four steps, it just assumes that people
will hack solutions
Hence some are hard to implement, we
lack standards, and in some cases,
solutions are poor ones
UDDI and WSDL are just a corner of the
overall picture!
Network address translation…

Another issue: Often, the internal address is
not addressable from outside!


A tiny bit of security.
But if RPC server is behind a NAT, trouble!




NAT needs the host behind it to start the connection
process.
Need to configure NAT to let specified traffic through.
Generally, HTTP (and hence WS traffic) is let through.
Tough to set up a connection between two hosts that are
both behind NATs.

There are some tricks to bypass this though.
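The behaviour described above can be mimicked with a toy translation table. This is a deliberately simplified model with made-up addresses and port numbers. A real NAT also tracks the remote endpoint and protocol, and expires idle mappings.

```python
class Nat:
    """Toy NAT: maps (internal host, port) pairs to ports on one public address."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000      # arbitrary starting port for the sketch
        self.by_public_port = {}    # public port -> (internal ip, internal port)
        self.by_internal = {}       # (internal ip, internal port) -> public port

    def outbound(self, internal_ip, internal_port):
        """A host behind the NAT starts a connection; create or reuse a mapping."""
        key = (internal_ip, internal_port)
        if key not in self.by_internal:
            self.by_internal[key] = self.next_port
            self.by_public_port[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.by_internal[key])

    def inbound(self, public_port):
        """Deliver a packet from outside, or drop it (None) if no mapping exists."""
        return self.by_public_port.get(public_port)

    def forward(self, public_port, internal_ip, internal_port):
        """Static rule set up by an administrator to let specified traffic through."""
        self.by_public_port[public_port] = (internal_ip, internal_port)


nat = Nat("203.0.113.7")
print(nat.inbound(80))                    # None: unsolicited traffic is dropped
print(nat.outbound("10.0.0.5", 5555))     # the inside host opens the mapping
print(nat.inbound(40000))                 # replies now reach ('10.0.0.5', 5555)
nat.forward(80, "10.0.0.9", 8080)         # configured rule for a server inside
print(nat.inbound(80))                    # ('10.0.0.9', 8080)
```

With two such NATs facing each other, neither side has a mapping until its own host sends first, which is why connecting two NATed hosts needs extra tricks.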
Firewalls

These allow/disallow traffic, depending on source,
destination, protocol used, etc.


Stateful: remember active flows, and disallow unexpected
packets (NAT)





Often only allow connections from the inside to the
outside!
Again, need to configure to ensure server traffic gets
through (general RPC).
Again, HTTP (WS) does not face as much of a restriction.
Can also gather traffic statistics.
Can do spam/virus checking, etc.
NAT and firewall typically in the same box.
Demilitarized Zone (DMZ)





DMZ: used to host publicly
accessible services like
company webpages, FTP, DNS.
Good place to host the Web
Service!
DMZ situated outside the
private network.
No outgoing connections from
DMZ.
If DMZ attacked, damage
limited to DMZ.
Client talks to eStuff.com


Moving on… let’s oversimplify and just
assume the client manages to find the
data center
We think of remote method invocation
and Web Services as a simple chain:
[Figure: client system -> SOAP RPC -> SOAP router -> Web Services]
So… suppose we get in



Assuming we can connect to the data
center (to its Web Services router), then
what?
If you just use Visual Studio out of the
box, you end up with a single-machine
Web Server
But massive datacenters are common!
A glimpse inside eStuff.com
[Figure: “front-end applications” reach a set of services, each behind
its own load balancer (LB), using pub-sub combined with point-to-point
communication technologies like TCP]
Clusters and load balancing



Idea here is that some form of load
balancer spreads work over a cluster
And cluster replicates data for
availability and load management
How it does this is a topic we need to
discuss in more detail (not today)
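As a placeholder until that discussion, here is about the simplest possible load balancer: round-robin over a fixed member list. The slides do not say which policy is used, so round-robin and the node names are assumptions made purely for illustration. Replication of data across the cluster is not modelled at all.

```python
import itertools
from collections import Counter

class RoundRobinBalancer:
    """Hands out cluster members in rotation, one per incoming request."""

    def __init__(self, members):
        self._cycle = itertools.cycle(list(members))

    def pick(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])  # hypothetical members
assignments = Counter(balancer.pick() for _ in range(9))
print(assignments)  # each node receives 3 of the 9 requests
```

Round-robin ignores how busy each member actually is and which member already holds a client's cached data, which is exactly where the harder questions start.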
What about “legacy” applications?

Some of these Web services are really just
front-ends to older legacy applications

So to talk to an old IBM database, we might





Run the database on some sort of machine, or virtual
machine
Build one of these translator front-ends
And then register it with the Web Services router
This may sound expensive (it is) but it works!
Obviously, our fancy clustering and load balancing won’t apply to a legacy application,
so those fancy tricks are only for “new” code
Discovery in eStuff.com


Data centers are increasingly common
And they raise hard questions!



How can a data center in California control
decisions a client is making in Ithaca?
Services are clustered. How should a client
request be “routed” to the right member?
Once you start talking to a server it may
cache data for you. How can you be sure
to get the right one next time?
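One common way to get the same member next time is to derive the choice from something stable about the client. The sketch below hashes a client identifier onto the member list. It is one possible approach, not the one eStuff.com is claimed to use, and the identifiers are invented.

```python
import hashlib

def member_for(client_id, members):
    """Map a client to a cluster member so repeat requests reach the same cache."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(members)
    return members[index]


members = ["node-1", "node-2", "node-3"]      # hypothetical cluster
first = member_for("client-42", members)
again = member_for("client-42", members)
print(first, again)                           # same member both times
```

The weakness is visible in the modulo: if a member crashes or a new one is launched, most clients are remapped and lose their cached state. Consistent hashing reduces that churn, at the cost of more machinery.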
These are modern challenges



Web Services can be seen as evolving
from prior work
Most often cited: CORBA, which also
was used in many big data centers
But CORBA didn’t assume that clients
came in over the public Internet

More often, CORBA was used between a
hand-built client and the service it talks to
CORBA approach

CORBA had

Ways to export specialized client stubs



The client stub could include server-provided
decision logic, like “which data center to
connect with”
Gives data center a form of remote control
What are called factory services: these manufacture
certain kinds of objects as needed

Effect was that “discovery” can also be a
“service creation” activity
CORBA is object oriented

Seems obvious… and it is. CORBA is centered
around the notion of an object





Objects can be passive (data)
… active (programs)
… persistent (data that gets saved)
… volatile (state only while running)
In CORBA the application that manages the object is
inseparable from the object



And the stub on the client side is part of the application
The request per se is an action by the object on itself and
could even exploit various special protocols
We can’t do this in Web Services
Web Services are document-centric


That is, communication is by sending documents (like
pages) from client to server and back
And most guarantees or properties are associated
with the document itself, not the service


For example, WS-Reliability isn’t about making services
reliable; it defines rules for writing reliability requests down
and attaching them to documents
In contrast, the CORBA fault-tolerance standard tells how to
make a CORBA service into a highly available clustered
service
Will Web Services “help” with naming and discovery?

Web Services tells us how






One client can…
… find one server and
… bind to that server and
… send a request that will make sense
… and make sense of the response
So sure, WS will help
But Web Services won’t…


Allow the data center to control decisions the
client makes
Assist us in implementing naming and
discovery in scalable cluster-style services


How to load balance? How to replicate data?
What precisely happens if a node crashes or one is
launched while the service is up?
Help with dynamics. For example, best server for
a given client can be a function of load but also
affinity, recent tasks, etc
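To see why "best server" is a dynamic question, consider a selection rule that weighs current load against affinity. The scoring formula, the bonus value, and the load numbers below are invented for illustration. Nothing in Web Services defines such a rule, which is the point of this slide.

```python
def best_server(loads, last_server=None, affinity_bonus=0.2):
    """Pick the server with the lowest cost: load, discounted for affinity.

    loads maps server name -> current load in [0, 1] (hypothetical metric).
    last_server is the server this client used most recently, if any.
    """
    def cost(name):
        discount = affinity_bonus if name == last_server else 0.0
        return loads[name] - discount

    return min(sorted(loads), key=cost)


loads = {"a": 0.5, "b": 0.4, "c": 0.9}
print(best_server(loads))                    # 'b': least loaded
print(best_server(loads, last_server="a"))   # 'a': affinity outweighs a small load gap
print(best_server(loads, last_server="c"))   # 'b': 'c' is too busy even with the bonus
```

The answer changes from request to request as loads shift, so a static directory entry cannot capture it.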
How we do it now


Client queries directory to find the service
Server has several options:

Web pages with dynamically created URLs



Server can point to different places, by changing host names
Content hosting companies remap URLs on the fly. E.g.
http://www.akamai.com/www.cs.cornell.edu (reroutes
requests for www.cs.cornell.edu to Akamai)
Server can control mapping from host to IP addr.


Must use short-lived DNS records; overheads are very high!
Can also intercept incoming requests and redirect on the fly
Why this isn’t good enough

The mechanisms aren’t standard and are
hard to implement

Akamai, for example, does content hosting using
all sorts of proprietary tricks
And they are costly

The DNS control mechanisms force DNS cache
misses and hence many requests do RPC to the
data center
We lack a standard, well supported, solution!
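The cost of short-lived DNS records can be seen in a toy simulation of a caching resolver. The host name, address, TTL values, and one-lookup-per-second workload are all made up. The sketch only counts how often a lookup has to go back to the data center's name server.

```python
class CachingResolver:
    """Toy client-side resolver that caches answers until their TTL expires."""

    def __init__(self, authoritative):
        self.authoritative = authoritative  # callable: name -> (ip, ttl_seconds)
        self.cache = {}                     # name -> (ip, expiry_time)
        self.misses = 0

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]                # still fresh: no remote request
        self.misses += 1                    # must ask the authoritative server
        ip, ttl = self.authoritative(name)
        self.cache[name] = (ip, now + ttl)
        return ip


def misses_for(ttl, lookups=60):
    """Count cache misses for one lookup per second over `lookups` seconds."""
    resolver = CachingResolver(lambda name: ("192.0.2.10", ttl))
    for second in range(lookups):
        resolver.resolve("www.estuff.example", now=second)
    return resolver.misses


print(misses_for(ttl=300))  # 1: one miss, then the cache absorbs everything
print(misses_for(ttl=5))    # 12: a miss every 5 seconds
```

Short TTLs give the data center quick control over the name-to-address mapping, but every expiry turns into another request it has to answer.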
Coming up?


How content is managed in even larger
systems, that have multiple data
centers
The main example is Akamai…