Stanford’s Windows Infrastructure

Arkills: Appendix C
Stanford University Directory Architecture
Environment
The Stanford community consists of approximately 1,500 faculty, 8,000 staff, and 14,000 students.
The extended community includes over 250,000 alumni. The university is organized into seven
schools, several of which regularly receive top honors in national reviews. Many notable research
projects are undertaken at over 100 locations, including the Stanford Linear Accelerator Center
(SLAC) and the Stanford Hospital.
This environment demands sophisticated IT resources that can be easily accessed in a
distributed computing model. Some IT support is provided centrally, but each school and research
project has autonomy and may deploy its own computing resources. Central IT supports those
resources that require centralized management; the Stanford directory architecture is an example
of such a resource.
Stanford employs a network-wide user identity system for authentication that is based on
Kerberos. This system, known as the SUNet ID system, is used to access many network services,
including email, the directory, websites, a Windows infrastructure, and others.
The Stanford directory architecture has evolved over time to meet Stanford’s diverse
needs. Some elements of the architecture are vendor-provided, while others are custom-written.
The composite nature of the directory architecture, along with the large and diverse environment,
provides an interesting example of a data architecture that is worth a closer look.
Source systems
Stanford has several systems of record, which hold authoritative data about Stanford’s
business. Each of these source systems is owned by the office responsible for the data, not by
the central IT organization. For example, one source system belongs to the Registrar, is based on
PeopleSoft, and contains authoritative information about students. Another comes from Human
Resources and contains authoritative information about staff and faculty. Other sources include
data from SLAC and the Stanford Hospital. An ID card system for all Stanford-affiliated people is
also a source system; it maps a person’s name to their unique ID card number. Figure 5-14 shows
the relationship between the many source systems and the central repository that integrates them.
This central repository is called the Stanford Registry.
The Stanford Registry
The Stanford Registry is not an LDAP directory, or any kind of directory, but rather a
database. The rationale for making the Registry a database centers on the purpose and
functionality it serves: a database offers several key benefits that meet the requirements. Most
notable among these requirements are the ability to make a large number of modifications and the
ability to roll data back to a previously known state. The Stanford Registry provides custom
metadirectory functionality by amalgamating all the relevant data in a single repository. The
Registry eliminates any duplication of information from the multiple sources and applies business
logic specific to Stanford. For example, imagine a student who also works for the University as a
staff member. At least two of the source systems hold authoritative information about this person.
The Registry takes all the information and applies a rule that decides which information takes
priority. In this example, some of the information is taken from the student source and some from
the staff source. The required level of modification activity, reporting, and rollback and commit
functionality led to the decision to use a database for this metadirectory purpose.
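The priority rule described above can be pictured as a per-attribute precedence table. The following Python is a minimal sketch, not Stanford’s implementation: the source names (`registrar`, `hr`), the attribute names, and the `PRECEDENCE` table are hypothetical, and the real Registry applies its rules inside a database rather than in application code.

```python
from typing import Dict, List

# Hypothetical precedence rules: for each attribute, the source systems in
# order of preference. Stanford's actual business rules are not shown here.
PRECEDENCE: Dict[str, List[str]] = {
    "name": ["hr", "registrar"],
    "home_address": ["registrar", "hr"],
    "department": ["hr", "registrar"],
}


def merge_person(records: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge one person's records from several source systems.

    `records` maps a source name to the attributes that source holds. For
    each attribute, the first source in the precedence list that has a value
    wins; sources without a rule for that attribute are consulted afterwards
    in alphabetical order.
    """
    merged: Dict[str, str] = {}
    attributes = {attr for rec in records.values() for attr in rec}
    for attr in sorted(attributes):
        ranked = PRECEDENCE.get(attr, [])
        order = ranked + sorted(s for s in records if s not in ranked)
        for source in order:
            value = records.get(source, {}).get(attr)
            if value is not None:
                merged[attr] = value
                break
    return merged


student = {"name": "Pat Lee", "home_address": "Escondido Village"}
staff = {"name": "Patricia Lee", "department": "Libraries"}
print(merge_person({"registrar": student, "hr": staff}))
# {'department': 'Libraries', 'home_address': 'Escondido Village', 'name': 'Patricia Lee'}
```

Keeping the rules in a table rather than in code makes it easy to see, per attribute, which office is treated as most authoritative.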
Figure 5-14: Stanford Source Systems
The Registry gets information from the source systems via a periodic process involving
XML-formatted data. Because each source system runs on a different platform, and each has a
schema with slight variations from the others, the level of abstraction that XML provides is very
useful. The Registry can then use the business rules it has defined to judge which source is
ultimately most important or most current, and whether values from multiple sources can coexist.
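Because the feeds differ mainly in schema details, much of the work is mapping each source’s element names onto a common vocabulary before any business rules run. This Python sketch uses the standard library’s `xml.etree.ElementTree`; the element names, the `person` wrapper element, and the `FIELD_MAP` table are assumptions, since the text does not describe Stanford’s actual XML format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical tag names: each source exports the same concepts under
# slightly different element names, which are mapped to common names here.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "registrar": {"studentId": "univid", "fullName": "name"},
    "hr": {"emplId": "univid", "displayName": "name", "dept": "department"},
}


def normalize_feed(source: str, xml_text: str) -> List[Dict[str, str]]:
    """Parse one source's XML export into records that use common field names."""
    mapping = FIELD_MAP[source]
    records: List[Dict[str, str]] = []
    for person in ET.fromstring(xml_text).iter("person"):
        record: Dict[str, str] = {}
        for child in person:
            common = mapping.get(child.tag)  # unmapped elements are ignored
            if common is not None and child.text:
                record[common] = child.text.strip()
        records.append(record)
    return records


hr_feed = """<people>
  <person><emplId>0412</emplId><displayName>Patricia Lee</displayName><dept>Libraries</dept></person>
</people>"""
print(normalize_feed("hr", hr_feed))
# [{'univid': '0412', 'name': 'Patricia Lee', 'department': 'Libraries'}]
```

Once every feed is expressed in the same field names, precedence rules like the ones the Registry applies can be written once instead of per source.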
Note that the Stanford Registry holds a copy of the authoritative data, not a referral that
points back to the source database or directory. Any subsequent use of this data must therefore be
read-only, or the authority of the source systems is put in jeopardy; modifications should be
redirected to the authoritative source. However, for one subset of the data from the various
source systems, the person-related data set, the Registry is a co-owner of the authoritative data.
Changes made to this subset in the Registry propagate back to the source systems, just as all
changes propagate to the Registry from the source systems. In other words, person data is
replicated both ways.
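The split between read-only copies and co-owned person data amounts to a routing decision for every attempted change. This is a minimal Python sketch under stated assumptions: the attribute names, the `OWNER` table, and collecting outbound updates in a list are hypothetical stand-ins for whatever replication mechanism Stanford actually uses.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute sets: the person-related subset that the Registry
# co-owns, and the owning source for a few attributes it only copies.
PERSON_ATTRS = {"name", "home_phone", "home_address"}
OWNER = {"enrollment_status": "registrar", "job_title": "hr"}


class ReadOnlyAttribute(Exception):
    """The change must be made in the authoritative source system instead."""


def apply_change(registry: Dict[str, str],
                 outbound: List[Tuple[str, str, str]],
                 sources: List[str], attr: str, value: str) -> None:
    """Apply a change made in the Registry to one person's record.

    `sources` lists the source systems that hold this person; `outbound`
    collects (source, attribute, value) updates to send back to them.
    """
    if attr not in PERSON_ATTRS:
        owner = OWNER.get(attr, "its source system")
        raise ReadOnlyAttribute(f"{attr} is a read-only copy; change it in {owner}")
    registry[attr] = value
    # Person data is co-owned, so the change replicates back to every source.
    for source in sources:
        outbound.append((source, attr, value))
```

Rejecting writes to copied attributes outright, rather than silently accepting them, keeps the source systems authoritative for everything outside the person subset.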
In addition to information replicated from source systems, the Registry hosts a few other
central information repositories. The Organization Registry holds an authoritative table of all the
officially recognized departments, schools, and organizations associated with Stanford University.
This organization data provides unambiguous name resolution for applications that must
distinguish between possibly ambiguous department names. For example, one application might
call a department the “business school,” another the “Graduate School of Business,” and still
another the “GSB.” In addition to providing clear names, this data set authoritatively establishes
the hierarchical relationships among departments.
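A table of this kind resolves every alias to one canonical record and answers hierarchy questions by walking parent links. The Python below is an illustrative sketch; the IDs, field names, and alias list are hypothetical and only mirror the example in the text.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt of an organization table: a canonical ID per
# organization, its official name, its parent, and aliases seen in the wild.
ORGS: Dict[str, dict] = {
    "STANFORD": {"name": "Stanford University", "parent": None, "aliases": []},
    "GSB": {"name": "Graduate School of Business", "parent": "STANFORD",
            "aliases": ["business school", "GSB"]},
}

# Case-insensitive lookup from any known name or alias to the canonical ID.
ALIASES: Dict[str, str] = {
    alias.casefold(): org_id
    for org_id, org in ORGS.items()
    for alias in [org["name"], *org["aliases"]]
}


def resolve(name: str) -> Optional[str]:
    """Return the canonical organization ID for a name, or None if unknown."""
    return ALIASES.get(name.strip().casefold())


def ancestry(org_id: str) -> List[str]:
    """Return official names from the organization up to the root."""
    chain: List[str] = []
    current: Optional[str] = org_id
    while current is not None:
        chain.append(ORGS[current]["name"])
        current = ORGS[current]["parent"]
    return chain


print(resolve("Business School"), ancestry("GSB"))
# GSB ['Graduate School of Business', 'Stanford University']
```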
The Workgroup Registry provides a central place to define groups of people so that a
group definition can be reused by multiple services. This is similar to how groups are used in
network operating systems like Windows, but the Workgroup Registry is platform-independent: a
group is defined once and used uniformly by many services. Both departments and individual
users can define groups for their own use.
The Authority Registry is still in development, but its intent is to provide a central
definition of who holds authority for specific responsibilities and administrative tasks. It will
tie into the Organization Registry, and network services will be able to use it to define roles
and delegate administration. The Organization, Workgroup, and Authority Registries are especially
important because the university generally employs a decentralized computing administration
model; these repositories help unify the distributed services that have been deployed by centrally
defining groups and roles, which makes administration and interaction easier.
The Registry must provide privacy controls for information. Under the federal law known
as FERPA (the Family Educational Rights and Privacy Act), Stanford is liable for the privacy of
student personal data. This means that the university must honor a student’s request to protect
their personal information. The Stanford Registry therefore has privacy settings for applicable
personal data, and access controls are set on personal data attributes to protect the privacy of
this data. Any subsequent reuse of the data must employ the same or a stricter level of privacy
control.
Privacy controls
The Registry implements privacy control in an interesting way that differs from
traditional access control list (ACL) methods. Every user (student or otherwise) can choose one of
three privacy settings for each piece of information about them: World, Stanford, or Self. A World
setting means that the information can be accessed by anyone. A Stanford setting means that the
information can only be accessed by members of the Stanford community. A Self setting means
that the information is completely private, and only the person can access it. Of course, Stanford
business processes and Stanford administrators must be able to access data regardless of these
settings to provide basic Stanford services, but these privacy settings ensure that general
directory searches respect the rights of the person.
The privacy setting chosen for each piece of information is stored in a special visibility
attribute that is informally associated with the attribute it is intended to protect. For example,
the suVisibEmail attribute holds the privacy setting that corresponds to the mail attribute of each
person entry. Almost every attribute that holds personal information has a corresponding
visibility attribute; even the person’s name can be protected. Some attributes are grouped
together in logical sets. For example, the suVisibAffiliation attribute protects the affiliation, o,
and ou attributes. Another set covers all the personal attributes at once, as a convenience for
people who want to treat all their information the same way.
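Because some visibility attributes cover several attributes, a consumer of the directory first needs to know which setting governs each protected attribute. This Python sketch expands an entry’s visibility attributes into a per-attribute table. suVisibEmail and suVisibAffiliation and the attributes they protect come from the text; treating a missing or unrecognized setting as Self (failing closed) is an assumption, as is representing the entry as a plain dictionary.

```python
from typing import Dict, List

# Visibility attribute -> the attributes it protects. The two entries below
# follow the text; a full table would list many more.
VISIBILITY_MAP: Dict[str, List[str]] = {
    "suVisibEmail": ["mail"],
    "suVisibAffiliation": ["affiliation", "o", "ou"],
}
SETTINGS = {"World", "Stanford", "Self"}


def privacy_table(entry: Dict[str, str]) -> Dict[str, str]:
    """Expand visibility attributes into a protected-attribute -> setting table.

    A missing or unrecognized setting is treated as Self, the most
    restrictive choice (an assumption: fail closed).
    """
    table: Dict[str, str] = {}
    for vis_attr, protected in VISIBILITY_MAP.items():
        setting = entry.get(vis_attr, "Self")
        if setting not in SETTINGS:
            setting = "Self"
        for attr in protected:
            table[attr] = setting
    return table


print(privacy_table({"suVisibEmail": "Stanford", "suVisibAffiliation": "World"}))
# {'mail': 'Stanford', 'affiliation': 'World', 'o': 'World', 'ou': 'World'}
```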
These visibility attributes are then used as an authorization factor to determine whether
a particular person has the authority to access the informally linked attribute(s). Netscape
Directory Server supports Access Control Instruction (ACI) statements that provide this kind of
authorization. These statements can be associated with any container in the directory; in
Stanford’s case they are set at the root of the directory. An ACI statement allows content-based
access control to be implemented. In other words, the ACI statement specifies that the value of a
special attribute on the requestor’s bind entry must match the value of a special attribute on the
targeted entry. For example, imagine that I specify that my email address has a privacy setting of
Stanford (i.e., suVisibEmail=Stanford). A user who wants to access the mail attribute of my entry
must have a suPrivilegeGroup attribute on their entry with a value of Stanford, indicating that
they are authorized to view my email address. Otherwise, they will not get access. This
functionality can be duplicated via traditional ACLs, but ACI statements allow for a much more
dynamic application of access control than traditional ACLs do.
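The content-based rule can be modeled as a comparison between a value on the target entry and the values on the requestor’s entry. The following Python simulates that decision for a single attribute; it is not Netscape ACI syntax, and the entry layout (attribute name to list of values, with a `dn` key), the self-access shortcut, and the `VISIBILITY_FOR` table are assumptions for illustration.

```python
from typing import Dict, List, Optional

Entry = Dict[str, List[str]]  # attribute -> values; "dn" holds the entry's DN

# Protected attribute -> visibility attribute that governs it (illustrative).
VISIBILITY_FOR: Dict[str, str] = {
    "mail": "suVisibEmail",
    "affiliation": "suVisibAffiliation",
    "o": "suVisibAffiliation",
    "ou": "suVisibAffiliation",
}


def may_read(requester: Optional[Entry], target: Entry, attr: str) -> bool:
    """Decide whether the requester (None = anonymous) may read target[attr]."""
    requester_dn = requester.get("dn") if requester is not None else None
    if requester_dn is not None and requester_dn == target.get("dn"):
        return True  # assumption: people can always read their own entry
    vis_attr = VISIBILITY_FOR.get(attr)
    if vis_attr is None:
        return True  # not a protected attribute in this sketch
    setting = (target.get(vis_attr) or ["Self"])[0]
    if setting == "World":
        return True
    if setting == "Self":
        return False
    # Content-based match: a value on the requester's entry must equal
    # the setting stored on the target entry (e.g. Stanford == Stanford).
    groups = requester.get("suPrivilegeGroup", []) if requester is not None else []
    return setting in groups
```

Note that no per-entry rule is needed: the single comparison works for every entry, which is what makes the ACI approach more dynamic than per-attribute ACLs.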
Stanford’s experience with the Netscape Directory Server product has been that the overhead
of managing and processing attribute-level ACLs is greater than that of using ACI statements. For
contrast, we will shortly see how comparable visibility control is implemented in a traditional
ACL model when we turn to the Stanford Windows Infrastructure and Microsoft’s Active
Directory product.
Once all the data has been unified in the Registry, it is published in an LDAP directory,
called the Stanford Directory, for subsequent use by services and applications. The data is moved
from the Registry to the LDAP directory by a custom-designed process, the directory harvester.
Directory Harvester
The directory harvester moves information from the Registry to the master directory server for
the Stanford Directory. It does so in near real time, so an update made in the Registry is promptly
reflected in the directory. This is made possible by a special event database, which notifies the
harvester of each change to the Registry. The directory harvester is not interested in all the
information in the Registry, only a subset: for example, it ignores the organization information
but picks up the people information. Stanford has more than one harvester, but the directory
harvester is the most critical. It is also unique among the harvesters, because it is the only one
that retrieves information from the Registry for publication; all the other harvesters retrieve
information from the Stanford Directory.
Event database
The event database provides a fairly simple way to track each change to an entry. Each change
results in an event being posted to the event database. The harvester keeps track of the last event
ID it has processed and periodically checks the event database for newer events, so it learns of
each new event on its next check. The harvester then queries the entry noted in the event and
creates, deletes, or modifies the corresponding directory entry. Events are triggered by each source
system, but how each system posts them differs from system to system. For example, one source
system parses an audit log of entry modifications every five minutes and creates events based on
this information.
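The polling pattern described above needs only a high-water mark, the last event ID processed. The Python sketch below shows one pass of such a loop; `fetch_events` and `read_registry` are hypothetical callables standing in for queries against the event database and the Registry, the `kind` field is an assumed way of skipping organization events, and a plain dictionary stands in for the master directory server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass
class Event:
    event_id: int  # assumed to increase monotonically
    kind: str      # hypothetical: "person", "organization", ...
    entry_id: str


class DirectoryHarvester:
    """Mirrors changed person entries from the Registry into a directory.

    fetch_events(after_id) returns events with event_id > after_id, in order;
    read_registry(entry_id) returns the current record, or None if deleted.
    """

    def __init__(self,
                 fetch_events: Callable[[int], List[Event]],
                 read_registry: Callable[[str], Optional[Record]],
                 directory: Dict[str, Record]) -> None:
        self.fetch_events = fetch_events
        self.read_registry = read_registry
        self.directory = directory
        self.last_event_id = 0  # a real harvester would persist this

    def poll_once(self) -> int:
        """Process all new events; return how many person events were applied."""
        applied = 0
        for event in self.fetch_events(self.last_event_id):
            if event.kind == "person":  # only the person subset is published
                record = self.read_registry(event.entry_id)
                if record is None:
                    self.directory.pop(event.entry_id, None)      # delete
                else:
                    self.directory[event.entry_id] = dict(record)  # create/modify
                applied += 1
            self.last_event_id = event.event_id
        return applied
```

Because the harvester re-reads the current Registry record instead of trusting the event payload, several events for the same entry collapse into one up-to-date write. A production loop would call `poll_once()` on a timer and persist `last_event_id` so that a restart resumes where it left off.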
The Stanford Directory
The Stanford Directory currently runs on the Netscape Directory Server product. A single-master
replication model is employed, and the single master replicates the entire directory to two sets of
directory servers. The first set primarily provides mailbox resolution for the campus email
services. The second set primarily provides a general white pages service via a custom-designed
web interface. Each set serves as a failover backup for the other, while the split isolates
service-intensive load on specific servers so that users of one service aren’t arbitrarily affected
by other services. Incidentally, in the short term Stanford is actively migrating from Netscape
Directory Server to OpenLDAP. In the longer term, Stanford will closely evaluate each product to
see which best meets its business requirements.
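Keeping each service on its own replica set, with the other set held in reserve, can be expressed as a simple server-selection rule for reads. In this Python sketch the host names and the `is_up` health check are hypothetical; writes, which go to the single master, are not modeled.

```python
from typing import Callable, Dict, List

# Hypothetical replica hosts for the two server sets described in the text.
REPLICA_SETS: Dict[str, List[str]] = {
    "mail": ["ldap-mail1", "ldap-mail2"],
    "whitepages": ["ldap-wp1", "ldap-wp2"],
}
BACKUP_SET = {"mail": "whitepages", "whitepages": "mail"}


def read_servers(service: str, is_up: Callable[[str], bool]) -> List[str]:
    """Pick the replicas a service should query for directory reads.

    The service's own set is preferred so its load stays isolated; the other
    set is used only when every server in the preferred set is down.
    """
    preferred = [host for host in REPLICA_SETS[service] if is_up(host)]
    if preferred:
        return preferred
    return [host for host in REPLICA_SETS[BACKUP_SET[service]] if is_up(host)]
```

Falling back only when the whole preferred set is down preserves the load isolation the text describes while still giving each service a backup.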
Email service integration
Stanford’s primary email service is based on sendmail, alongside other mail offerings. The
sendmail service is integrated with the LDAP directory, which it queries to look up and route users’
SMTP information. Usually this information is stored on each individual sendmail server as a
database map or flat file, but when multiple sendmail servers are involved, keeping these local
mapping files both synchronized and up to date can be difficult. Information about how you might
integrate your sendmail service with an LDAP directory can be found at
http://www.iconimaging.net/~jradford/sendmail/sendmail-ldap.html, where Jason Christopher
Radford has provided some helpful online tips.
Web UI integration
Currently at Stanford, directory searches are provided exclusively through a web interface. In the
future, LDAP protocol-based clients may be allowed access. The web interface, called
Stanford.Who, is quite user-friendly. It provides a web-based form in which the user can search by
name, optionally designating a person’s affiliation (student, staff, faculty) to refine the name
search. Alternatively, the user can search by email address, campus phone number, or Stanford’s
network ID, the SUNet ID. Results include only the personal information that is publicly
accessible. A special web authentication system tied to the SUNet ID enforces the privacy access
controls.
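Behind a form like Stanford.Who, each search has to become an LDAP search filter, and user input must be escaped so that characters such as `*` or `(` cannot change the filter’s meaning. This Python sketch applies the escaping rules of RFC 4515; the attribute names `cn` and `affiliation` are assumptions, since the text does not give the Stanford Directory’s schema.

```python
from typing import Optional


def escape_filter_value(value: str) -> str:
    """Escape a value for an LDAP search filter (RFC 4515): * ( ) backslash NUL."""
    return "".join("\\%02x" % ord(ch) if ch in "*()\\\0" else ch for ch in value)


def name_search_filter(name: str, affiliation: Optional[str] = None) -> str:
    """Build a substring search on the name, optionally limited by affiliation."""
    name_part = "(cn=*%s*)" % escape_filter_value(name.strip())
    if affiliation is None:
        return name_part
    return "(&%s(affiliation=%s))" % (name_part, escape_filter_value(affiliation))


print(name_search_filter("Lee", "staff"))  # (&(cn=*Lee*)(affiliation=staff))
```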
Updating your personal information
In general, users can update their personal information via a web interface called Stanford.You.
This interface serves as a portal through which users interact with the Registry (which co-owns
their authoritative person data) without needing to know anything about the source systems, the
Registry, or the software they run on. Users can view their personal information and modify it as
needed, and they can also choose privacy settings in this interface. This is a good example of the
Loose Directory Interconnection approach noted in Chapter Five.
Active Directory Harvester
The Active Directory (AD) of the Stanford Windows Infrastructure subscribes to the Stanford
Directory via its own event harvester, as shown in Figure 5-15. Stanford chose to harvest a
minimum of person-related information to AD, so only the name, primary department affiliation,
authorization group information (suPrivilegeGroup), and privacy settings were harvested. The
primary departmental affiliation determines where in the root domain of AD the user’s account
resides: the root domain contains a hierarchy of organizational units (OUs) that mirrors the
department hierarchy of the University, and each account is created in the OU that corresponds to
the person’s primary department affiliation. This allows account administration to be easily
delegated to the decentralized departmental Windows administrators across campus. The harvester
is capable of moving accounts between departmental OUs when a person’s primary departmental
affiliation changes.
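Placing an account by primary department amounts to turning the department’s position in the hierarchy into an OU path. In this Python sketch, the department names, the `PARENT` table, and the domain components in `ACCOUNTS_BASE` are hypothetical (only the Accounts OU comes from the text), and DN escaping covers only the RFC 4514 special characters, not leading or trailing spaces. When `move_target` returns a DN, a real harvester would issue an LDAP modify DN (rename) operation with a new superior.

```python
from typing import Dict, Optional

# Hypothetical department hierarchy (department -> parent department).
PARENT: Dict[str, Optional[str]] = {
    "Graduate School of Business": None,
    "GSB Computing": "Graduate School of Business",
    "Libraries": None,
}
ACCOUNTS_BASE = "OU=Accounts,DC=win,DC=example,DC=edu"  # hypothetical domain


def escape_rdn(value: str) -> str:
    """Backslash-escape RFC 4514 special characters (spaces and # not handled)."""
    return "".join("\\" + ch if ch in ',+"\\<>;' else ch for ch in value)


def account_dn(sunetid: str, department: str) -> str:
    """Build the DN for an account placed under its department's OU path."""
    rdns = ["CN=" + escape_rdn(sunetid)]
    current: Optional[str] = department
    while current is not None:
        rdns.append("OU=" + escape_rdn(current))
        current = PARENT[current]
    return ",".join(rdns + [ACCOUNTS_BASE])


def move_target(current_dn: str, sunetid: str, department: str) -> Optional[str]:
    """Return the new DN if the account must move, or None if it is in place."""
    wanted = account_dn(sunetid, department)
    # Simplified comparison: DN attribute names and values compared case-insensitively.
    return None if wanted.lower() == current_dn.lower() else wanted
```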
Figure 5-15: Active Directory Harvester
As shown in Figure 5-15, the password information for a person’s account is also written
to AD. This is done by a process separate from the harvester, and tight security restrictions are
placed on this data. AD uses a Kerberos realm trust which, together with the altSecurityIdentities
attribute, allows the existing MIT-style Kerberos 5 realm to authenticate all Kerberos TGT requests
from Windows clients. The corresponding Windows account functions only as a shadow proxy
account containing the proprietary Microsoft information. Passwords are written to AD so that
downlevel clients that don’t support Kerberos authentication can still participate. Once these
downlevel clients are no longer supported, this password synchronization will be discontinued.
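The altSecurityIdentities mapping ties each shadow account to its principal in the MIT realm. Active Directory’s Kerberos name mapping stores such a value in the form `Kerberos:principal@REALM`; this Python sketch builds and reads that value. The realm `EXAMPLE.EDU` and the helper names are hypothetical, and the realm is passed through unchanged because Kerberos realm names are case-sensitive.

```python
from typing import List, Optional

PREFIX = "Kerberos:"


def alt_security_identity(principal: str, realm: str) -> str:
    """Build an altSecurityIdentities value mapping an account to principal@realm."""
    if "@" in principal:
        raise ValueError("pass the principal without a realm")
    return f"{PREFIX}{principal}@{realm}"


def kerberos_principal(values: List[str]) -> Optional[str]:
    """Return the first Kerberos mapping among an account's values, if any."""
    for value in values:
        if value.startswith(PREFIX):
            return value[len(PREFIX):]
    return None


print(alt_security_identity("plee", "EXAMPLE.EDU"))  # Kerberos:plee@EXAMPLE.EDU
```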
Privacy Control in AD
Active Directory doesn’t provide many authorization factors; for example, the ACI statement
functionality discussed earlier isn’t supported. Active Directory does, however, support inherited
ACLs. When a person’s entry is created by the AD harvester, it is placed somewhere beneath an
Accounts OU. This OU has an inherited ACL that grants access to each entry only to the entry’s
owner. Inherited ACLs are statically applied in AD: the inherited setting is copied to the entry at
the time of creation. This establishes the minimum level of access that all entries share.
A special Windows-based service written with LDAP code establishes the more open access
settings that people may have chosen. Active Directory supports an LDAP change notification
control (Microsoft’s counterpart to the persistent search control), which lets this service learn
whenever an entry is modified. The service then checks the entry for two things and takes action
as needed. First, it maintains membership in groups that match the values of the entry’s
suPrivilegeGroup attribute. So a World group and a Stanford group are dynamically maintained by
this service, each containing all the appropriate entries. In practice, far more groups are
dynamically created and maintained, corresponding to the Workgroup Registry functionality
described earlier, but for the purposes of privacy control, focus on just these two groups. Second,
the service reads the privacy attributes set on the entry and compares the value of each to the ACL
it finds on the entry. If one of the informally linked attributes needs more access granted (or
access taken away), the service has the authority to add an ACE to (or remove one from) that
entry’s ACL, and it uses the groups it dynamically maintains to do so. This approach works quite
well. If the special Windows service fails, no data is put at risk, because the default setting is
more restrictive than the actual privacy desired.
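The reconciliation step of the Windows service can be sketched as computing the ACEs an entry should carry and diffing them against what it has. In this Python model an ACE is reduced to a (group, attribute) pair granting read access; the attribute names, the `GRANT` table, and the in-memory group sets are assumptions, and real code would translate the result into security descriptor changes made over LDAP. A Self setting adds nothing, because the inherited owner-only ACL already provides that level of access.

```python
from typing import Dict, List, Set, Tuple

Ace = Tuple[str, str]  # (group granted read access, protected attribute)

# Visibility attribute -> attributes it governs (illustrative subset).
VISIBILITY_MAP: Dict[str, List[str]] = {
    "suVisibEmail": ["mail"],
    "suVisibAffiliation": ["affiliation", "o", "ou"],
}
# Privacy setting -> dynamically maintained group that needs read access.
GRANT = {"World": "World", "Stanford": "Stanford"}


def desired_aces(entry: Dict[str, str]) -> Set[Ace]:
    """ACEs needed on top of the default owner-only ACL."""
    aces: Set[Ace] = set()
    for vis_attr, attrs in VISIBILITY_MAP.items():
        group = GRANT.get(entry.get(vis_attr, "Self"))
        if group is not None:
            aces.update((group, attr) for attr in attrs)
    return aces


def reconcile(entry: Dict[str, str], current: Set[Ace]) -> Tuple[Set[Ace], Set[Ace]]:
    """Return (ACEs to add, ACEs to remove); `current` holds only service-managed ACEs."""
    wanted = desired_aces(entry)
    return wanted - current, current - wanted


def sync_memberships(dn: str, privilege_groups: List[str],
                     groups: Dict[str, Set[str]]) -> None:
    """Make dn a member of exactly the groups named by its suPrivilegeGroup values."""
    for name in privilege_groups:
        groups.setdefault(name, set())  # groups are created on demand
    for name, members in groups.items():
        if name in privilege_groups:
            members.add(dn)
        else:
            members.discard(dn)
```

If this service stops, `reconcile` simply never runs and entries keep only the inherited owner-only ACL, which matches the fail-safe behavior described above.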
Summary
As has been demonstrated, a great number of applications and services participate in the
overall directory architecture. I’ve purposely simplified the number of interactions that actually
happen, so that the general architectural concepts can be shown in a specific real-world
environment. A great deal of detail in the schema definitions, data architecture, and directory
functionality of the Stanford architecture simply cannot be described fully here. Hopefully this
small snapshot will be useful in illustrating how integration can be accomplished in a real-world
setting. I appreciate the opportunity Stanford has given me to describe its environment.