Arkills: Appendix C

C Stanford University Directory Architecture

Environment

The Stanford community consists of approximately 1500 faculty, 8000 staff, and 14000 students. The extended community includes over 250000 alumni. The university is organized in seven schools, several of which regularly receive top honors in national reviews. Many notable research projects are undertaken in over 100 locations, including the Stanford Linear Accelerator Center (SLAC) and the Stanford Hospital.

This environment demands sophisticated IT resources that can be easily accessed in a distributed computing model. Some IT support is provided centrally, but each school and research project has autonomy and may deploy its own computing resources. Central IT helps support resources that must have centralized management. The Stanford directory architecture is an example of such a resource.

Stanford employs a network-wide user identity system for authentication that is based on Kerberos. This system is known as the SUNet ID system, and can be used to access many network services including email, the directory, websites, a Windows infrastructure, and other services.

The Stanford directory architecture has evolved over time to meet Stanford’s diverse needs. Some of the elements of the architecture are vendor provided, while others are custom written. The composite nature of the directory architecture, along with the large, diverse environment, provides an interesting example of a data architecture that is worth a closer look.

Source systems

Stanford has several systems of record. These systems hold authoritative data about Stanford’s business. Each of these source systems is owned by the office responsible for the data, not by the central IT organization. For example, one source system comes from the Registrar, is based on PeopleSoft, and contains authoritative information about students.
Another source system comes from Human Resources and contains authoritative information about staff and faculty. Other sources include data from SLAC and the Stanford Hospital. An ID card system for all Stanford-affiliated people is also a source system. This system maps a person’s name to their unique ID card number.

Figure 5-14 shows the relationship between the many source systems and the central repository that integrates them. This central repository is called the Stanford Registry.

The Stanford Registry

The Stanford Registry is neither an LDAP directory nor any other kind of directory, but rather a database. The rationale for making the Registry a database centers on the purpose and functionality it serves. A database offers several key benefits that meet Stanford’s requirements. Most notable among these requirements is the ability to make a large number of modifications and to roll back data to a previously known state.

The Stanford Registry provides custom metadirectory functionality by amalgamating all the relevant data in a single repository. The Registry eliminates duplication of information from the multiple sources, and uses business logic specific to Stanford. For example, imagine a student who also works for the University as a staff member. At least two of the source systems hold authoritative information about this person. The Registry takes all the information and applies a rule that decides which information has higher priority. In this example, some of the information is taken from the student source and some from the staff source. The level of modification activity, reporting, rollback, and commit functionality required led to the decision to use a database for this metadirectory purpose.

Figure 5-14 Stanford Source Systems

The Registry gets information from the source systems via a periodic process involving XML formatted data.
Because each of the source systems runs on a different platform, and each has a schema with slight variations from the others, the level of abstraction that XML provides is very useful. The Registry can then use the business rules it has defined to judge which source is most important or most current, and whether values from multiple sources can coexist.

Note that the Stanford Registry is a copy of the authoritative data, and not a referral that points back to the source database or directory. This means that any subsequent use of this data must be read-only, or the authority of the source systems is put in jeopardy. Modification of data should therefore be redirected to the authoritative source. However, for a subset of the data from the various source systems, specifically the person-related data set, the Registry is a co-owner of the authoritative data. This means that changes made to this subset of the data in the Registry propagate back to the source systems, just as all changes propagate to the Registry from the source systems. In other words, person data is replicated both ways.

In addition to information replicated from source systems, the Registry hosts a few other central information repositories.

The Organization Registry holds an authoritative table of all the officially recognized departments, schools, and organizations associated with Stanford University. This organization data provides unambiguous name resolution for applications that must differentiate between possibly ambiguous department names. For example, one application might call a department the business school, another the Graduate School of Business, and still another the GSB. In addition to providing clear names, this data set also authoritatively establishes the hierarchical relationship between departments.
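The name resolution and hierarchy lookup that the Organization Registry provides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layout, the organization IDs, and the function names `resolve` and `ancestors` are hypothetical and are not taken from Stanford's actual schema.

```python
# Hypothetical organization table: canonical ID -> official name, parent, aliases.
# The IDs and field names are assumptions made for this sketch.
ORGS = {
    "STANFORD": {"name": "Stanford University", "parent": None,
                 "aliases": {"stanford"}},
    "GSB": {"name": "Graduate School of Business", "parent": "STANFORD",
            "aliases": {"business school", "gsb"}},
}


def resolve(label):
    """Map any known spelling of an organization name to its canonical ID."""
    key = label.strip().lower()
    for org_id, org in ORGS.items():
        if key == org["name"].lower() or key in org["aliases"]:
            return org_id
    return None  # unknown label: the caller decides how to handle it


def ancestors(org_id):
    """Return the chain of parent organization IDs, nearest parent first."""
    chain = []
    parent = ORGS[org_id]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = ORGS[parent]["parent"]
    return chain
```

With this table, the three spellings from the example above all resolve to the same canonical ID, so applications that disagree on naming can still agree on which department they mean.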
The Workgroup Registry provides a central place to define groups of people, so that a group definition can be re-used for multiple services. This is similar to how groups are used in network operating systems like Windows, but it is platform independent, so a group definition can be made once and used uniformly by many services. Both departments and individual users can define groups for their own use.

The Authority Registry is still in development, but its intent is to provide a central definition of who holds authority for specific responsibilities and administrative tasks. It will tie into the Organization Registry, and can be used by network services to define roles and delegate administration.

The Organization, Workgroup, and Authority Registries are important because the university generally employs a non-centralized computing administration model. These repositories help to unify the distributed services that have been deployed by centrally defining groups and roles, which makes administration and interaction easier.

The Registry must provide privacy controls for information. As mandated by the federal law known as FERPA, Stanford is liable for the privacy of student personal data. This means that the university must honor a student's request to protect their personal information. The Stanford Registry therefore has privacy settings for applicable personal data. Access controls are set on personal data attributes to protect the privacy of this data. All subsequent re-use of the data must also employ the same or a stricter level of privacy control.

Privacy controls

The Registry provides privacy control in an interesting fashion that differs from traditional access control list (ACL) methods. All users (student or otherwise) can choose one of three privacy settings for each piece of information about themselves. These settings are: World, Stanford, or Self.
A World setting means that the information can be accessed by anyone. A Stanford setting means that the information can only be accessed by people who are members of the Stanford community. A Self setting means that the information is completely private, and only the person can access it. Of course, Stanford business processes and Stanford administrators must access data regardless of these settings to provide basic Stanford services. But these privacy settings ensure that general directory searches respect the rights of the person.

Each privacy setting is placed in a special visibility attribute that is informally associated with the attribute it is intended to protect. For example, the suVisibEmail attribute holds the privacy setting that corresponds to the mail attribute for each person entry. Almost every attribute that holds personal information has a corresponding visibility attribute. Even the person’s name can be protected. Some attributes are grouped together in logical sets. For example, the suVisibAffiliation attribute protects the affiliation, o, and ou attributes. Another set covers all the personal attributes, which simplifies matters for those who want to treat all their information in the same manner.

These visibility attributes are then used as an authorization factor to determine whether any particular person has authority to access the informally linked attribute(s). Netscape Directory Server supports Access Control Information (ACI) statements that provide this authorization factor functionality. These statements can be associated with any container in the directory, but in Stanford’s case they are set at the root of the directory. The ACI statement allows content-based access control to be implemented. In other words, the ACI statement specifies that the value of a special attribute of the requestor’s binding entry must match a special attribute value of the targeted entry.
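The matching rule can be illustrated with a small Python sketch. It models only the decision logic, not ACI syntax or any directory server API. The attribute names suVisibEmail, suVisibAffiliation, and suPrivilegeGroup come from the Stanford schema described in this appendix. Everything else is an assumption made for illustration: the `can_read` function, the dictionary-based entries, treating a missing visibility attribute as Self, and allowing reads of attributes that have no visibility attribute.

```python
# Protected attribute -> visibility attribute that governs it.
# The two visibility attribute names are from the text; the mapping
# structure itself is an assumption of this sketch.
VISIBILITY_ATTR = {
    "mail": "suVisibEmail",
    "affiliation": "suVisibAffiliation",
    "o": "suVisibAffiliation",
    "ou": "suVisibAffiliation",
}


def can_read(requestor, target, attribute):
    """Decide whether `requestor` may read `attribute` on `target`.

    `requestor` is the bound user's entry as a dict, or None for an
    anonymous bind. `target` is the entry being read, also a dict.
    """
    vis_attr = VISIBILITY_ATTR.get(attribute)
    if vis_attr is None:
        return True  # assumption: attribute is not governed by a visibility setting
    if requestor is not None and requestor.get("uid") == target.get("uid"):
        return True  # people can always read their own data
    setting = target.get(vis_attr, "Self")  # assumption: default to most private
    if setting == "World":
        return True
    if setting == "Self" or requestor is None:
        return False
    # Content-based match: the target's setting must appear among the
    # requestor's privilege group values.
    return setting in requestor.get("suPrivilegeGroup", [])
```

The design point is that the access rule is written once and evaluated against entry content, so changing one person's privacy choice means changing one attribute value rather than editing an ACL.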
For example, imagine that I specify that my email address has a privacy setting of Stanford (i.e. suVisibEmail=Stanford). A user who wants to access the mail attribute of my entry must have a suPrivilegeGroup attribute on their own entry with a value of Stanford, to indicate that they are authorized to view my email address. Otherwise, they will not get access.

This functionality can be duplicated via traditional ACLs, but ACI statements allow for a much more dynamic application of access control than traditional ACLs do. Stanford’s experience with the Netscape Directory Server product has been that the overhead involved in managing and processing attribute-level ACLs is greater than that of using ACI statements. For contrast, we will see shortly how comparable visibility is implemented in a traditional ACL model when we turn to the Stanford Windows Infrastructure and Microsoft’s Active Directory product.

Once all the data has been unified into the Registry, it is published in an LDAP directory, called the Stanford Directory, for subsequent use by services and applications. The data is moved from the Registry to the LDAP directory by a custom-designed process.

Directory Harvester

The directory harvester moves information from the Registry to the master directory server for the Stanford Directory. It does so in close to real time, so that an update made in the Registry is promptly reflected in the directory. This functionality is enabled by a special event database, which notifies the harvester of each change to the Registry. The directory harvester is interested in only a subset of the information in the Registry. For example, it is not interested in the organization information, but is interested in the people information. Stanford has more than one harvester, but the directory harvester is the most critical.
The directory harvester is unique among the harvesters, because it is the only one that retrieves information from the Registry for publication. All the other harvesters retrieve information from the Stanford Directory.

Event database

The event database provides a way to track each change to an entry in a fairly simple manner. Each change results in an event posted to the Events database. The harvester keeps track of the last event ID it knows about, and periodically checks the Events database for new events. When a new event is posted, the harvester therefore finds it on its next check. The harvester queries the entry noted in the event, and creates, deletes, or modifies the corresponding directory entry. Events are triggered by each source system, but how each system posts its events differs. For example, one source system parses an audit log of entry modifications every five minutes and creates events based on this information.

The Stanford Directory

The Stanford Directory is currently run on the Netscape Directory Server product. A single-master replication model is employed, and this single master replicates the entire directory to two sets of directory servers. The first set of directory servers primarily provides mailbox resolution for the campus email services. The second set primarily provides a general white page service via a custom-designed web interface. Each set provides a failover backup for the other, while isolating service-intensive load to specific servers so that users of one service aren’t arbitrarily affected by other services.

Incidentally, in the short term, Stanford is actively migrating off of Netscape Directory Server onto OpenLDAP. In the longer term, Stanford will closely evaluate each of the products to see which best meets its business requirements.
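The polling behavior described under Event database can be sketched as follows. This is a minimal in-memory model in Python, not Stanford's implementation: the `Harvester` class, the `(event_id, entry_key)` event shape, and the use of plain dictionaries for the Registry and the directory are all assumptions made for illustration. A real harvester would query a database and write to an LDAP server.

```python
class Harvester:
    """Applies Registry changes to a directory by following an event log."""

    def __init__(self, events, registry, directory):
        self.events = events        # list of (event_id, entry_key), ascending IDs
        self.registry = registry    # entry_key -> attributes; key absent if deleted
        self.directory = directory  # entry_key -> published attributes
        self.last_event_id = 0      # highest event ID already processed

    def poll(self):
        """Process events newer than the last one seen; return how many."""
        applied = 0
        for event_id, key in self.events:
            if event_id <= self.last_event_id:
                continue  # already handled on an earlier poll
            entry = self.registry.get(key)
            if entry is None:
                # The entry no longer exists in the Registry: delete it.
                self.directory.pop(key, None)
            else:
                # Create or modify: publish the entry's current state.
                self.directory[key] = dict(entry)
            self.last_event_id = event_id
            applied += 1
        return applied
```

Note that the event carries only the identity of the changed entry. The harvester re-reads the entry's current state, so replaying an event or processing two events for the same entry in one poll still converges on the correct directory content.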
Email service integration

Stanford primarily runs a sendmail-based email service in addition to other mail offerings. The sendmail service is configured to look up user SMTP routing information in the LDAP directory. Usually this information is stored on each individual sendmail server in the form of a database map or flat file, but when multiple sendmail servers are involved, keeping these local mapping files both synchronized and up to date can be difficult. Information about how you might integrate your sendmail service with an LDAP directory can be found at http://www.iconimaging.net/~jradford/sendmail/sendmail-ldap.html. Jason Christopher Radford has provided these helpful online tips.

Web UI integration

Currently at Stanford, directory searches are provided exclusively through a web interface. In the future, LDAP protocol based clients may be allowed access. The web interface, called Stanford.Who, is quite friendly. A web-based form is provided, and the user can search by name. You can also designate a person’s affiliation (student, staff, faculty) to help refine the name search. Alternatively, you can search by email address, campus phone number, or Stanford’s network ID, called the SUNet ID. Results include only the personal information that the searcher is permitted to see. A special web authentication system tied to the SUNet ID enforces the privacy access controls.

Updating your personal information

In general, users can update their personal information via a web interface called Stanford.You. This interface provides a portal for users to interact with the Registry (which co-owns their authoritative person data) without needing to know any specifics about the source systems, the Registry, or the software they run on. The user can view their personal information and modify it as needed. Additionally, the user can choose privacy settings in this interface.
This is a good example of the Loose Directory Interconnection approach noted in Chapter Five.

Active Directory Harvester

The Active Directory (AD) of the Stanford Windows Infrastructure is a subscriber to the Stanford Directory via its own event harvester, as shown in Figure 5-15. Stanford chose to harvest a minimum of person-related information to AD, so only the name, the primary department affiliation, authorization group information (suPrivilegeGroup), and privacy settings are harvested.

The primary departmental affiliation is used to determine where in the root domain of AD the user’s account should reside. A hierarchy of organizational units (OUs) that mirrors the University’s department hierarchy exists in the root domain, and accounts are created within it. A person’s primary department affiliation determines the location of their account in this OU hierarchy. This allows account administration to be easily delegated to the decentralized departmental Windows administrators across campus. The harvester is capable of moving accounts between departmental OUs when the primary departmental affiliation changes.

Figure 5-15 Active Directory Harvester

As shown in Figure 5-15, the password information for a person’s account is also written to AD. This is done by a process separate from the harvester, and tight security restrictions are placed on this data. AD employs a Kerberos realm trust, which, along with the altSecurityIdentities attribute, allows the existing MIT-style Kerberos 5 realm to authenticate all Kerberos TGT requests from Windows clients. The corresponding Windows account just functions as a shadow proxy account containing the proprietary Microsoft information. The passwords are written to AD to ensure that downlevel clients that don’t support Kerberos authentication can participate. At a later time, when these downlevel clients are no longer supported, this password synchronization will be discontinued.
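The OU placement rule described earlier in this section can be sketched in Python as follows. All specifics here are hypothetical: the department names, the `DEPT_PARENT` table, the `OU=Accounts,DC=example,DC=edu` base, and the function names are assumptions chosen for illustration. The sketch also skips the escaping of special characters that real distinguished names require.

```python
# Hypothetical department hierarchy: department -> parent department (None at top).
DEPT_PARENT = {
    "Chemistry": "Humanities and Sciences",
    "Humanities and Sciences": None,
}

# Hypothetical container under which all departmental OUs live.
ACCOUNTS_BASE = "OU=Accounts,DC=example,DC=edu"


def account_dn(uid, department, base=ACCOUNTS_BASE):
    """Build the DN where the account for `uid` should live.

    DNs list the most specific component first, so we walk from the
    person's own department up to the top of the hierarchy.
    """
    parts = ["CN=" + uid]
    while department is not None:
        parts.append("OU=" + department)
        department = DEPT_PARENT[department]
    parts.append(base)
    return ",".join(parts)


def needs_move(current_dn, uid, department):
    """True when a change of primary department requires relocating the account."""
    return current_dn.lower() != account_dn(uid, department).lower()
```

Because each account sits under its department's OU, delegating administration to a departmental administrator reduces to granting rights on that one OU.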
Privacy Control in AD

Active Directory doesn’t provide many authorization factors. For example, the ACI statement functionality discussed earlier isn’t supported. Active Directory does, however, support inherited ACLs. When a person’s entry is created by the AD harvester, it is placed somewhere beneath an Accounts OU. This OU has an inherited ACL that allows only the owner of an entry to access that entry. Inherited ACLs are statically applied in AD, so at the time of creation the setting is copied to the entry. This establishes the minimum level of access that all entries share.

A special Windows-based service using LDAP code helps establish the more open access settings that people may have chosen. Active Directory supports an LDAP change-notification control, similar in purpose to the persistent search control, which enables this service to know whenever an entry has been modified. The service then checks the entry for two things, and takes action as needed.

First, it creates membership in groups that match the values of the suPrivilegeGroup attribute of the entry. A World group and a Stanford group are thus dynamically maintained by this service, with all the appropriate entries as members. In actuality, far more groups are dynamically created and maintained, and these groups correspond to the Workgroup Registry functionality described earlier. But for the purposes of privacy control, focus on just these two groups.

Second, the service reads the privacy attributes set on the entry. The service compares the value of each of these attributes to the ACL it finds on the entry. If one of the informally linked attributes needs to have more access given (or access taken away), the service has the authority to add an ACE to that entry’s ACL. In doing so, it uses the groups it is dynamically maintaining.

This approach works quite well. If the special Windows service fails, no data is put at risk, because the default setting is more restrictive than the actual privacy desired.
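The second task of the service, comparing privacy attributes to the ACL, can be sketched in Python as a set difference. This is an abstract model and not Windows security API code. An ACL is reduced to a set of `(group, attribute)` read grants, and the function names and the mapping table are assumptions made for illustration. The attribute names suVisibEmail and suVisibAffiliation and the World and Stanford groups come from the text.

```python
# Visibility attribute -> attributes it protects (pairs named in the text).
PROTECTED_BY = {
    "suVisibEmail": ["mail"],
    "suVisibAffiliation": ["affiliation", "o", "ou"],
}


def desired_grants(entry):
    """Compute the read grants implied by an entry's privacy settings.

    A missing setting is treated as Self (an assumption), which yields no
    grants and leaves only the inherited owner-only access in place.
    """
    grants = set()
    for vis_attr, protected in PROTECTED_BY.items():
        setting = entry.get(vis_attr, "Self")
        if setting in ("World", "Stanford"):
            grants.update((setting, attr) for attr in protected)
    return grants


def reconcile(entry, current_grants):
    """Return (grants_to_add, grants_to_remove) for this entry's ACL."""
    wanted = desired_grants(entry)
    return wanted - current_grants, current_grants - wanted
```

Since the starting point is the restrictive inherited ACL, an entry the service has not yet processed simply has an empty grant set, which matches the fail-safe behavior described above.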
Summary

As has been demonstrated, a great number of applications and services participate in the overall directory architecture. I’ve purposely simplified the number of interactions that actually happen, so that the general architectural concepts can be shown in a specific real-world environment. A great deal of detail exists in the schema definitions, data architecture, and directory functionality of the Stanford architecture that simply cannot be described fully here. Hopefully this small snapshot will be useful in illustrating how integration can be accomplished in a real-world setting. I appreciate the opportunity Stanford has allowed me to take in describing their environment.