University of Leeds, VLE Service
2006 User Management Development
Phase 1 Software Design Specification

Author: Jon Maber
Client: The University of Leeds, VLE Service
Date: July 2006

Purpose of this Document

The VLE service needs a new software module for use with the Bodington software used to create the VLE, Bodington Common. This module will integrate user management more closely with the University’s other user management systems, namely Active Directory, LURCIS, SAP and Banner. Additional aims are to move towards a single password store for each user across the VLE and other web-based services, and to make possible a single sign-on mechanism for users who authenticate to the University Portal and then navigate to the VLE.

The work is to be carried out in two phases.

Phase one is the implementation of a Bodington authentication plugin that will authenticate the user against the University Active Directory. This will give users two major benefits. Firstly, they will no longer need to maintain a separate password for the VLE, because Active Directory is used for authentication to all major University systems, in particular the PC network. Secondly, since the Portal uses Active Directory to authenticate users, when the Portal passes on a user’s user name and password to the VLE as part of a single sign-on scheme, the background authentication will succeed for the majority of users.

Phase two, which will be treated in detail in another document, will resolve the problem of multiple VLE accounts for individual users in three scenarios: an off-line batch resolution of legacy multiple accounts; resolution of multiple accounts embedded into updated data import processes; and one-off resolution of multiple accounts for a single user, initiated by an administrator. Phase two will not be discussed further in this document.

Defining the Problem

This phase considers the interaction of two University systems, the Bodington VLE and Active Directory.
Both contain user data and both have user authentication functionality. Both already hold records for University staff on the payroll and for registered University students. However, there may be inconsistencies between the records of the two systems: they have been designed and developed along separate paths, although the data ultimately originates from the same sources.

Active Directory Review

Active Directory is accessed via the LDAP internet protocol and structures data in a tree. Branches of the tree are built from organizational unit nodes, which represent the division between staff and students, faculties and departments. Within each department node are nodes of type “user”. Each user record holds a list of attributes. The attributes of importance to this design specification are listed here, with comments on how each attribute is used at Leeds and on the quality of the data.

Attribute: cn
Usage: Common name; at Leeds it holds the user’s log-in user name. It is guaranteed that no two user records have the same common name.
Data quality:
● At present no two user records have the same cn attribute.
● A person may have their cn changed, for example when moving from undergraduate to postgraduate.
● There are a number of cn values which are suspiciously short or long.
● There are a number of cn values which have non-alphanumeric characters.

Attribute: description
Usage: Describes the class of user. May be used to identify disabled accounts.
Data quality:
● Not yet clear what the system is.
● Not clear whether genuinely useful.

Attribute: employeeID
Usage: Holds either an 8 character employee ID (SAP company ID or current payroll no.?) or a 9 character student ID.
Data quality:
● A person may have both a payroll number and a student ID but AD can only store one.
● Many staff accounts lack a meaningful ID.
● Some students have payroll IDs.
● Some staff have student IDs.

Attribute: sn
Usage: Surname.
Data quality:
● Not tested.

Attribute: displayName
Usage: The name in a suitable form for display, probably first name and surname or initials and surname.
Data quality:
● Not tested.

Attribute: mail
Usage: The user’s email address.
Data quality:
● Which is definitive?

Attribute: extensionAttribute2
Usage: Also contains email address.
Data quality:
● Which is definitive?

Attribute: initials
Usage: Initials, not delimited and not including the initial of the surname.
Data quality:
● Not tested.

Attribute: dn
Usage: The distinguished name is formed from the cn and the names of the organizational units which contain the user. Accessing the user record via the cn attribute involves a search operation, whereas fetching the record via the dn does not. It is therefore useful to store the dn to speed up access to a user record. However, if the user record is moved to another part of the directory the dn will change and a search will be required.
Data quality:
● May change from time to time.

Bodington Review

db field: users.user_id
Usage: A Bodington generated number that identifies a user account. Not generally known to users.
Comments:
● Guaranteed unique.
● For a particular record never changes.
● A given person may have multiple records, one for each payroll no. and student number they have been assigned.

db field: pass_phrases.user_name
Usage: The log-in user name for a user.
Comments:
● Guaranteed unique.
● One only per user record.
● May change.
● User names not derived from University data can be distinguished.

db field: alias.name
Usage: The name of a class of aliases. New types of alias can easily be added, e.g. library card number.
Comments:
● Includes Banner student ID.
● Includes SAP payroll no.

db field: alias_entries.value
Usage: The value of an alias for a given user and a given alias name. A user can have multiple alias entries, one for each available alias type. Alias entries can be used to find a user account.
Comments:
● For a given alias type, each entry is guaranteed unique.
● For a given alias type, guaranteed no more than one entry per user account.
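
The two guarantees listed for alias entries, together with the suggestion that the dn is worth storing, can be illustrated with a minimal Java sketch of an in-memory alias store. This is not Bodington's persistent object API: the class AliasStore, its method names, the alias type "ad_dn" and the dn values are all hypothetical, chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory model of alias entries; NOT Bodington's actual API. */
final class AliasStore {
    // alias type -> (alias value -> user id): a value is unique within its type
    private final Map<String, Map<String, Integer>> byValue = new HashMap<>();
    // alias type -> (user id -> alias value): at most one entry per user per type
    private final Map<String, Map<Integer, String>> byUser = new HashMap<>();

    /** Sets or replaces a user's entry for one alias type; rejects a value held by another user. */
    void put(String aliasType, int userId, String value) {
        Map<String, Integer> values = byValue.computeIfAbsent(aliasType, k -> new HashMap<>());
        Map<Integer, String> users = byUser.computeIfAbsent(aliasType, k -> new HashMap<>());
        Integer owner = values.get(value);
        if (owner != null && owner != userId) {
            throw new IllegalStateException("alias value already assigned to user " + owner);
        }
        String previous = users.put(userId, value);
        if (previous != null && !previous.equals(value)) {
            values.remove(previous); // replacing keeps a single entry per user and type
        }
        values.put(value, userId);
    }

    /** Finds the user account that owns an alias value, if any. */
    Optional<Integer> findUser(String aliasType, String value) {
        Map<String, Integer> values = byValue.get(aliasType);
        return values == null ? Optional.empty() : Optional.ofNullable(values.get(value));
    }

    /** Finds the alias value stored for a user under one alias type, if any. */
    Optional<String> findValue(String aliasType, int userId) {
        Map<Integer, String> users = byUser.get(aliasType);
        return users == null ? Optional.empty() : Optional.ofNullable(users.get(userId));
    }

    public static void main(String[] args) {
        AliasStore store = new AliasStore();
        // "ad_dn" is an assumed alias type used to cache the Active Directory dn.
        store.put("ad_dn", 42, "CN=abc1xyz,OU=Students");
        // The record moves in the directory, so the cached dn is replaced.
        store.put("ad_dn", 42, "CN=abc1xyz,OU=Staff");
        System.out.println(store.findValue("ad_dn", 42));
        System.out.println(store.findUser("ad_dn", "CN=abc1xyz,OU=Students"));
    }
}
```

Keeping two maps per alias type is one way to make both guarantees cheap to check; a database implementation would more likely rely on unique constraints.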
Data Consistency Analysis

It has been possible to analyse data from the two sources in order to highlight potential inconsistencies and to estimate the number of users who might experience problems with the new authentication scheme.

1) Null or Unknown employeeID attributes

    select * from ldap_users
    where "employeeID" is null
       or "employeeID" = 'null'
       or "employeeID" = 'Unknown'

• 1666 in staff organisational unit
• 7 in students organisational unit
• There is potential for finding a student in the Bodington database by matching both user name and student ID.

2) Duplicate employeeID values

    select * from ldap_users
    where "employeeID" in
        (select "employeeID" from ldap_users
         group by "employeeID" having count(*) > 1)
    order by "employeeID"

• There are no duplicate student IDs in this field.
• There are about twenty duplicate payroll numbers, i.e. two or more user names refer to the same staff ID number.

Confirms that it is feasible to check student IDs but not staff payroll numbers.

3) User names in AD not found in Bodington

    select * from ldap_users l
    left join test_users t on l.cn = t.user_name
    where t.user_name is null and ou='Students'

• 4515 unrecognised user names
• 3355 unrecognised from staff ou (of which only 1740 have payroll numbers defined).
• 1160 unrecognised from students ou (of which 7 are dummy accounts for testing; all others have student IDs defined).

Would these numbers be much reduced if regular staff and student data imports were performed?

4) Students with student ID numbers that don’t match

    select cn as ldap_name, user_name as bod_name,
           "employeeID" as ldap_sid, sid as bod_sid,
           l.surname as ldap_surname, t.surname as bod_surname
    from ldap_users l
    left join test_users t on l.cn = t.user_name
    where t.user_name is not null and ou='Students'
      and "employeeID" <> sid

There are likely to be two main explanations for the anomaly.
1) Where the surnames differ but the students have the same initials, the user name has been erroneously assigned to two different students and by some means A.D. and Bodington have each received the records in reverse order (and each ignored the second presentation of the same user name).
2) Where the names suggest the same person in both databases, the student ID in Bodington is incorrect but the student ID in A.D. is probably correct. There are, for example, more fictitious-looking student IDs in the Bodington column of the query results.

Conclusion. There may be good reason to authenticate only when both the user names and the student IDs match. This will succeed for the majority of students and may prevent accidental access to the wrong account when occasionally the user name is given to the wrong student or to two students.

5) Staff with payroll numbers that don’t match

    select cn as ldap_name, user_name as bod_name,
           "employeeID" as ldap_eid, eid as bod_eid,
           l.surname as ldap_surname, t.surname as bod_surname,
           (l.surname = t.surname) as same
    from ldap_users l
    left join test_users t on l.cn = t.user_name
    where t.user_name is not null and ou='Staff'
      and ('0' || "employeeID") <> eid

• Total 520.
• Some (about 5-10%) also have mismatched names.

Similar conclusions to students.
1. Some user names have been misassigned to people with the same department and initials.
2. A number of staff users are defined in Bodington with out of date payroll numbers.

Conclusion. There may be too many staff with out of date employee ID numbers to make it useful to match accounts using both user name and payroll number. It may be better to use only the user name to make the match but to log the difference as users authenticate, so that it is possible to count the logins that would have failed had this extra check been implemented.

6) Staff in A.D. which Bodington thinks are students

    select * from ldap_users l
    left join test_users t on l.cn = t.user_name
    where user_id is not null and eid is null and ou='Staff'

• 546 user names correspond to staff in A.D. and to students in Bodington. This is the result of failing to recognise the correspondence of new staff with ex-students in current Bodington data input procedures.

Conclusion. Moderately high level of confidence that the user name does belong to the same individual. Therefore match staff by user name alone, but log mismatch information for statistical analysis and possibly to allow the help desk to contact the user to confirm identity.

7) Students in A.D. which Bodington thinks are staff

    select * from ldap_users l
    left join test_users t on l.cn = t.user_name
    where user_id is not null and sid is null and ou='Students'

• Only one record: texdjb.
• Although placed in the AD Students organisational unit, AD has a payroll number.
• A.D. payroll number matches Bodington payroll number.

Conclusion. It may be better to ignore the organisational unit as a means to differentiate between staff and students and instead look at the length of the employeeID attribute (9 chars = student, 8 chars = staff).

Recommended Design

Basics

• Implemented as a Bodington authentication plugin.
• Implemented as one Java class, possibly with one or more supporting Java classes.
• Source code for release under the Bodington open source license.
• Interaction with Active Directory via the LDAP protocol.
• Interaction with the Bodington database via the Bodington persistent object system.
• Configuration of the plugin via a plain text file edited by the system administrator.

Functional Requirement

The following is a pseudo code description of the software which indicates the decision making flow but omits the detail of the implementation. Text in green italic describes in outline functionality that would be implemented in Java source code. Discussion of the detail follows.
Variables used in this pseudo code

    user_name = the user name that the user entered
    password  = the password the user entered in the login page
    dn        = user's distinguished name, initially null
    user_id   = Bodington user id, initially null

Pseudo code

    user_id = Look up user_name in Bodington and find associated user record
    if ( user_id not found )
    {
        authentication fail
        exit
    }

    dn = Look up distinguished name in Bodington alias entry against user_id, or set null
    if ( dn not found )
    {
        do LDAP search for dn using user name
        if ( dn still not found )
            goto FALLBACK AUTHENTICATION METHOD
        else
            store dn in Bodington using an alias entry
    }

    Connect to LDAP and authenticate using dn and password
    if ( dn not found )    i.e. the stored dn is out of date
    {
        do LDAP search for dn using user name
        if ( dn still not found )
            goto FALLBACK AUTHENTICATION METHOD
        else
        {
            store dn in Bodington using an alias entry
            authenticate again using the new dn and password
        }
    }

    load all the user’s interesting attributes from LDAP
    check that the AD user’s attributes in LDAP indexed by dn match the Bodington user’s properties indexed by user_id
    if ( not the same person )
    {
        goto FALLBACK AUTHENTICATION METHOD
    }

    store password in user’s Bodington password record
    authentication success
    exit

    FALLBACK AUTHENTICATION METHOD
    LDAP authentication wasn’t possible because the user name wasn’t found there or the connection to LDAP failed.
    check password against Bodington password store
    if ( password matches )
        authentication success
    else
        authentication fail
    exit

LDAP connection configuration

It is necessary to ensure that authentication continues even if one or more of the Active Directory servers is not functioning. It is therefore necessary to maintain a list of Internet addresses for all the Active Directory servers and to ensure that if connection to one fails the connection is retried against another. Consideration may also be given to load balancing.

Design specification: the configuration file allows the specification of a list of IP names.
Connection to the LDAP system will be cycled through the active IP names. If a connection timeout occurs against an IP name, it will be moved out of the active list into an inactive list. A background worker thread will periodically attempt connections to the servers in the inactive list and, if successful, will promote them to the active list.

Searching LDAP for distinguished name using log-in user name

Objects of interest in the directory are defined as type ‘user’, which can be selected with the LDAP filter (objectClass=user). However, there are numerous records of this type spread through the directory which do not refer to human beings. It is necessary to further limit the search to two branches of the directory, defined by ou=Staff and ou=Students. Since it isn’t possible to tell from the user name alone whether the user will be found in the Staff or the Students branch of the directory, two searches may be required:

    first search base:  ou=Students
    second search base: ou=Staff
    search filter:      (&(objectClass=user)(cn=user name here))

If the first search doesn’t find a match, the second search is invoked.

Implementing the “not the same person” check

This will be configurable so that policy changes can be made. First the user is categorised as staff, student or other based on the content of the employeeID attribute.

• If the employeeID is nine digits or letters the person is a student; if the employeeID is eight digits the person is staff; otherwise the person is other.

Then (optionally for each category) the employeeID can be checked against the alias entries in Bodington and/or the surname can be checked against the surname in Bodington.

Initial recommended configuration:

• For all categories of user, enforce matching user name, because the record may have been found using a stored and now out of date distinguished name.
• Student: employeeID attribute must match student ID alias entry. Surnames don’t need to match.
• Staff: no check other than user name.
• Other: LDAP authentication rejected; Bodington password authentication must be used.

Data Logging

It is important that the authentication module log data on each authentication attempt, for two reasons:

1. To provide records of failed authentications for the sake of supporting individual users.
2. To provide statistical information about successful and failed authentications that will inform future policy.

The following data will be logged.

• Date and time of authentication attempt.
• User name entered by user.
• Length of password supplied.
• LDAP distinguished name, or null if not found.
• Flag indicating source of dn: Bodington database or LDAP search.
• All attributes found via LDAP, or nulls if user not found.
• Authentication method applied: LDAP bind or Bodington password.
• Authentication success or failure.
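
The logged fields listed above could be carried in a single value object per authentication attempt. The following Java sketch is one possible shape under stated assumptions: the class name AuthLogEntry, the enum names, the extra NONE value for an attempt where no dn was found, and the tab-separated line format are all hypothetical and are not part of this specification. Only the length of the password is held, never the password itself.

```java
import java.time.LocalDateTime;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical record of one authentication attempt; all names are illustrative. */
final class AuthLogEntry {
    enum DnSource { BODINGTON_DATABASE, LDAP_SEARCH, NONE } // NONE is an assumption: no dn found
    enum Method { LDAP_BIND, BODINGTON_PASSWORD }

    final LocalDateTime timestamp;
    final String userName;                     // as entered by the user
    final int passwordLength;                  // length only; the password is not logged
    final String dn;                           // null if not found
    final DnSource dnSource;
    final Map<String, String> ldapAttributes;  // empty if the user was not found
    final Method method;
    final boolean success;

    AuthLogEntry(LocalDateTime timestamp, String userName, int passwordLength,
                 String dn, DnSource dnSource, Map<String, String> ldapAttributes,
                 Method method, boolean success) {
        this.timestamp = timestamp;
        this.userName = userName;
        this.passwordLength = passwordLength;
        this.dn = dn;
        this.dnSource = dnSource;
        // Defensive copy so later changes by the caller cannot alter the log entry.
        this.ldapAttributes = Collections.unmodifiableMap(new LinkedHashMap<>(ldapAttributes));
        this.method = method;
        this.success = success;
    }

    /** Renders the entry as one tab-separated line (an assumed format). */
    String toLogLine() {
        return String.join("\t",
                timestamp.toString(),
                userName,
                Integer.toString(passwordLength),
                String.valueOf(dn),
                dnSource.name(),
                ldapAttributes.toString(),
                method.name(),
                success ? "SUCCESS" : "FAILURE");
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("employeeID", "200012345"); // made-up value for illustration
        AuthLogEntry entry = new AuthLogEntry(LocalDateTime.now(), "abc1xyz", 10,
                "CN=abc1xyz,OU=Students", DnSource.LDAP_SEARCH, attrs,
                Method.LDAP_BIND, true);
        System.out.println(entry.toLogLine());
    }
}
```

A fixed field order makes the statistical analysis mentioned above simpler, since each line can be split on tabs and counted by method and outcome.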