Advanced Active Directory Design and Troubleshooting
Ed Whittington, Principal Software Engineer
HP Business Critical Call Center
Oct. 06, 2002

Topics
• Troubleshooting Basics
• Troubleshooting Tools
• DNS Troubleshooting
• Troubleshooting Replication
• Troubleshooting DCPromo
• Troubleshooting FRS Replication and DFS
• Troubleshooting Group Policy
• Troubleshooting in .NET

Troubleshooting Basics

Basic Troubleshooting Steps
Define the problem (make sure there is one)
• What’s failing?
  – Client authentication and security
  – Group policy application
  – Replication
  – Name resolution
  – Errors and warnings in event logs
  – FRS/DFS
  – Application
• How is the problem reproduced?
  – One or multiple machines?
• Narrow the variables

Basic Troubleshooting Steps
• Run MPSReports_DS (from HP or Microsoft)
• Get the log files
  – Event logs (see http://www.eventid.net)
  – %windir%\debug\usermode\Userenv.log
  – %windir%\debug\DCPromo*.log
• Turn on verbose logging
• Run NetDiag, DCDiag (verbose)
• Get a status report from Replication Monitor

Basic Troubleshooting Steps
• Check DNS.
  – Resolver on ALL computers.
  – Name server properties (forwarding, etc.).
  – Monitoring tab: test name resolution.
  – Nslookup, ping to test name resolution.
  – Ping SRV records.
• Check replication.
  – Force replication.
  – Identify who isn’t replicating to whom.
  – Outbound vs. inbound.

Basic Troubleshooting Steps
If all else fails, try demoting.
• Really cleans up a lot of problems…
If the problem is isolated to one DC:
• If replication isn’t working, demotion won’t work.
• Reinstall to remove the AD, then clean up the AD:
  – Ntdsutil to remove the server object.
  – Delete the server object from Sites & Services.
  – Delete the FRS server object from the System container.
You can also demote a DC manually.
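The “Ping SRV records” step in the checklist is easier when you know which names a DC should have registered. The sketch below builds a representative subset of the DC locator names, plus the replication alias (CNAME), for one DC. The `DomainController` class and `locator_records` function are hypothetical helpers, and the GUID in the usage example is made up. The list is not the complete set of Netlogon registrations.

```python
from dataclasses import dataclass


@dataclass
class DomainController:
    domain: str      # DNS name of the DC's own domain, e.g. "na.compaq.com"
    forest: str      # DNS name of the forest root domain, e.g. "compaq.com"
    site: str        # AD site the DC belongs to
    dsa_guid: str    # server (DSA) GUID, as listed by repadmin /showreps
    is_gc: bool = False


def locator_records(dc: DomainController) -> list[tuple[str, str]]:
    """Return (record type, owner name) pairs a healthy DC registers.

    Only a representative subset of the Netlogon registrations.
    """
    records = [
        ("SRV", f"_ldap._tcp.dc._msdcs.{dc.domain}"),
        ("SRV", f"_kerberos._tcp.dc._msdcs.{dc.domain}"),
        ("SRV", f"_ldap._tcp.{dc.site}._sites.dc._msdcs.{dc.domain}"),
        ("SRV", f"_kerberos._tcp.{dc.site}._sites.dc._msdcs.{dc.domain}"),
        # Replication alias: lives only in the forest root's _msdcs zone.
        ("CNAME", f"{dc.dsa_guid}._msdcs.{dc.forest}"),
    ]
    if dc.is_gc:
        # Global catalog records are registered under the forest root name.
        records.append(("SRV", f"_gc._tcp.{dc.forest}"))
        records.append(("SRV", f"_ldap._tcp.gc._msdcs.{dc.forest}"))
    return records


if __name__ == "__main__":
    # Hypothetical DC; the GUID below is made up for illustration.
    dc = DomainController("compaq.com", "compaq.com", "Seattle",
                          "11111111-2222-3333-4444-555555555555", is_gc=True)
    for rtype, name in locator_records(dc):
        print(f"nslookup -type={rtype} {name}")
```

Each printed line is an nslookup query you can run against the DC’s preferred DNS server. A name that does not resolve points to the Netlogon registration or zone problems covered in the DNS section.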
Manual Demotion of a DC
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ProductOptions
ProductType =
  – ServerNT (when the computer is a member server)
  – LanManNT (when the computer is a domain controller)
Change from LanManNT to ServerNT.
• It’s now a “dirty” member server.
• Clean server objects from the AD (Ntdsutil).
• Clean up the disk and Registry:
1. Create a new forward lookup zone: Bogus.com
2. Run DCPromo: create a new forest for Bogus.com
3. Demote and eliminate Bogus.com
4. Wait for replication
5. Promote back into the domain (use the same name if desired)
Tool in Windows .NET

Troubleshooting Tools

Gathering Information

Netdiag.exe
NETDIAG.EXE
• /v – verbose (always turn this on)
• /l – log: writes netdiag.log to the default directory
• /d:<domain> – finds a DC in the specified domain
• /test: – runs only the specified tests
• /skip: – skips the specified tests
Can’t execute remotely.
C:\>netdiag /v /l

Netdiag.exe
• Domain controller discovery
• Bindings, IP address, default gateway tests
• DNS tests
• NBTstat and WINS ping
• Netstat
• Route
• Trust
• Kerberos

Dcdiag.exe
DCdiag /v
• Domain controller functions of netdiag
• More domain-specific:
  – FSMO roles
  – Connectivity
  – Replication
  – Domain controller locator
  – Intersite “health”
  – Topology integrity

Nltest.exe
• /server:servername – sets the default server
• /dsgetdc:domainname – DsGetDcName API [ /gc /timeserv /ldap ]
• /dclist:domainname – lists DCs in the domain
• /parentdomain – lists the parent domain
• /dsgetsite – lists the site of the server
• /dsgetsitecov – lists the sites the DC “covers”
• /dcname:domainname – lists the PDC for the domain
• /dcpromo – tests the potential success of DCPromo
• /whowill:domain user – returns the name of the DC that will authenticate the user

Netdom.exe
• join, add, reset, resetpwd, query FSMO, trust

NTDSUtil
• Built-in utility.
• Directly accesses Active Directory.
• Authoritative restore.
  – Can restore an older version of the AD and force it on all DCs to correct a variety of problems.
  – Entire AD or single tree.
  – Can’t restore the schema.
• FSMO roles.
  – List, transfer, seize roles.
  – Better than the UI: can manipulate all roles in the forest and all domains from one utility.

NTDSUtil
• Metadata cleanup
  – Delete orphaned objects:
    – Servers
    – Domains
  – The UI can and will lie to you! Don’t trust it.
• Useful tool for listing the contents of the AD
  – Sites, domains, servers, FSMO role holders.
  – Domains in a site.
  – Servers in a domain, servers in a site.
• Q216364, Q216498, Q230306

Gpresult.exe
Run on the client. Returns:
• Security group membership
• User and computer policy info
• GPOs applied to each
• Registry settings set in the GPO
• Client-side extensions set
  – Scripts applied
Remember:
• Policy is cached: reboot / log on to clear.
• Note who the authenticating server is: %LOGONSERVER% environment variable.
Much improved in .NET!

GPOtool.exe
Run on a domain controller. Returns:
• Analysis of all GPOs in the domain.
• GUID and friendly name of all GPOs.
• DS and Sysvol versions.
• Errors encountered.
Good group policy troubleshooting tool. May take a long time to process (depends on the number of GPOs).

ADSIedit.exe
• GUI much like the Users & Computers snap-in with Advanced Features turned on.
• Graphical view of the AD.
• Like LDP.exe, but:
  – Easier to browse.
  – Can modify attribute values.
• Don’t confuse it with Users & Computers!

LDP.exe
Takes time to set up:
• Connect
• Bind
• View > Tree
  – Enter the DN to start from (blank for default)
Exposes attributes quickly; easy to see.
Faster than ADSIedit (no GUI to traverse).
LDAP searches.
Can delete and modify, but not as easily as ADSIedit.
Can execute remotely.

DCPromo.log, DCPromoUI.log
Located in %systemroot%\debug. Logged every time DCPromo runs.
DCPromo.log
• Shorter.
• Appended (read bottom up).
DCPromoUI.log and DCPromoUI.xxxx.log
• Results of what is seen in the UI (longer).
• Find: results of DsGetDcName, DNS query, time service sync, authentication, replication, site info.
• Error (0x0) = success (no error).
Error reporting differs between them: read both logs.
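LDP’s LDAP search is also the way to track down an object that an event log names only by GUID. objectGUID is a binary attribute, and AD stores it in little-endian (Microsoft) byte order. The string form therefore has to be converted byte by byte before it can be used in a filter. The sketch below, built around a hypothetical helper `guid_to_ldap_filter`, produces a filter you can paste into LDP’s Browse > Search dialog with a subtree scope. The GUID in the test is just an example value.

```python
import uuid


def guid_to_ldap_filter(guid: str, attribute: str = "objectGUID") -> str:
    """Build an LDAP equality filter that matches a binary GUID attribute.

    AD stores GUIDs with the first three fields little-endian, which is
    exactly uuid.UUID.bytes_le; each byte is then escaped as \\hh.
    """
    raw = uuid.UUID(guid.strip().strip("{}")).bytes_le
    escaped = "".join(f"\\{b:02x}" for b in raw)
    return f"({attribute}={escaped})"


if __name__ == "__main__":
    print(guid_to_ldap_filter("{08FAB736-9628-41D5-B5A8-37A0F98D7E43}"))
```

The escaped-byte form is the portable choice for any LDAP client. Searching on the plain string form of the GUID will not match.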
Userenv.log
Located in %systemroot%\debug\usermode
User environment info:
• Group policy (registry)
• Client-side extensions
  – Scripts
  – Security
Increase verbose logging (Q221833).
Take time to read and study it; you may be surprised at what you can find!

Additional User Mode Logs
Client-side extensions
• Registry: see Q216357
  HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\GPExtensions
• Errors created in %windir%\debug\usermode
  – Named after the .dll
  – Scripts = Gptext.dll = gptext.log
  – Folder Redirection = fdeploy.dll = fdeploy.log
  – Security = scecli.dll = winlogon.log (Q245422)
  – Produced automatically on error (except winlogon.log)
  – Check the usermode directory for these files
• Invaluable in debugging. Use them!
[Figure: Client Side Extensions (registry)]

Windows .NET Troubleshooting Tools

Remote Desktop Resource Redirection
Client resources available when using Terminal Services Remote Desktop:
• File System: local drives and network drives on the local machine are available on the remote machine.
• Audio: audio streams such as .wav and .mp3 files can be played through the client sound system.
• Port: applications have access to the serial and parallel ports.
• Printer: the default local or network printer on the client becomes the default printing device for the Remote Desktop.
• Clipboard: the Remote Desktop and the client computer share a clipboard.
• Terminal Services Virtual Channel Application Programming Interfaces (APIs) are provided to extend client resource redirection for custom applications.

WMI
Computer management
Active Directory
• Provider: MicrosoftActiveDirectory
• Classes:
  – Replication: see replprov.mof in %windir%\system32
Trust health
• Provider: MicrosoftHealthMonitor
• Classes: see system32\wbem\trusthm.mof
DNS
• Provider: MicrosoftDNS
• Classes: system32\wbem\dnsprov.mof
Cluster
• MSCluster
Also look in CIM Studio in MSDN.

WMIC Sample Commands
Look in the %windir%\system32\wbem *.mof files for the names of providers, classes, etc.
Active Directory
• Provider: MicrosoftActiveDirectory
• wmic /namespace:\\root\microsoftactivedirectory PATH msad_replneighbor (shows replication partners)
• wmic /namespace:\\root\rsop\user path RSOP_GPO (lists GPOs with user settings)

Admin Tool Improvements
Users and Computers snap-in
• Drag and drop.
• Multi-select and edit user objects.
• Heavily revised object picker.
Users and Computers, Sites and Services, DNS snap-ins
• Saved queries.
• Viewing saved DS, DNS, FRS event logs on non-DCs!
.NET Adminpak (only on XP)

Command Line Tools
• GPresult: enhanced reporting
• DCDiag: dcdiag /test:DCPromo
• Repadmin: enhanced reporting
• Netdom: computername, for DC rename
• Others
Shipped on
• Service Pack 2 CD (install manually)
• .NET Server, AdvSvr CD

Windows .NET Improvement to NTDSUtil
Change the Offline (DS Repair Mode) Password While Online!
NTDSUtil
• Set DSRM Password (main menu)
Increases server up-time, which in Win2K was limited by the password change interval.
• (Had to reboot into DS Repair mode to change it.)
• Q223301 (Win2K limit)

Cool error message!
Setting password failed.
WIN32 Error Code: 0x6ba
Error Message: The RPC server is unavailable.
See Microsoft Knowledge Base article Q271641 at http://support.microsoft.com for more information.

Errors in Windows .NET
Kinder, gentler, and report to Microsoft

Active Directory Load Balancing Tool
Does the job for branch office deployments.
• The KCC chooses the bridgehead server (BHS) for connection objects and keeps choosing the same one.
• The tool allows you to spread the load to other DCs in the site (that hold that NC).
• The ADLB tool modifies the hub DC’s replication schedules to spread replication out over time.
• Generates a log, like Replmon’s status log.
• For deployments with hundreds of branch offices all replicating to a single hub.
• The tool offers no benefit to sites with only one DC per domain.

Future: Graphical Replication Monitoring Tool
• Very much like ‘Age of Directories’
• Ability to make configuration changes
• Not in .NET: maybe Longhorn or Blackcomb?
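The 0x6ba in the error message above is Win32 error 1722 (RPC_S_SERVER_UNAVAILABLE) written in hex. It is the same code Windows reports for “RPC server is unavailable” replication failures. Logs mix hex and decimal, so a small converter helps when matching codes across tools. In this sketch the lookup table is hypothetical and holds only codes mentioned in this talk. On a Windows machine, `net helpmsg 1722` prints the system text for any Win32 code.

```python
# Codes mentioned in this presentation only; not an exhaustive table.
KNOWN = {
    0: "ERROR_SUCCESS",
    1722: "RPC_S_SERVER_UNAVAILABLE (The RPC server is unavailable.)",
    8240: "ERROR_DS_NO_SUCH_OBJECT",
}


def parse_code(text: str) -> int:
    """Accept hex with a 0x prefix ('0x6ba', '0x6BA') or decimal ('1722')."""
    text = text.strip().lower()
    return int(text, 16) if text.startswith("0x") else int(text)


def describe(text: str) -> str:
    code = parse_code(text)
    name = KNOWN.get(code, f"unknown here: try 'net helpmsg {code}'")
    return f"{code} (0x{code:x}): {name}"


if __name__ == "__main__":
    for c in ("0x6ba", "1722", "8240"):
        print(describe(c))
```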
Troubleshooting DNS

DNS Resolver Configuration
Win2K clients and servers point to the Win2K DNS name server that is SOA for their zone.
• Don’t point to the ISP or other internal name servers (even as “additional”).
• Keep it simple.
Win2K name servers forward to the ISP or to the internal name server hosting the registered domain.

DNS Name Server Configuration Basics
• Dynamic updates = Yes.
• Active Directory Integrated zone
  – Select one “primary.”
  – All other ADI primary name servers point to it for DNS.
• Win2K name servers can:
  – Forward to the ISP or an internal name server.
  – Use root hints (or modify root hints).
• Reverse lookup zones NOT required
  – Needed only for tools (NSLookup).

ADI Primary and Standard Secondary Mixed Zone
• Only a DC can host an ADI primary zone.
• Member servers can host a secondary zone.
  – Sync off of an ADI primary.
[Figure: ADI primary and standard secondary name servers serving one zone]

DNS Case Study
[Figure: corp.net, na.corp.net, sa.corp.net, and eu.corp.net name servers using forwarding and secondary zones]

DNS Case Study
[Figure: resolution path among corp.net, na.corp.net, sa.corp.net, and eu.corp.net to find na.corp.net]

With the Conditional Forwarding Feature in Windows .NET Server…
[Figure: conditional forwarding among corp.net, na.corp.net, sa.corp.net, and eu.corp.net to find na.corp.net]

Problem: SRV records only in the root domain
[Figure: location of SRV records (PDC, GC, CNAME) in the w2k.net / corp.com root; zone transfer and forwarder to NA.w2k.net and EU.w2k.net]

Solution: Delegate the _msdcs zone
[Figure: _msdcs (PDC, GC, _tcp, _udp, _sites) delegated from the w2k.net / corp.com root; delegation and forwarder to NA.w2k.net and EU.w2k.net]

DNS Hotfix
• Symptom: replication breaks.
• Configuration: using secondary zones for the root _msdcs zone at child domains.
• Problem: the serial number of the secondary zone is higher than the primary’s, so zone transfers stop.
• Hotfix Q304653: The Serial Number Is Decremented in DNS When You Reboot
• Solved in .NET

DNS Troubleshooting Basics
• Check the DNS event log (and others).
• Check the location of DNS servers.
  – Usually want a name server in remote sites.
• Check the population of SRV records.
  – _msdcs; _tcp; _udp; _sites
  – Need Kerberos and LDAP records for each DC.
  – Correct address, etc.
  – Can delete and repopulate them by restarting Netlogon.
• Check delegations: correct names, IP.

DNS Troubleshooting Basics
• Use of Active Directory Integrated (ADI) zones.
  – Put standard secondary zones on member servers.
  – Can clear problems by switching to standard primary.
• Ping a DC by its alias (CNAME) record: ping <DSA guid>._msdcs.compaq.com
• Clear the server cache.
  – Negative caching problems.
• Test: Server Properties > Monitoring tab.
• Test: ping names, NSLookup.

Troubleshooting AD Replication

Replication Troubleshooting Tools
• Event logs: Directory Services, System
• Sites and Services snap-in
• Age of Directories (AOD) – HP
• Replication Monitor
• Aelita Event Admin
• NetPro Directory Analyzer
• Command line (Support Tools & Res Kit)
  – DCdiag, Netdiag
  – Repadmin.exe

Event Logs for Replication Troubleshooting
Directory Services log
• 5778 – Subnets not mapped.
  – Will break the client’s “site awareness.”
• 1311 – Serious: not enough connectivity.
  – Connectivity, traffic issue.
  – Sites with DCs and no site links.
  – Site topology incorrectly defined.
  – DNS lookup failure.
• 1722 – RPC server is unavailable.
  – Physical connectivity.
  – DNS.

Event Logs for Replication Troubleshooting
System log
• Netlogon errors
  – Authentication
  – Trusts
  – Secure channel
• W32Time errors
  – Kerberos authentication is required for replication.
  – DCs must be no more than five minutes out of sync.
  – Watch time zones!

Sites and Services Snap-in
Check for duplicate connection objects.
• KCC generating more than one connection between two DCs.
• Delete all connections and select the “Check Replication Topology” option to regenerate them.
• If they come back, find out why.
  – Usually a DNS problem.
• Breaks FRS and AD replication.

Sites and Services Snap-in
Check for sites with no DCs…
• It’s OK to have a site with no servers if you plan it that way.
• If there should be a server in that site, find it and move it there.
Make sure all subnets are mapped to the correct sites.
• Keep up with IP addressing changes.

Sites and Services Snap-in
Make sure site links are correct.
• Link the correct sites per the design (you need a drawing).
• Cost, schedule, replication frequency.
Force replication between DCs.
• All connections are inbound.
• Use “Check Replication Topology.”
• Create a new site and a new user, each named for the DC.
  – Checks the Configuration NC and the Domain NC.
  – Force replication between replication partners.
  – On DC1 from DC2 and on DC2 from DC1.

Sites and Services Snap-in
• Validate inbound and outbound replication on all DCs.
  – Create a new site and a new user, each named for the DC.
  – Checks the Configuration NC and the Domain NC.
  – Wait for replication (don’t force it).
  – Check each DC for a copy of these users and sites.
[Figure: DC1, DC2, and DC3 each holding the users and sites named for all three DCs]

Check CNAME DNS Records
• In the root _msdcs zone (only), an alias record maps the DC’s server GUID to its FQDN. There should be only one record.
  – Delete duplicates.
• Match the GUID in the alias record to the GUID reported by Repadmin /showreps.
• If in doubt, delete the DC’s alias record(s) and restart Netlogon on the broken DC to re-register.

Age of Directories Tool – Demo
If interested, contact me: ed.whittingtonn@HP.com

Replication Monitor
• Status report (replication health report)
  – List of all GCs, BHS, trusts
  – List of all replication errors on all DCs in the domain
• Changes not replicated
• Replication partners
• Force push/pull replication
• Meta-data
• Group Policy Object status
• FSMO validation
• Inbound connections (including reason)
[Figure: Replication Monitor]

Command-Line Utilities
RepAdmin
• In Support Tools.
• Perhaps the most useful tool for troubleshooting replication.
• /showreps: lists inbound and outbound connections.
  – The only tool that lists outbound connections.
  – Lists the server GUID (used for replication).
  – Lists successful replication messages.
  – Lists replication errors.
  – Lists the replication partner used to replicate every naming context, inbound and outbound.
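The inbound/outbound validation above is a matrix check. Each DC creates a canary user (and site) named after itself, and every DC should eventually hold every other DC’s canary. The sketch below reports the gaps and tallies them per source and per destination. It assumes you have already collected, for each DC, the set of canary names it holds, for example by browsing Users and Computers on that DC. Run it once for the users and once for the sites. `missing_canaries` and `tally` are hypothetical helpers.

```python
def missing_canaries(seen: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each DC to the canaries it has NOT received yet.

    `seen` maps a DC name to the canary names found on that DC. Each DC
    created one canary named after itself, so the full set of DC names
    is what every DC should eventually hold.
    """
    expected = set(seen)
    gaps = {}
    for dc, found in seen.items():
        missing = expected - found
        if missing:
            gaps[dc] = missing
    return gaps


def tally(gaps: dict[str, set[str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Count failures per source DC (outbound) and per destination DC (inbound)."""
    outbound: dict[str, int] = {}
    inbound: dict[str, int] = {}
    for dest, missing in gaps.items():
        inbound[dest] = len(missing)
        for src in missing:
            outbound[src] = outbound.get(src, 0) + 1
    return outbound, inbound


if __name__ == "__main__":
    # Hypothetical results: DC3's canaries never left DC3.
    seen = {
        "DC1": {"DC1", "DC2"},
        "DC2": {"DC1", "DC2"},
        "DC3": {"DC1", "DC2", "DC3"},
    }
    gaps = missing_canaries(seen)
    outbound, inbound = tally(gaps)
    print("missing:", gaps)
    print("suspect outbound sources:", outbound)
    print("lagging destinations:", inbound)
```

If one source DC is missing everywhere, that DC is not replicating outbound. If one destination is missing many canaries, look at its inbound connections with Repadmin /showreps. The same check fits the FRS text-file test described later.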
NTDS Diagnostic Logging
HKLM\System\CurrentControlSet\Services\NTDS\Diagnostics
• Set value = 0–5
  – 0 = off, 5 = very verbose
  – Start with 3
  – Reported in the event log
• Important values:
  – 1 Knowledge Consistency Checker
  – 5 Replication Events
  – 8 Directory Access
  – 9 Internal Processing
  – 13 Name Resolution
  – 18 Global Catalog

Things That Break Replication (or Indicate That It’s Broken)
• Duplicate connection objects
• Orphaned objects
  – Esp. DC objects, caused by a DC being removed from the domain without a successful DCPromo.
  – Garbage collection initiated manually before all DCs and GCs are fully replicated.
  – Reported in event logs.

Things That Break Replication (or Indicate That It’s Broken)
• DC unavailable
  – Down
  – Name resolution
  – Network problem
• DNS misconfigured
  – TCP/IP addresses change:
    – Delegation
    – Client resolver configuration (including name servers)
    – DHCP scope configuration for DNS registration
  – Failure to contact a DNS server (for SRV records)

Things That Break Replication (or Indicate That It’s Broken)
• The KCC doesn’t do its job
  – Routes around inaccessible DCs by creating duplicate connection objects.
  – When the DCs come back online, the KCC should clean up the duplicate connection objects.
    – Usually it doesn’t…
    – Causes replication errors.
    – Events in the DS log.
    – Need to clean them up manually.
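Setting several diagnostic categories by hand in Regedit is tedious. A .reg file makes it repeatable, and a second file with level 0 turns logging back off. The sketch below generates one for the value names listed above. The helper names are hypothetical. Saving as UTF-16 reflects how Regedit normally writes the “Version 5.00” format. Test the file on a lab DC before using it elsewhere.

```python
# Value names under HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics,
# as listed on the slide above.
CATEGORIES = [
    "1 Knowledge Consistency Checker",
    "5 Replication Events",
    "8 Directory Access",
    "9 Internal Processing",
    "13 Name Resolution",
    "18 Global Catalog",
]
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"


def diagnostics_reg(level: int, categories=CATEGORIES) -> str:
    """Return .reg text setting each category to `level` (REG_DWORD)."""
    if not 0 <= level <= 5:
        raise ValueError("NTDS diagnostic levels run from 0 (off) to 5")
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY}]"]
    lines += [f'"{name}"=dword:{level:08x}' for name in categories]
    return "\r\n".join(lines) + "\r\n"


def write_reg_file(path: str, text: str) -> None:
    # newline="" stops Python from turning \r\n into \r\r\n on Windows.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)


if __name__ == "__main__":
    print(diagnostics_reg(3))
```

Usage: write `diagnostics_reg(3, ["5 Replication Events", "9 Internal Processing"])` to one file and `diagnostics_reg(0)` to another. Import the first while reproducing the problem, then import the second so the Directory Service log does not fill up.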
Lingering Object Behavior
• Basics
• Scenarios

Object Deletions
• Deleted objects turn into tombstones
  – Tombstones are replicated to other DCs
  – This is how replication partners learn that an object was deleted
• Tombstones are purged from the local database after the tombstone lifetime has expired
  – AD: 60 days, adjustable (2 days minimum)
  – Sysvol: 60 days
• If a tombstone does not replicate to a DC, the object deletion is not replicated
  – The object is not deleted on this DC
  – The object is now a lingering object
  – Can be on a DC or a GC
• Rule: tombstone lifetime =
  – Max time a DC can be disconnected
  – Max lifetime of a backup tape

Lingering Objects – Scenarios
• A deleted object re-appears on all domain controllers in a domain and on all GCs
• A deleted account does not disappear from the Exchange GAL
• An object was moved between domains and a disconnected GC is brought online
• Replication error on a GC when a new object is created
  – The lingering object still holds an attribute where uniqueness is enforced (samAccountName)
  – Exchange cannot create a mailbox because the object already exists

Why Does This Happen?
• DCs disconnected for more than the tombstone lifetime
  – Left in a storage room for a long time
  – Replication failures, e.g., bridgehead servers overloaded, no monitoring in place
  – WAN connections down for a long time
• Tombstone lifetime abuse
  – “Somebody” changed the time on a DC to garbage-collect an object
  – Tombstone lifetime was changed to garbage-collect objects on single servers
• Can this be avoided?
  – YES: monitor KCC topology and replication
  – Do not set the tombstone lifetime to less than 60 days
  – DCs offline longer than the tombstone lifetime must be re-promoted

Lingering Objects: Strict vs. Loose Replication Behavior
Replication behavior
• Defines how a DC reacts if an update for an object is replicated in and the object does not exist on the DC
Loose behavior
• The DC requests a full copy from the replication source
• Logs event ID 1388
Strict behavior
• The DC stops replication from the offending replication source
• Logs error code 8240 (ERROR_DS_NO_SUCH_OBJECT) embedded in event ID 1084
• Requires logging level 1
Behavior can be set via a registry key
• HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters\Strict Replication Consistency
• Introduced in Q314282

Deleting Lingering Objects
If found on a DC
• In loose behavior: delete the object via Users and Computers
• In strict behavior: follow the procedures outlined in Q314282
On a GC (in a read-only NC)
• The object cannot be changed or deleted on the GC
• Solution 1: delete the object on a writable replica (if possible)
• Solution 2: use LDP to delete the object on the GC
  – Support for removing lingering objects from a GC was added in Q314282
  – Follow the procedures outlined in Q314282
You might have to set loose behavior temporarily.

Best Practice Recommendations
DC has not replicated for more than 60 days
• Tombstone lifetime at its default (60 days)
  – Do not replicate; re-install the OS
• Tombstone lifetime adjusted to more than 60 days
  – 60 days < time DC disconnected < tombstone lifetime: re-connect the DC, restore sysvol
  – Time DC disconnected > tombstone lifetime: do not replicate; re-install the OS
If you have to disconnect a DC
• Make sure that it replicates successfully before you take it offline
New deployments
• Add the registry key to enforce strict replication behavior at DC OS installation time

More Best Practice Recommendations
Existing deployments
• Default setting: loose replication (even on SP3)
• Goal: get to strict mode ASAP
• Set the registry key to strict mode on all DCs
• Watch the event logs on the DCs
  – If you get many replication errors on a single DC, re-promote that DC
  – For a small number of replication errors, clean up the DC:
    – Delete lingering objects if necessary
    – Follow the procedures outlined in Q314282
• If you were monitoring…
  – Then don’t worry, you won’t see any replication errors
Don’t lower the tombstone lifetime to less than 60 days.
Monitor!

Lingering Object Fix
Q317097 (good instructions)
HKLM\System\CurrentControlSet\Services\NTDS\Parameters…
• Add value name = Correct Missing Object
• Data type = REG_DWORD
• Value = 1 (tight), 0 (loose)
Allows or restricts AD replication when lingering objects are discovered.
• Tight when you want to know.
• Loose to inventory and remove the objects.

Value-Level Replication
WNT: object replication
• Change to an attribute or value
W2K: attribute-level replication
• Better than NT (more efficient)
• A change to an attribute replicates the attribute
• A change to a value replicates the attribute
• Problem: multi-valued attributes
  – Group = attribute
  – Member = value
  – Changing one member = replicating the attribute with all members
  – Impacts network traffic
  – Limit (per Microsoft) of 5,000 users per group
.NET: value-level replication
• Replicates values, not attributes
• Eliminates the 5,000 users per group limit

Domain Limit
• There is a limit of about 800 child domains under a single parent.
• Child domains are held in an unlinked, multi-valued attribute, stored in the crossref attribute of the domain object.
• The Jet database limits the data that can be stored.
• No way to patch; Jet must change.
• “Might” be improved in Longhorn (not Whistler).

Domain Limit
One customer got to 900 domains
• Replication failed
• Authentication failed
• Mission-critical application failed
Temporary repair
• Demoted all domains in reverse order of creation to get back to 800
• Fixed replication
Solution
• Redesigned and redeployed to a single domain

DCPromo Troubleshooting

DCPromo Basics
First test of:
• DNS registration and resolution.
• LDAP query and response.
• Kerberos authentication.
• Active Directory replication.
• FRS replication.
• Application of group policy.
Validation and Flow…
• Chapter 2, Active Directory Data Storage, in the Windows 2000 Resource Kit

DCPromo Logs
%windir%\debug
• Dcpromo.log
• Dcpromoui.log
• Dcpromoui.xxx.log
Set verbosity on dcpromoui.log
• HKLM\Software\Microsoft\Windows\CurrentVersion\AdminDebug
• Values: DCpromo and DCPromoui
• Data:
  – 380001 = default
  – 0xFF003 = full file and debugger logging output
  – 0xFF001 = maximum detail to DCPromoui.log

DCPromo Phases
• Initialization
  – UI input
  – DNS name resolution
  – LDAP query/response
  – Kerberos authentication
• AD replication
• FRS replication
• Wrap-up
  – Apply policy
  – Upgrade trusts
  – Publish the new DC in the DS

Initialization Phase
Authorization error
• Enterprise Admin is required to create a new domain (or to remove the last DC of one).
• Domain Admin is required to add a replica DC (or to demote a replica).
Can’t find a DNS server with dynamic updates.
• Prompt to let DCPromo configure DNS.
  – When creating a domain.
  – Answer NO!
• Replicas, children: must find a DNS server to locate a “sourcing DC.”

Errors Creating the Computer Account
• You need privileges to create the account.
• DCPromo first creates the account and puts it in the domain’s Computers container.
• Then it moves it to the Domain Controllers OU.
• The source DC is identified in the DCPromo logs.

DCPromo Initialization Checklist
Privileges required
• Enterprise Admin if creating a new domain.
• Domain Admin if creating a replica.
System time configured properly
• Kerberos requires sync within five minutes.
• All parent and child domain DCs.
Sufficient free disk space
• ~850 MB
Domain Naming Master FSMO required if creating a new domain.

DCPromo Initialization Checklist
Everyone or the Enterprise DC group has “Access this computer from the network.”
Enterprise DC group rights:
• Manage Replication Topology.
• Replicating Directory Changes.
• Replication Synchronization.
Sourcing DC
• Security policy applied.
• Enable computer and user accounts to be trusted for delegation.

DCPromo Initialization Checklist
The target DC has valid Kerberos tickets.
• Kerbtray.exe utility from the Resource Kit.
A GC must be contacted.
• nltest /dsgetdc:compaq.com /gc
Able to contact a functional existing DC.
• Uses UDP (watch for firewall issues).
  – Can use TCP, but it’s a Microsoft secret!
• Use ping, NLTest, Nslookup to find a DC.

If the Source DC Is Not Reachable...
See if one responds.
• Ping the FQDN of the domain (ping compaq.com).
• nltest /dsgetdc:compaq.com /ds
  – Other: /gc /pdc /timeserv
• Check the site mapping for this computer.
  – nltest /server:<name> /dsgetsite
Check Dcpromoui.log to see the source.
Force DCPromo to use a specific source:
• Q224390
• Turn off Netlogon on the other DCs.
• Join the server to the domain, then run DCPromo.

Info to Collect for Debug
• Netdiag /v
  – Problem DC
  – Source DC (see dcpromo.log)
• DCDiag /v
  – Source DC
• Is replication working? (other DC in the site)

AD & FRS Replication Phases
Initially, an inbound connection is created to replicate from the source DC.
• Machine account (DC1$) moved to the Domain Controllers OU.
  – UserAccountControl attribute set:
    – 4096 (1000 hex) = workstation/server
    – 532480 (82000 hex) = DC
  – Account is moved.
• Error: DC1$ not found, access denied, etc.
  – Credentials of the account running DCPromo
  – The source must have the computer object.
  – The source must have security policy applied to itself.
  – Q250874

AD & FRS Replication Phases
After the first reboot…
• Outbound connection created.
• AD changes for the new DC replicated to the source.
  – Including the UserAccountControl attribute.
  – Server (replication) object.
  – Replicated to other DCs.
• Sysvol is populated (policies copied to the new DC).
• Sysvol and Netlogon shares created.

Troubleshooting Missing Sysvol, Netlogon Shares
Outbound connection failed
• Look in Sites and Services or Repadmin
• UserAccountControl still 4096 on the source [Q257338]
  – Good but …
• Build a manual “outbound” connection
• Force the KCC to “Check Replication Topology”
• Check UDP traffic if in a remote site.
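The two UserAccountControl values above are bit masks. 4096 is WORKSTATION_TRUST_ACCOUNT (0x1000). 532480 is SERVER_TRUST_ACCOUNT (0x2000) plus TRUSTED_FOR_DELEGATION (0x80000). Decoding the value you read in LDP or ADSIedit therefore tells you at a glance whether the new DC’s machine account has been converted on a given DC. The sketch below decodes a subset of the documented userAccountControl flags. `decode_uac` and `looks_like_dc` are hypothetical helpers.

```python
# A subset of the documented userAccountControl flag bits.
UAC_FLAGS = {
    0x0002: "ACCOUNTDISABLE",
    0x0020: "PASSWD_NOTREQD",
    0x0200: "NORMAL_ACCOUNT",
    0x1000: "WORKSTATION_TRUST_ACCOUNT",  # 4096: member workstation/server
    0x2000: "SERVER_TRUST_ACCOUNT",       # set on domain controllers
    0x10000: "DONT_EXPIRE_PASSWORD",
    0x80000: "TRUSTED_FOR_DELEGATION",
}


def decode_uac(value: int) -> list[str]:
    """Return the flag names set in `value`, lowest bit first."""
    names = [name for bit, name in sorted(UAC_FLAGS.items()) if value & bit]
    unknown = value & ~sum(UAC_FLAGS)  # bits not in the table above
    if unknown:
        names.append(f"other bits 0x{unknown:x}")
    return names


def looks_like_dc(value: int) -> bool:
    return bool(value & 0x2000)


if __name__ == "__main__":
    for v in (4096, 532480):
        print(v, hex(v), decode_uac(v), "DC" if looks_like_dc(v) else "not a DC")
```

If the source DC still shows 4096 after the new DC has rebooted, the outbound replication step has not happened yet.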
Missing Sysvol and Netlogon Shares
Create replication “links” manually, then force replication:
• Repadmin /add (adds an outbound link)
• Repadmin /sync (forces replication)
You can’t create the shares manually. When replication is fixed, they’ll be created.

Tracking Down a GUID
Problem: a GUID is referenced in the event log. What is it?
Solution (Q216359):
• LDP: search for the GUID
• Search.vbs in Support Tools
Orphaned object (will kill replication)
• Turn up NTDS diagnostic logging
  – Internal processing
  – Replication
• Find the object (GUID) in the event logs
• Delete it via LDP

DCPromo Improvements in Windows .NET

Install From Media (IFM)
Source a replica’s AD from media in DCPromo
• GCs or DCs (replica only).
• No initial replication from a DC.
  – Faster (no searching for a DC).
  – Less network impact (no full sync over the WAN).
  – Easy branch office installation.
• After the initial load, replicates changes.
• Network connectivity still required.
• Unattended answer file support:
  – ReplicateFromMedia
  – ReplicationSourcePath

Install From Media (IFM)
Unattended answer file support
• ReplicateFromMedia
• ReplicationSourcePath
Media must be on a local drive.
Media useful life < 60 days.
How? Use backup files/media
• Create the first DC in the domain.
• Back up the DC.
• Restore to media (local disk, CD, …).
• C:\>dcpromo /adv
• The wizard shows an additional screen…

DCPromo Answer File
See Q223757
    [Unattended]
    Unattendmode=fullunattended
    [DCINSTALL]
    UserName=administrator
    Password=Password3
    UserDomain=corp.net
    DatabasePath=c:\windows\ntds
    LogPath=c:\windows\ntds
    SYSVOLPath=c:\windows\sysvol
    SafeModeAdminPassword=Password2
    CriticalReplicationOnly
    SiteName=Seattle
    ReplicaOrNewDomain=Replica
    ReplicaDomainDNSName=corp.net
    ReplicationSourceDC=
    ReplicateFromMedia=yes
    ReplicationSourcePath=e:\DSrestore
    RebootOnSuccess=yes
Note: leave ReplicationSourceDC blank for IFM.

File Replication Service (FRS)

Basics
FRS Background
File Replication Service
• Replicates the file system portion of policy
• Optional replication engine for DFS
Concepts
Challenges
• Journal wraps
• Staging file backlog
• Reconciliation / morphed directories

Concepts
Objects in the DS
• Members, subscribers, connection objects, filters
• Depends on AD replication
• Determines partners and schedule
NTFS USN journal
• Used by FRS to track changes to NTFS volumes
Staging file and directory
• Rename safe
• Compression support
Database
• Record of incoming, outgoing, and existing files

File Replication Service (FRS)
Replaces the NT 3.x/4.0 LMREPL service
Replicates system policy, group policy, DFS
• Group policy templates
• Ntconfig.pol and logon scripts for down-level clients
  – NETLOGON share
• DFS share contents
Multi-threaded replication engine
• Replicates different files to different computers simultaneously.

Terminology
• Computers A and B replicate DFS+SYSVOL
• B is computer A’s outbound partner
• A is B’s inbound partner
• A is B’s “upstream” partner
• Changes flow “downstream” to B
[Figure: Computer A (upstream, B’s inbound partner) replicates to Computer B (downstream, A’s outbound partner)]

Basic Operation
[Figure: DC1 replicating a GPO change to DC2]
1. A GPO change is created on DC1.
2. A temp file is moved to the staging directory.
3. Replication partners (replicas) are notified of the changes.
4. Partners (DC2) pull the changes from DC1.

File and Folder Filters
Excluded from FRS replication:
• Computer-specific EFS files/folders
• File names beginning with ~
• Files with .bak or .tmp extensions
• NTFS mount points
• Reparse points
Configurable for DFS shares

The Replication Process
[Figure: on DC1 the AD object version is updated; the GPO changes in \winnt\sysvol\sysvol\compaq.com\policies; the file goes to \winnt\sysvol\staging\domain and \winnt\sysvol\staging areas\compaq.com; partners are notified]

The Replication Process
[Figure: DC2 pulls the change from DC1 via \winnt\sysvol\sysvol\DO_NOT_REMOVE_ntfrs_PreInstall_Domain; the Sysvol version of GPT.ini and the GPO in \winnt\sysvol\sysvol\compaq.com\policies are updated]

FRS Replication
Observe the file replication process
• Edit a group policy: modify and save it.
• A copy of the changed file goes to the staging and staging areas directories.
• It is copied to the staging/staging areas directories on the other DCs.
• It is moved to the sysvol\sysvol directory on each DC.
• The group policy file is updated.

Distributed File System (DFS)

DFS Basics
Domain-based (Win2K) vs. standalone (NT)
Root
• Must be on a DC.
• Contains the PKT.
• DFS service.
Replica
• PKT from a DC, stored locally.
• DC or member server.
FRS replicates data between DCs
• Member servers: DFS replicates data to the share via the DFS service.
Site aware (clients locate the “closest” DFS replica)

The DFS Replication Process
[Figure: DC1 hosts the root and the DFS service; FRS replicates data between DC1 and DC2; SVR1 and SVR2 host replicas with data]

DFS Troubleshooting
Symptom: shared folders not in sync.
• Make sure the DFS service is started on all servers and DCs.
• Make sure AD replication is working.
• Make sure FRS is working.
• DFSUtil.exe.
• Watch for applications that keep files open.
  – Anti-virus.
  – Defragmenters.
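The name-based part of the default filter above is just three wildcard patterns: ~*, *.bak, and *.tmp. When a file “won’t replicate,” it is worth checking it against them before suspecting FRS itself. The sketch below is a case-insensitive check with Python’s fnmatch. `excluded_by_name` is a hypothetical helper, and the `patterns` parameter stands in for a custom DFS filter. EFS files, mount points, and reparse points are excluded by attribute rather than by name, so they are not modeled here.

```python
import fnmatch

# Name-based default FRS file filter from the slide: ~*, *.bak, *.tmp.
# EFS-encrypted files, mount points and reparse points are excluded by
# attribute, not by name, so they are not modeled here.
DEFAULT_FILE_FILTER = ("~*", "*.bak", "*.tmp")


def excluded_by_name(filename: str, patterns=DEFAULT_FILE_FILTER) -> bool:
    """True if `filename` matches any filter pattern (case-insensitive)."""
    name = filename.lower()  # NTFS names compare case-insensitively
    return any(fnmatch.fnmatchcase(name, p.lower()) for p in patterns)


if __name__ == "__main__":
    for f in ("GPT.INI", "~WRL0001.tmp", "logon.bat", "GptTmpl.inf.BAK"):
        print(f, "excluded" if excluded_by_name(f) else "replicated")
```

A logon script saved by an editor as `script.bak`, or a temp-named file that was never renamed, will sit on one DC indefinitely. That is expected behavior, not a replication failure.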
FRS Troubleshooting Techniques

Basics
Remember…
• You MUST install the latest service pack and hotfix.
  – Post-SP2 (SP3) hotfix Q307319
  – Don’t go any further until this is installed.
• “Multi-master” characteristics replicate changes (and problems) quickly.
  – Turn off the FRS service to get control.
• FRS depends on AD replication, which depends on DNS.

Diagnostic Tools
• Event Viewer: FRS log, DS log
• NTFRSutl.exe
  – outlog: outbound logs
  – inlog: inbound logs
  – ds: directory service
• NTFRSxxx.log in \winnt\debug
• NTFRS Health Check utility (HP, Microsoft)
• Netdiag, DCDiag
• AD replication tools

FRS Replication
What happens if it breaks?
• Changes are not replicated to all DCs, resulting in an inconsistent AD
• Group policy gets out of sync and may not get applied.
  – GPOTool: version mismatch
• Logon scripts don’t get applied.
• DFS shares are out of sync.

FRS Replication
How to tell if it’s broken
• Events in the FRS log
  – Events 1000, 1001 in the application log every five minutes.
• Files backed up in the staging areas
  – Get the size of the staging directories (MB).
  – Get the date of the oldest file (how long it has been broken).
• Group policy not applied (new changes)

Replication Problems
Ensure DNS is working.
• DNS lookup failures in events (description).
• Ping, Nslookup to resolve names.
  – Domain name
  – DC, server names
Ensure AD replication is working.
• Create new objects and see if they replicate.
• Repadmin /showreps and /showconn
• DS event log
• DCDiag

Replication Problems
Staging areas should have no files.
• Common FRS problem.
• Check the size of the directory and the dates of the files.
Ensure FRS is working.
• Create a text file on each DC, named for the DC.
• Put it in \winnt\sysvol\sysvol\<domain name>.
• All DCs should have a copy of every DC’s text file.
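Measuring the staging backlog comes down to two numbers per DC: total size, and the age of the oldest file. The age roughly says how long FRS has been stuck. The sketch below computes both by walking a directory. `staging_backlog` is a hypothetical helper. It uses each file’s last-modified time as a stand-in for when the file was staged, which is an assumption. Run it locally on each DC against the staging path, e.g. \winnt\sysvol\staging\domain.

```python
import os
import time


def staging_backlog(staging_dir: str) -> tuple[int, float, float]:
    """Return (file_count, total_mb, age_days_of_oldest_file).

    A healthy FRS staging area should be (nearly) empty. File mtime is
    used as a proxy for when each file was staged.
    """
    count, total, oldest = 0, 0, None
    for root, _dirs, files in os.walk(staging_dir):
        for name in files:
            try:
                st = os.stat(os.path.join(root, name))
            except OSError:
                continue  # file was staged out while we were walking
            count += 1
            total += st.st_size
            oldest = st.st_mtime if oldest is None else min(oldest, st.st_mtime)
    age_days = 0.0 if oldest is None else (time.time() - oldest) / 86400
    return count, total / (1024 * 1024), age_days


if __name__ == "__main__":
    import sys
    n, mb, age = staging_backlog(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{n} files, {mb:.1f} MB, oldest {age:.1f} days")
```

A few files for a few minutes is normal while changes are in flight. Files that are days old, or a size approaching the staging limit (event 13522), mean replication is backed up.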
Replication Problems: FRS Event Log
• 13508: normal… but watch them
• 13509: success after having 13508s
• 13514: Sysvol share not created (“FRS preventing computer from becoming a DC”)
• 13553, 13554: FRS successfully added the computer to the replica set (DCPromo successful)
• 13557: duplicate connection objects
• 13522: staging area full (Q264822)
• Lots of KB articles: search for “FRS and Event”

Interpreting the Logs
• NTFRS_000x.log in \WINNT\DEBUG
• Identify errors, warning messages and milestone events in the log files.
• Very difficult to interpret.

NTFRSutl.exe
• Ntfrsutl inlog: lists the inbound log
• Ntfrsutl outlog: lists the outbound log
• Ntfrsutl sets: lists replica sets
• Ntfrsutl ds: FRS’s view of the DS
• Can execute remotely: Ntfrsutl sets DC1

Group Policy Troubleshooting

Group Policy Troubleshooting Basics
Policy isn’t getting applied.
• Set something easy: Administrative Templates
– User settings: log off/on
– Computer settings: reboot
• Client-side extensions act as separate policies; debug them separately from Administrative Templates:
– Folder Redirection
– Scripts
– Disk Quotas
– Security
– IE Branding
– EFS Recovery
– IPSec
– Application Management

Group Policy Troubleshooting Basics
Policy applied, but settings not effective.
• Userenv.log (verbose): Q221833
• Set diagnostic logging: Q186454
HKLM\Software\Microsoft\Windows NT\CurrentVersion\Diagnostics
Value: RunDiagnosticLoggingGroupPolicy
Value type: REG_DWORD
Value data: 3 (range 0-5; 0 = off)
– Change one setting in the GPO.
– Log off/on or reboot.
– Verbose information appears in the Application log.
– It lists all registry settings applied to the user.
– Turn it off afterward: it fills the event log fast!

Gpresult.exe
• Resource Kit command-line utility.
• Reports applied policy for the user and computer:
– DN
– Security groups
• Verbose mode: gpresult /v
– Registry settings
– Computer: client-side extensions
WATCH:
• Logon server.
• Cached policy on the client may mask the solution.
• Refresh policy: make sure it’s applied.

GPOtool
• Resource Kit command-line utility.
• Run on a DC only.
• Version comparison: AD vs. Sysvol.
– AD version is set immediately on change.
– Sysvol version is set after FRS replication.
• Friendly name/GUID association:

Policy {08FAB736-9628-41D5-B5A8-37A0F98D7E43}
Policy OK
Details:
------------------------------------------------------------
DC: Qtest-DC2.qtest.cpqcorp.net
Friendly name: Folder Redirection Policy

Solving Version Mismatch
A small mismatch is normal.
• After a change, until FRS replication completes.
• Be patient: see if it resolves.
A big mismatch is bad.
• Prevents application of policy.
• Indicates unreplicated changes.
• Manually set the Sysvol (FRS) version equal to the AD version in:
– %windir%\sysvol\sysvol\<domain>\policies\{guid}\gpt.ini
– You will lose changes.

Resetting Default Domain Policy or Default DC Policy
These policies always have the same GUID:
• Default Domain: {31B2F340-016D-11D2-945F-00C04FB984F9}
• Default DC: {6AC1786C-016F-11D2-945F-00C04FB984F9}
If changes have made a mess, you need to restore the defaults.
To restore security defaults only, import the BasicDC.inf template (Q258595).
If settings are hosed, copy an original copy of the policy to winnt\sysvol\sysvol\<domain>\policies.
• Copying policies is only supported for these two cases.
• Other policies will have different GUIDs.
• You can’t copy other policies from one forest to another for debugging.

How to Copy the Default Domain and Default DC Policy
1. Get a clean, default copy of the policy folder:
– Restore the policy folder (GUID) from backup, or
– Create a new domain and copy the GUID folder from that machine.
– Don’t zip it.
2. Delete the existing policy.
3. Wait for replication.
4. Copy the new policy folder to winnt\sysvol\sysvol\<domain>\policies.
5. Wait for replication.
6. Run GPOtool to make sure it shows up on all DCs.

Unable to Edit Group Policy
Group policy is changed on the PDC by default.
If the PDC is not available:
• Dialog: change on any DC, on the current DC, or not at all.
• Error: unable to contact the domain (no DC).
Solution: transfer or seize the PDC role to another DC.
You can set policy to NOT use the PDC… Don’t!
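GPOtool’s version comparison can be reproduced by hand when you need to see how big a mismatch is. A GPO version number packs two counters into one 32-bit value: the upper 16 bits count user-settings revisions and the lower 16 bits count computer-settings revisions. The Python sketch below (all function names are hypothetical) decodes both halves and compares the AD value with the `Version=` line in the policy’s gpt.ini. It assumes you have already read the `versionNumber` attribute of the policy’s groupPolicyContainer object by some other means. It only reports and never rewrites gpt.ini, because forcing the versions equal loses changes.

```python
import configparser
from typing import Tuple


def split_gpo_version(version: int) -> Tuple[int, int]:
    """Split a GPO version into (user, computer) revision counters.

    Upper 16 bits: user settings. Lower 16 bits: computer settings.
    """
    return (version >> 16) & 0xFFFF, version & 0xFFFF


def sysvol_version(gpt_ini_text: str) -> int:
    """Read Version= from the [General] section of a gpt.ini file's text."""
    parser = configparser.ConfigParser()
    parser.read_string(gpt_ini_text)
    return parser.getint("General", "Version")


def compare_versions(ad_version: int, gpt_ini_text: str) -> str:
    """Report whether the AD and Sysvol versions of one GPO agree."""
    ad_user, ad_computer = split_gpo_version(ad_version)
    sv_user, sv_computer = split_gpo_version(sysvol_version(gpt_ini_text))
    if (ad_user, ad_computer) == (sv_user, sv_computer):
        return "OK"
    return (f"MISMATCH: AD user={ad_user} computer={ad_computer}; "
            f"Sysvol user={sv_user} computer={sv_computer}")


if __name__ == "__main__":
    gpt_ini = "[General]\nVersion=131077\n"  # hypothetical gpt.ini contents
    print(compare_versions(196613, gpt_ini))  # hypothetical AD versionNumber
```

For example, a version of 196613 (0x00030005) decodes as user revision 3 and computer revision 5. A difference of one or two revisions usually means FRS has not caught up yet. A large gap matches the “big mismatch” case above.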
Using Userenv.log to Solve Group Policy Problems
• Turn on verbose logging: Q221833
• Interpreting group policy information in userenv.log

Debugging Logon Scripts (script doesn’t apply)
• Configure it via the group policy snap-in.
• Make sure the policy is applied:
– Set a desktop setting.
– Use Gpresult /v.
– Enable verbose logging for Userenv.log.
• Turn on “Run logon scripts visible.”
• Create a simple logon script as a .bat file to make sure it’s not the script that is failing.
Example: using Userenv.log to find script errors.

Can’t Find FSMO Role Holder
Problem: an operation is trying to contact a FSMO role holder (PDC Emulator or…?)
• Can ping it by name, so it seems to be OK.
• The operation can’t find it.
Solution:
• Find out who holds that role: netdom query fsmo (returns a quick list).
• Transfer the role to a local DC.

Group Policy Refresh Anomaly
Users complain of an intermittent 5-25 second “hang” in any application: Outlook, Word, third-party apps.
Keystrokes are buffered and users can continue to work.
There was a direct correlation between the 1704 events (GP refresh) and the “hang.”
Changing the refresh interval via group policy changed the frequency of the “hang.”

Group Policy Refresh Anomaly
Cause: SceCli applies group policy every 16 hours (default) even if no GPO changes have occurred (DCs: every 5 minutes).
• Broadcasts WM_SETTINGCHANGE to all top-level windows.
• Wakes up sleeping processes, causing massive paging in and out of memory, which causes the hangs.
• More pronounced on “slower” computers.
Solution: configure the policy refresh interval in group policy so the refresh occurs every 12 hours, at midnight and noon, when users don’t notice it.
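The refresh-anomaly diagnosis rested on matching user hang reports against 1704 events. Once both sets of timestamps are exported (the export method is up to you, and the data below is hypothetical), a short Python sketch can count how many hangs start shortly after a refresh. The function name and the 60-second window are assumptions. Widen or narrow the window to fit how precisely users reported the hangs.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List


def hangs_near_events(event_times: List[datetime], hang_times: List[datetime],
                      window: timedelta = timedelta(seconds=60)) -> List[datetime]:
    """Return the hang reports that fall within `window` after some 1704 event."""
    events = sorted(event_times)
    matched = []
    for hang in hang_times:
        i = bisect_right(events, hang)  # events[:i] all happened at or before the hang
        if i and hang - events[i - 1] <= window:
            matched.append(hang)
    return matched


if __name__ == "__main__":
    # Hypothetical 1704 event times and user hang reports.
    events = [datetime(2002, 10, 6, 10, 0, 0), datetime(2002, 10, 6, 11, 30, 0)]
    hangs = [datetime(2002, 10, 6, 10, 0, 20),
             datetime(2002, 10, 6, 10, 45, 0),
             datetime(2002, 10, 6, 11, 30, 5)]
    near = hangs_near_events(events, hangs)
    print(f"{len(near)} of {len(hangs)} hangs began within 60 s of a 1704 event")
```

Sorting the events once and using `bisect_right` keeps each lookup logarithmic, which matters when a DC’s log holds thousands of 1704 events.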
Account Lockout
• Background
• Finding locked-out user accounts
• Client bugs and fixes
• Server bugs and fixes
• Resolution and futures

Lockout Reasons & Options
• Prevents spoofing or hijacking of accounts
• Optional event logging in Audit Policy
Account Lockout Options
• Timed lockout
– Account re-enabled after an admin-defined time
• Hard lockout
– Account disabled until reset by an admin
• Lockout policy defined in group policy
– Single lockout and password policy per domain
– Location: Default Domain Policy

Account Lockout on DCs
Each DC records the number of bad password attempts.
BDCs check the PDC for the latest password.
All bad password attempts are seen by the PDC.
• The PDC is always first to lock out the account.
• The PDC urgently replicates the lockout when the threshold is reached.
• Bad password attempts are not replicated by DCs.
BadPasswordCount is reset to 0 on the first good password.

PDC Chaining Operations
If a BDC fails authentication with:
• STATUS_WRONG_PASSWORD
• STATUS_PASSWORD_EXPIRED
• STATUS_PASSWORD_MUST_CHANGE
• STATUS_ACCOUNT_LOCKED_OUT
(referred to as “BadPasswordStatus”),
the BDC chains authentication to the PDC.
• It returns the PDC’s status if that status is success or one of those listed above.
• Otherwise, it ignores the PDC’s status and uses the local status.
Exceptions to PDC chaining
• AvoidPDCOnWan enabled and PDC in a remote site (Q225511)
• 10 “BadPasswordStatus” events logged in 10 minutes
– NegativeCache enhancement: Q263821
– Cache reset after a good password is entered

Troubleshooting Account Lockouts
Your goal: answer the 4 W’s
• Who, Where, When and Why
Environment setup
• Enable auditing in the domain policy:
– Account Logon Events: Failure
– Account Management: Success
– Logon Events: Failure
– Security event log on DCs: 10K events + overwrite
• Enable Netlogon logging (NTLM clients)
– NLTEST /DBFLAG:2080FFFF (no reboot)
• Enable Kerberos logging
– Q262177: Kerberos logging (Kerberos clients)

Account Lockout: Where
DC resources
• NTLM clients
– Search DC and client NETLOGON.LOG files for lockouts
– 0xC000006A = bad password
– 0xC0000234 = account lockout
• NTLM + Kerberos clients
– Search DS event logs (see Q230254, Q299475, Q273499 and Q301677 for descriptions):
– 644: NTLM + Kerberos lockout event
– 675: Kerberos bad password
– 681: NTLM bad password
– 529: Failed logon
– 531: Account disabled
Tools
• EVENTCOMB
• AL.EXE
• NETMON.EXE
[Screenshots: EVENTCOMB, AL.EXE]

Account Lockout: Why
Attack, “pilot error” or bug
• Wrong password entered, misconfigured service account
Scenario
• Account type: user, computer or service account
• Lockout trigger? (logon, drive access, following a password change)
Drill down: look at time of day, pattern and frequency
• Process-related lockouts
– Structured pattern
– Logged when users are not present
– Look for common services, applications, client configuration
• User-related lockouts
– Random pattern
– Fewer events logged
– Look at shortcuts, mapped drives, logon scripts, applications

Account Lockout: Client
Win9X
• Q278558: Access denied to a mapped drive after disconnect
• Q272594: Client can’t log on after logoff without a reboot
• Q293793: VREDIR loses file tracking structures
• Q271496: One unsuccessful logon attempt triggers lockout (1:3)
– Net use + dsgetdc + logon attempt.
• Q266772: Logon fails if a Unicode string password is passed to NTLM SSPI
DS Client on Windows 95, Windows 98, Windows 98 Second Edition
• DSCLIENT *MUST* be installed before any hotfixes!
– Q301344, Q283261
– DS Client lets the Win98 account lockout fixes work on Win95
Win2K
• Q275508: User locked out when accessing home directory after changing password
• Hotfix or SP2
Windows XP
• None

Account Lockout: Server Fixes
Read the server-side KB articles.
• Q287639: Win9x clients locked out after unlock
– The MSV1 package checks the password against the BDC with the old password during the second phase of logon
• Q278299: Bad password count not reset to 0 (NTLM)
– The original hotfix had a regression. Confirm the latest version is deployed.
• Q263821: Bad password count not reset to 0 (Kerberos)
• Q292573: DSA.MSC and ADSI may not use the same DC (WinSERaid:16662; post-SP2 hotfix)
Resolution
• Windows 2000 DCs: install SP2 + Q314282
– Same QFE as the lingering object and other good DC fixes
• Service Pack 3

PDC FSMO Load Reduction
Windows 2000 domains are much larger than their NT 4 predecessors
• e.g., > 50,000 clients
NT 4 and Win9X clients are still deployed and target only the PDC for updates.
Windows 2000/XP clients use Windows 2000 DCs in mixed-mode domains (Q284937).
Older applications select only the PDC rather than any DC.
Applications may enumerate the whole domain (NT 4 usrmgr, srvmgr).
Result: the PDC gets more load.

Symptoms of Overload
• High CPU utilization for a long period
– Greater than 70%
• High average disk queue
– Disk queue > number of spindles
• Timeout of requests
– Password changes

Steps to Optimize the PDC
• Optimize hardware and software
• Hide the PDC from DNS clients
• Implement WINS optimizations
• Block down-level enumeration
• Put the PDC in a dummy site

Optimize Hardware & Software
Run Windows 2000 Advanced Server with the /3GB switch
• Enables an ESE cache of 1.5 GB
A 4-processor server is optimal
2 GB RAM
Disk
• RAID 1 set for OS and page file
• RAID 1 set for log files
• RAID 0+1 for NTDS.DIT and sysvol
Run only core DC services

Hiding Techniques (DNS)
Lower the PDC SRV priority
• Reduces the chance of DS-aware clients selecting the PDC before other DCs
• HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\LdapSrvPriority=1000
• Data type: REG_DWORD
PDC-only site
• Clients will use it only as a last resort
• Create a site link to the real site
Disable AutoSiteCoverage on the PDC
• HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\AutoSiteCoverage=0

Hiding Techniques (WINS)
Down-level clients locate DCs through 1C queries.
WINS always adds the PDC first in the 1C list.
Remove the PDC from the top of the list (SP2): Q269424
– HKLM\System\CCS\Services\WINS\Parameters
– Value name: Add1Bto1CQueries
– Data type: REG_DWORD
– Value data: 0 = disabled, 1 = enabled (default)
Randomize the 1C list for general load balancing
– HKLM\System\CCS\Services\WINS\Parameters
– Value name: Randomize1cList
– Data type: REG_DWORD
– Value data: 0 = disabled, 1 = enabled
– Q231305 (NT4 SP4 and later)

Block Enumeration
Old (non-DS-enabled) applications often call SAM APIs to enumerate the entire domain.
• Hard to control
Block unauthorized users from seeing more than 100 objects per call
• A new access control right determines access
• HKLM\System\CCS\Control\Lsa\SamDoExtendedEnumerationAccessCheck=1
• Q268339

Misc.: Server Applications
Server-based applications can create frequent changes in the directory
• Agent-based systems
– Create and delete accounts
– Grant accounts rights in the domain
Changes create replication
• AD replication for frequent group changes
• FRS replication for policy changes
Apply SMS hotfixes
• Q311127, Q278345
• Read the articles; configuration is necessary

Distributed Link Tracking
Purpose
• Used to track moves of linked files across volumes and servers (shell shortcuts)
• Uses AD objects to track files and volumes
Objects stored in the DS
• A linkTrackVolEntry object for each NTFS volume in the domain
• A linkTrackOMTEntry created for each linked item that is moved
• Clients query the service when a shell shortcut or OLE link can’t be resolved
Clients refresh links every 30 days
DCs scavenge objects older than 90 days

Distributed Link Tracking
DLT is an optional service
• Enabled by default
Typically not included in DS capacity planning
Best Practices
• Disable on all DCs
– Reduces AD replication traffic
– Reduces AD database size
• Use Group Policy to disable the DLT server service on DCs
• Remove objects from the DS
– Use a staggered approach
• Q312403

DC/GC Promotion Considerations
• DC promotion/demotion
• Process to clean up after a failed promotion
• GC promotion
• GC demotion

DC Promotion / Demotion
Create proper sites beforehand
Failed promotion or removing a server
• Manually clean out metadata from any failed attempt:
– When replacing a failed DC
– When a DCPROMO has failed
– To clean metadata, use NTDSUTIL
– FRS member/subscriber objects
– Machine account in the domain
• Allow replication to all DCs before promoting again

GC Promotion
The first GC in a site may go online before all partitions are replicated
• Default: the GC will advertise after all partitions in the site replicate
• Exchange may use the GC before it is ready
• Mail may bounce
Best Practice
• Stop Netlogon
• Mark the DC as a GC
• Use repadmin to monitor success
• Start Netlogon once all NCs have replicated
SP3 will wait for all partitions to replicate before advertising.

GC Demotion
GC removal requires time for object removal.
The KCC removes 500 objects per 15-minute cycle (default).
Best Practice
• Monitor for event 1069 to record progress
• Force GC removal when needed (Q297935)
– Remove each partition with repadmin:
– repadmin /delete DC=globalit,DC=unity,DC=com %destgc% /nosource

Container-Inheritable ACEs
An ACE that applies to either all objects or objects of a specific class in a container
• Example: delegate the right to reset user passwords in one OU
Security descriptor propagation copies the ACE to all objects
• Makes access checks very fast
– All information is on the directory object
• Class-specific ACEs are also copied to all objects
– Example: the ACE used to delegate the right to reset user passwords is also copied to computer and container objects
Increases object size, and therefore database size
• The increase is proportional to the size of the subtree
– If set on the domain root: highest impact
– If set on an OU: lower impact (depends on the number of objects in the OU)
• Low impact if set on the schema or configuration container
SD propagation is asynchronous
• Takes time to propagate (e.g., 3 hours in a 50,000-user domain)

Container-Inheritable ACEs: Best Practices
Don’t add container-inheritable ACEs to the domain root.
Add them on OUs as appropriate.
• Best-practice documentation recommends OUs for
– Users
– Groups
– Computers
• Container-inheritable ACEs on these OUs have only a small impact
Watch SD propagator events
• SD propagation running: 1257 (level 2)
• SD propagation report (objects touched): 1258 (level 2)
• SD propagation terminated abnormally: 1262 (level 0)
Always leave sufficient disk space on the database partition
• 20% of database size, at least 500 MB
• Monitor!
Test ACL changes in a lab or pilot domain to bracket the size increase.

Container-Inheritable ACEs: The Future
Windows .NET will have a single-instance store for security descriptors
• Objects have links to security descriptors
• If a container-inheritable ACE changes, only one SD changes
– No impact on disk size
Does not require a .NET-only forest
• SD propagation happens on the local DC
• Transparent to other DCs
• Feature available immediately
Monitor SD propagation events after upgrading a DC
• The SD propagator builds the single-instance store after the domain controller boots .NET for the first time
The database will shrink after the OS upgrade
• You need to defragment the database offline to see the changes

Forest Recovery
Imagine the unthinkable
• All domain controllers crash and won’t reboot
• Data corruption replicates through the forest
• The schema becomes unavailable
• Somebody made changes to the schema that prevent standard applications from installing
• A malicious administrator performs irreparable damage to the schema that replicates through the forest
• You lose your root domain
• You win the lottery
So far, this has never happened
• But you want to be prepared

Forest Recovery: Rolling Back in Time
[Diagram: timeline of periodic backups, ongoing changes, a catastrophic event and its identified root cause; restoring to a backup taken before the root cause loses the changes made since that backup.]

Forest Business Recovery: High-Level Steps
Shut down all domain controllers in the forest
In each domain
• Restore one DC from a good backup tape
• Reinstall the OS on all other domain controllers
• Re-promote all other domain controllers
Start with the root domain first

Forest Recovery
Shut down all DCs
Restore one DC per
domain (off-network)
Disable the GC service
Break replication
Seize FSMO roles
Increase the RID pool by 100,000
Bring the restored DCs back on the network
Enable GC on at least one root DC

Forest Recovery
Reinstall the OS on all other DCs
Promote all other DCs
Enable the GC service as needed
Move FSMOs as needed

Forest Recovery
Detailed steps available very soon in a white paper on microsoft.com
• Best Practice for Recovering your Active Directory Forest

FRS Concepts Revisited
Objects in the DS
• Members, subscribers, connection objects, filters
• Depends on AD replication
• Determines partners and schedule
NTFS USN journal
• Used by FRS to track changes to NTFS volumes
Staging file and directory
• Rename-safe
• Compression support
Database
• Record of incoming, outgoing and existing files

FRS Replication Operation
[Diagram: FRS replication between two NTFS drives]
Originating member:
• A file is created or modified.
• FRS learns of file changes from the NTFS “USN change journal.”
• Unwanted files are filtered out.
• The aging cache waits 3 seconds.
• The staging file is built.
• An entry is written in the FRS ID table.
• The outbound (OB) log is written.
• A change order is sent to the partner.
Partner:
• The change is written to the inbound and ID logs.
• The change is requested.
• The replica copies the file to its staging directory.
• The file is copied into the pre-install area.
• The file is renamed and moved to its final location.
• The change is written to the OB log for other replicas.

Journal Wraps / Staging Backlog
The NTFS USN journal is a fixed-size log of file changes
• The FRS service must run to keep up with these changes
• The last change recorded in the FRS DB must still exist in the NTFS journal
– If not, FRS cannot know all changes.
– This is called a “journal wrap.”
• Resolution
– Keep the service running (especially during bulk modifications)
– Increase the size of the USN journal (automatic in the SP3 rollup)
Staging file backlog
• Before SP3, staging files are kept until all direct partners have received them
– Associated with connections
• Common causes of backlogs:
– Offline downstream partners
– Full syncs by administrators or applications
– Antivirus, disk optimizers, file system policy
• Sharing violations / move-in problems

Reconciliation & Morphed Directories
Files: last writer wins
• All change orders have event times (UTC)
• The event time of the CO is compared to the ID table
– Event times more than 30 minutes apart: last writer wins
– Event times within 30 minutes: highest version wins
Folders: last writer wins
• A conflicting change gets a morphed name
– Preserves files associated with the directory
– First writer wins for name conflicts of folders
• Causes
– BURFLAGS abuse
– Conflicting creates during a replication failure

FRS Enhancements (Q319473)
QFE roll-up of the upcoming Service Pack 3 changes
• Increases the NTFS USN journal to 128 MB
• Dynamic staging file relocation
• LRU staging files deleted: 60/90 rule
• Staging files for offline partners deleted
• SYSTEM = Full Control / NTFS bug
• Duplicate changes not sent on the wire + event
• Office XP (Excel) data deletion fix

Topology Enhancements
DFSGUI from .NET Server
• Runs on XP clients in Windows 2000 domains
• Available on microsoft.com now: Q304718
New topology options
• Full mesh, ring, simple hub & spoke
• Custom topologies
• Connection tuning
– Enable/disable individual connections
– Change orders are associated with connections
– Disabling connections deletes the associated backlog
Connection priority (may pull this)
• Bit on the options attribute of the connection object
• Defines partners used during initial/recovery sync
– High: “must” source all connections in class
– Medium: source from at least 1 connection in class
– Low: “best effort” sync

FRS Best Practices
Run Q307319 + new NTFS.SYS
Keep the service running
• Avoids
journal wraps
Join empty replica sets
Don’t place DFS targets on the OS partition
DFS: enable replication on child links
• Targets can be taken offline
• Incremental sourcing & advertisement of data
• Replica-set-specific burflags
Properly size the staging directory
• 128 largest files + 50%, or 650 MB minimum
Don’t delete files from the staging directory
• Change orders, # of VV joins, file size

FRS Best Practices
Topology management
• No full mesh
• SYSVOL: requires 1 inbound/outbound CO
Forceful deletion of FRS members
• Delete member and subscriber objects

Tools
NTFRSUTL
• NTFRSUTL DS
– Repadmin /showconn for FRS
– DS object inventory + topology review
• NTFRSUTL SETS
– Repadmin /showreps for FRS
– Downstream partner sync status
• NTFRSUTL INLOG | OUTLOG | IDTABLE
– Inbound + outbound changes + tree inventory
Debug logs: %systemroot%\debug\ntfrs_*.log
• Two-way conversation between partners

Summary
All deployments should run SP2
Deploy SP3 when available
Q314282 provides a roll-up fix for many issues
• Lingering objects
• Account lockouts
• PDC overload situations
Monitor Active Directory

New Documentation
Available on microsoft.com
• Best Practices for Active Directory Delegation
– http://www.microsoft.com/windows2000/techinfo/planning/activedirectory/addeladmin.asp
Coming soon
• Active Directory Monitoring Guidelines and Key Indicators
• Active Directory Forest Recovery
Eventcomb
– http://download.microsoft.com/download/win2000adserv/secops/RTM/NT 5/EN-US/SecOps.exe
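The staging-directory sizing rule in the FRS best practices can be turned into a quick calculation. This Python sketch reads “128 largest files + 50%, or 650 MB minimum” as the total size of the 128 largest files in the replica tree, plus 50% headroom, and never less than 650 MB. That interpretation, the function names and the default path `C:\WINNT\sysvol\domain` are assumptions. Point it at the actual SYSVOL or DFS target tree.

```python
import heapq
import os
import sys
from typing import Iterable, Iterator

MB = 1024 * 1024


def file_sizes(tree: str) -> Iterator[int]:
    """Yield the size in bytes of every file under tree."""
    for root, _dirs, files in os.walk(tree):
        for name in files:
            yield os.path.getsize(os.path.join(root, name))


def recommended_staging_mb(sizes: Iterable[int], largest_n: int = 128,
                           headroom: float = 0.5, floor_mb: float = 650.0) -> float:
    """Return (sum of the largest_n files) * (1 + headroom) in MB, but at least floor_mb."""
    biggest = heapq.nlargest(largest_n, sizes)
    return max(sum(biggest) * (1 + headroom) / MB, floor_mb)


if __name__ == "__main__":
    # Hypothetical default; pass the replica tree to size on the command line.
    tree = sys.argv[1] if len(sys.argv) > 1 else r"C:\WINNT\sysvol\domain"
    print(f"Recommended staging size: {recommended_staging_mb(file_sizes(tree)):.0f} MB")
```

`heapq.nlargest` keeps only the 128 biggest sizes in memory while streaming through the tree, so the scan stays cheap even on large DFS targets. Small replica sets simply land on the 650 MB floor.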