Documentation of Data Manipulation

Initial data was taken from generated canned reports (reports with a pre-defined field structure), which explains the flatness and redundancy of the columns. We did a first level of normalization and removed PII and sensitive financial information. The original 64 columns were narrowed down to our MAIN Dataset, with 12,692 rows and 30 columns.

ADDS
Added Region Name to the Main spreadsheet by matching each row's HID against the Hospital by Region spreadsheet using VLOOKUP.
Added Months until Regularization and Tenure in Years.
Months until Regularization = RegularizationDate - HireDate (if never regularized, value is NULL)
Tenure in Years = SeparationDate - HireDate (if still employed in the hospital, value is RecordDate - HireDate)
Extracted Reason for Leaving and SeparationType from the Attrition Data spreadsheet.

CHANGES/UPDATES
Changed the Rank of all rows with Employment Classes Executive, Head, Manager, and Managerial to Managerial. Did not do this for Employment Classes Supervisor and Supervisory.
Changed (58) Regular employees with negative Months until Regularization to Not Disclosed.
Changed the statuses of (155) rows with mistagged probationary statuses.
Changed the ages and birthdates of (135) rows with NULL/invalid ages.
Fixed naming conventions of Employment Class/Rank.
Some (21) entries with an EmploymentStatus of Active have SeparationDates. Will convert their EmploymentStatus to Separated.

REMOVALS
Removed the Province column, as all values are NULL.
Only 5,434 were disclosed to be regularized.
Removed entries with negative tenures (18).
Removed (75) rows that are classified as Separated but have no SeparationDates.
Removed the Course column, as 11k values are NULL.
Removed (3) rows with employee birthdates in the future.
Removed the "Project" column. All values are NULL.
Removed LicenseReleased/LicenseExpdate, JobGrade and RateClass, as these variables are unusable or unimportant for our purpose.
Removed all (33) rows that have birthdates of < 6 years.
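The two derived columns listed under ADDS can be sketched in Python as follows. This is only an illustration, not the spreadsheet formulas actually used: the function names are hypothetical, and the rounding choices (whole calendar months, years as days divided by 365.25) are assumptions, since the notes only state the date differences.

```python
from datetime import date
from typing import Optional


def months_until_regularization(hire: date, regularized: Optional[date]) -> Optional[int]:
    """RegularizationDate - HireDate in whole months; None (NULL) if never regularized."""
    if regularized is None:
        return None
    months = (regularized.year - hire.year) * 12 + (regularized.month - hire.month)
    # Assumption: a partial final month is not counted.
    if regularized.day < hire.day:
        months -= 1
    return months


def tenure_in_years(hire: date, separated: Optional[date], record: date) -> float:
    """SeparationDate - HireDate, or RecordDate - HireDate if still employed."""
    end = separated if separated is not None else record
    # Assumption: years are approximated as days / 365.25, rounded to 2 decimals.
    return round((end - hire).days / 365.25, 2)
```

A negative result from either function corresponds to the inconsistent rows handled under CHANGES/UPDATES and REMOVALS (regularization or separation dated before the hire date).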

FINAL COUNT: 10,521 rows
Specific attrition data only covers 765 employees.
Link to Dataset: Combined Hospital_Data.xlsx
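
For reference, a few of the row-level rules from CHANGES/UPDATES and REMOVALS can be expressed as a single function. This is a rough sketch under stated assumptions: each row is a dict, the key names (`EmploymentClass`, `Rank`, `Birthdate`) and the `record_date` parameter are hypothetical stand-ins for the spreadsheet columns, and the order in which the rules are applied is assumed. It covers only a subset of the rules listed and was not used to produce the final count.

```python
from datetime import date
from typing import Optional

# Employment Classes whose Rank is collapsed to "Managerial"
# (Supervisor and Supervisory are intentionally left out).
MANAGERIAL_CLASSES = {"Executive", "Head", "Manager", "Managerial"}


def clean_row(row: dict, record_date: date) -> Optional[dict]:
    """Return a cleaned copy of the row, or None if the row should be removed."""
    row = dict(row)  # work on a copy so the input row is not modified

    # CHANGES/UPDATES
    if row.get("EmploymentClass") in MANAGERIAL_CLASSES:
        row["Rank"] = "Managerial"
    if row.get("EmploymentStatus") == "Active" and row.get("SeparationDate") is not None:
        row["EmploymentStatus"] = "Separated"

    # REMOVALS
    if row.get("EmploymentStatus") == "Separated" and row.get("SeparationDate") is None:
        return None  # Separated but no SeparationDate
    birth = row.get("Birthdate")
    if birth is not None and birth > record_date:
        return None  # birthdate in the future
    end = row.get("SeparationDate") or record_date
    if end < row["HireDate"]:
        return None  # negative tenure
    return row
```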