Uploaded by lito_bautista

Cleaning Documentation

Documentation of Data Manipulation
The initial data came from generated canned reports (reports with a predefined field structure), which explains the
flat layout and redundant columns. We applied a first level of normalization and removed PII and sensitive financial
information.
The original 64 columns were narrowed down to our MAIN Dataset, with 12,692 rows and 30 columns.
ADDS
Added Region Name to the Main spreadsheet by matching each HID against the Hospital by Region spreadsheet using
VLOOKUP.
Added Months until Regularization and Tenure in Years.
Months until Regularization = RegularizationDate - HireDate
(If never regularized, value is NULL)
Tenure in Years = SeparationDate - HireDate
(If still employed in hospital, value is RecordDate - HireDate)
Extracted Reason for Leaving and SeparationType from Attrition Data spreadsheet.
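The additions above were done in the spreadsheet itself. As a rough sketch only, the same logic could be written in Python as shown here. The lookup table contents, the dictionary keys (RegionName, MonthsUntilRegularization, TenureInYears) and the day-to-month and day-to-year conversion factors are assumptions for illustration. The documentation does not state how date differences were converted to months or years.

```python
from datetime import date

# Hypothetical lookup table standing in for the Hospital by Region spreadsheet.
REGION_BY_HID = {"H001": "Region A", "H002": "Region B"}

DAYS_PER_MONTH = 30.4375  # assumption: average month length (365.25 / 12)
DAYS_PER_YEAR = 365.25    # assumption: average year length


def months_until_regularization(hire_date, regularization_date):
    """Months between hire and regularization, or None if never regularized."""
    if regularization_date is None:
        return None
    return (regularization_date - hire_date).days / DAYS_PER_MONTH


def tenure_in_years(hire_date, separation_date, record_date):
    """Use SeparationDate when present, otherwise fall back to RecordDate."""
    end_date = separation_date if separation_date is not None else record_date
    return (end_date - hire_date).days / DAYS_PER_YEAR


def add_derived_fields(row):
    """Return a copy of the row with the three added fields."""
    out = dict(row)
    # Like an exact-match VLOOKUP, an unknown HID yields no region (None here).
    out["RegionName"] = REGION_BY_HID.get(row["HID"])
    out["MonthsUntilRegularization"] = months_until_regularization(
        row["HireDate"], row["RegularizationDate"]
    )
    out["TenureInYears"] = tenure_in_years(
        row["HireDate"], row["SeparationDate"], row["RecordDate"]
    )
    return out
```

Each row is treated as a plain dictionary with date values, so the sketch does not depend on any spreadsheet library.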
CHANGES/UPDATES
Changed the Rank to Managerial for all rows with Employment Classes: Executive, Head, Manager, Managerial.
Rows with Employment Classes: Supervisor and Supervisory were left unchanged.
Changed negative Months until Regularization values of regular employees (58) to Not Disclosed.
Changed the statuses of (155) rows with mistagged probationary statuses.
Changed Ages and Birthdates of (135) rows with NULL/invalid ages.
Fixed naming conventions of Employment Class/Rank.
Some (21) entries with an EmploymentStatus of Active have SeparationDates. We will convert their EmploymentStatus to
Separated.
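Two of the rules above, the Managerial rank consolidation and the Active-to-Separated correction, are mechanical enough to sketch in code. This is an illustration under assumptions, not the procedure actually used. The dictionary keys EmploymentClass and Rank are assumed column names, and rows are assumed to be plain dictionaries.

```python
# Employment Classes that the documentation folds into the Managerial rank.
MANAGERIAL_CLASSES = {"Executive", "Head", "Manager", "Managerial"}


def consolidate_rank(row):
    """Set Rank to Managerial for the listed classes; leave other rows as-is."""
    out = dict(row)
    if out["EmploymentClass"] in MANAGERIAL_CLASSES:
        out["Rank"] = "Managerial"
    return out


def fix_active_with_separation(row):
    """An Active row that carries a SeparationDate is re-tagged as Separated."""
    out = dict(row)
    if out["EmploymentStatus"] == "Active" and out["SeparationDate"] is not None:
        out["EmploymentStatus"] = "Separated"
    return out
```

Supervisor and Supervisory are deliberately absent from the set, so those rows pass through unchanged.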
REMOVALS
Removed Province column as all values are NULL.
Only 5,434 employees were disclosed as regularized.
Removed entries with negative tenures (18).
Removed (75) rows that are classified as Separated but have no SeparationDates.
Removed Course column as about 11k of its values are NULL.
Removed (3) rows with employee birthdates in the future.
Removed Project column as all values are NULL.
Removed LicenseReleased/LicenseExpdate, JobGrade and RateClass as these variables are unusable or unimportant for our
purpose.
Removed all (33) rows whose birthdates imply an age of less than 6 years.
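The row-level removals above can be summarized as a single filter. The sketch below is one possible reading, not the exact procedure used. It assumes ages are measured as of RecordDate, that "in the future" means later than RecordDate, that every row has a non-NULL Birthdate, and that the column is named Birthdate.

```python
from datetime import date

MIN_AGE_YEARS = 6  # rows implying a younger age are dropped


def age_in_years(birthdate, as_of):
    """Whole years elapsed between birthdate and as_of."""
    years = as_of.year - birthdate.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def should_remove(row):
    """Return True if the row matches any of the documented removal rules."""
    hire, sep = row["HireDate"], row["SeparationDate"]
    record, birth = row["RecordDate"], row["Birthdate"]
    end_date = sep if sep is not None else record
    if end_date < hire:
        return True  # negative tenure
    if row["EmploymentStatus"] == "Separated" and sep is None:
        return True  # Separated but no SeparationDate
    if birth > record:
        return True  # birthdate in the future (relative to RecordDate)
    if age_in_years(birth, record) < MIN_AGE_YEARS:
        return True  # implied age under 6 years
    return False
```

Given a list of row dictionaries named rows, the kept records would be `[r for r in rows if not should_remove(r)]`.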
FINAL COUNT: 10,521 Rows
Specific Attrition data only covers 765 employees.
Link to Dataset: Combined Hospital_Data.xlsx