A Design-Driven Approach to A.I. Explainability
CUNY SPS MS Data Science, DATA 698
Prof Sabrina Khan
Rob Hodde
May 17, 2022

Introduction
* Leading A.I. models are black boxes
* People are afraid of black boxes
* This is not good

And we’re worried plenty…

What is being done?
Introducing: XAI (Explainable Artificial Intelligence)

So what’s wrong with XAI?
It’s another black box!

We propose a transparent A.I. engine: easy to explain, yet powerful…

Rational Forest (RAF)
GAM (Generalized Additive Model) + RF (Random Forest)
Maps entire training space as unique decision trees

RAF
Instead of randomly building many decision trees that disagree with each other, and inviting them to vote democratically…

RAF
Why not use all the data, build every species of tree, and use the best tree for the job?

We propose: a Rational Forest is better than a Random Forest.

OK, PROVE IT!
Build a competitive prediction engine that is explainable to the non-technical end user and that answers the following question:
“If I buy this stock today, will the price go up in the next week?”

Methodology: No free lunch
The “No Free Lunch” theorem states that no engine can work best on all problems.
The Rational Forest is designed to answer the research question. It may not perform well on other questions.

FASTEN SEAT BELT
MAJOR JARGON FEST APPROACHING

Methodology
Python: VS Code dev environment
.NET/SQL: VS

Methodology: data store
All data stored in MS-SQL server.
Fast, reliable, powerful, integrated, scalable!
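
The research question stated earlier (“If I buy this stock today, will the price go up in the next week?”) implies a binary response variable. The sketch below shows one way such a label could be derived from daily closing prices. It is an illustration only: the function name `label_week_ahead`, the five-trading-day horizon, the close-to-close comparison, and the toy prices are all assumptions, not details taken from the slides.

```python
def label_week_ahead(closes, horizon=5):
    """Label each day 1 if the close `horizon` trading days later is higher, else 0.

    Assumption: "next week" is read as 5 trading days, compared close to close.
    The last `horizon` days get no label because their future price is unknown.
    """
    labels = []
    for i in range(len(closes) - horizon):
        labels.append(1 if closes[i + horizon] > closes[i] else 0)
    return labels


# Made-up closing prices, for illustration only (not real market data).
toy_closes = [100, 101, 99, 102, 103, 104, 100, 98]
toy_labels = label_week_ahead(toy_closes)
print(toy_labels)
```

Only the first three days of the toy series receive a label, since the remaining five have no price five days ahead to compare against.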
Methodology: data collection
Commercial provider: First Rate Data

Methodology: data model
A tabular data model allows dynamic SQL generation.
When the table is updated, the code updates itself.

Methodology: response variable
Tesla

Experimentation: predictors
Common measures of recent volatility and price movement

Experimentation: response curves

Experimentation: classifiers
One-hot encode predictors to vote Yay or Nay

Experimentation: collinearity
After removing weaker predictors, we are ready to vote!

Experimentation: vote
Four “Yay” votes for TSLA on March 8 = 71% likely to profit

WIPE YOUR EYES
CHUG MOUNTAIN DEW

Experimentation: power of the vote
Wait… How does it calculate the 71%?

Experimentation: RAF build
1: Start with predictor pairs:
Use the training data to calculate how strong they are together.
2: Add another predictor:
Precision
If the new predictor makes the team stronger, keep it. Otherwise, discard.
Keep adding predictors; up to twelve can play on a team.
At the end you get something like this:

Experimentation: Example 1
Stock, RAF

Experimentation: Example 2
Stock, RAF

Experimentation: accuracy
RAF classification
Scoring is based on Test (holdout) data only.

Experimentation: accuracy comparisons
RAF, TPOT Rec 1 hour, TPOT Rec 12 hours, TPOT Rec 3 days

Experimentation: explainability
Specific Lift table:

Experimentation: explainability
More About Lift:
Like triage, the first intervention is the most important.
Additional countermeasures are necessary, but add less.

Experimentation: explainability
General Lift

IS THIS THING EVER GOING TO END
TAKE DEEP BREATHS

Conclusion
1. RAF = Hybrid Ensemble Classifier
2. Competitive
3. Explainable

Next steps
1. Graded Voting
2. Scrambled Lift
3. Mo’ Models

ALL DONE !!!
THANK YOU !!
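
As a companion to the RAF build steps (start with predictor pairs, add another predictor, keep it only if the team gets stronger, up to twelve per team), here is a minimal sketch of how such a greedy build might look. It is not the author's implementation. It assumes that "strength" means precision on the training data (the slides show a "Precision" label but do not define the calculation), that a team "fires" only when every member votes Yay, and that predictors are tried in a fixed order. The predictor names `a`, `b`, `c` and the toy rows are hypothetical.

```python
from itertools import combinations

MAX_TEAM_SIZE = 12  # from the slides: "up to twelve can play on a team"


def precision(team, rows, labels):
    """Share of profitable rows among rows where every team member votes Yay (1).

    Assumption: this is how team strength is scored; returns 0.0 if no row fires.
    """
    fired = [y for x, y in zip(rows, labels) if all(x[p] == 1 for p in team)]
    return sum(fired) / len(fired) if fired else 0.0


def grow_team(pair, predictors, rows, labels, max_size=MAX_TEAM_SIZE):
    """Greedily extend a starting pair, keeping a predictor only if precision rises."""
    team = list(pair)
    best = precision(team, rows, labels)
    for candidate in predictors:
        if len(team) >= max_size:
            break
        if candidate in team:
            continue
        score = precision(team + [candidate], rows, labels)
        if score > best:  # the new predictor makes the team stronger: keep it
            team.append(candidate)
            best = score
    return team, best


def build_teams(predictors, rows, labels):
    """Grow one team from every predictor pair."""
    return [grow_team(pair, predictors, rows, labels)
            for pair in combinations(predictors, 2)]


# Hypothetical one-hot votes (1 = Yay) and outcomes (1 = price went up).
predictors = ["a", "b", "c"]
rows = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]
labels = [1, 0, 1, 0, 0, 1]

teams = build_teams(predictors, rows, labels)
for team, score in teams:
    print(team, score)
```

In this toy data the pair (`a`, `b`) fires on four rows, three of them profitable, and adding `c` narrows the team to two rows that are both profitable, so `c` is kept. A real build would likely also require a minimum number of firing rows before trusting a precision value, which this sketch omits.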