An Artificial Intelligence Driven System to Predict ASD Outcomes in ABA

David J. Cox1,2, Dana D'Ambrosio1, Jamie Pagliaro1, Rethink Data Team1
1 Rethink First; 2 Endicott College

Past researchers have sought to describe and predict how individuals with autism spectrum disorders (ASD) are likely to benefit from applied behavior analysis (ABA) therapy. These studies, however, have had limited generalizability due to small sample sizes, simple modeling approaches, and a failure to include more holistic patient profiles. Further, few studies have embedded their results into technology platforms practitioners can incorporate into treatment settings. In this article, we provide an overview of how we used 48 variables spanning hours and characteristics of ABA, characteristics of treatment goals, and patient characteristics to predict goals mastered for 31,294 individuals with ASD receiving ABA from 615 companies. Unsupervised machine learning identified between 8 and 56 distinct patient clusters (depending on the algorithm) that differed along characteristics known from past published research to influence outcomes, as well as patient characteristics not previously published as predictors of patient progress from ABA. Linear regression models (as used in past research) led to an overall r2 value (r2=.90; MAE=1.30) that was ~.23 higher than previously published studies. Machine learning improved predictions further (r2=.99; MAE=1.04). When predictions were made within patient clusters, r2 ranged between .95-.99 (~.20-.24 points higher than past research) and MAE ranged between 1.12-1.45. To close, we describe how this Artificial Intelligence (AI) system is embedded within a technology platform that continuously collects data.
This allows the system to improve over time and, in turn, users of the AI system can use the results to improve ASD outcomes through use cases such as: (a) real-time recommendations of ABA dosage based on unique patient characteristics; (b) feedback on actual versus expected patient outcomes; and (c) analyses of how patient progress varies along social determinants of health. In the future, these data and the underlying models could be leveraged by payors and providers alike to support and enable their unique value-based care initiatives.

Keywords: outcomes; artificial intelligence; machine learning; autism; applied behavior analysis

Conflict of Interest: The AI system (patent pending) and products leveraging this AI system are the property of RethinkFirst. Funding: No external funding was used to support the work published herein. Disclaimer: The AI system described herein is continuously ingesting more data and the modeling methods are being continuously refined to improve performance. Thus, all results discussed herein should be considered the static state of the system at the time of writing in February, 2023.

Predictability is a common concern in the delivery of healthcare services. Patients typically want to know how likely it is that each of several treatment options will be effective (e.g., Meyer et al., 2021); and they want to know the potential treatment length and cost relative to their personal budget and calendar so they can plan and budget accordingly (e.g., Bhargava & Loewenstein, 2015; Han et al., 2012). Likewise, insurance companies often want to know potential treatment length and cost so they can forecast expenses and plan strategically for the future (e.g., Papanicolas et al., 2018). Finally, healthcare practitioners want to know potential treatment length, the possible range
in which outcomes can improve, and the resources it will require so they can manage things such as caseloads, waitlist communication, hiring, and business operations (e.g., WHO, 2015). It is through this lens that the current work sits relative to one area of healthcare: the delivery of Applied Behavior Analysis (ABA) services for individuals with autism spectrum disorders (ASD). The prevalence of ASD diagnosis has increased over the previous 20 years from 1 in 150 to 1 in 44 children (Centers for Disease Control and Prevention, 2021). As a result, stakeholders (patients, parents, payors, and providers) are increasingly aware of the variability in cost, treatment duration, and patient outcomes. For example, the average annual cost of treatment is estimated to range anywhere between $17,227 and $130,182 (Autism Speaks, 2022; Eldevik et al., 2009); and individuals diagnosed with autism and their families might receive ABA for between 5 and 40 hours per week for a duration ranging from 18 months to five years (Larsson, 2012). Such variability in cost and duration of treatment following an ASD diagnosis adds significant uncertainty for all involved (De Groot & Thurik, 2015; Han et al., 2012; Meyer et al., 2021). Compounding this uncertainty around increasing amounts of ABA service delivery is a lack of standardization for ABA dosage as a function of patient presentation at intake and throughout the duration of services. At the time of writing1, the process of determining the optimal dosage of ABA for each patient is seemingly subjective. Past research suggests a patient's age, symptom severity, historical treatment duration, personal/family needs, current abilities, and the overall goals of therapy should play a role in how many hours per week of ABA an individual might need (e.g., APBA, 2020; Granpeesheh et al., 2009).
However, no known quantitative methodology with robust empirical support allows clinicians to translate those variables into a precise recommendation of the optimal number of hours of ABA. Thus, a tool that provides reliable and precise predictions based on patient data and published best-practice evidence would help decrease the subjectivity and uncertainty around ABA treatment. In addition to decreased subjectivity and reduced uncertainty around ABA treatment, board certified behavior analysts (BCBAs) arguably have an ethical obligation to improve in this area. The Behavior Analyst Certification Board (BACB) Ethics Code for Behavior Analysts includes several guidelines that speak directly to this topic (BACB, 2020). For example, Guideline 2.08 requires discussing the scope of treatment with a client before starting services; Guideline 3.01 requires BCBAs to identify and act upon opportunities that lead to avoidable harm or wasteful allocation of resources; and Guideline 3.12 requires BCBAs to advocate for appropriate services. Here, leveraging a reliable, precise, and evidence-based method to recommend ABA dosage allows the BCBA to accurately communicate the scope of likely treatment; to explain why they believe benefits will be maximized while avoiding wasteful allocation of resources; and to know that the amount and level of behavioral services they are recommending is data-based. Past researchers have sought to improve predictability and aid BCBAs in their ethical obligations by quantitatively modeling the relationship between the hours that patients contact ABA and the resulting outcomes. For example, Linstead and colleagues (2017a) modeled the relationship between treatment intensity and mastered learning objectives for 726 children aged 1.5-12 years who received community-based behavioral intervention services.

1 February, 2023.
2 We want to reiterate that this is a relative claim, as these are among some of the largest studies to date examining these kinds of relationships.
Linear regression and neural network models led to r2 values of .35 and .60, respectively. Similarly, in a separate study, Linstead et al. (2017b) used linear regression to model the relationship between treatment intensity/duration and goals mastered within specific skill domains for 1,468 individuals with ASD. Across goal domains, r2 values ranged between .50 (social goals) and .67 (motor goals). As a final example, Ostrovsky and colleagues (2022) used a series of t-tests, effect size measures, and Pearson correlations to quantify the relationship between changes in standardized assessment scores as a function of hours of ABA and modality of supervision. For the 178 individuals included in the study, they observed that clinically significant improvements in function were independent of the hours of ABA received. Though broader limitations of this and other past studies are described in more detail below, the variance accounted for by these models was likely impacted by several factors. First, the overall sample sizes in these studies were relatively small2. Second, as noted by the authors of several studies, models have often been built without accounting for the heterogeneity that comprises the broad spectrum that is ASD and that was unlikely to be fully captured in the studied samples. A final limitation of past work in this realm (noted by Ostrovsky et al., 2022) is the assumption of a linear relationship between hours per week of ABA and progress made. Dose-response curves in other areas of behavioral healthcare are often decidedly nonlinear (e.g., Dews, 1955; Levy et al., 2020; Zoladz & Diamond, 2009). Thus, modeling approaches that more flexibly account for nonlinear relationships may perform better (e.g., the neural network outperforming linear regression in Linstead et al., 2017a).
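The point about nonlinearity can be illustrated on synthetic data. The sketch below invents a saturating dose-response curve (its shape and coefficients are assumptions, not data from this study) and shows that a model able to capture nonlinearity can fit it better than a straight line:

```python
# Illustrative only: synthetic data showing why a model that can capture
# nonlinearity may outperform linear regression on a saturating
# dose-response relationship. The curve shape and numbers are invented.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
hours = rng.uniform(5, 40, size=(500, 1))          # hours of ABA per week
# Saturating (nonlinear) progress curve plus noise
progress = 10 * (1 - np.exp(-0.12 * hours[:, 0])) + rng.normal(0, 0.5, 500)

linear = LinearRegression().fit(hours, progress)
forest = RandomForestRegressor(random_state=0).fit(hours, progress)

print("linear r2:", r2_score(progress, linear.predict(hours)))
print("forest r2:", r2_score(progress, forest.predict(hours)))
```

On data like these, the linear model systematically under-predicts in the middle of the range and over-predicts at the extremes, which is exactly the failure mode a flexible model avoids.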
PREDICTING PATIENT PROGRESS BASED ON PATIENT & ABA CHARACTERISTICS

One technique that researchers have used to create data-driven subgroups from larger datasets is called cluster analysis (e.g., Ansari et al., 2018; Charron et al., 2023; Hewlett et al., 2022). Cluster analysis at its simplest can be defined as finding groups in data (Hennig et al., 2016). For our purposes here, cluster analyses would seek to separate "all individuals with ASD in our sample" into smaller subgroups based on the totality of their demographics and clinical-developmental states while they contact ABA. The idea is that modeling relationships between contact with ABA and patient progress might either be more accurate when conducted within patient clusters or lead researchers to differentially identify important variables that predict patient progress for specific clusters. In contrast to traditional modeling techniques where the relations between independent variables and a dependent variable are known, cluster analyses are often more exploratory, as the researcher may not know how many clusters should exist nor which patients should be grouped in which cluster. Uncovering these relationships is often the goal of the analysis. Researchers have used cluster analysis techniques to identify subgroups of individuals within samples of data gathered from individuals with ASD. For example, Parlett-Pelleriti and colleagues (2022) published a review of how unsupervised machine learning has been used to identify patient clusters for individuals with ASD. Table 1 shows the descriptive statistics of the sample sizes and number of clusters identified across the 36 studies included. Two items are of note here.

2 (cont.) "Small" here is in reference to the total number of participants and the amount of heterogeneity present within the ASD population as a whole, compared to the sample size in the current study.
First, considering the heterogeneity of ASD, the sample sizes were relatively small: the median sample size was 220, 78% of the studies used fewer than 1,000 participants, and all but one used fewer than 5,000 participants. It is unknown what the clustering results might look like when a larger sample of individuals with ASD is included. The second item of note was the set of variables used for the clustering analyses and the resulting number of clusters. The median number of clusters identified across studies was three, with results largely clustering around the ASD diagnostic criteria or assessment domains. This makes sense, as those were the data used for clustering in many of the studies. But individuals with ASD are much more than the developmental and behavioral patterns contained in diagnostic or skills-based assessments. Minimally, they may have comorbid diagnoses or medical concerns that influence their unique clinical course (e.g., Lingren et al., 2016). Additionally, individuals with ASD (like the rest of us) are embedded within larger socioeconomic, familial, and educational contexts. Past researchers have found that many variables outside of diagnostic symptomatology will predict response-to-intervention, such as health-related features of neighborhoods and socioeconomic factors (e.g., Braverman & Gottlieb, 2014). Clustering analyses that attempt to identify more holistic subgroups of ASD beyond diagnostic and assessment classifications will need to include a greater number of variables.

Table 1. Descriptive statistics of research applying unsupervised machine learning to cluster individuals with ASD (Parlett-Pelleriti, 2022).

                     Maximum | Arithmetic Mean | Standard Deviation | Median | Minimum | Current Work*
Sample Size          20,658  | 1,218           | 3,504              | 220    | 10      | 31,294
Number of Clusters   7       | 3.32            | 1.21               | 3.00   | 2.00    | 8-53

* NB: Because this is a "live" system, the overall sample size is continuously increasing and the specific clustering results are continuously being refined toward optimal.
A second limitation of past research, unmentioned thus far, is how a stakeholder might use the information from the published analyses. For example, practitioners might be interested in answers to questions such as: How might the models provide a recommended dosage of ABA for an individual? How might practitioners design treatment programs based on this information? How could practitioners gain feedback comparing the patient outcomes they observe with what is expected based on these models? When considering the entirety of a patient's unique human situation, what variables are most predictive of intervention success, and which are least predictive? Lastly, how might answers to these questions be packaged into a system that makes access to clustering and predictive models as efficient as possible? The purpose of this manuscript is to describe at a high level how researchers have been addressing the abovementioned limitations while attempting to provide answers to these practical practitioner questions. To do this, we first provide an overview of a patent-pending Artificial Intelligence (AI) based system that: continuously collects data on patient and ABA-related variables with a known relation to patient progress from ABA therapy; aggregates the data for analysis; uses unsupervised machine learning to cluster patients and supervised machine learning to predict goals mastered; and returns the model outputs back into a technology platform to be used by an end-user. Following this high-level overview, we describe the general results of patient clustering analyses and how predicting patient outcomes is impacted when using these patient characteristics. In total, this manuscript provides a description of how the AI system functions and continuously improves, along with preliminary results that will likely continue to improve.

HIGH-LEVEL SCHEMATIC OF THE AI SYSTEM

Figure 1 shows a high-level overview of the patent-pending AI system.
The system involves five major steps. First, users interact with a web-based platform to enter the raw data for the patient and ABA-related variables used in the AI Engine (see below for more details). The specific application that captures these variables can differ depending on the variable, end-user, and product they are using. The output of this first step is raw data that has been collected and stored in a product-specific database.

Figure 1. High-level schematic of the AI system to predict patient clusters and patient outcomes based on contact with ABA therapy and their unique patient cluster profile.

The second step of the AI system is to use extract, transform, and load (ETL) workflows to move the data from the product-specific databases associated with its collection to a single, central location (i.e., a data warehouse). Specifically, the ETL workflows act on the output of the first step and regularly move data from the product-specific databases to a centralized data warehouse within the scope of a data model allowing the data from many of the tables to be related to one another. The primary output of the second step is a set of relational tables where each table contains some portion of the total set of variables used for modeling in the AI Engine. The third step involves a series of Python scripts that conduct the data pre-processing and analytics referred to as the AI Engine. The AI Engine is described in more detail below. Here, the primary points are that the Python scripts involved in this workflow: (a) pre-process the data stored in the relational tables in the data warehouse; (b)
combine the variables from many related tables into a single analytic data frame; (c) conduct unsupervised machine learning relative to patient clustering; (d) conduct a suite of statistical and supervised machine learning analyses relative to predicting patient outcomes using the patient-related and ABA-related variables; and (e) deploy two top-performing models as API endpoints, one each for patient clustering and predicting outcomes. The primary output of this third step is the set of mathematical-computational models for patient clustering and predicting patient outcomes. The fourth step uses the primary outputs of step three to build dose-response curves unique to individual patients. Specifically, a user interacts with a web-based tool to provide data specific to the patient-related and ABA-related variables needed for the finalized models from step three. Once collected, that data can be run through the clustering and outcome prediction models to generate a unique dose-response curve predicting estimated progress as a function of the patient characteristics and a range of hours of ABA the patient could contact. The fifth step uses the dose-response curve created in step four toward some practical aim. Few people are likely interested in identifying patient clusters or predicting outcomes simply for the sake of doing so. Rather, they want to use the information to help them do something better than they currently do. To highlight what is possible, we walk through three possible use cases based on the output of step four in the final section below.
These include: (a) using the dose-response curve to recommend hours per week of ABA to optimize patient progress; (b) gaining performance feedback comparing observed and expected patient outcomes for a clinician's caseload or an organization as a whole; and (c) examining how patient progress varies along social determinants of health when accounting for the total milieu of patient characteristics and their interactions. Before we get to these use cases, however, we will first briefly review how the AI Engine works and current results around patient clustering and precisely predicting patient progress.

HIGH-LEVEL OVERVIEW OF THE AI ENGINE

As noted above, the AI Engine is comprised of five distinct steps (Figure 2). These are: (a) pre-process the data stored in the relational tables in the data warehouse; (b) integrate the various tables into a single, analytic data frame; (c) conduct unsupervised machine learning relative to patient clustering; (d) conduct a suite of statistical and supervised machine learning analyses relative to predicting patient outcomes using the patient-related and ABA-related variables; and (e) deploy two top-performing models as API endpoints, one each for patient clustering and predicting outcomes.

Figure 2. High-level overview of the steps that comprise the AI Engine portion of the AI system.

In this section we provide a high-level overview of what occurs in each step. Of note, manuscripts that provide significant detail for steps (c) and (d) are in preparation and will be published in the near future.

Variable Selection. We sought to create a comprehensive set of variables that capture patient characteristics spanning diagnostic characteristics (DSM-5), patient age and history of treatment, clinical presentation of social and behavioral excesses and deficits, medical and behavioral health comorbidity, goal counts related to developmental domain subsets, social drivers of health, and barriers to progress/treatment.
The 48 variables chosen for inclusion in the current analyses were derived from published standards (e.g., Autism Commission on Quality, 2022; Behavioral Health Center of Excellence, 2022; National Autism Center, 2009) and applicable research literature that outline best practices. Once selected, a team of experienced BCBAs and BCBAs at the doctoral level (BCBA-Ds) narrowed and rounded out the list of variables for inclusion, which was then solidified by a provider advisory committee to ensure agreement on the likely relation between included variables and patient outcomes.

Pre-processing. Pre-processing is a series of steps that convert the data from its raw form stored in the data warehouse to a dataset on which statistical and machine learning analyses can be conducted (for reviews on methods, see Baskar et al., 2013; Müller & Guido, 2017; Saleem et al., 2014). Generally, pre-processing involves data cleaning, data integration, data transformation, and dimensionality reduction. These steps and the strategies used within the current AI system are shown in Table 2. Data cleaning often, at minimum, involves employing explicit strategies around handling missing data, outliers, and inconsistent data. Data integration involves joining or merging datasets together after aggregating each dataset at the level required for analyses. Here, data were aggregated and integrated at the individual patient level spanning one week of ABA service delivery. Data transformation refers to a set of mathematical techniques whereby data are converted from their original values to values likely to improve subsequent modeling (Müller & Guido, 2017).
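A few of the cleaning and transformation strategies described above (Winsorizing outliers, imputing missing values, min-max scaling) can be sketched on a toy data frame. The column names and values are invented, and median imputation stands in for the chained-equation imputation the system actually uses:

```python
# Minimal sketch of common pre-processing strategies; columns are invented.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [3.0, 4.5, np.nan, 6.0, 40.0],        # 40.0 is an outlier
    "weekly_hours": [10, 25, 15, np.nan, 30],
})

# 1. Winsorize outliers to the 5th/95th percentiles
for col in df.columns:
    lo, hi = df[col].quantile([0.05, 0.95])
    df[col] = df[col].clip(lo, hi)

# 2. Impute missing values (median here; the paper uses chained equations)
df = df.fillna(df.median())

# 3. Min-max normalize each feature to the 0-1 range
df = (df - df.min()) / (df.max() - df.min())
print(df)
```

The order matters: Winsorizing before imputation keeps extreme values from distorting the imputed estimates, and scaling comes last so every feature lands on a comparable 0-1 range for clustering.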
For example, data transformations might include: generalization (e.g., converting from zip code to state); normalization (e.g., log-transforms, min-max scaling); feature reduction (e.g., removing variables with little variability, or combining features that are similar in some way); or feature discretization (i.e., converting continuous data into bins or categories).

Table 2. Overview of the sequence and strategies used in pre-processing data to prepare it for patient clustering analyses and to predict patient outcomes.
1. Drop patients and features where the amount of missing data would likely impact their reliability and model precision (e.g., patients with fewer than 100 sessions; features with 60% or more missing data).
2. Identify and Winsorize outliers.
3. Impute missing data by chained equations (Buck, 1960; van Buuren & Groothuis-Oudshoorn, 2011).
4. Use zip code for feature engineering (e.g., add neighborhood and economic SDOH proxies; convert to US region and one-hot encode).
5. Aggregate and quantify patient developmental and familial characteristics (e.g., age, number of siblings).
6. Aggregate intervention characteristics across skill acquisition domains.
7. Aggregate intervention characteristics across reduction targets, intensities, and functional assessments.
8. Aggregate and quantify medical and behavioral health comorbidities.
9. Aggregate and quantify patient progress, generally, as well as across specific skill domains.
10. Aggregate and quantify hours per week of ABA contact per patient for the duration they are in the dataset.
11. Merge all datasets together, aggregated at the level of patient-week (e.g., continuous variables convert to the arithmetic mean; categorical variables convert to one-hot encodings; ordinal variables convert to the median).
12. Examine how aggregation influenced feature distributions and modify where significantly impacted.
13. Normalize feature scales using min-max scaling to the 0-1 range.

Unsupervised ML: Patient Clustering. Unsupervised machine learning refers to a suite of mathematical and algorithmic techniques aimed at learning the underlying structure in data where the answer is unknown (Patel, 2019). Clustering refers to a suite of mathematical and algorithmic techniques to find groups in data (Hennig & Meila, 2016). Combined, the goal of unsupervised machine learning for patient clustering is to identify patient subgroups within the larger dataset that are similar when considering the 48 variables that comprise the final analytic dataset. Table 3 shows the patient clustering (N=31,294) fit metrics for four algorithms, wherein each takes a unique definition of a cluster3. Overall, maximum Silhouette Coefficients ranged between 0.25-0.47, maximum Calinski-Harabasz indices ranged between 6,086.66-13,718.52, and the optimal number of patient clusters ranged between 8 and 56 depending on the algorithm and the level of granularity a researcher or practitioner is interested in obtaining or finds practically useful. Figure 3 shows a three-dimensional plot of the clustering results for agglomerative hierarchical clustering, whose underlying assumptions are likely best justified given the function of clustering patients with ASD in the current context (for a comprehensive treatment of these assumptions see Hennig et al., 2016). Each color in the plot represents a unique patient cluster and each marker represents a unique patient. A natural question is exactly how each of these patient clusters differs from the others. Figure 4 shows 10 of the individual patient characteristics included in the current analysis. Five of those patient characteristics are variables that have been associated with differential patient outcomes following contact with ABA in past research (top five panels).
Five of the patient characteristics in Figure 4 have no known previous publications associating the variable with differential patient outcomes following contact with ABA at the time of this writing (bottom five panels).

Figure 3. Example patient cluster visual using agglomerative hierarchical clustering. Each color represents a unique cluster as identified by the algorithm. Each individual marker represents a single patient.

There are two important observations to be made based on the results above and considering the scope of this current manuscript. First, the clustering algorithms consider all 48 variables simultaneously, with each cluster being a unique combination of all patient characteristics. So, the ten variables shown in Figure 4 should be considered a demonstration of what this system currently allows us to do. The current results should be considered preliminary and will continue to improve as the AI system collects more data and our methodologies are further refined. Second, a follow-up manuscript is in preparation that provides significantly greater detail around the patient clustering methods, their evaluation, and how these results were derived.

Table 3. Overview of the best clustering fit metrics for the algorithms tested (N=31,294). For both metrics, a higher score indicates better clustering.

Clustering Algorithm         | Silhouette | Calinski-Harabasz | Optimal Clusters
k-Means                      | 0.25       | 8,169.60          | 30
Agglomerative Hierarchical   | 0.47       | 13,718.52         | 8
HDBSCAN                      | 0.37       | 2,377.78          | 56
BIRCH                        | 0.39       | 6,086.66          | 18

Silhouette equals the ratio of the average dissimilarity of each observation to its own cluster ($a_i$) relative to the average dissimilarity to observations in the closest other cluster ($b_i$) (Halkidi et al., 2016). This can be written as an equation: $s_i = \frac{b_i - a_i}{\max\{a_i, b_i\}}$. Calinski-Harabasz equals the ratio of the between-cluster dispersion to the within-cluster dispersion (Halkidi et al., 2016). As an equation:

$$\mathrm{CH}(C_K) = \frac{\operatorname{trace}\left(\sum_{j=1}^{K} n_j (\bar{\mathbf{x}}_j - \bar{\mathbf{x}})(\bar{\mathbf{x}}_j - \bar{\mathbf{x}})^t\right)}{\operatorname{trace}\left(\sum_{j=1}^{K} \sum_{c(i)=j} (\mathbf{x}_i - \bar{\mathbf{x}}_j)(\mathbf{x}_i - \bar{\mathbf{x}}_j)^t\right)} \times \frac{n-K}{K-1}$$
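Both fit metrics are available off the shelf in scikit-learn. The sketch below computes them for an agglomerative hierarchical clustering on synthetic data; the data and cluster count are illustrative only:

```python
# The two clustering fit metrics above, computed with scikit-learn on toy data.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
labels = AgglomerativeClustering(n_clusters=4).fit_predict(X)

print("Silhouette:       ", silhouette_score(X, labels))        # in [-1, 1]
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels)) # higher = better
```

In practice one would sweep the number of clusters and compare these scores across candidate solutions rather than fixing the cluster count in advance.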
In more lay terms, Silhouette coefficients provide a sense of how well an observation belongs to its assigned cluster compared to any other cluster, and Calinski-Harabasz indices indicate how well partitioned the clusters are.

3 The reader is referred to Hennig et al. (2016) for a review of the assumptions and underlying mathematics used to define a cluster via each algorithm.

Figure 4. Demonstration of patient characteristics that differ among agglomerative hierarchical clusters. The top panels show differences in variables previously published in the literature. The bottom panels show differences in variables related to health that could logically impact progress.

Again, the scope of this manuscript is to highlight the totality of the system and what it can do. The first author can be contacted for specific methodological questions in the interim.

Supervised Machine Learning: Predicting Patient Outcomes. Supervised machine learning refers to a suite of mathematical and algorithmic techniques aimed at learning the underlying structure in data where the answer is known (Müller & Guido, 2017). For our purposes, the answer we know and want to predict is how many goals each patient mastered during a given period of time. The goal of supervised machine learning is to create a mathematical equation (or set of equations) that relates the patient-related and ABA-related characteristics to how many goals patients mastered. Once built, the model can be used to predict how many goals a new patient is likely to master based on their unique set of characteristics. Different statistical and machine learning algorithms range from less (linear regression) to more (ensemble models) complex. Less complex models are often more parsimonious, though they can sometimes be less accurate than more complex models.
In contrast, more complex models are often more accurate though, due to their complexity, it can sometimes be more difficult for users to intuitively understand how the input variables relate to the predicted output. As a result of the tradeoff between model simplicity and complexity, we analyzed a series of increasingly complex statistical and machine learning models for how well they could predict patient progress as a function of contact with ABA and each patient's unique characteristics. Table 4 shows an overview of the fit metrics when assessing each algorithm using 10-fold cross-validation with 80%-20% train-test splits. That is, we started by first randomly splitting our data into two datasets. One dataset was used to train the model (80% of all observations) and a holdout dataset was used to test the predictive capabilities of the model (20% of all observations). We then repeated this process nine more times, using a different selection of train-test splits of the dataset with each iteration. Table 4 shows the arithmetic mean of the fit (r2) and loss (mean absolute error; MAE) metrics across the 10 folds for each of the algorithms using their default parameters.

Table 4. Average fit and loss metrics from 10-fold cross-validation using the test data sets not included in model training.

Algorithm                  | r2   | MAE
Linear Regression          | 0.90 | 1.30
k-Nearest Neighbors        | 0.91 | 1.27
Support Vector Regressor   | 0.99 | 1.06
Random Forest              | 0.99 | 1.04
AdaBoost                   | 0.88 | 1.32

Overall, the random forest regressor algorithm led to the lowest MAE values across both training and test datasets. To build a final predictive model, we conducted hyperparameter tuning via grid search across the number of trees and the maximum depth allowed. The left panel in Figure 5 shows the results of using the final model built with the random forest algorithm.
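The comparison-then-tuning workflow described above can be sketched with scikit-learn. The data are synthetic and the parameter grid is illustrative, not the grid used in this study:

```python
# Sketch of the procedure: 10-fold cross-validation to compare algorithms,
# then a grid search over number of trees and maximum depth. Data and grid
# values are illustrative only.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)

# Compare candidate algorithms with 10-fold cross-validated r2
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    scores = cross_val_score(model, X, y, cv=10, scoring="r2")
    print(type(model).__name__, round(scores.mean(), 3))

# Tune the chosen algorithm over trees and depth, minimizing MAE
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=5, scoring="neg_mean_absolute_error",
)
grid.fit(X, y)
print("best params:", grid.best_params_)
```

Scoring with `neg_mean_absolute_error` mirrors the paper's use of MAE as the loss metric; `GridSearchCV` refits the best parameter combination on the full training data.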
The final fit and loss metrics of the top-performing model on the holdout test data were r2 = .99 and MAE = 1.12. The right panel in Figure 5 shows the results of using linear regression, similar to researchers in past published studies. Fit and loss metrics using linear regression were r2 = .90 and MAE = 1.82, respectively. Of note, the obtained r2 using linear regression is .23 points higher than the highest known previous model predicting goals mastered by individuals with ASD as a function of contacting ABA (Linstead et al., 2017b reported an r2 of .67 for motor goals; MAE was unreported). The predictive model using machine learning was .32 points higher. We conducted the same set of analyses for each patient cluster in the dataset. That is, we first isolated the data to only the patients in a given cluster, assessed each algorithm using 10-fold cross-validation with 80%-20% train-test splits, conducted hyperparameter tuning via grid search to identify a top model, and then made predictions of patient progress on the holdout test set. When predicting goals mastered within unique clusters, r2 values ranged from .95-.99 and MAE values ranged from 1.12-1.45. Of note, the r2 values obtained here are ~.20-.24 higher than the cluster-specific r2 values observed by past researchers, which ranged from .64 to .75 (Stevens et al., 2019). Thus, the AI system approach taken in this work appears to be a significantly more precise method for predicting patient progress compared to previously published research. This appears true overall, as well as within unique patient clusters. As with the section on patient clustering, there are two important observations to be made based on the results above and considering the scope of this manuscript. First, this system continuously receives more patient data and we are actively identifying better ways to collect and incorporate data spanning a greater number of variables.
Thus, the results in this section should be considered a demonstration of what this system currently allows us to do. The current results should be considered preliminary and will continue to improve as the AI system collects more data and our methodologies are further refined. Second, a follow-up manuscript is in preparation that provides significantly greater detail on using machine learning to predict patient outcomes across goals mastered and changes in VB-MAPP scores. Again, the scope of this manuscript is to highlight the totality of the system and what it can do. The first author can be contacted for specific methodological questions in the interim.

Figure 5. Predicted vs. observed goals mastered for the models derived using the random forest algorithm (left panel) and using linear regression (right panel). In each panel, the red line shows where exact matches between observed and predicted values occur.

Deploy as API endpoint. The last phase in the AI Engine is to make the final, top-performing models available to users of a technology platform. Deploying machine learning models can be accomplished in a number of ways (see Treveil et al., 2020 for an introductory text). For the technology ecosystem at RethinkFirst, Microsoft Azure makes it easy to deploy machine learning models as a REST API endpoint. Users with access to this technology system can then call the API endpoint with the requisite data fields and have the model return a value. What value gets returned, however, would vary with the use case and how the API endpoint is embedded within the product.

USE CASES

Recommending ABA Dosage. A straightforward application of the finalized model would be to have the model recommend the optimal hours per week relative to goals mastered. Figure 6 shows hypothetical data wherein patient characteristics can be passed to the predictive model along with a range of hours of ABA the patient might contact.
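As a minimal sketch of that idea, a fixed patient profile can be swept across a range of hours per week through a trained model, flagging the dose where marginal gains fall below a threshold. Everything here is invented for illustration: the toy training data deliberately plateau near 25 hours, and the threshold value is an arbitrary assumption.

```python
# Illustrative dose-response sweep; data, features, and threshold are toy
# assumptions, not the production model or real patient data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
hours = rng.uniform(0, 40, 400)                # hours of ABA per week
other = rng.normal(size=(400, 3))              # stand-in patient features
X = np.column_stack([other, hours])
# Toy target: progress grows with hours, then plateaus near 25 hours/week.
y = np.minimum(hours, 25) + rng.normal(scale=0.5, size=400)

model = RandomForestRegressor(random_state=0).fit(X, y)

# Sweep hours per week for one fixed patient profile.
profile = other[0]
sweep = np.arange(0, 41, 1.0)
grid = np.column_stack([np.tile(profile, (len(sweep), 1)), sweep])
preds = model.predict(grid)

# Recommend the first dose where the predicted marginal gain per added
# hour drops below an (arbitrary) threshold of 0.1 goals.
gains = np.diff(preds)
below = np.where(gains < 0.1)[0]
recommended_hours = float(sweep[below[0]]) if below.size else float(sweep[-1])
```

The point where predicted gains flatten plays the role of the "point of little-to-no further return" on each curve.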
Such dose-response curves are commonly used to identify the optimal therapeutic effect of an intervention in behavioral pharmacology (e.g., Dews, 1955) and have been used previously in analyses specific to ABA (e.g., Ostrovsky et al., 2022; Stevens et al., 2019). For each patient, there is likely to be a point past which little-to-no further progress would be predicted even if more hours of ABA were contacted (the large filled-in circles in Figure 6). This point of little-to-no further return could then be the recommended hours per week of ABA.

Figure 6. Example use case wherein the predictive model produces an expected dose-response curve based on hours per week of ABA along with the patient characteristics that determine cluster assignment. Each color would be a unique dose-response curve predicted for a patient based on their unique profile.

Observed vs. Expected Patient Outcomes. Another straightforward application of the finalized model would be to retroactively compare a patient's actual progress to their expected progress. Specifically, a patient's unique characteristics could be passed to the model along with the hours of ABA they contacted. The model would return the number of goals expected to be mastered based on other patients who present with similar characteristics and received similar amounts of ABA. By comparing observed patient progress to expected patient progress, the provider or their supervisor can get feedback on how well they are providing services to current clients. In turn, providers can follow up with specific patients and the supervising BCBAs to identify why they are achieving less-than-expected outcomes. And, for patients for whom the provider is observing greater-than-expected outcomes, providers can follow up to identify whether there are variables predicting better success that they can use with other patients to further improve outcomes. Further, these data and the underlying models could be leveraged by payors
and providers alike to support and enable their unique value-based care initiatives.

Associated Social Determinants of Health. Many aspects of our social environment have been repeatedly shown to play a role in patient outcomes and are referred to as social determinants of health (e.g., Braverman & Gottlieb, 2014). Because several of these variables are included in the model, users could analyze these data to help identify and potentially mitigate the effects of these variables on patient outcomes. For example, similar to the previous use case, a provider could systematically identify how well each patient in their organization or on their caseload is performing relative to expected outcomes. The provider could then determine whether differences between observed and expected outcomes were associated with variables known to be social determinants of health (SDOH; e.g., race, neighborhood walkability, income, education). By identifying the specific SDOH that impact the patients on their caseload or in their organization, providers would better know how to identify resources for those clients that might mitigate these effects and improve their overall outcomes. Similar to feedback on provider performance, analyses related to SDOH could be leveraged by payors and providers alike to support and enable their unique value-based care initiatives.

LIMITATIONS AND FUTURE DIRECTIONS

There are several limitations to the current patent-pending AI system and underlying AI Engine, each of which points to future directions for building upon and improving the AI system described herein. The first limitation is that the patients included in this work do not comprehensively account for all possible combinations of unique individual characteristics with which an individual with ASD might present to ABA (i.e., sampling bias).
Nevertheless, the current study did contain data from 31,294 individuals with ASD, making it the largest known published study on individuals with ASD at the time of writing (February, 2023). Thus, though all research and technological systems will necessarily be incomplete in terms of the included sample (Jennings & Cox, 2023), the sample size in the present work is 1.51 times larger than the next closest study (Lingren et al., 2016), 6.35 times larger than the third largest study (Doshi-Velez et al., 2014), and 142.25 times larger than the median sample size used in previous cluster analyses for individuals with ASD (Parlett-Pelleriti et al., 2022). For past work predicting patient outcomes as a function of ABA, the current study is 21.32 times larger than the sample size of the second largest study (Linstead et al., 2017b). A second limitation and area for future direction relates to the variables chosen for inclusion in patient clustering and for predicting patient progress. We used domain expertise and past published research to identify and include variables with a logical or known relationship to patient progress following contact with ABA. However, behavior analysts have historically underreported demographic variables within the clinical literature (e.g., Jones et al., 2020) and have only recently begun to consider how SDOH might influence outcomes (e.g., Wright, 2022). Further, it is possible that a different approach to aggregating ABA session characteristics and treatment plans would lead to better models, work we are actively conducting. As noted in the introduction, the purpose of this manuscript was to showcase how this system currently operates and its current effectiveness. Future work using this AI system will include further exploration of variable inclusion and feature engineering. A final area for future research involves more direct measures of how the outputs of this AI system impact clinical decision-making and stakeholders more broadly.
As shown in Figure 1, the output of this system can inform a variety of products for a variety of end users. By examining how end users behave before and after use of the AI system, future research will be able to identify if and how clinician decision-making changes as a function of the system and how those decisions might influence patient progress. Outside of the AI system and end products themselves, however, there are likely to be additional, unmeasured variables that interact with this information to influence the decisions behavior analysts make. Without knowledge of what those variables might be, collaboration with end users will be crucial for identifying if and where additional variables might need to be added to the AI system as a whole.

SUMMARY

All stakeholders would benefit from the ability to predict likely patient progress as a function of contact with ABA therapy and the patient's unique characteristics. Past researchers have sought to describe and predict how individuals with ASD are likely to benefit from ABA therapy. However, the results of these studies have had limited generalizability due to small sample sizes, simple modeling approaches, and the lack of robust patient cluster analyses to individualize patient predictions. Further, few studies have directly converted the insights gained into a technology that others could incorporate into their treatment settings to improve patient outcomes.
Above, we described how we built and are leveraging a patent-pending AI system that: integrates the continuous and ongoing collection of patient data on treatment outcomes and patient characteristics; allows us to identify robust patient clusters distinct from previous literature given its significantly larger sample size; combines patient cluster characteristics with ABA hours per week to predict, within 1.12 goals mastered, patient progress over a 12-month ABA treatment period; deploys the models back into a technological system for user interaction; and supports three use cases: (a) real-time recommendations of ABA dosage based on unique patient characteristics; (b) feedback on actual patient outcomes relative to expected outcomes; and (c) analyses of how patient progress varies along social determinants of health.

REFERENCES

Ansari, W. E., Ssewanyana, D., & Stock, C. (2018). Behavioral health risk profiles of undergraduate university students in England, Wales, and Northern Ireland: A cluster analysis. Frontiers in Public Health, 6, 120. https://doi.org/10.3389/fpubh.2018.00120

Association of Professional Behavior Analysts (2020). ABA Autism Guidelines: Applied Behavior Analysis Practice Guidelines for Autism Spectrum Disorder. Retrieved from: https://casproviders.org/wp-content/uploads/2020/03/ABAASD-Practice-Guidelines.pdf

Autism Commission on Quality (2022). ACQ Applied Behavior Analysis Accreditation program standards and guide (version 1.0). Retrieved from: https://autismcommission.org/standards/

Autism Speaks (2022). Autism statistics and facts. Retrieved from: https://www.autismspeaks.org/autism-statistics-asd

Baskar, S. S., Arockiam, L., & Charles, S. (2013). A systematic approach on data pre-processing in data mining. Compusoft: An International Journal of Advanced Computer Technology, 2(11), 335-339.

Behavior Analyst Certification Board (2020). Ethics code for behavior analysts.
https://bacb.com/wp-content/ethics-code-for-behavior-analysts/

Behavioral Health Center of Excellence (2022). 2022 standards and new accreditation model. Retrieved from: https://www.bhcoe.org/2022-standards-and-new-accreditation-model/

Bhargava, S., & Loewenstein, G. (2015). Choosing a health insurance plan: Complexity and consequences. Journal of the American Medical Association, 314(23), 2505–2506. https://doi.org/10.1001/jama.2015.15176

Braverman, P., & Gottlieb, L. (2014). The social determinants of health: It's time to consider the causes of the causes. Public Health Reports, 129(Suppl 2), 19-31. https://doi.org/10.1177%2F00333549141291S206

Centers for Disease Control and Prevention (2021). Data & Statistics on Autism Spectrum Disorder. Retrieved from: https://www.cdc.gov/ncbddd/autism/data.html

Charron, E., Yu, Z., Lundahl, B., Silipigni, J., Okifuji, A., Gordon, A. J., ... & Cochran, G. (2023). Cluster analysis to identify patient profiles and substance use patterns among pregnant persons with opioid use disorder. Addictive Behaviors Reports, 17, 100484. https://doi.org/10.1016/j.abrep.2023.100484

De Groot, K., & Thurik, R. (2018). Disentangling risk and uncertainty: When risk-taking measures are not about risk. Frontiers in Psychology, 9, 2194. https://doi.org/10.3389/fpsyg.2018.02194

Dews, P. B. (1955). Studies on behavior. I. Differential sensitivity to pentobarbital of pecking performance in pigeons depending on the schedule of reward. The Journal of Pharmacology and Experimental Therapeutics, 113(4), 393–401.

Doshi-Velez, F., Ge, Y., & Kohane, I. (2014). Co-morbidity clusters in autism spectrum disorders: An electronic health record time-series analysis. Pediatrics, 133(1), e54–e63.

Eldevik, S., Hastings, R. P., Hughes, J. C., Jahr, E., Eikeseth, S., & Cross, S. (2009). Using participant data to extend the evidence base for intensive behavioral intervention for children with autism. Journal of Autism and Developmental Disorders, 39(11), 1302–1310.
https://doi.org/10.1007/s10803-009-0732-7

Granpeesheh, D., Dixon, D. R., Tarbox, J., Kaplan, A. M., & Wilke, A. E. (2009). The effects of age and treatment intensity on behavioral intervention outcomes for children with autism spectrum disorders. Research in Autism Spectrum Disorders, 3(4), 1014-1022. https://doi.org/10.1016/j.rasd.2009.06.007

Halkidi, M., Vazirgiannis, M., & Hennig, C. (2016). Method-independent indices for cluster validation and estimating the number of clusters. In: C. Hennig, M. Meila, F. Murtagh, & R. Rocci (Eds.), Handbook of Cluster Analysis (pp. 595-618). Taylor & Francis Group. ISBN: 978-1-4665-5189-3

Han, P. K. J., Klein, W. M. P., & Arora, N. K. (2012). Varieties of uncertainty in health care: A conceptual taxonomy. Medical Decision Making, 31(6), 828-838. https://doi.org/10.1177%2F0272989X11393976

Hennig, C., & Meila, M. (2016). Cluster analysis: An overview. In: C. Hennig, M. Meila, F. Murtagh, & R. Rocci (Eds.), Handbook of Cluster Analysis (pp. 1-20). Taylor & Francis Group. ISBN: 978-1-4665-5189-3

Hennig, C., Meila, M., Murtagh, F., & Rocci, R. (2016). Handbook of Cluster Analysis. Taylor & Francis Group. ISBN: 978-1-4665-5189-3

Hewlett, M. M., Raven, M. C., Graham-Squire, D., Evans, J. L., Cawley, C., Kushel, M., & Kanzaria, H. K. (2022). Cluster analysis of the highest users of medical, behavioral health, and social services in San Francisco. Journal of General Internal Medicine. https://doi.org/10.1007/s11606-022-07873-y

Jennings, A., & Cox, D. J. (2023). Starting the conversation around the ethical use of artificial intelligence in applied behavior analysis. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/zmn28

Jones, S. H., St. Peter, C. C., & Ruckle, M. M. (2020). Reporting of demographic variables in the Journal of Applied Behavior Analysis. Journal of Applied Behavior Analysis, 53(3), 1304-1315. https://doi.org/10.1002/jaba.722

Larsson, E. V. (2012). Applied behavior analysis (ABA) for autism: What is the effective age range for treatment? The Lovaas Institute for Early Intervention Midwest Headquarters White Paper. Retrieved from: https://childsplayautism.com/wp-content/uploads/Effetive-Treatment-Range-1.pdf

Levy, H. C., Worden, B. L., Davies, C. D., Stevens, K., Katz, B. W., Mammo, L., Diefenbach, G. J., & Tolin, D. F. (2020). The dose-response curve in cognitive-behavioral therapy for anxiety disorders. Cognitive Behaviour Therapy, 49(6), 439-454. https://doi.org/10.1080/16506073.2020.1771413

Lingren, T., Chen, P., Bochenek, J., Doshi-Velez, F., Manning-Courtney, P., Bickel, J., ..., & Savova, G. (2016). Electronic health record based algorithm to identify patients with autism spectrum disorder. PLoS ONE, 11(7), e0159621. https://doi.org/10.1371/journal.pone.0159621

Linstead, E., Dixon, D. R., French, R., Granpeesheh, D., Adams, H., German, R., ..., & Kornack, J. (2017a). Intensity and learning outcomes in the treatment of children with autism spectrum disorder. Behavior Modification, 41(2), 229-252. https://doi.org/10.1177/0145445516667059

Linstead, E., Dixon, D. R., Hong, E., Burns, C. O., French, R., Novack, M. N., & Granpeesheh, D. (2017b). An evaluation of the effects of intensity and duration on outcomes across treatment domains for children with autism spectrum disorder. Translational Psychiatry, 7(9), e1234. https://doi.org/10.1038%2Ftp.2017.207

Meyer, A. N. D., Giardina, T. D., Khawaja, L., & Singh, H. (2021). Patient and clinician experiences of uncertainty in the diagnostic process: Current understanding and future directions. Patient Education and Counseling, 104(11), 2606-2615. https://doi.org/10.1016/j.pec.2021.07.028

Müller, A. C., & Guido, S. (2017). Introduction to Machine Learning with Python. O'Reilly Media, Inc. ISBN: 978-1-449-36941-5

National Autism Center (2009). National Standards Project: Phase 1. Retrieved from: https://nationalautismcenter.org/national-standards/phase1-2009/

National Autism Center (2015). National Standards Project, Phase 2. Randolph, MA: National Autism Center. Retrieved from: https://www.nationalautismcenter.org/national-standards/

Ostrovsky, A., Willa, M., Cho, T., Strandberg, M., Howard, S., & Davvitian, C. (2022). Data-driven, client-centric applied behavior analysis treatment-dose optimization improves functional outcomes. World Journal of Pediatrics. https://doi.org/10.1007/s12519-022-00643-0

Papanicolas, I., Woskie, L. R., & Jha, A. K. (2018). Health care spending in the United States and other high-income countries. Journal of the American Medical Association, 319(10), 1024–1039. https://doi.org/10.1001/jama.2018.1150

Parlett-Pelleriti, C. M., Stevens, E., Dixon, D., & Linstead, E. J. (2022). Applications of unsupervised machine learning in autism spectrum disorder research: A review. Review Journal of Autism and Developmental Disorders. https://doi.org/10.1007/s40489-021-00299-y

Patel, A. A. (2019). Hands-on unsupervised learning using Python: How to build applied machine learning solutions from unlabeled data. O'Reilly Media, Inc. ISBN: 978-1-492-03564-0

Saleem, A., Asif, K. H., Ali, A., Awan, S. M., & Alghamdi, M. A. (2014). Pre-processing methods of data mining. IEEE/ACM 7th International Conference on Utility and Cloud Computing, 451-456. https://doi.org/10.1109/UCC.2014.57

Stevens, E., Dixon, D. R., Novack, M. N., Granpeesheh, D., Smith, T., & Linstead, E. (2019). Identification and analysis of behavioral phenotypes in autism spectrum disorder via unsupervised machine learning. International Journal of Medical Informatics, 129, 29–36. https://doi.org/10.1016/j.ijmedinf.2019.05.006

Towards AI (2023). Centroid neural network: An efficient and stable clustering algorithm. Retrieved from: https://towardsai.net/p/l/centroid-neural-network-an-efficient-and-stable-clustering-algorithm

Treveil, M., & Dataiku Team (2020). Introducing MLOps: How to Scale Machine Learning in Enterprise. O'Reilly Media, Inc. ISBN: 978-1-492-08329-0

World Health Organization (2015). Health systems financing: The path to universal coverage. Retrieved from: https://apps.who.int/iris/bitstream/handle/10665/155002/WHO_HIS_SDS_2015.6_eng.pdf

Wright, P. (2022). Social determinants of health and applied behavior analysis: Leveraging the connection to make the world a better place. CEU Event available from: https://behavioruniversity.com/social-determinants-health-aba

Zoladz, P. R., & Diamond, D. M. (2008). Linear and non-linear dose-response functions reveal a hormetic relationship between stress and learning. Dose-Response: A Publication of International Hormesis Society, 7(2), 132–148. https://doi.org/10.2203/dose-response.08-015.Zol