Abstract
Signs and symptoms of Attention-Deficit/Hyperactivity Disorder (ADHD) are present at preschool ages and are often not identified for early intervention. We aimed to use machine learning to detect ADHD early among kindergarten-aged children using population-level administrative health data and a childhood developmental vulnerability surveillance tool, the Early Development Instrument (EDI). The study cohort consisted of 23,494 children born in Alberta, Canada, who attended kindergarten in 2016 without a diagnosis of ADHD. In a four-year follow-up period, 1,680 children were later identified with ADHD using a case definition. We trained and tested machine learning models to predict ADHD prospectively. The best-performing model using administrative and EDI data could reliably predict ADHD and achieved an Area Under the Curve (AUC) of 0.811 during cross-validation. Key predictive factors included EDI subdomain scores, sex, and socioeconomic status. Our findings suggest that machine learning algorithms that use population-level surveillance data could be a valuable tool for early identification of ADHD.
Author summary
Many children exhibit symptoms of Attention-Deficit/Hyperactivity Disorder (ADHD) at a young age, but it is often diagnosed at a later stage. This delay in diagnosis can deprive children of the necessary support that they require. To address this issue, we conducted a study to develop a model that could predict ADHD in kindergarteners. We analyzed various information readily available for this age group in 2016, including health records, demographics, and teacher-rated developmental assessments. We then followed these children for four years to evaluate the accuracy of our model in predicting their later ADHD diagnosis. Our findings were promising, particularly when we used all the available data. The scores from developmental assessments were a significant factor in predicting the diagnosis accurately, along with other health and demographic factors. Our results suggest that machine learning could be an effective tool in helping parents, teachers, and doctors identify children with ADHD earlier, leading to better and more timely support.
Citation: Liu YS, Talarico F, Metes D, Song Y, Wang M, Kiyang L, et al. (2024) Early identification of children with Attention-Deficit/Hyperactivity Disorder (ADHD). PLOS Digit Health 3(11): e0000620. https://doi.org/10.1371/journal.pdig.0000620
Editor: Ryan S. McGinnis, Wake Forest University School of Medicine, UNITED STATES OF AMERICA
Received: October 5, 2023; Accepted: August 20, 2024; Published: November 7, 2024
Copyright: © 2024 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Due to privacy policy restrictions individualized data cannot be shared. Data can be accessed with permission from both the Ministry of Health in Alberta, Canada through www.alberta.ca/health-research or health.inforequest@gov.ab.ca.
Funding: This research was undertaken, in part, thanks to funding from the Canada Research Chairs program (BC), Alberta Innovates (BC), Mental Health Foundation (BC), MITACS Accelerate program (BC, YSL), BBRF Young Investigator Grant from the Brain & Behavior Research Foundation (YSL), Simon & Martina Sochatsky Fund for Mental Health (BC), Howard Berger Memorial Schizophrenia Research Fund (BC), the Abraham & Freda Berger Memorial Endowment Fund (BC), the Alberta Synergies in Alzheimer’s and Related Disorders (SynAD) program (BC, YSL, FT), University Hospital Foundation (BC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
ADHD (Attention-Deficit / Hyperactivity Disorder) is characterized by developmentally inappropriate, persistent, and pervasive inattention and/or hyperactivity-impulsivity that interferes with daily functioning at home, school, or work [1,2]. Based on a review of 175 studies up to the year 2013, the prevalence of ADHD in children aged 18 and under is estimated to be 7.2% [3] and is increasing [4]. ADHD is associated with emotional dysregulation [5], neuropsychological dysfunction [6], poor social relationships and cognitive skills [7], academic underachievement [8], risky sexual behavior, early pregnancy [9], and criminal activities [10,11]. In turn, the economic impact of ADHD is substantial, with disease-associated costs estimated to be $74 billion and $6 to $11 billion annually in the United States and Canada, respectively, due to losses in productivity [1].
Early interventions in preschool [12] and school-aged [13,14] children, such as behavioral training and stimulant medications, are effective in curbing downstream negative consequences of untreated ADHD. However, diagnosis of ADHD in the preschool years is challenging, delaying early intervention in most cases. For example, in 2016, of 6.1 million ADHD cases diagnosed before 18 years of age in the United States, only 2–6% were diagnosed before four years of age [15], and over half were diagnosed between 12 and 17 years [14–17]. Delayed diagnosis is more prevalent in girls [18]. Factors contributing to delayed diagnosis may include a lack of awareness of ADHD signs/symptoms in parents and teachers. Early identification of children with a heightened risk of ADHD at a young age may raise parents’ awareness and encourage parents to seek clinical diagnostic clarification, and promote early intervention.
However, clinical diagnoses are often not directly captured in population-level data, and the risk of ADHD can only be estimated. One source of identifying probable ADHD cases is administrative health data. In Canada, the health care system is publicly funded, universally available, and administered at the province/territory level. Administrative health data is being collected routinely and used widely for population health surveillance and holds the potential to estimate the risk of clinical diagnoses using case definitions, including ADHD. The validity of using case definitions of ADHD to approximate the clinical diagnosis and population-level prevalence of ADHD was previously explored, yielding high confidence using International Classification of Disease (ICD) codes from physician claims, ambulatory records as well as drug dispensation history [19–22].
In addition to administrative health data, cross-sector data such as population surveillance within the educational system are also routinely collected and may facilitate identifying and estimating ADHD risk. One population-level surveillance tool widely used in the education sector internationally is the Early Development Instrument (EDI) [23,24]. The EDI assesses developmental health by identifying children’s vulnerability to poor developmental outcomes based on teacher-completed questionnaires. It provides information about kindergarten-aged children’s (4 to 6 years old) ability to meet age-appropriate developmental expectations shaped by their experiences in the first five years of life [25]. The questionnaire comprises 103 questions covering five domains: physical health and well-being, social competence, emotional maturity, language and cognitive development, and communication skills and general knowledge [23]. The five domains consist of 16 subdomains: physical readiness for the school day, physical independence, gross and fine motor skills, overall social competence, responsibility and respect, approaches to learning, readiness to explore new things, prosocial and helping behavior, anxious and fearful behavior, aggressive behavior, hyperactivity and inattentive behavior, basic literacy, interest in literacy/numeracy and memory, advanced literacy, basic numeracy, and communication and general knowledge [26,27]. The EDI also contains parent-reported and teacher-recorded medical and developmental diagnoses, including parent-reported formal diagnosis of ADHD [28]. Cross-linkage of data sets across the health and education sectors provides an enriched context for interdisciplinary research focussed on identifying risk factors of developmental disorders and developing data-driven, high-performance health risk predictive models [29].
The literature reports a wide range of risk factors for ADHD, including demographic factors such as family size and low socio-economic status [30] and health history factors such as asthma [31], early exposure to antibiotics [32], increased health utilization [33], and prenatal maternal health [34]. These risk factors were typically investigated using conventional statistical analysis methods (e.g., [30,32,35]) that concentrated on describing the data; the models’ performance and generalizability were not the main focus. Machine learning (ML) goes beyond conventional statistical analysis by using algorithms to construct predictive models that make individual-level forecasts intended to generalize to real-world contexts. For instance, ML models are fitted on training data, and a hold-out test set is used to assess how well they would perform on unseen data [36]. The factors driving a successful predictive model may in turn inform the likelihood of ADHD at the individual level. Studies of ML-based mental health prediction using large-scale data have grown rapidly over the past few years, including successful prediction of a wide range of disorders such as depression [37–39], opioid use disorder [29,40,41], and post-traumatic stress [42]. ADHD prediction using ML has been pioneered using neuroimaging data (for a review, see [43]). Only a few population-scale studies have used ML-based ADHD prediction [44,45].
Our research goal was to develop a high-performing predictive model for identifying individuals with childhood ADHD in a four-year follow-up window by applying ML algorithms to population-level administrative health data cross-linked with EDI. We also evaluated the contributing predictive risk factors. This novel approach may facilitate forecasting the elevated risk of future ADHD in the real world.
Methods
Data sources
The 2016 EDI data were collected between February and March 2016 and provided by the Ministry of Education, Alberta, Canada. The EDI implementation was offered to all publicly funded schools; however, unlike in other Canadian jurisdictions, opting out was possible for the district, school, and individual families.
In addition to the EDI data, administrative health datasets used in this study included the Alberta Health Care Insurance Plan (AHCIP) Physician Claims, National Ambulatory Care Reporting System (NACRS), Discharge Abstract Database (DAD), the Alberta Health Care Insurance Plan (AHCIP) Population Registry Database, Alberta Pharmaceutical Information Network (PIN) database and Alberta Human Services Drug Supplement Plan database (AHSDSP), Alberta Notice of Birth database, Statistics Canada Census Data (2016). These datasets contain children’s health utilization history, prenatal records, and demographics.
The EDI dataset was matched with the health administrative datasets from the Ministry of Health, Government of Alberta (Alberta Health) based on identifiable information (i.e., name, biological sex at birth, date of birth) and unique provincial health number. All predictive variables except demographics were developed based on a 3-year historical window before enrollment. Personally identifiable information was used for data linkage only. The cross-linked dataset was prepared and anonymized at the Ministry of Health before the researcher’s access for data analysis.
This study was approved by the ethics committee at the University of Alberta (Pro00104650). Informed consent was waived because this secondary analysis posed minimal risk and the Ministry of Health had already anonymized the cross-linked data before analysis.
Sample derivation
In Alberta, there were 69,486 children between the ages of five and six in 2016. Of those, 38,358 (55.2%) completed the EDI questionnaire. Reasons that EDI data were not collected for all children aged five and six included homeschooling, living in remote areas, and school authority and family opt-outs. We first applied exclusion criteria to the EDI dataset, removing children with missing data, children who attended fewer than 30 days in the classroom, and children without parental or guardian consent records, which excluded 7,677 children. After linking the EDI data with the health administrative data, a further 7,187 children were excluded due to mismatch, either because the child had no Alberta biological birth record (5,458) or because the birth record did not match the information in the health administrative data (1,729) (Fig 1).
Fig 1. A flow chart illustrating the derivation of the final EDI cohort.
To ensure the model predicts ADHD prospectively, children with ADHD diagnoses before March 31st, 2016 (N = 247), based on the administrative health case definition and EDI, were removed from the remaining cohort of 23,494. The final cohort for analysis included 23,247 children with a mean age of 5.68 years (SD = 0.33); 48.0% were female and 52.0% male, and 8.3% belonged to the socioeconomic subsidy group (i.e., Aboriginal, subsidy, welfare). All predictive variables developed for this dataset were collected before March 31st, 2016. Based on our case definition (see S1 Table in supporting information), the cohort’s ADHD prevalence rate at ages 4–6 was 1.1%, which is in line with Canadian reports of 0.8%, 2.0%, and 2.1% prevalence rates for the age group of 5 to 9 years old in Ontario, Nova Scotia, and Quebec, respectively [4]. During the follow-up period, 1,680 children (7.2%) were found to have case-defined ADHD. See Table 1 for descriptive statistics for the ADHD and non-ADHD groups.
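The cohort counts reported above can be checked with simple arithmetic. The sketch below only re-derives the reported figures; it assumes, as the text does not state explicitly, that the 1.1% baseline prevalence is the 247 prior cases divided by the 23,494 linked children, and that the 7.2% follow-up rate uses the final cohort of 23,247 as its denominator. All variable names are illustrative.

```python
# Counts as reported in the Sample derivation section.
children_aged_5_6 = 69_486
completed_edi = 38_358
excluded_edi_criteria = 7_677
no_alberta_birth_record = 5_458
birth_record_mismatch = 1_729
prior_adhd = 247
later_adhd = 1_680

# Stepwise derivation of the cohort (compare with Fig 1).
excluded_linkage = no_alberta_birth_record + birth_record_mismatch
linked_cohort = completed_edi - excluded_edi_criteria - excluded_linkage
final_cohort = linked_cohort - prior_adhd

# Percentages; the denominators are assumptions (see the note above).
edi_completion_pct = 100 * completed_edi / children_aged_5_6
baseline_prevalence_pct = 100 * prior_adhd / linked_cohort
followup_incidence_pct = 100 * later_adhd / final_cohort

print(f"Excluded at linkage: {excluded_linkage}")
print(f"Linked cohort: {linked_cohort}, final cohort: {final_cohort}")
print(f"EDI completion: {edi_completion_pct:.1f}%")
print(f"Baseline prevalence: {baseline_prevalence_pct:.1f}%")
print(f"Follow-up case-defined ADHD: {followup_incidence_pct:.1f}%")
```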
Outcome definition
The target outcome is whether an individual has ADHD in a 4-year follow-up window, operationally defined as a binary outcome (1 = ADHD, 0 = No ADHD) based on a case definition derived from administrative health data. The case definition included ICD-9 and ICD-10 codes connected to inpatient and outpatient visits, psychiatric and mental health facility outpatient visits, physician claims, or a history of stimulant drug use based on Anatomical Therapeutic Chemical (ATC) Classification drug codes (see S1 Table for the ADHD case definition). Incidence of ADHD was noted for the cohort in the 4-year period between March 2016 and March 2020.
Data analysis
Python 3.6 with the scikit-learn 1.0.1 package was used for data pre-processing and ML analysis. A total of 57 predictive variables, or features, were used for analysis (Table 1). The raw EDI data include 103 questions to evaluate vulnerability in five developmental domains. EDI features were based on categorical subdomain scores. The subdomain scores rated how well the children met developmental expectations at three levels: 1) met almost all or all of the developmental expectations (coded as 3), 2) met some of the developmental expectations (coded as 2), or 3) met few/none of the developmental expectations (coded as 1) [26]. We reverse coded the scores of physical readiness for the school day, anxious and fearful behavior, aggressive behavior, and hyperactivity and inattentive behavior by switching scores 1 and 3, so that a higher score indicates more problem behavior rather than conformity with developmental expectations. In addition, children not fluent in the language of instruction in class (English/French as a second language, n = 2,986) and children with repeated grades (n = 699) were recorded as categorical features. Twenty-six features with categorical responses (e.g., ‘Yes’, ‘No’) were dummy coded with the first redundant level dropped (e.g., “Breastfeeding” is first coded as two separate binary columns, “Breastfeeding = yes” and “Breastfeeding = no”; because both columns contain the same information, “Breastfeeding = no” is dropped).
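The reverse coding and dummy coding described above can be illustrated on a toy table. This is a minimal sketch using pandas, which the paper does not name; the column names and values are hypothetical, and the authors' actual pre-processing code is not available.

```python
import pandas as pd

# Toy data with hypothetical column names; scores use the 1-3 coding above.
df = pd.DataFrame({
    "hyperactivity_inattention": [3, 2, 1, 3],
    "approaches_to_learning": [3, 3, 1, 2],
    "breastfeeding": ["yes", "no", "yes", "no"],
})

# Reverse code selected subdomains: swapping 1 and 3 (2 stays 2)
# is the same as computing 4 - score.
reversed_columns = ["hyperactivity_inattention"]
df[reversed_columns] = 4 - df[reversed_columns]

# Dummy code the categorical feature and drop the first (redundant) level.
# Levels are sorted, so "no" is dropped and "breastfeeding_yes" is kept.
encoded = pd.get_dummies(df, columns=["breastfeeding"], drop_first=True, dtype=int)

print(encoded)
```

After reverse coding, a value of 3 on the hypothetical `hyperactivity_inattention` column indicates more problem behavior, in line with the convention used in the rest of the analysis.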
We tested a set of linear and non-linear ML models to explore the combined and individual predictive utility of administrative data and EDI data. We also sought to identify important individual predictive factors driving predictions. For linear models, we included the standard Logistic Regression model and Logistic Regression models with regularization, i.e., Logistic Lasso Regression [46] and Logistic Ridge Regression [47]. For non-linear models, we included Gradient Boosting [48] and Random Forest [49]. All models were optimized for the area under the receiver operating characteristic curve (AUC) and evaluated using 10-fold cross-validation (CV) with hyperparameter tuning (see S3 Table for details). During 10-fold CV, the data were split into 10 equal subsets, and each subset was used to validate a model trained on the remaining nine subsets. As is common practice to enable performance comparison of multiple ML models based on the same processed data [50], all features were standardized to a mean of 0 and unit standard deviation using the StandardScaler function after splitting the data into training and testing sets.
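A minimal sketch of this kind of model comparison is given below. It is not the authors' pipeline: the data are synthetic (the cohort data cannot be shared), the folds are stratified (an assumption; the paper does not say), hyperparameters are left near their defaults rather than tuned, and scikit-learn's default L2 penalty is kept for the logistic model. Placing StandardScaler inside a pipeline means the scaler is fitted on each training fold only, which matches standardizing after the train/test split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort, with roughly 7% positive cases.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           weights=[0.93, 0.07], random_state=0)

# Illustrative model set; settings are assumptions, not the paper's tuned values.
models = {
    "logistic": LogisticRegression(class_weight="balanced", max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

mean_auc = {}
for name, model in models.items():
    # The scaler is refitted on the training part of every fold.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
    mean_auc[name] = scores.mean()
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```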
The optimal model was selected based on AUC. We ran three additional logistic regression models as baselines for performance comparison: 1) administrative health data only (36 features), 2) EDI data only (23 features), and 3) ADHD symptoms (3 features). Age and biological sex at birth were included in all baseline models. For all linear models, frequency-based weight adjustments (class_weight = balanced) were applied to control the class-imbalance effect (1,680 ADHD cases versus 21,567 No ADHD controls). The AUC confidence interval was derived from 30 repeats of the 10-fold CV. Non-overlapping confidence intervals were interpreted as statistically different at p < 0.05.
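The two ideas in this paragraph, balanced class weights and a confidence interval from repeated CV, can be sketched as follows. The class weights are computed from the reported case counts using scikit-learn's "balanced" rule, n_samples / (n_classes * class_count). The interval construction is an assumption, since the paper does not describe it: here each of the 30 repeats is summarized by its mean AUC and the 2.5th and 97.5th percentiles of those means are taken. The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

# Class weights implied by class_weight="balanced" for the reported counts.
labels = np.array([0] * 21_567 + [1] * 1_680)
w_no_adhd, w_adhd = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=labels)
print(f"weight(No ADHD) = {w_no_adhd:.3f}, weight(ADHD) = {w_adhd:.3f}")

# Synthetic stand-in data for the repeated cross-validation.
X, y = make_classification(n_samples=1500, n_features=10, n_informative=5,
                           weights=[0.93, 0.07], random_state=0)
pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(class_weight="balanced", max_iter=1000))

n_splits, n_repeats = 10, 30
cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")

# Splits are yielded repeat by repeat, so each row below is one full 10-fold run.
per_repeat_mean = scores.reshape(n_repeats, n_splits).mean(axis=1)
ci_low, ci_high = np.percentile(per_repeat_mean, [2.5, 97.5])
print(f"AUC = {per_repeat_mean.mean():.3f} (95% interval {ci_low:.3f} to {ci_high:.3f})")
```

With the reported counts, the rule gives each ADHD case roughly 13 times the weight of a control, so both classes contribute equally to the loss.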
Due to the computational complexity of the study pipeline, no feature selection algorithm was used during modeling. Feature importance was estimated from the ranked average coefficient values of 100 models obtained by bootstrapping: for each of the 10 models trained during 10-fold CV, we drew 10 bootstrap samples of randomly selected data, each representing 90% of the sample. Following the identification of important features, we aimed to explain further how those features affect ADHD identification in their original units. Some features were presented in percentage units but were standardized to zero mean and unit variance in the ML pipeline; thus, their coefficients were not directly interpretable. An additional analysis was therefore conducted by fitting a logistic regression model with balanced weight adjustment to the raw data, with ADHD as the dependent variable and all other variables as independent variables, to extract the odds ratios of the independent variables.
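One possible reading of the bootstrap procedure is sketched below. The description leaves several details open, so the following are assumptions: each bootstrap sample is drawn with replacement from the training part of a fold, its size is 90% of that training part, and features are ranked by the absolute value of the mean standardized coefficient. The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=12, n_informative=5,
                           weights=[0.93, 0.07], random_state=0)

rng = np.random.default_rng(0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

coefficients = []
for train_idx, _ in cv.split(X, y):
    for _ in range(10):
        # Bootstrap sample: 90% of the fold's training rows, with replacement.
        sample = rng.choice(train_idx, size=int(0.9 * len(train_idx)), replace=True)
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(class_weight="balanced", max_iter=1000))
        model.fit(X[sample], y[sample])
        # The last pipeline step is the fitted logistic regression.
        coefficients.append(model[-1].coef_.ravel())

coefficients = np.asarray(coefficients)      # shape: (100 models, n_features)
mean_coefficient = coefficients.mean(axis=0)

# Rank features from most to least important by absolute mean coefficient.
ranking = np.argsort(-np.abs(mean_coefficient))
for rank, feature in enumerate(ranking[:5], start=1):
    print(f"{rank}. feature_{feature}: mean coefficient = {mean_coefficient[feature]:+.3f}")
```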
Results
Model performance
The standard Logistic Regression model without regularisation had a cross-validation AUC of 0.811, representing the best model performance. In contrast, the more complex ML models offered no improvement in predictive performance (S2 Table). As a result, we concentrated on the outputs of the Logistic Regression to assess the model’s performance in terms of metrics such as balanced accuracy and to determine the top 10 predictive features based on feature importance ranking. The Receiver Operating Characteristic (ROC) curves of the best-fitting and baseline models are plotted in Fig 2. The logistic model achieved a cross-validated balanced accuracy of 0.745, with a sensitivity of 0.717 and a specificity of 0.773, a 9.5 percentage-point increase in balanced accuracy compared to the model with no EDI features (balanced accuracy = 0.650). Compared to a model using features exclusively from EDI, we found a 0.4 percentage-point difference in balanced accuracy (balanced accuracy = 0.741). Compared to the ADHD symptoms model using the EDI Hyperactive and Inattentive Behaviour score, Sex, and Age as features, we found a 4.3 percentage-point difference in balanced accuracy (balanced accuracy = 0.702). To facilitate the interpretability of the ML analysis, a logistic regression model with frequency weight adjustment for equal class weight was fitted to the raw data of the entire dataset to generate odds ratios corresponding to the raw data units and FDR-adjusted p-values (based on α = 0.05).
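Balanced accuracy is the mean of sensitivity and specificity, so the reported figures can be cross-checked directly. The short calculation below uses only numbers stated in this section; the dictionary keys are illustrative labels.

```python
# Reported performance of the best (all-features) logistic model.
sensitivity = 0.717
specificity = 0.773
balanced_accuracy = (sensitivity + specificity) / 2

# Reported balanced accuracy of the three baseline models.
baselines = {
    "administrative_only": 0.650,
    "edi_only": 0.741,
    "adhd_symptoms": 0.702,
}

# Differences from the full model, in percentage points.
gain_pp = {name: round((0.745 - value) * 100, 1) for name, value in baselines.items()}

print(f"Balanced accuracy: {balanced_accuracy:.3f}")
for name, gain in gain_pp.items():
    print(f"Full model vs {name}: {gain} percentage points")
```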
Fig 2. The ROC curves of the best-fitting and baseline models. The solid line illustrates the best-fitting model using all available features.
Predictive variables
Table 2 presents the top 10 predictive variables for case-defined ADHD, including four features from EDI. Odds ratios and their confidence intervals were calculated from regular logistic regression fits on raw data with class-balance weight adjustments. This part of the results is a separate analysis from the ML prediction of ADHD but is presented to facilitate interpretation of the top predictive features identified through ML on their original scales. Approaches to learning assesses how well children work neatly and independently, solve problems, adhere to rules and routines in class, and readily adapt to changes. English/French as a 2nd language indicates whether a child is not a native speaker of the classroom instruction language. Hyperactive and inattentive behavior evaluates the degree to which children show hyperactive behaviors, assessed through the ability to concentrate, settle in chosen activities, wait their turn, and think before doing something. Note that the scores have been reverse coded in the current study, so a higher number indicates more problem behavior. Overall social competence evaluates the degree to which children have good or excellent overall social development, an ability to get along with other children and to play with various children, cooperative play, and self-confidence.
According to the multivariate model, a high score on approaches to learning, learning English or French as a second language (i.e., not being fluent in the language of instruction), female biological sex at birth, and high overall social competence were protective against an increased risk of ADHD. A longer history of past mental health records, more hyperactive and inattentive behavior, and a mother’s history of mental health concerns at childbirth were all linked to higher odds of ADHD. In addition, demographic data, including a higher percentage of individuals with postsecondary education in the neighborhood, greater than or equal to 30% of income spent on housing, and a larger average household size in the neighborhood, were associated with an increased risk of ADHD. The odds ratios of variables in percentage units reflect the change in odds per one-unit change and thus need to be interpreted within this context.
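A short worked example may help with reading per-unit odds ratios. In a logistic regression, the odds ratio for a k-unit change is the per-unit odds ratio raised to the power k. The odds ratios below are the ones reported in the Discussion; the 10-point and 2-year differences are hypothetical values chosen only for illustration.

```python
import math


def odds_ratio_for_change(per_unit_or: float, units: float) -> float:
    """Odds ratio implied by a change of `units` in the predictor."""
    return per_unit_or ** units


# OR = 1.02 per percentage point; a hypothetical 10-point difference.
or_10_points = odds_ratio_for_change(1.02, 10)

# OR = 1.52 per year of past mental health visits; a hypothetical 2 years.
or_2_years = odds_ratio_for_change(1.52, 2)

# OR = 0.63 for overall social competence: percent reduction in odds per unit.
reduction_pct = (1 - 0.63) * 100

# The same per-unit OR expressed as a logistic regression coefficient.
coefficient = math.log(1.02)

print(f"10 percentage points at OR 1.02: OR = {or_10_points:.2f}")
print(f"2 years at OR 1.52: OR = {or_2_years:.2f}")
print(f"OR 0.63 corresponds to a {reduction_pct:.0f}% reduction in odds")
print(f"log(1.02) = {coefficient:.4f}")
```

An odds ratio close to 1 per percentage point can therefore still matter across a wide range of the predictor, although the section notes that the observed group differences on these variables were small.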
Discussion
Using cross-linked administrative health data and data from a population surveillance tool, the EDI, we investigated an ML approach to identify and validate the increased risk of ADHD in kindergarten-aged children. We report an AUC of 0.811 and a balanced accuracy of 0.745, an increase of 9.5 percentage points in balanced accuracy over the use of administrative health data alone to predict increased risk of ADHD. The EDI-only model also performed close to the comprehensive model (AUC = 0.796, balanced accuracy = 0.741). The ADHD symptoms model with the EDI Hyperactivity and Inattentive Behaviour score, Sex, and Age as features performed in between the EDI-only model and the administrative health data-only model (AUC = 0.750, balanced accuracy = 0.702). The administrative health data alone underperformed compared to the other data but still performed better than random prediction (AUC = 0.711, balanced accuracy = 0.650). The results suggest that the EDI, although designed to be a population surveillance tool for children’s vulnerability, may offer insight to facilitate identifying heightened risk of ADHD. Our results also contribute to the literature confirming key risk factors of ADHD that may be used to facilitate early identification and intervention to reduce the harm associated with ADHD.
Early identification of children with a heightened risk of ADHD often starts with parents’ and teachers’ suspicion and is confirmed by physicians later. However, earlier signs of ADHD are often overlooked, even though reliable patterns to identify ADHD may have already emerged. Our study supports the view that applying ML to population-level data may offer a practical tool to identify overlooked early warning signs and thereby raise red flags for parents, teachers, and physicians, which in turn may translate to earlier diagnosis and intervention. Although there is currently a lack of comparable studies using similar methods on population-level data for children’s ADHD risk screening, and a lack of general clinical tools to facilitate early childhood ADHD screening, our model’s performance is comparable to that of studies aiming to identify other developmental disorders and childhood ADHD retrospectively. A clinical scale used for screening for autism, the Childhood Autism Rating Scale [51], composed of 24 questions, achieved a sensitivity of 0.71 and specificity of 0.75 in a large sample validation study. In a meta-analysis, pooled sensitivity and specificity of ADHD screening tools ranged from 0.72 to 0.84 [52], with the Conners Abbreviated Symptom Questionnaire reaching a balanced accuracy of 0.83. However, these scales are not designed to identify ADHD in a future time window.
The top four contributing features of our best-performing model are consistently EDI-based features. Higher scores on approaches to learning are protective against ADHD risk (OR = 0.58), which may indicate that children with ADHD show early signs of learning disabilities at kindergarten age [7]. Children learning English or French as a second language have a significantly reduced risk of ADHD (OR = 0.35). To the authors’ knowledge, there is a lack of empirical findings on how not being fluent in the language of instruction in class affects ADHD. However, a lack of English structural skills has been shown to be positively associated with ADHD behavior [53]. Thus, children not fluent in the language of instruction would be expected to have a higher risk of ADHD. The reduced risk of ADHD for children with English or French as a second language in our model may indicate that this group is vulnerable to underdiagnosis of ADHD, a hypothesis that warrants future research. Further, it is not surprising that early observations of hyperactive and inattentive behavior, recognized as a primary symptom of ADHD, are associated with increased odds of a future ADHD diagnosis (OR = 1.62). ADHD is diagnosed late and underdiagnosed in the female population [18]. Thus, females with ADHD were less likely to be diagnosed in our sample of young children. Correspondingly, female sex reduces the odds of ADHD by about half in our model (OR = 0.52). In addition, children with ADHD experience difficulties with social competence [54], coinciding with our finding that a higher social competency score reduces the odds of ADHD by 37% (OR = 0.63).
Among the notable health-data-based predictors, more years of past mental health visits for the child and the mother’s poor mental health at birth are associated with substantially increased odds of future ADHD (OR = 1.52 per year and OR = 1.73, respectively), consistent with the literature where both children’s health [31–33] and maternal health [34] are risk factors of ADHD. The results cannot inform the underlying cause of increased odds; some plausible explanations may include poor mental health of the child and mother leading to attachment problems, as insecure attachment was high among ADHD children and their mothers [55,56]. However, this finding may inform mental health service providers and policymakers to allocate more mental health resources to parents with mental disorders, such as mothers suffering from post-partum depression or psychosis.
For census-based predictors, the odds of ADHD increase by a factor of 1.37 for each additional person in the average household size in the neighborhood (OR = 1.37 per person). This finding is in line with prior reports that larger household sizes coincide with increased childhood adversity, according to Rutter’s indicators [30]. The odds ratios for the percentage of individuals with postsecondary education and for greater than or equal to 30% of renter income spent on housing were very close to 1, at 1.02. The average percentage difference between the ADHD and No ADHD groups for those variables was also small in magnitude (e.g., < 1%). Thus, we could not draw a meaningful interpretation from such small-magnitude effects.
Our results support the hypothesis that cross-linkage of administrative health data and population surveillance data collected by EDI may facilitate accurate individual-level prediction of ADHD, opening opportunities for harm reduction strategies such as promoting awareness of ADHD among teachers, parents, and clinicians and encouraging early access to health care for at-risk children. In recent literature, EDI data has been linked with administrative data records to study medical and social risk factors of non-specific developmental vulnerabilities. One study reported a reasonable concordance between ADHD case definition and EDI records, with a positive predictive value of 61.9% and a negative predictive value of 96.7% [57]. In another study, EDI data were cross-linked with census data to develop behavioral self-regulation profiles of children, showing children with a high-risk profile were more likely to be associated with a subsequent clinical diagnosis of ADHD up to 5 years later [28].
Another insight from the current study results is that administrative health and EDI data both have the potential to facilitate the identification of ADHD even without data crosslinking. Administrative data alone, even though performing subpar to models including EDI data, can be used to perform crude prediction of heightened ADHD risks (AUC = 0.711). It’s also not surprising that EDI data alone perform well in ADHD screening, as parent-reported and school-reported symptom data were often critical to making a diagnosis of ADHD. Importantly, the score on Hyperactive and Inattentive Behaviour also performs well when combined with basic demographic information. The current findings corroborated the literature that symptoms and school performance reports in kindergarten years are predictive for ADHD diagnosis into the school-age years. The current findings set a stage for future follow-up studies to refine predictive modeling algorithms and explore potential real-world applications of big data and ML to inform heightened ADHD risks.
In the current study, we deliberately removed children already diagnosed with ADHD or having a case-defined ADHD label at the time of data collection from our cohort to ensure cross-validation is applied to the prediction of a future label of ADHD and not contaminated with a present label. Thus, the model is trained explicitly for prospective prediction. If this group of children with dual labels (current confirmed ADHD, case-defined ADHD in a 4-year window) are included in training samples, the algorithm may perform at a higher classification accuracy as the ML model has access to more examples of ADHD children to differentiate from non-ADHD children.
Due to a relatively short follow-up window of four years, our data extraction may mislabel a proportion of children with ADHD identified after the follow-up window and falsely label them as having no ADHD in the study. Mislabeling ADHD cases with a delayed diagnosis as No ADHD cases in our model training process may weaken the model’s classification performance in identifying ADHD. Similarly, the model may identify someone who has a higher chance of having ADHD, but in our time frame, the person might not have received a diagnosis. This person is considered a false positive in our model yet could be a true positive case given a more extended time window. Thus, in this sense, false positive predictions from our model could be viewed as a risk indicator for the heightened risk of ADHD. Knowing this, we anticipate future studies with a longer longitudinal follow-up period may yield better classification results and confirm if a model prediction based on a shorter time window could be used for early identification of ADHD.
In addition, the current research focuses on identifying the best prediction model and may offer only a limited interpretation of the relative contribution of features that are not prominent predictors of ADHD. The baseline model, which combined Hyperactive and Inattentive Behaviour (the strongest predictor) with Age and Sex, performed quite well for ADHD prediction (AUC = 0.75).
Another limitation of the study is that the identification of ADHD is based on a case definition derived from administrative health data. This may be a good proxy for a true ADHD diagnosis or a heightened risk, but it is not equivalent to a confirmed clinical diagnosis. Surveillance case definitions usually have limited sensitivity but are specific, giving a high degree of confidence in the cases they identify. For example, some parents may not want their children to take psychoactive medication. They may also avoid seeking medical help for various reasons, including cost, side effects, social stigma, or a belief that medication won’t help. Those considerations may result in not visiting a physician, or visiting a physician only once without a follow-up, thus not meeting our criterion of at least two physician claims and likely missing our criterion for taking ADHD medications. However, when considering the total number of children with case-defined ADHD from 5 to 10 years of age, the identified ADHD rate is higher than expected, at 7.2%. As a comparison, the prevalence of ADHD in Ontario, Canada has been estimated at 5.6% [58]. The higher prevalence of ADHD identified in our data suggests that the case definition used in our study may have introduced more false positive cases, in which children with no ADHD risk were identified as ADHD cases. Also, the modeling did not extract diagnoses of specific mental disorders for use as predictive factors, and therefore cannot indicate whether children predicted to have ADHD had comorbid diagnoses. Finally, the modeling pipeline conducted hyperparameter tuning on the full dataset to simplify computation, which may cause overfitting and inflate the reported ML performance. This limitation does not apply to models with no hyperparameters, such as logistic regression.
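One common way to avoid the tuning-related optimism described above is nested cross-validation, in which hyperparameters are selected using only the training portion of each outer fold. The following is a minimal, self-contained sketch on synthetic data. The k-nearest-neighbour scorer, the grid of k values, the fold counts, and all function names are illustrative assumptions; they are not the models or the pipeline used in this study.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Rank-based AUC: chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_scores(train, test, k):
    """Score each test point by the share of positives among its k nearest training points."""
    out = []
    for x, _ in test:
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        out.append(sum(y for _, y in nearest) / k)
    return out


def folds(rows, n_folds):
    """Yield (train, test) partitions; row j belongs to test fold j % n_folds."""
    for i in range(n_folds):
        test = rows[i::n_folds]
        train = [r for j, r in enumerate(rows) if j % n_folds != i]
        yield train, test


def cv_auc(rows, k, n_folds):
    """Mean AUC of a k-NN scorer across n_folds folds of rows."""
    return mean(
        auc(knn_scores(tr, te, k), [y for _, y in te])
        for tr, te in folds(rows, n_folds)
    )


def nested_cv(rows, grid, outer=5, inner=3):
    """Tune k inside each outer-training set, then score on the untouched outer test fold."""
    results = []
    for train, test in folds(rows, outer):
        best_k = max(grid, key=lambda k: cv_auc(train, k, inner))  # inner loop sees train only
        score = auc(knn_scores(train, test, best_k), [y for _, y in test])
        results.append((best_k, score))
    return results


# Synthetic one-feature data (not study data): positives are shifted upward.
rng = random.Random(0)
rows = [(rng.gauss(1.5 * (i % 2), 1.0), i % 2) for i in range(300)]
rng.shuffle(rows)

results = nested_cv(rows, grid=[1, 5, 15])
print("mean outer-fold AUC:", round(mean(score for _, score in results), 3))
```

Because each outer test fold plays no part in choosing k, the averaged outer-fold AUC is not affected by the selection step, at the cost of repeating the tuning once per outer fold.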
Future studies should further validate the current findings by following the cohort beyond the 4-year follow-up window and by using clinical diagnoses of ADHD if such data become accessible. We also encourage researchers from other geographical regions with cross-linked EDI data to conduct similar analyses. Other future directions include further exploring risk factors for ADHD, building specific models for ADHD subtypes and for ADHD with comorbidity, and focused investigation of populations at risk of delayed diagnosis (e.g., girls).
Conclusion
In summary, the results of this study suggest that children at risk of ADHD could be identified prospectively at kindergarten age through ML algorithms that use administrative health and population-level surveillance data. The novel application of ML to cross-linked population-level data has the potential to systematically improve parents’ and clinicians’ awareness of elevated ADHD risk, leading to earlier diagnosis and, in turn, promoting early intervention to minimize the negative impact of ADHD. We encourage future studies to further validate this approach using diverse data samples from different regions, and to further refine model performance for groups of children vulnerable to delayed diagnosis of ADHD.
Code availability
The study uses a Python package, dummyML, developed by the authors [59], available from: https://preprints.jmir.org/preprint/65966; code used for analysis is available at https://pypi.org/project/end2endML/
Supporting information
S2 Table. Model cross-validation performance.
https://doi.org/10.1371/journal.pdig.0000620.s002
(DOCX)
Acknowledgments
This study is based in part on data provided by Alberta Health. The interpretation and conclusions contained herein are those of the researchers and do not necessarily represent the views of the Government of Alberta. Neither the Government nor Alberta Health express any opinion in relation to this study.
References
- 1. Centre for ADHD Awareness Canada. Paying Attention to the Cost of ADHD… The Price Paid by Canadian Families, Governments and Society. 2017. Available: https://caddac.ca/adhd/wp-content/uploads/2017/01/Socioeconomic-Policy-Paper-1.pdf
- 2. American Psychiatric Association. American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders Fifth Edition. Arlington. 2013.
- 3. Thomas R, Sanders S, Doust J, Beller E, Glasziou P. Prevalence of attention-deficit/hyperactivity disorder: A systematic review and meta-analysis. Pediatrics. American Academy of Pediatrics; 2015. pp. e994–e1001. pmid:25733754
- 4. Vasiliadis HM, Diallo FB, Rochette L, Smith M, Langille D, Lin E, et al. Temporal Trends in the Prevalence and Incidence of Diagnosed ADHD in Children and Young Adults between 1999 and 2012 in Canada: A Data Linkage Study. Can J Psychiatry. 2017;62: 818–826. pmid:28616934
- 5. Shaw P, Stringaris A, Nigg J, Leibenluft E. Emotion dysregulation in attention deficit hyperactivity disorder. American Journal of Psychiatry. American Psychiatric Association; 2014. pp. 276–293. pmid:24480998
- 6. Pauli-Pott U, Becker K. Neuropsychological basic deficits in preschoolers at risk for ADHD: A meta-analysis. Clinical Psychology Review. Pergamon; 2011. pp. 626–637. pmid:21482321
- 7. Thomaidis L, Choleva A, Janikian M, Bertou G, Tsitsika A, Giannakopoulos G, et al. Attention Deficit/Hyperactivity Disorder (ADHD) symptoms and cognitive skills of preschool children. Psychiatriki. 2017;28: 28–36. pmid:28541236
- 8. DuPaul GJ, Stoner G. ADHD in the schools: Assessment and intervention strategies (3rd ed.). New York, NY: Guilford Press; 2014.
- 9. Meinzer MC, LeMoine KA, Howard AL, Stehli A, Arnold LE, Hechtman L, et al. Childhood ADHD and Involvement in Early Pregnancy: Mechanisms of Risk. J Atten Disord. 2020;24: 1955–1965. pmid:28938857
- 10. Baggio S, Fructuoso A, Guimaraes M, Fois E, Golay D, Heller P, et al. Prevalence of attention deficit hyperactivity disorder in detention settings: A systematic review and meta-analysis. Front Psychiatry. 2018. pmid:30116206
- 11. Sebastian A, Retz W, Tüscher O, Turner D. Violent offending in borderline personality disorder and attention deficit/hyperactivity disorder. Neuropharmacology. 2019. pmid:30844407
- 12. Diamond A, Lee K. Interventions shown to aid executive function development in children 4 to 12 years old. Science. 2011. pp. 959–964. pmid:21852486
- 13. Rimestad ML, Lambek R, Zacher Christiansen H, Hougaard E. Short- and Long-Term Effects of Parent Training for Preschool Children With or at Risk of ADHD: A Systematic Review and Meta-Analysis. Journal of Attention Disorders. SAGE Publications Inc.; 2019. pp. 423–434. pmid:27179355
- 14. DuPaul GJ, Kern L, Belk G, Custer B, Hatfield A, Daffner M, et al. Promoting Parent Engagement in Behavioral Intervention for Young Children With ADHD: Iterative Treatment Development. Topics Early Child Spec Educ. 2018;38: 42–53.
- 15. Lavigne JV, LeBailly SA, Hopkins J, Gouze KR, Binns HJ. The prevalence of ADHD, ODD, depression, and anxiety in a community sample of 4-year-olds. J Clin Child Adolesc Psychol. 2009;38: 315–328. pmid:19437293
- 16. Danielson ML, Bitsko RH, Ghandour RM, Holbrook JR, Kogan MD, Blumberg SJ. Prevalence of Parent-Reported ADHD Diagnosis and Associated Treatment Among U.S. Children and Adolescents, 2016. J Clin Child Adolesc Psychol. 2018;47: 199–212. pmid:29363986
- 17. Oliva F, Malandrone F, Mirabella S, Ferreri P, Girolamo G, Maina G. Diagnostic delay in ADHD: Duration of untreated illness and its socio-demographic and clinical predictors in a sample of adult outpatients. Early Interv Psychiatry. 2020; eip.13041. pmid:32945134
- 18. Sayal K, Prasad V, Daley D, Ford T, Coghill D. ADHD in children and young people: prevalence, care pathways, and service provision. The Lancet Psychiatry. 2018. pmid:29033005
- 19. Gruschow SM, Yerys BE, Power TJ, Durbin DR, Curry AE. Validation of the Use of Electronic Health Records for Classification of ADHD Status. J Atten Disord. 2019;23. pmid:28112025
- 20. Daley MF, Newton DA, DeBar L, Newcomer SR, Pieper L, Boscarino JA, et al. Accuracy of Electronic Health Record–Derived Data for the Identification of Incident ADHD. J Atten Disord. 2017;21. pmid:24510475
- 21. Morkem R, Handelman K, Queenan JA, Birtwhistle R, Barber D. Validation of an EMR algorithm to measure the prevalence of ADHD in the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). BMC Med Inform Decis Mak. 2020;20. pmid:32690025
- 22. Mohr-Jensen C, Vinkel Koch S, Briciet Lauritsen M, Steinhausen HC. The validity and reliability of the diagnosis of hyperkinetic disorders in the Danish Psychiatric Central Research Registry. Eur Psychiatry. 2016;35: 16–24. pmid:27061373
- 23. Janus M, Offord DR. Development and psychometric properties of the Early Development Instrument (EDI): A measure of children’s school readiness. Can J Behav Sci. 2007;39: 1–22.
- 24. Janus M, Brinkman SA, Duku EK. Validity and Psychometric Properties of the Early Development Instrument in Canada, Australia, United States, and Jamaica. Soc Indic Res. 2011;103: 283–297.
- 25. Janus M, Reid-Westoby C. Monitoring the development of all children: the Early Development Instrument. Early Childhood Matters. 2016.
- 26. Janus M, Walsh C, Duku E. Early Development Instrument: Factor structure, Sub-domains and Multiple Challenge Index. Hamilton, ON; 2005.
- 27. Janus M. Estimating prevalence of behaviour problems in kindergarten children based on population-level data. XIV European Conference on Developmental Psychology. Bologna, Italy: Medimond International Proceedings; 2010. pp. 193–198.
- 28. Granziera H, Collie RJ, Martin AJ, Nassar N. Behavioral self-regulation among children with hyperactivity and inattention in the first year of school: A population-based latent profile analysis and links with later ADHD diagnosis. J Educ Psychol. 2021 [cited 20 Jul 2022].
- 29. Liu YS, Kiyang L, Hayward J, Zhang Y, Metes D, Wang M, et al. Individualized Prospective Prediction of Opioid Use Disorder. Can J Psychiatry. 2022; 07067437221114094. pmid:35892186
- 30. Østergaard SD, Larsen JT, Dalsgaard S, Wilens TE, Mortensen PB, Agerbo E, et al. Predicting ADHD by Assessment of Rutter’s Indicators of Adversity in Infancy. Hay PJ, editor. PLoS One. 2016;11: e0157352. pmid:27355346
- 31. Liu X, Dalsgaard S, Munk-Olsen T, Li J, Wright RJ, Momen NC. Parental asthma occurrence, exacerbations and risk of attention-deficit/hyperactivity disorder. Brain Behav Immun. 2019;82: 302–308. pmid:31476415
- 32. Lavebratt C, Yang LL, Giacobini M, Forsell Y, Schalling M, Partonen T, et al. Early exposure to antibiotic drugs and risk for psychiatric disorders: a population-based study. Transl Psychiatry. 2019;9: 317. pmid:31772217
- 33. Engelhard MM, Berchuck SI, Garg J, Henao R, Olson A, Rusincovitch S, et al. Health system utilization before age 1 among children later diagnosed with autism or ADHD. Sci Rep. 2020;10: 17677. pmid:33077796
- 34. Hall HA, Speyer LG, Murray AL, Auyeung B. Prenatal maternal infections and children’s socioemotional development: findings from the UK Millennium Cohort Study. Eur Child Adolesc Psychiatry. 2020;1: 3. pmid:32949288
- 35. Wüstner A, Otto C, Schlack R, Hölling H, Klasen F, Ravens-Sieberer U. Risk and protective factors for the development of ADHD symptoms in children and adolescents: Results of the longitudinal BELLA study. PLoS One. 2019;14. pmid:30908550
- 36. Bzdok D, Altman N, Krzywinski M. Points of Significance: Statistics versus machine learning. Nature Methods. 2018. pmid:30100822
- 37. Song Y, Qian L, Sui J, Greiner R, Li X min, Greenshaw AJ, et al. Prediction of depression onset risk among middle-aged and elderly adults using machine learning and Canadian Longitudinal Study on Aging cohort. J Affect Disord. 2023;339. pmid:37380110
- 38. Su D, Zhang X, He K, Chen Y. Use of machine learning approach to predict depression in the elderly in China: A longitudinal study. J Affect Disord. 2021;282. pmid:33418381
- 39. Librenza-Garcia D, Passos IC, Feiten JG, Lotufo PA, Goulart AC, De Souza Santos I, et al. Prediction of depression cases, incidence, and chronicity in a large occupational cohort using machine learning techniques: An analysis of the ELSA-Brasil study. Psychol Med. 2021;51. pmid:32493535
- 40. Ellis RJ, Wang Z, Genes N, Ma’Ayan A. Predicting opioid dependence from electronic health records with machine learning. BioData Min. 2019. pmid:30728857
- 41. Lo-Ciganic W-H, Donohue JM, Yang Q, Huang JL, Chang C-Y, Weiss JC, et al. Developing and validating a machine-learning algorithm to predict opioid overdose in Medicaid beneficiaries in two US states: a prognostic modelling study. Lancet Digit Heal. 2022;4: e455–e465. pmid:35623798
- 42. Schultebraucks K, Shalev AY, Michopoulos V, Grudzen CR, Shin SM, Stevens JS, et al. A validated predictive algorithm of post-traumatic stress course following emergency department admission after a traumatic stressor. Nat Med. 2020;26. pmid:32632194
- 43. Cao M, Martin E, Li X. Machine learning in attention-deficit/hyperactivity disorder: new approaches toward understanding the neural mechanisms. Transl Psychiatry. 2023;13: 236. pmid:37391419
- 44. Goh PK, Elkins AR, Bansal PS, Eng AG, Martel MM. Data-Driven Methods for Predicting ADHD Diagnosis and Related Impairment: The Potential of a Machine Learning Approach. Res Child Adolesc Psychopathol. 2023;51: 679–691. pmid:36656406
- 45. Ter-Minassian L, Viani N, Wickersham A, Cross L, Stewart R, Velupillai S, et al. Assessing machine learning for fair prediction of ADHD in school pupils using a retrospective cohort study of linked education and healthcare data. BMJ Open. 2022;12: e058058. pmid:36576182
- 46. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. J R Stat Soc Ser B. 1996;58: 267–288.
- 47. Cessie S Le, Houwelingen JC Van. Ridge Estimators in Logistic Regression. Appl Stat. 1992.
- 48. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat. 2001;29: 1189–1232.
- 49. Ho TK. Random decision forests. Proc Int Conf Doc Anal Recognition, ICDAR. 1995;1: 278–282.
- 50. Hastie T, Tibshirani R, Friedman J. Springer series in statistics: The elements of statistical learning: Data mining, inference and prediction. EAS Publications Series. 2009.
- 51. Tachimori H, Osada H, Kurita H. Childhood Autism Rating Scale—Tokyo Version for screening pervasive developmental disorders. Psychiatry Clin Neurosci. 2003;57: 113–118. pmid:12519463
- 52. Chang LY, Wang MY, Tsai PS. Diagnostic accuracy of Rating Scales for attention-deficit/hyperactivity disorder: A meta-analysis. Pediatrics. 2016;137. pmid:26928969
- 53. Sharma C. The effect of bilingualism on Attention Deficit Hyperactivity Disorder (ADHD)-related behaviour, ADHD symptoms, and executive functions in a general primary school. 2020 [cited 16 Dec 2021]. Available: https://aspace.repository.cam.ac.uk/handle/1810/305929
- 54. Storebø OJ, Andersen ME, Skoog M, Hansen SJ, Simonsen E, Pedersen N, et al. Social skills training for attention deficit hyperactivity disorder (ADHD) in children aged 5 to 18 years. Cochrane Database Syst Rev. 2019;2019. pmid:31222721
- 55. Özcan NK, Boyacıoğlu NE, Dikeç G, Dinç H, Enginkaya S, Tomruk N. Prenatal and Postnatal Attachment Among Turkish Mothers Diagnosed with a Mental Health Disorder. 2018;39: 795–801. pmid:30111211
- 56. Darling Rasmussen P, Bilenberg N, Shmueli-Goetz Y, Simonsen E, Bojesen AB, Storebø OJ. Attachment Representations in Mothers and Their Children Diagnosed with ADHD: Distribution, Transmission and Impact on Treatment Outcome. J Child Fam Stud. 2019;28.
- 57. Saunders NR, Janus M, Porter J, Lu H, Gaskin A, Kalappa G, et al. Use of administrative record linkage to measure medical and social risk factors for early developmental vulnerability in Ontario, Canada. Int J Popul Data Sci. 2021;6. pmid:34007902
- 58. Hauck TS, Lau C, Wing LLF, Kurdyak P, Tu K. ADHD Treatment in Primary Care: Demographic Factors, Medication Trends, and Treatment Predictors. Can J Psychiatry. 2017;62. pmid:28103079
- 59. Song Y, Liu YS, Metes D, Wang M, Cao B. dummyML: Automated tabular data analysis pipelines for non-experts (Preprint). JMIR Prepr. 2024.