Abstract
Timely interventions have a proven benefit for people experiencing psychotic illness. One bottleneck to accessing timely interventions is the referral process to the specialist team for early psychosis (STEP). Many general practitioners lack awareness of, or confidence in, recognising psychotic symptoms or states. Additionally, referrals for people without apparent psychotic symptoms, although beneficial at a population level, lead to excessive workload for STEPs. There is a clear unmet need for accurate stratification of STEPs users and healthy cohorts. Here we propose a new approach to addressing this need via the application of digital behavioural tests. To demonstrate that digital behavioural tests can be used to discriminate between STEPs users (SU; n = 32) and controls (n = 32, age and sex matched), we compared the performance of five different classifiers applied to objective, quantitative and interpretable features derived from the ‘mirror game’ (MG) and trail making task (TMT). The MG is a movement coordination task shown to be a potential socio-motor biomarker of schizophrenia, while the TMT is a neuropsychiatric test of cognitive function. All classifiers had AUC in the range 0.84–0.92. The best of the five classifiers (linear discriminant classifier) achieved outstanding performance: AUC = 0.92 (95%CI 0.75–1), Sensitivity = 0.75 (95%CI 0.5–1), Specificity = 1 (95%CI 0.75–1), evaluated on a 25% hold-out over 1000 folds. The performance of all analysed classifiers is underpinned by the large effect sizes of the differences between the cohorts in terms of the features used for classification, which supports the generalisability of the results. We also found that the MG and TMT are, in isolation, insufficient to differentiate between SU with and without at-risk mental state or first episode psychosis. Our findings show that standardised batteries of digital behavioural tests could benefit both clinical and research practice. Including digital behavioural tests in healthcare practice could allow precise phenotyping and stratification of the highly heterogeneous population of people referred to STEPs, resulting in quicker and more personalised diagnosis. Moreover, the high specificity of digital behavioural tests could facilitate the identification of more homogeneous clinical high-risk populations, benefiting research on prognostic instruments for psychosis. In summary, our study demonstrates that cheap off-the-shelf equipment (a laptop computer and a leap motion sensor) can be used to record clinically relevant behavioural data that could be utilised in digital mental health applications.
Author summary
Neuropsychiatric assessment and accurate diagnosis are notoriously challenging. Psychosis represents a classical example of this challenge, where many individuals at risk of psychotic illness (often very young) are misdiagnosed and/or inappropriately treated clinically. Our study demonstrates that combining digital tests with data analytics has potential for simplifying neuropsychiatric assessment. It shows that measurements from the trail making task and the mirror game can be used to differentiate between people accepted for assessment by a specialist team for early psychosis (STEP) and controls with outstanding performance (AUROC > 0.9), while achieving 100% specificity (no false positive detections). The study shows the feasibility of using cheap, portable equipment, assembled from off-the-shelf components, to collect clinically relevant data that could be used to inform clinical decision making. Moreover, our study, with its state-of-the-art performance and interpretable results, demonstrates the high potential of implementing digital batteries of behavioural tests in clinical practice. Such developments would not only help to stratify STEPs users but would also facilitate rapid assessment for all people seeking care in early intervention services. This in turn would contribute to improving the quality of life and wellbeing of all help-seeking individuals.
Citation: Słowiński P, White A, Lison S, Sullivan S, Emmens T, Self P, et al. (2023) The potential of digital behavioural tests as a diagnostic aid for psychosis. PLOS Digit Health 2(9): e0000339. https://doi.org/10.1371/journal.pdig.0000339
Editor: Laura M. König, Universität Wien, AUSTRIA
Received: March 6, 2023; Accepted: July 29, 2023; Published: September 15, 2023
Copyright: © 2023 Słowiński et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The fully anonymised research data supporting this publication are openly available from: https://doi.org/10.17605/OSF.IO/RNZYS.
Funding: The research was supported by an EPSRC Impact Acceleration Account, an Impact & Knowledge Exchange Award, Jean Golding Institute seed corn funding, and Avon & Wiltshire Mental Health Partnership NHS Trust Research Capability Funding. PS was generously supported by the Wellcome Trust Institutional Strategic Support Award 204909/Z/16/Z. KTA gratefully acknowledges the financial support of the EPSRC via grant EP/T017856/1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Psychosis is a severe mental illness characterised by loss of contact with reality and symptoms such as hallucinations, delusions and thought disorders. It can be one of the first symptoms of a range of serious and long-term mental disorders such as schizophrenia, affective and other psychoses. Developing psychosis in young adulthood is devastating and often disrupts the trajectory into healthy and independent adulthood; the mean age of onset is 22 years for men and 26 years for women [1]. People with serious mental illness die 15 to 20 years earlier than the general population [2].
Serious mental disorders are extremely expensive to treat, with the presence of psychotic or affective symptoms being one of the patient characteristics driving increases in hospital costs [3]. Their direct healthcare cost to NHS England was estimated at £2.82 billion annually in 2019 [3], while the most recent estimate puts the overall annual economic impact of schizophrenia and psychosis in England at £11.8 billion [4]. The higher overall economic impact includes reduced labour supply, premature mortality, reduced health-related quality of life, lost output, lost tax revenue, transfer payments, and unpaid care by family or friends.
Most risk factors for a poor outcome, such as gender or low socio-economic status, are difficult or impossible to alter. However, people with psychosis have better outcomes if they are treated as soon as possible after their first symptoms [5]. Early interventions can reduce the rate of relapse, the risk of suicide, and the number of hospital admissions [1,6]. They also significantly improve quality of life by enabling people to finish education and develop supportive networks outside the family of origin [1].
Early interventions are typically delivered by a specialist team for early psychosis (STEP) [7–12]. However, the referral process to STEPs is far from optimal. Although most STEP referrals are from primary care, many general practitioners lack awareness of high-risk symptoms or are not confident in recognising psychotic states, both of which can lead to people not receiving the care they need [13,14]. On the other hand, although increasing the number of referrals has been shown to be beneficial at the population level, it also increases STEPs' workload, thus contributing to the pressure on the care system. The extra work is caused by the higher number of assessments requested as well as the need for increased engagement (including dedicated liaison practitioners) with primary care providers to identify and refer people experiencing, or at risk of, psychotic illness [9,15].
Here, we investigate whether digital behavioural tests can be used as an effective tool to differentiate between people referred to STEPs and the general population, and whether they show potential to facilitate and standardise the referral process. Specifically, we use data from a digital version of the trail making task (TMT), a standard method for the assessment of cognitive function [16,17], and the mirror game (MG), a novel way of assessing socio-motor functioning (motor coordination and interpersonal synchronisation) [18,19]. Our choice of tasks is based on a significant body of research showing that assessment of movement, behaviour and cognitive function allows accurate differentiation between people with schizophrenia and the general population [18,20–31]. In particular, motor and executive functions [18,30] as well as eye movements [24,27] have been shown to hold promising diagnostic potential. In addition, deficits in motor coordination were recently shown to be markers of long-term clinical outcomes [31], while performance in the TMT was shown to differ between people at clinical high-risk for psychosis who transitioned to psychosis and those who did not [32]. Moreover, this existing body of research supports the generalisability of the presented results from sample to population. In addition, our study demonstrates the feasibility of using cheap off-the-shelf components (a laptop computer with a plug-in sensor) to simplify neuropsychiatric assessment and introduce standardised digital tests to clinical practice [20].
Methods
Study design and participants
The study was designed as a prospective, cross-sectional feasibility study in a group of service users accepted for an assessment for psychosis, including people with first episode psychosis or assessed as being at risk of developing psychosis. The control cohort was recruited independently at the University of Exeter (UoE). Demographic and clinical characteristics of the participants can be found in Table 1.
Service users (SU) were identified and recruited by Devon Partnership NHS Trust (DPT) and Avon and Wiltshire Mental Health Partnership NHS Trust (AWP). In total we recruited 32 participants, all of whom were included in the analysis (we do not have data on the number of screened participants; none of the participants dropped out). The inclusion criteria were being accepted for an assessment for psychosis or risk of developing psychosis by a consultant psychiatrist or a trained specialist with experience in at-risk mental states. The exclusion criteria were:
- Lacking capacity to provide informed consent for inclusion. The clinical team had an opportunity to assess mental capacity at the CAARMS (Comprehensive Assessment of At-Risk Mental States [33]) appointment before potential participants were approached to request consent to contact.
- Insufficient understanding of English to follow the test instructions.
- Any suspected organic cause of psychosis (e.g., head injury, epilepsy or dementia).
- Taking antipsychotic medication for longer than 4 months before the start of the study.
Each participant was offered £10 for participating and, if necessary, reimbursement of reasonable travel expenses (after producing a receipt). SU were recruited between 19/07/2018 and 23/05/2019. The study was reviewed by a Research Ethics Committee (REC) and received approval from the Health Research Authority (HRA) and Health and Care Research Wales (HCRW); IRAS (Integrated Research Application System) ID: 236262, REC reference: 18/SW/0065, protocol number 1718/26.
The control cohort (CC) was identified at UoE. In total we recruited 86 eligible participants, of whom 43 played the same version of the MG as the SU and were used to identify the n = 32 CC participants matching the SU by age and gender as closely as possible. Participants were volunteers recruited by personally approaching potential participants, putting up posters around the UoE campus and at Exeter’s community centres, social media adverts, and snowball sampling. The exclusion criteria were:
- Moderate, or more severe, symptoms of depression, assessed by means of the Patient Health Questionnaire-9 (PHQ-9) [34]. For ethical reasons we excluded the question concerning thoughts of suicide and self-harm. All participants scoring above 9, indicating at least moderate levels of depression, were signposted to several sources of support.
- A diagnosis of depression, an anxiety disorder or schizophrenia.
- Taking any psychopharmacological medication. Participants who indicated that they were having difficulties with mental health were directed to the UoE wellbeing centre.
- Suffering from seizures.
- English not being one of their first languages. This criterion was introduced to try to minimise the chances of misinterpretations due to the extensive use of questionnaires in the study.
The CC was additionally screened using the Community Assessment of Psychic Experiences-42 (CAPE-42) [35]; question 14, which asks about suicidal ideation and loads onto the depressive subscale, was excluded for ethical reasons. Each participant was offered £5 or one course credit for participating. The CC was recruited by AW between 25/05/2018 and 26/11/2018 as part of his Master’s degree project. The recruitment of the CC was approved by the University of Exeter, College of Life and Environmental Sciences (CLES), Psychology Ethics Committee, eCLESPsy000568 v2.1.
Participant flow chart is presented in Fig 1. All participants gave written informed consent prior to the study.
Further information about the study can be found at https://www.hra.nhs.uk/planning-and-improving-research/application-summaries/research-summaries/movement-and-perspective-taking-as-a-diagnostic-aid-for-psychosis/. Full study protocol can be accessed at http://hdl.handle.net/10871/132205.
Mirror game
The mirror game (MG) used in this study was based on the algorithm described by Zhai and colleagues [36] and closely followed our earlier work on establishing socio-motor markers of schizophrenia [18]. The MG is a movement task that can be used to assess socio-motor functioning (motor coordination and interpersonal synchronisation) [18,19]. We used two MG tasks. The first task was a Solo game, in which participants were asked to move their hand freely in the horizontal direction. Participants were given the following instruction: “Please move your hand left and right, create an interesting motion and enjoy playing.” The second task was a Leader-Follower game. In the second task, an animated image of a robot appeared on the screen. The animation showed the robot controlling its own dot. The dot moved horizontally according to a pre-generated movement pattern. Participants were tasked with following the dot’s movement as closely as possible whilst it was on screen. Participants were given the following instruction: “Please try to follow the movement of an animated robot as accurately as you can.” During the Leader-Follower game the robot also presented parametric positive social feedback (smiling) as described by Cohen and colleagues [37].
The two tasks were grouped into one session. The session consisted of a Solo game, three repetitions of the Leader-Follower game and another Solo game. Each game lasted one minute. The session was repeated three times. Participants were free to take breaks between the games and sessions. Each Leader-Follower game used a different pre-generated movement pattern. The patterns were the same for each participant. We excluded the 1st Solo game and the 1st Leader-Follower game to allow participants to get familiar with the task. The SU participants sat in front of a 17” diagonal laptop computer (1100x680 pixels screen resolution), while the CC used a 23” diagonal computer monitor; the image displayed on the computer monitor was scaled down to use the central 17” diagonal part of the monitor and have the same resolution as the laptop display. Movement of the hand was recorded using a leap motion sensor [38] and displayed as a dot on the screen. Participants used their dominant hand to control the horizontal position of the dot on the screen. The computer set-ups differed between the two groups due to the need to collect data simultaneously in multiple locations and additional research goals for the experiments with the CC that are not part of the presented analysis.
In our analysis we used the recorded position of the participant’s hand (Solo and Leader-Follower) and the trajectory of the movement generated by the computer (Leader-Follower). The recorded position data is in arbitrary units in the range [-0.5, 0.5], with a variable sampling rate of 90–140 Hz (Solo) and 40–70 Hz (Leader-Follower). Pre-processing included (see the sketch after this list):
- resampling to 100Hz with linear interpolation,
- low pass filtering with 5 Hz cut-off done using phase preserving Butterworth filter of degree 2,
- omitting the first and last 5s of the recording,
- estimation of movement velocity, using a fourth-order finite difference scheme.
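For illustration, a minimal Matlab sketch of this pre-processing pipeline is given below. The variable names (raw.time, raw.position) and the specific interpolation/filter calls are our own illustrative choices under the stated assumptions, not the study's code; the finite-difference formula is the standard fourth-order central scheme and is shown for interior points only.

```matlab
% Minimal sketch of the MG pre-processing pipeline described above.
% raw.time and raw.position are illustrative placeholders for one recording.
fs   = 100;                                        % target sampling rate [Hz]
tNew = (raw.time(1):1/fs:raw.time(end))';          % uniform time grid
x    = interp1(raw.time, raw.position, tNew, 'linear');   % resample to 100 Hz

[b, a] = butter(2, 5/(fs/2), 'low');               % Butterworth filter of degree 2, 5 Hz cut-off
x = filtfilt(b, a, x);                             % zero-phase (phase-preserving) filtering

x = x(5*fs+1:end-5*fs);                            % omit the first and last 5 s

% Movement velocity via a fourth-order central finite difference (interior points)
h = 1/fs;
v = (-x(5:end) + 8*x(4:end-1) - 8*x(2:end-3) + x(1:end-4)) / (12*h);
```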
Trail-making task
The trail-making task (TMT) [16,39] is a valid, public-domain test of visual attention, working memory and executive control [40]. It has two parts, which were alternated. In each part, participants must click on 25 dots in a specified order as quickly and accurately as possible. The visual attention part (TMT A) had participants click numbers in ascending order, 1–25. The executive control part (TMT B) had participants alternate between clicking on numbers and letters, both in ascending order (1-A-2-B-3-C etc.). Participants completed each part three times, alternating between TMT A and B, starting with TMT A. Only the last two repetitions were included in the analysis. We excluded the 1st repetition to allow participants to get familiar with the task. We used a digital version of the task implemented in PEBL: The Psychology Experiment Building Language [41]. In the original study protocol the TMT was used as a non-diagnostic attention-measuring task. It was retrospectively included in the analysis after a literature review [37,42–44] and data analysis indicated that including participants’ performance in this task could be beneficial for differentiating between the SU and CC.
For the analysis we used the times between each individual mouse click made by the participant, as well as the times between mouse clicks made on the correct targets.
Testing procedure
The stages of the research session are presented in Table 2. Both tasks and examples of collected data are shown in Fig 2.
In the MG, the participant sat in front of a computer with a connected leap motion sensor. In the Solo game (first row) the participant was instructed: “Please move your hand left and right, create an interesting motion and enjoy playing.” We recorded the horizontal hand movement (blue). In the Leader-Follower game (second row) the participant was instructed: “Please try to follow the movement of an animated robot as accurately as you can.” We recorded the movement generated by the computer (leader (L), green) and the movement of the participant (follower (F), blue). In the TMT, the participant sat in front of the same laptop computer but used a computer mouse to complete the task. In the TMT (third row) the participant was asked to connect a set of 25 dots as quickly and accurately as possible (in the order given by numbers (Part A) or alternating numbers and letters (Part B)). We recorded the time between each click a person made on the screen and analysed both parts together (Part A, dots; Part B, crosses). For the sake of clarity, we show a simplified illustration with 9 dots instead of 25. Part of the MG illustration is based on the same source file as Fig 1 in [18], which is distributed under the terms of the CC BY 4.0 license.
Sample size
For the feasibility study we recruited as many eligible SU as possible for the duration of the study. We approached all eligible participants who had been identified as appropriate for assessment for risk of psychosis by the specialist teams for early psychosis. Convenience sampling allowed us to proceed with the study as quickly as possible and to assess feasible sample sizes for future research. The sample size of the CC was driven by the research objectives of AW’s Master’s degree project.
Features extracted from data
The selection of features for classification was informed by our earlier work [18], and modified to better fit the machine learning methodology employed in the current study. Instead of using distributions (histograms) as in the previous work, here we use a set of their descriptive statistics (e.g., mean, standard deviation or median) or point measures (e.g., power at the 5 Hz frequency). As previously, the data were concatenated or averaged across the repetitions. Before averaging or concatenating the movement data, we removed parts where participants’ movements reached the edges of the sensor range (-0.5 or 0.5 value). The complete list of the features, and their description, is presented in Table 3.
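As an illustration of this step, the sketch below computes a few descriptive-statistic and point-measure features of the kind listed in Table 3 from a pre-processed position x and velocity v (as in the pre-processing sketch above); the field names are placeholders and the exact feature definitions follow Table 3 rather than this code.

```matlab
% Illustrative examples of descriptive-statistic and point-measure features
% (placeholder names; the full feature set is defined in Table 3).
fs = 100;                                          % sampling rate after resampling
feat.meanSpeed   = mean(abs(v));                   % mean of the speed distribution
feat.stdSpeed    = std(abs(v));                    % its standard deviation
feat.medianSpeed = median(abs(v));                 % its median
[pxx, f] = pwelch(x - mean(x), [], [], [], fs);    % power spectral density of the position signal
[~, i5]  = min(abs(f - 5));                        % closest frequency bin to 5 Hz
feat.powerAt5Hz  = pxx(i5);                        % point measure: power at 5 Hz
```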
Results
Classes
We used the group (CC or SU) as the primary classification outcome (predicted variable) in a binary classification of the full dataset. Additionally, we used the CAARMS score (CAARMS > 0, at-risk mental state or psychotic; CAARMS = 0, neither) as the classification outcome of an independent binary classification within the SU group. CAARMS is one of the gold standards in assessing service users at risk of, or with a first episode of, psychosis. CAARMS was completed by a consultant psychiatrist or a trained specialist of the STEPs with experience in at-risk mental states before completion of the MG and TMT tasks.
Classification methods
We compared the following classifiers: k-nearest neighbours (kNN), naïve Bayes (NB), support vector machines (SVM), bagged trees (BT), and linear discriminant (LD) [48]. To find the k-nearest neighbours for the kNN classifier we used the cosine distance to measure distances between points in the n-dimensional feature space (each coordinate of the feature space corresponds to a single z-scored feature). The cosine distance is defined as 1 − cos(θ), where θ is the angle, measured at the origin of the coordinate system, between the two vectors whose coordinates are given by the sets of features. We used the NB classifier as implemented in the Matlab 2022b function fitcnb [49], the SVM classifier as implemented in the Matlab 2022b function fitcsvm [50], the BT classifier as implemented in the Matlab 2022b function TreeBagger [51] and the LD classifier as implemented in the Matlab 2022b function fitcdiscr [52]. All classifiers were trained using default Matlab 2022b settings.
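A minimal sketch of fitting these classifiers with default Matlab 2022b settings is shown below. X (an n-by-3 matrix of z-scored features), y (a cell array of 'CC'/'SU' labels), Xtest and the number of trees passed to TreeBagger are illustrative placeholders rather than the study's own code.

```matlab
% Minimal sketch of the five classifiers (default Matlab 2022b settings).
X = zscore(X);                                     % z-score each feature
mdlKNN = fitcknn(X, y, 'Distance', 'cosine');      % kNN with cosine distance, 1 - cos(theta)
mdlNB  = fitcnb(X, y);                             % naive Bayes
mdlSVM = fitcsvm(X, y);                            % support vector machine
mdlBT  = TreeBagger(50, X, y, 'Method', 'classification');   % bagged trees (50 trees assumed)
mdlLD  = fitcdiscr(X, y);                          % linear discriminant
yHat   = predict(mdlLD, Xtest);                    % class predictions for held-out data Xtest
```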
To avoid overfitting, we used only three of the 18 features (Table 3): one of the 6 features estimated from the TMT data, one of the 4 features estimated from the MG Solo task and one of the 8 features estimated from the MG Leader-Follower task. To select the features, we used the value of Cliff’s delta [53], a non-parametric measure of effect size. The set of classification features was based on the results from the training phase, meaning that it was selected separately for each training-testing split (fold).
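Cliff's delta for one feature and two cohorts can be computed as in the short helper below (an illustrative function, not the study's code): it is the difference between the proportion of pairs in which the first cohort's value exceeds the second cohort's value and the proportion in which it falls below it.

```matlab
% Illustrative helper: Cliff's delta between feature values a (cohort 1)
% and b (cohort 2); it ranges from -1 to 1, with 0 indicating full overlap.
function d = cliffsDelta(a, b)
    diffs = a(:) - b(:)';                          % all pairwise differences
    d = (sum(diffs(:) > 0) - sum(diffs(:) < 0)) / numel(diffs);
end
```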
Additionally, we analysed how performance of the kNN and LD classifiers depends on the number of features. We compared kNN with 3 (selected as described above) and all 18 features and LD with 2 (selected as described above but without MG Solo task), 3 (selected as described above) and all 18 features.
Training and testing
To evaluate the performance of the classifiers, we used two training-testing splits: a 25% hold-out (HO) training-testing split and a leave-one-out (L1O) training-testing split (corresponding to a 2% hold-out in our case). The parameters of the classifier and the three features used for classification were identified using only the training set. Hold-out data were used only for testing and were unseen by the classifier during training.
In the 25% HO split we selected at random 25% of the data (8 out of 32 participants in each cohort). We trained the classifier using the remaining 75% of the data (24 CC and 24 SU datasets). We used the 16 participants (8 CC and 8 SU datasets) unseen by the classifier during training to construct the confusion matrix and compute performance metrics. To estimate 95% confidence intervals of the classifier performance we repeated the 25% HO split 1000 times (1000 folds).
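A minimal sketch of this repeated hold-out evaluation for the LD classifier is given below; selectTopFeatures is an assumed helper that picks one feature per task by Cliff's delta on the training data only, and all variable names are illustrative.

```matlab
% Minimal sketch of the 25% hold-out evaluation repeated over 1000 folds.
nFolds = 1000;
auc = zeros(nFolds, 1);
for k = 1:nFolds
    cv = cvpartition(y, 'HoldOut', 0.25);              % stratified split: 8 CC + 8 SU held out
    idxTr = training(cv);  idxTe = test(cv);
    feat = selectTopFeatures(X(idxTr, :), y(idxTr));   % feature selection on the training set only
    mdl  = fitcdiscr(X(idxTr, feat), y(idxTr));        % train the linear discriminant
    [~, score] = predict(mdl, X(idxTe, feat));         % class scores for held-out participants
    posCol = strcmp(mdl.ClassNames, 'SU');             % column of the positive (SU) class
    [~, ~, ~, auc(k)] = perfcurve(y(idxTe), score(:, posCol), 'SU');
end
ci = prctile(auc, [2.5 97.5]);                         % 95% confidence interval across folds
```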
The leave-one-out (L1O) training-testing split used n−1 participants to train the classifier and 1 participant to test the model. The L1O training-testing split allowed us to test the methodology n times. The L1O split simulated a situation in which a new participant would be diagnosed using a classifier based on all the data available prior to the arrival of that participant. To construct the confusion matrix and compute performance metrics we compared the original classes with the set of individual predictions from each of the L1O splits, i.e., we compared the original class of the participant unseen by the classifier with the class predicted by the model trained using the other n−1 participants.
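The leave-one-out evaluation can be sketched analogously (selectTopFeatures as in the hold-out sketch above; illustrative code, not the study's own):

```matlab
% Minimal sketch of the leave-one-out evaluation: each participant is
% classified by a model trained on the remaining n-1 participants.
n = numel(y);
yPred = cell(n, 1);
for i = 1:n
    idxTr = true(n, 1);  idxTr(i) = false;             % leave participant i out
    feat  = selectTopFeatures(X(idxTr, :), y(idxTr));  % feature selection on the training set only
    mdl   = fitcdiscr(X(idxTr, feat), y(idxTr));
    yPred(i) = predict(mdl, X(i, feat));               % prediction for the held-out participant
end
confusionmat(y, yPred)                                 % compare predictions with original classes
```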
Classification results
All tested methods allowed classification of the CC and SU participants with excellent (0.8–0.9) [54] or outstanding (0.9–1) [54] accuracy, sensitivity, specificity and precision. The only exception was acceptable (0.7–0.8) [54] or poor (0.6–0.7) [54] sensitivity in a few cases (see Table 4).
Comparing how the classification results depend on the number of features, we found that the performance of the kNN classifier was comparable when using 3 or 18 features. The kNN classifier using three features had a higher AUC but lower specificity and precision than the kNN using all 18 features. The performance of the LD classifier showed a stronger dependence on the number of features: it performed best using three features and its performance decreased when using two or 18 features.
Furthermore, all the analysed methods failed to differentiate between SU with and without at-risk mental state (CAARMS score of 0 vs CAARMS score > 0); this was a binary classification within the SU cohort using the CAARMS score as the group label, CAARMS = 0 vs CAARMS > 0. Since there were only 16 participants with CAARMS = 0 and 16 participants with CAARMS > 0 we used only the L1O training-testing split. See Table 4 for details.
To better understand the difference in performance of the proposed methodology in the two cases (CC vs SU, and SU CAARMS = 0 vs SU CAARMS > 0), we compared the distributions of the features in the 3 groups. We did not find any statistically significant differences between the SU CAARMS = 0 and SU CAARMS > 0 groups. Interestingly, we observed that the values of most features in the SU CAARMS > 0 cohort differ more from the CC than the values in the SU CAARMS = 0 cohort do; see Fig 3 and Table 5. Moreover, we observed that the effect size (Cliff’s delta) is overall higher for the features from the MG task while remaining relatively unchanged for the TMT. The statistical significance of this pattern of change between the SU CAARMS = 0 and SU CAARMS > 0 groups was investigated using a bootstrap test with 10000 random splits of the SU cohort, estimating the probability of simultaneously observing a low change in effect sizes for the TMT (smaller than the median change of the 6 TMT summary statistics measures), a mixed change in effect sizes for the Solo MG task (larger than the median change of the 4 Solo MG summary statistics measures) and a large change in effect sizes for the Leader-Follower MG task (larger than the median change of the 8 Leader-Follower MG summary statistics measures). We found this pattern to be statistically significant with p < 0.0031.
Violin plots illustrate the distributions of the values; the white dot shows the median, the gray vertical bar shows the IQR (middle 50% of values), and the scatterplots within each violin plot show all individual values. In all plots the y-axis shows z-scored values in arbitrary units.
Discussion
We presented the results of a feasibility study in which we investigated the potential for employing digital behavioural tests in healthcare practice for the stratification of specialist team for early psychosis (STEP) users and healthy cohorts. Our analysis demonstrated that the two investigated behavioural tests (MG and TMT) can be used to differentiate between STEPs users and healthy cohorts with excellent accuracy (AUC > 0.84) using any of the five analysed classifiers and two different training-testing splits, 25% hold-out and leave-one-out. The excellent performance of the classifiers is driven by statistically significant and large differences (large effect sizes) in features between the cohorts. Finally, we showed that cheap off-the-shelf equipment (a laptop computer, £722.76, and a leap motion sensor, £84.24; prices as of mid-2018) can be used to record clinically relevant behavioural data and that digital behavioural tests hold the prospect of aiding clinical practice.
We also identified areas that require further research and development. We observed that the behavioural data from the MG and TMT collected in the current study cannot be used to differentiate between service users (SU) without (CAARMS = 0) and SU with at-risk mental state (CAARMS = 2) or first episode psychosis (CAARMS = 4). This result might partially reflect the limited specificity of the CAARMS assessment, given that only 15–22% of individuals with at-risk mental state develop a full psychotic disorder within 12 months [54–56]. Another possible limitation is the small number of participants available in our SU cohort. Nonetheless, the fact that the SU can be so accurately differentiated from the CC (large effect sizes for the differences between features in the two cohorts) confirms that the so-called ‘non-cases’ among STEPs referrals have a range of characteristic behavioural markers and constitute an important clinical cohort that differs from the control cohort [57–60]. Moreover, the Specificity = 1 achieved by 3 out of the 8 tested methods (kNN, SVM, LD3) means that these methods identify control participants without false positives. This is important as misclassification in terms of mental health state in young individuals could have equally serious consequences due to the stigma associated with a mental health diagnosis [61].
Furthermore, we showed (through analysis of the effect sizes) that SU with CAARMS > 0 differ more from the CC than SU with CAARMS = 0. This indicates the presence of differences between these two cohorts that could be uncovered by including additional tasks and additional data modalities, for example, recordings of hand movements during the TMT or recordings of eye movements during both tasks. Inclusion of eye-movement data could be particularly beneficial since it has been demonstrated to have diagnostic potential [24,27]. Additionally, using mechanistic (differential equation) models [62] to combine eye movements, reaction time and movement data could help to identify the cognitive strategies people employ, e.g., to complete neuropsychological tasks [63,64]. Identification of the cognitive strategies and understanding their causal mechanisms would elucidate the role of pathophysiology in perturbed information processing and allow the development of new methodologies for risk and treatment stratification.
Finally, longitudinal studies using digital behavioural tests would be instrumental for understanding how motor coordination and other neurological signs change (decline or improve) over the course of psychosis and why, as shown by Ferruccio and colleagues [31], they predict its long-term severity.
Features’ importance and classifiers’ interpretability
The overall excellent performance of the classifiers can be explained by the large effect sizes of the differences between the cohorts in terms of the features used for classification. The effect sizes (presented in Table 5, column ‘Effect size’) are directly related to the differences (distances) between feature values and affect the performance of all the considered classifiers. This conclusion is directly confirmed by the almost identical performance of the kNN classifier using 3 and 18 features, meaning that the 3 features with the highest effect size capture most of the information contained in the 18 features.
The best performance being achieved by the LD3 classifier demonstrates that the SU and CC cohorts are linearly separable, while the decrease in performance of the LD2 classifier shows that all 3 tasks are important for discriminating between the cohorts. Table 5, column ‘Effect size’, shows that features derived from the TMT data are the most important and features from the MG Solo task are the least important for classification. The worse performance of the LD with 18 features could be a result of theoretical (‘curse of dimensionality’) [65] or numerical limitations (e.g., separating points in higher dimensions might require the estimation of more complex decision boundaries).
Study limitations
There are two main limitations of the study. First, we did not control for the level of education in the two groups, and it is known that performance in the TMT is affected by years of education [17]. However, even using only the features from the MG allows classification of the CC and SU with AUC = 0.90 (95%CI 0.69–1), sensitivity = 0.75 (95%CI 0.5–1) and specificity = 0.88 (95%CI 0.62–1); kNN, 25% hold-out and 1000 folds. The second potential source of bias is the short exposure to antipsychotic medication (less than 4 months). We allowed 4 months of antipsychotic medication in order to facilitate recruitment of participants while minimising the potential for manifestation of motor side-effects associated with antipsychotic drugs [66]. We therefore anticipate the effect of medication status to be minimal and, additionally, confounded with the CAARMS score. In an earlier study we showed that the obtained classification results are independent of antipsychotic medication status [18]. Furthermore, a recent study showed that neurological signs (e.g., tests of coordination and balance) and their change over 10 years are likely unrelated to exposure to antipsychotic drugs [31].
Feasibility of real-world implementation
As a part of the study, we collected feedback from the participants and healthcare practitioners regarding the acceptability and ease of use of the tests. 65% of participants found the tests acceptable as part of a routine clinical assessment for risk of psychosis (answer >5 on a 10-point scale, where 1 is ‘not at all’ and 10 is ‘very much’, to the question: ‘How acceptable would you find the activity as a part of a routine clinical assessment for risk of psychosis?’), with only 5 out of 32 participants giving a score <5. 87.5% of participants replied that they would be comfortable taking such a test at home (answer >5 on a 10-point scale to the question: ‘Would you be comfortable taking such a test at home?’), with 17 out of 32 participants giving a score of 9 or 10 and only 2 participants giving a score <5 (all questions and SU answers are available at https://doi.org/10.17605/OSF.IO/RNZYS). The healthcare practitioners (doctors and research nurses) who collected the data from the service users provided oral feedback about the tasks and the data collection set-up; they praised its portability (the ability to run the tests at service users’ homes) and easy set-up (simply plugging the USB cables into a laptop computer).
We envisage that the most immediate potential future implementations of digital batteries of behavioural tests will take place within a healthcare setting (in a clinic or practitioner’s office). We foresee the tests forming part of a decision support tool and being employed like any other medical test. The data collected during the test would be treated, in terms of ethics and privacy, like any other patient data collected during a medical examination. We believe that augmenting neuropsychological evaluation by means of digital batteries of behavioural tests could save clinicians’ time for meaningful conversation with a help-seeking service user. We hope that the time saved by the tests, together with their demonstrated accuracy, will help to humanise and destigmatise mental health diagnosis. We are aware that future deployment of such technology would require careful consideration of privacy and related ethical issues, which is beyond the scope of this paper.
Implications for clinical and research practice
Our findings reinforce the benefits of digital behavioural tests and quantitative analysis of their results, and their potential to be used as a mobile assessment platform, accessible in home settings as well [67]. Cheap, portable off-the-shelf equipment allows the assessment to take place in a range of indoor locations, while automatic data collection greatly reduces the training required of clinical personnel.
Digital behavioural tests would benefit research on prognostic instruments for psychosis. A recent review [54] identified heterogeneity in recruitment strategies for high-risk services as one of the factors limiting the development of prognostic instruments for psychosis. Digital behavioural tests could alleviate this limitation by stratifying and enabling the identification of more homogeneous clinical high-risk populations.
Finally, with further development, standardised digital test batteries could supplement and augment neuropsychiatric/neurological tests, making them quicker and easier to apply in routine clinical practice. This would have wide-ranging implications for home health, care coordination and care referral. Therefore, future work should focus on identifying an optimal set of tests for establishing standardised digital batteries of behavioural tests and their optimal technological implementations. Such innovative and cost-effective testing methods have the potential to be extended beyond STEPs users’ stratification [68] and would facilitate rapid assessments (in clinic or at home) for all people referred to mental health early intervention services [69,70], improving their quality of life and wellbeing.
Acknowledgments
The research team would like to thank all the STEPs users and control participants who generously shared their time and took part in the project. This study would not be possible without them.
References
- 1. Yung AR. Chapter 3—At-risk mental states. In: Thompson AD, Broome MR, editors. Risk Factors for Psychosis. Academic Press; 2020. pp. 47–57.
- 2. de Mooij LD, Kikkert M, Theunissen J, Beekman AT, de Haan L, Duurkoop PW, et al. Dying Too Soon: Excess Mortality in Severe Mental Illness. Frontiers in Psychiatry 2019; 10: 855. pmid:31920734
- 3. Ride J, Kasteridis P, Gutacker N, Aragon Aragon MJ, Jacobs R. Healthcare Costs for People with Serious Mental Illness in England: An Analysis of Costs Across Primary Care, Hospital Care, and Specialist Mental Healthcare. Appl Health Econ Health Policy 2020; 18: 177–88. pmid:31701484
- 4. Andrew A, Knapp M, McCrone P, Parsonage M, Trachtenberg M. Effective interventions in schizophrenia: the economic case. 2012. Available from: http://www.schizophreniacommission.org.uk/the-report/.
- 5. Howes OD, Whitehurst T, Shatalina E, Townsend L, Onwordi EC, Mak TL, et al. The clinical significance of duration of untreated psychosis: an umbrella review and random-effects meta-analysis. World Psychiatry 2021; 20: 75–95. pmid:33432766
- 6. Polari A, Lavoie S, Yuen HP, Amminger P, Berger G, Chen E, et al. Clinical trajectories in the ultra-high risk for psychosis population. Schizophrenia Research 2018; 197: 550–6. pmid:29463457
- 7. Mortimer A, Brown T. Early intervention in psychosis: another triumph of hope over experience? Progress in Neurology and Psychiatry 2015; 19: 10–4.
- 8. O’Connell N, O’Connor K, McGrath D, Vagge L, Mockler D, Jennings R, et al. Early Intervention in Psychosis services: A systematic review and narrative synthesis of the barriers and facilitators to implementation. European Psychiatry 2022; 65: e2.
- 9. England NHS. Implementing the Early Intervention in Psychosis Access and Waiting Time Standard: Guidance. NHS England, London 2016. Available from: https://www.nice.org.uk/guidance/qs80/resources/implementing-the-early-intervention-in-psychosis-access-and-waiting-time-standard-guidance-2487749725.
- 10. Singh K, Ghazi F, White R, Sarfo-Adu B, Carter P. Improving access to Early Intervention in Psychosis (EIP): the 2-week wait for cancer comes to psychosis. BMJ Open Qual 2018; 7: e000190. pmid:30167471
- 11. O’Donoghue B, O’Connor K, Thompson A, McGorry P. The need for early intervention for psychosis to persist throughout the COVID-19 pandemic and beyond. Irish Journal of Psychological Medicine 2021; 38: 214–9. pmid:32434611
- 12. Yung AR, Wood SJ, Malla A, Nelson B, McGorry P, Shah J. The reality of at risk mental state services: a response to recent criticisms. Psychological Medicine 2021; 51: 212–8. pmid:31657288
- 13. Strelchuk D, Wiles N, Derrick C, Zammit S, Turner K. Identifying patients at risk of psychosis: a qualitative study of GP views in South West England. Br J Gen Pract 2021; 71: e113–20. pmid:33257466
- 14. Fusar-Poli P, Spencer T, De Micheli A, Curzi V, Nandha S, McGuire P. Outreach and support in South-London (OASIS) 2001–2020: Twenty years of early detection, prognosis and preventive care for young people at risk of psychosis. European Neuropsychopharmacology 2020; 39: 111–22. pmid:32921544
- 15. Perez J, Jin H, Russo DA, Stochl J, Painter M, Shelley G, et al. Clinical effectiveness and cost-effectiveness of tailored intensive liaison between primary and secondary care to identify individuals at risk of a first psychotic illness (the LEGs study): a cluster-randomised controlled trial. The Lancet Psychiatry 2015; 2: 984–93. pmid:26296562
- 16. Fellows RP, Dahmen J, Cook D, Schmitter-Edgecombe M. Multicomponent analysis of a digital Trail Making Test. The Clinical Neuropsychologist 2017; 31: 154–67. pmid:27690752
- 17. Park S-Y, Schott N. The trail-making-test: Comparison between paper-and-pencil and computerized versions in young and healthy older adults. Applied Neuropsychology: Adult 2022; 29: 1208–20. pmid:33397159
- 18. Słowiński P, Alderisio F, Zhai C, Shen Y, Tino P, Bortolon C, et al. Unravelling socio-motor biomarkers in schizophrenia. npj Schizophr 2017; 3: 8. pmid:28560254
- 19. Słowiński P, Zhai C, Alderisio F, Salesse R, Gueugnon M, Marin L, et al. Dynamic similarity promotes interpersonal coordination in joint action. Journal of The Royal Society Interface 2016; 13: 20151093. pmid:27009178
- 20. van Harten PN, Walther S, Kent JS, Sponheim SR, Mittal VA. The clinical and prognostic value of motor abnormalities in psychosis, and the importance of instrumental assessment. Neuroscience & Biobehavioral Reviews 2017; 80: 476–87.
- 21. Fradkin SI, Erickson MA, Demmin DL, Silverstein SM. Absence of Excess Intra-Individual Variability in Retinal Function in People With Schizophrenia. Frontiers in Psychiatry 2020; 11: 543963. pmid:33329084
- 22. Garvey MA, Cuthbert BN. Developing a Motor Systems Domain for the NIMH RDoC Program. Schizophrenia Bulletin 2017; 43: 935–6. pmid:28911051
- 23. Stephenson DD, Shaikh AAE, Shaff NA, et al. Differing functional mechanisms underlie cognitive control deficits in psychotic spectrum disorders. Journal of Psychiatry and Neuroscience 2020; 45: 430–40. pmid:32869961
- 24. Morita K, Miura K, Kasai K, Hashimoto R. Eye movement characteristics in schizophrenia: A recent update with clinical implications. Neuropsychopharmacology Reports 2020; 40: 2–9. pmid:31774633
- 25. Vinogradov S, Poole JH, Willis-Shore J, Ober BA, Shenaut GK. Slower and more variable reaction times in schizophrenia: what do they signify? Schizophrenia Research 1998; 32: 183–90. pmid:9720123
- 26. Athanasopoulos F, Saprikis O-V, Margeli M, Klein C, Smyrnis N. Towards Clinically Relevant Oculomotor Biomarkers in Early Schizophrenia. Frontiers in Behavioral Neuroscience 2021; 15: 688683. pmid:34177483
- 27. Wolf A, Ueda K, Hirano Y. Recent updates of eye movement abnormalities in patients with schizophrenia: A scoping review. Psychiatry and Clinical Neurosciences 2021; 75: 82–100. pmid:33314465
- 28. Lemvigh CK, Brouwer RM, Pantelis C, Jensen MH, Hilker RW, Legind CS, et al. Heritability of specific cognitive functions and associations with schizophrenia spectrum disorders using CANTAB: a nation-wide twin study. Psychological Medicine 2022; 52: 1101–14. pmid:32779562
- 29. Dean DJ, Scott J, Park S. Interpersonal Coordination in Schizophrenia: A Scoping Review of the Literature. Schizophrenia Bulletin 2021; 47: 1544–56. pmid:34132344
- 30. Bolt LK, Amminger GP, Farhall J, McGorry PD, Nelson B, Markulev C, et al. Neurocognition as a predictor of transition to psychotic disorder and functional outcomes in ultra-high risk participants: Findings from the NEURAPRO randomized clinical trial. Schizophrenia Research 2019; 206: 67–74. pmid:30558978
- 31. Ferruccio NP, Tosato S, Lappin JM, Heslin M, Donoghue K, Giordano A, et al. Neurological Signs at the First Psychotic Episode as Correlates of Long-Term Outcome: Results From the AESOP-10 Study. Schizophrenia Bulletin 2021; 47: 118–27. pmid:32656567
- 32. Hedges EP, See C, Si S, McGuire P, Dickson H, Kempton MJ. Meta-analysis of longitudinal neurocognitive performance in people at clinical high-risk for psychosis. Psychological Medicine 2022; 52: 2009–16. pmid:35821623
- 33. Yung AR, Yung AR, Pan Yuen H, Mcgorry PD, Phillips LJ, Kelly D, et al. Mapping the Onset of Psychosis: The Comprehensive Assessment of At-Risk Mental States. Aust N Z J Psychiatry 2005; 39: 964–71. pmid:16343296
- 34. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9. J GEN INTERN MED 2001; 16: 606–13.
- 35. Stefanis NC, Hanssen M, Smirnis NK, Avramopoulos DA, Evdokimidis IK, Stefanis CN, et al. Evidence that three dimensions of psychosis have a distribution in the general population. Psychological Medicine 2002; 32: 347–58. pmid:11866327
- 36. Zhai C, Alderisio F, Słowiński P, Tsaneva-Atanasova K, Bernardo M di. Design of a Virtual Player for Joint Improvisation with Humans in the Mirror Game. PLOS ONE 2016; 11: e0154361. pmid:27123927
- 37. Cohen L, Khoramshahi M, Salesse RN, Bortolon C, Słowiński P, Zhai C, et al. Influence of facial feedback during a cooperative human-robot task in schizophrenia. Sci Rep 2017; 7: 15023. pmid:29101325
- 38. Ultraleap. Tracking | Leap Motion Controller | Ultraleap. 2022. Available from: https://www.ultraleap.com/product/leap-motion-controller/.
- 39. Brown EC, Casey A, Fisch RI, Neuringer C. Trail Making Test as a screening device for the detection of brain damage. Journal of Consulting Psychology 1958; 22: 469–74.
- 40. Sánchez-Cubillo I, Periáñez JA, Adrover-Roig D, Rodríguez-Sánchez JM, Ríos-Lago M, Tirapu JE, et al. Construct validity of the Trail Making Test: Role of task-switching, working memory, inhibition/interference control, and visuomotor abilities. Journal of the International Neuropsychological Society 2009; 15: 438–50. pmid:19402930
- 41. Mueller ST. The PEBL Trail-making task. Available from: https://pebl.sourceforge.net/.
- 42. Salesse RN, Casties J-F, Capdevielle D, Raffard S. Socio-Motor Improvisation in Schizophrenia: A Case-Control Study in a Sample of Stable Patients. Front Hum Neurosci 2021; 15: 676242. pmid:34744659
- 43. Goldstein G, Neuringer C. Schizophrenic and Organic Signs on the Trail Making Test. Percept Mot Skills 1966; 22: 347–50.
- 44. Mahurin RK, Velligan DI, Hazleton B, Mark Davis J, Eckert S, Miller AL. Trail Making Test Errors and Executive Function in Schizophrenia and Depression. The Clinical Neuropsychologist 2006; 20: 271–88. pmid:16690547
- 45. Watson D, Clark LA, Tellegen A. Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology 1988; 54: 1063–70. pmid:3397865
- 46. Shimoyama I, Ninchoji T, Uemura K. The Finger-Tapping Test: A Quantitative Analysis. Archives of Neurology 1990; 47: 681–4.
- 47. Grinsted A, Moore JC, Jevrejeva S. Application of the cross wavelet transform and wavelet coherence to geophysical time series. Nonlinear Processes in Geophysics 2004; 11: 561–6.
- 48. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2nd ed. New York, NY: Springer; 2009.
- 49. Train multiclass naive Bayes model—MATLAB fitcnb—MathWorks United Kingdom. Available from: https://uk.mathworks.com/help/stats/fitcnb.html.
- 50. Train support vector machine (SVM) classifier for one-class and binary classification—MATLAB fitcsvm—MathWorks United Kingdom. Available from: https://uk.mathworks.com/help/stats/fitcsvm.html.
- 51. Fit bagged trees classifier—MATLAB TreeBagger—MathWorks United Kingdom. Available from: https://uk.mathworks.com/help/stats/treebagger.html.
- 52. Fit discriminant analysis classifier—MATLAB fitcdiscr—MathWorks United Kingdom. Available from: https://uk.mathworks.com/help/stats/fitcdiscr.html.
- 53. Cliff N. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin 1993; 114: 494–509.
- 54. Oliver D, Arribas M, Radua J, de Pablo GS, de Micheli A, Spada G, et al. Prognostic accuracy and clinical utility of psychometric instruments for individuals at clinical high-risk of psychosis: a systematic review and meta-analysis. Mol Psychiatry 2022; 27: 3670–78. pmid:35665763
- 55. de Pablo GS, Radua J, Pereira J, Bonoldi I, Arienti V, Besana F, et al. Probability of Transition to Psychosis in Individuals at Clinical High Risk: An Updated Meta-analysis. JAMA Psychiatry 2021; 78: 970–8. pmid:34259821
- 56. Fusar-Poli P, Bonoldi I, Yung AR, Borgwardt S, Kempton MJ, Valmaggia L, et al. Predicting Psychosis: Meta-analysis of Transition Outcomes in Individuals at High Clinical Risk. Archives of General Psychiatry 2012; 69: 220–9. pmid:22393215
- 57. O’Donoghue B, Lyne J, Renwick L, Madigan K, Kinsella A, Clarke M, et al. A descriptive study of ‘non-cases’ and referral rates to an early intervention for psychosis service. Early Intervention in Psychiatry 2012; 6: 276–82. pmid:22240056
- 58. Jordan G, Kinkaid M, Iyer SN, Joober R, Goldberg K, Malla A, et al. Baby or bathwater? Referrals of “non-cases” in a targeted early identification intervention for psychosis. Soc Psychiatry Psychiatr Epidemiol 2018; 53: 757–61. pmid:29541798
- 59. Lindhardt L, Lindhard M, Haahr UH, Hastrup LH, Simonsen E, Nordgaard J. Help-Seekers in an Early Detection of Psychosis Service: The Non-cases. Frontiers in Psychiatry 2021; 12: 778785. pmid:34955925
- 60. Edwards J, Norman R, Kurdyak P, MacDougall AG, Palaniyappan L, Lau C, et al. Unmet need for mental health services among people screened but not admitted to an early psychosis intervention program. Schizophrenia Research 2019; 204: 55–7. pmid:30121188
- 61. Rainteau N, Salesse RN, Macgregor A, Macioce V, Raffard S, Capdevielle D. Why you can’t be in sync with schizophrenia patients. Schizophrenia Research 2020; 216: 504–6. pmid:31839550
- 62. Słowiński P, Al-Ramadhani S, Tsaneva-Atanasova K. Neurologically Motivated Coupling Functions in Models of Motor Coordination. SIAM J Appl Dyn Syst 2020; 19: 208–32. pmid:31992962
- 63. Danion FR, Flanagan JR. Different gaze strategies during eye versus hand tracking of a moving target. Sci Rep 2018; 8: 10059. pmid:29968806
- 64. Brenner E, de la Malla C, Smeets JBJ. Tapping on a target: dealing with uncertainty about its position and motion. Exp Brain Res 2022; 241: 81–104. pmid:36371477
- 65. Trunk GV. A Problem of Dimensionality: A Simple Example. IEEE Transactions on Pattern Analysis and Machine Intelligence 1979; PAMI-1: 306–7. pmid:21868861
- 66. Kahn RS, Sommer IE, Murray RM, et al. Schizophrenia. Nat Rev Dis Primers 2015; 1: 1–23.
- 67. Koo BM, Vizer LM. Mobile Technology for Cognitive Assessment of Older Adults: A Scoping Review. Innovation in Aging 2019; 3: igy038. pmid:30619948
- 68. Mallawaarachchi SR, Amminger GP, Farhall J, Bolt LK, Nelson B, Yuen HP, et al. Cognitive functioning in ultra-high risk for psychosis individuals with and without depression: Secondary analysis of findings from the NEURAPRO randomized clinical trial. Schizophrenia Research 2020; 218: 48–54. pmid:32171637
- 69. Shah JL. Sub-threshold mental illness in adolescents: within and beyond DSM’s boundaries. Soc Psychiatry Psychiatr Epidemiol 2015; 50: 675–7. pmid:25660761
- 70. Iyer SN, Boksa P, Lal S, Shah J, Marandola G, Jordan G, et al. Transforming youth mental health: a Canadian perspective. Irish Journal of Psychological Medicine 2015; 32: 51–60. pmid:31715701