Address for correspondence: Darren L. Dunning, Department of Psychology, University of York, York YO10 5DD, UK; e-mail: firstname.lastname@example.org
Children with low working memory typically make poor educational progress, and it has been speculated that difficulties in meeting the heavy working memory demands of the classroom may be a contributory factor. Intensive working memory training has been shown to boost performance on untrained memory tasks in a variety of populations. This first randomized controlled trial with low working memory children investigated whether the benefits of training extend beyond standard working memory tasks to other more complex activities typical of the classroom in which working memory plays a role, as well as to other cognitive skills and developing academic abilities. Children aged 7–9 years received either adaptive working memory training, non-adaptive working memory training with low memory loads, or no training. Adaptive training was associated with selective improvements in multiple untrained tests of working memory, with no evidence of changes in classroom analogues of activities that tax working memory, or any other cognitive assessments. Gains in verbal working memory were sustained one year after training. Thus the benefits of working memory training delivered in this way may not extend beyond structured working memory tasks.
Working memory (WM) is the cognitive system responsible for the temporary storage of information required to support ongoing everyday activities such as following instructions, mental arithmetic, and problem-solving (Adams & Hitch, 1997; Gathercole, Durling, Evans, Jeffcock & Stone, 2008b; Oberauer, Suß, Wilhelm & Wittman, 2008). Children with poor WM skills at school entry are at high risk of educational underachievement in reading and mathematics (Gathercole, Brown & Pickering, 2003), and typically make poor progress in all assessed areas of the academic curriculum (Geary, 2004; Swanson & Sachse-Lee, 2001). These difficulties in learning may be unsurprising due to the heavy WM demands of the classroom. Children with poor WM frequently fail to complete common classroom activities that require large amounts of information to be held in mind, experience difficulties following multi-step instructions, and have problems keeping their place in demanding and complex activities (Engle, Carullo & Collins, 1991; Gathercole & Alloway, 2008; Gathercole, Alloway, Kirkwood, Elliott, Holmes & Hilton, 2008a).
Early interventions that could overcome these relatively common cognitive and learning difficulties would clearly be of great potential benefit to these children, and there is accumulating evidence that intensive WM training can substantially boost performance on non-trained WM tasks (e.g. Klingberg, Fernell, Olesen, Johnson, Gustafsson, Dahlstrom, Gillberg, Forssberg & Westerberg, 2005). Training programmes typically require individuals to train intensively for a continued period on tasks adapted to their current performance. Cogmed Working Memory Training (CWMT; Klingberg, Forssberg & Westerberg, 2002, Klingberg et al., 2005) is the most widely used of these, and has been linked with enhancements in WM in children with ADHD (Klingberg et al., 2005; Holmes, Gathercole, Place, Dunning, Hilton & Elliot, 2010b), special educational needs in reading (Dahlin, 2010), and attentional problems (Mezzacappa & Buckner, 2010). There is also preliminary evidence that it is beneficial for children with low WM who have no other recognized learning difficulties (Holmes, Gathercole & Dunning, 2009).
The methodologies adopted in research studies of WM training have recently been the focus of debate (Gathercole, Dunning & Holmes, 2012; Klingberg, 2012; Melby-Lervåg & Hulme, 2013; Shipstead, Redick & Engle, 2010, 2012). Shortcomings levelled at the field include failing to employ outcome measures with high construct validity and not using multiple measures of each construct, not randomly allocating participants to condition, inadequate sample sizes and either the lack of a control group or failing to include both a no treatment baseline condition and an active comparison intervention to control for non-specific factors. The present study is the first randomized controlled trial (RCT) of CWMT with children with low WM that overcomes these methodological shortcomings. Both active and passive control interventions were included, schools were randomly allocated to condition, and multiple and well-validated measures of WM, reading, maths, and other cognitive skills were administered by researchers who were blind to intervention condition.
We investigated whether two fundamental aspects of transfer of training would withstand the challenges of this rigorous methodology: transfer to other conventional WM tasks, and transfer to analogues of WM-loaded classroom activities in which low WM children are known to be impaired (Gathercole, Lamont & Alloway, 2006, Gathercole et al., 2008b). The latter of these was measured by three tasks. In one, the child counted the number of words in a spoken sentence before attempting verbatim recall. This activity involved the temporary storage of the sentence while the words were being counted. The second task required the child to identify a pair of rhyming words in short spoken poems. Here the child must monitor and compare the phonological forms of a potentially large number of word pairs, hence requiring off-line support from WM. In the third task, the child attempted to carry out sequences of spoken instructions using familiar classroom items. Children with low WM have been shown to improve on this task following CWMT (Holmes et al., 2009).
We also examined whether selective cognitive enhancements with adaptive training extend to other cognitive functions, including speed of visual search, sustained attention and IQ, as well as to mathematics and reading. Whilst improvements in WM tasks that closely resemble the trained activities are reported consistently (e.g. Klingberg et al., 2005; Holmes et al., 2009), the evidence for transfer to tasks that share little overlap with the structure and content of trained activities while drawing on hypothesized common processes is mixed (e.g. Jaeggi, Buschkuehl, Jonides & Shah, 2011). Evaluating the impact of training on children's learning is vital, as it is central to the practical value of training. A small number of studies have reported changes in reading and mathematics performance after training, but none employed an RCT methodology. For example, Holmes et al. (2009) reported improvements in maths scores 6 months after adaptive training but did not conduct a similar follow-up assessment for the control group, raising the possibility that the gains were an artefact of repeated testing. The improvements reported in reading comprehension by Dahlin (2010) were significant in comparison to a no-contact rather than an active control group, which guards against test–retest effects but not the Hawthorne effect (see McCarney, Warner, Illife, van Haselen, Griffin & Fisher, 2007). The present study provided the first substantial test of whether training in low WM children leads to improvements in learning using measures widely claimed to depend on WM such as mathematical reasoning (Swanson & Sachse-Lee, 2001) and reading comprehension (e.g. Cain, Oakhill & Bryant, 2004).
The sustainability of any training gains and the potential impact of training on learning progress in literacy and in mathematics were assessed 12 months after training. Previous non-RCT studies have shown that the positive effects of training remain intact after 6 months (e.g. Holmes et al., 2009). The question here is whether such gains are sustained over a longer period using a more robust methodology.
Participants and procedure
To identify children with low WM, 810 children (425 boys, age M = 8y 5 m, SD = 7.91 m) attending nine schools in the North-East of England were included in the screening phase. In these schools, 29% received free school meals (nationally, 19%), and 11% had special educational needs (nationally, 21%). All children were screened on two tests of the Automated Working Memory Assessment (AWMA; Alloway, 2007): a verbal test, backward digit recall, in which the children attempted to recall spoken strings of digits in reverse order, and a visuo-spatial test, Mr X, which involved the recall of a series of locations, interspersed with mental rotation decisions.
Ninety-four children with English as their first language (47 boys, age M = 8y 5 m, SD = 7.97 m) with standard scores at or below the 15th centile on both screening measures were identified as having low WM. These children had normal or corrected-to-normal vision and hearing. No other exclusion criteria were applied. Written consent was obtained both from the participating schools and the parents/guardians of the children.
These children completed pre-training assessments (T1) and were assigned to one of three groups: adaptive training, non-adaptive training or no intervention. Participants in the adaptive and non-adaptive groups then completed 6 weeks' training. Post-training assessments were completed after training (T2) for the adaptive and non-adaptive groups, and 6 weeks after pre-assessment for the no intervention group.
On the basis of schools' willingness to participate, 15 children in the adaptive group (eight boys, mean age 9y 3 m, SD 4.97) and 19 in the non-adaptive group (eight boys, mean age 9y 6 m, SD 7.15) were re-tested 12 months after training (T3). A subset of measures was used to reduce testing time.
All assessments were conducted by research assistants blind to the intervention status of the children. Statistical power was .94 with a p level of .05 (Erdfelder, Faul & Buchner, 1996) for the main study, and .62 for the 12-month follow-up.
Participating schools were randomly assigned to adaptive training, non-adaptive training or no intervention conditions. This cluster randomization procedure minimizes contamination between the two training groups and avoids the dilution of effects that would have occurred with random assignment of pupils to conditions within schools (Torgerson & Torgerson, 2008). Although necessary given co-training of children from both training groups in school, cluster randomization is associated with decreased variability of individuals within clusters due to shared factors other than condition, such as having the same teacher. Intra-cluster correlations (ICC) were calculated for T1 measures (see Table 1) to measure the variance between clusters, and consequently determine how similar the outcome would be had individual randomization been used (Koch, 1982). ICCs of .5–.6 indicate moderate agreement between clusters, ICCs of .7–.8 indicate strong agreement, and ICCs of over .8 indicate almost perfect agreement (Landis & Koch, 1977). The ICCs in this study reflect an acceptably low level of risk that clustering would generate spurious differences between conditions (Kerry & Bland, 1998).
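As an illustration of the intra-cluster correlation described above, the following is a minimal sketch of a one-way ANOVA estimator. The text does not state which ICC estimator was used, so the choice of estimator, the function name `one_way_icc` and the example scores are all assumptions made for illustration only.

```python
from statistics import mean


def one_way_icc(clusters):
    """One-way ANOVA estimator of the intra-cluster correlation.

    `clusters` is a list of lists, one inner list of scores per cluster
    (e.g. per school). This estimator is an assumption; the paper does
    not say which one was used.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    grand_mean = sum(sum(c) for c in clusters) / n_total

    # Between-cluster and within-cluster sums of squares.
    ss_between = sum(n * (mean(c) - grand_mean) ** 2
                     for n, c in zip(sizes, clusters))
    ss_within = sum((x - mean(c)) ** 2 for c in clusters for x in c)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average cluster size, which handles unequal cluster sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical T1 standard scores for three schools (not data from the study).
example_scores = [[85, 88, 90], [78, 84, 80, 82], [92, 87, 89]]
icc = one_way_icc(example_scores)
```

Values near zero indicate that scores vary about as much within schools as between them. Values near one indicate that most of the variance lies between schools.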
The adaptive training group received CWMT for 20–25 sessions. Each session lasted for between 30 and 45 minutes and consisted of training on eight exercises, with 15 trials on each, giving a total of 120 trials to be completed in each training session. The eight training exercises were selected daily from a bank of 12. CWMT is a commercially available product and the exercises presented in each training session were delivered in a preset manner. Task difficulty was adjusted on a trial-by-trial basis to match the participant's current WM span. A high-score chart allowed participants to gauge their performance level. For full details see www.cogmed.com/rm.
Children in the non-adaptive group were trained on a version of CWMT developed for trial evaluations (Klingberg et al., 2005). This version was identical to the adaptive version of the programme except that the training tasks were set at a low span level of two throughout the training period and did not increase in difficulty with task success. As the difficulty level was fixed, there was no high-score chart.
Training was conducted in school with groups of 6–12 individuals, under researcher supervision. Children in both training groups received small rewards such as stationery items for every five training sessions completed. Inevitably, not all motivational features could be equated across the two training programmes. The percentage of total time spent on task recorded on the Cogmed log (relative to total training time) was comparable across groups (adaptive, 80%; non-adaptive, 82%).
At T1 eight standardized subtests from the AWMA were administered to each child over two testing sessions – two tests each of verbal STM (digit and word recall), visuo-spatial STM (dot matrix and block recall), verbal WM (counting recall and backward digit recall) and visuo-spatial WM (Mr X and spatial span). At T2, four tasks were repeated from T1 and four were introduced for the first time: verbal STM (digit and nonword recall), visuo-spatial STM (dot matrix and mazes memory), verbal WM (listening recall and backward digit recall) and visuo-spatial WM (Mr X and odd-one-out). Correlations between the four pairs of non-repeated subtests tapping common constructs were: word recall–nonword recall = .60, block recall–mazes memory = .49, counting recall–listening recall = .53, spatial span–odd-one-out = .59 (Alloway, Gathercole & Pickering, 2006). Composite scores were derived for each of the four aspects of WM by calculating the mean standard scores from the AWMA for each pair to increase the robustness of the assessments.
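The composite scoring described above amounts to averaging the two standard scores for each construct. The sketch below shows this for the T1 pairings. The dictionary keys, the function name `composite_scores` and the example scores are hypothetical labels introduced for illustration.

```python
from statistics import mean

# T1 subtest pairings as listed in the text; key names are illustrative.
COMPOSITE_PAIRS_T1 = {
    "verbal_stm": ("digit_recall", "word_recall"),
    "visuospatial_stm": ("dot_matrix", "block_recall"),
    "verbal_wm": ("counting_recall", "backward_digit_recall"),
    "visuospatial_wm": ("mr_x", "spatial_span"),
}


def composite_scores(standard_scores, pairs=COMPOSITE_PAIRS_T1):
    """Return the mean standard score for each construct's pair of subtests."""
    return {
        construct: mean(standard_scores[test] for test in tests)
        for construct, tests in pairs.items()
    }


# Hypothetical standard scores for one child (not data from the study).
child = {
    "digit_recall": 80, "word_recall": 90,
    "dot_matrix": 75, "block_recall": 85,
    "counting_recall": 70, "backward_digit_recall": 78,
    "mr_x": 82, "spatial_span": 88,
}
composites = composite_scores(child)
```

At T2 the same function could be applied with a second mapping that swaps in the four newly introduced subtests.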
Three tests based on classroom activities known to vary with WM skill employed by Gathercole et al. (2006, 2008b) were administered at each testing point.
An array of 15 coloured objects (folders, boxes, pencils, rulers and erasers that were coloured red, blue and yellow) was placed in front of the participant. On each trial, the experimenter read aloud a set of instructions relating to a subset of the objects, which participants attempted to follow. A span method was used, with each span consisting of a block of six trials. Testing started at one action (e.g. touch the red pencil) and increased by one action per block until the participant was unable to complete four trials in a block correctly. The total number of trials correct was calculated.
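The span procedure above can be sketched as a simple scoring routine. The sketch assumes that testing is discontinued after the first block in which fewer than four of the six trials are correct, and that trials from that final block still count towards the total. This is one reading of the description rather than a documented rule, and the function name `score_span_task` is hypothetical.

```python
def score_span_task(blocks, trials_per_block=6, pass_mark=4):
    """Score a span task administered in blocks of increasing length.

    `blocks` is a list of blocks in order of administration (one action,
    two actions, ...). Each block is a list of booleans, True for a trial
    performed correctly. Returns (total_trials_correct, highest_level_passed).
    The discontinuation rule is an assumed interpretation of the text.
    """
    total_correct = 0
    highest_level_passed = 0
    for level, block in enumerate(blocks, start=1):
        if len(block) != trials_per_block:
            raise ValueError(f"block {level} must contain {trials_per_block} trials")
        n_correct = sum(block)
        total_correct += n_correct
        if n_correct < pass_mark:
            break  # stop testing: fewer than four trials correct in this block
        highest_level_passed = level
    return total_correct, highest_level_passed


# Hypothetical responses for one child (not data from the study).
responses = [
    [True, True, True, True, True, True],        # one action
    [True, True, True, True, False, False],      # two actions
    [True, False, False, False, False, True],    # three actions: testing stops
]
total, level = score_span_task(responses)
```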
On each trial, the experimenter read aloud two-line poems containing between 4 and 10 words (mean = 6.57) such as I like cakes, my mum bakes. Participants were asked to recall the two rhyming words in the poem. There were seven trials and the total number of pairs of rhyming words correctly recalled was calculated. The difficulty level of this task did not vary.
Sentence counting and recall
On each trial, a sentence containing between two and nine words (mean = 5.71) was spoken aloud (e.g. pigs like to roll around in the mud) and participants were asked to count the number of words in the sentence, say the total aloud, and then repeat the sentence verbatim. Fourteen sentences were presented.
At T1 and T2, participants completed the Wechsler Abbreviated Scales of Intelligence (WASI; Wechsler, 1999), which consists of four subtests: Similarities and Vocabulary (verbal IQ), and Matrix Reasoning and Block Design (performance IQ). They also completed the Mathematical Reasoning and Number Operations subtests of the Wechsler Objective Number Dimensions test (WOND; Wechsler, 1996) and the Basic Reading subtest of the Wechsler Objective Reading Dimensions (WORD; Wechsler, 1993). The Neale Analysis of Reading Ability test (NARA; Neale, 1997) yielded measures of reading accuracy, reading comprehension and reading rate. The Written Expression subtest from the Kaufman Test of Educational Attainment (KTEA; Kaufman & Kaufman, 2004) was also administered. At T3, participants completed the Similarities and Matrix Reasoning subtests of the WASI, the Basic Reading subtest of the WORD, the Number Operation subtest of the WOND, and the NARA.
Other cognitive assessments
The Continuous Performance Test (CPT; Conners & Multi-Health Systems Staff, 2004), a computerized assessment in which children respond to target letters, provided a measure of sustained attention; omissions, commissions and hit rate were recorded. Visual scanning speed was assessed using the Visual Scanning subtest of the Delis-Kaplan Executive Function System (D-KEFS; Delis, Kaplan & Kramer, 2001). This involved the detection of visual targets in a search task. Completion time (in seconds) and the number of errors were used to calculate the time taken per correct target. Both tasks were administered at T1, T2 and T3.
Table 2 shows descriptive statistics for T1 and T2 scores by group. Performance at T1, T2 and T3 is shown in Table 4 for the subset of children from the adaptive and non-adaptive groups who were followed up one year after training.
One-way ANOVAs were performed for each measure at T1 as a function of group (adaptive, non-adaptive and no intervention). Significant group differences were found for visuo-spatial STM, F(2, 91) = 4.03, p = .02, with the non-adaptive group outperforming the no intervention group, F(1, 58) = 7.82, p = .01, and for verbal WM, F(2, 91) = 4.20, p = .02, with the non-adaptive, F(1, 62) = 6.91, p = .01, and no intervention groups, F(1, 62) = 6.52, p = .01, scoring higher than the adaptive group. There were also significant group differences at T1 for commissions on the CPT, F(2, 88) = 5.31, p = .01, with the no intervention group significantly higher than the adaptive group, F(1, 59) = 8.61, p = .01, and also for the Visual Scanning task, F(2, 76) = 3.49, p = .04, with the adaptive group scoring significantly lower than the non-adaptive, F(1, 61) = 5.04, p = .03, and no intervention groups, F(1, 62) = 6.03, p = .02.
To test group effects on training gains, general linear models were performed separately for the different T2 measures with scores at T2 entered as the dependent variable and scores at T1 and group as independent variables. For the measures with significant group differences at baseline, group*T1 interaction terms were added to the model. Interaction terms were created by first centring T1 scores and multiplying the centred score by the group dummy variables. Thus, the independent variables entered for visuo-spatial STM, verbal WM, CPT commissions and processing speed (the measures on which there were group differences at T1) were T1 scores, group, the T1*group interactions, and the centred T1 score.
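The construction of the interaction terms described above (centring T1 scores, then multiplying by the group dummy variables) can be sketched as follows. The choice of the adaptive group as the reference category, the column order and the function name `build_design_rows` are assumptions made for illustration. The text does not report how the dummy variables were coded, and the sketch stops at building the predictors rather than fitting the model.

```python
from statistics import mean

GROUPS = ("adaptive", "non_adaptive", "no_intervention")


def build_design_rows(t1_scores, groups, reference="adaptive"):
    """Build predictor rows for a model of T2 scores on T1, group and T1*group.

    Each row is: [intercept, centred T1, group dummies..., centred T1 * dummies...].
    The reference group (an assumed choice) has all dummies set to zero.
    """
    t1_mean = mean(t1_scores)
    dummy_levels = [g for g in GROUPS if g != reference]
    rows = []
    for score, group in zip(t1_scores, groups):
        centred = score - t1_mean                    # centre T1 on the sample mean
        dummies = [1.0 if group == level else 0.0 for level in dummy_levels]
        interactions = [centred * d for d in dummies]  # T1*group interaction terms
        rows.append([1.0, centred, *dummies, *interactions])
    return rows


# Hypothetical T1 scores and group labels (not data from the study).
rows = build_design_rows(
    [80, 90, 100],
    ["adaptive", "non_adaptive", "no_intervention"],
)
```

Centring T1 before forming the products keeps the group coefficients interpretable as group differences at the average baseline score, which is the usual reason for this step.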
Table 3 summarizes the outcome of these analyses. Significant training effects were observed for visuo-spatial STM, verbal WM and visuo-spatial WM, with scores significantly greater for the adaptive than both other groups at T2 (see Figure 1). This pattern of results was replicated when only the WM measures that were repeated at T1 and T2 were included. Again, scores were significantly greater for the adaptive group than both other groups for visuo-spatial STM, verbal WM and visuo-spatial WM: Verbal STM (Digit Recall), adaptive–non-adaptive, p = .19, adaptive–no intervention, p = .64; Visuo-spatial STM (Dot Matrix), adaptive–non-adaptive, p < .01, adaptive–no intervention, p < .01; Verbal WM (Backward Digit Recall), adaptive–non-adaptive, p = .01, adaptive–no intervention, p < .01; Visuo-spatial WM (Mr X), adaptive–non-adaptive, p = .05, adaptive–no intervention, p < .01. Basic Reading scores at T2 were also significantly predicted by group, with the no intervention group outperforming the other two groups. There were no other significant training effects.
Baseline performance for the subset of children from the adaptive and non-adaptive groups retested 12 months after training (T3) was analysed in a series of one-way ANOVAs comparing scores for each measure at T1 that was re-administered at T3. There were no significant group differences at baseline. To test the effect of group on outcome, general linear models were performed separately for each of the T3 measures, with performance at T3 entered as the dependent variable and performance at T1 and group entered as independent variables. Group was a significant predictor of T3 scores for verbal WM and the processing aspect of the sentence counting task, with the adaptive group outperforming the non-adaptive group (see Table 4).
Table 4. T1, T2 and T3 scores and group coefficients for the children included in the 12 month follow-up
A Bonferroni correction reduces the probability criterion for significance to .003 at both T2 for the whole sample and T3 for those included in the follow-up. The only finding that withstands this correction is verbal WM at T2. However, effect sizes (Cohen's d) were substantial at T2, ranging from .67 to .99 between the adaptive and non-adaptive groups and .57 to 1.63 between the adaptive and no intervention groups.
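For readers unfamiliar with the two quantities above, the following is a minimal sketch of a Bonferroni-adjusted criterion and of Cohen's d computed with a pooled standard deviation. The number of comparisons is not stated in the text, so the count of 17 below is a hypothetical value chosen only because it rounds to the reported .003. The pooled-SD form of d and the example scores are likewise assumptions.

```python
from math import sqrt
from statistics import mean, variance


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance criterion after a Bonferroni correction."""
    return alpha / n_tests


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical number of comparisons (not reported in the text).
criterion = bonferroni_alpha(0.05, 17)

# Hypothetical scores for two groups (not data from the study).
d = cohens_d([2, 4, 6], [1, 3, 5])
```

The contrast drawn in the text follows from these definitions: the corrected criterion tightens as the number of tests grows, whereas d depends only on the group means and variances.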
In this RCT, adaptive WM training significantly boosted performance on untrained WM tasks in children with low WM. This enhancement was substantial in magnitude and was partially sustained for 12 months. Children who completed adaptive training made significantly greater improvements in tests of visuo-spatial STM and verbal and visuo-spatial WM than either children who completed a non-adaptive version of training or those who received no intervention. There were few significant differences between the two control groups, suggesting that the gains associated with adaptive training are unlikely to be due to expectancy effects. This is the first double-blinded RCT study on this key group of poor learners that meets stringent criteria for intervention research (e.g. Shipstead et al., 2010), and the results reinforce outcomes of an earlier training study with children with low WM (Holmes et al., 2009). Once again, WM training failed to enhance performance on tests of verbal STM (Holmes et al., 2009; Holmes, Gathercole & Dunning, 2010a). This is consistent with evidence that, unlike the other measures, verbal STM tasks tap a highly specialized component of WM that places minimal demands on the executive control of WM and is linked with vocabulary acquisition (Baddeley, Gathercole & Papagno, 1998), rather than more general academic learning.
The boost in verbal WM performance with adaptive training persisted for 12 months for the subgroup of children re-tested at this point, beyond the 6-month period previously investigated (e.g. Holmes et al., 2009; Dahlin, 2010). This is the first indication using an RCT methodology that this relatively short but intensive intervention can lead to long-term improvements in verbal WM, the system that is likely to support learning in the predominantly verbal environment of the classroom. The effect size for the gains in verbal WM was large, though due to the relatively low power it would be premature to draw strong conclusions about the specificity of the long-term gains.
In an earlier study by Holmes et al. (2009), WM training was associated with improvements in the abilities of children with low WM to follow multi-step spoken instructions. In contrast, there were no significant enhancements to performance on laboratory analogues of WM-related classroom tasks, including following instructions, immediately after training in the present study. The only change observed in these tasks was in processing ability at follow-up, where the accuracy with which children counted the number of words in the sentence counting task improved; there was no significant enhancement in the children's recall performance on this task. This represents a major limitation on the utility of training as, clearly, its potential value is in enhancing classroom functioning rather than performance on laboratory-based WM assessments. One possibility is that the benefits of training are simply restricted to computer-based WM tasks that share many of the surface features of the training tasks (Dahlin, Neeley, Larsson, Backman & Nyberg, 2008). Alternatively, it may be that the training regime employed here only does half of the job required. One of the cardinal principles of neurorehabilitation is that scaffolding and support are required for training to generalize and be effective in new situations (Wilson, 2008). WM trainees may therefore need guidance, practice and reinforcement to apply their newly developed skills or strategies to everyday activities with structures that deviate substantially from the trained tasks but which nonetheless depend in part on WM.
The greatest improvements in WM following training were observed in complex span measures strongly associated with children's academic achievements in literacy and mathematics (Swanson & Siegel, 2001; Alloway, Gathercole, Willis & Adams, 2004). However, adaptive WM training did not significantly improve children's performance on standardized reading and mathematics tests either immediately after training or one year later. Indeed, the only significant change in any group was an increase in basic reading scores for the no intervention group. These data stand in contrast with Holmes et al.'s (2009) findings that mathematical abilities were enhanced by adaptive training 6 months after training was completed. However, this earlier study lacked the comparison control condition at follow-up required to provide a stringent test of the specificity of training gains. It does, however, remain possible that outcome measures employed to date lack sufficient sensitivity to detect subtle and developing changes in learning abilities. In this respect, process-based tests of aspects of reading and mathematics abilities known to tax WM (e.g. Adams & Hitch, 1997; Geary, Hoard, Byrd-Craven, Nugent & Numtee, 2007) may well provide more sensitive measures of changes in WM capacity in situ.
Training had no significant impact on visual scanning or the ability to sustain attention over extended periods. Despite comparable statistical power, it also had no effect on nonverbal reasoning, contrary to studies that have used n-back training paradigms (e.g. Jaeggi, Buschkuehl, Jonides & Perrig, 2008; Jaeggi et al., 2011) and others in which CWMT has been used (Klingberg et al., 2005). We therefore have no evidence to support claims that WM training enhances nonverbal IQ.
In summary, this study establishes two important facts about WM training using a stringent methodology designed to satisfy critics of the field. First, training in low WM children leads to generalized enhancements to a wide range of untrained WM tasks. Second, these gains do not translate into capacity improvements on ecologically valid measures of WM or to gains in academic progress. A priority now is to establish whether additional training activities can be developed to promote the flexible application of newly enhanced WM skills to less predictable memory-demanding situations in the classroom.
This research was supported by grant R1165301 to Susan Gathercole from the Leverhulme Trust. We would like to thank the children, teachers and parents whose participation made the study possible and Pearson for allowing us access to the Cogmed training program for the purposes of this evaluation.