Curiosity‐based learning in infants: a neurocomputational approach

Abstract Infants are curious learners who drive their own cognitive development by imposing structure on their learning environment as they explore. Understanding the mechanisms by which infants structure their own learning is therefore critical to our understanding of development. Here we propose an explicit mechanism for intrinsically motivated information selection that maximizes learning. We first present a neurocomputational model of infant visual category learning, capturing existing empirical data on the role of environmental complexity on learning. Next we “set the model free”, allowing it to select its own stimuli based on a formalization of curiosity and three alternative selection mechanisms. We demonstrate that maximal learning emerges when the model is able to maximize stimulus novelty relative to its internal states, depending on the interaction across learning between the structure of the environment and the plasticity in the learner itself. We discuss the implications of this new curiosity mechanism for both existing computational models of reinforcement learning and for our understanding of this fundamental mechanism in early development.


RESEARCH HIGHLIGHTS
• We present a novel formalization of the mechanism underlying infants' curiosity-driven learning during visual exploration.
• We implement this mechanism in a neural network that captures empirical data from an infant visual categorization task.
• In the same model we test four potential selection mechanisms and show that learning is maximized when the model selects stimuli based on its learning history, its current plasticity and its learning environment.
• The model offers new insight into how infants may drive their own learning.

| INTRODUCTION
For more than half a century, infants' information selection has been documented in lab-based experiments. These carefully designed, rigorously controlled paradigms allow researchers to isolate a variable of interest while controlling for extraneous environmental influences, offering a fine-grained picture of the range of factors that affect early learning. Decades of developmental research have brought about a broad consensus that infants' information selection and subsequent learning in empirical tasks are influenced by their existing representations, the learning environment, and discrepancies between the two (for a review, see Mather, 2013). On the one hand, there is substantial evidence that infants' performance in these studies depends heavily on the characteristics of the learning environment. For example, early work demonstrated that infants under 6 months of age prefer to look at patterned over homogenous grey stimuli (Fantz, Ordy, & also Kovack-Lesh & Oakes, 2007). Thus, the representations infants learn depend on bottom-up perceptual information. Equally, however, infants' existing knowledge has a profound effect on their behavior in these experiments. For example, while newborns respond equivalently to images of faces irrespective of the race of those faces, by 8 months infants show holistic processing of images of faces from their own race, but not of other-race faces, which they process featurally (Ferguson, Kulkofsky, Cashon, & Casasola, 2009). Similarly, 4-month-old infants with pets at home exhibit more sophisticated visual sampling of pet images than infants with no such experience (Hurley, Kovack-Lesh, & Oakes, 2010; Hurley & Oakes, 2015; Kovack-Lesh, McMurray, & Oakes, 2014). Effects of learning history also emerge when infants' experience is controlled experimentally.
For example, after a week of training with one named and one unnamed novel object, 10-month-old infants exhibited increased visual sampling of the previously named object in a subsequent silent looking-time task (Twomey & Westermann, 2017; see also Bornstein & Mash, 2010; Gliga, Volein, & Csibra, 2010). Thus, learning depends on the interaction between what infants encounter in-the-moment and what they know (Thelen & Smith, 1994).

| Active learning in curious infants
A long history of experiments, starting with Piaget's (1952) notion of children as "little scientists", has shown that children are more than passive observers; rather, they take an active role in constructing their own learning. Recent work demonstrates this active learning in infants also.
For example, allowing 16-month-old infants to choose between two novel objects in an imitation task boosted their imitation of novel actions subsequently performed on the selected item (Begus, Gliga, & Southgate, 2014). Similarly, in a pointing task, 20-month-old infants were more likely to elicit help from their caregivers in finding a hidden object when they were unable to see the hiding event than when they saw the object being hidden (Goupil, Romand-Monnier, & Kouider, 2016). Indeed, even younger infants systematically control their own learning: for example, 7- to 8-month-olds increased their visual sampling of a sequence of images when those images were moderately, but not maximally or minimally, predictable (Kidd, Piantadosi, & Aslin, 2012; see also Kidd, Piantadosi, & Aslin, 2014). However, as a newly developing field, active learning in infants is currently poorly understood (Kidd & Hayden, 2015).
Critically, outside the lab infants interact with their environment freely and largely autonomously, learning about stimuli in whichever order they choose (Oudeyer & Smith, 2016). This exploration is not driven by an external motivation such as finding food to satiate hunger. Rather, it is intrinsically motivated (Baldassarre et al., 2014; Berlyne, 1960; Schlesinger, 2013): in the real world infants learn based on their own curiosity. Consequently, in constructing their own learning environment, infants shape the knowledge they acquire. However, in the majority of studies on early cognitive development, infants' experience in a learning situation is fully specified by the experimenter, often through a preselected sequence of stimuli that are presented for fixed amounts of time. Thus, we currently know little about the cognitive processes underlying infants' curiosity as a form of intrinsic motivation, or indeed the extent to which what infants learn from curiosity-driven exploration differs from what they learn in more constrained environments. Given that active exploration is at the heart of development, understanding how infants construct their learning experiences (and, consequently, their mental representations) is fundamental to our understanding of development more broadly.

| Computational studies of intrinsic motivation
In contrast to the relative scarcity of research into infant curiosity, recent years have seen a surge in interest in the role of intrinsic motivation in autonomous computational systems. Equipping artificial learning systems with intrinsic motivation mechanisms is likely to be key to building autonomously intelligent systems (Oudeyer, Kaplan, & Hafner, 2007), and consequently a rapidly expanding body of computational and robotic work now focuses on the intrinsic motivation mechanisms that may underlie a range of behaviors; for example, low-level perceptual encoding (Lonini et al., 2013; Schlesinger & Amso, 2013), novelty detection (Marsland, Nehmzow, & Shapiro, 2005), and motion planning (Frank, Leitner, Stollenga, Förster, & Schmidhuber, 2014).
Computational work in intrinsic motivation has suggested a wide range of possible formal mechanisms for artificial curiosity-based learning (for a review, see . For example, curiosity could be underpinned by a drive to maximize learning progress by interacting with the environment in a novel manner relative to previously encountered events . Alternatively, curiosity could be driven by prediction mechanisms, allowing the system to engage in activities for which predictability is maximal (Lefort & Gepperth, 2015) or minimal (Botvinick, Niv, & Barto, 2009). Still other approaches assume that curiosity involves maximizing a system's competence or ability to perform a task (Murakami, Kroger, Birkholz, & Triesch, 2015).
Although this computational work investigates numerous potential curiosity algorithms, it remains largely agnostic as to the psychological plausibility of the implementation of those mechanisms . For example, many autonomous learning systems employ a separate "reward" module in which the size and timing of the reward are defined a priori by the modeler. Only recently has research highlighted the value of incorporating developmental constraints in curiosity-based computational and robotic learning systems (Oudeyer & Smith, 2016;Seepanomwan, Caligiore, Cangelosi, & Baldassarre, 2015). While this research shows great promise in incorporating developmentally inspired curiosity-driven learning mechanisms into artificial learning systems, a mechanism for curiosity in human infants has yet to be specified. The aim of this paper therefore is to develop a theory of curiosity-based learning in infants, and to implement these principles in a computational model of infant categorization.

| The importance of novelty to curiosity-based learning
From very early in development, infants show a novelty preference; that is, they prefer new items to items they have already encountered (Fantz, 1964; Sokolov, 1963). As infants explore an item, however, it becomes less novel; that is, the child habituates. During habituation, if a further new stimulus appears, and that stimulus is more novel to the infant than the currently attended item, the infant abandons the habituated item in favor of the new. Thus, novelty and curiosity are linked: broadly, increases in novelty elicit increases in attention and learning (although see, e.g., Kidd et al., 2012, 2014, for evidence that excessive novelty leads to a decrease in attention). Here, we propose that curiosity in human infants consists of intrinsically motivated novelty minimization in which discrepancies between stimuli and existing internal representations of those stimuli are optimally reduced (see also Rescorla & Wagner, 1972; Sokolov, 1963).
On this view, infants will selectively attend to stimuli that best support this discrepancy minimization. However, to date there is no agreement in the empirical literature as to what an optimal learning environment might be. For example, Bulf, Johnson, and Valenza (2011) demonstrated that newborns learned from highly predictable sequences of visual stimuli, but not from less predictable sequences.
In contrast, 10-month-old infants in a categorization task formed a robust category when familiarized with novel stimuli in an order that maximized, but not minimized, overall perceptual differences between successive stimuli (Mather & Plunkett, 2011). Still other studies have uncovered a "Goldilocks" effect in which learning is optimal when stimuli are of intermediate predictability (Kidd et al., 2012, 2014; see also Kinney & Kagan, 1976; Twomey, Ranson, & Horst, 2014). From this perspective, the degree of novelty and/or complexity in the environment that best supports learning is unclear.
Across these studies, novelty and complexity are operationalized differently; for example, as objective environmental predictability (Kidd et al., 2012, 2014), or objective perceptual differences (Mather & Plunkett, 2011). In contrast, in the current work we emphasize that for infants who are engaged in curiosity-driven learning, novelty is not a fixed environmental quantity but is highly subjective, depending on both perceptual environmental characteristics and what the learner knows. Importantly, each infant has a different learning history which can affect their exploratory behavior. For example, infant A plays with blocks at home and has substantial experience with stacking cube shapes. Infant B's favorite toy is a rattle, and she is familiar with the noise it makes when shaken. Consequently, the blocks at nursery will be more novel to infant B, and the rattle more novel to infant A. On this view, novelty is separate from any objective measure of stimulus complexity; for example, sequence predictability or differences in visual features (Kidd et al., 2012, 2014; Mather & Plunkett, 2011). Thus, a fully specified theory of curiosity-driven learning must explicitly characterize this subjective novelty based both on the learner's internal representations (what infants know) and the learning environment (what infants experience). In the following paragraphs we provide a mechanistic account of this learner-environment interaction using a neurocomputational model.

| Computational mechanisms for infant curiosity
Computational models have been widely used to investigate various cognitive processes, lending themselves in particular to capturing early developmental phenomena such as category learning (e.g., Althaus & Mareschal, 2013; Colunga & Smith, 2003; Gliozzi, Mayor, Hu, & Plunkett, 2009; Mareschal & French, 2000; Mareschal & Thomas, 2007; Munakata & McClelland, 2003; Rogers & McClelland, 2008; Westermann & Mareschal, 2004). Here we take a connectionist or neurocomputational approach in which abstract simulations of biological neural networks are used to implement and explore theories of cognitive processes in an explicit way, offering process-based accounts of known phenomena and generating predictions about novel behaviors. Neurocomputational models employ a network of simple processing units to simulate the learner situated and acting in its environment. Inputs reflect the task environment of interest, and can have important effects across representational development. Like learning in infants, learning in these models emerges from the interaction between learner and environment. Thus, neurocomputational models are well suited to implementing and testing developmental theories. In the current work we employed autoencoder networks: artificial neural networks in which the input and the output are the same (Cottrell & Fleming, 1990; Hinton & Salakhutdinov, 2006; see Westermann & Mareschal, 2012). Autoencoders implement Sokolov's (1963) influential account of novelty orienting in which an infant fixates a novel stimulus to compare it with its mental representation. While attending to the stimulus the infant adjusts this internal representation until the two match. At this point the infant looks away from the stimulus, switching attention elsewhere. Therefore, the more novel a stimulus, the longer fixation time will be. Similarly, autoencoder models receive an external stimulus on their input layer, and aim to reproduce this input on the output layer via a hidden layer.
Specifically, an input representation is presented to the model via activation of a layer of input nodes. This activation flows through a set of weighted connections to the hidden layer. Inputs to each hidden layer unit are summed and this value passed through a typically sigmoid activation function. The values on the hidden units are then passed through the weighted connections to the output layer. Again, inputs to each output node are summed and passed through the activation function, generating the model's output representation. Learning is achieved by adapting connection weights to minimize error, that is, the discrepancy between the input and output representations. Because multiple iterations of weight adaptation are required to match the model's input and output, error acts as an index of infants' looking times (Mareschal & French, 2000) or, more broadly, the quality of an internal representation.
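The forward pass described above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the layer sizes, the weight initialization range, the omission of bias units, and the feature values in `stimulus` are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    """Sigmoid activation function applied to a unit's summed input."""
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in, n_out, rng):
    # One row of incoming weights per receiving unit.
    # The range [-0.5, 0.5] is an assumption, not taken from the paper.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def layer(inputs, weights):
    # Sum the weighted inputs to each unit, then apply the sigmoid.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward(stimulus, w_hidden, w_output):
    """Propagate a stimulus through the hidden layer to the output layer."""
    hidden = layer(stimulus, w_hidden)
    output = layer(hidden, w_output)
    return hidden, output


def sse(stimulus, output):
    """Sum squared error between input and output, used as a looking-time proxy."""
    return sum((i - o) ** 2 for i, o in zip(stimulus, output))


rng = random.Random(0)
w_hidden = init_weights(4, 3, rng)   # 4 input units -> 3 hidden units (illustrative)
w_output = init_weights(3, 4, rng)   # 3 hidden units -> 4 output units
stimulus = [0.2, 0.8, 0.4, 0.6]      # hypothetical feature values
_, output = forward(stimulus, w_hidden, w_output)
error = sse(stimulus, output)
```

Under the looking-time interpretation above, a larger `error` for a stimulus would correspond to longer looking at that stimulus.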
Self-supervised autoencoder models are trained with the well-known generalized delta rule (Rumelhart, Hinton, & Williams, 1986) with the special case that input and target are the same. The weight update rule of these models is:

$$\Delta w = (i - o)\, o(1 - o)$$

where Δw is the change of a weight after presentation of a stimulus. The first term, (i − o), describes the difference between the input and the model's representation of this input. The second term, o(1 − o), is the derivative of the sigmoid activation function and reflects the plasticity of the learner. Because learning in neurocomputational models is driven by the generalized delta rule, we propose that the delta rule can provide a mechanistic account of curiosity-based learning. Specifically, weight

| A test case: infant categorization
The ability to categorize (or respond equivalently to) discriminably different aspects of the world is central to human cognition (Bruner, Goodnow, & Austin, 1972). Consequently, the development of this powerful skill has generated a great deal of interest, and a large body of research now demonstrates that infant categorization is flexible and affected by both existing knowledge and in-the-moment features of the environment (for a review, see Gershkoff- Rakison, 2005). Categorization therefore lends itself well to testing the curiosity mechanism specified above. In Experiment 1 we present a model that captures infants' behavior in a recent categorization task in which the learning environment was artificially manipulated (thus examining different learning environments in a controlled laboratory study in which infants do not select information themselves). Then, in Experiment 2 we test the curiosity mechanism by "setting the model free", allowing it to choose its own stimuli. We compare the learner-environment interaction instantiated in the curiosity mechanism against three alternative mechanisms, and demonstrate that learning history and learning plasticity (i.e., the learner's internal state) as well as in-the-moment input (i.e., the learning environment) are all necessary for maximal learning.
Taken together, these simulations offer an explicit and parsimonious mechanism for curiosity-driven learning, providing new insight into existing empirical findings, and generating novel, testable predictions for future work.

| EXPERIMENT 1
Early evidence for infants' ability to form categories based on small variations in perceptual features came from an influential series of familiarization/novelty preference studies by Barbara Younger (Younger, 1985; Younger & Cohen, 1983). In this paradigm, infants are familiarized with a series of related stimuli; for example, an infant might see eight images of different cats, for 10 seconds each. Then, infants are presented with two new images side-by-side, one of which is a novel member of the just-seen category, and one of which is out-of-category. For example, after familiarization with cats, an infant might see a new cat and a new dog. Based on their novelty preference, if infants look for longer at the out-of-category stimulus than the within-category stimulus the experimenter concludes that they have learned a category during familiarization which excludes the out-of-category item. In this example, longer looking at the dog than the cat image would indicate that infants had formed a "cat" category which excluded the novel dog exemplar (and indeed, they do; Quinn et al., 1993). Younger (1985)  Infants' increased looking times to the peripheral stimulus indicated that they had learned a category that included the category-central stimulus. This study was one of the first to demonstrate the now much-replicated finding that infants' categorization is highly sensitive to perceptual variability (e.g., Horst, Oakes, & Madole, 2005; Kovack-Lesh & Oakes, 2007; Quinn et al., 1993; Rakison, 2004; Rakison & Butterworth, 1998; Younger & Cohen, 1986).
The target empirical data for the first simulation are from a recent extension of this study which to our knowledge has not yet been captured in a computational model. Mather and Plunkett (2011; henceforth M&P) explored whether the order in which a single set of stimuli was presented during familiarization would affect infants' categorization. They trained 48 10-month-old infants with the eight stimuli from Younger (1985, E1). Although all infants saw the same stimuli, M&P manipulated the order in which stimuli were presented during the familiarization phase so that in one condition infants saw a presentation order which maximized perceptual differences across the stimulus set, and in a second condition they saw an order which minimized overall perceptual differences.
At test, all infants saw two simultaneously presented novel stimuli, in line with Younger (1985): one category-central and one peripheral.
M&P found that infants in the maximum distance condition showed an above-chance preference for the peripheral stimulus, while infants in the minimum distance condition showed no preference. Thus, only infants in the maximum distance condition formed a category.
M&P theorized that if stimuli in this task were represented in a "category space", then infants in the maximum distance condition would traverse greater distances during familiarization than infants in the minimum distance condition, leading to better learning. However, it is not clear from these empirical data how infants adjusted their representations according to the different presentation regimes. To translate this theory into mechanism, we used an autoencoder network to simulate M&P's task. Closely following the original experimental design, we trained our model with stimulus sets in which presentation order maximized and minimized successive perceptual distances. To enable more fine-grained analyses we tested additional conditions with intermediate perceptual distances as well as randomly presented sequences (the usual case in familiarization/novelty preference studies with infants).
Like M&P we then tested the model on new peripheral and category-central stimuli. Based on their results, we expected the model to form the strongest category after training with maximum distance stimuli, then intermediate/random distance, and finally minimum distance.

| Model architecture
We used an autoencoder architecture consisting of four input units, three hidden units, and four output units (Figure 2). Each input unit corresponded to one of the four features of the training stimuli (i.e., leg length, neck length, tail thickness and ear separation; see Figure 1).
Hidden and output units used a sigmoidal activation function and weights were initialized randomly.

| Stimuli
Stimuli were based on Younger's (1985) Figure 1). Neither of these test stimuli was part of the training set.

| Procedure
During training, each stimulus was presented for a maximum of 20 sweeps (weight updates) or until network error fell below a threshold of 0.01 (Mareschal & French, 2000). The threshold simulated infants' looking away after fully encoding the present stimulus. To obtain an index of familiarization, we tested the model with the entire training set after each sweep (with no weight updating) and recorded sum squared error (SSE) as a proxy for looking time (Mareschal & French, 2000; Westermann & Mareschal, 2012). Thus, as with the infants in M&P, "looking" in the model decreased over training.
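The presentation procedure (up to 20 sweeps per stimulus, stopping early once error falls below 0.01) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the learning rate `lr`, the weight range, the absence of bias units, the use of SSE on the current stimulus as the stopping criterion, and the feature values are all hypothetical. For brevity the sketch records error on the current stimulus only, whereas the procedure above records error over the entire training set.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_h, w_o):
    """Return hidden and output activations for stimulus x."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_h]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_o]
    return h, o


def sse(x, o):
    return sum((xi - oi) ** 2 for xi, oi in zip(x, o))


def sweep(x, w_h, w_o, lr):
    """One in-place weight update with the generalized delta rule (target = input)."""
    h, o = forward(x, w_h, w_o)
    # Output deltas: discrepancy (i - o) scaled by the sigmoid derivative o(1 - o).
    delta_o = [(xi - oi) * oi * (1 - oi) for xi, oi in zip(x, o)]
    # Hidden deltas: output deltas propagated back through the output weights.
    delta_h = [hj * (1 - hj) * sum(delta_o[k] * w_o[k][j] for k in range(len(o)))
               for j, hj in enumerate(h)]
    for k, row in enumerate(w_o):
        for j in range(len(row)):
            row[j] += lr * delta_o[k] * h[j]
    for j, row in enumerate(w_h):
        for i in range(len(row)):
            row[i] += lr * delta_h[j] * x[i]


def present(x, w_h, w_o, lr=0.5, max_sweeps=20, threshold=0.01):
    """Train on one stimulus until error < threshold or max_sweeps is reached."""
    errors = []
    for _ in range(max_sweeps):
        _, o = forward(x, w_h, w_o)
        err = sse(x, o)
        if err < threshold:
            break  # the model "looks away" once the stimulus is encoded
        errors.append(err)
        sweep(x, w_h, w_o, lr)
    return errors


rng = random.Random(1)
w_h = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
w_o = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
errors = present([0.2, 0.8, 0.4, 0.6], w_h, w_o)  # hypothetical stimulus
```

The returned list gives one error value per sweep, so its length plays the role of the number of "looks" spent on that stimulus.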

| Test trials
In M&P, increased looking to the peripheral stimuli at test was taken as evidence that infants had learned a category. Again using SSE as a proxy for looking time, we collapsed our analyses across the two peripheral stimuli (Mather & Plunkett, 2011), and calculated the proportion of total test SSE directed to the peripheral stimulus (i.e., target looking / (target looking + distractor looking)), as depicted in Figure 3. All other between-condition differences were also significant (all ps < .0001). Note that although infants did not show evidence of category formation in M&P's minimum distance condition, the authors argue that these infants were in fact learning a category; since distances were smaller, these infants traversed less of the category space than their peers in the maximum distance condition, and their category representations were therefore not sufficiently robust to be detected at test. However, our model data are less variable than M&P's empirical data, likely accounting for our detection of differences where M&P found null effects.
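The test measure can be written as a one-line computation. The sketch below is illustrative only; the function name and the example SSE values are hypothetical and not taken from the simulations.

```python
def proportion_to_peripheral(sse_peripheral, sse_central):
    """Proportion of total test SSE directed to the peripheral stimulus.

    Mirrors target looking / (target looking + distractor looking),
    with SSE standing in for looking time.
    """
    total = sse_peripheral + sse_central
    if total == 0:
        raise ValueError("total test SSE is zero; proportion is undefined")
    return sse_peripheral / total


# Hypothetical values: three times more error to the peripheral stimulus.
preference = proportion_to_peripheral(3.0, 1.0)
```

On this measure a value of 0.5 corresponds to chance, and values above 0.5 indicate relatively more error ("looking") to the peripheral stimulus.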
Overall, our results support M&P's distance-based account.
We make their theoretical category space explicit by implementing stimuli as feature vectors, which can be interpreted as locations in Euclidean space. The greater overall Euclidean distances in the max condition therefore force the model to "travel" further from trial to trial. Maximizing overall ED leads to greater error early in training, and therefore greater adaptation, resulting in stronger category learning overall. The model therefore explains how manipulation of stimulus order during training can lead to observed differences in learning at test.
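The distance-based account can be made concrete with a short sketch of Euclidean distance (ED) between feature vectors and the mean successive ED of a presentation order. The function names and the example vectors are assumptions made for illustration; they are not the stimulus values used in the simulations.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_successive_distance(sequence):
    """Mean ED between each stimulus and the one presented after it."""
    steps = [euclidean_distance(a, b) for a, b in zip(sequence, sequence[1:])]
    return sum(steps) / len(steps)


# Hypothetical two-feature stimuli presented in two different orders.
far_order = [[0, 0], [3, 4], [0, 0]]
near_order = [[0, 0], [0, 0], [3, 4]]
far_mean = mean_successive_distance(far_order)
near_mean = mean_successive_distance(near_order)
```

The same three stimuli yield a larger mean successive ED in `far_order` than in `near_order`, which is the sense in which presentation order alone changes how far the model "travels" from trial to trial.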
In Experiment 1 (as in M&P) the order of stimulus presentation was fixed in each condition to control the mean successive ED. This approach created an artificially structured environment in which the model learned best from the inputs with the most inter-stimulus variation. Taken together, the empirical and computational data indicate that both infants and the model learn differently in differently structured environments, even when those differences may seem minor, such as the order in which stimuli are experienced. However, Experiment 1 reflected artificially optimized rather than curiosity-based learning. An important question for research on curiosity-based learning is how a model that selects its own experiences structures its environment and how learning in this self-generated environment compares with learning in the artificially optimized environment in Experiment 1.

FIGURE 3 Proportion SSE to peripheral stimulus at test in Experiment 1 (***p < .001 relative to chance and for all between-condition differences)
Thus, in Experiment 2 we allowed the model to choose the order in which it learned from stimuli based both on environmental and internal factors. Specifically, in line with theories of intrinsic motivation in which curiosity is triggered when a learner notices a discrepancy between the environment and their representation (e.g., Loewenstein, 1994), the model scans the environment and then selects the stimulus that maximizes a given function. This learning is analogous to an infant looking at and processing an array of objects before choosing one to learn from. We compared the curiosity-based learning discussed above with three alternative strategies that maximized objective complexity, subjective novelty, or plasticity at each learning step.

| EXPERIMENT 2
In Experiment 2, the model played an active role in its own learning by selecting the order in which it learned from stimuli. We explored four possible mechanisms for stimulus selection.

| Model architecture and stimuli
Model architecture and parameters and stimuli were identical to those used in Experiment 1. Stimulus selection proceeded without replacement; thus, as in Experiment 1 the model saw exactly eight stimuli.

| Procedure
The procedure used in Experiment 2 was identical to that used in Experiment 1, with the exception that stimulus order was determined by the model based on the following four methods of stimulus selection.

| Curiosity
In the curiosity condition we tested our formalization of infant curios- maximal. Critically, weights were not updated after this stage, simulating a novelty detection mechanism rather than the novelty reduction process of learning.

| Objective complexity maximization
M&P used Euclidean distance as a measure of inter-stimulus novelty and showed that maximizing novelty objectively present in the learning environment led to better learning than minimizing this novelty.
However, M&P selected the presentation orders in advance of the experiment so that the max condition maximized mean ED between stimuli across the sequence as a whole. In contrast, our model aimed to provide an account of in-the-moment information selection. Thus, in the objective complexity maximization condition, at each step the model chose the stimulus that was maximally distant (by ED) from the current stimulus. Complexity is therefore specifically implemented as ED here. In this condition the first stimulus was chosen randomly and successive stimuli were selected so that the next stimulus had the maximal ED (i.e., perceptual distance) from the currently processed stimulus.
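The in-the-moment selection rule for this condition amounts to a greedy search without replacement. The following sketch is one possible reading of that rule; the function names, the tie-breaking behavior (first maximal candidate wins), and the one-feature example stimuli are assumptions.

```python
import math
import random


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_max_distance_order(stimuli, rng):
    """Order stimuli by repeatedly choosing the one farthest from the current one.

    The first stimulus is chosen at random; selection is without replacement.
    """
    remaining = list(stimuli)
    current = remaining.pop(rng.randrange(len(remaining)))
    order = [current]
    while remaining:
        # Pick the remaining stimulus with maximal ED from the current stimulus.
        nxt = max(remaining, key=lambda s: euclidean_distance(current, s))
        remaining.remove(nxt)
        order.append(nxt)
        current = nxt
    return order


stimuli = [[0.0], [1.0], [0.4], [0.9]]  # hypothetical one-feature stimuli
order = greedy_max_distance_order(stimuli, random.Random(0))
```

Because each choice depends only on the currently processed stimulus, the resulting order need not maximize mean ED over the sequence as a whole, which is the contrast with M&P's preselected orders drawn above.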

| Subjective novelty maximization
In the subjective novelty maximization condition the model selected stimuli by maximizing i − o, leading to the selection of a stimulus that was maximally different from its representation in the model. This mechanism maximized novelty relative to the model's learning history.
Subjective novelty maximization therefore reflects prediction-error-based computational reinforcement learning systems (for a review, see Botvinick et al., 2009; see also Ribas-Fernandes et al., 2011), in which the learner seeks out learning opportunities that maximize the difference between expectation and observation.

| Plasticity maximization
Choosing which it was most ready to learn (disregarding how much it would actually be able to learn from that stimulus).
In all conditions the test phase was exactly as in Experiment 1, comparing network error to central and peripheral stimuli as a measure of strength of category learning.

| Results and discussion
Proportion of total SSE for peripheral test stimuli is depicted in were greatest in-the-moment, the longer-term effect of learning history, which determines the model's readiness to learn, was ignored. This result demonstrates that the additional plasticity provided by the o(1 − o) term was necessary for maximal learning; omitting this term affected the extent to which the model could adapt to its learning environment, reducing its ability to select stimuli that would lead to optimum information gain with respect to its learning history. However, maximizing plasticity alone is not sufficient to maximize learning: the model also performed better in the curiosity condition than in the plasticity maximization condition (Mdn = 0.75, W = 575, p < .001, r = −1.51). Since this latter mechanism ignores the in-the-moment effect of the environment this result suggests that while focusing solely on the environment is not the best strategy for active learning, ignoring how much can actually be learned from a stimulus is not optimal either. Finally, in line with Experiment 1 and M&P, the objective complexity maximization outperformed the subjective novelty and plasticity maximization conditions (respectively, W = 564, p < .0001, r = −1.37; W = 56, p < .0001, r = −1.36), further highlighting the importance of environmental input; however, we found no difference in performance between the subjective novelty maximization and plasticity maximization conditions (W = 318, p = .55, r = −0.12).
Overall, then, our formalization of curiosity maximized learning via the dynamic interaction of plasticity, learning history, and in-the-moment environmental input.
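The three selection rules that depend on the model's internal state can be contrasted as scoring functions over candidate stimuli. This is a hedged sketch: summing absolute values across output units is an assumption (the aggregation across units is not specified here), and `reconstruct`, `select_next`, and the toy lookup table of outputs are hypothetical stand-ins for a trained network.

```python
def curiosity_score(stimulus, output):
    # Discrepancy (i - o) weighted by plasticity o(1 - o), summed over units.
    return sum(abs((i - o) * o * (1 - o)) for i, o in zip(stimulus, output))


def novelty_score(stimulus, output):
    # Subjective novelty only: the discrepancy (i - o).
    return sum(abs(i - o) for i, o in zip(stimulus, output))


def plasticity_score(stimulus, output):
    # Plasticity only: the sigmoid derivative o(1 - o); the stimulus is ignored.
    return sum(o * (1 - o) for o in output)


def select_next(candidates, reconstruct, score):
    """Return the candidate with the highest score; no weights are updated."""
    return max(candidates, key=lambda s: score(s, reconstruct(s)))


# Toy stand-in for a network: a fixed table of outputs per stimulus.
a, b, c = (1.0, 1.0), (0.9, 0.9), (0.5, 0.5)
toy_outputs = {a: (0.02, 0.02), b: (0.4, 0.4), c: (0.5, 0.5)}
reconstruct = toy_outputs.get

picked_by_novelty = select_next([a, b, c], reconstruct, novelty_score)
picked_by_curiosity = select_next([a, b, c], reconstruct, curiosity_score)
picked_by_plasticity = select_next([a, b, c], reconstruct, plasticity_score)
```

In this toy example the three rules pick different stimuli: novelty alone favors `a`, whose output is far from its input but saturated near 0; plasticity alone favors `c`, whose output sits at 0.5 but already matches its input; and the combined score favors `b`, where there is both a discrepancy to reduce and room to adapt.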
Next, we were interested in the level of complexity of the sequences that maximized learning in the curiosity condition. In the context of Experiment 1 and M&P, we might expect that the curious model had maximized these environmental distances. However, other empirical work suggests that intermediate difficulty could best support learning (Kidd et al., 2012, 2014; Kinney & Kagan, 1976; Twomey et al., 2014). Equally, simplicity has been shown to support learning in some cases (Bulf et al., 2011; Son, Smith, & Goldstone, 2008). To help make sense of these conflicting results, all of which come from experiments with predetermined stimulus presentation orders, we analyzed the stimulus sequences generated by the curious model. Overall, the model generated four different sequences out of the total possible 40,320, depicted in Figure 5. On the one hand, these sequences are very similar; recall that the model selected stimuli without replacement, reducing the degrees of freedom as training proceeded. On the other hand, they are not identical. Their differences stem from the stochasticity provided to the model by the random weight initialization, which can be interpreted as differences between participants (Thomas & Karmiloff-Smith, 2003). Thus, as in human data, the model data exhibit individual differences underlying a single global pattern of behavior. Nonetheless, since the model generated only four different sequences over 24 runs, this result also predicts that systematicity in infants' curiosity-based learning should be relatively robust.
To obtain an index of the level of complexity of the generated orders we ranked the entire set of 40,320 permutations by mean overall ED, generating 281 unique values. Table 1 [...] Category learning is therefore a process of moving from location to location within this space. From this perspective, the order in which the curious model chooses stimuli maximizes the number of times it traverses the central location in this space, resulting in strong encoding of this area relative to weak encoding of peripheral stimuli. More generally, the curiosity mechanism makes the intriguing prediction for future work that infants engaged in curiosity-driven learning will switch systematically between stimuli of maximum and minimum objective complexity.
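The ranking of presentation orders by mean ED can be sketched as follows. This is a minimal illustration that assumes "mean overall ED" refers to the mean Euclidean distance between successive stimuli in a sequence of eight stimuli (8! = 40,320 orders). The stimulus values are random placeholders, so the number of unique values it produces will not match the 281 reported above, and the function names are hypothetical.

```python
import itertools
import numpy as np

def rank_orders_by_mean_ed(stimuli, decimals=10):
    """Enumerate all presentation orders and compute each order's mean successive ED."""
    n = len(stimuli)
    diffs = stimuli[:, None, :] - stimuli[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))            # pairwise ED matrix
    perms = np.array(list(itertools.permutations(range(n))))
    # ED between each pair of successive stimuli, averaged within each order
    mean_ed = np.round(dist[perms[:, :-1], perms[:, 1:]].mean(axis=1), decimals)
    unique_vals = np.unique(mean_ed)                     # sorted ascending
    return perms, mean_ed, unique_vals

def complexity_rank(order, perms, mean_ed, unique_vals):
    """1-based rank of an order among the unique mean-ED values (1 = lowest)."""
    row = np.flatnonzero((perms == np.asarray(order)).all(axis=1))[0]
    return int(np.searchsorted(unique_vals, mean_ed[row])) + 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stimuli = rng.random((8, 4))                         # placeholder stimuli
    perms, mean_ed, unique_vals = rank_orders_by_mean_ed(stimuli)
    example_order = [0, 7, 1, 6, 2, 5, 3, 4]             # arbitrary example order
    print(complexity_rank(example_order, perms, mean_ed, unique_vals),
          "of", len(unique_vals))
```

Rounding before taking unique values is a design choice that keeps floating-point noise from splitting orders with the same mean ED into separate ranks.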

| Novelty is in the eye of the beholder
Our goal here was to develop a mechanistic theory of infants' intrinsically motivated, or curiosity-based, visual exploration. We selected the autoencoder model and its learning mechanism based on their roots in psychological theory and their established success in capturing infants' behavior in empirical tasks. Importantly, the proposed curiosity mechanism is theoretically compatible with classical optimal incongruity approaches (e.g., Hebb, 1949; Kagan, 1972; Loewenstein, 1994; Vygotsky, 1980). According to these theories, learning is optimal in environments of intermediate novelty. Typically, these approaches have interpreted this intermediacy as information that is neither too similar nor too different from what the learner has previously encountered, as seen in the "Goldilocks" effect observed in recent empirical work (Kidd et al., 2012, 2014) [...] (Goupil et al., 2016), this suggests that infants may also implicitly optimize their own learning (for an early empirical test of this prediction, see Twomey, Malem, & Westermann, 2016).
Second, in line with looking time studies showing that infants select information systematically (Kidd et al., 2012, 2014) [...] seeking out intermediate complexity at each learning event, infants may switch systematically between more and less objectively complex stimuli in the pursuit of maximal subjective novelty. Third, then, our account goes further than classical theories in which curiosity is viewed as either a novelty-seeking or a novelty-minimizing behavior (e.g., Loewenstein, 1994). Rather, our model predicts that infants' visual exploration should exhibit both novelty-seeking and novelty-minimizing components when novelty is viewed objectively, unifying these theories in a single mechanism.

| A new approach to computational curiosity in visual exploration
This work contributes to computational research in intrinsic motivation by modeling curiosity using the mechanisms inherent in the existing model, based on in-the-moment, local decision-making, without a separate, top-down system for monitoring learning progress and/or reward. Existing computational and robotic systems typically simulate reward as generated by a discrete, engineered module that calculates a reward value using task-specific computations. Our model departs from this approach, showing that domain-general mechanisms can produce the motivation to learn, performing a similar function to reward without requiring a separate module; that is, in our model, "reward" is part of the algorithm itself. Overall, then, the current work offers an explicit implementation of curiosity in infants' visual exploration and a broader account of the cognitive mechanisms that may drive curiosity: learning that integrates a search for subjective novelty modulated by the learner's plasticity. Here, intrinsically motivated information selection emerges from within the model by exploiting its learning mechanism in a way that optimizes the reduction of discrepancy between expectation and experience.
Overall, this neurocomputational model provides the first formal account of curiosity-based learning in human infants, integrating subjective novelty and intrinsic motivation mechanisms in a single model.
The model is based on the view that early learning is an active process in which infants select information to construct their own optimal learning environment, and it provides a parsimonious mechanism by which this learning can take place. Clearly, our model is restricted to visual exploration; thus, investigating whether these mechanisms generalize to embodied learning situations is an exciting avenue for future work. Equally, it is possible that another one of the many potential mechanisms for intrinsically motivated learning may take over later in development, particularly once metacognition is established and language begins in earnest (e.g., Gottlieb, Oudeyer, Lopes, & Baranes, 2013). Nonetheless, the current implementation of curiosity not only provides novel insight into infant curiosity-driven category learning and makes predictions for future work both in and outside the lab, but also offers a new mechanistic theory of early intrinsically motivated visual learning.