Lexical Tone Acquisition in Children

Recent work has shown that the acquisition of lexical tones in L1 production is a more ‘protracted process’ than previously hypothesized (Rattanasone et al. 2018). Accordingly, increasing attention has been paid to the questions of how and why children’s and adults’ tones differ (Wong 2012, Singh and Fu 2016). We present evidence from Thai tone production that ties these two lines of inquiry together.


With regard to how children’s and adults’ tones differ, we present evidence showing that the tones of young children are distinct from those of adults along a variety of perceptually relevant acoustic dimensions (Wong 2012): rhyme duration, F0 median, min, max, range, and slope. We also find that the timing of the F0 inflection point of falling and rising tones, relative to the onset of the rhyme, differs between the youngest group of children and adults. Young children have significantly later inflections in rising tones and significantly earlier inflections in falling tones. We interpret these patterns as strongly suggesting that the acquisition of relative timing is an integral component of the acquisition of adult-like lexical tone categories.
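The measures listed above can be illustrated with a minimal sketch that summarises a single F0 contour. It is not the analysis procedure used in this study. The function name `tone_measures`, the `"rising"`/`"falling"` labels, and the sample values are hypothetical, and the sketch assumes that the inflection point is the turning point of the contour (the F0 minimum in a rising tone, the F0 maximum in a falling tone) and that slope is a least-squares fit over the whole rhyme.

```python
from statistics import median


def tone_measures(times, f0, kind):
    """Summarise one F0 contour sampled over the rhyme.

    times: sample times in seconds, increasing, first sample at rhyme onset
    f0:    F0 values in Hz at those times
    kind:  "rising" or "falling" (hypothetical labels for this sketch)
    """
    if len(times) != len(f0) or len(f0) < 2:
        raise ValueError("need at least two paired samples")
    if kind not in ("rising", "falling"):
        raise ValueError("kind must be 'rising' or 'falling'")

    duration = times[-1] - times[0]
    f0_min, f0_max = min(f0), max(f0)

    # Least-squares slope of F0 against time, in Hz per second.
    t_mean = sum(times) / len(times)
    f_mean = sum(f0) / len(f0)
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, f0))
    den = sum((t - t_mean) ** 2 for t in times)
    slope = num / den

    # Assumption: the inflection point is the turning point of the contour.
    idx = f0.index(f0_min) if kind == "rising" else f0.index(f0_max)
    relative_inflection = (times[idx] - times[0]) / duration

    return {
        "duration": duration,
        "median": median(f0),
        "min": f0_min,
        "max": f0_max,
        "range": f0_max - f0_min,
        "slope": slope,
        # 0.0 = rhyme onset, 1.0 = rhyme offset
        "relative_inflection": relative_inflection,
    }


# Made-up rising contour: F0 dips, then rises.
example = tone_measures(
    [0.0, 0.1, 0.2, 0.3, 0.4],
    [220.0, 200.0, 190.0, 210.0, 250.0],
    "rising",
)
```

Expressing the inflection time as a proportion of rhyme duration keeps the timing measure comparable across speakers whose rhymes differ in length, which matters here because rhyme duration is itself one of the dimensions on which children and adults differ.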


With regard to why children’s and adults’ tones differ, we hypothesize that differences in F0 median, min, and max follow directly from anatomical differences. However, differences in the F0 inflection point cannot be explained in these terms. Building on the insight that children’s falling tones are more similar to their adult counterparts than children’s rising tones are, we follow Wong (2012) in proposing an explanation based on motor control abilities. Rising pitch contours involve synergistic control of both the sternohyoid and the cricothyroid (CT) muscles to lower and subsequently raise the pitch. Falling tones, on the other hand, require only tensing followed by relaxation of the CT (Hallé 1994). Differences in relative timing between children and adults, as well as between rises and falls, could thus stem from differing maturity in the control of the muscles responsible for pitch.


This hypothesis cannot yet be modeled explicitly, since models that map larynx dynamics to pitch output are still in their infancy (Döllinger et al. 2017). We bypass the problem by presenting a more macroscopic type of dynamical modeling cast in the task dynamics framework of Articulatory Phonology (AP). Lexical tones have been successfully modeled in AP (Gao 2008) as abstract target states for an F0 tract variable (Saltzman and Munhall 1989). We show that a model of this type with optimized parameters is capable of qualitatively simulating the observed child and adult patterns.
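To make the modeling idea concrete, the following is a minimal sketch of a target-driven F0 tract variable, assuming the critically damped second-order dynamics that task dynamics uses for gestures. It is not the model fitted in this study. The function names, the stiffness value, the normalised F0 scale (0 as the starting level, -1 and +1 as low and high targets), and the switch times are all hypothetical and have not been optimized against data.

```python
import math


def simulate_f0(targets, stiffness=400.0, f0_start=0.0, dt=0.001, total=0.4):
    """Drive an F0 variable toward a sequence of abstract targets.

    targets:   list of (onset_time_s, target_value), sorted by onset,
               with the first onset at 0.0
    stiffness: hypothetical spring constant k (unit mass assumed)
    Returns (times, contour) as two lists of equal length.
    """
    damping = 2.0 * math.sqrt(stiffness)  # critical damping for unit mass
    x, v = f0_start, 0.0
    times, contour = [], []
    for i in range(int(round(total / dt)) + 1):
        t = i * dt
        times.append(t)
        contour.append(x)
        # Active target: the last one whose onset has been reached.
        goal = targets[0][1]
        for onset, value in targets:
            if onset <= t:
                goal = value
        # x'' = -k (x - goal) - b x', integrated with semi-implicit Euler.
        accel = -stiffness * (x - goal) - damping * v
        v += accel * dt
        x += v * dt
    return times, contour


def turning_time(times, contour):
    """Time of the F0 minimum, used here as the inflection of a rising tone."""
    return times[contour.index(min(contour))]


# Rising tone as a low target followed by a high target (normalised units).
# Only the switch time differs between the two hypothetical settings.
early = simulate_f0([(0.0, -1.0), (0.10, 1.0)])
late = simulate_f0([(0.0, -1.0), (0.20, 1.0)])
early_inflection = turning_time(*early)
late_inflection = turning_time(*late)
```

In this sketch a later switch from the low to the high target yields a later turning point in the simulated contour, so a single timing parameter is enough to move the inflection in the direction reported above for rising tones. Whether timing, stiffness, or both best capture the child and adult data is a question for the parameter optimization, which this sketch does not attempt.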