What Can We Learn From Learning Curves?
Gottfried J. Mayer-Kress$, Karl M. Newell$, Yeou-Teh Liu#
$Department of Kinesiology, Penn State University, University Park, PA 16801, #Taipei Physical Education College, Taipei, Taiwan
Presented at: International Conference on Complex Systems, October 25-30, 1998, Nashua, NH
1. Introduction
A common feature of all complex systems is that, in one way or another, they show a form of adaptation that we can call learning in a broad sense of the word. In the following we present a number of historical examples of human motor learning that suggest a power-law behavior of performance as a function of practice time. Theoretical explanations have been attempted based on the "chunking" of information that is learned algorithmically [Newell & Rosenbloom, 1981]. We want to re-examine the generic form of learning curves in the framework of nonlinear complex systems, where we would expect both exponential [Shaw & Alley, 1985] and power-law regularities in learning curves, depending on the dynamical state the system is in.
2. Functional Classes of Learning Curves
2.1. Exponential Functions And Power Laws
As early as the beginning of this century it was observed that there are strong regularities in the general shape of learning curves across different areas of human motor learning [Thurstone, 1919]. Without much theoretical foundation, a number of different functions have been proposed to fit the data [Thorndike, 1927]. It turns out, however, that all these functions fall into two categories [Newell et al., 1998]:
1. Exponential functions: They correspond to a constant learning rate and a single fixed time scale. In this case the performance function p(n) changes with the number of practice sessions n as:
$$p(n) = A + B\,e^{\gamma\,(n - n_0)} \qquad (1)$$
where A is a constant corresponding to the asymptotic performance, B is the difference between the initial and the asymptotic performance, and γ measures the learning rate (γ is generally negative), so that its inverse 1/|γ| defines an intrinsic time scale of the system. The constant n_0 allows for a shift in the start time (in the following we assume n_0 = 0). If we plot the logarithm of the performance function minus its asymptotic value, we obtain a linear graph with slope γ:
$$\ln\bigl(p(n) - A\bigr) = \ln B + \gamma\,n \qquad (2)$$
2. Power-law functions: The learning rate decreases and there is no single time scale ("fractal scaling", self-similarity) [Schroeder, 1991]. The general form of the performance function p(n) is given by:
$$p(n) = A + B\,(n - n_0)^{\alpha} \qquad (3)$$
The parameters have meanings analogous to those in the exponential case 1. The main difference is that α does not play the role of a time scale, as γ did, but is an example of a "scaling exponent". For certain classes of complex systems, scaling exponents play a central role and are "universal" in the sense that their value does not depend on system details. The exponent α can be estimated from the slope of the performance function (minus its asymptotic value) in a double logarithmic representation:
$$\ln\bigl(p(n) - A\bigr) = \ln B + \alpha\,\ln n \qquad (4)$$
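As a minimal sketch of how the two exponents are read off in practice, the following Python code generates noise-free curves of the exponential and power-law forms above (with n0 = 0) and recovers γ and α as least-squares slopes of ln(p - A) against n and against ln n, respectively. It assumes that the asymptote A is known and that p(n) - A > 0; the helper names and parameter values are hypothetical.

```python
import math


def ols_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def exponential_curve(n, A, B, gamma):
    """Exponential learning curve p(n) = A + B * exp(gamma * n), with n0 = 0."""
    return A + B * math.exp(gamma * n)


def power_curve(n, A, B, alpha):
    """Power-law learning curve p(n) = A + B * n**alpha, with n0 = 0."""
    return A + B * n ** alpha


def estimate_gamma(ns, ps, A):
    """Slope of ln(p - A) versus n (log-linear representation)."""
    return ols_fit(list(ns), [math.log(p - A) for p in ps])[1]


def estimate_alpha(ns, ps, A):
    """Slope of ln(p - A) versus ln n (double logarithmic representation)."""
    return ols_fit([math.log(n) for n in ns], [math.log(p - A) for p in ps])[1]


ns = list(range(1, 21))
A, B = 2.0, 10.0  # hypothetical asymptote and initial offset
exp_data = [exponential_curve(n, A, B, -0.3) for n in ns]
pow_data = [power_curve(n, A, B, -0.7) for n in ns]
print(round(estimate_gamma(ns, exp_data, A), 6))  # -0.3
print(round(estimate_alpha(ns, pow_data, A), 6))  # -0.7
```

With real data the asymptote A is usually unknown and has to be estimated as well, which is where the two representations become harder to tell apart.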
Theoretical models based on an algorithmic information-processing approach used a "chunking" hypothesis, which led to a power-law prediction for the learning curve [Newell & Rosenbloom, 1981]. The experimental findings indicate that the corresponding exponents are individual and task specific. From a complex systems viewpoint, by contrast, one can argue that learning corresponds to non-equilibrium phase transitions [Haken, 1983; Haken et al., 1985], and therefore one would expect the power-law (scaling) exponents to characterize universality classes. To our knowledge there has been little empirical work that has tried to classify learning curves according to their scaling exponents. In the context of dynamical systems, on the other hand, motor learning has been interpreted as essentially transient behavior en route to an attractor [Schoner & Kelso, 1988a, 1988b; Schoner, 1989; Kelso, 1995], and in that case one would predict an exponential shape of the learning curve.
2.2. Mirror Tracing
In his classic experiment, Snoddy asked his subjects to trace a circuit of a 12-edged, star-shaped path one-quarter inch wide [Snoddy, 1926]. A screen blocked direct vision of the tracing instrument and the hands, so that only the mirror image of the tracing device and hands was available to the participants. The participants were instructed to "move around as fast as possible and avoid making contact". Each trial consisted of completing one circuit, and the ratio of 1000 over the sum of the tracing time (T) and the number of contacts made (E) within each trial, 1000/(T+E), was used as the trial score and performance parameter. Figure 1 shows the experimental data, scanned from the original figure. They come from 100 participants practicing 20 trials per day over 4 days. The performance parameter is plotted as a function of "practice time", which is given by the number of trials a subject has performed on the task. In Figure 1 we have plotted the results in two different representations:
1. Log-Linear: We plot the logarithm of the performance parameter as a function of practice time. Linear segments correspond to exponential domains of the learning curve (Figure 1a).
2. Log-Log: We plot the logarithm of the performance parameter as a function of the logarithm of practice time. Linear segments correspond to power-law domains of the learning curve (Figure 1b).
Figure 1:
Two different representations of Snoddy's mirror tracing experiments [Snoddy, 1926]. Each data point represents the median performance of 100 participants on one circuit of the mirror tracing task. The entire practice consisted of groups of 20 circuits, with circuits separated by 1-minute intervals and groups by 24-hour intervals. The first 10 circuits of the first group are not shown. In (a) we fit an exponential function to each group and to the whole data set; in (b) we fit a power-law function.
We can see the "warm-up decrement" at the beginning of each day's data segment. The power law also seems to fit the data slightly better. This is confirmed by regression analysis, but the difference is not very dramatic. In the following we therefore study the problem of discriminating between the two alternatives under the common constraint of small data sets contaminated by noise.
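The regression comparison mentioned above can be illustrated with a minimal Python sketch. It is not the analysis behind Figure 1: it assumes a zero asymptote (the logarithm of the performance parameter is fitted directly, as in the two representations listed above) and uses hypothetical power-law data with multiplicative noise over ten trials; the function names, noise level, and random seed are our own choices.

```python
import math
import random


def r_squared(xs, ys):
    """R^2 of an ordinary least-squares straight line through the points (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def compare_representations(ns, ps):
    """Straight-line R^2 in the log-linear and in the log-log representation."""
    log_p = [math.log(p) for p in ps]
    return {
        "log-linear (exponential)": r_squared(ns, log_p),
        "log-log (power law)": r_squared([math.log(n) for n in ns], log_p),
    }


rng = random.Random(1)
ns = list(range(1, 11))  # a short series of ten trials
# Hypothetical power-law data with zero asymptote and multiplicative noise
noisy = [n ** -0.7 * math.exp(rng.gauss(0.0, 0.1)) for n in ns]
for name, r2 in compare_representations(ns, noisy).items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Changing the seed, the noise level, or the number of trials gives a feel for how much such a goodness-of-fit comparison can vary for short, noisy series.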
3. Approximating Power Laws by Exponential Functions
In the following example we approximate the power law $P_7(n) = n^{-0.7}$ (red line in the log-log plot of Figure 2) by the sum of two exponential functions with real exponents γ (see eq. (1)). The two amplitudes and exponents in Figure 2 are B_1 = 0.3961, γ_1 = -1.36 (blue) and B_2 = 0.6038, γ_2 = -0.161 (yellow). We can see that the sum of the two exponentials (green) approximates the power law fairly well over a certain range of practice times.
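The approximation can be checked numerically with the coefficients quoted above. One indexing assumption is ours: evaluated at the same n, the sum of exponentials gives E(1) ≈ 0.616 while 1^(-0.7) = 1, so the coefficients appear to count trials from zero. The Python sketch below therefore compares E(n - 1) with n^(-0.7).

```python
import math

B1, G1 = 0.3961, -1.36   # amplitude and exponent of the fast component (from the text)
B2, G2 = 0.6038, -0.161  # amplitude and exponent of the slow component (from the text)


def two_exponentials(m):
    """E(m) = E1(m) + E2(m), a sum of two exponentials with real exponents."""
    return B1 * math.exp(G1 * m) + B2 * math.exp(G2 * m)


def power_law(n, alpha=-0.7):
    """P7(n) = n**alpha with alpha = -0.7."""
    return n ** alpha


# Assumption: the exponentials are counted from trial 0, so E(n - 1) is compared with n**-0.7
for n in range(1, 8):
    e, p = two_exponentials(n - 1), power_law(n)
    print(f"n={n}: power law {p:.4f}, two exponentials {e:.4f}, rel. error {(e - p) / p:+.1%}")
```

With this indexing the relative error stays below about 2% for the first five trials and grows to roughly 10% by the seventh, in line with the limited range of validity noted above.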
Figure 2:
Approximation of a power-law function $P_7(n) = n^{-0.7}$ (red) by the sum of two exponential functions $E(n) = E_1(n) + E_2(n)$, where $E_1(n) = 0.3961\,e^{-1.36 n}$ and $E_2(n) = 0.6038\,e^{-0.161 n}$. We observe a reasonably good approximation over seven trials.
In the example above we assumed that all components with different time scales are active at the same time. If we look at the decay rates of a process that evolves according to a power law, we observe that these decay rates themselves decay. For example, for the power law $x(n) = n^{-0.7}$ we can fit different exponents in consecutive time units.
3. Time Scales and Power Laws
As discussed above, in the context of learning a fixed time scale corresponds to a fixed learning rate, which leads to an exponential learning curve. This means that the distance between the quantity measured by the learning curve and its asymptotic value decreases at a constant relative rate. If the learning curve is a power law, the rate is not constant but decreases continuously.
As an example, let us consider a learning curve that measures the errors $E_n$ of consecutive trials and is characterized by a power law with exponent γ: $E_n = E\,n^{\gamma}$. Here n represents the trial number (time) and γ the exponent governing how fast the errors decrease with trial number. For each trial number n we can approximate the power law by an exponential function by estimating the learning rate R(n) at that specific trial. Assuming that we are far enough from the target that performance improves continuously (i.e. $E_n > E_{n+1}$, with $E_n - E_{n+1}$ small compared to $E_n$), this can be done by dividing the difference between consecutive errors ($E_n - E_{n+1}$, the performance increase due to learning) by the error $E_n$ at that trial. If we do this for consecutive trials, we observe a systematic decrease not only of the error but also of the rate at which the error changes, the learning rate R(n). For the power-law learning curve $E_n = E\,n^{\gamma}$ shown in Figure 3, the learning rate R(n) decreases as:
$$R(n) = \frac{E_n - E_{n+1}}{E_n} = 1 - \left(1 + \frac{1}{n}\right)^{\gamma} \approx -\frac{\gamma}{n} \qquad (5)$$
If we interpret the inverse of a rate as a system time scale, then the decreasing rate in eq. (5) corresponds to a stretching of time scales. Figure 3 shows the change in learning rate as a function of practice time for our example function $P_7(n) = n^{-0.7}$:
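The pair-wise estimate of R(n) is easy to compute. The following Python sketch, which assumes noise-free errors E_n = n^(-0.7) (i.e. E = 1 and γ = -0.7), compares the finite-difference rates with the closed form 1 - (1 + 1/n)^γ and with its large-n approximation -γ/n. The function name is ours.

```python
def local_learning_rates(errors):
    """Pair-wise rates R(n) = (E_n - E_{n+1}) / E_n for n = 1, 2, ..."""
    return [(a - b) / a for a, b in zip(errors, errors[1:])]


gamma = -0.7
errors = [n ** gamma for n in range(1, 12)]  # E_n = n**gamma with E = 1, trials 1..11
rates = local_learning_rates(errors)          # R(1), ..., R(10)
for n, r in enumerate(rates, start=1):
    exact = 1 - (1 + 1 / n) ** gamma          # closed form of the pair-wise rate
    print(f"n={n:2d}  R(n)={r:.4f}  closed form={exact:.4f}  -gamma/n={-gamma / n:.4f}")
```

R(n) falls from about 0.38 at n = 1 and approaches the 0.7/n behavior as n grows.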

Figure 3:
Decreasing local learning rate R(n) for the power law $E_n = n^{-0.7}$.
The above technique of taking the pair-wise trial rate of change provides a direct evaluation of the local time scales and a new way to discriminate between exponential and power-law behavior. In the above example, if the learning data were exponential, the plot of R(n) would have been a horizontal line, reflecting a change from trial to trial that is proportional to the level of performance. If the data followed a power law, R(n) would be a decreasing function. We use this method later to assess the form of learning for some published data and show that it has some advantages over using the percentage of variance accounted for in curve fitting to determine the appropriate function of change.
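As a sketch of the discrimination criterion described above, the Python code below computes R(n) for a noise-free exponential and a noise-free power law and summarizes each by the least-squares slope of R(n) against n. The threshold `tol` is an arbitrary value chosen for this illustration, not a recommended setting, and the function names are hypothetical.

```python
import math


def local_learning_rates(errors):
    """Pair-wise rates R(n) = (E_n - E_{n+1}) / E_n."""
    return [(a - b) / a for a, b in zip(errors, errors[1:])]


def rate_trend(errors):
    """Least-squares slope of R(n) against n: about 0 for an exponential, negative for a power law."""
    rates = local_learning_rates(errors)
    ns = range(1, len(rates) + 1)
    mn = sum(ns) / len(rates)
    mr = sum(rates) / len(rates)
    sxy = sum((n - mn) * (r - mr) for n, r in zip(ns, rates))
    sxx = sum((n - mn) ** 2 for n in ns)
    return sxy / sxx


def classify(errors, tol=1e-3):
    """tol is an arbitrary threshold for this noise-free illustration."""
    if rate_trend(errors) < -tol:
        return "power law-like (decreasing R)"
    return "exponential-like (flat R)"


trials = range(1, 11)
exponential = [math.exp(-0.3 * n) for n in trials]
power = [n ** -0.7 for n in trials]
print(classify(exponential))  # exponential-like (flat R)
print(classify(power))        # power law-like (decreasing R)
```

With real data, stochastic variation can make individual rates negative (as in Section 6), so a decision about the trend would need a proper statistical test rather than a fixed threshold.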
4. Power Laws From Concatenated Exponentials
In this example we show that one can approximate a power law by concatenated exponentials. Such a sequence arises when different learning rates dominate in different phases of learning. Since there is a (local) learning rate R(n) at every point of the learning curve, we can approximate a power law by a sequence of processes with fixed but decreasing rates. For instance, we can interpolate an exponential function $h_{mn}(t)$ between the values $E_m$ and $E_n$ of the learning curve for times t between m and n. In Figure 4 we illustrate this process with the power law $E_n = E\,n^{\gamma}$ from above, with γ = -0.7 (green), and exponential interpolation between consecutive trials, i.e. n - m = 1 (red), and between next-to-consecutive trials, n - m = 2 (blue), for trials n, m < 10.
$$h_{mn}(t) = E_m \left(\frac{E_n}{E_m}\right)^{(t-m)/(n-m)}, \qquad m \le t \le n$$
This simple simulation confirms that a sequence of processes governed by exponential laws with exponents of decreasing magnitude can approximate a learning curve that is best fitted by a power law. Besides the overall quality of the global fit, we therefore also need to consider non-random modulations of the learning curve, which provide additional evidence for the presence of exponential processes if the deviations are convex downward over specific ranges of time scales.
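The concatenation can be sketched in a few lines of Python. Assuming a zero asymptote and E = 1, each segment h_mn(t) is the exponential that passes through the power-law values at trials m and n, and its constant rate, -ln(E_n/E_m)/(n - m) = -γ ln(n/m)/(n - m), shrinks from one segment to the next. Function names are illustrative only.

```python
import math


def power_law(t, E=1.0, gamma=-0.7):
    """E_t = E * t**gamma."""
    return E * t ** gamma


def exp_interpolant(t, m, n, Em, En):
    """h_mn(t): the exponential through (m, Em) and (n, En), assuming zero asymptote."""
    return Em * (En / Em) ** ((t - m) / (n - m))


def segment_rates(step, last=9):
    """Constant decay rate -ln(E_n / E_m) / (n - m) of each segment with n - m = step."""
    knots = list(range(1, last + 1, step))
    return [-math.log(power_law(n) / power_law(m)) / (n - m)
            for m, n in zip(knots, knots[1:])]


# Halfway between trials 1 and 2: exponential segment versus the power law itself
print(exp_interpolant(1.5, 1, 2, power_law(1), power_law(2)), power_law(1.5))
print([round(r, 3) for r in segment_rates(1)])  # consecutive trials
print([round(r, 3) for r in segment_rates(2)])  # next-to-consecutive trials
```

Between the knots each exponential segment lies slightly above the power law, because ln E_t = γ ln t is convex in t for γ < 0 and the segment is a straight chord in log-linear coordinates.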

Figure 4:
Log-log plot of the power law $E_n = E\,n^{\gamma}$ with γ = -0.7 (green) and of the exponential interpolation between consecutive trials (red) and next-to-consecutive trials (blue).
5. Power Laws And Superpositions Of Exponentials
Although the class of exponential functions with real exponents does not constitute a basis set in function space (as opposed to, for example, trigonometric functions as Fourier components), exponentials can nevertheless approximate power-law functions over a finite range with limited precision. To illustrate this case, let us assume for the moment that we have N participants with exponential learning curves with fixed rates $\gamma_i > 0$, distributed around a mean rate γ according to a Gaussian distribution of width Δγ. This means that for each participant i we have a learning curve $E_{i,n} = E_i\,e^{-\gamma_i n}$. If we average across participants, then for a narrow distribution we observe approximately an averaged learning curve $E_n = E\,e^{-\gamma n}$. For large values of Δγ, however, the averaged data can be fitted better by a power law than by an exponential. Figure 5 illustrates such an example. In summary, this simulation shows that averaging learning data across individuals whose learning curves are exponential with different exponents can lead to a power law for the collective function of change. To address the hypothesis that exponential functions are more likely to emerge in simple tasks, we have reanalyzed data from a study that was set up to examine the time-evolutionary features of learning a timing task with a single biomechanical degree of freedom [Newell, Liu, & Mayer-Kress, 1997; Liu, Mayer-Kress, & Newell, 1998].
Figure 5:
Log-log plot of the distribution of simulated exponential learning curves $E_i(n)$ (dots) with average rate γ = 0.7 and variance σ² = 0.25. The average E(n) was calculated over N = 1000 simulated participants with 10 trials each.
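A minimal Python sketch of the simulation summarized in Figure 5, under assumptions of our own: every participant starts with unit error and has zero asymptote, rates are drawn from a Gaussian with mean 0.7 and variance 0.25, and non-positive draws are discarded (which slightly shifts the effective mean and variance). Rather than comparing curve fits, it reports the local learning rates R(n) of the averaged curve.

```python
import math
import random


def simulate_average(n_participants=1000, n_trials=10, mean_rate=0.7, sd=0.5, seed=0):
    """Average of exponential learning curves exp(-gamma_i * n) over simulated participants.

    Assumptions (ours): E_i = 1 for everyone, zero asymptote, Gaussian rates with
    mean `mean_rate` and standard deviation `sd`, and draws gamma_i <= 0 rejected.
    """
    rng = random.Random(seed)
    rates = []
    while len(rates) < n_participants:
        g = rng.gauss(mean_rate, sd)
        if g > 0:  # keep only decaying (learning) curves
            rates.append(g)
    return [sum(math.exp(-g * n) for g in rates) / n_participants
            for n in range(1, n_trials + 1)]


def local_learning_rates(errors):
    """Pair-wise rates R(n) = (E_n - E_{n+1}) / E_n."""
    return [(a - b) / a for a, b in zip(errors, errors[1:])]


avg = simulate_average()  # variance 0.25 corresponds to sd = 0.5
print([round(r, 3) for r in local_learning_rates(avg)])
```

Each individual curve has a constant R(n) = 1 - e^(-γ_i), but the averaged curve has a decreasing R(n): as n grows, the slowly learning participants dominate the average. This holds for any average of decaying exponentials with unequal rates, which is why averaging can mimic power-law behavior.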
6. Time Scales and Motor Tasks
Based on what we know from complex systems, we are in a position to predict which functional form of the learning curve to expect in which situation: simple tasks that do not involve qualitative changes in movement coordination but simply improve timing or accuracy should tend toward exponential relaxation, whereas more complex tasks that include bifurcations would be expected to display power laws.

Figure 6:
Observed learning rates from a timing task with a single biomechanical degree of freedom [Liu, Mayer-Kress, & Newell, 1998]. The target time is 125 ms for 5-20 deg. flexion.
In Figure 6 we present local learning rates from six different subjects for an elbow flexion timing task [Liu et al., 1998]. The selected data show significant performance improvement over the first seven trials. The rates are initially positive and do not show a systematic decrease with n; the negative values reflect the influence of stochastic variations. The approximately constant learning rate is consistent with an exponential form of the corresponding learning curve.
References
Haken, H., 1983, Synergetics: An introduction (3rd Ed.), Springer-Verlag (Berlin).
Haken, H., Kelso, J. A. S., & Bunz, H., 1985, A theoretical model of phase transitions in human hand movements, Biological Cybernetics, 51, 347-356.
Kelso, J. A. S., 1995, Dynamic Patterns, MIT Press (Cambridge).
Liu, Y.-T., Mayer-Kress, G., & Newell, K. M., 1998, A piecewise linear, stochastic map model for the sequential trial strategy of discrete timing tasks, Manuscript under review.
Newell, A., & Rosenbloom, P. S., 1981, Mechanisms of skill acquisition and the law of practice. In Cognitive skills and their acquisition, edited by J. R. Anderson, Erlbaum (Hillsdale), 1.
Newell, K. M., Liu, Y.-T., & Mayer-Kress, G., 1997, The sequential structure of movement outcome in learning a discrete timing task, Journal of Motor Behavior, 29, 366-382.
Newell, K. M., Mayer-Kress, G., & Liu, Y.-T., 1998, Time scales in motor learning and development, Manuscript under review.
Schoner, G., 1989, Learning and recall in a dynamic theory of coordination patterns, Biological Cybernetics, 62, 39-54.
Schoner, G., & Kelso, J. A. S., 1988a, A synergetic theory of environmentally-specified and learned patterns of movement coordination. I. Relative phase dynamics, Biological Cybernetics, 58, 71-80.
Schoner, G., & Kelso, J. A. S., 1988b, A synergetic theory of environmentally-specified and learned patterns of movement coordination. II. Component oscillator dynamics, Biological Cybernetics, 58, 81-89.
Schroeder, M., 1991, Fractals, chaos, power laws: Minutes from an infinite paradise, Freeman (New York).
Shaw, R. E., & Alley, T. R., 1985, How to draw learning curves: Their use and justification. In Issues in the ecological study of learning, edited by T. D. Johnston & A. T. Pietrewicz, Erlbaum (Hillsdale), 275.
Snoddy, G. S., 1926, Learning and stability, J. Applied Psychology, 10, 1-36.
Thorndike, E. L., 1927, The law of effect, American J. Psychology, 39, 212-222.
Thurstone, L. L., 1919, Psychological Monographs, XXVI, Whole No. 114.