Research ArticlePHYSICS

The structure of musical harmony as an ordered phase of sound: A statistical mechanics approach to music theory

See allHide authors and affiliations

Science Advances  17 May 2019:
Vol. 5, no. 5, eaav8490
DOI: 10.1126/sciadv.aav8490


Music, while allowing nearly unlimited creative expression, almost always conforms to a set of rigid rules at a fundamental level. The description and study of these rules, and the ordered structures that arise from them, is the basis of the field of music theory. Here, I present a theoretical formalism that aims to explain why basic ordered patterns emerge in music, using the same statistical mechanics framework that describes emergent order across phase transitions in physical systems. I first apply the mean field approximation to demonstrate that phase transitions occur in this model from disordered sound to discrete sets of pitches, including the 12-fold octave division used in Western music. Beyond the mean field model, I use numerical simulation to uncover emergent structures of musical harmony. These results provide a new lens through which to view the fundamental structures of music and to discover new musical ideas to explore.


The ubiquity of music throughout history and across cultures raises a fundamental question: Why is this way of arranging sounds such a powerful medium for human artistic expression? Although there are myriad musical systems and styles, certain characteristics are nearly universal, including emergent symmetries such as a restriction to a discrete set of sound frequencies (pitches). Historically, the theory of music has followed an empirical top-down approach: Patterns are observed in music and generalized into theories. Recent work has aimed to generalize these generalized theories to uncover new potential patterns that can lead to new theories of music (13). Here, instead, we observe patterns that emerge naturally from a bottom-up theory. We start from two basic (and conflicting) principles: A system of music is most effective when it (i) minimizes dissonant sounds and (ii) allows sufficient complexity to allow the desired artistic expression. Mathematical statement of these principles allows a direct mapping onto a standard statistical mechanics framework. We can thereby apply the tools of statistical mechanics to explore the phenomena that emerge from this model of music. Just as in physical systems where ordered phases with lower symmetry (e.g., crystals) emerge across transitions from higher-symmetry disordered phases (e.g., liquids), we observe ordered phases of music self-organizing from disordered sound. These ordered phases can replicate elements of traditional Western and non-Western systems of music, as well as suggesting new directions to be explored.

The basis for a bottom-up approach was provided by discoveries in the field of psychoacoustics originating with Helmholtz (4) and further developed in the 20th century, which established a quantitative understanding of how sound is perceived. This leads to the idea that the structure of music is related to a minimization of dissonance D, as explored by Plomp and Levelt (5), Sethares (6, 7), and others. Minimization of D cannot be the only criterion for an effective musical system, however, or we would all listen to “music” composed from just a single pitch. Instead, an effective system of music must have some degree of complexity to provide a sufficiently rich palette from which to compose. A recognition of this idea has led to work on quantifying complexity in music, including by computing the entropy S of music in the context of information theory (8) or by considering musical systems to be self-organizing via an evolutionary process (9).

The model I present here combines both the minimization of D and the maximization of S. I draw an analogy to thermodynamic systems with energy U and entropy S, whose macrostate is determined by minimizing Helmholtz free energy F = UTS. The fixed temperature T is a parameter that specifies the trade-off between decreasing U and increasing S. Here, I similarly introduce a parameter T that specifies the trade-off between decreasing D and increasing S. A musical system in equilibrium will then be found by minimizing F = DTS, allowing us to exploit the powerful array of tools developed for studying physical systems in statistical mechanics.

The remainder of this paper is organized as follows: I next describe the general model presented here, including how dissonance is quantified. Then, we study the behavior of the model in the mean field approximation and observe phase transitions between disordered sound and ordered distributions of pitches that reproduce commonly used musical systems. Last, we turn to a more realistic model with fewer assumptions and use numerical simulation to explore the patterns that emerge on a lattice of interacting tones.

Music: A system of interacting tones

Music is composed of a set of sounds, commonly referred to as “tones,” arranged in time. Individual tones can be characterized by their fundamental frequency f (“pitch”) with amplitude A, and a set of “partials” with amplitudes αnA and frequencies ϕnf that approximately specify the frequency spectrum of the particular sound. For example, ϕn = n = 1,2, … and αn = 1/n for a sound produced by a sawtooth wave, as given by its Fourier series. In general, ϕn and αn determine the “color” or “timbre” of the sound and depend on the instrument producing the sound.

Here, we consider music as a system of Nt tones {ti}. Each tone ti has pitch fi, and for present purposes, we take all tones to have the same timbre (the same αn and ϕn). Two tones ti and tj interact with each other with strength Dij via a listener’s perception of dissonance between tones. The total dissonance Dtot = ∑{ij}Dij, where the sum is over all pairs {ij}.

Dissonance is quantified as Dij = aijD(fi, fj), where D(fi, fj) is a function specifying the dissonance between two tones with pitches fi and fj and a given timbre that are perceived together to a degree specified by the parameter aij (e.g., tones overlapping in time would have larger aij). We can compute D(fi, fj) for a given timbre by first positing a function d(fa, fb) that quantifies the dissonance (also known as roughness) for a pair of pure tones (that is, tones with pitches fa and fb and a single partial with ϕ = α = 1) and then summing over all pairs of partials. It is found experimentally that d(fa, fb) increases with pitch difference Δx = log2fa/fb up to a critical value Δx = wc and then falls off to zero for larger Δx (see Fig. 1A) (5). (With this logarithmic scale for pitch, Δx = 1 corresponds to a change of one octave.) The measured values of wc are found to depend on the lesser of fa and fb, varying from wc ∼ 0.5 for the lower audible range to wc ∼ 0.02 for the higher audible range. The preference for consonant pairs of pure tones is seemingly universal, appearing even in cultures with little exposure to Western music (10). We then make the simple assumption that the dissonance D of two non-pure tones is found by summing up the pairwise contributions from all pairs of partials (see Methods) (7). Figure 1B shows D(fa, fb) versus Δx with a fixed wc = 0.03 and the sawtooth timbre. This quantification of dissonance represents one of several competing approaches, but this formalism could also be applied to other models of dissonance perception [e.g., (11, 12)].

Fig. 1 Quantifying dissonance.

(A) Pure-tone dissonance curves dx) at different critical widths wc. (B) Dissonance function Dx) for tones with a sawtooth timbre and a fixed wc = 0.03. Just-tuned commonly used intervals are indicated by red dotted lines. Green dots show x = n/12, for integers n. (C) Fourier components dk of the periodic dissonance function Dpx) corresponding to Dx) shown in (B).

As noted by Plomp and Levelt (5) and others, for harmonic timbres (i.e., with ϕn = n), the prominent minima of D(fa, fb) occur at or near frequency ratios prominently featured in Western music. For example, the dotted lines in Fig. 1B indicate intervals of an octave, fifth, fourth, major third, and minor third. These intervals represent pitches with frequency ratios fa/fb = 2,3/2,4/3,5/4, and 6/5, respectively. One can note that the values of Δx corresponding to these ratios are reasonably approximated by integers 12,7,5,4, and 3 divided by 12 (green dots in Fig. 1B show integers n/12). Hence, Western music traditionally divides the octave into 12 distinct pitches. We will see that this pattern and others emerge from the model presented here.


Mean field approximation

We first apply a mean field approximation and study the equilibrium distribution of pitches. In the limit Nt → ∞ and all aij = a0 constant, the pitches within the set are described only by their probability distribution P(x), where the pitch x = log2f/fref is specified by the ratio of its frequency f with an arbitrary reference frequency fref. P(x) represents the probability with which a pitch x is used in a system of music. We make two further simplifying assumptions: (i) We take a fixed value of wc, so D is a function of x only, and (ii) we assume the periodicity P(x) = P(x + 1). That is, the pitch distribution is the same in every octave. With these assumptions, we define Dp(x)=n=D(x+n) and restrict our attention to a single octave x = [0,1). Integrating over P(x), we obtain the total dissonanceDtot=120101P(x)Dp(xy)P(y)dydx(1)and entropyS=01P(x)ln P(x)dx(2)

We now wish to find the distribution P(x) that minimizes the free energy F = DtotTS, subject to the constraint that P(x) is normalized. Using a variational method, we obtain an integral equation for the extremum condition and find stable minima using an iterative approach (see Methods) (13).

Figure 2A shows the solutions P(x) for several values of T using the first 10 partials of the sawtooth timbre and wc = 0.03 [see fig. S1A for a plot of P(x) over a continuous range of T]. At high T, P(x) = 1 is the stable solution, and the system is completely disordered. At low T, P(x) approaches a single delta-function spike at an arbitrary pitch. As T increases from zero, additional peaks emerge, eventually leading to a transition to a P(x) with 12 equal peaks. We will see below that the high- and low-T behaviors are separated by phase transitions from random, disordered sound to the ordered, discrete tuning systems used in music.

Fig. 2 Mean field results.

(A) Solutions P(x) to Eq. 4 for a range of T and wc = 0.03. (B) Modulus ∣pk∣ of order parameters versus temperature T. The dashed line indicates the predicted Tc1 = − d12/2.

Insight can be gained using the Fourier representations P(x)=12k=0pke2πikx+pk*e2πikx and Dp(x)=k=0dkcos(2πkx). For example, the real coefficients dk are shown in Fig. 1C for the Dp(x) calculated from (Fig. 1B). Normalization requires p0 = 1, with the other pk representing order parameters that describe the degree of k-fold periodic order.

Figure 2B shows ∣pk(T)∣ for k = 0 to 20 for the data shown in (Fig. 1A). As expected, the high temperature solution has p0 = 1 and all other pk = 0. As T decreases, a single component p12 is the first to become nonzero at Tc1 = 20.2. After a further decrease of T, the other pk become nonzero at T < Tc2 = 16.2.

The transition between the high-T disordered solution and the 12-fold solution can be understood from Landau theory. In the neighborhood of Tc1, we can expand F in powers of a single order parameter pk0 (see Methods). We find that there is a second-order phase transition at Tc1 = − dk0/2 from the disordered phase to a phase with k0-fold order, where dk0 is the most negative Fourier coefficient. d12 is the most negative Fourier coefficient in Fig. 1C, with the transition to the 12-fold phase occurring at T = − d12/2, as indicated by the dashed line in Fig. 2B.

The free energy only depends on the modulus of pk0, with the complex phase randomly chosen by spontaneous symmetry breaking. This symmetry breaking is seen in Fig. 2 as the continuous symmetry of the disordered phase changes to the 12-fold translational symmetry below Tc1. The randomly chosen phase of p12 gives rise to the particular positions of the 12 peaks in P(x). In musical terms, a benchmark pitch is chosen arbitrarily and is fixed only by convention (e.g., A440 at 440 Hz). In this phase, all 12 peaks are equivalent and equally spaced, calling to mind the equal temperament (ET) tuning that has been commonly used in Western music for the last several hundred years (14).

When more than one pk ≥ 1 ≠ 0, there are additional terms in the free energy arising from interactions between the modes. The coupling yields more complex behavior where multiple modes undergo transitions at the same temperature Tc2, reducing the symmetry to only the assumed octave-wise (“onefold”) symmetry. The spontaneous symmetry breaking randomly chooses one of the 12 peaks to become dominant over the others. This symmetry breaking calls to mind a tuning system that favors a particular key, as was common in medieval music through the time of J. S. Bach. For example, “just intonation” (JI) tuning places pitches at small-integer ratios relative to the root pitch. In Fig. 3, we compare the maxima of P(x) from the mean field model with the pitches used in the JI and ET systems. Above Tc2, all pitches are treated equally by symmetry, so we must have the even 12-fold division of ET. When the symmetry is broken below Tc2, a single pitch can be designated as the root, and the pitches tend to align with the JI scheme. Just below Tc2, the peaks reflect a compromise between ET and JI, which is similar to tuning systems that have also been used historically, such as the sixth-comma meantone system (14).

Fig. 3 Comparison of peak positions of P(x) versus T from the mean field model with ET or JI tuning.

Two values are shown for the tritone (the pitch near 600 cents) in JI, corresponding to an augmented fourth or diminished fifth. The mean field result shows equal peaks at both positions.

The dependence of P(x) on both T and the fixed value of wc can be visualized via a color map of dpk∣/dT, with different values of k represented by superimposing different colors (see Methods). The resulting plot is essentially a phase diagram with transitions made apparent by large values of dpk∣/dT. Figure 4 shows the phase diagram for the sawtooth timbre used above, with T swept down at each value of wc. [See fig. S1 for plots of P(x) versus T at several values of wc, as well as a comparison of the phase diagrams with T swept up and down. As expected for a second-order transition, Tc1 is the same for T swept up and down, up to a small computational error. The lower transition shows significant hysteresis in Tc2, illustrating that mode coupling leads to a first-order transition]. The lines on the plot show −dk/2 versus wc, with the value of k indicated by the same color scale. When T is swept down from the disordered phase, the first transition is well described by the single-mode, mean field prediction. The single value of k0 can be seen clearly just below Tc1 by a distinct color. As more modes become mixed in, the color fades, suddenly becoming bright white after the transition at Tc2. While the 12-fold phase that appears from 0.022 < wc < 0.042 reproduces the 12-fold division of the octave used in Western music, other ranges of wc exhibit phases with other values of k0, such as 5, 7, 19, and 31. Suggestively, these values of k0 are among those used in some non-Western music traditions (15, 16) or in modern music (17). The particular values of k0 that appear, and their relative stabilities, depend on the choice of timbre (see fig. S1, C and D, for phase diagrams of timbres featured in non-Western music that do not use a 12-fold octave division). This mean field model therefore provides an avenue for optimizing the timbre of instruments to achieve a desired stability (or instability) for an existing or new musical system.

Fig. 4 Phase diagram showing significant order parameters ∣pk∣ versus T and wc.

dpk∣/dT is plotted with color indicated by the color bar, superimposing all values of k. Regions of a single color indicate phases with one or few dominant pk; white regions indicate phases with many significant pk. The lines show −dk/2 versus wc for values of k corresponding to the color bar. wc corresponding to pitches from C4 (middle C) to C8 are indicated.

The mean field model assumes a fixed value of wc, although experimentally wc is found to depend on the root pitch (lower of the two pitches). The values of wc that correspond to root pitches from middle C (C4) to the highest C on the piano (C8) are indicated on Fig. 4. Each ordered phase occupies a fairly small range of root pitches, with the 12-fold phase occurring only at relatively high pitch, likely because of the gross simplifications of the mean field approximation.

So far, we have considered a model with a simplified dissonance function and assumed octave-wise periodicity, treated in the mean field approximation. One can explore other assumptions for the periodicity as well, simply by redefining x = logbf/fref for b other than 2. For example, one finds that for tones with odd harmonic partials (ϕn = 1,3,5, … ), taking b = 3 (so “octaves” are separated by a factor of 3 in frequency) yields a 13-fold phase known as the Bohlen-Pierce scale (18) with lower free energy than with b = 2 (see fig. S2). One could simplify further in a Potts-type model (19), in which the tones ti may take on only a discrete set of q pitches, xi = n/q, n = 0,1, … , q − 1. This model allows further analytic calculation and lends itself to other methods from statistical mechanics. These models serve to demonstrate that phase transitions can occur from disordered sound to ordered arrangements of pitches that bear notable resemblance to familiar musical systems.

The tone lattice

To go beyond the mean field model, we turn to numerical simulation of a system of interacting tones. Using this method, we will be able to relax the unrealistic assumption that each tone interacts equally with all other tones. Instead, each tone will interact with a subset of the other tones. This can be accomplished by placing the tones on a lattice, with interaction only with a set of nearest neighbors. This lattice can be interpreted as existing in an abstract space, where tones on nearby lattice sites have a stronger harmonic relationship than more distant ones. Although many lattices with different dimension and different interactions may be studied, here, I choose one of the simplest cases that can be expected to produce rich behavior: tones on a two-dimensional (2D) square lattice, with only short-range interactions. Given octave-wise periodicity of interactions, this system can be described using the well-studied XY model (20, 21), where the pitch x stands in for the angle θ. The XY model on a 2D lattice is known to produce particularly rich behavior. As in superfluid thin films (22) and nematic liquid crystals (23), one expects to observe a Kosterlitz-Thouless transition from a disordered state, with free vortices to an ordered state exhibiting quasi–long-range order (2426). A vortex in this system is a region surrounding a topological defect, where the pitches along a closed path enclosing the defect traverse one or more octaves. Following a quench of the system from the disordered to ordered state, metastable states are observed with bound vortices and antivortices (2729). In this case, moreover, interactions tend not only to align neighboring pitches but also to favor sudden jumps by particular intervals Δx ≈ 3/12,4/12,5/12,7/12,9/12. This type of XY model, where the energy has multiple minima, has found application, for example, in describing stacking domains in bilayer graphene (30), ferroelectric domains in hexagonal manganites (31), and axion-based cosmological models (32). For sufficiently small vortices, we expect not a continuous variation of x around a vortex but, instead, a number of domain boundaries where x changes by one of the more consonant intervals.

To study the 2D XY behavior that emerges from a set of tones on a lattice, we consider interactions only with a set of nearest neighbors and simulate the resulting stochastic dynamics (see Methods). As before, tones with pitches fi and fj interact because of their mutual dissonance D(fi, fj). For maximum generality, we make no assumption about the periodicity of the pitch distribution and allow the dissonance width wc to vary with the root pitch min(fi, fj) so that intervals involving lower pitches have broader dissonance curves, as observed experimentally. [Figure S3 shows Dx) for several root pitches]. We then use a combination of Langevin dynamics and Metropolis Monte Carlo (MC) to simulate the behavior of the tone lattice at temperature T.

Figure 5A shows the histogram of pitch classes xi mod 1 following a simulated quench from a disordered state to selected T (see fig. S4A for the full pitch histograms). At T = 5.5 and above, no long-range order of the pitches is observed. At T = 5 and below, the pitches show clear ordering. At T = 5 and T = 4, we see five- and sevenfold division of the octave, as was also seen in the mean field result. At T = 3.5, the 12-fold division of the octave has emerged, reproducing the Western musical system (for example, see fig. S4B showing the 12-fold division of the octave in both the frequency spectrum of the T = 3.5 tone lattice and the spectrum of Bach’s Prelude in D major, BWV850). Now, we can delve deeper by studying the spatial arrangement of these pitches on the lattice. [See fig. S5 for plots of the distribution Cx, r) of intervals Δx = xixj versus distance r between sites i and j, at T = 5.5 and T = 3.5].

Fig. 5 Tone lattice simulation results.

(A) Histograms of pitches x mod 1 in a metastable configuration of the tone lattice following a quench to temperature T. At T = 6, no ordering is observed. At T = 5,4, and 3.5, ordering is observed, with the octave divided into 5, 7, and 12, respectively. (B) Pitches on the tone lattice at T = 3.5. Pitch domains are labeled with pitch indices 0 to 11. Major or minor triads are marked with triangles. Junctions of more than three pitch domains are marked with circles. (C) The Tonnetz (edges shown in blue) with connections between neighboring pitch domains at T = 3.5 shown in red.

Figure 5B shows a visualization of the pitches on the lattice in a metastable configuration following a quench to T = 3.5. Each pixel corresponds to one lattice site colored according to a hue/saturation/value triplet. The hue (ranging from blue to red, as shown on the color bar) represents the pitch class xi mod 1, with the saturation and value adjusted to represent ⌊xi⌋, the octave in which that pitch lies. Darker (lighter) colors represent lower (higher) octaves. The image is characterized by domains of the same hue, indicating a single pitch class within that domain (pitch classes are labeled by indices 0 to 11). Repeated runs of the simulation result in final states that differ in the details but show the same general behavior, such as the same division of the octave, and similar correlations. Figure S6 shows simulation results at T = 4, 5, and 6.

The quenched tone lattice can be viewed as a metastable configuration of bound vortices and antivortices, as expected for the 2D XY model. Because pitches can vary continuously only at a high cost of dissonance, it is more favorable to find domains of nearly constant pitch class separated by domain boundaries, where the pitch class changes by a consonant interval. The most common interval across a boundary is a fourth or fifth (change in pitch index of 5 or 7 mod 12). Major and minor thirds are also common (change in pitch index of 3 or 4 mod 12). Places where two or more domain boundaries meet at a point constitute a topological defect, with the surrounding domains forming a vortex around it. The simplest vortex consists of three domains that meet at a point (see points marked with a triangle in Fig. 5B). The pitches of these domains consist of a root pitch, a major or minor third above the root, and a fifth above the root. This combination of three pitches forms a major or minor triad—the type of chord that forms the backbone of Western music. Vortices with more domain boundaries are indicated by circles. (See the Supplementary Materials for sound clips of selected regions of the tone lattice.)

The arrangement of pitch domains on the lattice shown in Fig. 5B reflects elements of musical harmony. The Tonnetz (Fig. 5C), originally described by Euler (33, 34), is a graph in which the nodes are pitch indices and the edges (shown in blue) represent fifths, major thirds, and minor thirds connecting those pitch classes. In the ET approximation, the Tonnetz can be viewed as a graph on a torus, with the connections around the torus indicated by the gray circles. Moving continuously to the right, we increase by fifths, cycling around the “circle of fifths.” Moving along one diagonal direction changes by minor thirds, and the other diagonal direction changes by major thirds. Each set of three adjacent nodes forming a triangle represents a major or minor triad. The pitch classes on the tone lattice in Fig. 5B that share a border can be represented by a similar graph. We construct this graph first by identifying all pitch domains, defined as contiguous regions with at least five lattice sites having the same pitch, rounded to the nearest of the 12 pitch classes. Pitch domains that share a border are then identified as those that have at least two adjacent lattice sites in one domain bordering two adjacent lattice sites in the other domain. The resulting graph of all neighboring domains on the tone lattice (red lines in Fig. 5C) maps directly onto the Tonnetz. All possible fifths are present (for example, the ellipse in Fig. 5B traverses the circle of fifths), with many of the thirds present as well.

The major or natural minor diatonic scales used in traditional Western music consist of seven contiguous pitch classes around the circle of fifths (or a trapezoid on the Tonnetz), whereas the anhemitonic pentatonic scales used in other musical traditions consist of five contiguous pitches on the circle of fifths. Therefore, all such scales may be constructed from contiguous regions of the tone lattice, with major and minor triads within these scales appearing in these regions. Furthermore, as the tones on a finite lattice relax through the quenched metastable states toward equilibrium, some of the 12 pitch classes begin to grow and others shrink, possibly eventually disappearing. Because the domains are most often separated by fifths or fourths, it is likely that the domains that grow and do not disappear will be contiguous on the circle of fifths. Therefore, the pitches on the tone lattice are expected to converge toward a major or minor diatonic scale (and, upon further relaxation, to an anhemitonic pentatonic scale). Figure 6A shows a histogram of pitch classes from the T = 3.5 tone lattice, arranged in order of the circle of fifths. One can see that the first seven pitch classes are all fairly common (in the shaded box), and the three least common pitch classes are contiguous on the circle of fifths and outside the box. This type of pitch distribution is common in Western music, where many of the pitches are drawn from a major or minor diatonic scale (and thus from seven contiguous pitch classes on the circle of fifths), but with some exceptions. For example, the pitch distribution from Bach’s BWV850 is shown in Fig. 6B, in which all 12 pitch classes are present, but with the first 7 pitch classes (corresponding to the D major scale) occurring more frequently than the latter 5. In other runs of the simulation, or at different times within one run, one might observe peaks in the pitch distribution with more equal or less equal height (or some missing), which would agree better with other styles of music.

Fig. 6 Comparison of the pitch class distribution on the tone lattice and in a selected piece of music.

(A) Histogram of pitch classes 12x mod 12 rounded to the nearest pitch class on the T = 3.5 tone lattice and arranged by ascending fifths. (B) Histogram of pitch classes appearing in Bach’s Prelude and Fugue in D major, BWV850, arranged by ascending fifths. The shaded area corresponds to the notes of a diatonic scale.

The arrangement of triads on the tone lattice gives rise to the same elements of music theory that have been extracted from the Tonnetz, such as a geometric interpretation of tonality (35), and progressions of triads that produce parsimonious voice leading, the concept that allows the combination of melody and harmony known as counterpoint (36). By varying the quench temperature T, pitch distributions with different numbers of pitch classes are observed. Inspecting the arrangement of pitch domains on lattices at different T, one observes certain commonly occurring intervals across domain boundaries. One can then propose “Tonnetzes” in these musical systems by analogy to the usual Tonnetz with 12 pitch classes, where the most commonly occurring interval forms the horizontal edges and the less common intervals are represented by diagonal edges (see fig. S6).

The fact that the behavior of the tone lattice in two dimensions replicates many features of traditional Western music raises the question: What happens in other dimensions? No phase transition is expected in 1D (unless the interactions are extended to long range). In 3D, the expected phase transition of a system with XY symmetry has been studied extensively in the context of cosmology. In this case, the phase transition is not a Kosterlitz-Thouless transition but, instead, a normal first- or second-order transition. Still, though, one expects the emergence of topological defects in the ordered phase if the cooling takes place at a finite rate by the so-called Kibble-Zurek mechanism (37). In these topological defects, the point-like vortex cores seen in the 2D lattice are now extended along 1D paths, originally referred to as cosmic strings, and later observed in condensed matter systems (31, 38). It is possible that the extension of the tone lattice to three dimensions may give rise to a 3D Tonnetz-like structure, as has been previously proposed in the spiral array model (35), or in generalized Tonnetzes (3).

The results presented here demonstrate that many patterns in music can emerge from this statistical mechanics framework. Undoubtedly, further insights into the structure of music can be gained by studying related models and bringing all of the tools of statistical mechanics to bear. The historical development of thermodynamics provides a parallel to the present case. Starting in the 17th century, empirical laws (Boyle’s law, Charles’s law, etc.) were postulated to explain observations of the behavior of gases. This top-down approach to thermodynamics proved quite useful. However, the development of statistical mechanics yielded a bottom-up theory, and this fundamental understanding led to an array of new discoveries in the 20th century and beyond. Likewise, top-down music theory has proven quite useful for composing and understanding music, but a bottom-up theory provides a foundational understanding, perhaps leading to new ways of composing, understanding, and enjoying music.


Constructing the dissonance function

The experimentally measured pure-tone dissonance curves can be parameterized by d(fa, fb) = e−[ln(∣Δx∣/wc)]2/wc (as shown in Fig. 1A), where Δx = log2(fa/fb). For the mean field model, I set wc at a fixed value. For the more realistic dissonance function in the tone lattice simulations, I parameterized wc = 0.67 min (fa, fb)−0.68, with coefficients set by a fit to observed values from (5).

The dissonance of two non-pure tones is then found by summing up the contributions from all pairs of partialsD(fa,fb)=n,mlnmd(ϕnfa,ϕmfb)(3)where lnm = min (αn, αm)0.606 is the perceived “loudness” of the pair of partials, as approximated by Sethares (7). The general results presented here are robust to changes in the choice of parameters used to construct the dissonance function.

Mean field model

To find normalized P(x) that are extrema of F, I extremized F=UTSμN, where N=01P(x)dx1 and μ is a Lagrange multiplier. Applying the variational principle, I found the extremum conditionP(x)=exp[1T01Dp(xy)P(y)dy]01exp[1T01Dp(zy)P(y)dy]dz(4)

Stable numerical solutions to this integral equation, which is similar to a Fredholm equation, can be found using an iterative approach (13). With the action of the right hand side of Eq. 4 on P(x) represented by the operator Q̂, the sequence Q̂nP0(x) either converges to or oscillates about a solution of the extremum conditionP(x)=Q̂P(x), where P0(x) is an initial trial function. The convergence will be stable against perturbations for minima of F and unstable for maxima. To reduce the chance of oscillations, I instead applied the operator Q˜=γI+(1γ)Q̂ with, say, γ = 2/3 and I the identity operator. In practice, I discretized P(x) (0 ≤ x < 1) as an array (typically with 2400 elements) and repeatedly applied Q˜ to an initial P0 using numerical integration until the mean square difference between successive terms falls below a threshold. P0 could be an array of random numbers, for example. For sequences of solutions P(x) versus T, I used a random P0 for the first solution and then used that solution plus small random jitter as the P0 to solve at the incremented value of T. This speeds up convergence and maintains the same symmetry at each value of T except when a new symmetry is broken. The addition of random jitter ensures that the solution moves away from unstable fixed points.

In terms of the Fourier representations P(x)=12k=0pke2πikx+pk*e2πikx and Dp(x)=k=0dkcos(2πkx), we have dissonanceDtot=12d0+18k=1dkpk2(5)

Using the Fourier representation and a Taylor series expansion for the logarithm, we have entropyS=n=2(1)n1n(n1)2n01dx(k=1pke2πikx+pk*e2πikx)n(6)

The transition between the high-T disordered solution and the 12-fold solution can be understood from Landau theory. Sufficiently close to Tc1, all pk ≥ 1 are small, and only the leading terms of S are required. Up to second order in pk, we have free energyF12d0+14k=1(dk2+T)pk2(7)

A phase transition can occur when a second-order term changes sign, assuming that the highest-order terms are positive. Therefore, as T is reduced from the disordered phase, the term with the most negative dk = dk0 will be the first to change sign, and a single-mode k0 will take on a nonzero pk0. Because there is only a single pk ≥ 1 ≠ 0 just below Tc1, we can use a single-mode approximation to findFFk0=12d0+14(dk02+T)pk02+T32pk04+T96pk06,(8)and with all subsequent terms also positive. From Landau theory, we conclude that there is a second-order phase transition at Tc1 = − dk0/2, from the disordered phase to a phase with k0-fold order.

When more than one pk ≥ 1 ≠ 0, there are additional terms in the free energy arising from interactions between the modes. Mathematically, these terms arise from cross terms in Eq. 6 whose exponential factors cancel out, thereby not integrating to zero. In the absence of this mode coupling, one would just expect a series of transitions where pk becomes nonzero at T = − dk/2 for all negative values of dk. Instead, the mode coupling yields more complex behavior where multiple modes undergo transitions at the same temperature. These transitions may be either first or second order and can display hysteresis.

Phase diagrams of the mean field solution versus T and wc were constructed by finding P(x) versus swept T as above and then repeating at various fixed values of wc. Fourier amplitudes ∣pk∣(T) were obtained from the series of solutions P(x), and the derivative dpk∣/dT was calculated by a finite difference. The color of a pixel at each (T, wc) pair was determined as follows. Thirty-two colors spanning a spectrum color map (MATLAB’s “jet” color map) are specified by red-green-blue triplets (rk, gk, bk) (k = 1 − 32). The pixel’s color was then set to c(T, wc) = ∑k − (dpk∣/dT)(rk, gk, bk), scaled to adjust the saturation of the image.

Tone lattice simulation

Here, I placed the tones on a 100 × 100 2D square lattice with periodic boundary conditions at the edges and with interactions only with the eight nearest and next-nearest neighbors. I set aij = 1 for these neighboring tones ti and tj, and aij = 0 otherwise. The dissonance function was calculated with wc depending on the root pitch, as described in the “Constructing the dissonance function” section. Instead of the sawtooth timbre used in the mean field model, I used harmonic partials given by Helmholtz (4) for a plucked string: α1−6 = 1,0.812,0.561,0.316,0.13,0.028. With the lack of periodic boundary conditions for the pitch distribution, we must now impose additional boundary conditions to prevent the pitches from diverging to x = ± ∞. This was accomplished via quadratic or quartic terms added to the dissonance function to artificially increase the dissonance of very low or high pitches or very large intervals. Specifically, I add δD=0.2(Δx)4+0.3125xr2, where the interval Δx = log2(fa/fb) and the root pitch xr = log2( min (fa, fb)/300 Hz). That is, an interval of one octave is barely suppressed at all with δD = 0.2, and intervals of two octaves are suppressed somewhat by δD = 3.2 (as compared to typical temperature in the simulation T = 3.5 − 6). Likewise, root frequencies two octaves above or below 300 Hz are suppressed by δD = 1.25. (See fig. S3 for plots of these dissonance functions.)

Via the analogy between dissonance and energy, the gradient of dissonance can be viewed as a force, pulling intervals toward lower dissonance. Assigning the tones a mass m and damping constant ζ, we can write a stochastic equation of motionmx¨i+ζẋi+dDi/dx=2Tζηiwhere Di is the total dissonance of tone i with its neighbors and ηi is a Gaussian white noise scalar. Evolving this system of equations for all i yields the dynamics of the system at temperature T. I chose m and ζ such that the dynamics are typically overdamped so that the system rapidly thermalizes.

Simulating the Langevin dynamics of the system explores the phase space around one local minimum of the total dissonance. To fully sample the phase space, I also included MC steps, chosen via the Metropolis algorithm, resulting in larger jumps in phase space. Here, a lattice site i was chosen at random, and I calculated the change in total dissonance δD if the pitch xi were changed to a new, randomly selected value. If exp( − δD/T) < r, with 0 ≤ r < 1 a uniformly distributed random number, then the change was rejected; otherwise, the change was accepted.

The set of tones was initialized into a disordered state with pitches xi drawn from a normal distribution with variance σ2 = 1. The simulation proceeded by performing a number (in this case, 3000) of MC steps followed by thermalization via Langevin dynamics at a fixed temperature T. This sequence was then repeated 2500 times, essentially simulating quenching from the disordered state.


Supplementary material for this article is available at

Fig. S1. Mean field solutions versus T at selected wc and phase diagrams for different timbres and sweep directions.

Fig. S2. Comparison of F versus T for different timbres and octave periodicities.

Fig. S3. Dissonance functions Dx) for different root pitches (and hence different wc) in legend, calculated as described in Methods for the lattice simulations.

Fig. S4. Pitch histograms and spectra from the tone lattice simulation.

Fig. S5. Distribution of intervals (pitch differences Δx = xixj) Cx, r) as a function of distance r between sites i and j.

Fig. S6. Metastable or disordered configurations of the tone lattice following a quench to T = 4, T = 5, and T = 6.

Movie S1. Major diatonic scale on the quenched tone lattice.

Movie S2. Major and minor triads on the quenched tone lattice.

Description of MATLAB scripts.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: I acknowledge important contributions from H. Mathur and R. Petshek on methods and models in statistical mechanics. Competing interests: The author declares that he has no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the author.
View Abstract

Navigate This Article