How ratios, frequencies, and transforms create harmony

There is an ancient suspicion, one that has persisted across civilizations and millennia, that music and mathematics are not merely related disciplines but are, at some fundamental level, expressions of the same underlying reality. The Pythagoreans believed that the cosmos itself vibrated according to numerical law — that the orbits of planets, the proportions of architecture, and the intervals of a lyre were all manifestations of the same divine arithmetic. This idea, which we might today dismiss as mysticism, turns out to contain a kernel of profound mathematical truth.
Music, at its core, is organized sound. And sound, at its core, is organized vibration — mechanical disturbances propagating through a medium, governed by the laws of differential equations, resonance, and periodic motion. When we listen to a chord and feel its consonance or dissonance, we are, in a very real sense, perceiving mathematical relationships. The pleasure of a perfect fifth and the tension of a minor second are not merely cultural conventions; they arise from the physical structure of vibrating systems and from the way the human auditory system processes frequency ratios.
The story of how mathematics came to describe music is one of the great intellectual journeys in the history of science. It begins with Pythagoras and his disciples in the sixth century BCE, who discovered that consonant musical intervals correspond to simple integer ratios of string lengths. It passes through the medieval scholars who incorporated music as one of the four mathematical arts of the quadrivium, alongside arithmetic, geometry, and astronomy. It gathers momentum with the natural philosophers of the seventeenth and eighteenth centuries — Mersenne, Euler, d'Alembert, Bernoulli — who began to describe vibrating strings through differential equations. And it reaches a kind of culmination with Joseph Fourier's extraordinary insight in the early nineteenth century: that any periodic waveform, no matter how complex, can be decomposed into a superposition of pure sinusoidal oscillations.
This article traces that journey with mathematical rigor. It is addressed to readers comfortable with calculus, linear algebra, and the language of analysis — graduate students in mathematics or physics, engineers working in signal processing, musicians who want to understand the theoretical foundations of their art, and machine learning researchers working with audio data. Our goal is not merely to survey the history but to illuminate the deep structural reasons why mathematics and music are so intimately intertwined.
Before we can appreciate why musical intervals have the mathematical properties they do, we must understand what sound is — not intuitively, but precisely.
Sound is a longitudinal mechanical wave: a propagating pattern of compressions and rarefactions in a medium, typically air. A vibrating object (a guitar string, a vocal cord, a speaker cone) displaces the air molecules around it, creating local pressure variations that propagate outward at the speed of sound, approximately $343\ \text{m/s}$ in air at room temperature.
The simplest mathematical model of a sound wave is a sinusoidal function of time and space:
$$p(x, t) = A \sin(kx - \omega t + \phi)$$
where $p$ is the pressure variation from equilibrium, $A$ is the amplitude, $k = 2\pi/\lambda$ is the wavenumber, $\lambda$ is the wavelength, $\omega = 2\pi f$ is the angular frequency, and $\phi$ is the phase. The frequency $f$, measured in hertz, is the number of complete oscillations per second:
$$f = \frac{\omega}{2\pi} = \frac{1}{T}$$
where $T$ is the period of the oscillation. Frequency is the mathematical correlate of what we perceive as pitch: higher frequency corresponds to higher pitch, with human hearing spanning roughly $20\ \text{Hz}$ to $20\ \text{kHz}$.
More generally, any sound can be represented as a real-valued function of time $s(t)$. Musical tones correspond to functions that are approximately periodic: $s(t + T) \approx s(t)$ for some fundamental period $T$. This periodicity is what gives musical notes their definite pitch. The waveform need not be sinusoidal (in fact, it almost never is in real instruments), but its periodic structure is what determines our perception of a fundamental frequency.
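The amplitude of a wave is related to its intensity, which in turn corresponds perceptually to loudness. The sound pressure level is measured in decibels:
$$L_p = 20 \log_{10}\!\left(\frac{p}{p_0}\right)\ \text{dB}$$
where $p$ is the root-mean-square sound pressure and $p_0 = 20\ \mu\text{Pa}$ is the standard reference pressure at the threshold of human hearing.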
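As a small numerical illustration of the decibel scale, the sketch below assumes the usual definition $L_p = 20\log_{10}(p/p_0)$ with $p_0 = 20\ \mu\text{Pa}$. The function name `spl_db` and the sample pressures are our own illustrative choices.

```python
import math

P_REF = 20e-6  # assumed reference pressure in pascals (20 micropascals)

def spl_db(pressure_rms, p_ref=P_REF):
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    return 20.0 * math.log10(pressure_rms / p_ref)

# The reference pressure itself sits at 0 dB; 1 Pa is roughly 94 dB.
print(round(spl_db(20e-6), 2))
print(round(spl_db(1.0), 2))
# Doubling the pressure adds about 6 dB, whatever the starting level.
print(round(spl_db(0.2) - spl_db(0.1), 2))
```

Because the scale is logarithmic, equal pressure ratios always map to equal decibel differences.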
But frequency and amplitude alone do not fully characterize a musical sound. Two instruments playing the same note at the same loudness sound distinctly different. This quality — called timbre or tone color — is determined by the shape of the waveform, or equivalently, by the distribution of energy across different frequency components. This is precisely where Fourier analysis enters the picture, but we must first travel back to ancient Greece to understand the structural principles that make musical sound special.
The discovery that musical consonance is rooted in simple numerical ratios is attributed, by ancient tradition, to Pythagoras of Samos (c. 570–495 BCE). The canonical story describes Pythagoras passing a blacksmith's forge and noticing that certain hammer combinations produced harmonious sounds while others did not. The historical accuracy of this account is disputed — the acoustics of hammers do not actually work this way — but the underlying mathematical discovery, concerning vibrating strings and the Pythagorean monochord, is genuine and of the first importance.
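Consider a string of length $L$, fixed at both ends, under tension $\tau$, with linear mass density $\mu$. The wave equation governing the transverse displacement $y(x,t)$ is:
$$\frac{\partial^2 y}{\partial t^2} = \frac{\tau}{\mu}\,\frac{\partial^2 y}{\partial x^2}$$
With boundary conditions $y(0,t) = y(L,t) = 0$, the solutions, called the normal modes, are:
$$y_n(x,t) = A_n \sin\!\left(\frac{n\pi x}{L}\right)\cos(2\pi f_n t + \phi_n), \qquad n = 1, 2, 3, \dots$$
The corresponding frequencies are:
$$f_n = \frac{n}{2L}\sqrt{\frac{\tau}{\mu}}$$
The fundamental frequency $f_1$ is inversely proportional to the string length. This is Mersenne's law: $f_1 = \frac{1}{2L}\sqrt{\tau/\mu}$. Halving the length doubles the frequency; taking two-thirds of the length multiplies the frequency by $3/2$.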
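To make the length dependence concrete, here is a minimal sketch that evaluates the mode frequencies of an ideal string from its length, tension, and linear density. The physical values are invented for illustration (loosely guitar-like), and `string_harmonics` is a hypothetical helper name.

```python
import math

def string_harmonics(length_m, tension_n, mu_kg_per_m, count=4):
    """First `count` mode frequencies (Hz) of an ideal string fixed at both ends."""
    wave_speed = math.sqrt(tension_n / mu_kg_per_m)  # speed of transverse waves
    return [n * wave_speed / (2.0 * length_m) for n in range(1, count + 1)]

# Illustrative values only: 65 cm string, 70 N tension, 0.6 g/m density.
full = string_harmonics(0.65, 70.0, 0.0006)
half = string_harmonics(0.65 / 2, 70.0, 0.0006)
two_thirds = string_harmonics(0.65 * 2 / 3, 70.0, 0.0006)

print([round(f, 1) for f in full])
print(round(half[0] / full[0], 6))        # octave: ratio 2
print(round(two_thirds[0] / full[0], 6))  # perfect fifth: ratio 1.5
```

The absolute frequencies depend on the assumed tension and density, but the ratios between stopped lengths do not.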
From this physical fact, the Pythagoreans extracted a profound musical principle. The fundamental harmonic ratios they identified are the octave ($2:1$), the perfect fifth ($3:2$), and the perfect fourth ($4:3$).
These three intervals form the foundation of virtually every musical culture on Earth. Their universality is not coincidental; it reflects deep mathematical and perceptual facts about periodic vibration.
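The Pythagorean scale is constructed by stacking perfect fifths. Starting from a reference pitch $f_0$, each successive pitch is obtained by multiplying by $3/2$, then reducing by octaves (dividing by powers of 2) to keep all pitches within a single octave. Formally:
$$f_k = f_0 \left(\frac{3}{2}\right)^{k} 2^{-m_k}$$
where the integer $m_k$ is chosen so that $f_0 \le f_k < 2 f_0$. After twelve such steps, we expect to return to the starting pitch seven octaves higher. But:
$$\left(\frac{3}{2}\right)^{12} = \frac{531441}{4096} \approx 129.746 \ne 128 = 2^7$$
The discrepancy, known as the Pythagorean comma, is:
$$\frac{(3/2)^{12}}{2^7} = \frac{3^{12}}{2^{19}} = \frac{531441}{524288} \approx 1.01364$$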
This small but audible discrepancy means that a Pythagorean scale cannot be perfectly closed. It is the first hint that the mathematics of musical tuning is not trivially consistent — a tension that would drive centuries of theoretical innovation.
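The stacking-of-fifths construction and the resulting comma can be checked with exact rational arithmetic. The following is a sketch under the assumption that we start from ratio 1 and fold each new pitch back into the octave $[1, 2)$; the function name is ours.

```python
from fractions import Fraction
import math

def pythagorean_ratios(steps=12):
    """Stack perfect fifths (3/2) and fold each result back into [1, 2)."""
    ratios = []
    ratio = Fraction(1)
    for _ in range(steps):
        ratios.append(ratio)
        ratio *= Fraction(3, 2)
        while ratio >= 2:
            ratio /= 2
    return sorted(ratios)

scale = pythagorean_ratios()
print([str(r) for r in scale])

# Twelve fifths overshoot seven octaves by the Pythagorean comma.
comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7
comma_cents = 1200 * math.log2(float(comma))
print(comma, round(comma_cents, 2))
```

Using `Fraction` keeps every ratio exact, so the comma appears as the integer ratio $3^{12}/2^{19}$ rather than as a rounding artifact. Expressed in cents (hundredths of an equal-tempered semitone), it is a little under a quarter of a semitone.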
The normal mode analysis reveals something even more fundamental. When a real string vibrates, it does not oscillate in a single mode. Instead, it vibrates simultaneously in all its normal modes, each with its own amplitude and phase. The resulting motion is a superposition:
$$y(x,t) = \sum_{n=1}^{\infty} A_n \sin\!\left(\frac{n\pi x}{L}\right)\cos(2\pi f_n t + \phi_n)$$
The frequencies present in this superposition are $f_n = n f_1$, that is, the fundamental frequency and all of its integer multiples. This sequence is the harmonic series in acoustics:
$$f_1,\ 2f_1,\ 3f_1,\ 4f_1,\ \dots$$
The $n$-th term is called the $n$-th harmonic or the $(n-1)$-th overtone. The musical intervals generated by consecutive harmonics follow a characteristic pattern, in which the ratio between the $n$-th and $(n+1)$-th harmonic is $(n+1)/n$:
| Harmonics | Ratio | Interval |
|---|---|---|
| 1st → 2nd | 2:1 | Octave |
| 2nd → 3rd | 3:2 | Perfect fifth |
| 3rd → 4th | 4:3 | Perfect fourth |
| 4th → 5th | 5:4 | Major third |
| 5th → 6th | 6:5 | Minor third |
The first few are precisely the consonances identified by Pythagoras. Two tones with frequency ratio $p:q$ (in lowest terms) can be written as $p\,g$ and $q\,g$ for some base frequency $g$, and their harmonics coincide at every multiple of $pq\,g$. The simpler the ratio, the more harmonics are shared, the more the tones blend, and the more consonant the interval sounds. This is the mathematical explanation of why $3:2$ (perfect fifth) is more consonant than $9:8$ (major second): in the fifth, every second harmonic of the upper tone coincides with every third harmonic of the lower tone, whereas in the major second only every eighth harmonic of the upper tone coincides with every ninth harmonic of the lower.
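The counting argument can be made explicit. In the sketch below, two tones in ratio $p:q$ are represented by the integers $p$ and $q$ (in units of a common base frequency), and we list which of their first few harmonics coincide. The cutoff of 24 harmonics per tone is an arbitrary illustrative choice.

```python
def coinciding_harmonics(p, q, n_harmonics=24):
    """Frequencies (in units of the common base) shared by the first
    n_harmonics harmonics of two tones at p and q times that base."""
    upper = {i * p for i in range(1, n_harmonics + 1)}
    lower = {j * q for j in range(1, n_harmonics + 1)}
    return sorted(upper & lower)

fifth = coinciding_harmonics(3, 2)         # perfect fifth, 3:2
major_second = coinciding_harmonics(9, 8)  # major second, 9:8

print(len(fifth), fifth)
print(len(major_second), major_second)
```

With this cutoff the fifth shares eight harmonics and the major second only two, which matches the qualitative claim that simpler ratios overlap more.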
The harmonic series also explains timbre. A flute emphasizes the fundamental and has weak upper harmonics, producing a pure, clear tone. A violin has strong contributions from many harmonics, producing a rich, complex sound. A clarinet, due to its cylindrical bore closed at one end, suppresses even harmonics and emphasizes odd ones, giving it a hollow, woody quality. The mathematical structure is the same — a sum of harmonics — but the weighting differs, and this weighting is precisely what Fourier analysis quantifies.
The Pythagorean ratios give us the octave, fifth, and fourth. A practical musical system needs more pitches — enough to support melody and harmony across many keys. The construction of a musical scale is, fundamentally, a problem in number theory and Diophantine approximation.
Just Intonation addresses the limitations of Pythagorean tuning by basing intervals on the natural harmonic series. In just intonation, the major third is $5/4$ (from the 5th harmonic) rather than the Pythagorean $81/64$. The just major scale has frequency ratios:
$$1,\ \frac{9}{8},\ \frac{5}{4},\ \frac{4}{3},\ \frac{3}{2},\ \frac{5}{3},\ \frac{15}{8},\ 2$$
These produce beautifully pure chords in the tonic key. However, moving from one key to another changes the frequency relationships. The mathematical reason is that the ratios $2$, $3/2$, and $5/4$ generate a three-dimensional lattice in frequency space (corresponding to prime factors 2, 3, and 5), and no finite set of pitches can perfectly tile all positions on this lattice.
Equal Temperament is the modern solution, now standard in Western music. In twelve-tone equal temperament (12-TET), the octave is divided into twelve equal semitones, each with a frequency ratio of:
$$2^{1/12} \approx 1.05946$$
Every semitone is identical, so a piece can be transposed to any key without changing the character of its intervals. The sacrifice is that no interval except the octave is perfectly pure: the equal-tempered fifth has ratio $2^{7/12} \approx 1.49831$, deviating from the just $3/2$ by only about 2 cents (a cent is one hundredth of an equal-tempered semitone), which is barely audible in isolation. The equal-tempered major third, $2^{4/12} \approx 1.25992$, fares worse, lying about 14 cents above the just $5/4$.
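The size of the tempering error is easy to tabulate. This sketch assumes the standard definition of the cent, $1200\log_2(\text{ratio})$, and compares a few just intervals with their 12-TET counterparts. The interval list and function names are illustrative.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

def tempered_deviation(just_ratio, semitones):
    """Cents by which the 12-TET interval exceeds the just interval."""
    return 100.0 * semitones - cents(just_ratio)

INTERVALS = {
    "perfect fifth": (3 / 2, 7),
    "perfect fourth": (4 / 3, 5),
    "major third": (5 / 4, 4),
    "minor third": (6 / 5, 3),
}

for name, (ratio, semitones) in INTERVALS.items():
    print(f"{name}: {tempered_deviation(ratio, semitones):+.2f} cents")
```

The fifth and fourth are within about two cents of pure, while the thirds carry most of the compromise.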
The mathematics underlying this compromise is the problem of finding rational approximations to $\log_2(3/2) \approx 0.58496$. This is an irrational number, and its continued fraction expansion is:
$$\log_2\!\left(\tfrac{3}{2}\right) = [0;\ 1, 1, 2, 2, 3, 1, 5, 2, 23, \dots]$$
The convergents $\tfrac{1}{2}, \tfrac{3}{5}, \tfrac{7}{12}, \tfrac{24}{41}, \tfrac{31}{53}, \dots$ indicate the optimal equal temperaments: 5-TET, 12-TET, 41-TET, and 53-TET all appear as denominators, each giving progressively better approximations to the pure fifth at the cost of more notes per octave. The choice of twelve divisions is, in a precise sense, the best small-$n$ compromise: accurate enough for practical music-making, small enough for a standard keyboard.
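The continued-fraction expansion of $\log_2(3/2)$ and its convergents can be computed directly. The sketch below works in floating point, which is reliable only for the first several terms, so we deliberately stop at seven. The helper names are ours.

```python
import math
from fractions import Fraction

def continued_fraction(x, terms):
    """First `terms` partial quotients of x (floating point, so keep terms small)."""
    coeffs = []
    for _ in range(terms):
        a = math.floor(x)
        coeffs.append(a)
        frac = x - a
        if frac == 0:
            break
        x = 1.0 / frac
    return coeffs

def convergents(coeffs):
    """Convergents h/k from the standard recurrence h_n = a_n*h_{n-1} + h_{n-2}."""
    h_prev, h = 1, coeffs[0]
    k_prev, k = 0, 1
    result = [Fraction(h, k)]
    for a in coeffs[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result

coeffs = continued_fraction(math.log2(3 / 2), 7)
approximations = convergents(coeffs)
print(coeffs)
print([str(c) for c in approximations])
```

A convergent $h/k$ says that $h$ steps of a $k$-note equal temperament approximate the pure fifth, so the denominators are the candidate numbers of notes per octave.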
The mathematical structure of musical harmony extends naturally into geometry and group theory. If we represent the twelve pitch classes as elements of $\mathbb{Z}_{12}$ (integers modulo 12), then transposition by $n$ semitones is the map:
$$T_n(x) = x + n \pmod{12}$$
and inversion about pitch class $0$ is:
$$I(x) = -x \pmod{12}$$
The group generated by all $T_n$ and $I$ is the dihedral group $D_{12}$ of order 24, which acts on the set of pitch classes and, by extension, on chords and musical structures.
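A few lines of code suffice to confirm the order of this group and to see it act on a chord. In this sketch, pitch class 0 is taken to be C (an assumption of ours), and `invert(n)` denotes the composite map $x \mapsto n - x$, that is, inversion followed by transposition by $n$.

```python
PITCH_CLASSES = range(12)

def transpose(n):
    """T_n: x -> x + n (mod 12)."""
    return lambda x: (x + n) % 12

def invert(n=0):
    """x -> n - x (mod 12); invert(0) is inversion about pitch class 0."""
    return lambda x: (n - x) % 12

def as_table(op):
    """Represent an operation by its images of 0..11, so equal maps compare equal."""
    return tuple(op(x) for x in PITCH_CLASSES)

ti_group = ({as_table(transpose(n)) for n in range(12)}
            | {as_table(invert(n)) for n in range(12)})
print(len(ti_group))

c_major = frozenset({0, 4, 7})
g_major = frozenset(map(transpose(7), c_major))
f_minor = frozenset(map(invert(0), c_major))
print(sorted(g_major), sorted(f_minor))
```

Transposition sends a major triad to another major triad, while inversion exchanges major and minor, which is why both kinds of operation are needed to reach all 24 consonant triads.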
The Tonnetz (German for "tone network"), revived and formalized in neo-Riemannian theory by Richard Cohn and others, is a two-dimensional lattice in which pitches are arranged so that perfect fifths run in one direction and major thirds in another. Every major and minor triad occupies a triangle on this lattice, and the three basic neo-Riemannian operations — Parallel (P), Relative (R), and Leading-tone exchange (L) — correspond to reflections of these triangles. Voice-leading between chords becomes small displacements on the lattice, and the smoothness of a harmonic progression can be measured by the total distance traveled.
Interval content in chords can be characterized algebraically by the interval vector: for a set $S \subseteq \mathbb{Z}_{12}$, the interval vector entry at class $k$ counts the number of pairs in $S$ separated by interval $k$. This gives a precise description of why certain chord progressions feel smooth or tense, and why the tritone (6 semitones, the unique nonzero interval mapping to itself under the inversion $x \mapsto -x \pmod{12}$) has its peculiarly ambiguous harmonic character.
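The interval vector is straightforward to compute. The sketch below follows the common convention of six interval classes, where an interval of $d$ semitones and its complement $12 - d$ are counted together. The example chords are our own.

```python
from itertools import combinations

def interval_vector(pitch_classes):
    """Count unordered pairs by interval class 1..6 (d and 12 - d are identified)."""
    vector = [0] * 6
    for a, b in combinations(sorted(set(pitch_classes)), 2):
        d = (b - a) % 12
        interval_class = min(d, 12 - d)
        vector[interval_class - 1] += 1
    return vector

print(interval_vector({0, 4, 7}))     # major triad
print(interval_vector({0, 3, 7}))     # minor triad: same interval content
print(interval_vector({0, 3, 6, 9}))  # diminished seventh: two tritones
```

Major and minor triads share an interval vector because they are related by inversion, which preserves interval classes.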
We have seen that musical tones are periodic waveforms whose harmonic content determines their timbre. The question that confronts us is: how do we systematically extract this harmonic content from an arbitrary waveform? This is the problem that Joseph Fourier addressed in his Théorie Analytique de la Chaleur (1822), ostensibly in the context of heat conduction — and his solution transformed not only physics and engineering but the whole of modern analysis.
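Fourier's central claim was audacious: any periodic function can be represented as a sum of sine and cosine functions at integer multiples of the fundamental frequency. Let $s$ be a function of period $T$. Then, under appropriate regularity conditions:
$$s(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n \cos\!\left(\frac{2\pi n t}{T}\right) + b_n \sin\!\left(\frac{2\pi n t}{T}\right)\right]$$
where the Fourier coefficients are:
$$a_n = \frac{2}{T}\int_0^T s(t)\cos\!\left(\frac{2\pi n t}{T}\right)dt, \qquad b_n = \frac{2}{T}\int_0^T s(t)\sin\!\left(\frac{2\pi n t}{T}\right)dt$$
Using Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$, this takes the elegant complex form:
$$s(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{2\pi i n t/T}$$
where the complex Fourier coefficients are:
$$c_n = \frac{1}{T}\int_0^T s(t)\, e^{-2\pi i n t/T}\, dt$$
The functions $e_n(t) = e^{2\pi i n t/T}$ form an orthonormal basis for the Hilbert space $L^2([0,T])$ with inner product:
$$\langle u, v \rangle = \frac{1}{T}\int_0^T u(t)\,\overline{v(t)}\, dt$$
The Fourier coefficients are precisely the inner products $c_n = \langle s, e_n \rangle$. The Parseval–Plancherel identity then states that the map $s \mapsto (c_n)_{n \in \mathbb{Z}}$ is an isometric isomorphism from $L^2([0,T])$ to $\ell^2(\mathbb{Z})$:
$$\frac{1}{T}\int_0^T |s(t)|^2\, dt = \sum_{n=-\infty}^{\infty} |c_n|^2$$
This is a statement about the conservation of energy between the time domain and the frequency domain. The total mean-square energy of a signal equals the sum of the squared magnitudes of its Fourier coefficients, a fact of profound importance in signal processing, quantum mechanics, and information theory alike.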
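The energy identity can be verified numerically for a simple test signal. The sketch below approximates the complex coefficients $c_n = \frac{1}{T}\int_0^T s(t)e^{-2\pi i n t/T}\,dt$ by a Riemann sum. The test tone, the sample count, and the function names are all illustrative assumptions.

```python
import cmath
import math

def fourier_coefficient(s, period, n, samples=1024):
    """Riemann-sum approximation of the n-th complex Fourier coefficient of s."""
    total = 0j
    for k in range(samples):
        t = period * k / samples
        total += s(t) * cmath.exp(-2j * math.pi * n * t / period)
    return total / samples

def mean_square(s, period, samples=1024):
    """Riemann-sum approximation of (1/T) * integral of s(t)^2 over one period."""
    return sum(s(period * k / samples) ** 2 for k in range(samples)) / samples

def tone(t):
    # A fundamental plus a weaker third harmonic, with period 1.
    return math.sin(2 * math.pi * t) + 0.5 * math.cos(2 * math.pi * 3 * t)

PERIOD = 1.0
coeffs = {n: fourier_coefficient(tone, PERIOD, n) for n in range(-5, 6)}
energy_time = mean_square(tone, PERIOD)
energy_freq = sum(abs(c) ** 2 for c in coeffs.values())

print(round(energy_time, 6), round(energy_freq, 6))
```

For this band-limited tone the only nonzero coefficients are $c_{\pm 1}$ and $c_{\pm 3}$, so summing over $|n| \le 5$ already captures all of the energy.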
The Fourier framework provides an exact mathematical language for describing musical timbre. Given a periodic waveform $s(t)$, its Fourier expansion reveals the amplitude and phase of each harmonic component. The sequence of magnitudes $\{|c_n|\}$ is called the spectral envelope or harmonic spectrum of the sound, and it is the mathematical fingerprint of an instrument's timbre.
A pure sine wave has only a single nonzero harmonic component, corresponding to the complete absence of overtones. This is the sound approximated by a flute playing softly. A square wave with period $T$ and amplitude $A$ has the Fourier expansion:
$$s(t) = \frac{4A}{\pi}\sum_{k=0}^{\infty} \frac{1}{2k+1}\sin\!\left(\frac{2\pi (2k+1)\, t}{T}\right)$$
This waveform contains only odd harmonics ($n = 1, 3, 5, \dots$), with amplitudes decaying as $1/n$, broadly matching the spectral character of the clarinet. A sawtooth wave contains all harmonics with amplitudes decaying as $1/n$:
$$s(t) = \frac{2A}{\pi}\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\sin\!\left(\frac{2\pi n t}{T}\right)$$
This spectrum is characteristic of bowed string instruments and brass, instruments with rich, bright tones and strong upper harmonics.
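The odd-harmonic expansion of the square wave can be checked by summing a finite number of terms. The sketch below assumes the series $\frac{4A}{\pi}\sum_{k \ge 0} \frac{\sin(2\pi(2k+1)t/T)}{2k+1}$, and the function names are ours. The partial sum converges slowly, so we use several hundred terms.

```python
import math

def square_wave_partial_sum(t, period=1.0, amplitude=1.0, terms=50):
    """Sum of the first `terms` odd harmonics of the square-wave Fourier series."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1  # odd harmonic number
        total += math.sin(2 * math.pi * n * t / period) / n
    return 4.0 * amplitude / math.pi * total

def harmonic_amplitude(n, amplitude=1.0):
    """Amplitude of the n-th harmonic: 4A/(pi*n) for odd n, zero for even n."""
    return 4.0 * amplitude / (math.pi * n) if n % 2 == 1 else 0.0

# In the middle of each half-period the partial sums approach +A and -A.
print(round(square_wave_partial_sum(0.25, terms=500), 3))
print(round(square_wave_partial_sum(0.75, terms=500), 3))
print([round(harmonic_amplitude(n), 3) for n in range(1, 8)])
```

Near the jumps the partial sums overshoot (the Gibbs phenomenon), which is why the check is made at the midpoints rather than at the discontinuities.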
For real instruments, spectra are time-varying. The amplitude envelope, that is, the evolution through attack, decay, sustain, and release (ADSR), is as important perceptually as the steady-state spectrum. The Short-Time Fourier Transform (STFT) addresses this by computing the Fourier transform within a sliding window $w$ centered at time $t$:
$$S(t, \omega) = \int_{-\infty}^{\infty} s(u)\, w(u - t)\, e^{-i\omega u}\, du$$
The magnitude squared $|S(t, \omega)|^2$ is the spectrogram, a visual representation of how frequency content evolves in time. Many audio editors and analysis tools can display audio as spectrograms, and the experience of listening to music unfold is, in a very real sense, the experience of tracking the spectrogram of a complex superposition of instrumental sounds.
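A discrete version of the STFT can be sketched in a few lines: slide a window along the samples, take a DFT of each windowed frame, and record which frequency bin dominates. The example below uses a Hann window, a naive DFT, and non-overlapping frames purely to keep the code short; practical implementations use overlapping frames and an FFT. The test signal is invented.

```python
import cmath
import math

def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

def stft(samples, frame_len, hop):
    """Windowed DFT of successive frames; returns one half-spectrum per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

def dominant_bins(frames):
    """Index of the largest-magnitude bin in each frame."""
    return [max(range(len(f)), key=lambda k: abs(f[k])) for f in frames]

FRAME = 64
# Two "notes": 4 cycles per frame for 128 samples, then 10 cycles per frame.
signal = [math.sin(2 * math.pi * (4 if n < 128 else 10) * n / FRAME)
          for n in range(256)]
bins = dominant_bins(stft(signal, FRAME, hop=FRAME))
print(bins)
```

The list of dominant bins is a one-line caricature of a spectrogram: it shows the pitch changing halfway through the signal, information that a single Fourier transform of the whole signal would blur together.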
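Fourier's decomposition of periodic functions was generalized through the nineteenth and twentieth centuries into harmonic analysis, the mathematical study of the representation of functions as superpositions of basic waves. The Fourier transform of a function $s \in L^1(\mathbb{R})$ is:
$$\hat{s}(\xi) = \int_{-\infty}^{\infty} s(t)\, e^{-2\pi i \xi t}\, dt$$
with inverse:
$$s(t) = \int_{-\infty}^{\infty} \hat{s}(\xi)\, e^{2\pi i \xi t}\, d\xi$$
In $L^2(\mathbb{R})$, the Plancherel theorem guarantees $\|\hat{s}\|_2 = \|s\|_2$, making the Fourier transform a unitary operator: energy is preserved under the transform.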
For practical digital signal processing, the relevant tool is the Discrete Fourier Transform (DFT). Given a sequence of $N$ samples $x_0, x_1, \dots, x_{N-1}$:
$$X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i k n / N}, \qquad k = 0, 1, \dots, N-1$$
The DFT can be computed in $O(N \log N)$ time using the Fast Fourier Transform (FFT) algorithm, published by Cooley and Tukey in 1965 (though earlier versions appear in the work of Gauss). The FFT is arguably the most important algorithm in applied mathematics; it underlies virtually every piece of modern digital audio technology.
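To see where the speed-up comes from, the sketch below places a direct $O(N^2)$ evaluation of the DFT next to a textbook recursive radix-2 FFT, which splits the input into even-indexed and odd-indexed halves. This is a teaching version restricted to power-of-two lengths, not an optimized implementation.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# Sixteen samples of a sinusoid completing exactly 3 cycles.
samples = [math.sin(2 * math.pi * 3 * m / 16) for m in range(16)]
direct, fast = dft(samples), fft(samples)
print(max(abs(a - b) for a, b in zip(direct, fast)) < 1e-9)
print(round(abs(fast[3]), 6))
```

Each level of recursion does $O(N)$ work and there are $\log_2 N$ levels, which gives the $O(N \log N)$ total.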
In audio compression, the FFT enables techniques such as the Modified Discrete Cosine Transform (MDCT), the mathematical core of MP3, AAC, and other perceptual audio codecs. These codecs exploit auditory masking: loud sounds near a given frequency render quieter sounds at nearby frequencies imperceptible. By computing the frequency-domain representation of an audio signal and discarding components below the masking threshold, codecs achieve compression ratios of 10:1 or more while maintaining perceptual transparency.
The mathematical character of music extends beyond acoustics and signal processing into the structural organization of compositions. Music unfolds in time according to patterns of repetition, variation, symmetry, and transformation — patterns that can be described precisely using the mathematics of groups and symmetry.
Symmetry transformations in music theory include transposition, inversion (replacing each ascending interval with a descending one of the same size), retrograde (reversing time order), and retrograde-inversion. Together with the identity, the last three form the Klein four-group $V_4 \cong \mathbb{Z}_2 \times \mathbb{Z}_2$ (the dihedral group $D_2$), the symmetry group of the rectangle, and they were exploited systematically by composers of the Second Viennese School (Schoenberg, Berg, Webern) in twelve-tone serialism.
In twelve-tone technique, a composition is based on a tone row, which is a permutation of the twelve pitch classes. The compositional material consists of up to 48 forms of the row generated by group actions: $12$ transpositions $\times\ 2$ (original and inversion) $\times\ 2$ (original and retrograde). The mathematics is that of $\mathbb{Z}_{12}$ acting on itself by translation and negation, combined with time reversal.
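The count of row forms can be reproduced by brute force. In the sketch below the example row is an arbitrary permutation chosen for illustration, and inversion is taken about the row's first pitch (one common convention, assumed here).

```python
def transpose(row, n):
    """Shift every pitch class up by n semitones."""
    return tuple((p + n) % 12 for p in row)

def invert(row):
    """Mirror every interval about the row's first pitch class."""
    return tuple((2 * row[0] - p) % 12 for p in row)

def retrograde(row):
    """Reverse the time order of the row."""
    return tuple(reversed(row))

def row_forms(row):
    """All distinct forms: 12 transpositions of P and I, plus their retrogrades."""
    forms = set()
    for base in (tuple(row), invert(row)):
        for n in range(12):
            shifted = transpose(base, n)
            forms.add(shifted)
            forms.add(retrograde(shifted))
    return forms

EXAMPLE_ROW = (4, 5, 7, 1, 6, 3, 8, 2, 11, 0, 9, 10)  # arbitrary illustrative row
CHROMATIC_ROW = tuple(range(12))

print(len(row_forms(EXAMPLE_ROW)))
print(len(row_forms(CHROMATIC_ROW)))
```

A generic row yields all 48 forms, while a highly symmetric row such as the chromatic scale yields only 24, because its retrograde coincides with a transposed inversion.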
J.S. Bach's Musical Offering and The Art of Fugue demonstrate a deeply mathematical approach to counterpoint. The fugue is built on the principle of invertible counterpoint: two melodic lines are designed so that either can serve as the bass or treble without violating harmonic rules. A fugue's elegance arises from the way symmetry transformations — augmentation (doubling note durations), diminution, stretto (overlapping subject entries), inversion — interact: constraints imposed by counterpoint rules are simultaneously satisfied by multiple transformed versions of the same theme. This is, structurally, the simultaneous satisfaction of linear constraints by symmetric group actions, a mathematical phenomenon as much as a musical one.
The mathematical framework of harmonic analysis now pervades virtually every branch of signal processing, communications engineering, and data science.
Digital Audio Workstations (DAWs) such as Pro Tools, Ableton Live, and Logic Pro rely heavily on frequency-domain thinking. Equalizers shape the frequency spectrum, convolution reverbs convolve the signal with a room impulse response (typically using FFT-based fast convolution), and many pitch-correction tools analyze the spectrum or periodicity of the incoming signal. The phase vocoder uses the STFT to independently control the pitch and time-scale of audio signals; it is one of the standard techniques behind the pitch-shifting and time-stretching effects ubiquitous in modern music production.
Wavelets provide a multiscale generalization of the Fourier transform. The continuous wavelet transform of a signal $s$ with respect to a wavelet $\psi$ is:
$$W_\psi s(a, b) = \frac{1}{\sqrt{a}}\int_{-\infty}^{\infty} s(t)\, \overline{\psi\!\left(\frac{t - b}{a}\right)}\, dt$$
where $a > 0$ is the scale parameter and $b$ is the translation parameter. Wavelets provide better time-frequency localization than the STFT for signals with both rapid transients and sustained tonal components, a description that fits music rather well.
Machine learning applied to music has become a major research area. Deep learning models for music transcription, chord recognition, and generative modeling commonly operate on audio feature representations grounded in Fourier analysis. Mel-frequency cepstral coefficients (MFCCs), long among the most widely used audio features in machine learning, are computed by taking the FFT of an audio frame, forming the power spectrum, passing it through a bank of filters spaced on the mel scale (a perceptually motivated, approximately logarithmic warping of frequency), taking the logarithm of the filter-bank energies, and applying the Discrete Cosine Transform to decorrelate the coefficients. Some generative models for music, including ones based on Transformers and diffusion models, work in the spectrogram domain; diffusion models in particular iteratively denoise noisy spectrograms toward musically coherent output, which ties them to the mathematics of stochastic differential equations.
The connections traced throughout this article point to something deeper than technical correspondences. The Pythagoreans spoke of the Musica Universalis — the "music of the spheres" — the idea that the planets move according to harmonic ratios and produce an inaudible celestial music. Kepler, in his Harmonice Mundi (1619), calculated that the angular velocities of the planets at perihelion and aphelion stand in musical intervals. The numerical coincidences are inexact, but the underlying intuition — that the same mathematical structures appear in orbital mechanics and musical harmony — has been vindicated in a form Kepler could not have anticipated.
Why should simple integer ratios correspond to perceptually consonant sounds? The answer lies at the intersection of physics, neuroscience, and information theory. Two tones with frequency ratio $p:q$ (in lowest terms) and individual periods $T_1$ and $T_2$ produce a combined waveform that is periodic with period $p\,T_1 = q\,T_2$. The auditory system performs a kind of neural Fourier analysis via the mechanical resonance of the basilar membrane, a tapered structure in the inner ear that acts as a biological frequency analyzer. Different positions along the basilar membrane respond to different frequencies, and two tones with a simple ratio produce strongly overlapping neural activation patterns, experienced as consonance.
There is also a deep connection between musical aesthetics and the mathematics of self-similarity. Musical compositions at multiple scales exhibit self-similar structure: a symphony has movements, movements have themes, themes have phrases, phrases have motives, and each level exhibits similar patterns of tension, development, and resolution. The power spectral density of loudness and pitch fluctuations in many musical works scales approximately as $1/f$ (pink noise) over a wide range of temporal scales, a fractal-like balance between predictability and surprise. This structure, observed by Voss and Clarke in 1975, is also found in heartbeat intervals, river flow, and electronic noise, suggesting a deep connection between aesthetic pleasure and optimal information-theoretic complexity.
The equal temperament compromise encodes yet another mathematical truth: the group $\mathbb{Z}_{12}$ is the unique small cyclic group that simultaneously approximates the ratios $3/2$, $4/3$, and $5/4$ to within a tolerable error. The mapping from the infinite prime-factor lattice $\mathbb{Z}^3$ (generated by powers of 2, 3, and 5) to the finite cyclic group $\mathbb{Z}_{12}$ is the mathematical act of temperament, and the choice of twelve is optimal among small cyclic groups for the simultaneous approximation of the harmonics up to the fifth, whose ratios involve only the primes 2, 3, and 5.
The journey from Pythagoras to Fourier is one of the great intellectual trajectories in the history of human thought. It began with the startling discovery that consonant musical intervals correspond to simple integer ratios of string lengths — a discovery that suggested the universe is fundamentally numerical. It passed through centuries of debate about the proper construction of scales, revealing the deep tension between the pure integer ratios of just intonation and the practical requirements of transposition. It culminated in Fourier's extraordinary theorem, which showed that any periodic waveform — any musical sound whatsoever — can be decomposed into a sum of pure sinusoidal oscillations, each corresponding to a specific harmonic frequency.
The mathematical framework that emerged — harmonic analysis — has become one of the central pillars of modern mathematics and its applications. The Fourier transform and its descendants (the DFT, FFT, STFT, wavelet transform) are among the most powerful analytical tools ever developed, indispensable in every branch of signal processing, communications, imaging, and data science.
But the significance of the mathematics of music extends beyond its practical applications. Music is perhaps the most immediate and universal human experience of abstract mathematical structure. When we hear a chord, we are directly perceiving a ratio. When we experience the tension of a dissonance resolving to a consonance, we are experiencing the mathematical process of a complex waveform converging to a simpler one. When we are moved by the architecture of a Bach fugue or the development of a Beethoven sonata, we are responding to the interplay of symmetry, variation, and transformation in time.
The Pythagoreans were right, in a sense they could not fully have appreciated: number and harmony are indeed the same thing, seen from different angles. Mathematics does not merely describe music; it constitutes music, at every level from the physics of vibrating strings to the group theory of twelve-tone rows to the spectral analysis of a digital audio file. In the Fourier series, the continuous waveform of a musical sound is represented as an infinite discrete sum of pure frequencies — a perfect metaphor for the relationship between mathematical abstraction and human experience. From Pythagoras to Fourier, from the monochord to the digital audio workstation, mathematics and music have been, as they always were, the same conversation conducted in two different languages.
Applied mathematician and AI practitioner. Founder of MathLumen, exploring mathematics behind machine learning and scientific AI.
