This article is essentially an update of an older one titled “On the Record: ‘Technical Ability’”, formatted for better clarity and with further amendments that I feel would be useful.
More than a year back, I implemented the new ranking system where both tonal and technical performance were components in determining a headphone/earphone’s overall sound quality, at least within the context of this website. As I’ve stated in that post:
“… as a quick refresher (and I really mean quick, don’t argue with me about what I’m about to say since they’re condensed summaries of 5,000 word articles) tonality is basically tuning and frequency response, while technicalities is an umbrella term referring to unmeasurable aspects such as resolution, transients and imaging.
So, why the change?
From the beginning this system was already being used for my ranking list, just subconsciously and in a more ‘arbitrary’ way. Specifying and breaking down the main criteria of my grading system helps me be more transparent about my inner processes, as well as to help me be more consistent with my rankings.
I’ll probably not break down my rankings further than this since the problem of weighting individual components gets worse the more components I specify. I think there should at least be a certain degree of “abstractness” to the rankings since, at the end of the day, this is a subjective list of personal opinions.”
And that’s where I left things, but there’s a nagging feeling at the back of my head that tells me that I should update my “Technical Ability” article (which is almost 3 years old at this point, originally posted on Head-Fi too!) and also consolidate that with my explanation of tonality (some of which is also deep within the rather lengthy Graphs 101 article).
So without further ado here’s a lengthy ramble about how I, some guy on the internet, review and evaluate headphones & earphones through the concepts of tonality and technicalities.
FR, Sound Signature, et cetera
This metric is probably the less controversial of the two considering that it is the most directly measurable aspect of audio, yet may possibly be more controversial considering that it’s governed by the highly subjective phenomenon that is… well, personal taste.
Now unfortunately this topic requires a baseline understanding of how frequency response measurements work, so if you’re not yet familiar then do give my Graphs 101: How to Read Headphone Measurements article a good read-through. After all, the concepts of tonality and frequency response are inherently linked, so to properly understand the former you have to understand the latter.
All set? Here we go.
It’s obvious but I feel like I have to reiterate this: tonality is derived from the word “tone”, hence it is a metric primarily based within the frequency domain. The frequency response of an IEM therefore affects its tonality. It is sometimes also referred to as “tone colour”, though I’d reserve that term for the metric of timbre (explained later).
(This definition is separate from the musical-theory definition of the word which is about the arrangements of pitches and chords. In the audio reproduction context, again, tonality is linked to frequency response.)
Breaking it down further, the tone of any given instrument is made up of the fundamental frequency (which is also referred to as the fundamental tone, this determines the note in question) and the harmonic frequencies (which is also referred to as the overtones, these give the instruments their unique sound). For instance, a C4 (middle C) note played by different instruments will look as follows on a frequency response plot:
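To put rough numbers on the fundamental-plus-harmonics idea, here is a minimal sketch. The C4 frequency follows from standard A440 equal temperament; the per-instrument harmonic levels and the instrument labels are made-up illustrative values, not measurements of real instruments.

```python
# Harmonic series of C4 under A440 equal temperament.
A4_HZ = 440.0

def note_frequency(semitones_from_a4: int) -> float:
    """Equal-tempered frequency a given number of semitones away from A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)

def harmonic_series(fundamental_hz: float, count: int) -> list:
    """Fundamental (1st harmonic) followed by its integer-multiple overtones."""
    return [fundamental_hz * n for n in range(1, count + 1)]

C4_HZ = note_frequency(-9)  # C4 sits 9 semitones below A4, roughly 261.6 Hz

# Hypothetical relative harmonic levels in dB (illustrative only, not measured).
INSTRUMENT_LEVELS_DB = {
    "flute-like": [0, -20, -30, -40, -50],
    "reed-like": [0, -25, -6, -30, -12],
}

if __name__ == "__main__":
    freqs = harmonic_series(C4_HZ, 5)
    for name, levels in INSTRUMENT_LEVELS_DB.items():
        print(name)
        for freq, level in zip(freqs, levels):
            print(f"  {freq:8.1f} Hz  {level:+4d} dB")
```

Both hypothetical instruments share the same five frequencies (same note), and only the relative levels of those frequencies differ, which is the part that shows up as a different shape on a frequency plot.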
Even though each instrument is playing the same note, the tonality of said instruments is clearly different. And measurable! So for those buying into the myth that it’s somehow impossible to measure harmonics… here you go.
So when someone refers to “tonal balance”, that’s also a reference to how accurate an instrument may sound; after all, if the balance between the fundamental and each order of harmonics is correct, so must be the sound.
From there, there is the debate on what constitutes “correct” or neutral. Because the concept of neutrality is so fluid and subjective, especially in the IEM world where an individual’s own Head-Related Transfer Function (HRTF) rears its ugly head the strongest, tonality therefore also becomes a very subjective metric.
So then, what is neutral?
Even within the academic world, the debate rages on. And if the objectivists cannot decide, you can only imagine the chaos that reigns in the anti-graph subjectivist camp, the peace of which exists only by a flimsy thread of “agreeing to disagree”. But as a reviewer it is far more important that people understand my perception of neutrality when reading my reviews, which is what my “IEF Neutral Targets” provide.
So whenever you read a review on In-Ear Fidelity, you can at least be assured that all my tone-related descriptors are done relative to these neutral targets. Transparency is key, and so is consistency.
You hear the term “colouration” getting thrown around a lot but here’s how I see it. When the tonality of the sound gets skewed in any direction, it goes from having a neutral tone to a coloured one. Skewing towards the low frequencies creates a “dark tonality”, while skewing towards the higher frequencies creates a “bright tonality”. Being lower-frequency-biased puts the focus more on the fundamentals and lower-order harmonics, which subjectively gives the instruments some extra richness and heft. On the other hand, being higher-frequency-biased puts the focus more on the higher-order harmonics, which can boost the clarity of the instruments as well as improve the perception of “air”.
Here’s where things get very subjective: how does one determine the “quality” of a transducer’s “tonality”?
Many assume that because I run the world’s largest public database of headphone & earphone measurements, I rank the tone of any given headphone/IEM based on its deviation from some target curve. While certain target curves are a factor in my rankings (especially my own), what I really judge is how well a given transducer presents the sound signature it is trying to present.
For instance, here we have a headphone that attempts a “balanced” or neutral kind of sound signature, and executes it well:
And here we have… whatever this is supposed to be:
Or perhaps the difference between a well-executed bassy-V and one that goes way too far:
The list goes on and the examples are plentiful in both my ranking lists. I try to be as “signature agnostic” as possible in my evaluation of a transducer’s tonality; it should not matter if a manufacturer or consumer chooses to go for any kind of non-neutral signature (or non-Harman for that matter). V-shaped, bassy, bright, dark, warm, and everything else in-between, as long as the desired tonal profile is executed properly the final tone grade should reflect that.
The big keyphrase here is “executed properly”. If a headphone or IEM attempts a wild-and-whacky signature and it sounds horrible, that’s not the fault of my ranking system. Tonality may be up to the whims of subjective taste, but even then there are limits.
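For reference, the “deviation from a target curve” idea mentioned earlier could be scored with something like the sketch below. This is an illustration only, not the grading method used for the rankings: the function name, the level-alignment step and the sample dB values are all assumptions.

```python
import math

def rms_deviation_db(measured_db, target_db):
    """RMS difference between two curves after removing the average level offset.

    Both inputs are lists of dB values sampled at the same frequencies.
    """
    if len(measured_db) != len(target_db) or not measured_db:
        raise ValueError("curves must be non-empty and the same length")
    diffs = [m - t for m, t in zip(measured_db, target_db)]
    # A constant offset is just a volume difference, not a tonal error.
    offset = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - offset) ** 2 for d in diffs) / len(diffs))

if __name__ == "__main__":
    target = [0.0, 0.0, 0.0, 0.0]      # hypothetical flat target
    louder = [3.0, 3.0, 3.0, 3.0]      # same shape, just 3 dB louder
    v_shaped = [4.0, -2.0, -2.0, 4.0]  # hypothetical V-shaped response
    print(rms_deviation_db(louder, target))
    print(rms_deviation_db(v_shaped, target))
```

A single number like this penalises a well-executed V-shape and a badly executed one alike, which is one way to see why a pure target-deviation score would not be “signature agnostic”.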
But for whatever reason, at least in my own opinion, frequency response isn’t the be-all end-all when it comes to sound quality. There still seem to be some other phenomena going on that affect the characteristics of a transducer, and while not completely independent of tonality they are still separate enough to warrant their own metric.
What is it then? you may cry. Truth of the matter is… nobody who has personally observed these non-FR differences has a clue, me included. There are theories of course, one of which I’ll debunk myself, but in general there is no real consensus. In effect here is where the objectivist community shuns me and I move my butt firmly back on the centrist fence.
Ladies and gents, welcome to the hazy world of “technicalities”.
Do note that whatever I write here is basically pseudoscience in that most of the things here aren’t peer-reviewed or academically researched (or scientifically accurate, for that matter!), more simply being a description of how I personally interpret what I’m hearing and how I assign “markers” to identify how good an IEM is beyond the veil of personal preference and taste.
“Technicalities”, “technical ability”, or “technical performance” is an umbrella term that encompasses basically all non-tonal aspects of sound reproduction. Under IEF metrics, it refers to the following:
- Resolution (aka “detail”)
- Imaging (aka “stereoimaging”)
- Positional accuracy
- Timbre (though this involves tonality as well, more on this later)
You can see where the aforementioned “ironic dichotomy” with regards to tone and technicalities comes in here; tonality may be the most easily measurable aspect of audio but the interpretation of the measurements is probably one of the most subjective aspects of the hobby. Yet the concept of “technicalities”, while immeasurable and so seemingly completely subjective, also implies a certain level of objectivity where more/less of something is always better (resolution, positional accuracy, speed etc.).
I guess I’ll start with the concept of “transients”, which is linked to both resolution and timbre.
Also known as “speed” in other circles, I personally define transients as comprising the initial “attack” function and the subsequent “decay” function. In the professional world of synthesisers the term “ADSR envelope” is used, which stands for “Attack-Decay-Sustain-Release”.
I lump the A-D-S parameters into the “attack” term as a catch-all and rename the Release parameter back into decay for a few reasons:
- Given that we’re dealing with audio reproduction rather than audio synthesising, we can assume that the “attack” in the traditional ADSR envelope would be basically instantaneous in a transducer reproducing audio and therefore a fixed variable (and can be ignored).
- The ADSR envelope breaks down the length of the note’s “hit” as how the attack decays into the sustain. For transducers, I think it can be simplified as “length of attack” assuming the above point is implemented.
- Having just two parameters (attack and decay) makes things easier to break down and explain.
Under the assumption that a transducer (upon receiving the analog signal) will hit maximum SPL instantaneously, the next issue is how long this “point” gets dragged on. Too long subjectively creates this muddy or congested effect. This is what I’d term as “length of attack”, “attack length”, “attack speed” etc. In an ideal situation, the shorter the length of attack, the better.
Short (sharp) attack vs long (blunt) attack, shown in 3 notes being struck in quick succession:
It’s hard to describe a “sharp” attack; notes simply come off as clean and well-defined when that happens, and if you don’t have a frame of reference it just sounds “normal”.
However, it’s easy to listen out for “blunt” attack as a lot of low quality drivers exhibit this quality. Plucked strings may come off as muted, percussions come off as banging on pillows; basically any instrument that is heavily reliant on that initial burst of SPL on the ADSR envelope will suffer when played back on a transducer with blunted attack.
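The simplified attack model described above (instant rise, a hold that lasts for the “length of attack”, then decay) can be sketched numerically. This is a toy model under stated assumptions: the exponential decay shape, the function names and all of the timing values are hypothetical and chosen only to make the contrast visible.

```python
import math

def note_envelope(t, attack_len, decay_tau):
    """Instant rise to full level, hold for attack_len, then exponential decay.

    t, attack_len and decay_tau are all in seconds; t is time since the note started.
    """
    if t < 0:
        return 0.0
    if t < attack_len:
        return 1.0
    return math.exp(-(t - attack_len) / decay_tau)

def level_before_next_note(note_spacing, attack_len, decay_tau):
    """Residual level (1.0 = full) of one note at the instant the next note starts."""
    return note_envelope(note_spacing, attack_len, decay_tau)

if __name__ == "__main__":
    spacing = 0.10  # hypothetical: notes struck 100 ms apart
    tau = 0.03      # hypothetical decay time constant, same for both cases
    sharp = level_before_next_note(spacing, attack_len=0.01, decay_tau=tau)
    blunt = level_before_next_note(spacing, attack_len=0.08, decay_tau=tau)
    print(f"sharp attack leaves {sharp:.3f} of full level under the next note")
    print(f"blunt attack leaves {blunt:.3f} of full level under the next note")
```

With identical decay, the blunt attack leaves far more of the previous note hanging around when the next one starts, which is one way to picture the “banging on pillows” effect.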
When most people talk about the speed of the driver, they’re usually referring to the attack function. It’s a much better objective metric after all; shorter = better and there’s little room for argument. On the other hand, decay is a much more fickle metric to talk about; of course, too much decay is quite obviously detrimental to the integrity of the sound, but it’s also not like attack where the shortest is objectively better. Decay is one of the things that requires a very delicate balance.
Low decay contributes to the metric of “definition”. When the notes attack fast and don’t have a lot of linger afterward, the notes are much more clearly distinguished and so better defined. However in real life, nothing has “zero decay”. You bang on a drum, the skin continues to vibrate for a short time after it was struck. Pluck a guitar string and there’s still sustain of the note long after you’ve released. Add the effects of room acoustics and post-processing mastering effects and everything we know about what makes a sound “the sound” isn’t just what tone it produces but also the pattern in which it decays.
Here are the different types of decay visualised in graph form:
There is a problem in having too little decay; it’s not really representing what you’d probably hear in real life or with a good pair of speakers. Yes, the notes are very well defined but they will sound unnatural. Examples of stuff with short decay are drivers like BAs and electrostats, which have been generally described as having this “ethereal” presence which I would personally attribute to them having way too little linger beyond the initial note. You find yourself wondering if the note was even struck at all because of how fast it disappeared.
Long decay is pretty self-explanatory; you can see that the notes will start smearing into one another and there is very little separation between every attack. Bad drivers are usually the cause of this, perhaps too limp a diaphragm material or too much acoustic resonance within the housing, who knows. The end effect, just like having a long attack, contributes to that muddy/congested sound.
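To make “short” versus “long” decay a little more concrete, the sketch below assumes a simple exponential decay and asks how much of a note is left when the next one arrives. The exponential shape, the function names and the time constants are assumptions for illustration; real drivers need not decay this way.

```python
import math

def time_to_fall(decay_tau, drop_db=60.0):
    """Seconds for an amplitude envelope exp(-t / decay_tau) to drop by drop_db."""
    return decay_tau * math.log(10 ** (drop_db / 20))

def residual_db(note_spacing, decay_tau):
    """Level in dB (relative to the note's peak) left when the next note arrives."""
    return -20 * math.log10(math.e) * note_spacing / decay_tau

if __name__ == "__main__":
    spacing = 0.10  # hypothetical: notes 100 ms apart
    for label, tau in [("short", 0.005), ("moderate", 0.03), ("long", 0.2)]:
        print(
            f"{label:8s} decay: {time_to_fall(tau) * 1000:7.1f} ms to fall 60 dB, "
            f"{residual_db(spacing, tau):7.1f} dB left at the next note"
        )
```

In this toy model the “long” case is only a few dB down when the next note hits, so consecutive notes overlap heavily, while the “short” case has effectively vanished long before that.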
A while back, I mentioned that many theories have been put forth regarding “technicalities” and that I’ll debunk one. Well buckle up, because now I’m about to denounce one of the most popular metrics in the headphone measurements game…
(The case against)
Why don’t you post distortion graphs? Everyone else is doing it!
Harmonic distortion (or even worse, total harmonic distortion) is, with the exception of fringe extreme cases, virtually useless in the context of headphones, IEMs, and even sources.
For one thing, harmonic distortion at different frequencies behaves… differently. So the “industry standard” where distortion values are published @1000Hz is largely useless. Or, god forbid, a weighted average single-value metric like SINAD which is highly reductionist with near-zero correlation with actual listener preference.
For another, not all harmonic distortion is created equal and so publishing a plot of THD against frequency is still too simplified. The effects of auditory masking mean that lower-order harmonic distortion would largely be inaudible, especially if the distortion were at lower frequency ranges, and virtually every headphone and IEM out there has a distortion profile that is second and third-order dominated, typically under 1kHz.
In short, THD is largely irrelevant and even a full breakdown of a headphone/IEM’s distortion profile isn’t going to tell you much since it’s almost guaranteed to be lower-order-dominated anyways.
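The “not all harmonic distortion is created equal” point can be seen from how THD is commonly computed: the root-sum-square of the harmonic amplitudes divided by the fundamental. The sketch below uses that textbook definition; the two distortion profiles are invented numbers, not measurements of any product.

```python
import math

def thd_percent(harmonic_amplitudes):
    """THD relative to the fundamental, in percent.

    Index 0 is the fundamental, index 1 the 2nd harmonic, index 2 the 3rd, and so on.
    """
    fundamental, *overtones = harmonic_amplitudes
    return 100 * math.sqrt(sum(a * a for a in overtones)) / fundamental

# Hypothetical profiles with the same total distortion in different places.
low_order = [1.0, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0]   # everything in the 2nd harmonic
high_order = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.01]  # everything in the 7th harmonic

if __name__ == "__main__":
    print(f"low-order profile:  {thd_percent(low_order):.2f}% THD")
    print(f"high-order profile: {thd_percent(high_order):.2f}% THD")
```

Both profiles collapse to the same THD figure even though one sits right next to the fundamental (where masking is strongest) and the other does not, so the single number says nothing about which harmonics are involved.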
I’ve actually tried to test this myself out of morbid curiosity; I launched a DAW, manually boosted harmonic distortion, and checked at which point I would actually start to hear differences. And here are the results:
- For a pure 1000Hz tone, I could barely make out third harmonic distortion (3HD) at 3%, and second harmonic distortion (2HD) at a little over 5%. For a pure 200Hz tone, I gave up after not hearing differences past 8% even for 3HD. Didn’t even bother with 2HD, much less lower frequency tones.
- For general music listening, the differences are even more subtle even as harmonic distortion is added to the entire frequency range. It takes a ridiculous amount of 2HD to even begin to hear differences (think 10%+), and even then it’s only obvious for higher pitches and doesn’t at all sound like the “tubey warmth” you’d probably expect.
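For anyone wanting to try a similar pure-tone experiment, a test signal with a known amount of 2HD and 3HD can be generated in a few lines. This is a minimal sketch and not the DAW workflow described above: the function names, sample rate and duration are assumptions, and the harmonics are simply added in phase with the fundamental.

```python
import math

def distorted_tone(freq_hz, hd2_percent, hd3_percent, sample_rate=48000, seconds=0.1):
    """Sine at freq_hz with 2nd and 3rd harmonics added at the given percentages."""
    n = round(sample_rate * seconds)
    h2, h3 = hd2_percent / 100, hd3_percent / 100
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate)
        + h2 * math.sin(2 * math.pi * 2 * freq_hz * i / sample_rate)
        + h3 * math.sin(2 * math.pi * 3 * freq_hz * i / sample_rate)
        for i in range(n)
    ]

def amplitude_at(samples, freq_hz, sample_rate=48000):
    """Amplitude of one frequency component, measured with a single-bin DFT.

    Accurate when the buffer holds a whole number of cycles of freq_hz.
    """
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

if __name__ == "__main__":
    tone = distorted_tone(1000, hd2_percent=5, hd3_percent=3)
    for harmonic in (1, 2, 3):
        level = amplitude_at(tone, 1000 * harmonic)
        print(f"harmonic {harmonic}: {100 * level:.2f}% of full scale")
```

The samples would still need to be scaled and written to an audio file before listening; the measurement function is only there to confirm that the harmonics went in at the intended levels.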
I’ve retested my findings with REW’s tone generator which allows the addition of harmonic distortion. New results below:
- Pure 1000Hz tone
  - 2HD: 0.56% (-45dB)
  - 3HD: 0.20% (-54dB)
- Pure 500Hz tone
  - 2HD: 2.0% (-34dB)
  - 3HD: 1.0% (-40dB)
- Pure 200Hz tone
  - 2HD: 3.5% (-29dB)
  - 3HD: 1.3% (-38dB)
- Pure 50Hz tone
  - 2HD: 7.1% (-23dB)
  - 3HD: 2.5% (-32dB)
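The percentage and dB figures in the list above are the same numbers in two notations, related by dB = 20·log10(percent / 100). A small sketch of the conversion, using the thresholds listed above as input (the function and variable names are just for illustration):

```python
import math

def percent_to_db(percent):
    """Distortion level in dB relative to the fundamental."""
    return 20 * math.log10(percent / 100)

def db_to_percent(db):
    """Inverse conversion: dB relative to the fundamental back to percent."""
    return 100 * 10 ** (db / 20)

# (tone in Hz, harmonic) -> threshold in percent, copied from the list above.
THRESHOLDS_PERCENT = {
    (1000, "2HD"): 0.56, (1000, "3HD"): 0.20,
    (500, "2HD"): 2.0, (500, "3HD"): 1.0,
    (200, "2HD"): 3.5, (200, "3HD"): 1.3,
    (50, "2HD"): 7.1, (50, "3HD"): 2.5,
}

if __name__ == "__main__":
    for (tone_hz, harmonic), percent in THRESHOLDS_PERCENT.items():
        print(f"{tone_hz:5d} Hz {harmonic}: {percent:5.2f}% = {percent_to_db(percent):6.1f} dB")
```

For comparison, the same formula puts 0.1% at -60dB and 1% at -40dB.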
And remember, this is the absolute best case scenario since we’re only dealing in pure tones and not music. Once actual music listening is in the picture, the audibility threshold of such distortions goes way up.
What is the general range of THD you can expect out of a headphone or IEM? Sub-1%. And I’m being very liberal with the “1%” value I’ve chosen here considering that it’s usually the peak of the THD, typically in the sub-50Hz regions; with the right measurement conditions and a low-noise microphone to eliminate distortion artifacts from the environment, the actual distortion measured from the transducer in frequencies 100Hz and upwards would probably be closer to sub-0.1%. A far, far cry from the threshold of audibility, especially once you move away from pure tones and into music listening.
Two conclusions you can draw from here:
- Every other human being is able to pick out sub-0.1% THD in their headphones and differentiate them according to distortion characteristics, and my ears are totally broken.
- THD is irrelevant.
But here’s the thing: don’t take it from me. Take it from actual research into this very topic: the Gedlee papers (paper 1 and paper 2) and research performed by the same team that came up with the Harman Target.
- Gedlee’s research puts the correlation between THD and listener rating at r = -0.423. Loosely correlated, but ultimately not a reliable predictor for sound performance.
- IMD correlation is even lower at r = -0.345.
- Listen Inc’s research concludes that “none of the headphone distortion measurements could reliably predict listener preference based on audible distortion”, in reference to the metrics of THD, IMD and multitone distortion.
- Both studies suggest alternatives to the traditional distortion measurements based on their results; Gedlee with the “Gedlee metric” and Listen Inc with non-coherent distortion. I will not comment on whether they are effective as that is a separate topic altogether.
- Both studies mention the phenomenon of auditory masking as a reason for the irrelevance of THD, with Gedlee even calling it “intuitively obvious”.
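One quick way to read the correlation figures quoted above: under the usual linear reading of a Pearson correlation, r squared is the share of variance in listener ratings that the metric accounts for. The sketch below only does that arithmetic on the two quoted r values; it assumes they are Pearson coefficients and says nothing else about how the studies were run.

```python
def variance_explained(r):
    """Coefficient of determination (r squared) for a simple linear relationship."""
    return r * r

if __name__ == "__main__":
    for label, r in [("THD", -0.423), ("IMD", -0.345)]:
        share = variance_explained(r)
        print(f"{label}: r = {r:+.3f}, r^2 = {share:.3f} (about {share:.0%} of variance)")
```

That works out to roughly 18% for THD and 12% for IMD, leaving the large majority of the variation in listener ratings unaccounted for by either metric.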
Due to all these reasons, I won’t be publishing distortion measurements. I encourage others in the measurement-publishing field to move away from harmonic distortion as well so as not to mislead the general populace, but it’s a free world.
Where the worlds collide
While tonality and technicalities are largely separate, there are some ways in which they influence one another. After all, frequency and time domains meld together to form what we all perceive as sound.
Timbre (pronounced “tAm-buh”)
At least within the context of IEF reviews and in my own personal definition, timbre is tonality with time domain characteristics added on, more specifically decay. Or even more specifically, the pattern of decay. Each instrument decays differently so usually an IEM that excels at one particular style of instruments (bowed strings, plucked strings, percussions, vocals etc.) can’t be expected to perform as well for others.
As mentioned in the Decay section, an IEM needs to strike a balance that works with as many instruments as possible since they can’t make use of room acoustics or the acoustics of one’s own body to create timbre. Though in practice, what happens is that you sometimes get drivers that inherently have a timbre “flavour”; for instance:
- Plastic timbre: Characterised by a hollow sound, sometimes also described as a certain sense of weightlessness in the notes. This sometimes happens when the decay is unusually fast, though it is also exacerbated by a higher-frequency-biased tonality. Balanced armatures are its biggest offenders.
- Metallic timbre: Not necessarily due to an abnormally long decay but rather a ringing (AKA pulsing or oscillating) decay pattern. Misplaced peaks in the treble can also cause this, though the metallic effect can also manifest itself in baritone and bass instruments depending on severity. Anecdotal examples of IEMs with metallic timbre include the Dita Dream, JVC HA-FD01, even the venerable Focal Utopia.
Also known as warmth or lack thereof. What affects how warm an IEM is is a combination of transients and tonality, though more so tonality. To generalise very broadly, an IEM with lower-frequency-biased tonality is more likely to have a lot of warmth while an IEM with a higher-frequency-biased tonality is less likely to have warmth. This likelihood is boosted by the length of decay, wherein more decay translates to more warmth in theory.
This is all within a certain range of course; decay beyond a certain point wouldn’t sound warm anymore but rather turn into mud, though there seem to be some amateurs equating warmth to the muddy effect.
On the other side of the spectrum, I don’t really like to use the term “cold” since it carries a particularly negative connotation in itself; I prefer to go by a scale that starts with “mud” at the worst, “warm” somewhere beyond that, and finally “room temperature” that denotes a lack of warmth. For instance some might describe something like the ER4 as cold; I’d personally just say that it doesn’t have a lot of warmth.
Neither the presence nor absence of warmth is an objective “good” by itself, only how well it plays with one’s personal preferences.
Texture is mostly derived from transients, in particular the shape of the attack and length of decay.
When there is enough decay (though again, not too much), the notes can overlap and so create a “smoothed” effect. Though, it is also possible to be both high-definition and smooth if the transducer strikes the balance appropriately. As mentioned, an IEM that is too smoothed has too much decay, and so the aforementioned detriments of long decay kick in. On the other hand, a textured sound comes from low decay, each note well separated from the other. Too much texture results in the grainy effect, as well as the usual shortcomings of being unnatural.
On the other hand, smoothness and graininess might be associated with tonality, or more specifically harmonic distortion. Even-order distortion is generally considered pleasant as it harmonises on whole octaves and can create this smoothed, “musical” sound. The “tube sound” is commonly associated with second-order distortion, which gives it its distinct signature. Odd-order distortion is generally considered to be destructive due to its relative non-relevance to octave harmony, often being described as giving a fuzzy or grainy effect. As balanced armatures have consistently demonstrated dominant third-order distortion (some as severe as 1% as opposed to the usual 0.01% or lower), this could be an explanation for the grain (or texture) that some hear on BA IEMs, though I’m not optimistic.
(And yes, I understand that all this is hypocritical of me just as I finished denouncing the evils of distortion measurements.)
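The octave relationship behind the even/odd-order description can be checked with quick arithmetic: the nth harmonic sits 12·log2(n) equal-tempered semitones above the fundamental. The sketch below is just that formula; the function names are for illustration.

```python
import math

def harmonic_interval_semitones(order):
    """Distance of the nth harmonic above the fundamental, in equal-tempered semitones."""
    return 12 * math.log2(order)

def is_whole_octaves(order):
    """True when the harmonic lands exactly on an octave of the fundamental."""
    return order > 0 and order & (order - 1) == 0  # powers of two only

if __name__ == "__main__":
    for order in range(2, 9):
        semitones = harmonic_interval_semitones(order)
        note = "whole octaves" if is_whole_octaves(order) else "not an octave"
        print(f"harmonic {order}: {semitones:6.2f} semitones above the fundamental ({note})")
```

The 2nd, 4th and 8th harmonics land exactly on octaves, while the 3rd sits about an octave plus a fifth above the fundamental. The 6th is an even order that is not an octave of the fundamental (it is an octave above the 3rd), so “whole octaves” is best read as a generalisation about the dominant second harmonic.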
Just like temperature, texture is not a measure of objective performance and is all personal taste. However, going too much in any direction (too smoothed, too textured) can be objectively bad and so can kick in as a negative metric for technicalities.
Geddes, E. R., & Lee, L. W. (2003, October). Auditory perception of nonlinear distortion-theory. In Audio Engineering Society Convention 115. Audio Engineering Society.
Temme, S., Olive, S., Tatarunis, S., Welti, T., & McMullin, E. (2014, October). The correlation between distortion audibility and listener preference in headphones. In Audio Engineering Society Convention 137. Audio Engineering Society.
Afterword: Yes I know that scientifically speaking, transducers such as headphones and IEMs are (generally) minimum phase devices. Whatever exists in the time domain will be reflected in the frequency domain for these transducers, so all this talk about transients and time-domain behaviour is technically inaccurate in a truly objective sense. However, I can’t really come up with an alternative for the phenomena that I’ve experienced over the years that I’ve always attributed to time domain stuff, so all the things I’ve talked about here are essentially placeholder terms for the time being.
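The frequency/time link for a minimum phase system can be illustrated with a textbook two-pole digital resonator, where a single pole radius r sets both how tall the frequency response peak is and how long the impulse response rings. This is a generic DSP sketch, not a model of any headphone: the 8 kHz centre frequency, the r values and the function names are arbitrary assumptions.

```python
import cmath
import math

def resonator_gain_db(r, centre_hz, probe_hz, sample_rate=48000):
    """Magnitude response of y[n] = x[n] + 2r*cos(w0)*y[n-1] - r^2*y[n-2] at probe_hz."""
    w0 = 2 * math.pi * centre_hz / sample_rate
    w = 2 * math.pi * probe_hz / sample_rate
    z1 = cmath.exp(-1j * w)  # one-sample delay evaluated on the unit circle
    denominator = 1 - 2 * r * math.cos(w0) * z1 + r * r * z1 * z1
    return -20 * math.log10(abs(denominator))

def ring_time_ms(r, drop_db=60.0, sample_rate=48000):
    """Time for the impulse response envelope (proportional to r**n) to fall by drop_db."""
    samples = (-drop_db / 20) * math.log(10) / math.log(r)
    return 1000 * samples / sample_rate

if __name__ == "__main__":
    for r in (0.9, 0.99, 0.999):  # pole radius; closer to 1 means a sharper resonance
        peak = resonator_gain_db(r, 8000, 8000)
        print(f"r = {r}: peak gain {peak:5.1f} dB, rings for {ring_time_ms(r):6.1f} ms")
```

In this toy case a taller, narrower peak always comes with longer ringing, so the “decay” and the frequency response are two views of the same thing.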
Cheers to the next few audiophiles who will publish the next big thing in headphone acoustic science, and hopefully prove me right.