In-Ear Fidelity

On the Record: “Technical Ability”

Do refer to the updated version of this post instead:

I have a bit of free time before I go on a short hiatus, so I guess here’s Episode 2 of my ramblings (assuming it does become a semi-regular series). Today, I shall talk about one of the “buzzwords” that I like to throw around here: technical ability. This is not necessarily an educational post but more me being transparent about how my critical listening process works. Maybe it could apply to you as well.

* Do note that whatever I write here is still mostly “pseudoscience” in that most of the things here aren’t peer-reviewed or academically researched (or scientifically accurate, for that matter!), more simply being a description of how I personally interpret what I’m hearing and how I assign “markers” to identify how good an IEM is beyond the veil of personal preference and taste.


Transients are also known as “speed” in other circles. I personally define them as comprising the initial “attack” function and the subsequent “decay” function. In the professional world of synthesisers the term “ADSR envelope” is used, which stands for “Attack-Decay-Sustain-Release”. I lump the A-D-S parameters into the “attack” term as a catch-all and rename the Release parameter to “decay” for a few reasons:

  • Given that we’re dealing with transducers rather than instruments, we can assume that the “A” in the ADSR envelope is instantaneous and therefore a fixed variable (and can be ignored).
  • The ADSR envelope breaks down the length of the note’s “hit” as how the attack decays into the sustain. For transducers, I think it can be simplified as “length of attack” assuming the above point is implemented.
  • Having just two parameters (attack and decay) makes things easier to break down and explain.


Alright, so going under the assumption that a transducer (upon receiving the analog signal) will hit maximum SPL instantaneously, the next issue is how long this “point” gets dragged on. Dragging it on for too long subjectively creates a muddy or congested effect. This is what I’d term “length of attack”, “attack length”, “attack speed” and so on. In an ideal situation, the shorter the length of attack, the better.
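The two-parameter idea above can be sketched in a few lines of Python. This is only an illustration, not a measurement method: the exponential shape of the decay, the `decay_tau` time constant, the 100 ms note spacing and every numeric value are assumptions made for the example.

```python
import math

def envelope(t, attack_len, decay_tau):
    """Level (0..1) of a single note, t seconds after it is struck.

    Assumed model: instantaneous rise to full level, held for attack_len
    seconds, then an exponential decay with time constant decay_tau.
    """
    if t < 0:
        return 0.0   # the note has not been struck yet
    if t <= attack_len:
        return 1.0   # still inside the "attack" (the hit itself)
    return math.exp(-(t - attack_len) / decay_tau)

def residual_at_next_hit(attack_len, decay_tau, spacing=0.10):
    """How much of a note is still sounding when the next one is struck."""
    return envelope(spacing, attack_len, decay_tau)

# Same decay, different attack lengths (hypothetical values, in seconds).
short_attack = residual_at_next_hit(attack_len=0.01, decay_tau=0.02)
long_attack = residual_at_next_hit(attack_len=0.08, decay_tau=0.02)
print(f"short attack: {short_attack:.3f}  long attack: {long_attack:.3f}")
```

In this toy model, the longer attack leaves far more of the previous note behind when the next one arrives, which is one way to picture the muddy or congested effect.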

Short attack vs long attack, shown in 3 notes being struck in quick succession:


When most people talk about the speed of the driver, they’re usually referring to the attack function. It’s a much better objective metric after all; shorter = better and there’s little room for argument. On the other hand, decay is a much more fickle metric to talk about; of course, too much decay is quite obviously detrimental to the integrity of the sound, but it’s also not like attack where the shortest is objectively better. Decay is one of the things that requires a very delicate balance.

Low decay contributes to the metric of “definition”. When the notes attack fast and don’t have a lot of linger afterward, the notes are much more clearly distinguished and so better defined. However in real life, nothing has “zero decay”. You bang on a drum, the skin continues to vibrate for a short time after it was struck. Pluck a guitar string and there’s still sustain of the note long after you’ve released. Add the effects of room acoustics and post-processing mastering effects and everything we know about what makes a sound “the sound” isn’t just what tone it produces but also the pattern in which it decays.

Here are the different types of decay visualised in graph form:

Short attack and ideal decay
Low/No decay
Exaggerated long decay

There is a problem in having too little decay; it’s not really representing what you’d probably hear in real life or with a good pair of speakers. Yes, the notes are very well defined but they will sound unnatural. Examples of stuff with short decay are drivers like BAs and electrostats, which have been generally described as having this “ethereal” presence which I would personally attribute to them having way too little linger beyond the initial note. You find yourself wondering if the note was even struck at all because of how fast it disappeared.

Long decay is pretty self-explanatory; you can see that the notes will start smearing into one another and there is very little separation between each attack. Bad drivers are usually the cause of this, perhaps too limp a diaphragm material or too much acoustic resonance within the housing, who knows. The end effect, just like having a long attack, contributes to that muddy/congested sound.
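To put rough numbers on the three decay cases, here is a minimal sketch that asks how loud the tail of one note still is when the next note is struck. It assumes a purely exponential decay, and the three time constants and the 100 ms spacing are hypothetical values chosen only to separate the cases.

```python
import math

def tail_level_db(t, decay_tau):
    """Level of an exponentially decaying tail after t seconds, in dB
    relative to the level at the moment the decay starts."""
    # 20*log10(exp(-t/tau)) written in closed form to avoid underflow.
    return -20 * math.log10(math.e) * t / decay_tau

SPACING = 0.10  # seconds between notes (assumed)

# Hypothetical time constants for the three cases described in the text.
CASES = {"low/no decay": 0.002, "moderate decay": 0.015, "long decay": 0.10}

levels = {name: tail_level_db(SPACING, tau) for name, tau in CASES.items()}
for name, level in levels.items():
    print(f"{name}: {level:.1f} dB at the next hit")
```

The closer that number is to 0 dB, the more the previous note smears into the next one; the more negative it is, the more abruptly the note vanishes.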


It’s obvious but I feel like I have to reiterate this: tonality is derived from the word “tone”, hence it is a metric primarily based within the frequency domain. The frequency response of an IEM hence affects its tonality. It can also be referred to as “tone colour”, though I’d reserve that term for the metric of timbre (explained later).

Breaking it down further, the tone of any given instrument is made up of the fundamental frequency (which is also referred to as the fundamental tone, this determines the note in question) and the harmonic frequencies (which is also referred to as the overtones, these give the instruments their unique sound). For instance, a C4 (middle C) note played by different instruments will look as follows on a frequency response plot:

Comparison of overtones between piano, violin and trumpet playing C4
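As a small worked example of fundamentals and overtones, the sketch below computes the frequency of C4 in equal temperament (assuming A4 = 440 Hz) and lists its first few harmonics. The two overtone profiles are made-up numbers for illustration, not measurements of any real instrument.

```python
def note_frequency(semitones_from_a4, a4=440.0):
    """Equal-temperament frequency of a note a given number of semitones from A4."""
    return a4 * 2 ** (semitones_from_a4 / 12)

def harmonic_series(fundamental, count):
    """Fundamental plus overtones: integer multiples of the fundamental."""
    return [fundamental * n for n in range(1, count + 1)]

C4 = note_frequency(-9)  # middle C sits nine semitones below A4

# Hypothetical relative overtone levels, NOT taken from real instruments.
PROFILES = {
    "instrument A": [1.0, 0.5, 0.25, 0.125],
    "instrument B": [1.0, 0.1, 0.6, 0.05],
}

for name, levels in PROFILES.items():
    partials = zip(harmonic_series(C4, len(levels)), levels)
    print(name, [(round(freq, 1), level) for freq, level in partials])
```

Both profiles share the same frequencies, so they play the same note; only the relative levels of the overtones differ, and that difference is what separates one instrument’s tone from another’s.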

When someone refers to “tonal balance”, that’s also a reference to how accurate an instrument may sound; after all, if the balance between the fundamental and each order of harmonics is correct, so must be the sound. From there, there is the debate on what constitutes “correct” or neutral, which I won’t go into right now. Because the concept of neutrality is so fluid and subjective, especially in the IEM world where an individual’s own Head-Related Transfer Function (HRTF) rears its ugly head the strongest, tonality also becomes a very subjective metric. Thus, I’m very hesitant to judge IEMs based on what sounds “correct”, though I’ll mention it and it will still arbitrarily be a factor in my final rankings.

You hear the term “colouration” getting thrown around a lot but here’s how I see it. When the tonality of the sound gets skewed to any direction, it goes from having a neutral tone to a coloured one. Skewing towards the low frequencies creates a “dark tonality”, while skewing towards the higher frequencies creates a “bright tonality”. Being lower-frequency-biased puts the focus more on the fundamentals and lower-order harmonics, which subjectively gives the instruments some extra richness and heft. On the other hand, being higher-frequency-biased puts the focus more on the higher-order harmonics, which can boost the clarity of the instruments as well as improving the perception of “air”.
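One rough way to picture this skew is to tilt the levels of a note’s harmonics and see where the “centre of mass” of the spectrum ends up. The sketch below uses the spectral centroid for that; treating the centroid as a stand-in for dark versus bright is an assumption of this example, as are the straight-line tilt in dB per octave, the 1 kHz pivot and the equal-level harmonics.

```python
import math

def apply_tilt(partials, tilt_db_per_octave, pivot_hz=1000.0):
    """Scale each (frequency, amplitude) pair by a straight-line tilt in dB/octave."""
    tilted = []
    for freq, amp in partials:
        gain_db = tilt_db_per_octave * math.log2(freq / pivot_hz)
        tilted.append((freq, amp * 10 ** (gain_db / 20)))
    return tilted

def spectral_centroid(partials):
    """Amplitude-weighted mean frequency of a set of partials."""
    total = sum(amp for _, amp in partials)
    return sum(freq * amp for freq, amp in partials) / total

# Eight equal-level harmonics of a 220 Hz note (hypothetical input).
note = [(220.0 * n, 1.0) for n in range(1, 9)]

neutral = spectral_centroid(note)
dark = spectral_centroid(apply_tilt(note, -3.0))    # skewed towards the lows
bright = spectral_centroid(apply_tilt(note, 3.0))   # skewed towards the highs
print(f"dark {dark:.0f} Hz, neutral {neutral:.0f} Hz, bright {bright:.0f} Hz")
```

A downward tilt pulls the weight towards the fundamental and lower-order harmonics, while an upward tilt pushes it towards the higher-order ones.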

Further Effects


To put it simply, timbre is tonality with time domain, more specifically decay. Or even more specifically, the pattern of decay. Each instrument decays differently, so usually an IEM that excels at one particular style of instrument (bowed strings, plucked strings, percussion, vocals etc.) can’t be expected to perform as well for others. As mentioned in the Decay section, an IEM needs to strike a balance that works with as many instruments as possible since they can’t make use of room acoustics or the acoustics of one’s own body to create timbre. In practice, though, what happens is that you sometimes get drivers that inherently have a timbre “flavour”; for instance:

  • Plastic timbre: Characterised by a hollow sound, sometimes also described as a certain sense of weightlessness in the notes. This sometimes happens when the decay is unusually fast, though it is also exacerbated by a higher-frequency-biased tonality. Balanced armatures are its biggest offenders.
  • Metallic timbre: Not necessarily due to an abnormally long decay but rather a ringing (AKA pulsing or oscillating) decay pattern. Misplaced peaks in the treble can also cause this, though the metallic effect can also manifest itself in baritone and bass instruments depending on severity. Anecdotal examples of IEMs with metallic timbre include the Dita Dream and the JVC HA-FD01.


Temperature is also known as warmth, or the lack thereof. What affects how warm an IEM is is a combination of transients and tonality, though more so tonality. To generalise very broadly, an IEM with lower-frequency-biased tonality is more likely to have a lot of warmth while an IEM with a higher-frequency-biased tonality is less likely to have warmth. This likelihood is boosted by the length of decay, wherein more decay translates to more warmth in theory. This is all within a certain range of course; decay beyond a certain point wouldn’t sound warm anymore but would rather turn into mud, though there seem to be some amateurs equating the muddy effect with warmth.

I don’t really like to use the term “cold” since it denotes a particularly negative connotation in itself; I prefer to go by a scale that starts with “mud” at the worst, “warm” somewhere beyond that, and finally “room temperature” that denotes a lack of warmth. For instance some might describe something like the ER4 as cold; I’d personally just say that it doesn’t have a lot of warmth. Neither the presence nor absence of warmth can be objectively good, only how well it plays with one’s personal preferences.


Texture is mostly derived from transients, in particular the shape of the attack and length of decay. When there is enough decay (though again, not too much), the notes can overlap and so create a “smoothed” effect. It is also possible, though, to be both high-definition and smooth if the transducer strikes the balance appropriately. As mentioned, an IEM that is too smoothed has too much decay, and so the aforementioned detriments of long decay kick in. On the other hand, a textured sound comes from low decay, with each note well separated from the others. Too much texture results in a grainy effect, as well as the usual shortcoming of sounding unnatural.

On the other hand, smoothness and graininess can also be associated with tonality, or more specifically harmonic distortion. Even-order distortion is generally pleasant as it harmonises on whole octaves and can create this smoothed, “musical” sound. The “tube sound” is commonly associated with second-order distortion, which gives it its distinct signature. Odd-order distortion is generally considered to be destructive due to its relative non-relevance to octave harmony, often being described as giving a fuzzy or grainy effect. As balanced armatures have consistently demonstrated dominant third-order distortion (some as severe as 1% as opposed to the usual 0.01% or lower), this could be an explanation for the grain (or texture) that some hear on BA IEMs.
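Where second- and third-order distortion products come from can be shown with a toy nonlinearity. The sketch below passes a pure sine through `x + second*x**2 + third*x**3` and reads off the harmonic levels with a single-bin DFT. The polynomial model and its coefficients are assumptions for illustration and do not describe any real driver.

```python
import cmath
import math

def distorted_sine(second=0.0, third=0.0, n=1024, cycles=8):
    """A unit sine (whole number of cycles) through a toy polynomial nonlinearity."""
    out = []
    for i in range(n):
        x = math.sin(2 * math.pi * cycles * i / n)
        out.append(x + second * x ** 2 + third * x ** 3)
    return out

def harmonic_amplitude(samples, cycles, order):
    """Amplitude of the given harmonic order, via a single-bin DFT."""
    n = len(samples)
    bin_index = cycles * order
    acc = sum(x * cmath.exp(-2j * math.pi * bin_index * i / n)
              for i, x in enumerate(samples))
    return 2 * abs(acc) / n

even = distorted_sine(second=0.02)  # squared term only (hypothetical coefficient)
odd = distorted_sine(third=0.04)    # cubed term only (hypothetical coefficient)

for name, signal in (("squared term", even), ("cubed term", odd)):
    h1, h2, h3 = (harmonic_amplitude(signal, 8, k) for k in (1, 2, 3))
    print(f"{name}: H2/H1 = {h2 / h1:.4%}, H3/H1 = {h3 / h1:.4%}")
```

The squared term produces only a second harmonic (one octave above the fundamental), while the cubed term produces only a third harmonic (an octave plus a fifth above).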

Just like temperature, texture is not a measure of objective performance and is all personal taste. However, going too much in any direction (too smoothed, too textured) can be objectively bad and so can kick in as a negative metric for technicalities.


Yes, I know that scientifically speaking, transducers such as headphones and IEMs are (generally) minimum phase devices. Whatever exists in the time domain will be reflected in the frequency domain for these transducers, so all this talk about transients and time domain is technically completely inaccurate in a truly objective sense. However, I can’t really come up with an alternative for the phenomena that I’ve experienced over the years that I’ve always attributed to time domain stuff, so all the things I’ve talked about here are essentially placeholder terms for the time being.
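One concrete instance of that time/frequency link, assuming a single second-order resonance (a textbook model, not a claim about any particular IEM): the narrower a peak is in the frequency response, the longer it rings in time. The function names and the 8 kHz example peak below are hypothetical.

```python
import math

def bandwidth_hz(center_hz, q):
    """Approximate -3 dB width of a second-order resonance peak."""
    return center_hz / q

def ring_time_constant(center_hz, q):
    """Seconds for the ringing envelope to fall to 1/e of its starting level.

    For a second-order resonance the envelope follows exp(-pi * f0 * t / Q).
    """
    return q / (math.pi * center_hz)

# A hypothetical treble peak at 8 kHz, once broad and once narrow.
for q in (2.0, 20.0):
    width = bandwidth_hz(8000.0, q)
    tau_ms = ring_time_constant(8000.0, q) * 1000
    print(f"Q={q:g}: peak width {width:.0f} Hz, ring time constant {tau_ms:.3f} ms")
```

In this model the time constant is simply 1 / (pi * bandwidth), so the “decay” and the shape of the peak are two views of the same thing.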

Cheers to the next few audiophiles who will publish the next big thing in headphone acoustic science, and hopefully prove me right.

5 thoughts on “On the Record: “Technical Ability””

  1. “Yes I know that scientifically speaking, transducers are minimum phase devices. Whatever exists in time domain will be reflected in the frequency domain for headphones, so all this talk about transients and time-domain are technically completely inaccurate in a truly objective sense. ”

    But, of course, it’s not actually objective. The music was all recorded to sound “correct” on a certain setup in a recording studio, and the producer/engineer had a “target audience,” whether they realized it or not, of a certain listening environment. Usually that target is a room with speakers. Both of these subjectivities are encoded into the music, and later recovered by our earphones.

    If EDM producers were trying to get their tunes to sound good on very fast speakers, I wouldn’t keep experiencing the decay of my BA earphones as too short. But they target these massive soundsystems with big-diameter subwoofers. Talking about a transducer’s effects in the time domain is actually a commentary on how closely the transients as heard through the transducer track the producer’s subjective intent, as reflected in the recording.

  2. You’re a genius, man. You explained these terms, which wannabe audiophiles always throw about with no clue, very well for the usual person. Dynamic drivers are the best if made with light yet strong cones, but one driver can’t cover the entire spectrum while maintaining within-tolerance SPL with low distortion and high instrument separation. For speaker systems, 3 well-designed drivers to handle different parts of the spectrum are more than enough. It’s all a tradeoff. For example, subwoofers have to have heavy cones (high Mms) to really dig into the lows and eat much more power as a consequence.

    When it comes to IEMs, BAs offer high detail simply because the attack and decay are very short compared to dynamics. Even when the artist didn’t really intend it, there is a sense of space for a particular set of notes just because “silence” comes back so quickly after the driver has finished its job hitting the note. The glaring side-effect is a complete change of timbre. One has to work on getting the harmonic distribution somewhat good to keep the sound lifelike. But the problem with trying to modify that on the driver level is that the D2, D3 distortion increases. You will easily find an inexpensive single DD IEM doing 100+ dB across the spectrum all day with distortion you can’t put a finger on. A single BA IEM, even if reasonably well made, will have at least five to ten times the distortion trying to cover all those Hz.

    I consider Electrostats to be at the top of the summit, as far as “personal sound” is considered, only one has to adapt to the crazy sense of space even when there was close to none mixed into the track, because of virtually negligible decay. Result is, you will get spoilt big-time. The sound is not “accurate”, “reference”, or “exactly as intended”, but it’s really high on emotion. Cons are “exclusivity” in terms of compatibility and portability worries. I shouldn’t mention price, right?

    Planars seem to combine all the stuff we’re looking for without “maxing out all the points in one skill”. Especially in IEMs, recent developments have been really juicy. Cons are next to no isolation for headphones, hunger for power. Yes, I won’t mention that again.

    Keep up the good work.

  3. Hi, Thanks for making this amazing website!

    “However, I can’t really come up with an alternative for the phenomena that I’ve experienced over the years that I’ve always attributed to time domain stuff”

    You might be interested in Earthworks Audio QTC series microphone technology, maybe it could explain it.

    Thanks 🙂
