
Hoots : Why is automatic key detection hard? - freshhoot.com


Why is automatic key detection hard?
From what I understand, it is difficult for automatic algorithms to figure out the key of a recorded piece of music. For example, in one review of a range of professional "DJ" software that had a key detection feature, the software typically scored about 50%, only guessing the correct key half the time.

I guess I am confused by this. If we can have software like Auto-Tune that can adjust music to be in the correct key, why is it so difficult to detect the key in the first place?



4 Comments


Even individual notes have multiple names - did I just play G sharp or A flat? The same group of notes may be named as any of several chords. If I play C, E flat, G, B flat, do I mean Cm7 or E flat 6? This depends on which key the composition is in, and that "key" depends on the overall context of (usually all) the other notes and chords in the composition. The idea of a "key" is an abstract mental model that we overlay onto an acoustic phenomenon to better understand its structure. That acoustic phenomenon (the musical composition) may be very complex, and no matter whether it is mathematically precise and regular or loose and free-form, our model of it will necessarily be an approximation. Stated otherwise, the music itself is a physical reality (vibrations in the real world); the "key" exists only in our minds. Automatic key detection is one of those things scientists call a "hard problem".



The problem is hard because composers make it so.

Arguably, the very definition of music composition is to obscure and interpret. Half of the techniques we learn have to do with prolongation, suspension, evasion, finding ways to make dissonant notes seem consonant, etc. If you subscribe to Schenkerian analysis you might even believe all true masterworks are nothing more than a (very complex) embellishment of I.

Interesting music is designed to challenge your ear to find its inner reason. And sometimes it isn't even there. It's no wonder computers have trouble with it.



"I guess I am confused by this. If we can have software like Auto-Tune that can adjust music to be in the correct key, why is it so difficult to detect the key in the first place?"

Without getting into any of the particulars of automated key detection and its difficulties, I think the answer to this question is fairly simple:

Auto Detect needs a frame of reference - a baseline - from which to work. Music in any key has a particular pattern reflected in its notes. When we transpose to a different key, we duplicate that pattern, just using different notes.

A very simple example:

The pattern for the major scale is:

Tonic->Whole Step->Whole Step->Half Step->Whole Step->Whole Step->Whole Step->Half Step==Octave.

Following that pattern starting on the note C, we get C->D->E->F->G->A->B->C.

But we can just as easily apply that pattern starting with the note D, giving us
D->E->F#->G->A->B->C#->D.

Replicating a pattern from different starting points is a great job for a computer - it's one of the things they do best - a fundamental computing operation. This is because it requires no original thought or analysis - it's a simple, mechanical/mathematical process.
So, once we have music in an established key, we can easily tell a computer program to transpose that music to a different key - simply replicate the pattern with a different set of notes.
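
To make the "replicate the pattern" point concrete, here is a minimal Python sketch. The names (NOTE_NAMES, major_scale, transpose) are made up for illustration, and it spells every note with sharps only, which is a simplification rather than how real notation software names notes.

```python
# Names for the 12 pitch classes, sharps only (an illustrative simplification).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Major-scale pattern in semitones: whole, whole, half, whole, whole, whole, half.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]


def major_scale(tonic):
    """Return the major scale starting on `tonic`, ending on the octave."""
    index = NOTE_NAMES.index(tonic)
    scale = [tonic]
    for step in MAJOR_STEPS:
        index = (index + step) % 12
        scale.append(NOTE_NAMES[index])
    return scale


def transpose(notes, from_tonic, to_tonic):
    """Shift every note by the interval between the two tonics."""
    shift = NOTE_NAMES.index(to_tonic) - NOTE_NAMES.index(from_tonic)
    return [NOTE_NAMES[(NOTE_NAMES.index(n) + shift) % 12] for n in notes]


print(major_scale("C"))  # ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
print(major_scale("D"))  # ['D', 'E', 'F#', 'G', 'A', 'B', 'C#', 'D']
print(transpose(["C", "E", "G"], "C", "D"))  # ['D', 'F#', 'A']
```

Note that transpose has to be told the starting key. Nothing in it works out what that key is, which is the hard part.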

But detecting the original key with no pre-existing frame of reference to work with and replicate is a very different and much more difficult job for a computer. It requires analysis and discernment and judgment. There is no pattern to be replicated. It requires a great deal of information and knowledge about music to determine the key of a piece, and it is sometimes quite ambiguous. In order for a program to accurately determine the key of a piece of music, it must have all of that knowledge available and be able to use it to arrive at the proper key. As a software developer, I can tell you that is a difficult computing problem indeed - getting it right is no mean feat.



There are a few factors at play here:

Let's assume that we have a magical piece of software, which can listen to audio and tell us exactly what notes are being played. Even given this software, determining key is not a trivial problem. Sure, there are simple cases, but even humans disagree over many songs. A computer has no chance.

Take Sweet Home Alabama. The chords are D C G. Many electrons have been wasted arguing over whether this is a V IV I in G Major or a I bVII IV in D Major. I personally think it's in the key of "please never play that again", so I avoid analysing the infernal thing too closely.

Or take Hey Jude. The na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na na bit. If we transpose a bit, the chords are also D C G. But that's pretty clearly a I bVII IV in D major. Context is important, and building an algorithm to automatically determine that context is a complex problem.
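
To see why the notes alone don't settle it, here is a small Python sketch of a deliberately naive approach: collect the notes of the D, C and G major triads and ask which major scales contain all of them. The function names are hypothetical, and the "does the scale contain every note" test is an assumption made for illustration, not a description of how any real key detector works.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_SCALE_OFFSETS = {0, 2, 4, 5, 7, 9, 11}  # semitones above the tonic
MAJOR_TRIAD_OFFSETS = (0, 4, 7)               # root, major third, fifth


def triad_pitch_classes(root):
    """Pitch classes (0-11) of the major triad built on `root`."""
    base = NOTE_NAMES.index(root)
    return {(base + offset) % 12 for offset in MAJOR_TRIAD_OFFSETS}


def major_keys_containing(pitch_classes):
    """Every major key whose scale contains all of the given pitch classes."""
    return [
        NOTE_NAMES[tonic]
        for tonic in range(12)
        if all((pc - tonic) % 12 in MAJOR_SCALE_OFFSETS for pc in pitch_classes)
    ]


notes = set()
for chord_root in ["D", "C", "G"]:
    notes |= triad_pitch_classes(chord_root)

print([NOTE_NAMES[pc] for pc in sorted(notes)])  # ['C', 'D', 'E', 'F#', 'G', 'A', 'B']
print(major_keys_containing(notes))              # ['G']
```

The note set is exactly the G major scale, so this check answers "G" for both songs, including the one that is pretty clearly heard in D. The information that separates them (which chord feels like home) is not in the set of pitches at all.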

So, we've established that 100% of the surveyed songs with a D C G progression are annoying. The next part of the problem is actually getting a list of pitches to do this key recognition.

You'll notice that I used the word "magical" in the previous section. Most pitch recognition software will do some sort of frequency analysis. Basically, they grab a section of audio, and determine what frequencies are present. We know the frequency of every note, so we can map that list of frequencies to a list of pitches.
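
The frequency-to-pitch mapping itself is the easy bit. A minimal Python sketch, assuming twelve-tone equal temperament with A4 tuned to 440 Hz (both assumptions; the function name is made up for this example):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0  # assumed tuning reference


def frequency_to_note(freq_hz):
    """Name of the equal-tempered note nearest to `freq_hz`."""
    # MIDI numbering: A4 is note 69, and each semitone is a factor of 2**(1/12).
    midi = round(69 + 12 * math.log2(freq_hz / A4_HZ))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


print(frequency_to_note(440.0))   # A4
print(frequency_to_note(261.63))  # C4
print(frequency_to_note(466.16))  # A#4
```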

Not so fast. Unfortunately, when an instrument plays a note, it produces more than one frequency. That's why a piano doesn't sound like a guitar. Some of those frequencies will be harmonic; that is, whole-number multiples of the fundamental frequency. Others will not. If the instrument is not pitched (like untuned percussion, or a noise sweep), there will be lots of these inharmonic frequencies.

If you have a complete track, separating all these frequencies, determining which are pitches, and which are harmonics, is non-trivial. It's kind of like trying to separate the ingredients of a milkshake once they are mixed. It's certainly possible to get a good approximation, but hard to actually tell exactly what was being played. The (trained) human ear is much better at this task than computers.
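
Here is a small Python sketch of why harmonics muddy the water. It assumes an idealised instrument whose overtones are exact whole-number multiples of the fundamental, and equal temperament with A4 = 440 Hz; the helper names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0  # assumed tuning reference


def frequency_to_note(freq_hz):
    """Name of the equal-tempered note nearest to `freq_hz`."""
    midi = round(69 + 12 * math.log2(freq_hz / A4_HZ))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def harmonic_notes(fundamental_hz, count):
    """Nearest note names for the first `count` harmonics of one played note."""
    return [frequency_to_note(fundamental_hz * k) for k in range(1, count + 1)]


# One single low A (110 Hz), as a naive frequency-to-note mapper would see it.
print(harmonic_notes(110.0, 6))  # ['A2', 'A3', 'E4', 'A4', 'C#5', 'E5']
```

One played note shows up as A, C# and E, which looks just like an A major chord. Software has to decide which peaks are notes and which are overtones, and real instruments are messier than this idealised case.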

Now, to be fair, if you're just trying to determine the key (rather than transcribe every note), this problem is easier to solve. I don't care who is playing what note; just the overall harmonic structure. But there's still plenty of room for your computer to make mistakes here.
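
As an illustration of working from "overall harmonic structure", here is a sketch of one well-known family of approaches, usually called Krumhansl-Schmuckler key finding. This is not necessarily what any DJ software does. The idea: count how much of each of the 12 pitch classes occurs, then correlate that histogram with a weighting profile for each of the 24 major and minor keys. The weights below are the Krumhansl-Kessler ratings as they are usually quoted; treat them as an assumption and check them against a reference before relying on them.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Key profiles, indexed by semitones above the tonic (assumed values, see above).
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def estimate_key(histogram):
    """Best-correlating key for a 12-entry pitch-class histogram (index 0 = C)."""
    best_score, best_key = None, None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so its tonic weight lines up with `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(histogram, rotated)
            if best_score is None or score > best_score:
                best_score, best_key = score, f"{NOTE_NAMES[tonic]} {mode}"
    return best_key


# Each note of the C major scale heard equally often, nothing else.
print(estimate_key([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]))  # C major
```

Even this only returns the best statistical fit. Feed it a histogram built from wrongly detected pitches, or from a song where listeners disagree, and it will still confidently print one key.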

A couple of comments have observed that even if you have a list of pitches, converting them to note names requires some idea of the key. This is because, in the vast majority of Western music, we have the concept of enharmonics. Basically, A# and Bb are the same frequency, and we choose the name based on the key.

For a lot of music, this isn't really a big problem. For example, here's a set of pitches:

A#/Bb/Cbb B#/C/Dbb C##/D/Ebb D#/Eb/Fbb E#/F/Gbb F##/G/Abb G##/A/Bbb

It's pretty obvious that this is Bb Major. You could call it A# Major, but that's a much more complicated way to spell the scale, so we don't. Equally, Cbb Major is not a good name. That sort of heuristic is quite easy to add to software, so in this simple case, it's not really a problem.

It could be more problematic when there are two equally right options, like F# Major vs Gb major. Again, either is correct, so you just pick one.
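
A rough Python sketch of that kind of spelling heuristic: try each plausible name for the tonic, spell the major scale with one letter per degree, and keep the spelling with the fewest sharps and flats. The function names and the "fewest accidentals" rule are assumptions for illustration; flats are written with a lowercase b.

```python
LETTERS = "CDEFGAB"
NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_OFFSETS = [0, 2, 4, 5, 7, 9, 11]  # semitones above the tonic


def accidental(letter, pitch_class):
    """Semitones from the natural letter to the pitch (+ is sharp, - is flat)."""
    diff = (pitch_class - NATURAL_PC[letter]) % 12
    return diff - 12 if diff > 6 else diff


def spell(letter, pitch_class):
    """Spell `pitch_class` using `letter` plus as many # or b as needed."""
    shift = accidental(letter, pitch_class)
    return letter + ("#" * shift if shift > 0 else "b" * -shift)


def spell_major_scale(tonic_letter, tonic_pc):
    """Major scale on `tonic_pc`, using each letter once from `tonic_letter`."""
    start = LETTERS.index(tonic_letter)
    return [
        spell(LETTERS[(start + degree) % 7], (tonic_pc + offset) % 12)
        for degree, offset in enumerate(MAJOR_OFFSETS)
    ]


def simplest_major_spelling(tonic_pc):
    """The candidate spelling with the fewest accidentals (first one wins ties)."""
    candidates = [
        spell_major_scale(letter, tonic_pc)
        for letter in LETTERS
        if abs(accidental(letter, tonic_pc)) <= 2
    ]
    return min(candidates, key=lambda scale: sum(len(name) - 1 for name in scale))


print(spell_major_scale("A", 10))   # ['A#', 'B#', 'C##', 'D#', 'E#', 'F##', 'G##']
print(simplest_major_spelling(10))  # ['Bb', 'C', 'D', 'Eb', 'F', 'G', 'A']
print(simplest_major_spelling(6))   # F# and Gb tie at six accidentals; this picks F#
```

For the tie case the sketch just keeps whichever candidate it meets first, which matches the "just pick one" approach.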

If the key is more ambiguous, then this could be more of an issue. But I think the other problems are much more significant.

Finally, on Auto-Tune. Auto-Tune's job is easier for a couple of reasons. Firstly, it's going in the other direction. It has a set of "good" notes (semitones, or a user-specified key), and moves any "bad" notes accordingly. It doesn't have to assign a key. Secondly, you generally autotune a single isolated instrument. That's much easier to handle than a complete mix. I don't know what Auto-Tune will do if you run it over the whole mix at once, but I don't think it will be pretty.
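
To show how much simpler that direction is, here is a toy Python sketch of the "move bad notes to good notes" idea. It is not Auto-Tune's actual algorithm, just the snapping step: given one detected frequency and a set of allowed pitch classes supplied by the user, return the frequency of the nearest allowed note. It assumes equal temperament with A4 = 440 Hz, and snap_to_scale is a made-up name.

```python
import math

A4_HZ = 440.0                      # assumed tuning reference
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}   # allowed pitch classes (0 = C), chosen by the user
CHROMATIC = set(range(12))         # "any semitone is fine"


def snap_to_scale(freq_hz, allowed_pitch_classes):
    """Frequency of the allowed note nearest to `freq_hz` (set must not be empty)."""
    exact = 69 + 12 * math.log2(freq_hz / A4_HZ)  # fractional MIDI note number
    center = round(exact)
    # Any non-empty set has a member within 6 semitones of the nearest note.
    candidates = [
        midi
        for midi in range(center - 6, center + 7)
        if midi % 12 in allowed_pitch_classes
    ]
    target = min(candidates, key=lambda midi: abs(midi - exact))
    return A4_HZ * 2 ** ((target - 69) / 12)


print(round(snap_to_scale(450.0, CHROMATIC), 2))  # 440.0  (slightly sharp A4 pulled down)
print(round(snap_to_scale(470.0, C_MAJOR), 2))    # 493.88 (a sharp A#4 pushed up to B4)
```

The key goes in as an input (C_MAJOR here). At no point does the code have to work out what the key is.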

In short:

Even given a list of all the notes/chords, key detection is non-trivial
Getting that list of notes and chords automatically is not a reliable process

As a result, computers can certainly attempt automatic key recognition, and get close in a lot of cases, but it's unlikely that they will ever be 100% accurate. If someone would like to prove me wrong, I'd love a free copy of your software to verify your claims. For scientific purposes, of course.

