This is hopefully the first post in a series about useful AI applications for music, meant for people who are not coders or data scientists. First, we’ll look at a very useful case: automatic, high-quality separation of music into stems.
A lot of research is currently being done on deep learning and related topics, with very exciting outcomes for musicians. The problem is that not many user-friendly applications are available yet.
Spleeter is probably the only source separation algorithm the average musician is familiar with, if any, because several GUIs for it have been released over the past few years. The algorithm only dates from 2019, but it’s already somewhat old technology.
Here is a chart of the subjective quality of different source separation algorithms. It shows that several of the more recent algorithms yield better results than Spleeter.
Specifically, Hybrid Demucs scores very highly, so it’s the one we’ll be looking at.
The results sound like this:
It’s quite amazing that it can preserve the flanger effect on the drums in that Pentangle track, along with many other details that I’d imagine tend to get lost in the separation, especially since I doubt the model was trained on effected drum tracks. And listen to how punchy and well-defined the drum track is on that Yes track! Amazing.
Since the “other” track is essentially the remainder after the drums, vocals and bass are removed, it can sometimes sound somewhat blurry, since transient details may end up in the other tracks.
I’m not an expert on the subject, but I believe that because of frequency masking once several sources are mixed together, perfect reconstruction without adding synthesized partials/harmonics might be impossible. And of course, if you do add them, the parts won’t mix back into the original audio. I think machine learning algorithms also exist for that purpose (enhancing audio that is missing harmonic content), but I’ll need to check.
Usually, software like this is only available on GitHub, and most commonly you need to have Python installed to run the code. In every case I’ve seen, there is no GUI, which means the software is used from the command line.
Getting Python to run is surprisingly hard, since you suddenly also have to deal with virtual environments and all kinds of other issues. I won’t go into details, but there are more potential pitfalls than one would imagine.
So with that said, I’m not even going to write about running the code on your own computer. If you feel resourceful and want to try, the Internet has resources available.
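(For the resourceful: on a typical Linux or macOS machine the local route boils down to roughly the following sketch. The filename is just a placeholder, and the exact steps vary by system, which is exactly why I’m not covering it in detail.)

```shell
# Create and activate an isolated virtual environment (recommended,
# so the install doesn't interfere with the system Python)
python3 -m venv demucs-env
source demucs-env/bin/activate

# Install Demucs from PyPI, then separate a track;
# by default the stems are written under ./separated/
pip install demucs
demucs mysong.mp3
```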
The thing I’d like to present here is Google Colab notebooks, an absolutely fantastic service provided for free, where someone has prepared the code so that it can be run easily in the cloud. This has the added benefit that powerful GPUs are provided, which is essential for machine learning work. For someone like me, who doesn’t even have a GPU at home, this makes machine learning much more accessible.
Usually the notebooks include some instructions on how to use them, and often a rudimentary GUI is available for interacting with files and so on.
Keep in mind that things can break, notebooks can be taken offline, and so on, but at least lately, with a bit of search-engine work, web interfaces for many interesting things can be found. On Colab you can also freely edit the code, which is great for more advanced users. There are also sites like Replicate and Hugging Face, where people can deploy their AI code, but so far I haven’t seen many audio projects there. Hybrid Demucs actually has a deployment, but for now it’s broken.
The link to the Colab can be found in the description of the project, but here’s a direct link.
Usually it’s good practice to scroll quickly through the notebook and read all of the instructions first, so you’ll have an idea of the prerequisites. With this notebook, the files to be processed must reside on your Google Drive in a directory called “demucs” (case sensitive). It supports all the usual audio extensions, so go ahead: create that directory in the root of your Google Drive and upload some music there.
The first thing you will see in the notebook is this cell. You run cells by pressing the small “play” icon that appears when you hover over the brackets.
Some messages about the progress of the installation (which happens remotely, in the cloud) will appear, and they should eventually end with this:
Do as indicated, and press that button (or do it through the menu on top of the screen) to restart the runtime.
Proceeding to the next cell, the notebook will request access to your Google Drive. If you’re suspicious, you can always check the code to see what it does. Notebooks linked straight from reputable project pages I consider safe, but if you find a random notebook through an internet search, it’s prudent to at least glance at the code to make sure it doesn’t do anything malicious.
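For reference, the Drive-access cell in a Colab notebook typically looks something like this (this only runs inside Colab, where the `google.colab` package is available, so treat it as an illustration rather than something to run at home):

```python
# Mount your Google Drive into the Colab runtime; Colab will ask you
# to authorize access the first time this runs.
from google.colab import drive

drive.mount('/content/drive')

# After mounting, your Drive appears under /content/drive/MyDrive,
# so the input folder would be /content/drive/MyDrive/demucs
```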
After this, it’s just a matter of running the cell that contains the separation code, and hoping that you’re connected to a very fast GPU! You never know beforehand which kind you’ll get, so the processing speed can vary a lot.
On the free tier there’s also a limit on GPU time, so if you do hours of processing in a day, at some point you’ll be shut out from the GPUs for 24 hours or so. That is, unless you buy a subscription, of course 🙂 Personally, the free GPU time has always been enough for me.
After everything is finished, you can just download everything from your Google Drive and start planning the next steps: transcribing everything, including the percussion, to MIDI. Quite good tools are available for that too, and I will cover them in a later post!
I think the things Hybrid Demucs does are close to magic. The vocals sometimes retain even subtle reverbs and tiny breath sounds. The percussion is generally snappy, with good attack phases. There’s very little spectral smearing. Compare that to Spleeter, where the separated tracks often have considerable spectral artifacts and the drum tracks tend to lose their attack.
Apart from the obvious bootleg remixing, the use cases are many. Maybe there’s an album you like where the vocals are mixed a bit too low. Maybe you want to slap a punishing autotune on everything. Or maybe you just want to see and hear more analytically what your favourite musicians are doing.
One thing to keep in mind: although the stems should sum back to the original audio, they still have some bleed, so editing them and then mixing them back together might do weird things.
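A toy example illustrates the point (synthetic signals standing in for real Demucs output, not actual stems): when the stems are an exact split of the mix they sum back to the original, but the moment you edit one stem, any bleed it contains is baked into the resummed mix along with the edit.

```python
import numpy as np

# Toy stand-ins for separated stems: one second of a "mix" split into parts.
rng = np.random.default_rng(42)
mix = rng.standard_normal(44100).astype(np.float32)

drums  = 0.4 * mix
bass   = 0.3 * mix
vocals = 0.2 * mix
other  = mix - (drums + bass + vocals)  # "other" is whatever remains

# Untouched stems sum back to the original (up to floating-point error)...
print(np.allclose(drums + bass + vocals + other, mix))  # True

# ...but boost the drum stem by 3 dB and the sum is no longer the mix;
# in a real separation, any vocal bleed hiding in the drum stem would
# get boosted right along with the drums.
boosted_drums = drums * 10 ** (3 / 20)
print(np.allclose(boosted_drums + bass + vocals + other, mix))  # False
```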
But aren’t the weird things what we’re doing this for? ;-)