audio crafting #2 – Image editing of sounds

Hello, and welcome to the second installment of my audio crafting series.

This series aims to teach you the fine craft of sound design by hand, and to promote free, non-commercial tools.

Ever wondered what it would be like to edit sounds visually, just like paintings?

In case you are not yet familiar with the technology, I have good news: you can!

Mac users might be familiar with MetaSynth, and Photosounder opens up the possibilities for Windows users. But there is also a free cross-platform tool called VirtualANS, available for all major operating systems. I am using the Linux version, but the other versions should behave identically.

So let’s start with a speech sample here:

I found that the resynthesis actually sounded better for this speech sample with the low sound quality option; the “high quality filter bank” option resulted in some ringing. I chose 2048 pixels for the height, and 64 pixels per beat with a BPM of 141. Higher settings give a more accurate representation of the sound, while a low pixel height gives a more robotic sound.

In the import window I selected only 6 octaves, with around 150 Hz as the lowest frequency, since the original YouTube source did not contain any frequency data above octave 7. Now the data nicely fills the whole height of the image.
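To get a feel for what these import settings mean, here is a small back-of-the-envelope sketch. It assumes (my assumption, not something VirtualANS documents here) that the selected octaves are spread evenly, on a logarithmic scale, over the image height, and that pixels per beat times BPM gives the number of pixel columns per second. The function name is hypothetical.

```python
def describe_import(height_px, octaves, lowest_hz, px_per_beat, bpm):
    """Rough resolution figures for a log-frequency spectrogram image.

    Assumes the octaves are spread evenly over the image height and that
    one beat at the given BPM spans px_per_beat pixel columns.
    """
    highest_hz = lowest_hz * 2 ** octaves           # top edge of the image
    px_per_semitone = height_px / (octaves * 12)    # vertical resolution
    columns_per_sec = px_per_beat * bpm / 60        # horizontal resolution
    ms_per_column = 1000 / columns_per_sec
    return highest_hz, px_per_semitone, columns_per_sec, ms_per_column


top, v_res, cols, ms = describe_import(2048, 6, 150, 64, 141)
print(f"top edge ~{top:.0f} Hz, {v_res:.1f} px per semitone")  # top edge ~9600 Hz, 28.4 px per semitone
print(f"{cols:.1f} columns per second, {ms:.2f} ms per column")  # 150.4 columns per second, 6.65 ms per column
```

With these settings every semitone gets roughly 28 pixel rows and every pixel column lasts under 7 ms, which is why the image resolves speech reasonably well. Halving the height halves the rows per semitone, which is one way to get the more robotic sound mentioned above.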


Here is the image, exported as PNG and opened in GIMP:


And here I have created a grid with a spacing of 1/16 of the image width, so that I could sync the sounds to 4/4 time if needed.
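The grid arithmetic is simple, but it is worth checking whether the grid steps actually land on beats. A minimal sketch, using a hypothetical image width and hypothetical function names:

```python
def grid_lines(width_px, divisions=16):
    """x positions of vertical guides splitting the image into equal steps."""
    return [round(i * width_px / divisions) for i in range(divisions + 1)]


def beats_per_step(width_px, px_per_beat, divisions=16):
    """How many beats one grid step spans at the import tempo."""
    return width_px / divisions / px_per_beat


# Hypothetical 1024 px wide image at 64 px per beat: 16 beats = four bars of 4/4
print(grid_lines(1024)[:4])      # [0, 64, 128, 192]
print(beats_per_step(1024, 64))  # 1.0
```

If the image width is not a multiple of 16 times the pixels per beat, the grid steps fall between beats, so it can pay to crop or pad the image first.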


I then applied a few transforms: I rotated two of the layers slightly, moved them around, and applied a mosaic effect and left/right mirroring to the upper harmonics of the final words. This shows how strongly the human brain is conditioned to understand speech. When the spectrum changes, speech quickly turns into almost incomprehensible noise in our minds. Here I rotated the layers by at most a few degrees, yet the speech already becomes hard to follow at times. Also note that I pasted every word onto its own layer; this makes further edits much easier if I want to continue working on this sound.
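What the mirror and mosaic operations do to the sound becomes clearer if you treat the image as a grid of brightness values, with rows as frequencies (row 0 at the top, highest pitch) and columns as time. The sketch below is a simplified illustration on plain nested lists, not GIMP's actual filters; the mosaic here is a plain block average, closer in spirit to GIMP's Pixelize. Function names are hypothetical.

```python
def mirror_region(img, top_row, bottom_row, first_col, last_col):
    """Reverse columns first_col..last_col-1 in rows top_row..bottom_row-1,
    i.e. play that frequency band of that time span backwards."""
    out = [row[:] for row in img]
    for y in range(top_row, bottom_row):
        out[y][first_col:last_col] = out[y][first_col:last_col][::-1]
    return out


def mosaic_region(img, top_row, bottom_row, cell):
    """Pixelate rows top_row..bottom_row-1: each cell x cell block gets its mean,
    which smears the partials into coarse steps in both time and pitch."""
    out = [row[:] for row in img]
    width = len(img[0])
    for y0 in range(top_row, bottom_row, cell):
        y1 = min(y0 + cell, bottom_row)
        for x0 in range(0, width, cell):
            x1 = min(x0 + cell, width)
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(block) // len(block)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = mean
    return out


img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(mirror_region(img, 0, 1, 0, 4))  # [[3, 2, 1, 0], [4, 5, 6, 7], [8, 9, 10, 11]]
print(mosaic_region(img, 0, 2, 2))     # [[2, 2, 4, 4], [2, 2, 4, 4], [8, 9, 10, 11]]
```

Restricting both operations to the top rows is what keeps the lower harmonics (and with them most of the intelligibility of the pitch contour) intact.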


Be prepared to jump back and forth between VirtualANS and GIMP a lot to fine-tune what you are doing. And don’t worry for now about the sound becoming more alien with each edit; it’s part of the spectral fun!

Now import whatever you made back into VirtualANS. After you import the image, VirtualANS loses all its settings, so you must set the octave count, lowest frequency and speed again. But you can also abuse this: using values different from the original ones distorts the sound, stretching or compressing the spectrum.
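Why does a different octave count distort the sound instead of just transposing it? Here is a minimal sketch, assuming that the image maps octaves evenly onto its height (log frequency). A partial keeps its relative height in the image, so changing the lowest frequency alone shifts everything by the same ratio, while changing the octave count scales all intervals. Function names and the example settings are hypothetical.

```python
import math


def position(freq_hz, lowest_hz, octaves):
    """Height fraction (0 = bottom edge, 1 = top edge) of a frequency."""
    return math.log2(freq_hz / lowest_hz) / octaves


def reimported_freq(freq_hz, old_low, old_octaves, new_low, new_octaves):
    """Where a partial ends up if the image is reloaded with other settings."""
    return new_low * 2 ** (position(freq_hz, old_low, old_octaves) * new_octaves)


# A harmonic tone (300, 600, 900 Hz) exported with 150 Hz / 6 octaves,
# then reimported with 100 Hz / 7 octaves
partials = [300.0, 600.0, 900.0]
moved = [reimported_freq(f, 150.0, 6, 100.0, 7) for f in partials]
ratios = [f / moved[0] for f in moved]
print([round(r, 3) for r in ratios])
```

The 1 : 2 : 3 ratios become roughly 1 : 2.24 : 3.60 (each ratio raised to the power 7/6), so a harmonic sound turns inharmonic. Keeping the octave count and changing only the lowest frequency is a clean pitch shift instead.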

I got this:

A more useful thing to do is to work with the harmonics of melodic samples. You can tame instruments that have dissonant harmonics, and in general you get more control over the timbre of the sound than with any other editing method.

Here is the sound of a softly plucked, muted kantele that I built some years ago:

Here is what the unprocessed sound looks like when exported to GIMP (notice the lack of higher harmonics and noisy components, shown by all the empty space in the top area). I used the high quality filters option and all 12 octaves in VirtualANS for this particular example. There is no sound in the uppermost octaves of the original recording, but I wanted that space available in case I wanted to put something there to make the sound crisper:


I copied all the harmonics except the lowest one and pasted them into a single new layer. I then stretched them horizontally so that their longest decay is now as long as that of the original fundamental. Finally, I set the layer mode to “Lighten only” and moved the layer so that the former second harmonic closely coincides with the fundamental. This creates a somewhat beating, metallic sound.
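The three steps (stretch, shift, blend) can be sketched on a grayscale image stored as a list of rows, row 0 at the top. “Lighten only” keeps the brighter of the two pixels, which for grayscale is simply the maximum. On a log-frequency image, moving a layer down by one octave means moving it down by height / octaves rows. This is a simplified illustration with hypothetical names, not GIMP's implementation.

```python
def stretch_horizontally(layer, factor):
    """Nearest-neighbour horizontal stretch: decays get `factor` times longer."""
    width = len(layer[0])
    new_width = round(width * factor)
    return [[row[min(int(x / factor), width - 1)] for x in range(new_width)]
            for row in layer]


def shift_down(layer, rows, height):
    """Move a layer down by `rows` pixels (lower pitch), padding with black."""
    blank = [0] * len(layer[0])
    shifted = [blank[:] for _ in range(rows)] + [r[:] for r in layer]
    return shifted[:height]


def lighten_only(base, layer):
    """Keep the brighter pixel of the two layers; zip crops to the smaller canvas."""
    return [[max(a, b) for a, b in zip(rb, rl)] for rb, rl in zip(base, layer)]


# Toy image: 4 rows covering 2 octaves, so one octave = 2 rows
base = [[0, 0], [0, 0], [0, 0], [9, 9]]   # fundamental on the bottom row
layer = [[7, 7], [5, 5], [0, 0], [0, 0]]  # upper harmonics only
print(lighten_only(base, shift_down(layer, 2, 4)))  # [[0, 0], [0, 0], [7, 7], [9, 9]]
print(stretch_horizontally([[1, 2]], 2))            # [[1, 1, 2, 2]]
```

The beating comes from partials that only nearly coincide: two sines at f1 and f2 Hz produce an amplitude fluctuation at |f1 − f2| Hz.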


And now for something completely different: a lecture. This will help you think creatively about acoustics and harmonics.

Why does that sound sound “metallic” to us? Why do sounds quickly degrade to glassy/watery noise when doing spectral editing?

In the real world, most instruments have very specific harmonic content. A vibrating string will generally produce a fundamental, then harmonics at integer multiples of its frequency: the first an octave above (twice the frequency), the next an octave and a fifth above (three times), and so on. Practically all melodic western instruments are dominated by this harmonic series, which starts with the 1:2 octave ratio.
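Since we will be reading harmonics off a note grid later, it helps to know where the ideal harmonic series falls in semitones. A short sketch for an idealised string (real strings are slightly inharmonic because of their stiffness); the function name is hypothetical:

```python
import math


def harmonic_intervals(fundamental_hz, count):
    """Frequency of each harmonic and its distance above the fundamental in semitones."""
    result = []
    for n in range(1, count + 1):
        freq = n * fundamental_hz
        semitones = 12 * math.log2(n)
        result.append((n, freq, round(semitones, 2)))
    return result


for n, freq, semis in harmonic_intervals(110.0, 8):
    print(f"harmonic {n}: {freq:7.1f} Hz, +{semis:5.2f} semitones")
```

Only the powers of two (harmonics 1, 2, 4, 8) land exactly on octaves. The 3rd harmonic sits at 19.02 semitones (an octave and a fifth), the 5th at 27.86 (a major third two octaves up, slightly flat of equal temperament), and the 7th is noticeably flat of any equal-tempered note.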

In practice, it’s quite rare for a vibrating object to produce no overtones at all. An ocarina gets close, producing a vibration that is nearly a sine wave.

Ever attached a weight to a guitar string? Go ahead, try it (wax, Blu Tack, or anything that sticks works)! You will change the mass distribution of the string, which wildly changes its overtone content.

Notable exceptions among musical instruments are the ones made from vibrating masses of metal, like bars and tubes. Go look at the frequency content of a gamelan instrument, and you’ll see that the overtones do not follow the familiar octave-based 1:2 relationship of guitars, flutes, pianos and whatever else we are accustomed to hearing. The same actually goes for marimba bars. Ever wondered why they are scalloped at the bottom? That’s to move the overtones around for a more balanced sound (on a marimba, typically tuning the first overtone to two octaves above the fundamental), especially when more than one note is played at the same time.
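To see how far a plain bar is from the 1:2:3 series, here is a sketch for an idealised uniform bar with free ends (a textbook Euler-Bernoulli beam model, which is my simplification; real gamelan keys and marimba bars are shaped and will differ). Its modes satisfy cos(x)·cosh(x) = 1, and the mode frequencies grow with x², so the overtone ratios follow from the roots. The function name is hypothetical.

```python
import math


def free_bar_mode_ratios(count):
    """Mode frequency ratios of an ideal uniform bar with free ends.

    The modes satisfy cos(x) * cosh(x) = 1 and frequency grows with x**2.
    Each root lies just next to (2n + 1) * pi / 2, so a bisection bracket
    of +/- 0.5 around that point isolates exactly one root.
    """
    def f(x):
        return math.cos(x) * math.cosh(x) - 1

    roots = []
    for n in range(1, count + 1):
        lo = (2 * n + 1) * math.pi / 2 - 0.5
        hi = (2 * n + 1) * math.pi / 2 + 0.5
        for _ in range(100):  # plain bisection
            mid = (lo + hi) / 2
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append((lo + hi) / 2)
    return [(r / roots[0]) ** 2 for r in roots]


print([round(r, 2) for r in free_bar_mode_ratios(4)])  # [1.0, 2.76, 5.4, 8.93]
```

None of these ratios is an integer, which is exactly the “metallic” signature discussed next. Scalloping a marimba bar pulls the 2.76 mode down towards 4, i.e. two octaves above the fundamental.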

We have a great memory for sounds. Sounds with inharmonic content sound “metallic” because we have heard enough pieces of metal being struck to associate their particular overtone characteristics with metal.

Consider sounds created from photographs with image synthesizers. The overtone content arguably has a very sensible structure, a picture of a dog for example. But it would probably be almost impossible to think of a physical object that could vibrate in a way that creates similar overtone content. And our brains, having no point of reference, can’t really connect it with anything in the real world.

If you want to mimic sounds that resemble something real, try to stick to horizontal harmonics that don’t bend too much and that have some recurring vertical structure.
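To experiment with that advice outside VirtualANS, here is a toy additive resynthesizer in the spirit of the ANS: each pixel row drives one sine oscillator on a logarithmic frequency axis, and pixel brightness sets its amplitude. This is my own sketch, not VirtualANS’s engine; all names, the sample rate and the decay shape are assumptions, and amplitudes jump at column boundaries where a real synthesizer would interpolate. The test image follows the rule above: horizontal, non-bending stripes at integer multiples of a fundamental, fading over time.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # assumption: any common rate works for this toy


def row_to_freq(row, height, lowest_hz, octaves):
    """Row 0 is the top of the image (highest pitch); rows are spaced logarithmically."""
    pos = (height - 1 - row) / (height - 1)
    return lowest_hz * 2 ** (pos * octaves)


def harmonic_image(f0, partials, height, width, lowest_hz, octaves):
    """Horizontal stripes at n * f0 that fade over time, higher partials faster."""
    image = [[0] * width for _ in range(height)]
    for n in range(1, partials + 1):
        pos = math.log2(n * f0 / lowest_hz) / octaves
        if not 0 <= pos <= 1:
            continue  # partial falls outside the image
        row = round((height - 1) * (1 - pos))
        for col in range(width):
            level = int(255 * math.exp(-3 * n * col / width))
            image[row][col] = max(image[row][col], level)
    return image


def render(image, lowest_hz, octaves, seconds_per_column):
    """Additive resynthesis: one sine oscillator per non-empty row, with its
    amplitude following the row's brightness (0..255) column by column."""
    height, width = len(image), len(image[0])
    n_per_col = int(SAMPLE_RATE * seconds_per_column)
    rows = [r for r in range(height) if any(image[r])]
    steps = {r: 2 * math.pi * row_to_freq(r, height, lowest_hz, octaves) / SAMPLE_RATE
             for r in rows}
    phases = {r: 0.0 for r in rows}  # continuous phase avoids clicks between columns
    out = []
    for col in range(width):
        for _ in range(n_per_col):
            s = 0.0
            for r in rows:
                s += image[r][col] / 255 * math.sin(phases[r])
                phases[r] += steps[r]
            out.append(s)
    peak = max((abs(x) for x in out), default=0.0) or 1.0
    return [x / peak for x in out]  # normalise to -1..1


def save_wav(path, samples):
    """Write mono 16-bit PCM using the standard library."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"".join(struct.pack("<h", int(x * 32767)) for x in samples))


if __name__ == "__main__":
    img = harmonic_image(110.0, 8, 128, 40, 110.0, 4)
    save_wav("stripes.wav", render(img, 110.0, 4, 0.05))
```

Because partials have to snap to whole pixel rows, a small image height slightly detunes them, which is another way of hearing why low pixel heights sound robotic.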

Next, I recorded the sound of a wine glass being struck.

And here is the same sound after resynthesis. I am losing the attack (and some volume, but that could be remedied by increasing the brightness of the image), but who is interested in perfect recreations anyway? This is mutant sound art! Give me those artifacts!
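If you want to compensate the lost volume numerically rather than by eye, a gain in decibels can be converted to a brightness factor. This sketch assumes, hypothetically, that pixel brightness maps linearly to partial amplitude; the real mapping inside VirtualANS may differ. In GIMP, Levels or Curves do a similar job.

```python
def brighten(image, gain_db):
    """Scale every pixel by a gain in dB, clipping at 255.

    Assumes brightness maps linearly to amplitude (hypothetical here), so
    +6 dB roughly doubles each pixel value.
    """
    factor = 10 ** (gain_db / 20)
    return [[min(255, round(p * factor)) for p in row] for row in image]


print(brighten([[0, 100, 200]], 6))  # [[0, 200, 255]]
```

Note the clipping: pixels that are already bright cannot get any louder, so a large gain also compresses the difference between strong and weak partials.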


What if we want to do some more careful editing on the spectrum? We need a guide. Luckily we can snatch one very easily from a screenshot of VirtualANS, since it helpfully shows the note corresponding to each frequency on the left.

I cut the note scale out of the screenshot, pasted it onto a transparent layer over the sound, and scaled it to fit. Now it is very easy to move the harmonics around, or even create new ones where desired. You could even tune a gamelan hit…
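Instead of scaling a screenshot, you could also compute where each guide line belongs. A minimal sketch, assuming 12-tone equal temperament with A4 = 440 Hz and an image whose height spans exactly the chosen octaves on a log scale; the import settings and function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_freq(name, octave, a4=440.0):
    """12-TET frequency using MIDI numbering (C4 = 60, A4 = 69)."""
    midi = 12 * (octave + 1) + NOTE_NAMES.index(name)
    return a4 * 2 ** ((midi - 69) / 12)


def note_row(name, octave, height, lowest_hz, octaves):
    """Pixel row (0 = top) of a note on a log-frequency image."""
    pos = math.log2(note_freq(name, octave) / lowest_hz) / octaves
    return round((height - 1) * (1 - pos))


# Hypothetical import: 2048 px high, 8 octaves starting at C1 (about 32.7 Hz)
low = note_freq("C", 1)
guide = {f"{n}{o}": note_row(n, o, 2048, low, 8) for o in range(1, 9) for n in ("C", "G")}
print(guide)
```

Drawing horizontal lines at these rows on a separate layer gives the same kind of note guide, and it stays exact for any image height.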


Let’s zoom in and see what we have here.


The fundamental is at C2 (which I already knew, since I tuned the glass by filling it with water and checked it with a guitar tuner before sampling), followed by harmonics at almost every C and F up to C6, where there is a lot of diffuse material. That’s actually not a lot of information, yet our ears and brains can unmistakably identify the sound as “glassy”.
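When you have a partial’s frequency (from a spectrum analyser, or from the row it sits on), naming it and seeing how far it is from the grid is a one-liner. A sketch assuming 12-tone equal temperament with A4 = 440 Hz; the function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_note(freq_hz, a4=440.0):
    """Nearest 12-TET note name and the deviation from it in cents."""
    midi = 69 + 12 * math.log2(freq_hz / a4)
    nearest = round(midi)
    cents = round(100 * (midi - nearest))
    return f"{NOTE_NAMES[nearest % 12]}{nearest // 12 - 1}", cents


print(freq_to_note(65.41))  # ('C2', 0)
print(freq_to_note(445.0))  # ('A4', 20)
```

The cents value is what matters for partials of struck objects: they rarely sit exactly on a note, and those small deviations are part of their character.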

Actually, most of the information we use to recognize sounds comes from the noisy attack portion. Sadly, that is the hardest part to resynthesize from sine waves: the image stores only the magnitude of each partial, so the phase information is discarded after the Fourier transform, and windowing smears short transients in time. Another important source of information (since the lowest partials of any instrument resemble each other anyway) is the upper harmonics. This is where most vibrating physical objects lose the simple octave relationship, and much more complex interactions occur.
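A small demonstration of why the attack suffers: two signals built from the same partials with the same amplitudes, but different phases, have identical magnitude spectra, yet their waveforms differ. A magnitude-only image cannot tell them apart. This uses a naive DFT for clarity, not the analysis VirtualANS actually performs.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude of each DFT bin; the phase is thrown away."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n)]


N = 64
# Same partials (bins 3 and 5) with the same amplitudes, different phases
a = [math.sin(2 * math.pi * 3 * t / N) + math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
b = [math.sin(2 * math.pi * 3 * t / N) + math.cos(2 * math.pi * 5 * t / N) for t in range(N)]

mag_a, mag_b = dft_magnitudes(a), dft_magnitudes(b)
same_spectrum = all(abs(p - q) < 1e-9 for p, q in zip(mag_a, mag_b))
different_wave = max(abs(p - q) for p, q in zip(a, b)) > 0.5
print(same_spectrum, different_wave)  # True True
```

A sharp attack is essentially many partials whose phases line up at one instant; once the phases are gone, resynthesis spreads that energy out and the click softens.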

What this means in practice is that we can play subtle psychoacoustic tricks by mixing the fundamentals of one sound with the harmonics of another.
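In image terms, that trick is a splice: the lower rows come from one sound and the upper rows from another, optionally with a short vertical crossfade so the seam is less abrupt. A minimal sketch on grayscale images of equal size (row 0 at the top); the function name and the fade scheme are my own illustration.

```python
def splice_spectra(low_source, high_source, split_row, fade_rows=0):
    """Rows above split_row come from high_source (upper partials), rows from
    split_row down come from low_source (fundamental region). The fade_rows
    rows just above the split blend linearly between the two."""
    out = []
    for y in range(len(low_source)):
        if y < split_row - fade_rows:
            w = 1.0  # pure high_source
        elif y >= split_row:
            w = 0.0  # pure low_source
        else:
            w = (split_row - y) / (fade_rows + 1)
        out.append([round(w * h + (1 - w) * l)
                    for h, l in zip(high_source[y], low_source[y])])
    return out


low = [[1, 1] for _ in range(4)]
high = [[9, 9] for _ in range(4)]
print(splice_spectra(low, high, 2))     # [[9, 9], [9, 9], [1, 1], [1, 1]]
print(splice_spectra(low, high, 2, 1))  # [[9, 9], [5, 5], [1, 1], [1, 1]]
```

Choosing the split row is the creative part: put it just above the fundamental and the ear tends to take the pitch from one sound and the identity from the other.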

I want to raise those F harmonics to G and also increase their sustain, imitating a physical object with less internal damping. That would be hard to achieve in the real world, since glass already has very little damping.
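On a log-frequency image, a pitch shift by a fixed interval is the same vertical offset wherever the partial sits, which is why plain copy and paste works for moving harmonics. A sketch of the arithmetic, assuming an image whose height spans the chosen octaves evenly; the settings and function name are hypothetical.

```python
def interval_shift(semitones, height, octaves):
    """Frequency ratio and vertical pixel offset of a pitch shift on a
    log-frequency image (row 0 at the top, so upward shifts are negative)."""
    ratio = 2 ** (semitones / 12)
    rows = -semitones * height / (octaves * 12)
    return ratio, rows


# Hypothetical settings: 2048 px high, 12 octaves
print(interval_shift(2, 2048, 12))   # F -> G: ratio ~1.122, about 28.4 rows up
print(interval_shift(-5, 2048, 12))  # F4 -> C4: ratio ~0.749, about 71.1 rows down
```

Fractional offsets have to be rounded to whole rows, so with a small image height the shifted partials end up slightly off the target note.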

Another way to bring out the partials would be to excite the object harder. But just try hitting the glass much harder than I already did… And of course you could bow it. Actually, you should. Bow everything and sample it. Everything! Bowing is a very good way to feed a continuous flow of energy into the object, and interesting sounds can be found in many things. I have already used a bowed multisample of that same wine glass on several tracks… I really love the effect!

But back to editing.

It’s just a simple matter of copy and paste here. I also duplicated the harmonic at F4 to C4, since C4 didn’t have one. To keep things more interesting, I deleted the fundamental and first harmonic of the kantele sound used earlier, and moved the rest of its harmonics to coincide with the C and G harmonics of the glass sound. Now we have a completely new, composite sound. Exciting, no?

Here is what the sound looks like after all this:


Sounds laser sharp, really! (laser harp?)

And there we are. All that remains for now is a small test loop that plays the melody with the three sounds: first the harmonically enhanced glass sound, then the original unedited resynthesized sound, and finally the original raw recording that never went through VirtualANS.

Remember, imagination is the only limit here! I do suggest having some method when doing this, though, because it’s easy to end up with sounds that are not very useful. Vocoding and similar techniques are also possible. Just think about it: the sounds really are opened up on the operating table, and you are the master surgeon.
