Image Processing Techniques for Audio - Part 1
Image processing software provides all sorts of interesting functions for manipulating images: morphing, rippling, convolutions, etc. What if we could apply some of these processes to audio files? It is a great idea, but if you try to load a standard-format audio file into an image processor, it will, unsurprisingly, fail. That does not mean it is impossible...
From the computer's point of view, image data and audio data are no different from each other. It is all just numbers to a computer. As far as the computer is concerned, you could just as easily listen to an image file as look at one. Unfortunately, very little software, if any, lets you do this kind of thing directly. With a little creative exploration, though, it can be done.
Before we get started, it is important to understand some of the inner workings of this process. There is one major difference between audio and image data. Audio files are one-dimensional, in that they are played back one data item (sample) at a time in a continuous stream (width). Images, however, are two-dimensional, in that they have width and height and are displayed as such. In order to relate the two, we need to understand how audio and image processing software handle data differently.
For discussion purposes, we will use a small sampling of raw data (Figure 1). As audio data, this is processed as a single stream of data played back from left to right. In order to use this as image data though, it needs to have width and height. We want to be sure to use all the data, so it is necessary to choose our dimensions appropriately.
3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
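As a rough illustration of choosing dimensions that use all the data, the sketch below lists every width and height pair that exactly covers a stream, then lays the sixteen sample values shown above out as rows. The function names `candidate_dimensions` and `to_rows` are made up for this example and are not part of any audio or image package.

```python
def candidate_dimensions(count):
    """List every (width, height) pair whose product is exactly `count`."""
    return [(w, count // w) for w in range(1, count + 1) if count % w == 0]


def to_rows(samples, width):
    """Split a flat sample stream into image rows of `width` items each."""
    if len(samples) % width != 0:
        raise ValueError("width must divide the number of samples evenly")
    return [samples[i:i + width] for i in range(0, len(samples), width)]


# The sixteen raw values used for discussion in the text.
samples = [3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12]

# Every pair here uses all sixteen items with nothing left over.
print(candidate_dimensions(len(samples)))

# One possible layout: a 4 x 4 "image".
for row in to_rows(samples, 4):
    print(row)
```

Any pair from the list would work; a width that does not divide the sample count would leave some data unused, which is what the check in `to_rows` guards against.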
STEP 1: Preparing the audio file.
The problem is that you are very unlikely to find a loader or saver for AIFF, WAV or any other audio file format in an image processor. You are just as unlikely to find an image data loader or saver in an audio processor. Fortunately, there is one storage format common to both audio and image data: RAW.
We have included a sample IFF audio file for this tutorial: Piano.iff (Figure 7). It is an 8-bit sound with a sampling rate of 16,780. To begin, we need to convert the sound file to RAW format. SOX is smart enough that it will recognize what we want it to do, so from the CLI:
is sufficient. This will result in the creation of a new file called Piano.raw.
Most image processors do not specifically contain a RAW module, and ImageFX is no exception. The SCULPT image format, however, is a RAW data format and ImageFX contains two SCULPT modules: SCULPT GREY and SCULPT RGB. The difference between the two is that SCULPT GREY is an 8-bit data format while SCULPT RGB is 24-bit. Because our sample audio file is 8-bit, we will be working in greyscale.
To perform a reverse process on the sound we need to use the horizontal and vertical flip transformations, as explained earlier. Select TRANSFORM (Figure 11) and then FLIP HORIZONTAL. Then TRANSFORM, again, and FLIP VERTICAL.
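To see why the two flips together amount to a reversal, here is a minimal sketch on a tiny stand-in "image". It assumes the pixels are stored row by row, left to right and top to bottom; the 4 x 4 grid of values 0 to 15 and the helper names are invented for illustration and have nothing to do with ImageFX itself.

```python
def flip_horizontal(rows):
    """Mirror the image left to right: reverse the items inside each row."""
    return [row[::-1] for row in rows]


def flip_vertical(rows):
    """Mirror the image top to bottom: reverse the order of the rows."""
    return rows[::-1]


def flatten(rows):
    """Read the image back out as one stream, row by row."""
    return [value for row in rows for value in row]


# Hypothetical 16-sample stream laid out as a 4 x 4 image.
stream = list(range(16))
rows = [stream[i:i + 4] for i in range(0, len(stream), 4)]

both = flatten(flip_vertical(flip_horizontal(rows)))
print(both)
print(both == stream[::-1])
```

The order of the two flips does not matter here: reversing the rows and reversing the items within each row gives the same stream either way.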
To play back the new reversed sound, we need to save it as a RAW data file. Select SAVE and then SCULPT for the Save Format. ImageFX knows that this is a greyscale image, so it will use the appropriate SCULPT GREY 8-bit format. Select SAVE AS and name the file Piano-Backward.raw.
STEP 5: Listening to the modified sound.
We could convert the new RAW file to IFF format using SOX before playing it back, but it is not necessary. However, because the sound is in RAW format we have to tell the audio player what sampling rate to play it back at.
will do what we need. We have used the original sampling rate of Piano.iff for reference here, but you can try whatever rate you want. Note that ImageFX automatically appends a suffix ".grey" to the name you give it.
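If your audio player cannot be told a sampling rate for RAW data, one alternative (not the method used in this tutorial) is to wrap the samples in a WAV header with Python's standard `wave` module. This is only a sketch under assumptions: it treats the file as mono 8-bit, and it assumes the samples are signed, as IFF 8-bit sounds normally are, and shifts them to the unsigned form that 8-bit WAV expects. The function name `raw8_to_wav` and the output file name are hypothetical; the rate of 16780 is the one quoted for Piano.iff.

```python
import os
import wave


def raw8_to_wav(raw_bytes, wav_path, rate=16780, signed=True):
    """Wrap headerless 8-bit mono samples in a WAV container."""
    if signed:
        # Assumption: the RAW samples are signed. 8-bit WAV audio is
        # unsigned, so shift every sample up by 128.
        raw_bytes = bytes((b + 128) & 0xFF for b in raw_bytes)
    with wave.open(wav_path, "wb") as out:
        out.setnchannels(1)      # mono
        out.setsampwidth(1)      # one byte per sample (8-bit)
        out.setframerate(rate)   # playback rate in samples per second
        out.writeframes(raw_bytes)


# Only runs if the file saved from ImageFX is in the current directory.
source = "Piano-Backward.raw.grey"
if os.path.exists(source):
    with open(source, "rb") as f:
        raw8_to_wav(f.read(), "Piano-Backward.wav")
```

Passing a different `rate` changes the pitch and speed of playback, and passing `signed=False` skips the shift if the samples turn out to be unsigned already.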
STEP 6: The weird and potentially wonderful.
Earlier we talked about the strange effect that happens if we only flip in one direction. To hear it, do one more FLIP VERTICAL transformation and save the result as Piano-Horizontal.raw. Because we have already flipped vertically once, this flips the image back again, leaving, in effect, just a horizontal flip. To hear this weird thing:
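The following is not the playback command, only a small sketch of what a horizontal flip on its own does to the order of the samples. It assumes row-by-row storage and uses a made-up 16-sample stream with a width of 4; the function name `horizontal_only` is invented for this example.

```python
def horizontal_only(stream, width):
    """Reverse each row of `width` samples but keep the rows in order."""
    rows = [stream[i:i + width] for i in range(0, len(stream), width)]
    return [value for row in rows for value in row[::-1]]


# Hypothetical stream of sixteen samples, treated as a 4 x 4 image.
stream = list(range(16))
print(horizontal_only(stream, 4))
# prints [3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12]
```

Each row-sized chunk plays backward while the chunks themselves still arrive in their original order, so the image width chosen when loading the sound sets the length of each reversed chunk.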
We have only touched on the basic procedure here. Next time, we will delve a little deeper into this process. In the meantime, explore this technique further, and if you come up with anything particularly exciting, please drop me an email. These ideas are presented only as a guide to possibilities. Hopefully they will be used as a starting point for something new and wonderful.