Saturday, March 04, 2006

DTMF Sampling - Constructing a Wave



How the heck do you generate dynamic sample data suitable for the wave format (PCM)? It's actually not that bad. The key is to follow the wave format specification:



http://replaygain.hydrogenaudio.org/file_format_wav.html
http://ccrma.stanford.edu/courses/422/projects/WaveFormat/

http://www.sonicspot.com/guide/wavefiles.html

CONSTRUCTING THE WAVE
Code for this blog is posted at Github here

So we see that the first 44 bytes of a wave file are dedicated to wave format / header information. Everything from byte 45 onwards is the sample data - the bits which make the noise.

In my Wave class, the constructor allows setting of the basic Wave settings (sample rate, Resolution [8 or 16 bit], Channel [left, right, mono, left-right stereo]) and creates a 44-byte array populated with initial header values. All this code is pretty much standard and not really worth explaining, as much of it can be a copy-paste job. The main thing is to follow the wave format spec given in the above links.
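
For readers who want to see the 44 bytes spelled out, here is a minimal sketch of a canonical PCM header built with BinaryWriter (which always writes little-endian, as the format requires). WaveHeaderSketch and BuildHeader are hypothetical names for illustration, not the Wave class from this post, and the field layout follows the canonical header described in the linked specs.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not the Wave class from the post.
public static class WaveHeaderSketch
{
    public static byte[] BuildHeader(int sampleRate, short bitsPerSample, short channels, int dataSize)
    {
        // Bytes per sample frame (all channels together).
        short blockAlign = (short)(channels * bitsPerSample / 8);

        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(Encoding.ASCII.GetBytes("RIFF")); // bytes 0-3
            writer.Write(36 + dataSize);                   // bytes 4-7: remaining file size
            writer.Write(Encoding.ASCII.GetBytes("WAVE")); // bytes 8-11
            writer.Write(Encoding.ASCII.GetBytes("fmt ")); // bytes 12-15
            writer.Write(16);                              // bytes 16-19: fmt chunk size for PCM
            writer.Write((short)1);                        // bytes 20-21: audio format 1 = PCM
            writer.Write(channels);                        // bytes 22-23
            writer.Write(sampleRate);                      // bytes 24-27
            writer.Write(sampleRate * blockAlign);         // bytes 28-31: byte rate
            writer.Write(blockAlign);                      // bytes 32-33
            writer.Write(bitsPerSample);                   // bytes 34-35
            writer.Write(Encoding.ASCII.GetBytes("data")); // bytes 36-39
            writer.Write(dataSize);                        // bytes 40-43
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

Calling WaveHeaderSketch.BuildHeader(16000, 8, 1, 4000) would give the header for a quarter of a second of 8-bit mono audio at 16kHz.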

The sampling code is a bit more interesting as a bit of maths is involved. Essentially DTMF (Dual Tone Multi-Frequency) requires the summation of sine waves of two frequencies to generate a tone recognised by a phone exchange. Standard frequencies for each key on the phone dial pad (0-9 * # A B C D) can be found at:

http://users.tkk.fi/~then/mytexts/dtmf_generation.html

As far as I am aware, DTMF frequencies are an international standard (ITU-T Recommendation Q.23), so the posted frequencies should work with phone exchanges worldwide.
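
The standard keypad is a 4x4 grid in which each row has a low frequency and each column a high frequency, so a lookup can be sketched as below. DtmfTableSketch and TryGetFrequencies are hypothetical names used only for this illustration. The frequency values are the standard DTMF ones.

```csharp
using System;

// Hypothetical lookup for illustration; names are not from the post's code.
public static class DtmfTableSketch
{
    // Low-group (row) and high-group (column) frequencies in Hz.
    static readonly int[] RowHz = { 697, 770, 852, 941 };
    static readonly int[] ColumnHz = { 1209, 1336, 1477, 1633 };

    // Keypad layout, one string per row.
    static readonly string[] Keypad = { "123A", "456B", "789C", "*0#D" };

    // Returns true and the two frequencies when the key is on the keypad.
    public static bool TryGetFrequencies(char key, out int lowHz, out int highHz)
    {
        key = char.ToUpperInvariant(key);
        for (int row = 0; row < Keypad.Length; row++)
        {
            int column = Keypad[row].IndexOf(key);
            if (column >= 0)
            {
                lowHz = RowHz[row];
                highHz = ColumnHz[column];
                return true;
            }
        }
        lowHz = 0;
        highHz = 0;
        return false;
    }
}
```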

In my solution I created a basic data-holding class called SineWave. In its constructor, the frequency (Hz) is given as an int, along with the left and right amplitudes (volume) at which the frequency should be sampled.

Ok, so looking at the Wave class we have a static method ConstructWave (internal method), which in addition to encoding properties takes an array of SineWaves (the frequencies to be summed) and a TimeSpan (how long the resulting sample should be played).

Say for instance we wanted to generate a tone for the digit '1', using the frequencies specified at http://users.tkk.fi/~then/mytexts/dtmf_generation.html we can construct 2 SineWaves, and pass them to ConstructWave :



//
// playing frequencies at full volume
//
SineWave[] sineWaves = new SineWave[2];
sineWaves[0] = new SineWave(1209, 1, 1);
sineWaves[1] = new SineWave(697, 1, 1);

//
// generate an 8 bit 16kHz sample in mono
//
Wave digitOne = Wave.ConstructWave(sineWaves, 16000, Resolution.EightBit, AudioMode.Mono, TimeSpan.FromMilliseconds(250));


THE IMPLEMENTATION OF SAMPLING

On closer inspection of the ConstructWave method in the Wave class we can see that all sampling is contained in the AppendSample method. Using the target sample rate and sample duration (provided in the Wave constructor) it's relatively easy to determine how many bytes the wave sample data should be (Data Size):

i.e.

sample data byte count = (Sample Rate (Hz) * duration (in seconds)) * no. of bytes per sample

WHERE
no. of bytes per sample = (resolution / 8) * no.of channels
IF Mono : no. of channels = 1
ELSE no. of channels = 2
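
As a worked check of the data size calculation, here is a small sketch in which the number of samples is the sample rate multiplied by the duration in seconds. DataSizeSketch and its method names are hypothetical, and rounding the sample count to the nearest whole sample is an assumption of this sketch.

```csharp
using System;

// Hypothetical helper for illustration; not taken from the post's Wave class.
public static class DataSizeSketch
{
    // Bytes needed for one sample across all channels.
    public static int BytesPerSample(int bitsPerSample, int channels)
    {
        return (bitsPerSample / 8) * channels;
    }

    // Number of samples = sample rate (Hz) * duration (seconds).
    public static int SampleCount(int sampleRate, TimeSpan duration)
    {
        return (int)Math.Round(sampleRate * duration.TotalSeconds);
    }

    // Total size of the sample data in bytes.
    public static int DataSize(int sampleRate, TimeSpan duration, int bitsPerSample, int channels)
    {
        return SampleCount(sampleRate, duration) * BytesPerSample(bitsPerSample, channels);
    }
}
```

For the 250 ms, 8-bit, 16kHz mono tone generated earlier this works out to 16000 * 0.25 * 1 = 4000 bytes.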



According to the wave header format (above) the total sample data byte count (or data size) should be assigned to bytes 40 - 43. Helper methods ExtractByte and ExtractInt have been written in the Wave class to extract each byte in a 4 byte int (int 32) via bit masking. The Frame Size should also be set in bytes 4 - 7 :


Frame Size = DataSize + 36
[NOTE: 36 is the number of remaining bytes in the wave header past the Frame Size record]
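
Here is a rough sketch of the bit masking involved in writing the two size fields. ByteAt, WriteInt32 and SetSizes are hypothetical stand-ins, not the ExtractByte and ExtractInt helpers from the Wave class (whose signatures are not shown here). The wave format stores integers least significant byte first.

```csharp
// Hypothetical stand-ins for illustration; not the post's ExtractByte / ExtractInt.
public static class HeaderSizeSketch
{
    // Returns byte `index` of a 32-bit value, where 0 is the least significant byte.
    public static byte ByteAt(int value, int index)
    {
        return (byte)((value >> (8 * index)) & 0xFF);
    }

    // Stores a 32-bit value little-endian, as the wave format expects.
    public static void WriteInt32(byte[] buffer, int offset, int value)
    {
        for (int i = 0; i < 4; i++)
        {
            buffer[offset + i] = ByteAt(value, i);
        }
    }

    // Fills in the Frame Size (bytes 4 - 7) and Data Size (bytes 40 - 43).
    public static void SetSizes(byte[] waveBytes, int dataSize)
    {
        WriteInt32(waveBytes, 4, dataSize + 36);
        WriteInt32(waveBytes, 40, dataSize);
    }
}
```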


Ok, now to calculate the sample byte data itself. Each of the two frequencies should be assigned a constant, which in code I have called dataSlice:


dataSlice = (2 * PI) / (waveTime / sampleTime);
[NOTE: waveTime = 1 / frequency (Hz)
sampleTime = 1 / sample rate (Hz)]
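
In other words, dataSlice is the angle (in radians) that the sine wave advances between two consecutive samples. It simplifies to 2 * PI * frequency / sample rate. A minimal sketch, using a hypothetical DataSliceSketch class:

```csharp
using System;

// Hypothetical helper for illustration only.
public static class DataSliceSketch
{
    // Radians advanced per sample for a tone of the given frequency.
    public static double DataSlice(double frequencyHz, double sampleRateHz)
    {
        double waveTime = 1.0 / frequencyHz;     // seconds per cycle
        double sampleTime = 1.0 / sampleRateHz;  // seconds per sample
        return (2 * Math.PI) / (waveTime / sampleTime);
    }
}
```

For example, a 1000 Hz tone sampled at 8000 Hz has 8 samples per cycle, so each sample advances the angle by 2 * PI / 8.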


Using the number of samples (Sample Rate (Hz) * duration (in seconds)) as the loop bound, we can calculate each fragment of sample data (bytes 44 - end of array) for the left and right channel (or just the left if mono) as follows:



dataLeft = (Math.Sin (i * FrequencyOneDataSlice) * LeftAmplitude) + (Math.Sin (i * FrequencyTwoDataSlice) * LeftAmplitude) ;

dataRight = (Math.Sin (i * FrequencyOneDataSlice) * RightAmplitude) + (Math.Sin (i * FrequencyTwoDataSlice) * RightAmplitude) ;

WHERE
LeftAmplitude = relative volume of left channel (must be <= 0.5)
RightAmplitude = relative volume of right channel (must be <= 0.5)
i = current loop iteration
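
The loop for a single channel can be sketched as below. DualToneSketch and Generate are hypothetical names, and the samples are kept as doubles here rather than written into the byte array. A plausible reason for the 0.5 limit (my reading, not something stated above) is that two sine waves at amplitude 0.5 can never sum outside the range -1 to 1.

```csharp
using System;

// Hypothetical single-channel generator for illustration; not the post's AppendSample.
public static class DualToneSketch
{
    public static double[] Generate(double frequencyOneHz, double frequencyTwoHz,
                                    double amplitude, int sampleRate, int sampleCount)
    {
        // Radians advanced per sample for each frequency.
        double sliceOne = 2 * Math.PI * frequencyOneHz / sampleRate;
        double sliceTwo = 2 * Math.PI * frequencyTwoHz / sampleRate;

        double[] samples = new double[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            // Sum of the two sine waves at the same relative volume.
            samples[i] = (Math.Sin(i * sliceOne) * amplitude)
                       + (Math.Sin(i * sliceTwo) * amplitude);
        }
        return samples;
    }
}
```

DualToneSketch.Generate(1209, 697, 0.5, 16000, 4000) would produce the 250 ms tone for the digit '1' at 16kHz.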



Finally, we mask the resulting number using the resolution of the wave we are sampling (8 / 16 / 24 bit) and store it in the underlying byte array. If you are storing multi channel data (Left, Right, LeftRight Stereo), each data byte should be interleaved....



i.e. 8-bit LeftRight Stereo:

...

waveBytes[n] = ExtractFirstByte(dataLeft)

waveBytes[n+1] = ExtractFirstByte(dataRight)

...


16-bit Stereo:

...

waveBytes[n] = ExtractFirstByte(dataLeft)

waveBytes[n+1] = ExtractSecondByte(dataLeft)

waveBytes[n+2] = ExtractFirstByte(dataRight)

waveBytes[n+3] = ExtractSecondByte(dataRight)

...
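
To make the interleaving concrete, here is a sketch that converts samples in the range -1 to 1 into bytes. It assumes scaling rather than the masking approach described above, and InterleaveSketch and its method names are hypothetical. What is fixed by the wave format is that 8-bit PCM samples are unsigned (silence is 128) while 16-bit samples are signed and stored low byte first.

```csharp
using System;

// Hypothetical conversion helpers for illustration; not the post's Extract* methods.
public static class InterleaveSketch
{
    // 8-bit PCM is unsigned: -1..1 maps to 1..255 with silence at 128.
    public static byte ToUnsigned8(double sample)
    {
        double clamped = Math.Max(-1.0, Math.Min(1.0, sample));
        return (byte)Math.Round(clamped * 127.0 + 128.0);
    }

    // 16-bit PCM is signed: -1..1 maps to -32767..32767.
    public static short ToSigned16(double sample)
    {
        double clamped = Math.Max(-1.0, Math.Min(1.0, sample));
        return (short)Math.Round(clamped * 32767.0);
    }

    // Writes one 16-bit stereo frame: left low, left high, right low, right high.
    public static void WriteStereo16(byte[] waveBytes, int offset, double left, double right)
    {
        short l = ToSigned16(left);
        short r = ToSigned16(right);
        waveBytes[offset]     = (byte)(l & 0xFF);
        waveBytes[offset + 1] = (byte)((l >> 8) & 0xFF);
        waveBytes[offset + 2] = (byte)(r & 0xFF);
        waveBytes[offset + 3] = (byte)((r >> 8) & 0xFF);
    }
}
```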


And that's it! As the iteration continues, the byte array is filled with Dual Tone byte samples until the target number of samples has been reached (Sample Rate (Hz) * duration (in seconds)).


PLAYING DIRECTLY TO THE SOUNDCARD


Using P/Invoke, we can access winmm.dll (a Windows system library) to play or save the generated wave as follows:


...

// External method declaration
[DllImport("winmm.dll", SetLastError = true)]
static extern bool PlaySound( IntPtr pszSound, System.UIntPtr hmod, uint fdwSound );

// Calling the declared external method.
// NOTE: UnsafeAddrOfPinnedArrayElement expects m_waveBytes to already be pinned
// (e.g. via GCHandle.Alloc with GCHandleType.Pinned) so the GC cannot move it.
IntPtr ptr = Marshal.UnsafeAddrOfPinnedArrayElement(this.m_waveBytes, 0);
PlaySound(ptr, UIntPtr.Zero, (uint) SoundFlags.SND_MEMORY);
...


References : http://209.171.52.99/audio/concatwavefiles.asp


Wednesday, March 01, 2006

Mp3's & Wave - Constructing DTMF audio files in C#

TERMS:

DTMF: Dual Tone Multi-Frequency - the beeps sent by a phone to the exchange when the User enters a phone number.


THE PROBLEM:

1 - Constructing a wave using a common format (PCM)
2 - Constructing / sampling each digit's tone using standard frequencies
3 - Convert a phone number to a wave
4 - Setting channel (left, right, mono, left - Right stereo) and sampling settings
5 - Integrating unmanaged code into a managed app (using winmm.dll and the Lame encoder)
6 - Encoding the generated wave as an Mp3

** NOTE: This process has been patented as MP3 Telephony by HCV Wireless(TM)

THE PLATFORM:

C# (easily transferable to other languages though) using the Lame Mp3 encoder


BACKGROUND:

I recently had to create a basic DTMF converter for a client using managed code. Its initial incarnation would be an application, with the view that it would eventually be ported over to a website to be used as a service.

Essentially, all the app needed to do was take a phone number string and encode it into its DTMF representation as an Mp3 file. This meant that first I would have to create the sample as a Wave before encoding it as an Mp3 using the Lame encoder. An added feature was to be able to set which channel the DTMF tones would be generated for - Left, Right, Mono, or Left - Right stereo.

Over the next few blog entries I will tackle each segment of the problem.