Streaming audio input frame size implications
Fri, 2006-09-15 04:28
The subject of frame size in streaming audio has come up from time to time in this forum, as well as in various example programs. After doing some empirical studies on this subject using a Nokia 6620, I have concluded something quite different from the conventional wisdom regarding this question.
I had assumed that Series 60 2nd Edition devices used a native frame size of 320 bytes (160 16-bit samples). I began developing an application that would analyze sound in real time. Using a frame size of 160 samples, I noticed that on every 13th frame the last 32 samples were unfilled when the buffer was returned. I first found this by pre-filling every element of each buffer with the value 11111 just before calling ReadL. Since the acoustic environment was quiet during my testing, this value would never occur naturally. But as the buffers were received by MaiscBufferCopied, every 13th buffer was found to contain 11111 in the last 32 slots. I subsequently confirmed this more directly by examining the Length() of the returned descriptor.
I found it suspicious that 13 x 160 - 32 = 2048, a nice power of 2. I began to experiment with frame sizes of 128, 132, 150, 180, and 256 samples. With 128 and 256, the buffers were always filled completely. With all the other frame sizes, the buffers were filled in a way that suggested things were forced to come out even at 2048 samples, by leaving occasional buffers only partially filled. For example, with a frame size of 180, every 12th buffer was unfilled in the last 112 slots: 12 x 180 - 112 = 2048.
As long as you use the Length() of the returned descriptor to process the returned buffer, rather than assuming it is full, gap-free streaming is still possible. However, it is simpler and more efficient to use a frame size that never produces shortened buffers. On the Nokia 6620, that means a divisor of 2048.
The special role of 2048 samples is more than a numerological oddity. It also sets a limit on the latency of getting audio data into your application. I tested the hypothesis that the system audio drivers assemble each 2048-sample block in its entirety before delivering any of that data to the application.
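The chunking pattern described above can be modelled in a few lines. This is a minimal Python sketch, not Symbian code: it assumes, as the measurements suggest but do not prove, that the driver hands over audio in 2048-sample blocks and that a single ReadL buffer is never filled from two different blocks. `NATIVE_BLOCK`, `returned_lengths` and `short_buffer_pattern` are hypothetical names used only for illustration.

```python
# Hypothetical model of the behaviour observed on the Nokia 6620:
# the driver assembles NATIVE_BLOCK samples at a time, and each ReadL
# buffer is filled only from the current block, so the last buffer
# carved from a block may come back short.
NATIVE_BLOCK = 2048  # samples; value inferred from the 6620 measurements


def returned_lengths(frame_size, n_blocks=1, native_block=NATIVE_BLOCK):
    """Return the Length() (in samples) of each buffer delivered for
    n_blocks native blocks when every request asks for frame_size samples."""
    lengths = []
    for _ in range(n_blocks):
        remaining = native_block
        while remaining > 0:
            take = min(frame_size, remaining)  # never straddle two blocks
            lengths.append(take)
            remaining -= take
    return lengths


def short_buffer_pattern(frame_size, native_block=NATIVE_BLOCK):
    """Return (buffers per block, unfilled samples in the last buffer)."""
    per_block = -(-native_block // frame_size)  # ceiling division
    unfilled = per_block * frame_size - native_block
    return per_block, unfilled


if __name__ == "__main__":
    for n in (128, 132, 150, 160, 180, 256):
        per_block, unfilled = short_buffer_pattern(n)
        if unfilled == 0:
            print(f"{n:4d} samples: every buffer full ({per_block} per block)")
        else:
            print(f"{n:4d} samples: every {per_block}th buffer is short by {unfilled}")
```

Under this model, 160-sample frames give a short buffer every 13th time (32 samples missing) and 180-sample frames every 12th time (112 missing), while divisors of 2048 such as 128 and 256 never do. On the device itself, the consumer should still process only Length() samples of each returned descriptor rather than trusting the requested frame size.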
I used User::TickCount() within my MaiscBufferCopied callback to timestamp the arrival of the buffers. With a frame size of 256 samples, 8 frames are delivered as fast as MaiscBufferCopied can take them. Then, based on the timestamps, there is a delay of about 256 msec. before the next group of 8 frames is delivered (I am using a sample rate of 8000 samples per second, so 2048 samples span 256 msec.). When the first of those 8 frames is returned, it is already almost 256 msec. old. In my application I need to display a real-time analysis of the sound in less time than that, so this latency is intolerable. As long as the system insists on accumulating 2048 samples before giving any of them to me, the latency problem will remain. About the only thing I can do is try a faster sample rate so that 2048 samples take less time to accumulate.
If anyone else is working on real-time audio analysis with streaming input and would like to see the code I used in these studies, just send me a message through the forum.
Robert Scott
Ypsilanti, Michigan
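The timestamp analysis described in this post can be sketched the same way. This is a hedged Python illustration that works on arrival times converted to milliseconds; on the device the raw values would come from User::TickCount(), whose period is hardware-dependent (it can be queried with UserHal::TickPeriod()). The timestamps in the example are synthetic, shaped like the 8-frame/256 msec. pattern described above, not measurements. The function names and the 50 ms burst threshold are assumptions for illustration.

```python
from typing import List, Optional


def ticks_to_ms(ticks: List[int], tick_period_us: int) -> List[float]:
    """Convert raw tick counts to milliseconds (tick_period_us is device-specific)."""
    return [t * tick_period_us / 1000.0 for t in ticks]


def group_bursts(arrival_ms: List[float], gap_threshold_ms: float) -> List[List[float]]:
    """Split ascending arrival times into bursts: a new burst starts whenever
    the gap to the previous arrival exceeds gap_threshold_ms."""
    bursts: List[List[float]] = []
    for t in arrival_ms:
        if bursts and t - bursts[-1][-1] <= gap_threshold_ms:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


def burst_period_ms(bursts: List[List[float]]) -> Optional[float]:
    """Average time between the first arrivals of consecutive bursts."""
    starts = [b[0] for b in bursts]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps) if gaps else None


def oldest_sample_age_ms(frames_per_burst: int, frame_size: int, sample_rate: int) -> float:
    """If a whole burst is captured before delivery, the first sample of the
    burst is about this old when the burst arrives."""
    return frames_per_burst * frame_size * 1000 / sample_rate


if __name__ == "__main__":
    # Synthetic arrivals: 3 bursts of 8 frames, bursts 256 ms apart,
    # frames within a burst 1 ms apart.
    arrivals = [burst * 256.0 + i for burst in range(3) for i in range(8)]
    bursts = group_bursts(arrivals, gap_threshold_ms=50.0)
    print([len(b) for b in bursts])            # [8, 8, 8]
    print(burst_period_ms(bursts))             # 256.0
    print(oldest_sample_age_ms(8, 256, 8000))  # 256.0
```

With a coarse tick (for example 1/64 s on some hardware), frames within one burst may share the same tick value; the grouping still works as long as the threshold is larger than one tick and smaller than the gap between bursts.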

Forum posts: 71
Please refer to some of the previous posts by Andrew.hayes. They are really good ones and should help you a lot in understanding frame sizes and their behaviour.
Forum posts: 21
I have seen those postings, and they refer to FP2 and FP3 phones and to other codecs besides PCM. But I can tell you that the native frame size for streaming PCM input in a Nokia 6620 (Symbian 7.0s) is 2048 samples at 8000 samples per second.
Robert Scott
Ypsilanti, Michigan
Forum posts: 11
Has anyone tested the Nokia Audio Stream Example on an N70 phone?
Forum posts: 4
Regards,
Sheshu Kumar Inguva.
Forum posts: 11
Forum posts: 21
inguvasheshu wrote:
This will not work. If I deal only with big buffers, then I will have terrible latency. My application requires immediate analysis of the sound for feedback to the user. It is a piano tuning application. If the latency is more than 100 msec., then the user will complain that the device is reacting too slowly.
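The arithmetic behind this latency limit is simple, and a small calculation shows what a faster sample rate could buy. This is a hedged Python sketch that assumes, as Robert suggested earlier but the thread does not confirm, that the native block stays at 2048 samples when the sample rate changes; which input rates the 6620 actually accepts is also not established here. The function names are hypothetical.

```python
# Assumption: the driver always accumulates NATIVE_BLOCK samples before
# delivering any of them, regardless of sample rate (measured only at 8000 Hz).
NATIVE_BLOCK = 2048  # samples


def block_latency_ms(sample_rate, native_block=NATIVE_BLOCK):
    """Time needed to accumulate one native block at the given rate."""
    return native_block * 1000.0 / sample_rate


def min_rate_for_budget(budget_ms, native_block=NATIVE_BLOCK):
    """Lowest sample rate at which one native block fits in budget_ms."""
    return native_block * 1000.0 / budget_ms


if __name__ == "__main__":
    for rate in (8000, 11025, 16000, 22050, 44100):
        print(f"{rate:6d} Hz -> {block_latency_ms(rate):6.1f} ms")
    print(f"100 ms budget needs >= {min_rate_for_budget(100):.0f} Hz")
```

Under that assumption, 8000 Hz gives 256 ms and even 16000 Hz still gives 128 ms, while a 100 ms budget needs at least 20480 Hz, so 22050 Hz (about 92.9 ms) would be the lowest common rate that fits. This is only the buffering delay; analysis and display time come on top of it.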
Robert Scott
Real-Time Specialties
Forum posts: 95
Hi,
The "ReadL()" method copies data from the H/W buffer (the buffer used by DevSound and the lower layers) to the user-specified descriptor. The size of the DevSound buffer is platform- and format-dependent. I suppose the size of this buffer cannot be chosen by the application.
It would be better to use a frame size equal to the size of the DevSound buffer.
If you are working on S60 V3, you can use the "CErrorConcealment" custom interface to get data in frames rather than in buffers of multiple frames. However, I am not sure whether frame mode can be enabled for PCM recording.
Chao,
Raghav