Forum posts: 31
As far as I can tell, you want to know whether the incoming signal from the microphone is speech or not (that is, active voice versus silence or background noise). If that is exactly what you want, there is a way to decide this.
Please refer to G.729 Annex B (VAD / DTX and CNG). Specifically, VAD (Voice Activity Detection) classifies each incoming frame as active voice or not, so it tells you whether the incoming signal contains speech data or silence data.
If you go with the above, let me know if you run into any problems with the G.729 Annex B algorithm.
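To make the idea of a per-frame speech/silence decision concrete, here is a minimal Python sketch. It is not the G.729 Annex B algorithm, which combines several features per frame; it only compares each frame's energy against a fixed threshold. The function names, the 80-sample frame length (10 ms at 8 kHz) and the -40 dB threshold are assumptions chosen for illustration.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in dB, relative to a full scale of 1.0."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small floor avoids log(0)

def simple_vad(samples, frame_len=80, threshold_db=-40.0):
    """Return one True/False (active / not active) decision per full frame.

    threshold_db is an arbitrary illustrative value, not taken from
    any standard. Trailing samples that do not fill a frame are ignored.
    """
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        decisions.append(frame_energy_db(frame) > threshold_db)
    return decisions

# 20 ms of a very quiet 200 Hz tone, then 20 ms of a loud one (8 kHz sampling)
quiet = [0.0005 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
print(simple_vad(quiet + loud))  # prints [False, False, True, True]
```

A fixed threshold like this breaks down as soon as the background noise level changes, which is one reason real detectors adapt to the background and look at more than energy.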
Cheers
Ranjeet
Forum posts: 44
I want to develop a similar application: record only when someone is speaking or there is fairly loud noise. What do you mean by G.729 Annex B (VAD / DTX and CNG)? How can I get it? Is it a plug-in or something?
Has anyone succeeded with this approach? Please guide me.
Regards,
Irma
Forum posts: 31
Yes, this is the ITU standard that detects speech, which is why it is called Voice Activity Detection. By plugging this in, you encode only the speech and not the silence. For encoding the silence you go for DTX, and for decoding the silence you go for CNG.
Refer to the ITU site.
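As a rough illustration of how DTX and CNG fit together, here is a toy Python sketch. It is not the G.729 Annex B scheme: real codecs send occasional silence descriptor updates, whereas this sketch passes a single noise level alongside the stream. The names `dtx_encode` and `cng_decode` and the use of Gaussian noise are assumptions made for the example.

```python
import random

def dtx_encode(frames, is_speech):
    """Keep active frames; replace inactive ones with None (nothing sent).

    Also returns the RMS level of the inactive frames so the decoder
    can generate comfort noise at a similar level.
    """
    sent = [f if active else None for f, active in zip(frames, is_speech)]
    silent = [f for f, active in zip(frames, is_speech) if not active]
    n = sum(len(f) for f in silent)
    noise_rms = (sum(s * s for f in silent for s in f) / n) ** 0.5 if n else 0.0
    return sent, noise_rms

def cng_decode(sent, noise_rms, frame_len, seed=0):
    """Rebuild a continuous stream, filling the gaps with low-level noise."""
    rng = random.Random(seed)
    out = []
    for frame in sent:
        if frame is None:
            # comfort noise in place of the frame that was never sent
            out.extend(rng.gauss(0.0, noise_rms) for _ in range(frame_len))
        else:
            out.extend(frame)
    return out

frames = [[0.001] * 80, [0.5] * 80, [0.001] * 80]
sent, noise_rms = dtx_encode(frames, [False, True, False])
decoded = cng_decode(sent, noise_rms, frame_len=80)
print(len(decoded))  # prints 240
```

Only the middle frame is actually carried in `sent`; the decoder still produces all 240 samples, with noise where nothing was transmitted.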
Ranjeet
Forum posts: 76
If you're recording voice from the microphone, you'd probably be interested in using AMR to compress the speech.
If you do use AMR, it already has the speech detection algorithms you need built in. If you turn on DTX (discontinuous transmission) when setting up the AMR encoder, the AMR codec will do the kind of detection you describe.
It's all based on observed background silence and changes in energy levels. I'm no expert, but the upshot is that it will encode voice audio when someone is speaking, and produce no encoded data while it is detecting background silence. On the decoding side, the AMR codec will decode and play voice where voice data was recorded, and in periods where silence was recorded and no data is available it will generate "comfort noise" to produce an uninterrupted audio stream that doesn't contain horrible artifacts where transmission starts and ends.
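For anyone who wants to see the "background silence and changes in energy levels" idea in code, here is a small Python sketch of a recording gate. It is not the AMR detector, which is considerably more elaborate; it simply tracks a running background level and flags frames that rise well above it. The function name and the `margin`, `alpha` and `hangover` values are assumptions picked for the example.

```python
def adaptive_gate(energies, margin=4.0, alpha=0.9, hangover=2):
    """Decide per frame whether to record, given per-frame energies.

    noise_floor follows the background level during frames judged
    inactive; a frame is active when its energy exceeds
    margin * noise_floor. hangover keeps the gate open for a few
    frames after activity ends so word endings are not clipped.
    """
    if not energies:
        return []
    noise_floor = energies[0]  # assumes the first frame is background
    remaining = 0
    decisions = []
    for e in energies:
        if e > margin * noise_floor:
            remaining = hangover
            decisions.append(True)
        else:
            # update the background estimate only on inactive frames
            noise_floor = alpha * noise_floor + (1 - alpha) * e
            if remaining > 0:
                remaining -= 1
                decisions.append(True)
            else:
                decisions.append(False)
    return decisions

energies = [1.0, 1.1, 0.9, 10.0, 12.0, 1.0, 1.0, 1.0, 1.0]
print(adaptive_gate(energies))
# prints [False, False, False, True, True, True, True, False, False]
```

The two extra True values after the loud frames are the hangover; writing only the frames marked True gives a "record only when someone is speaking" behaviour.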
Hope this helps.
Regards.
Andy.