Speech/voice recognition: Audio signal electrical properties

With the exponential growth of computing resources and capabilities, picture recognition and speech recognition are no longer the daunting tasks they once were. Everything from the lowest-cost smartphones to the cheap notebooks on the market today has enough processing capability at the edge to handle voice and video. Even a simple, basic system with just a microphone, an encoder and a modest processor can offer speech and picture recognition, as long as it can send (high-speed) audio and video data to a cloud computer for processing.

In this article, let's look at the properties of an audio speech signal from the point of view of a computer that must understand the speech and convert it into its text equivalent.

First, what does the word ‘audio’ mean? Audio is anything related to sound that we humans can hear, so speech, voice and other sounds are all audio. The generally accepted audio frequency band is 20 Hz to 20,000 Hz (20 kHz), which is the range the human ear can sense. An annoying sound in which we can't make out anything meaningful is called noise, or more precisely audio noise. Why call it audio noise? Because noise also exists in other frequency bands. An audio signal can be recorded, transmitted and replayed. We humans produce sounds using our vocal cords, with the support of the lungs, tongue, mouth, nose and throat. However, for hearing it's only the ear, an ex...
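
To make the 20 Hz to 20,000 Hz band concrete, here is a minimal Python sketch, assuming the audio has already been digitized into a list of samples. It checks whether a frequency falls inside the nominal audio band and estimates the dominant frequency of a sampled tone with a naive discrete Fourier transform (DFT). The function names, the 8,000 Hz sample rate and the 1,000 Hz test tone are illustrative assumptions, not part of the original article.

```python
import math

# Nominal limits of human hearing in hertz, as quoted in the text above.
AUDIBLE_MIN_HZ = 20.0
AUDIBLE_MAX_HZ = 20_000.0


def is_audible(freq_hz: float) -> bool:
    """Hypothetical helper: True if freq_hz lies in the nominal 20 Hz - 20 kHz band."""
    return AUDIBLE_MIN_HZ <= freq_hz <= AUDIBLE_MAX_HZ


def sine_samples(freq_hz: float, sample_rate_hz: float, n: int) -> list[float]:
    """Hypothetical helper: generate n samples of a unit-amplitude sine tone."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz) for k in range(n)]


def dominant_frequency(samples: list[float], sample_rate_hz: float) -> float:
    """Estimate the strongest frequency with a naive O(n^2) DFT (demo only)."""
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    # Bins 1..n/2 cover 0 < f <= sample_rate/2 (the Nyquist limit); for a
    # real-valued signal the higher bins mirror these, and bin 0 is DC.
    for k in range(1, n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    # Convert the bin index to hertz: each bin is sample_rate / n wide.
    return best_bin * sample_rate_hz / n


if __name__ == "__main__":
    # 400 samples at 8 kHz gives 20 Hz bins; a 1 kHz tone lands exactly on bin 50.
    tone = sine_samples(1000.0, 8000.0, 400)
    peak = dominant_frequency(tone, 8000.0)
    print(peak, is_audible(peak))
```

The naive DFT is used only because it is short and dependency-free; a real speech pipeline would normally use an FFT routine from a numerical library instead.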
