Speech Recognition with C# and XML Grammars

On November 3, 2011, in C#, Programming, by LuCuS

By request from a couple of my readers, I am back to show more speech recognition using C#. In my first speech recognition article, “Simple Speech Recognition Using C#“, I introduced you to the Speech Recognition Engine provided by the .NET framework. In that article, I showed you how to set up a new SRE, accept any voice input, and display it in a rich text box. The very next day, I took that application one step further in my article “Simple Speech Recognition Using C# – Part 2” by introducing you to grammars. Grammars are basically a list of input options you want your application to listen for. Adding grammars causes your application to listen for only those options and nothing else. The grammar used in that article was built using the Choices object, which was given a list of hardcoded options. Today, I want to show you how to replace the Choices object with a grammar XML file. Let’s begin.

The first thing you are going to need is, of course, your grammar file. The grammar file is simply an XML file that contains a list of rules, where each rule has a list of items that the SRE will listen for. Each rule must have an id, which is used to tell the SRE which list of options to listen for. In the example below, you will see that I have defined 2 rules. You will also notice that I have added a scope attribute with the value “public” to each rule. If you do not specify the “root” attribute in the opening “<grammar>” tag of your XML file, you will need to give your rules public scope, otherwise your SRE will not be able to read them.

<grammar xmlns="http://www.w3.org/2001/06/grammar"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd"
  xml:lang="en-US" version="1.0" root="command1">

  <rule id="command1" scope="public">
    <one-of>
      <item>start</item>
      <item>stop</item>
      <item>continue</item>
    </one-of>
  </rule>

  <rule id="command2" scope="public">
    <one-of>
      <item>test</item>
      <item>help</item>
      <item>hello</item>
    </one-of>
  </rule>

</grammar>

For this example, we are going to be using the first rule from our grammar XML file. In order for us to do that, we’ll need to create a new SpeechRecognitionEngine, tell it which audio device to use, and give it a callback handler where the recognized text will be processed and handled. To keep it simple, we’ll just copy the construction of our SRE from our other articles.

            SpeechRecognitionEngine recognitionEngine = new SpeechRecognitionEngine();
            recognitionEngine.SetInputToDefaultAudioDevice();
            recognitionEngine.SpeechRecognized += (s, args) =>
            {
                foreach (RecognizedWordUnit word in args.Result.Words)
                {
                    // You can change the minimum confidence level here
                    if (word.Confidence > 0.8f)
                        freeTextBox.Text += word.Text + " ";
                }
                freeTextBox.Text += Environment.NewLine;
            };

Now that we have our SRE constructed, it’s time to build our Grammar object. In the last article, you will recall that we built a new GrammarBuilder object and constructed our Grammar object from that grammar builder. This time, we’re going to replace the grammar builder with a string giving the name of the grammar XML file we’ll be using. After that, we’ll load our Grammar object into our SRE and tell it to begin listening by calling the RecognizeAsync method, as we did in the other articles.

            Grammar g = new Grammar("grammar.xml", "command1");
            recognitionEngine.LoadGrammar(g);
            recognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

In the snippet above, you’ll see that I passed 2 arguments to my Grammar object’s constructor. The first parameter is the name of the XML file that contains my grammar rules. The second parameter is the id of the rule I want to use from that XML file. If you want to omit the second parameter, you will need to include the “root” attribute on the “<grammar>” element of your XML file. If you do that, the first element of your XML file would look like:

<grammar … root="command1">
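
With the root rule declared, you can drop the rule name entirely and let the SRE fall back to it. Here is a minimal sketch of the one-argument form, assuming grammar.xml sits next to your executable:

            Grammar g = new Grammar("grammar.xml");   // loads the rule named by root="command1"
            recognitionEngine.LoadGrammar(g);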

That’s it! You are now ready to start using your grammar XML file in your SRE application. Here is a screenshot of the code below in action.

[Screenshot: Speech Recognition Example]

And, here is the complete code I used to make this happen.

using System;
using System.Text;
using System.Windows.Forms;
using System.Speech.Recognition;

namespace SpeechRecognition
{
    public partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();

            SpeechRecognitionEngine recognitionEngine = new SpeechRecognitionEngine();
            recognitionEngine.SetInputToDefaultAudioDevice();
            recognitionEngine.SpeechRecognized += (s, args) =>
            {
                foreach (RecognizedWordUnit word in args.Result.Words)
                {
                    // You can change the minimum confidence level here
                    if (word.Confidence > 0.8f)
                        freeTextBox.Text += word.Text + " ";
                }
                freeTextBox.Text += Environment.NewLine;
            };

            Grammar g = new Grammar("grammar.xml", "command1");
            recognitionEngine.LoadGrammar(g);
            recognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
        }
    }
}



11 Responses to Speech Recognition with C# and XML Grammars

  1. JimmyKomy says:

    Thanks for this wonderful tutorial. I have one more question: how could we use this namespace to do Arabic recognition?

    • LuCuS says:

      Hmmm, that’s a tough one. I’ve never tried doing speech recognition with languages other than English. The only way I can think of to do something like that would be to create multiple instances of your SRE and swap between them depending on the language you are listening for. In your XML file, you would change the IDs of your rules to something like “english” and “arabic”. Then, you would list your words / items for each language. In your code, you will need to add a reference to System.Globalization like this:

      using System.Globalization;

      Next, you will need to create a new CultureInfo object (System.Globalization.CultureInfo) for each culture like this:

      CultureInfo englishCulture = new CultureInfo("en-US");
      CultureInfo arabicCulture = new CultureInfo("ar-EG");

      Then, you would create a new SRE for each culture info object and an SRE that will be the main SRE you’ll be using like so:

      SpeechRecognitionEngine mainSRE = new SpeechRecognitionEngine();
      SpeechRecognitionEngine englishSRE = new SpeechRecognitionEngine(englishCulture);
      SpeechRecognitionEngine arabicSRE = new SpeechRecognitionEngine(arabicCulture);

      After that, you could add a combobox that lists your different languages. When a user selects a different language, swap out your SRE with the one that corresponds to the culture / language they selected.

      if (cmbLanguage.Text.Equals("Arabic"))
          mainSRE = arabicSRE;
      else
          mainSRE = englishSRE;

      However, there are 2 things to keep in mind here. 1) This is all hypothetical. Although I know the code above compiles, I have no way of testing whether this theory actually works in practice since I only speak English. 2) The method used in this article only listens for and recognizes the words you have listed in your XML grammar file. If phrases other than the ones listed in your XML file are spoken, the app will simply ignore them. So, if you need something that can listen for specific commands like the ones listed in your XML file and you need it to also listen for freely spoken words, you’ll need more of a hybrid approach, as this example alone will not work.
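
      Putting those pieces together, a minimal, untested sketch might look like the following. The file names english.xml and arabic.xml are hypothetical per-language grammar files (each one’s xml:lang should match the culture of the engine that loads it), and this assumes a recognizer for each culture is actually installed on the machine:

      CultureInfo englishCulture = new CultureInfo("en-US");
      CultureInfo arabicCulture = new CultureInfo("ar-EG");

      SpeechRecognitionEngine englishSRE = new SpeechRecognitionEngine(englishCulture);
      englishSRE.SetInputToDefaultAudioDevice();
      englishSRE.LoadGrammar(new Grammar("english.xml", "english"));

      SpeechRecognitionEngine arabicSRE = new SpeechRecognitionEngine(arabicCulture);
      arabicSRE.SetInputToDefaultAudioDevice();
      arabicSRE.LoadGrammar(new Grammar("arabic.xml", "arabic"));

      // Swap the active engine when the user picks a different language
      SpeechRecognitionEngine mainSRE =
          cmbLanguage.Text.Equals("Arabic") ? arabicSRE : englishSRE;
      mainSRE.RecognizeAsync(RecognizeMode.Multiple);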

  2. JimmyKomy says:

    The code for the other language does not work. Could you make a project using another language, like French or something?

  3. shuvro says:

    Again a great article, LuCuS! I also applied it and it works great. Now I have a question for you: why do we use XML grammars for speech recognition? Does this increase the efficiency of recognizing words?

    • LuCuS says:

      XML grammars are mostly for building a system that listens for specific commands, texts, or phrases. A few of my other readers are working on an application that can accept several different spoken languages and convert the input into output of a different language. Another reader is using grammars to allow her application to be controlled using different languages as well. To do that, she’ll be including a rule for each language that her app will support. Then, the user will be able to pick their language from a combobox. Other readers use grammars to group together specific commands. For example, one list of commands might control the application itself, whereas another rule (list of commands) might control the OS.

      One of the best things about using grammars is that commands can be added or removed at runtime with ease and persisted to the filesystem, as opposed to having the commands hardcoded in your app, where they will be reset every time you restart it since they’re all stored in memory. For example, I have another reader who is working on a voice-controlled robot. By using grammars, he can add new commands for his robot to recognize without needing to rebuild his application. Instead, he just adds the new commands to the XML file. At some point, he said, he plans on speaking new commands to the robot and having it add these new commands to the XML for him. When that happens, the robot will not know what to do when it hears those commands. With the help of a neural network or something similar, he can teach his robot to learn on its own what to do when it encounters those commands again in the future.
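
      To make the runtime part concrete, here is a minimal sketch of adding one new command (“reboot” is just an example) to the grammar file from this article and reloading it, using System.Xml.Linq. A real app would want error handling around the file I/O:

      // requires: using System.Linq; using System.Xml.Linq;
      XDocument doc = XDocument.Load("grammar.xml");
      XNamespace ns = "http://www.w3.org/2001/06/grammar";
      XElement oneOf = doc.Descendants(ns + "rule")
          .First(r => (string)r.Attribute("id") == "command1")
          .Element(ns + "one-of");
      oneOf.Add(new XElement(ns + "item", "reboot"));
      doc.Save("grammar.xml");

      // Stop, swap in the updated grammar, and start listening again
      recognitionEngine.RecognizeAsyncCancel();
      recognitionEngine.UnloadAllGrammars();
      recognitionEngine.LoadGrammar(new Grammar("grammar.xml", "command1"));
      recognitionEngine.RecognizeAsync(RecognizeMode.Multiple);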

  4. eddie_wyman says:

    Hi,
    I’m currently researching options for a speech recognition engine to support my business’s medical transcription workflow. All roads seem to lead to Nuance, and this is not a company I wish to get involved with or to rely on. Having seen your various blogs on speech recognition, I wonder if you could point me in the direction of some alternatives?

    nige

  5. shuvro says:

    Hi LuCuS,
    How can a spoken word be compared with an audio file? Consider a case: I say “This is an example” into a microphone, save it as an .mp3 file, and store it in a database. Later, when my application runs, if I say “This is an example” it should detect that by comparing it with the audio file from the database. I haven’t tried to build that application, but I’d like to hear your advice as an expert: is my concept right, is it possible to do, and how can I compare those 2 values?

    • LuCuS says:

      There are several ways you could approach this type of application. One way would be to create a neural network & train it using different voices saying the same phrase. Another way would be to use a Markov model. Another solution would be to convert the audio into wave frequencies & measure the peaks, valleys, & distance between them. This approach wouldn’t be very accurate, but could possibly yield some decent results. The Markov model would be more efficient than a raw neural network & would probably be the method I would go with, because it would return the best results of all these solutions. I’ve used Markov chains in all kinds of applications such as data mining, computer vision, etc. They’re reasonably easy to understand & build. If you do decide to play with neural nets, you should definitely look into support vector machines. They are 100x faster than pure neural nets.
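
      To give a flavor of the wave-frequency idea (and only a flavor, since as I said it’s the crudest option), here is a naive sketch that compares the loudness envelopes of two clips. The inputs are assumed to be raw mono PCM samples scaled to [-1, 1]; the names and frame size are arbitrary:

      // Compare two clips by the distance between their RMS loudness envelopes.
      static double EnvelopeDistance(float[] samplesA, float[] samplesB, int frameSize)
      {
          double[] envA = Envelope(samplesA, frameSize);
          double[] envB = Envelope(samplesB, frameSize);
          int n = Math.Min(envA.Length, envB.Length);
          double sum = 0;
          for (int i = 0; i < n; i++)
          {
              double d = envA[i] - envB[i];
              sum += d * d;
          }
          return Math.Sqrt(sum / n);   // smaller means more similar
      }

      // RMS energy of each fixed-size frame
      static double[] Envelope(float[] samples, int frameSize)
      {
          double[] env = new double[samples.Length / frameSize];
          for (int f = 0; f < env.Length; f++)
          {
              double energy = 0;
              for (int i = 0; i < frameSize; i++)
                  energy += samples[f * frameSize + i] * samples[f * frameSize + i];
              env[f] = Math.Sqrt(energy / frameSize);
          }
          return env;
      }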

  6. LongTimeAgo says:

    I know you posted this a few years ago, but I didn’t find a solution to my problem… so I hope you’re still here.

    Do you know if it’s possible to use a specific microphone and not the default microphone?

    Thanks a lot

    • LuCuS says:

      You can use any microphone you want. I’ve used microphones built into my laptops, USB microphones, and even those 3.5mm jack microphones. Basically you just need a way to get audio into the app.
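
      If you don’t want to rely on the default device, System.Speech can also read from a stream or a WAV file. Here’s a minimal sketch; the file names are placeholders, and the stream here is a pre-recorded raw PCM capture standing in for a live feed (for true real-time input from a specific device you’d need a capture library such as NAudio pushing audio into a blocking stream):

      // requires: using System.IO; using System.Speech.AudioFormat;
      Stream micStream = File.OpenRead("capture.raw");   // raw 16 kHz, 16-bit, mono PCM
      recognitionEngine.SetInputToAudioStream(micStream,
          new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));

      // Or simply point the recognizer at a WAV file:
      // recognitionEngine.SetInputToWaveFile("capture.wav");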

      • LongTimeAgo says:

        Thanks for your answer.

        I found out how to use a specific microphone: I select the microphone I want to use and define it as the default microphone. That way, I can use the function SpeechRecognitionEngine.SetInputToDefaultAudioDevice.
        The other way is to use a stream with the function SetInputToAudioStream,
        like in this link: https://channel9.msdn.com/coding4fun/articles/NET-Voice-Recorder
        (it’s explained under “Starting Recording” in the note),
        but I did not manage to get it running.

        So, if you know how to use a SpeechRecognitionEngine without SetInputToDefaultAudioDevice and in real time, that would be really great.
        I’m very grateful to you.
