Speech Recognition with C# and XML Grammars

On November 3, 2011, in C#, Programming, by LuCuS

By request from a couple of my readers, I am back to show more speech recognition using C#. In my first speech recognition article, “Simple Speech Recognition Using C#”, I introduced you to the Speech Recognition Engine (SRE) provided by the .NET Framework. In that article, I showed you how to set up a new SRE, accept any voice input, and display it in a rich text box. The very next day, I took that application one step further in my article “Simple Speech Recognition Using C# – Part 2” by introducing you to grammars. A grammar is basically a list of input options you want your application to listen for. Adding a grammar causes your application to listen for only those options and nothing else. The grammar used in that article was built using the Choices object and by providing that object with a list of hardcoded options. Today, I want to show you how to replace the Choices object with a grammar XML file. Let’s begin.

The first thing you are going to need is, of course, your grammar file. The grammar file is simply an XML file that contains a list of rules, where each rule has a list of items that the SRE will listen for. Each rule must have an id, which is used to tell the SRE which list of options to listen for. In the example below, you will see that I have defined 2 rules. You will also notice that I have added a scope attribute with the value “public” to each rule. If you do not specify the “root” attribute on the opening “<grammar>” tag of your XML file, you will need to give your rules a public scope; otherwise, your SRE will not be able to read them.

<grammar xmlns="http://www.w3.org/2001/06/grammar"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd"
  xml:lang="en-US" version="1.0" root="command1">

  <rule id="command1" scope="public">
    <one-of>
      <item>start</item>
      <item>stop</item>
      <item>continue</item>
    </one-of>
  </rule>

  <rule id="command2" scope="public">
    <one-of>
      <item>test</item>
      <item>help</item>
      <item>hello</item>
    </one-of>
  </rule>

</grammar>

For this example, we are going to be using the first rule from our grammar XML file. In order for us to do that, we’ll need to create a new SpeechRecognitionEngine, tell it which audio device to use, and give it a callback handler where the recognized text will be processed and handled. To keep it simple, we’ll just copy the construction of our SRE from our other articles.

            SpeechRecognitionEngine recognitionEngine = new SpeechRecognitionEngine();
            recognitionEngine.SetInputToDefaultAudioDevice();
            recognitionEngine.SpeechRecognized += (s, args) =>
            {
                foreach (RecognizedWordUnit word in args.Result.Words)
                {
                    // You can change the minimum confidence level here
                    if (word.Confidence > 0.8f)
                        freeTextBox.Text += word.Text + " ";
                }
                freeTextBox.Text += Environment.NewLine;
            };

Now that we have our SRE constructed, it’s time to build our Grammar object. In the last article, you will recall that we built a new GrammarBuilder object and constructed our Grammar object from it. This time, we’re going to replace that grammar builder with a string holding the name of the grammar XML file we’ll be using. After that, we’ll load our Grammar object into our SRE and tell it to begin listening by calling the RecognizeAsync method, like we did in the other articles.

            Grammar g = new Grammar("grammar.xml", "command1");
            recognitionEngine.LoadGrammar(g);
            recognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

In the snippet above, you’ll see that I passed 2 arguments to the Grammar constructor. The first is the name of the XML file that contains my grammar rules. The second is the id of the rule I want to use from that XML file. If you want to omit the second argument, you will need to include the “root” attribute on the “<grammar>” element of your XML file. If you do that, the first element of your XML file would look like:

<grammar …. root="command1">
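
As a rough sketch of that variation, the console program below loads the grammar without naming a rule, so the rule named by the “root” attribute becomes the entry point. The class name RootRuleExample is made up for illustration, and the sketch assumes grammar.xml sits next to the executable and that a default microphone is available.

```csharp
using System;
using System.Speech.Recognition;

class RootRuleExample
{
    static void Main()
    {
        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            engine.SetInputToDefaultAudioDevice();

            // No rule name is given, so the rule named by the root attribute
            // of grammar.xml is used as the entry point.
            engine.LoadGrammar(new Grammar("grammar.xml"));

            // Recognize() blocks while it listens for a single phrase and
            // returns null when nothing is recognized.
            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result != null ? result.Text : "(nothing recognized)");
        }
    }
}
```

If the “root” attribute is missing and no rule name is passed, expect the Grammar constructor to throw, since it has no entry point to use.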

That’s it! You are now ready to start using your grammar XML file in your SRE application. Here is a screenshot of the code below in action.

[Screenshot: Speech Recognition Example]

And, here is the complete code I used to make this happen.

using System;
using System.Text;
using System.Windows.Forms;
using System.Speech.Recognition;

namespace SpeechRecognition
{
    public partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();

            SpeechRecognitionEngine recognitionEngine = new SpeechRecognitionEngine();
            recognitionEngine.SetInputToDefaultAudioDevice();
            recognitionEngine.SpeechRecognized += (s, args) =>
            {
                foreach (RecognizedWordUnit word in args.Result.Words)
                {
                    // You can change the minimum confidence level here
                    if (word.Confidence > 0.8f)
                        freeTextBox.Text += word.Text + " ";
                }
                freeTextBox.Text += Environment.NewLine;
            };

            Grammar g = new Grammar("grammar.xml", "command1");
            recognitionEngine.LoadGrammar(g);
            recognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
        }
    }
}
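
The listing above loads only the first rule. If you want the SRE to listen for both rules at once, one possible approach (a sketch, not something tested in this article) is to build one Grammar object per rule from the same file and check which one matched in the handler. The class name TwoRuleExample is hypothetical, and this version writes to the console instead of the form’s text box.

```csharp
using System;
using System.Speech.Recognition;

class TwoRuleExample
{
    static void Main()
    {
        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            engine.SetInputToDefaultAudioDevice();

            // One Grammar object per rule; both are read from the same grammar.xml.
            engine.LoadGrammar(new Grammar("grammar.xml", "command1"));
            engine.LoadGrammar(new Grammar("grammar.xml", "command2"));

            engine.SpeechRecognized += (s, args) =>
            {
                // Grammar.RuleName is the entry-point rule of the grammar that matched.
                Console.WriteLine("{0}: {1} (confidence {2:F2})",
                    args.Result.Grammar.RuleName,
                    args.Result.Text,
                    args.Result.Confidence);
            };

            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening. Press Enter to quit.");
            Console.ReadLine();
            engine.RecognizeAsyncCancel();
        }
    }
}
```

Keeping the rules in separate Grammar objects also means either one can later be removed with UnloadGrammar without touching the other.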

5 Responses to Speech Recognition with C# and XML Grammars

  1. JimmyKomy says:

    Thanks for this wonderful tutorial. I have one more question, how could we use this namespace to make Arabic recognition??

    • LuCuS says:

      Hmmm? That’s a tough one. I’ve never tried doing speech recognition with languages other than English. The only way I can think of to do something like that would be to create multiple instances of your SRE and swap between them depending on the language you are listening for. In your XML file, you would change the IDs of your rules to something like “english” and “arabic”. Then, you would list your words / items for each language. In your code, you will need to add a using directive for System.Globalization like this:

      using System.Globalization;

      Next, you will need to create a new CultureInfo object (System.Globalization.CultureInfo) for each culture like this:

      CultureInfo englishCulture = new CultureInfo("en-US");
      CultureInfo arabicCulture = new CultureInfo("ar-EG");

      Then, you would create a new SRE for each culture info object and an SRE that will be the main SRE you’ll be using like so:

      SpeechRecognitionEngine mainSRE = new SpeechRecognitionEngine();
      SpeechRecognitionEngine englishSRE = new SpeechRecognitionEngine(englishCulture);
      SpeechRecognitionEngine arabicSRE = new SpeechRecognitionEngine(arabicCulture);

      After that, you could add a combobox that lists your different languages. When a user selects a different language, swap out your SRE with the one that corresponds to the culture / language they selected.

      if (cmbLanguage.Text.Equals("Arabic"))
          mainSRE = arabicSRE;
      else
          mainSRE = englishSRE;

      However, there are 2 things to keep in mind here. 1) This is all hypothetical. Although I know the code above compiles, I have no way of testing whether this theory actually works in practice since I only speak English. 2) The method used in this article only listens for and recognizes the words you have listed in your XML grammar file. If phrases other than the ones listed in your XML file are spoken, the app will simply ignore them. So, if you need something that can listen for specific commands like the ones listed in your XML file and you need it to also listen for freely spoken words, you’ll need more of a hybrid approach, as this example alone will not work.
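
One thing worth checking before trying the approach above is whether a recognizer for the target culture is installed at all, because the SpeechRecognitionEngine(CultureInfo) constructor throws an ArgumentException when none of the installed recognizers supports the requested culture. The following is a minimal sketch of that check; the class name RecognizerCheck is made up for illustration, and “ar-EG” is simply the culture used in the reply above.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Speech.Recognition;

class RecognizerCheck
{
    static void Main()
    {
        CultureInfo wanted = new CultureInfo("ar-EG");

        // List every speech recognizer installed on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
            Console.WriteLine("{0} ({1})", info.Name, info.Culture.Name);

        // True only if some installed recognizer reports exactly this culture.
        bool available = SpeechRecognitionEngine.InstalledRecognizers()
            .Any(info => info.Culture.Equals(wanted));

        Console.WriteLine(available
            ? "A recognizer for " + wanted.Name + " is installed."
            : "No recognizer for " + wanted.Name + " is installed.");
    }
}
```

As far as I know, the xml:lang value of a grammar also has to match the culture of the recognizer that loads it, so a separate grammar file per language (rather than one file with a rule per language) is probably the safer layout.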

  2. JimmyKomy says:

    The code of the other language does not work. Could you make a project using another language like French or something ???

  3. shuvro says:

    Again, a great article LuCuS! I applied it and it works great. I have a question for you: why do we use XML grammars for speech recognition? Does this increase the efficiency of recognizing words?

    • LuCuS says:

      XML grammars are mostly for building a system that looks for specific commands, texts, or phrases. A few of my other readers are working on an application that can accept several different spoken languages and convert the input into output of a different language. Another reader is using grammars so that her application can be controlled using different languages as well. To do that, she’ll be including a rule for each language that her app will support. Then, the user will be able to pick their language from a combobox. Other readers use grammars to group together specific commands. For example, one rule (list of commands) might control the application itself, whereas another rule might control the OS. One of the best things about using grammars is that commands can be added or removed at runtime with ease and persisted to the filesystem, as opposed to having the commands hardcoded in your app, where they are reset every time you restart the app since it’s all stored in memory. For example, I have another reader who is working on a voice-controlled robot. By using grammars, he can add new commands for his robot to recognize without re-building his application. Instead, he just adds the new commands to the XML file. At some point, he said he plans on speaking new commands to the robot and having it add these new commands to the XML for him. When that happens, the robot will not know what to do when it hears those commands. With the help of a neural network or something similar, he can teach his robot to learn on its own what to do when it encounters these commands again in the future.
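
To make the “add commands to the XML file” idea concrete, here is a small sketch that uses LINQ to XML to append a new item to one rule of grammar.xml. AddCommand and GrammarFileEditor are hypothetical names, not part of System.Speech, and the sketch assumes each rule holds a single “<one-of>” list like the grammar in this article.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class GrammarFileEditor
{
    // SRGS namespace declared on the <grammar> element of grammar.xml.
    static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    // Hypothetical helper: appends <item>phrase</item> to the <one-of> list
    // of the rule with the given id. Returns false if nothing was changed.
    static bool AddCommand(string path, string ruleId, string phrase)
    {
        XDocument doc = XDocument.Load(path);

        XElement oneOf = doc.Root
            .Elements(Srgs + "rule")
            .Where(rule => (string)rule.Attribute("id") == ruleId)
            .Elements(Srgs + "one-of")
            .FirstOrDefault();

        if (oneOf == null)
            return false; // rule not found, or it has no <one-of> list

        // Skip phrases that are already listed.
        if (oneOf.Elements(Srgs + "item").Any(item => item.Value.Trim() == phrase))
            return false;

        oneOf.Add(new XElement(Srgs + "item", phrase));
        doc.Save(path);
        return true;
    }

    static void Main()
    {
        bool added = AddCommand("grammar.xml", "command1", "pause");
        Console.WriteLine(added ? "Command added." : "Nothing changed.");
    }
}
```

A Grammar object that is already loaded will not pick up the change by itself, so a running app would presumably need to build a new Grammar from the updated file and swap it in with UnloadGrammar and LoadGrammar.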
