Building Grammars in .NET

September 12th, 2008 Josh Posted in programming 3 Comments »

As I’ve shown before (here, here, and here), the Microsoft .NET Speech API allows you to quickly and easily build applications that take advantage of the work the good folks at Microsoft Research have done on speech recognition. The general process is that you construct a grammar for the engine to recognize, and event handlers are then triggered whenever it recognizes something. The process is therefore only as effective as the grammars that you construct for the SpeechRecognitionEngine. In this post I’ll show you some of the things that you can do to construct Grammars in C#.

The SpeechRecognitionEngine class can load one or many Grammar objects. The MSDN page describes these Grammar objects in the following way:

The System.Speech.Recognition.Grammar class provides run time objects that allow an application to specify a specific combination of words, choices of words, and other speech elements that the Speech platform uses to identify meaningful phrases.

The Grammar object fully supports the W3C Speech Recognition Grammar Specification (SRGS) and Context Free Grammar (CFG) specifications. For more information, see Speech Recognition Grammar Specification.

Grammars may be precise word phrases, such as “Turn the computer off,” or provide choices, semantic lookup tables or wildcards, such as “Change color to” and a look up table of acceptable values.

An application’s recognition engine, as managed by instances of SpeechRecognizer or SpeechRecognitionEngine, may create and load one or more instances of Grammar, independently enable or disable particular Grammar instances, and set Grammar properties such as priority (Priority) and weight (Weight).

So you can see that we can construct Grammars that will recognize certain types of input and then load these into the SpeechRecognitionEngine. These can be prioritized to help with conflict resolution as well as toggled on and off if you know that only certain grammars are appropriate for a certain context.

So how do we construct these grammars? Grammars can be constructed in a variety of ways, but we are going to examine the GrammarBuilder Class as this is the easiest way to programmatically construct a Grammar.

The simplest example is simple string matching. If you have a discrete atomic command (e.g. shutdown) that takes no parameters, you can initialize a GrammarBuilder with a string and it will fire its recognize event when that string is heard.

GrammarBuilder stringGb = new GrammarBuilder("shutdown");

If this were added to a Grammar and loaded into the recognizer, it would fire a recognized event for this Grammar whenever the word “shutdown” was heard. This is easy enough, but somewhat limited: what if you also wanted to allow the user to say “turn off” or “quit”? At this point you would have to add three Grammars, one for each of those strings. Enter the Choices class.

The Choices class allows us to create a phrase where one part of the phrase may have many different acceptable values. In our above example, we want the verbs “shutdown”, “turn off” and “quit” to be interchangeable. To show how this can be used within a phrase, let’s make the user ask nicely by preceding any of those with “please”.

GrammarBuilder choicesGB = new GrammarBuilder("please");
Choices verbChoices = new Choices("shutdown", "turn off", "quit");
choicesGB.Append(verbChoices);

Now our grammar will trigger an event for all of the following phrases:

  • please shutdown
  • please turn off
  • please quit

You can probably now imagine how you could combine strings and choices to come up with a variety of different recognizable phrases. There is a lot more that this engine can handle, but before we get into that let’s review what happens when a phrase is recognized by the SpeechRecognitionEngine.

Remember that when a phrase is recognized, all the engine does is raise its SpeechRecognized event, to which you can attach a handler. This same event is raised for all phrases in all grammars that are loaded, so the problem becomes figuring out which phrase triggered the event. There are numerous strategies for dealing with this, so I won’t go into detail here, but note that the SpeechRecognizedEventArgs object has a Result property (a RecognitionResult) that holds a reference to the Grammar as well as the Text of the phrase, in addition to any semantic values that you have assigned (more on that later). For an example see here.
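To make that concrete, here is a minimal sketch of loading the “please” grammar from above and looking at what the handler receives. The grammar name "PoliteShutdown" and the class name are made up for illustration, and the sketch assumes a Windows machine with a default microphone and a reference to System.Speech.

```csharp
using System;
using System.Speech.Recognition;

class RecognizedDemo
{
    static void Main()
    {
        // Same phrase as above: "please" followed by one of three verbs
        GrammarBuilder gb = new GrammarBuilder("please");
        gb.Append(new Choices("shutdown", "turn off", "quit"));

        // Wrap the builder in a Grammar; the name is a hypothetical label
        Grammar grammar = new Grammar(gb);
        grammar.Name = "PoliteShutdown";

        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(grammar);
            engine.SetInputToDefaultAudioDevice();
            engine.SpeechRecognized += OnSpeechRecognized;

            // Keep listening until the user presses Enter
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening; press Enter to exit.");
            Console.ReadLine();
        }
    }

    static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // The result carries the grammar that matched and the recognized text
        Console.WriteLine("Grammar: {0}, text: {1}, confidence: {2}",
            e.Result.Grammar.Name, e.Result.Text, e.Result.Confidence);
    }
}
```

Switching on e.Result.Grammar.Name is one simple way to route a recognition to the right piece of code when several grammars are loaded.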

Why take that little detour? Because once you’ve tried to actually do something with the result of a recognition event you’ll realize the utility of semantic values. Semantic values allow you to assign values to certain responses so that you do not have to do annoying string parsing in the event handler.

Our first basic example will show how we can get the appropriate value in the event handler without having to do string parsing. Let’s say that we know that the user is going to say either ‘yes’ or ‘no’ to a request, but we want to recognize synonyms of those words. We can do something like the following:

GrammarBuilder yesChoices = new Choices("yes", "yeah", "ok").ToGrammarBuilder();
yesChoices.Append(new SemanticResultValue(true));
GrammarBuilder noChoices = new Choices("no", "nope").ToGrammarBuilder();
noChoices.Append(new SemanticResultValue(false));
 
Choices allChoices = new Choices(yesChoices, noChoices);
SemanticResultKey choiceKey = new SemanticResultKey("YesNoBool", allChoices);
 
GrammarBuilder choiceGB = new GrammarBuilder(choiceKey);

What does this do for us? It creates a GrammarBuilder that has two choices, each with a semantic value that can be obtained in the recognition handler. Semantically there is a Yes choice (which is itself a choice of “yes”, “yeah” and “ok”) and a No choice (“no” and “nope”). Since we have assigned semantic values to these, the recognition engine will set the “YesNoBool” key to either true or false depending on which semantic choice the user makes. This is great for us: no string parsing, so we can build up grammars with many synonyms and never worry about adding more complexity in the event handler.
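On the handler side, reading that value back might look roughly like the sketch below. The class name and the console messages are my own placeholders; the only thing taken from the grammar above is the “YesNoBool” key.

```csharp
using System;
using System.Speech.Recognition;

public static class YesNoHandler
{
    // Attach with: engine.SpeechRecognized += YesNoHandler.OnSpeechRecognized;
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        SemanticValue semantics = e.Result.Semantics;

        // Other grammars raise the same event, so check that the key is present
        if (!semantics.ContainsKey("YesNoBool"))
        {
            return;
        }

        // The value is the bool we passed to SemanticResultValue
        bool saidYes = (bool)semantics["YesNoBool"].Value;
        Console.WriteLine(saidYes ? "User agreed" : "User declined");
    }
}
```

Note that the handler never looks at e.Result.Text, so adding “yep” or “nah” to the grammar later would not require touching it.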

So far we have looked at inputs that are just combinations of predefined words. This is great for many scenarios, but what if we want to allow more free form input? Let’s say that our grammar is built to allow users to get a current stock price. You would probably want to match on any of the following phrases:

  • Show me the stock price of foo
  • Let me see the stock price of foo
  • What is the stock price of foo

Here foo may be any stock ticker, and let’s say we’re lazy so we don’t want to hard code all of them. The solution lies in dictation. The DictationGrammar class is exactly what it sounds like: a class for understanding user dictation. The code below gets the same effect inside a GrammarBuilder by calling AppendDictation. Now, extracting the ‘foo’ from any of those phrases in the event handler could be error prone, especially as you add more and more recognized ways of asking for a stock price. This is where semantic values come into play again. So let’s see how this might work:

GrammarBuilder stockGB = new GrammarBuilder();
GrammarBuilder dictation = new GrammarBuilder();
dictation.AppendDictation("spelling");
 
stockGB.Append(new Choices("Show me the stock price of", "Let me see the stock price of", "What is the stock price of"));
stockGB.Append(new SemanticResultKey("DictationInput", dictation));

This code creates a choice of antecedents and then appends a SemanticResultKey whose name is DictationInput. Note that since we assigned no semantic value, the value that is assigned to DictationInput will be the actual text of the dictation. Finally, note that when we created the dictation we passed the parameter “spelling”. This lets the recognizer know that we want the user to spell out the ticker name and that it should not interpret the input as words. If we had not passed that parameter it would have tried to match the input to words. That type of dictation might be useful if you were trying to build a voice IM client, for example.
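Here is a rough sketch of what the handler for this grammar could look like. The NormalizeTicker helper is hypothetical, and it rests on an assumption of mine: that spelled letters may come back separated by spaces (for example “m s f t”). Check what your recognizer actually returns before relying on it.

```csharp
using System;
using System.Speech.Recognition;

public static class StockHandler
{
    // Hypothetical helper: assumes spelled input may arrive as "m s f t"
    public static string NormalizeTicker(string dictated)
    {
        return dictated.Replace(" ", "").ToUpperInvariant();
    }

    // Attach with: engine.SpeechRecognized += StockHandler.OnSpeechRecognized;
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        SemanticValue semantics = e.Result.Semantics;
        if (!semantics.ContainsKey("DictationInput"))
        {
            return; // some other grammar matched
        }

        // With no SemanticResultValue assigned, the value is the dictated text
        object raw = semantics["DictationInput"].Value;
        if (raw != null)
        {
            Console.WriteLine("Ticker requested: " + NormalizeTicker(raw.ToString()));
        }
    }
}
```

Whichever of the three lead-in phrases the user picks, the handler only ever touches the DictationInput key.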

That’s all for now. Good luck and let me know if you build anything cool :)


Getting your C# app to run in the System Tray

July 6th, 2008 Josh Posted in programming No Comments »

While making some additions to the Speech Framework, I needed to figure out how to get my WPF application to run in the system tray. For the speech framework, the sys-tray is a natural place for the application to run, as we ideally don’t want to see any UI and the application should be always on. I opted out of a service, because I did want the rich UI of an application and for it to show up in task manager, etc. It was very straightforward, but I thought I’d write up the steps for future reference.

We will start by creating a NotifyIcon object in our Window’s constructor. Note that it lives in the System.Windows.Forms namespace, and you may need to add a reference to that assembly in order to access it (it is not there by default in a new WPF project). You may also need to add a System.Drawing reference to your project. Once your references are in order, let’s write some code to create the icon for us in the Window constructor:

//Set up the system tray icon
_ni = new NotifyIcon();
_ni.BalloonTipTitle = "Qurbit Home Automation Engine";
_ni.BalloonTipText = "The engine is still running, it has just been minimized.  Double click to restore to normal size";
_ni.Text = "Qurbit Home Automation Engine";
//Add a double click handler
_ni.MouseDoubleClick += new System.Windows.Forms.MouseEventHandler(_ni_MouseDoubleClick);
try
{
    _ni.Icon = new System.Drawing.Icon("Images\\flag.ico");
}
catch (Exception e)
{
    Debug.Print(e.StackTrace);
}

The above code will create the necessary NotifyIcon object and give it an icon, text, etc. Customize it as is appropriate.

Next we want to make sure that when a user minimizes the window we show the icon and do not show our application on the task bar. To do so, we need to handle the Window’s StateChanged event. For more info on how window events work start here. In our XAML we need to add this attribute to the Window element:

StateChanged="Window_StateChanged"

Then in C#, we can add a handler for this. First however, let’s write a helper function for toggling the visibility of our window. It will take a boolean that indicates whether the window should live in the system tray or not.

private void windowInSysTray(bool inTray)
{
    if (inTray)
    {
        //Hide the window
        this.ShowInTaskbar = false;
        this.WindowState = WindowState.Minimized;
 
        if (_ni != null)
        {
            //Show the icon
            _ni.Visible = true;
            //Show the balloon tip
            //_ni.ShowBalloonTip(2000);
        }
    }
    else
    {
        //Show the window
        this.ShowInTaskbar = true;
        //Restore it
        this.WindowState = WindowState.Normal;
        //Bring it to the front
        this.Activate();
        //Remove the sys tray icon
        if (_ni != null)
        {
            _ni.Visible = false;
        }
    }
}

Finally, the event handler looks like the following:

private void Window_StateChanged(object sender, EventArgs e)
{
    //If the state is minimized then change its display type
    if (this.WindowState == WindowState.Minimized)
    {
        windowInSysTray(true);
    }
    else
    {
        windowInSysTray(false);
    }
}

Next we need to implement the double click handler for the notify icon to restore the application:

private void _ni_MouseDoubleClick(object sender, System.Windows.Forms.MouseEventArgs e)
{
    windowInSysTray(false);
}

At this point you should be able to minimize and restore your application as you would expect. There is one final bit of cleanup that we are going to want to include, and that is to handle what happens when the window is closed. If we allow the user to close the window with the close button, then we need to explicitly remove the notify icon or it will stay down there until the area is redrawn (when the user mouses over it). For my app, however, I am going to make the close button also minimize the app to the sys tray, so that you really close it via a sys tray menu. So let’s hook the Window’s Closing event by adding the following attribute to our Window element in the XAML:

Closing="Window_Closing"

Then let’s write an event handler:

private void Window_Closing(object sender, System.ComponentModel.CancelEventArgs e)
{
    //Trap the event
    e.Cancel = true;
    //Put it in the system tray
    windowInSysTray(true);
}

Now, when we close the application, it minimizes itself to the system tray as well. Unfortunately, if you run it at this point there is no way to close the application cleanly, so let’s add a context menu with a close option to the system tray icon.

To get started, let’s add a click handler to the NotifyIcon in the window constructor:

_ni.MouseClick += new System.Windows.Forms.MouseEventHandler(_ni_MouseClick);

Then we need to implement the click handler. It will be very simple, and just check for a right mouse click. If it detects one, then it will show the context menu:

void _ni_MouseClick(object sender, System.Windows.Forms.MouseEventArgs e)
{
    if (e.Button == MouseButtons.Right)
    {
        //Show the context menu
        System.Windows.Controls.ContextMenu menu = (System.Windows.Controls.ContextMenu)this.FindResource("SysTrayContextMenu");
        menu.IsOpen = true;
    }
}

In order for that to work we now need to add a context menu. There are a couple of ways we could do this, but let’s build ours in XAML so that down the road we can have some fun with it. To do this, add a context menu to the XAML file as the first element within the window:

<Window.Resources>
    <ContextMenu x:Key="SysTrayContextMenu">
        <MenuItem Header="Open" Click="MenuOpen_Click"></MenuItem>
        <MenuItem Header="Exit" Click="MenuExit_Click"></MenuItem>
    </ContextMenu>
</Window.Resources>

Now we have a context menu with two options. All that remains is to implement their click handlers. The open option will just open up the window by calling our windowInSysTray method, and the exit option will remove our NotifyIcon (cleanup so that it doesn’t linger around) and then shut down the application.

private void MenuOpen_Click(object sender, RoutedEventArgs e)
{
    windowInSysTray(false);
}
 
private void MenuExit_Click(object sender, RoutedEventArgs e)
{
    //Remove the system tray icon
    _ni.Dispose();
    //Shut it down
    System.Windows.Application.Current.Shutdown();
}

There you have it. Pretty simple stuff, and if you want to add more options or add fancier effects just edit the xaml file.


Javascript, Flash and cross-browser clipboard fun

June 20th, 2008 Josh Posted in programming 1 Comment »

Recently I have been wrestling with writing some code that will manipulate text on a user’s clipboard in a web browser. I thought I would share my findings here for others who might be interested in the same topic.

First let’s start by looking at the Windows clipboard in general. The clipboard, as we all know, stores blobs that the user has copied or cut from Windows applications for future use. It stores this data in a variety of formats (if you are writing a Windows application you can use the formats specified by the DataFormats class). It is a convenient way to transport data between applications in Windows. A simple example of writing to the clipboard in C# would then look like:

// For this example, the data to be placed on the clipboard is a simple
// string.
string textData = "I want to put this string on the clipboard.";
 
// After this call, the data (string) is placed on the clipboard and tagged
// with a data format of "Text".
Clipboard.SetData(DataFormats.Text, (Object)textData);
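For completeness, here is a small sketch of the round trip: writing the string and then reading it back. It assumes the Windows Forms Clipboard class (the snippet above does not say which one it uses) and a console project that references System.Windows.Forms. The class name is just a placeholder.

```csharp
using System;
using System.Windows.Forms;

static class ClipboardRoundTrip
{
    // The clipboard APIs require a single-threaded apartment
    [STAThread]
    static void Main()
    {
        string textData = "I want to put this string on the clipboard.";

        // Write the string, tagged with the "Text" data format
        Clipboard.SetData(DataFormats.Text, textData);

        // Only read it back if something in that format is present
        if (Clipboard.ContainsData(DataFormats.Text))
        {
            string roundTripped = (string)Clipboard.GetData(DataFormats.Text);
            Console.WriteLine(roundTripped);
        }
    }
}
```

The ContainsData check matters because another application may have replaced the clipboard contents between the write and the read.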

Doing this in a Windows app is great, and easy, because that is really what the APIs were designed for… but what if we want to do this in a web app? Why would we do this, you ask? Well, one argument is that as web apps and client apps converge, users are starting to expect more and more of the familiar Windows UI concepts to be available to them on the web. Yeah, we’ll say that is the reason.

In a web app, there are primarily 3 ways that we are going to run code that will interact with our user: javascript, flash, and silverlight. We’ll take a look at each of these and see what is possible.

Javascript

So what can we do with javascript and the clipboard? It depends on the browser and on the interaction that you want. Let’s start with the simplest, and potentially cross browser, action: execCommand. The execCommand method can perform a variety of operations, but the ones we care about are ‘copy’, ‘cut’, and ‘paste’. Because execCommand was built with javascript based wysiwyg editors in mind, it operates on the selected areas of a web page: that is, text that has been highlighted, or the place where there is an active cursor (e.g. in a text box). So how can we use execCommand? If you simply want to copy the selection to the clipboard, then it is trivial and you can just:

document.execCommand("Copy");

This copies the user’s selection to the clipboard.

What is good about the execCommand copy approach? It is potentially cross browser (not with default security settings in FF) and simple to implement. So long as they have javascript enabled you can use it. What is bad? In modern browsers the user will be prompted the first time that you attempt to do this and warned that you are trying to access their clipboard, which is a bad UX but necessary for security reasons.

Furthermore, you may only copy text that the user has selected in the browser.

For more examples of execCommand clipboard stuff, see this Geekpedia tutorial or this excellent JS WYSIWYG Editor tutorial.

The IE only alternative

An option, if for some strange reason you would like to build an application that only works in IE, is to take advantage of the clipboardData object. Using this, you can write to and read from the clipboard directly, and the user is not prompted when using the default security settings. You can only take advantage of two ‘types’ on the clipboard this way: ‘text’ and ‘url’.

To write text to the clipboard, you could then do the following:

var textToCopy = "This is some text for the clipboard";
window.clipboardData.setData("Text", textToCopy);

One nice thing about this, is that you can write arbitrary text to the clipboard (no user selection needed). But remember, this is Internet Explorer only. Mozilla does have a solution, but it is FF only. So if you want to use the clipboard in all browsers and not change security settings what can we do?

Flash

The best answer so far is flash. The idea (originally credited to Mark O’Sullivan) is to use the Flash setClipboard method to write to the clipboard. Why is this better than the javascript solutions? Because the good people at Adobe did all the cross browser work for us. Now the only requirement is that the browser has a flash plugin, which given the penetration of flash is a fair bet. What we can then do, when we want to copy something to the clipboard, is pass the text (via javascript) to our flash object and let it do the clipboard dirty work.

Jeffothy Keyings has a good example of how to do this, and links to the open source flash object (_clipboard.swf) by O’Sullivan.

Silverlight

I looked and looked, and to the best of my knowledge Silverlight does not yet expose a method for accessing the clipboard. Alas.

Conclusions

So where does that leave us?

  1. If you want cross browser clipboard access you will need to use flash.
  2. If you do not want a dependency on flash, then you only have a good solution for IE.

One last thing…

One more thing that is worth noting is that you can only write to and read from the clipboard from within the browser using the TEXT format. Why is that important? Because if you want to put images, or rich text on the clipboard you can’t. If you go into your browser and highlight some text with styles and then paste it into Microsoft Word, you see that the styles are preserved. That is because this is written to the Clipboard and read from it as RTF. Not only that, but if you paste it into Microsoft OneNote, you’ll find that you can see the original source of the clip. That is because as rich client apps, IE (or FF), Word and OneNote are not sandboxed in the same way that flash and javascript are. To see this more clearly try downloading Clipboard Inspector and looking at the format of items that you copy to the clipboard.

When clipping from a web browser you get the HTML option and there is header information showing source, etc:

When you write via javascript or flash you are forced to write out text, so even if you have valid HTML on the clipboard, applications that read it will HTML encode the characters, thinking that they are text, and thus not render them correctly. There are numerous reasons why this makes sense from a security perspective, but it is a pain and something that web developers need to be aware of nonetheless.


Speech Plugins: Weather

June 15th, 2008 Josh Posted in programming No Comments »

In this entry we’ll look at building a basic plugin for a speech engine (Microsoft Speech III: Setting up a pluggable infrastructure) that will respond to simple queries about the weather. For this tutorial we will be using the Yahoo Weather API. Note that if you are just interested in how to use their API, it should be easy enough to extract what you’re interested in without doing the previous tutorials.

We begin by creating a new project in our Speech Engine solution, and setting it up as described in the previous tutorial. Once this is set up, we can begin by creating a WeatherPlugin class that implements the ISpeechPlugin interface:

using Speech_Project.Core;
using System.Speech.Recognition;
using System.Xml.XPath;
using System.Xml;
 
namespace Speech_Project.Plugins
{
    [SpeechPluginAttribute("Weather",
        "This retrieves weather data from the Yahoo weather services API",
        "1.0")]
    public class WeatherPlugin : ISpeechPlugin
    {
        private string _grammarName = "Weather Plugin";
 
        #region ISpeechPlugin Members
        public Grammar getGrammar()
        {
           ...
        }
 
        public void handleSpeechInput(string input)
        {
            ...
        }
 
        public string getGrammarName()
        {
            return _grammarName;
        }
 
        #endregion
    }
}

Now we just need to build our grammar and implement the handlers. In order to make it easier to standardize the language that we’ll be using, let’s create a helper class called WeatherSpeechChoices that will contain the strings that we recognize in our grammar. We will give it two commands to start: one will get today’s forecast and the other, tomorrow’s.

public static class WeatherSpeechChoices
{
    //Inputs
    public const string forecastTomorrow = "What is tomorrow's forecast";
    public const string forecastToday = "What is today's forecast";
}

Now that we have this defined we can implement the grammar building method. As before, we just need to add these choices to the grammar and the engine will compile all of the various plugin grammars.

        public Grammar getGrammar()
        {
            //Create a set of choices
            Choices thisChoices = new Choices(
                WeatherSpeechChoices.forecastTomorrow,
                WeatherSpeechChoices.forecastToday
                );
 
            //Create a grammar based on these choices
            Grammar thisGrammar = new Grammar(thisChoices.ToGrammarBuilder());
            // Set the Grammar name
            thisGrammar.Name = _grammarName;
 
            return thisGrammar;
        }

We’re halfway there; now we just need to implement the event handler. This is the fun part, where we call the weather service. Let’s implement this by having the event handler call a getForecastString method that takes the number of days out as an integer parameter. Knowing that we will build that method, we can write our event handler:

        public void handleSpeechInput(string input)
        {
            switch (input)
            {
                case WeatherSpeechChoices.forecastToday:
                    string forecast = this.getForecastString(0);
                    Talker.Say(forecast);
                    break;
 
                case WeatherSpeechChoices.forecastTomorrow:
                    string forecast1 = this.getForecastString(1);
                    Talker.Say(forecast1);
                    break;
            }
        }

Now to implement the getForecastString method. Take a look at the API and you will see that we need to be issuing requests to http://weather.yahooapis.com/forecastrss. This URL takes two types of parameters, US zipcodes OR Yahoo location IDs. In order to get the Yahoo location ID, their page states that:

The location parameter can be a US Zip code or a location ID. To find your location ID, browse or search for your city from the Yahoo! Weather home page. The weather ID is in the URL for the forecast page for that city. You can also get the location ID by entering your zip code on the home page. For example, if you search for Los Angeles on the Yahoo! Weather home page, the forecast page for that city is http://weather.yahoo.com/forecast/USCA0638.html. The location ID is USCA0638.

However, in this tutorial we will be using a zipcode (I live in the US). You may additionally change the units (Celsius or Fahrenheit) by appending “u=c” or “u=f” to the URL. For this example, let’s add two private members to our WeatherPlugin class that will help us create the URL:

private string _zipCode = "55407";
private const string _serviceURL = "http://xml.weather.yahoo.com/forecastrss?p=";

Now that we know what the URL is, we need to know how to parse the response. This service will respond with an XML document that is RSS 2.0 compliant. We will make use of the .NET XPathNavigator class to parse the response. So let’s see what our method looks like:

        public string getForecastString(int daysFromNow)
        {
            // Load the feed into an XPathDocument
            XPathDocument doc = new XPathDocument(_serviceURL + _zipCode);
 
            // Create navigator  
            XPathNavigator navigator = doc.CreateNavigator();
 
            // Set up namespace manager for XPath  
            XmlNamespaceManager ns = new XmlNamespaceManager(navigator.NameTable);
            ns.AddNamespace("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");
 
            //Get the forecast with XPath
            XPathNodeIterator nodes = navigator.Select("//rss/channel/item/yweather:forecast", ns);
 
            string day = "";
            string text = "";
            string low = "";
            string high = "";
 
            nodes.MoveNext();
            XPathNavigator node = nodes.Current;
 
            if (daysFromNow == 1)
            {
                nodes.MoveNext();
                node = nodes.Current;
            }
 
            day = node.GetAttribute("day", ns.DefaultNamespace);
            text = node.GetAttribute("text", ns.DefaultNamespace);
            low = node.GetAttribute("low", ns.DefaultNamespace);
            high = node.GetAttribute("high", ns.DefaultNamespace);
 
            string forecast = "It will be " + text + " with a high of " + high + ", and a low of " + low;
            return forecast;
        }

Now let’s break this method down a little further. In order to parse the document correctly we need to load the yweather namespace (http://xml.weather.yahoo.com/ns/rss/1.0). We select our first node for parsing by jumping to the first yweather:forecast item. This particular service only returns a forecast one day out, so the daysFromNow variable must be either 0 or 1 (you should implement something to enforce this). Once we have retrieved the correct node we get the attributes as described by the API, construct an English sounding sentence out of them, and return that to our Engine class. Note that if we are worried about lag or connection overhead we could cache the navigator object and use that for subsequent calls. In this example, though, we hit the web service every time you ask for the weather.
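As a sketch of what “something to enforce this” might look like, here is a variant that validates daysFromNow and also checks that the feed really contains the forecast node before reading it. To keep it runnable without the network it parses an XML string. The SampleFeed constant is a made-up fragment shaped like the feed described above, not real data, and the ForecastParser class name is my own.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class ForecastParser
{
    // Hypothetical sample shaped like the feed described above; not real data
    public const string SampleFeed =
        "<rss version=\"2.0\" xmlns:yweather=\"http://xml.weather.yahoo.com/ns/rss/1.0\">" +
        "<channel><item>" +
        "<yweather:forecast day=\"Sun\" low=\"55\" high=\"72\" text=\"Sunny\" />" +
        "<yweather:forecast day=\"Mon\" low=\"58\" high=\"75\" text=\"Cloudy\" />" +
        "</item></channel></rss>";

    public static string GetForecastString(string rssXml, int daysFromNow)
    {
        // Enforce the 0-or-1 rule up front
        if (daysFromNow < 0 || daysFromNow > 1)
        {
            throw new ArgumentOutOfRangeException("daysFromNow",
                "Only 0 (today) or 1 (tomorrow) is supported.");
        }

        XPathDocument doc = new XPathDocument(new StringReader(rssXml));
        XPathNavigator navigator = doc.CreateNavigator();

        XmlNamespaceManager ns = new XmlNamespaceManager(navigator.NameTable);
        ns.AddNamespace("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");

        XPathNodeIterator nodes =
            navigator.Select("/rss/channel/item/yweather:forecast", ns);

        // Advance to the requested forecast, failing loudly if it is missing
        for (int i = 0; i <= daysFromNow; i++)
        {
            if (!nodes.MoveNext())
            {
                throw new InvalidOperationException(
                    "The feed has no forecast for that day.");
            }
        }

        // The attributes are unqualified, so the namespace URI is empty
        XPathNavigator node = nodes.Current;
        string text = node.GetAttribute("text", "");
        string low = node.GetAttribute("low", "");
        string high = node.GetAttribute("high", "");

        return "It will be " + text + " with a high of " + high + ", and a low of " + low;
    }
}
```

Calling ForecastParser.GetForecastString(ForecastParser.SampleFeed, 1) returns “It will be Cloudy with a high of 75, and a low of 58”. In the plugin you would pass the downloaded feed instead of the sample, or keep the URL-based XPathDocument constructor and just add the two checks.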

This service returns much more than the temperature in its response, including windchill, wind speed and direction, atmospheric conditions, sunrise, sunset and others. It also returns the current conditions in the yweather:condition node. For an example RSS feed, here is a link to the output for Minneapolis. As you can see, it’s pretty easy to extend the plugin and get it to tell you this other information if you want. Until next time, have at it and let me know what you come up with.


Microsoft Speech III: Setting up a pluggable infrastructure

June 13th, 2008 Josh Posted in programming No Comments »

This post will build on the work done in Microsoft Speech II: Getting your computer to listen, where we saw how to interact with our computer via voice. In this tutorial we will build a plugin infrastructure which we will use to add functionality to our Speech Project in the future. While we won’t be adding any new functionality in this tutorial, we will be refactoring what we had into a “Personality” plugin. Please note that these ideas aren’t uniquely mine and I borrow heavily from Matthew Cochran’s excellent tutorial on the subject of plugins–I just adapted them for the speech project’s purposes.

Let’s start by opening the project as we left it in Microsoft Speech II: Getting your computer to listen. At this point we should refactor our namespace to be something more useful than “Tutorial1”, so let’s change that to “Speech_Project” for now (make sure to update the XAML as well).

Next we need to add some folders for building out our framework. Add two folders named “Core” and “Plugins” to your project.

Now let’s move the Say method out of the Window1 class and into its own class called Talker. We can use this later when we add more options to the output (SMS, IM, Email, etc.). So create a Talker class in the Speech_Project.Core namespace and move the Say method into it:

using System.Speech.Synthesis;
 
namespace Speech_Project.Core
{
    public class Talker
    {
        public static void Say(string input)
        {
            var synth = new SpeechSynthesizer();
            var sayThis = new Prompt(input);
            synth.Speak(sayThis);
        }
    }
}

Next we need to define the interface to which all of our plugins will conform. For now, plugins will need to tell the engine which grammars they support and then have an event handler for when their grammar is triggered (see Tutorial 2). Finally, they’ll need to tell the engine which grammars are theirs by providing a method for getting the grammar name. Let’s create an ISpeechPlugin interface that looks like the following:

using System.Speech.Recognition;
 
namespace Speech_Project.Core
{
    public interface ISpeechPlugin
    {
        Grammar getGrammar();
        void handleSpeechInput(string input);
        string getGrammarName();
    }
}

In addition to the plugin interface we should add a plugin attribute class. This will allow us to get some metadata about the plugins when we load them, look for version collisions, and do other useful things down the road. So let’s add a SpeechPluginAttribute:

using System;
 
namespace Speech_Project.Core
{
    [AttributeUsage(AttributeTargets.Class)]
    // Public so that plugin assemblies outside this project can apply it
    public class SpeechPluginAttribute : Attribute
    {
        public SpeechPluginAttribute(string name, string description, string version)
        {
            _description = description;
            _name = name;
            _version = version;
        }
 
        private string _description;
        private string _name;
        private string _version;
 
        public string Description
        {
            get { return _description; }
            set { _description = value; }
        }
 
        public string Name
        {
            get { return _name; }
            set { _name = value; }
        }
 
        public string Version
        {
            get { return _version; }
            set { _version = value; }
        }
    }
}
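
Since the point of the attribute is to let the engine learn something about a plugin when it is loaded, here is a minimal sketch of reading that metadata back via reflection. To keep it compilable on its own, it repeats a compact stand-in for the attribute in a separate namespace; `PluginMetadataSketch`, `DemoPlugin`, and `PluginMetadata.Describe` are hypothetical names used only for illustration and are not part of the project.

```csharp
using System;

namespace PluginMetadataSketch
{
    // Compact stand-in for the SpeechPluginAttribute above, repeated so this compiles alone.
    [AttributeUsage(AttributeTargets.Class)]
    public class SpeechPluginAttribute : Attribute
    {
        public SpeechPluginAttribute(string name, string description, string version)
        {
            Name = name;
            Description = description;
            Version = version;
        }

        public string Name { get; private set; }
        public string Description { get; private set; }
        public string Version { get; private set; }
    }

    // Hypothetical plugin type; it exists only so there is something to inspect.
    [SpeechPlugin("Demo Plugin", "Only here for the example", "1.0")]
    public class DemoPlugin { }

    public static class PluginMetadata
    {
        // Returns "Name vVersion" for a decorated type, or null if the attribute is absent.
        public static string Describe(Type pluginType)
        {
            object[] found = pluginType.GetCustomAttributes(typeof(SpeechPluginAttribute), true);
            if (found.Length == 0)
            {
                return null;
            }

            SpeechPluginAttribute meta = (SpeechPluginAttribute)found[0];
            return meta.Name + " v" + meta.Version;
        }
    }
}
```

Calling `PluginMetadata.Describe(typeof(DemoPlugin))` gives a short label that could be logged as plugins load, and the same lookup could feed a version check later on.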

Now that we have our interface defined, let’s create our first plugin. Start by creating a new project of type Class Library in your solution for the Personality plugin. Change its Build properties to output to the Plugins directory of the main project (to get this working in VS debug mode, make sure it is /bin/debug/Plugins/). Next, make sure you reference the main project and have the plugin’s main class implement the ISpeechPlugin interface (and make sure you add our custom attribute).

Now we can take the code that was in the Window1 class and move it into this plugin. So let’s take the grammar construction code and the event handler and put them in the PersonalityPlugin class like so:

using Speech_Project.Core;
using System.Speech.Recognition;
 
namespace Speech_Project.Plugins
{
    [SpeechPluginAttribute("Personality Plugin", 
        "This plugin handles all of the personality like responses for the plugin", 
        "1.0")]
    public class PersonalityPlugin : ISpeechPlugin
    {
        private string _grammarName = "PersonalityPlugin";
        #region ISpeechPlugin Members
 
        public Grammar getGrammar()
        {
            //Create a set of choices
            Choices thisChoices = new Choices("Computer",
                "What is your name?",
                "What is your favorite blog?"
                );
 
            //Create a grammar based on these choices
            Grammar thisGrammar = new Grammar(thisChoices.ToGrammarBuilder());
            // Set the Grammar name
            thisGrammar.Name = _grammarName;
 
            return thisGrammar;
        }
 
        public void handleSpeechInput(string input)
        {
            switch (input)
            {
                case "Computer":
                    Talker.Say("Yes master?");
                    break;
                case "What is your name?":
                    Talker.Say("My name is Anna");
                    break;
                case "What is your favorite blog?":
                    Talker.Say("Qurbit of course");
                    break;
            }
        }
 
        public string getGrammarName()
        {
            return _grammarName;
        }
 
        #endregion
    }
}

Now that we’ve done a little refactoring (yes, you should do more: abstract out the strings, etc.), let’s get to the fun part: dynamic binding. Let’s start by creating an Engine class that will do the heavy lifting. The Engine class will look in the Plugins folder for any .dll files and, if they contain types that implement ISpeechPlugin and carry the SpeechPluginAttribute, will instantiate these into a List via reflection. After loading all of the plugins, we can get our engine to load all of their grammars. Then, when the engine recognizes any of the Choices in the loaded grammars, we can trigger the event handler for the correct ISpeechPlugin.

using System.Collections;
using System.IO;
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Speech.Recognition;
 
namespace Speech_Project.Core
{
    class Engine
    {
        public static List<ISpeechPlugin> _plugins;
 
        public static void LoadPlugins()
        {
            //Get all the assemblies in the Plugins folder
            List<Assembly> assemblies = LoadAssemblies();
 
            //Load the plugins 
            _plugins = GetPlugIns(assemblies);
 
        }
 
        private static List<Assembly> LoadAssemblies()
        {
            //Look in the plugin dir for .dlls
            DirectoryInfo dInfo = new DirectoryInfo(Path.Combine(Environment.CurrentDirectory, "Plugins"));
            FileInfo[] files = dInfo.GetFiles("*.dll");
            List<Assembly> plugInAssemblyList = new List<Assembly>();
 
            if (null != files)
            {
                foreach (FileInfo file in files)
                {
                    plugInAssemblyList.Add(Assembly.LoadFile(file.FullName));
                }
            }
 
            return plugInAssemblyList;
        }
 
        private static List<ISpeechPlugin> GetPlugIns(List<Assembly> assemblies)
        {
            List<Type> availableTypes = new List<Type>();
 
            foreach (Assembly currentAssembly in assemblies)
                availableTypes.AddRange(currentAssembly.GetTypes());
 
            // get a list of types that implement the ISpeechPlugin interface AND 
            // have the SpeechPluginAttribute
            List<Type> pluginList = availableTypes.FindAll(delegate(Type t)
            {
                List<Type> interfaceTypes = new List<Type>(t.GetInterfaces());
                object[] arr = t.GetCustomAttributes(typeof(SpeechPluginAttribute), true);
                return !(arr == null || arr.Length == 0) && interfaceTypes.Contains(typeof(ISpeechPlugin));
            });
 
            // convert the list of types to a list of instantiated ISpeechPlugins
            return pluginList.ConvertAll<ISpeechPlugin>(delegate(Type t) { return Activator.CreateInstance(t) as ISpeechPlugin; });
        }
    }
}
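
As a variation on the filter in GetPlugIns, the interface check can also be written with `Type.IsAssignableFrom`, and the scan can skip types that cannot be instantiated. The following is only a sketch under a few assumptions: `IDemoPlugin` is a hypothetical stand-in for ISpeechPlugin so the snippet compiles without System.Speech, the attribute check is left out for brevity, and `HelloPlugin`, `PluginBase`, and `PluginFinder` are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

namespace PluginDiscoverySketch
{
    // Hypothetical stand-in for ISpeechPlugin so this compiles without System.Speech.
    public interface IDemoPlugin
    {
        string getGrammarName();
    }

    public class HelloPlugin : IDemoPlugin
    {
        public string getGrammarName() { return "Hello"; }
    }

    // Abstract types implement the interface but cannot be instantiated.
    public abstract class PluginBase : IDemoPlugin
    {
        public abstract string getGrammarName();
    }

    public static class PluginFinder
    {
        public static List<IDemoPlugin> FindIn(Assembly assembly)
        {
            List<IDemoPlugin> plugins = new List<IDemoPlugin>();

            Type[] types;
            try
            {
                types = assembly.GetTypes();
            }
            catch (ReflectionTypeLoadException ex)
            {
                // Some types failed to load; keep the ones that did (failed entries are null).
                types = ex.Types;
            }

            foreach (Type t in types)
            {
                // Skip anything we could not load or could not construct.
                if (t == null || t.IsAbstract || t.IsInterface) continue;
                if (!typeof(IDemoPlugin).IsAssignableFrom(t)) continue;
                if (t.GetConstructor(Type.EmptyTypes) == null) continue;

                plugins.Add((IDemoPlugin)Activator.CreateInstance(t));
            }

            return plugins;
        }
    }
}
```

The extra checks matter mostly once third-party plugins show up: one abstract base class or one type without a public parameterless constructor would otherwise make Activator.CreateInstance throw and stop the whole load.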

At this point, we have an infrastructure that allows us to abstract out the functionality in a given plugin, and so long as there is a .dll in the Plugins folder that implements the ISpeechPlugin interface we can get an instance of it. Now all we need to do is get the engine to route the events to the correct plugins when it recognizes input. So let’s add a SpeechRecognized event handler to the engine that will look through the plugins to see if any have a handler for the recognized grammar name and, if so, call the handler in that plugin:

        public static void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            foreach (ISpeechPlugin plugin in _plugins)
            {
                //This is the correct handler
                if (e.Result.Grammar.Name == plugin.getGrammarName())
                {
                    plugin.handleSpeechInput(e.Result.Text);
                }
            }
        }
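
Looping over every plugin is perfectly fine for a handful of them. If the list grows, one alternative is to index the plugins by grammar name once and look the owner up directly. This is a sketch of that idea, not what the engine above does: `IDemoPlugin`, `RecordingPlugin`, and `Router` are hypothetical names, the interface is trimmed to the two members routing needs, and unlike the loop above it assumes each grammar name belongs to exactly one plugin.

```csharp
using System;
using System.Collections.Generic;

namespace PluginRoutingSketch
{
    // Hypothetical stand-in for ISpeechPlugin, trimmed to what routing needs.
    public interface IDemoPlugin
    {
        string getGrammarName();
        void handleSpeechInput(string input);
    }

    // Test-style plugin that just remembers what it was asked to handle.
    public class RecordingPlugin : IDemoPlugin
    {
        private readonly string _name;
        public readonly List<string> Heard = new List<string>();

        public RecordingPlugin(string name) { _name = name; }
        public string getGrammarName() { return _name; }
        public void handleSpeechInput(string input) { Heard.Add(input); }
    }

    public class Router
    {
        private readonly Dictionary<string, IDemoPlugin> _byGrammar =
            new Dictionary<string, IDemoPlugin>();

        public void Register(IDemoPlugin plugin)
        {
            // Indexer assignment: a later plugin with the same grammar name replaces the earlier one.
            _byGrammar[plugin.getGrammarName()] = plugin;
        }

        // Returns true if some plugin owned the grammar and handled the text.
        public bool Route(string grammarName, string text)
        {
            IDemoPlugin plugin;
            if (!_byGrammar.TryGetValue(grammarName, out plugin))
            {
                return false;
            }

            plugin.handleSpeechInput(text);
            return true;
        }
    }
}
```

In the real handler the two arguments to `Route` would come from e.Result.Grammar.Name and e.Result.Text.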

Finally, let’s add an InitializeSRE method to the engine that will create the SpeechRecognitionEngine object, load the grammars from the plugins, and attach the SpeechRecognized handler:

        public static SpeechRecognitionEngine InitializeSRE()
        {
            //Create the speech recognition engine
            SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
 
            //Set the audio device to the OS default
            sre.SetInputToDefaultAudioDevice();
 
            //Reset the Grammars
            sre.UnloadAllGrammars();
 
            //Load the plugins
            LoadPlugins();
 
            //Load all of the grammars
            foreach (ISpeechPlugin plugin in _plugins)
            {
                sre.LoadGrammar(plugin.getGrammar());
            }
 
            //Add an event handler before recognition starts
            sre.SpeechRecognized +=
                new EventHandler<SpeechRecognizedEventArgs>(Engine.SpeechRecognized);
 
            //Start continuous, asynchronous recognition
            sre.RecognizeAsync(RecognizeMode.Multiple);
 
            return sre;
        }

Now we can just call Engine.InitializeSRE in our window constructor and it will load all the plugins and handle all of the event routing. Build your plugin, run the WPF application, and give it a try. Pretty simple, huh? Now we can just keep building plugins to extend the functionality of our Speech Project. Stay tuned for more plugins.

Microsoft Speech II: Getting your computer to listen

June 11th, 2008 Josh Posted in programming No Comments »

In this post I’ll show you how to take our simple project from the first example, Microsoft Speech I: Getting your computer to talk, and get it to understand input spoken into a microphone. Note that everything I’m showing can be found in the MSDN pages for the Speech API. Before we get started, make sure you have a microphone and it’s hooked up to your PC and all that other obvious stuff….

Let’s begin by creating an instance of the SpeechRecognitionEngine class in the WPF window’s constructor. When doing this we have two options: either run the speech recognizer in-process (in your application’s process) or, on Vista, use the OS’s speech recognizer out of process. In this tutorial we will be using it in-process. To get started, make sure that you add the namespace:

using System.Speech.Recognition;

Now let’s add some code to the window’s constructor to create the object and get it ready to start listening:

public Window1()
{
      InitializeComponent();
 
      //Create the speech recognition engine
      SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
 
      //Set the audio device to the OS default
      sre.SetInputToDefaultAudioDevice();
 
      //Reset the Grammars
      sre.UnloadAllGrammars();
}

At this point, the engine has an audio input, but doesn’t know what it should be listening for, nor what to do if it hears something interesting. The speech recognition engine uses Grammars (composed of Choices) to help parse the audio input. As a developer you can construct these grammars in a variety of ways, including XML that conforms to the W3C grammar spec, but we will go the route of using a GrammarBuilder in C#.

Let’s add some code to the constructor to load a grammar that recognizes some basic sentences in the audio input. Start by creating a Choices object with a few commands in it and then loading it as a grammar in the Speech Recognition Engine:

//Create a set of choices
Choices thisChoices = new Choices("Computer",
    "What is your name?",
    "What is your favorite blog?"
    );
 
//Create a grammar based on these choices
Grammar thisGrammar = new Grammar(thisChoices.ToGrammarBuilder());
// Set the Grammar name
thisGrammar.Name = "Tutorial2";
 
//Load the grammar
sre.LoadGrammar(thisGrammar);

Once recognition is started, the SRE will listen for the choices in the loaded grammar to occur in the audio input stream. When it hears one of these it will raise an event that we can react to. Let’s start recognition and add an event handler for the SpeechRecognized event:

//Start continuous, asynchronous recognition
sre.RecognizeAsync(RecognizeMode.Multiple);
 
//Add an event handler
sre.SpeechRecognized +=
    new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);

We now need to implement the event handler. The SpeechRecognizedEventArgs contains in its Result a variety of interesting properties about the recognized speech, but the ones we’ll be using right now are Result.Grammar.Name and Result.Text. The grammar name is useful for knowing which grammar contained the choices that triggered the event (if you have multiple grammars, as we will later), and the Text contains the text that triggered the event. Before we do this, let’s implement a Say method containing the code from Tutorial 1 to make the computer talk:

public void Say(string input)
{
    var synth = new SpeechSynthesizer();
    var sayThis = new Prompt(input);
    synt