Two Approaches to Recording and Speech Recognition in WPF


October 11, 2013

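Development environment: Windows 7

Tools: VS2012

Foreword:

I recently worked on a small Microsoft-related project that needed speech recognition, but Microsoft's speech recognition performed poorly and I ran into a lot of trouble with it. I happened to hear from Daxiong (大熊) about Google's speech recognition interface, so I searched for material, put something together, and wrote it up here in the hope that it helps anyone who needs it. Experts, please go easy on me.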

I. Using the Google Speech API

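Approach:

1. Record audio from WPF. Note that the sample rate must be 16000 Hz.

2. Obtain the recording as a WAV file stream.

3. Send that stream to Google's speech recognition endpoint: http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN

4. Parse the text that Google returns.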
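Because step 1 depends on the recording being 16000 Hz, it can help to check a WAV file before uploading it. The following is a minimal sketch, not part of the original project: the class and method names (`WavCheck`, `IsSpeechApiFormat`) are hypothetical, and it assumes the canonical 44-byte RIFF/WAVE header with the `fmt ` chunk first, which is what simple recorders usually write.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the original project): checks that a WAV
// buffer is 16 kHz, 16-bit, mono PCM before it is sent for recognition.
public static class WavCheck
{
    // Assumes the canonical 44-byte header layout with "fmt " as the first chunk.
    // WAV fields are little-endian; BitConverter matches that on x86/x64 Windows.
    public static bool IsSpeechApiFormat(byte[] wav)
    {
        if (wav == null || wav.Length < 44) return false;

        // Magic markers: "RIFF" at offset 0 and "WAVE" at offset 8.
        if (Encoding.ASCII.GetString(wav, 0, 4) != "RIFF") return false;
        if (Encoding.ASCII.GetString(wav, 8, 4) != "WAVE") return false;

        short audioFormat = BitConverter.ToInt16(wav, 20);   // 1 means uncompressed PCM
        short channels = BitConverter.ToInt16(wav, 22);
        int sampleRate = BitConverter.ToInt32(wav, 24);
        short bitsPerSample = BitConverter.ToInt16(wav, 34);

        return audioFormat == 1
            && channels == 1
            && sampleRate == 16000
            && bitsPerSample == 16;
    }
}
```

A caller could run `WavCheck.IsSpeechApiFormat(File.ReadAllBytes(path))` on the recorded file and skip the upload when it returns false.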

First, take a look at the following references:

http://www.cnblogs.com/onlytiancai/archive/2008/08/02/p2p_sound_chat.html

http://blog.csdn.net/dlangu0393/article/details/7214728

http://www.cnblogs.com/eboard/archive/2012/02/29/speech-api.html

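My program references the DLLs from the first attachment of the 蛙蛙池塘 post (the first link above). Adding the DLLs from its client folder to the project is enough to enable recording.

My code follows: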

MainWindow.xaml

<Window x:Class="SoundRecord.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="350" Width="525"
        Closing="Window_Closing_1">
    <StackPanel Orientation="Vertical">
        <TextBlock Name="statusBlock" FontSize="30" Foreground="Red" Text="Not started..."/>
        <StackPanel Orientation="Horizontal" HorizontalAlignment="Center">
            <Button Name="StartBtn" Width="200" Height="200" Background="Red" Content="Start" FontSize="40" Click="StartBtn_Click_1"/>
            <Button Name="StopBtn" Width="200" Height="200" Background="Red" Content="Stop" FontSize="40" Click="StopBtn_Click_1"/>
        </StackPanel>
    </StackPanel>

</Window>

MainWindow.xaml.cs

using Microsoft.DirectX.DirectSound;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;
using WawaSoft.Media;

namespace SoundRecord
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        CaptureSound record = null;
        private int fileIndex = 0;

        public MainWindow()
        {
            InitializeComponent();

            WaveFormat format = DirectSoundManager.CreateWaveFormat(16000, 16, 1);
            record = new CaptureSound(format);
        }

        private void StartBtn_Click_1(object sender, RoutedEventArgs e)
        {
            statusBlock.Text = "Recording...";
            // Hard-coded output path; change it to a folder that exists on your machine.
            record.FileName = "F:\\hello-" + fileIndex + ".wav";
            record.Start();
        }

        private void StopBtn_Click_1(object sender, RoutedEventArgs e)
        {
            statusBlock.Text = "Recording stopped...";
            record.Stop();
            Thread thread = new Thread(new ThreadStart(GoogleSTT));
            thread.Start();
        }

        /// <summary>
        /// Releases the recorder reference when the window is closing.
        /// </summary>
        private void Window_Closing_1(object sender, System.ComponentModel.CancelEventArgs e)
        {
            MessageBox.Show("Closing App");
            record = null;
        }


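        delegate void MyDelegate();

        /// <summary>
        /// Calls the Google speech recognition engine and prints the raw response.
        /// </summary>
        private void GoogleSTT()
        {
            // This method runs on a worker thread, so the UI update goes through the Dispatcher.
            MyDelegate dl = new MyDelegate(delegate() { statusBlock.Text = "Sending speech request..."; });
            Dispatcher.BeginInvoke(dl);

            Console.WriteLine("Sending speech request...");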
            string result = string.Empty;
            try
            {
                // Read the whole recording; a single FileStream.Read call is not guaranteed to fill the buffer.
                byte[] voice = File.ReadAllBytes(record.FileName);
                HttpWebRequest request = null;
                string url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN";
                Uri uri = new Uri(url);
                request = (HttpWebRequest)WebRequest.Create(uri);
                request.Method = "POST";
                request.ContentType = "audio/L16; rate=16000";
                request.ContentLength = voice.Length;
                using (Stream writeStream = request.GetRequestStream())
                {
                    writeStream.Write(voice, 0, voice.Length);
                }
 
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    using (Stream responseStream = response.GetResponseStream())
                    {
                        using (StreamReader readStream = new StreamReader(responseStream, Encoding.UTF8))
                        {
                            result = readStream.ReadToEnd();
                            Console.WriteLine("Recognition result: " + result);
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                // Print the exception type and message along with the stack trace.
                Console.WriteLine(ex);
            }
            //return result;
        }
    }
}

Before running, apply the following settings (this step is easy to get wrong):

Edit the configuration file App.config:

<configuration>
<startup useLegacyV2RuntimeActivationPolicy="true">
  <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0"/>
</startup>
</configuration>

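After this change, an error mentioning LoaderLock appears when debugging.

The message reads: Attempting managed execution inside OS Loader lock. Do not attempt to run managed code inside a DllMain or image initialization function since doing so can cause the application to hang.

Solution:

In the Visual Studio menu (the steps were originally described for VS2005), open Debug > Exceptions > Managed Debugging Assistants and uncheck LoaderLock. If the Exceptions item is missing, go to Tools > Customize > Commands tab, select Debug on the left, and drag Exceptions from the right into the menu. Exceptions also has a keyboard shortcut: Ctrl+Alt+E.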

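The test UI is ugly. Click Start to begin recording, and click Stop when you are done; the data is then uploaded to Google, and the result appears on the console. The JSON response is not parsed here, so parse it yourself if you need to.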
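For readers who do want to parse the response, here is a minimal sketch using `DataContractJsonSerializer` from the .NET Framework (it needs a reference to System.Runtime.Serialization). The response shape is an assumption: the original post does not show it, and the field names `status`, `hypotheses`, `utterance` and `confidence` are what was commonly reported for this v1 endpoint, so check them against the actual console output. The class names (`SpeechResponse`, `Hypothesis`, `SpeechJson`) are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Assumed response shape (verify against real output):
// {"status":0,"id":"...","hypotheses":[{"utterance":"...","confidence":0.9}]}
[DataContract]
public class Hypothesis
{
    [DataMember(Name = "utterance")]
    public string Utterance { get; set; }

    [DataMember(Name = "confidence")]
    public double Confidence { get; set; }
}

[DataContract]
public class SpeechResponse
{
    [DataMember(Name = "status")]
    public int Status { get; set; }

    [DataMember(Name = "hypotheses")]
    public Hypothesis[] Hypotheses { get; set; }
}

public static class SpeechJson
{
    // Returns the first recognized utterance, or null when there is none.
    public static string FirstUtterance(string json)
    {
        if (string.IsNullOrEmpty(json)) return null;

        var serializer = new DataContractJsonSerializer(typeof(SpeechResponse));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            var response = (SpeechResponse)serializer.ReadObject(stream);
            if (response == null || response.Hypotheses == null || response.Hypotheses.Length == 0)
            {
                return null;
            }
            return response.Hypotheses[0].Utterance;
        }
    }
}
```

Inside `GoogleSTT`, this could be called as `string text = SpeechJson.FirstUtterance(result);` once the response body has been read into `result`.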

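Result (screenshot not included):

The code for this example is available here: http://download.csdn.net/detail/bboyfeiyu/5165924

II. Using Microsoft's Speech Recognition API

Straight to the code.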

using System;
using System.Speech.Recognition;

namespace SpeechRecognitionApp
{
  class Program
  {
    static void Main(string[] args)
    {

      // For Chinese recognition, change en-US to zh-CN; be aware that Microsoft's recognizer tends to be inaccurate.
      using (
      SpeechRecognitionEngine recognizer =
        new SpeechRecognitionEngine(
          new System.Globalization.CultureInfo("en-US")))
      {

        // Load a grammar. The built-in dictation grammar is used here, with no custom grammar.
        // You can also define your own grammar, for example a fixed set of commands.
        recognizer.LoadGrammar(new DictationGrammar());
        /* To add your own grammar, put the phrases in colorChoice and adjust them to your needs.
        Choices colorChoice = new Choices(new string[] { "red", "green", "blue" });
        GrammarBuilder colorElement = new GrammarBuilder(colorChoice);
        Grammar grammar = new Grammar(colorElement);
        grammar.Enabled = true;
        // Load the custom grammar.
        recognizer.LoadGrammar(grammar);
        */
        // Attach the event handler.
        recognizer.SpeechRecognized += 
          new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

        // Configure input to the speech recognizer.
        recognizer.SetInputToDefaultAudioDevice();

        // Start asynchronous, continuous speech recognition.
        recognizer.RecognizeAsync(RecognizeMode.Multiple);

        // Keep the console window open.
        while (true)
        {
          Console.ReadLine();
        }
      }
    }

    // Handle the SpeechRecognized event and print the recognized text.
    static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
      Console.WriteLine("Recognized text: " + e.Result.Text);
    }
  }
}

Reference (MSDN):

http://msdn.microsoft.com/zh-cn/library/system.speech.recognition.speechrecognitionengine.aspx

Closing note: I hope this helps those who need it. If you spot any mistakes, please point them out, and go easy on me.


Author: infringe

Posted by infringe in the General category on 学步园; last updated October 11, 2013.
