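Development environment: Windows 7
Tools: VS2012
Preface:
I have recently been working on a small Microsoft-related project that needs speech recognition, but Microsoft's speech recognition was not up to the job and I ran into a lot of trouble with it. I happened to hear 大熊 (Da Xiong) mention Google's speech recognition API, so I dug up some material and put together a working example. I am writing it up here in the hope that it helps anyone who needs it. Experts, please go easy on me.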
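I. Using the Google Speech API
Approach:
1. Record audio from the WPF application. Note that the sample rate must be 16000 Hz.
2. Obtain the recording as a WAV file stream.
3. Send that stream to Google's speech recognition endpoint: http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN
4. Parse the text that Google returns.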
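Since step 1 depends on the recording being 16000 Hz, it can help to check the WAV header before uploading. The following is a minimal sketch, not part of the original program: the class name WavHeader and its methods are hypothetical, and it assumes the canonical 44-byte PCM layout in which the "fmt " chunk immediately follows "RIFF....WAVE". Files with extra chunks before "fmt " would need a real chunk walker.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original post): inspects a canonical PCM WAV header.
static class WavHeader
{
    // Reads a little-endian 16-bit value at the given offset.
    static int ReadUInt16(byte[] data, int offset)
    {
        return data[offset] | (data[offset + 1] << 8);
    }

    // Reads a little-endian 32-bit value at the given offset.
    static int ReadInt32(byte[] data, int offset)
    {
        return data[offset] | (data[offset + 1] << 8) |
               (data[offset + 2] << 16) | (data[offset + 3] << 24);
    }

    // Throws unless the buffer starts with "RIFF", "WAVE" and "fmt " at the canonical offsets.
    static void EnsureCanonical(byte[] wav)
    {
        if (wav == null || wav.Length < 44)
            throw new ArgumentException("Too short to be a canonical WAV file.");
        if (Encoding.ASCII.GetString(wav, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(wav, 8, 4) != "WAVE" ||
            Encoding.ASCII.GetString(wav, 12, 4) != "fmt ")
            throw new ArgumentException("Not a canonical RIFF/WAVE header.");
    }

    // Sample rate is stored at byte offset 24 of the canonical header.
    public static int ReadSampleRate(byte[] wav)
    {
        EnsureCanonical(wav);
        return ReadInt32(wav, 24);
    }

    // True when the header describes 16000 Hz, 16-bit, mono audio,
    // which is the format the recording code in this post asks for.
    public static bool IsSpeechApiFormat(byte[] wav)
    {
        EnsureCanonical(wav);
        int channels = ReadUInt16(wav, 22);
        int sampleRate = ReadInt32(wav, 24);
        int bitsPerSample = ReadUInt16(wav, 34);
        return channels == 1 && sampleRate == 16000 && bitsPerSample == 16;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: WavHeader <file.wav>");
            return;
        }
        byte[] wav = File.ReadAllBytes(args[0]);
        Console.WriteLine("Sample rate: " + ReadSampleRate(wav) + " Hz");
        Console.WriteLine("16 kHz / 16-bit / mono: " + IsSpeechApiFormat(wav));
    }
}
```

Running it against a recorded file before uploading makes a wrong capture format easy to spot, which is cheaper than debugging an empty recognition result.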
Start with the following references:
http://www.cnblogs.com/onlytiancai/archive/2008/08/02/p2p_sound_chat.html
http://blog.csdn.net/dlangu0393/article/details/7214728
http://www.cnblogs.com/eboard/archive/2012/02/29/speech-api.html
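My program uses the DLLs from the first attachment in 蛙蛙池塘's post (the first link above). Adding the DLLs from its client folder to the project is enough to get recording working.
Here is my code:
MainWindow.xaml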
<Window x:Class="SoundRecord.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="350" Width="525"
        Closing="Window_Closing_1">
    <StackPanel Orientation="Vertical">
        <TextBlock Name="statusBlock" FontSize="30" Foreground="Red" Text="Not started..."/>
        <StackPanel Orientation="Horizontal" HorizontalAlignment="Center">
            <Button Name="StartBtn" Width="200" Height="200" Background="Red"
                    Content="Start" FontSize="40" Click="StartBtn_Click_1"/>
            <Button Name="StopBtn" Width="200" Height="200" Background="Red"
                    Content="Stop" FontSize="40" Click="StopBtn_Click_1"/>
        </StackPanel>
    </StackPanel>
</Window>
MainWindow.cs
using Microsoft.DirectX.DirectSound;
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading;
using System.Windows;
using WawaSoft.Media;

namespace SoundRecord
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        CaptureSound record = null;
        private int fileIndex = 0;

        public MainWindow()
        {
            InitializeComponent();
            // 16000 Hz, 16-bit, mono: must match the rate=16000 sent to Google below.
            WaveFormat format = DirectSoundManager.CreateWaveFormat(16000, 16, 1);
            record = new CaptureSound(format);
        }

        private void StartBtn_Click_1(object sender, RoutedEventArgs e)
        {
            statusBlock.Text = "Recording...";
            record.FileName = "F:\\hello-" + fileIndex + ".wav";
            record.Start();
        }

        private void StopBtn_Click_1(object sender, RoutedEventArgs e)
        {
            statusBlock.Text = "Recording stopped...";
            record.Stop();
            // Upload on a background thread so the UI stays responsive.
            Thread thread = new Thread(new ThreadStart(GoogleSTT));
            thread.Start();
        }

        /// <summary>
        /// Releases the recorder when the window closes.
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void Window_Closing_1(object sender, System.ComponentModel.CancelEventArgs e)
        {
            MessageBox.Show("Closing App");
            record = null;
        }

        /// <summary>
        /// Calls the Google speech recognition engine.
        /// </summary>
        private void GoogleSTT()
        {
            // UI elements may only be touched from the UI thread, so go through the Dispatcher.
            Dispatcher.BeginInvoke(new Action(delegate() { statusBlock.Text = "Sending speech request"; }));
            Console.WriteLine("Sending speech request...");

            string result = string.Empty;
            try
            {
                // ReadAllBytes reads the whole file; a single Stream.Read call is not guaranteed to.
                string inFile = record.FileName;
                byte[] voice = File.ReadAllBytes(inFile);

                string url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN";
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(url));
                request.Method = "POST";
                request.ContentType = "audio/L16; rate=16000";
                request.ContentLength = voice.Length;

                using (Stream writeStream = request.GetRequestStream())
                {
                    writeStream.Write(voice, 0, voice.Length);
                }

                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                using (Stream responseStream = response.GetResponseStream())
                using (StreamReader readStream = new StreamReader(responseStream, Encoding.UTF8))
                {
                    result = readStream.ReadToEnd();
                    Console.WriteLine("Recognition result: " + result);
                }
            }
            catch (Exception ex)
            {
                // Print the message as well as the stack trace.
                Console.WriteLine(ex.ToString());
            }
        }
    }
}
Before running, apply the following setting (this step is an easy place to go wrong):
Modify the configuration file App.config
<configuration>
    <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0"/>
    </startup>
</configuration>
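After this change, the debugger may stop with a LoaderLock error.
The message reads: Attempting managed execution inside OS Loader lock. Do not attempt to run managed code inside a DllMain or image initialization function since doing so can cause the application to hang.
Solution:
In the Visual Studio menu (these steps were written for VS2005), go to Debug -> Exceptions -> Managed Debugging Assistants -> LoaderLock and uncheck it. If the Exceptions item is missing, open Tools -> Customize -> Commands tab, select Debug on the left, and drag Exceptions from the right into the menu. The Exceptions dialog also has a shortcut: Ctrl+Alt+E.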
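The test UI is ugly but functional. Click Start to begin recording and click Stop when you are done; the audio is then uploaded to Google and the result is printed to the console. The response is JSON and this example does not parse it, so parse it yourself if you need to.
Result:
The code for this example is available here: http://download.csdn.net/detail/bboyfeiyu/5165924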
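The example above prints the raw response without parsing it. Below is a rough sketch of one way to pull out the recognized text using DataContractJsonSerializer (on .NET Framework 4 this needs a reference to System.Runtime.Serialization). The response shape used here, a "status" field plus a "hypotheses" array of "utterance"/"confidence" pairs, is an assumption based on how this v1 endpoint was commonly reported to respond; check it against the text your own console prints. The type names SpeechResponse, Hypothesis and SpeechResponseParser are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Assumed response shape (verify against the real output):
// {"status":0,"id":"...","hypotheses":[{"utterance":"...","confidence":0.9}]}
[DataContract]
public class Hypothesis
{
    [DataMember(Name = "utterance")]
    public string Utterance { get; set; }

    [DataMember(Name = "confidence")]
    public double Confidence { get; set; }
}

[DataContract]
public class SpeechResponse
{
    [DataMember(Name = "status")]
    public int Status { get; set; }

    [DataMember(Name = "hypotheses")]
    public Hypothesis[] Hypotheses { get; set; }
}

public static class SpeechResponseParser
{
    // Returns the first hypothesis text, or null when the response has none.
    public static string BestUtterance(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(SpeechResponse));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            var response = (SpeechResponse)serializer.ReadObject(stream);
            if (response == null || response.Hypotheses == null || response.Hypotheses.Length == 0)
                return null;
            return response.Hypotheses[0].Utterance;
        }
    }

    static void Main()
    {
        // Sample input written by hand for illustration, not captured from the service.
        string sample = "{\"status\":0,\"id\":\"abc\",\"hypotheses\":[{\"utterance\":\"hello world\",\"confidence\":0.9}]}";
        Console.WriteLine(BestUtterance(sample) ?? "(no hypothesis)");
    }
}
```

In the WPF program, the string held in result could be passed to BestUtterance right after ReadToEnd, and the returned text shown in statusBlock through the Dispatcher.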
II. Using Microsoft's speech recognition API
Straight to the code.
using System;
using System.Speech.Recognition;

namespace SpeechRecognitionApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // For Chinese recognition, change en-US to zh-CN.
            // Be aware that Microsoft's recognizer tends to be inaccurate.
            using (SpeechRecognitionEngine recognizer =
                new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US")))
            {
                // Load a grammar. No custom grammar is loaded here, only free dictation.
                // You can also define your own grammar, for example a fixed set of commands.
                recognizer.LoadGrammar(new DictationGrammar());

                /* To add your own grammar, put the phrases you need into colorChoice.
                Choices colorChoice = new Choices(new string[] { "red", "green", "blue" });
                GrammarBuilder colorElement = new GrammarBuilder(colorChoice);
                Grammar grammar = new Grammar(colorElement);
                grammar.Enabled = true;
                // Load the custom grammar.
                recognizer.LoadGrammar(grammar);
                */

                // Attach the event handler.
                recognizer.SpeechRecognized +=
                    new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

                // Configure input to the speech recognizer.
                recognizer.SetInputToDefaultAudioDevice();

                // Start asynchronous, continuous speech recognition.
                recognizer.RecognizeAsync(RecognizeMode.Multiple);

                // Keep the console window open.
                while (true)
                {
                    Console.ReadLine();
                }
            }
        }

        // Handle the SpeechRecognized event and print the recognized text.
        static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Recognized text: " + e.Result.Text);
        }
    }
}
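Because the recognizer is often inaccurate, one option is to drop results whose confidence score is low. The sketch below is only an illustration of that idea: the ConfidenceFilter class and the 0.6 threshold are hypothetical choices, not values from the original program, and the right cut-off depends on your microphone and vocabulary. It relies on e.Result.Confidence, which System.Speech reports as a float between 0 and 1.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical variant of the program above that ignores low-confidence results.
static class ConfidenceFilter
{
    // Assumed cut-off for illustration only; tune it for your setup.
    public const float MinConfidence = 0.6f;

    // Pure check, kept separate so it can be tested without audio hardware.
    public static bool IsAcceptable(float confidence, float threshold)
    {
        return confidence >= threshold;
    }

    // Prints the text only when the recognizer is reasonably sure about it.
    static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (IsAcceptable(e.Result.Confidence, MinConfidence))
            Console.WriteLine("Recognized text: " + e.Result.Text);
        else
            Console.WriteLine("Ignored (confidence " + e.Result.Confidence + "): " + e.Result.Text);
    }

    static void Main()
    {
        using (SpeechRecognitionEngine recognizer =
            new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US")))
        {
            recognizer.LoadGrammar(new DictationGrammar());
            recognizer.SpeechRecognized += OnSpeechRecognized;
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            // Press Enter to exit.
            Console.ReadLine();
        }
    }
}
```

A restricted grammar such as the commented-out colorChoice example usually yields higher confidence values than free dictation, so the two approaches work well together.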
References (MSDN):
http://msdn.microsoft.com/zh-cn/library/system.speech.recognition.speechrecognitionengine.aspx
Closing words: I hope this helps anyone who needs it. If you spot any mistakes, please point them out, and go easy on me.