
CMUSphinx Learn – Building application with pocketsphinx

February 15, 2014

Building an application with pocketsphinx

Installation

Pocketsphinx is a library that depends on another library, SphinxBase, which provides functionality common to all CMUSphinx projects. To install Pocketsphinx, you need to install both Pocketsphinx and SphinxBase. Pocketsphinx can be used on both Linux and Windows.

 

First, download the released pocketsphinx and sphinxbase packages, check them out from Subversion, or download a snapshot. For more details, see the download page. Unpack both into the same directory. On Windows, you will need to rename 'sphinxbase-X.Y' (where X.Y is the SphinxBase version number) to simply 'sphinxbase' for this to work.

 

Unix-like Installation

In a Unix-like environment (such as Linux, Solaris, FreeBSD, etc.):

  • First, build and install SphinxBase. If you downloaded it directly from the repository, you need to run this at least once to generate the configure file:
% ./autogen.sh
  • If you downloaded the release version, or have run autogen.sh at least once, compile and install:
% ./configure
% make
% make install
  • If you want to use fixed-point arithmetic, you must configure SphinxBase with the --enable-fixed option. You can also set the installation prefix with --prefix, and configure with or without Python.
  • By default, SphinxBase will be installed under /usr/local/. Not every system loads libraries from this folder automatically. To load them, you need to configure the path searched for shared libraries, either in the file /etc/ld.so.conf or by exporting environment variables:
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
  • Then change to the pocketsphinx folder and perform the same steps:
% ./configure
% make
% make install
  • To test the installation, run pocketsphinx_continuous and check that it recognizes the words you say into the microphone.

Windows

In MS Windows (TM), under MS Visual Studio 2008 (or newer; we test with Visual C++ 2008 Express):

  • load sphinxbase.sln, located in the sphinxbase directory
  • compile all the projects in SphinxBase (from sphinxbase.sln)
  • load pocketsphinx.sln, located in the pocketsphinx directory
  • compile all the projects in PocketSphinx

MS Visual Studio will build the executables under .\bin\Release or .\bin\Debug (depending on the configuration you choose in MS Visual Studio), and the libraries under .\lib\Release or .\lib\Debug. To run pocketsphinx_continuous, don't forget to copy sphinxbase.dll to the bin folder; otherwise the executable will fail to find this library.

XCode Installation (for iPhone)

Sphinxbase uses the standard Unix autogen system, and includes a script, build_for_iphone.sh, that sets up configure to create binaries that are XCode friendly.

./autogen.sh
./build_for_iphone.sh simulator
./build_for_iphone.sh device

Then in XCode, open your project info and, for 'All Configurations', set:

'Header Search Paths' = "$(HOME)$(SDK_DIR)/include/pocketsphinx"
'Library Search Paths' = "$(HOME)$(SDK_DIR)/lib"
'Other Linker Flags' = "-lpocketsphinx"

 

Pocketsphinx API Core Ideas

The Pocketsphinx API is designed to ease the use of speech recognizer functionality in your applications:

  • It is much more likely to remain stable, in terms of both source and binary compatibility, due to the use of abstract types.
  • It is fully re-entrant, so there is no problem having multiple decoders in the same process.
  • The new language model API (in SphinxBase) supports linear interpolation of multiple models at run-time.
  • It has enabled a drastic reduction in code footprint and a modest but significant reduction in memory consumption.

Reference documentation for the new API is available at http://cmusphinx.sourceforge.net/api/pocketsphinx/

 

Basic Usage (hello world)

There are a few key things you need to know about how to use the API:

  • Command-line parsing is done externally (in <cmd_ln.h>).
  • Everything takes a ps_decoder_t * as the first argument.

 

To illustrate the new API, we will step through a simple “hello world” example. This example is somewhat specific to Unix in the locations of files and the compilation process. We will create a C source file called hello_ps.c. To compile it (on Unix), use this command:

gcc -o hello_ps hello_ps.c \
    -DMODELDIR=\"`pkg-config --variable=modeldir pocketsphinx`\" \
    `pkg-config --cflags --libs pocketsphinx sphinxbase`

Compilation errors here usually mean that the installation guide above was not followed completely. For example, pocketsphinx needs to be properly installed to be available through the pkg-config system. To check that pocketsphinx is installed properly, run pkg-config --cflags --libs pocketsphinx sphinxbase from the command line and check that the output looks like this:

-I/usr/local/include -I/usr/local/include/sphinxbase -I/usr/local/include/pocketsphinx  
-L/usr/local/lib -lpocketsphinx -lsphinxbase -lsphinxad

 

Initialization

The first thing we need to do is to create a configuration object, which for historical reasons is called cmd_ln_t. Along with the general boilerplate for our C program, we will do it like this:

#include <pocketsphinx.h>

int
main(int argc, char *argv[])
{
        ps_decoder_t *ps;
        cmd_ln_t *config;

        config = cmd_ln_init(NULL, ps_args(), TRUE,
                             "-hmm", MODELDIR "/hmm/en_US/hub4wsj_sc_8k",
                             "-lm", MODELDIR "/lm/en/turtle.DMP",
                             "-dict", MODELDIR "/lm/en/turtle.dic",
                             NULL);
        if (config == NULL)
                return 1;

        return 0;
}

The cmd_ln_init() function takes a variable number of null-terminated string arguments, followed by NULL. The first argument is any previous cmd_ln_t * which is to be updated (NULL, as here, creates a new one). The second argument is an array of argument definitions; the standard set can be obtained by calling ps_args(). The third argument is a flag telling the argument parser to be “strict”: if this is TRUE, then duplicate arguments or unknown arguments will cause parsing to fail.

The MODELDIR macro is defined on the GCC command line by using pkg-config to obtain the modeldir variable from the PocketSphinx configuration. On Windows, you can simply add a preprocessor definition to the code, such as this:

#define MODELDIR "c:/sphinx/model"

(replace this with wherever your models are installed). Now, to initialize the decoder, use ps_init():

        ps = ps_init(config);
        if (ps == NULL)
                return 1;
 

Decoding a file stream

Because live audio input is somewhat platform-specific, we will confine ourselves to decoding audio files. The “turtle” language model covers a very simple “robot control” language, which includes phrases such as “go forward ten meters”. In fact, there is an audio file helpfully included in the PocketSphinx source code which contains this very sentence. You can find it in test/data/goforward.raw. Copy it to the current directory. If you want to create your own version of it, it needs to be a single-channel (monaural), little-endian, unheadered 16-bit signed PCM audio file sampled at 16000 Hz.
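To make the required file format concrete, here is a minimal sketch that writes samples as unheadered, little-endian, 16-bit signed PCM. The helper name write_pcm16le is made up for this illustration and is not part of Pocketsphinx; it uses only the C standard library and writes each sample byte by byte, so the output is little-endian regardless of the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a Pocketsphinx function): write n mono samples
 * to fh as raw little-endian 16-bit signed PCM, with no header.
 * Returns the number of samples actually written. */
size_t
write_pcm16le(FILE *fh, const int16_t *samples, size_t n)
{
        size_t i;

        assert(fh != NULL && samples != NULL);
        for (i = 0; i < n; i++) {
                uint16_t u = (uint16_t)samples[i];
                unsigned char bytes[2];

                bytes[0] = (unsigned char)(u & 0xff); /* low byte first */
                bytes[1] = (unsigned char)(u >> 8);   /* then high byte */
                if (fwrite(bytes, 1, 2, fh) != 2)
                        break;
        }
        return i;
}
```

At 16000 Hz, one second of audio is 16000 samples, or 32000 bytes in this format. The file should be opened in binary mode ("wb") so that no newline translation corrupts the sample data.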

To decode the file, we will first open it:

        FILE *fh;

        fh = fopen("goforward.raw", "rb");
        if (fh == NULL) {
                perror("Failed to open goforward.raw");
                return 1;
        }

And then decode it, using ps_decode_raw():

        rv = ps_decode_raw(ps, fh, "goforward", -1);
        if (rv < 0)
                return 1;

Now, to get the hypothesis, we can use ps_get_hyp():

        char const *hyp, *uttid;
        int rv;
        int32 score;

        hyp = ps_get_hyp(ps, &score, &uttid);
        if (hyp == NULL)
                return 1;
        printf("Recognized: %s\n", hyp);

 

Decoding audio data from memory

Now, we will decode the same file again, but using the API for decoding audio data from blocks of memory. In this case, we need to first start the utterance using ps_start_utt():

        fseek(fh, 0, SEEK_SET);
        rv = ps_start_utt(ps, "goforward");
        if (rv < 0)
                return 1;

 

We will then read 512 samples at a time from the file, and feed them to the decoder using ps_process_raw():

        int16 buf[512];
        while (!feof(fh)) {
                size_t nsamp;
                nsamp = fread(buf, 2, 512, fh);
                rv = ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
        }
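A loop driven by feof() makes one extra pass in which fread() returns 0 samples. As a hedged alternative, the sketch below drives the loop from fread()'s return value instead. It uses only the C standard library: feed_in_chunks, chunk_fn and count_chunks are names invented for this illustration, and the callback stands in for the call to ps_process_raw(). It also assumes a little-endian host, since the samples are read directly into int16_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK_SAMPLES 512

/* Hypothetical callback type: receives one block of samples.
 * In a real program this is where ps_process_raw() would be called. */
typedef int (*chunk_fn)(const int16_t *buf, size_t nsamp, void *user);

/* Read fh in blocks of up to 512 samples and pass each block to feed.
 * Returns the total number of samples read, or -1 on error. */
long
feed_in_chunks(FILE *fh, chunk_fn feed, void *user)
{
        int16_t buf[CHUNK_SAMPLES];
        size_t nsamp;
        long total = 0;

        assert(fh != NULL && feed != NULL);
        /* fread() returns 0 at end of file, which ends the loop. */
        while ((nsamp = fread(buf, sizeof(buf[0]), CHUNK_SAMPLES, fh)) > 0) {
                if (feed(buf, nsamp, user) < 0)
                        return -1;
                total += (long)nsamp;
        }
        return ferror(fh) ? -1 : total;
}

/* Example callback: counts how many blocks were delivered. */
int
count_chunks(const int16_t *buf, size_t nsamp, void *user)
{
        (void)buf;
        (void)nsamp;
        ++*(int *)user;
        return 0;
}
```

For a file of 1300 samples, the callback is invoked three times, with 512, 512 and 276 samples. The last block is short, which is why the actual count returned by fread() must be passed on rather than a fixed 512.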

 

Then we will need to mark the end of the utterance using ps_end_utt():

        rv = ps_end_utt(ps);
        if (rv < 0)
                return 1;

 

Retrieving the hypothesis string works in exactly the same way:

        hyp = ps_get_hyp(ps, &score, &uttid);
        if (hyp == NULL)
                return 1;
        printf("Recognized: %s\n", hyp);

 

Cleaning up

To clean up, simply call ps_free() on the object that was returned by ps_init(). You should not do anything to free the configuration object.

 

Code listing

#include <pocketsphinx.h>

int
main(int argc, char *argv[])
{
        ps_decoder_t *ps;
        cmd_ln_t *config;
        FILE *fh;
        char const *hyp, *uttid;
        int16 buf[512];
        int rv;
        int32 score;

        config = cmd_ln_init(NULL, ps_args(), TRUE,
                             "-hmm", MODELDIR "/hmm/en_US/hub4wsj_sc_8k",
                             "-lm", MODELDIR "/lm/en/turtle.DMP",
                             "-dict", MODELDIR "/lm/en/turtle.dic",
                             NULL);
        if (config == NULL)
                return 1;
        ps = ps_init(config);
        if (ps == NULL)
                return 1;

        fh = fopen("goforward.raw", "rb");
        if (fh == NULL) {
                perror("Failed to open goforward.raw");
                return 1;
        }

        rv = ps_decode_raw(ps, fh, "goforward", -1);
        if (rv < 0)
                return 1;
        hyp = ps_get_hyp(ps, &score, &uttid);
        if (hyp == NULL)
                return 1;
        printf("Recognized: %s\n", hyp);

        fseek(fh, 0, SEEK_SET);
        rv = ps_start_utt(ps, "goforward");
        if (rv < 0)
                return 1;
        while (!feof(fh)) {
                size_t nsamp;
                nsamp = fread(buf, 2, 512, fh);
                rv = ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
        }
        rv = ps_end_utt(ps);
        if (rv < 0)
                return 1;
        hyp = ps_get_hyp(ps, &score, &uttid);
        if (hyp == NULL)
                return 1;
        printf("Recognized: %s\n", hyp);

        fclose(fh);
        ps_free(ps);
        return 0;
}
 
Advanced Usage

If you are porting more complicated uses of the old API, there are some significant differences:

  • There are no longer separate functions for getting partial and full hypotheses.
  • Word segmentations are accessed via iterators rather than being returned as arrays or lists.
  • Language model switching is done externally (in <ngram_model.h>).

 

The first of these is straightforward. Before, you had to use uttproc_partial_result() to get partial results (i.e. before uttproc_end_utt() was called), and uttproc_result() for full results. Now, ps_get_hyp() works for both.
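As a rough sketch of this point, the same ps_get_hyp() call can be made inside the processing loop to obtain partial results, and again after ps_end_utt() for the full result. The helper name decode_with_partials and the utterance ID "partials" are made up for illustration. The Pocketsphinx calls use the signatures shown earlier in this tutorial, which may differ in other Pocketsphinx releases.

```c
#include <stdio.h>
#include <pocketsphinx.h>

/* Hypothetical helper: decode an already-open raw audio file, printing
 * the partial hypothesis after each block and the full one at the end.
 * Returns 0 on success, -1 on failure. */
int
decode_with_partials(ps_decoder_t *ps, FILE *fh)
{
        int16 buf[512];
        size_t nsamp;
        char const *hyp, *uttid;
        int32 score;

        if (ps_start_utt(ps, "partials") < 0)
                return -1;
        while ((nsamp = fread(buf, sizeof(buf[0]), 512, fh)) > 0) {
                if (ps_process_raw(ps, buf, nsamp, FALSE, FALSE) < 0)
                        return -1;
                /* Before ps_end_utt(): a partial result (may be NULL early on). */
                hyp = ps_get_hyp(ps, &score, &uttid);
                if (hyp != NULL)
                        printf("Partial: %s\n", hyp);
        }
        if (ps_end_utt(ps) < 0)
                return -1;
        /* After ps_end_utt(): the full result, from the same function. */
        hyp = ps_get_hyp(ps, &score, &uttid);
        if (hyp == NULL)
                return -1;
        printf("Final: %s\n", hyp);
        return 0;
}
```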

 

For word segmentations, the API provides an iterator object which is used to iterate over the sequence of words. This iterator object is an abstract type, with accessors provided to obtain timepoints, scores, and (most interestingly) posterior probabilities for each word.
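The iteration might look roughly like the sketch below. It assumes the segmentation functions as declared in the Pocketsphinx release this tutorial targets, in particular a two-argument ps_seg_iter(); later releases may declare it differently, so check your installed pocketsphinx.h. The helper name print_word_segments is made up for illustration.

```c
#include <stdio.h>
#include <pocketsphinx.h>

/* Hypothetical helper: print each word of the last utterance with its
 * start and end frame and its posterior probability.
 * Intended to be called after ps_end_utt(). */
void
print_word_segments(ps_decoder_t *ps)
{
        ps_seg_t *seg;
        int32 best_score;

        /* ps_seg_next() returns NULL once the last word has been visited. */
        for (seg = ps_seg_iter(ps, &best_score); seg != NULL;
             seg = ps_seg_next(seg)) {
                int sf, ef;
                int32 post, ascr, lscr, lback;

                ps_seg_frames(seg, &sf, &ef);
                post = ps_seg_prob(seg, &ascr, &lscr, &lback);
                /* Scores are in the log domain; convert for display. */
                printf("%s frames %d-%d posterior %f\n",
                       ps_seg_word(seg), sf, ef,
                       logmath_exp(ps_get_logmath(ps), post));
        }
}
```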

 

Finally, language model switching is quite different. The decoder is always associated with a language model set object (yes, even if there is only one language model). Switching language models is accomplished by:

  • Getting a handle to the language model set object: ps_get_lmset()
  • Selecting the new language model: ngram_model_set_select()
  • Telling the decoder the language model set has been updated: ps_update_lmset()
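The three steps above can be sketched as a small helper. The function name switch_language_model is made up for illustration, and the sketch assumes that a model has already been registered in the set under the given name. The three library calls are used as declared in the Pocketsphinx and SphinxBase release this tutorial targets; verify them against your installed headers.

```c
#include <pocketsphinx.h>
#include <ngram_model.h>

/* Hypothetical helper: make the model registered under `name` in the
 * decoder's language model set the active one.
 * Returns 0 on success, -1 on failure. */
int
switch_language_model(ps_decoder_t *ps, char const *name)
{
        ngram_model_t *lmset;

        /* 1. Get a handle to the language model set. */
        lmset = ps_get_lmset(ps);
        if (lmset == NULL)
                return -1;
        /* 2. Select the new language model by name. */
        if (ngram_model_set_select(lmset, name) == NULL)
                return -1;
        /* 3. Tell the decoder that the set has been updated. */
        if (ps_update_lmset(ps, lmset) == NULL)
                return -1;
        return 0;
}
```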
