# Speech recognition part 1

Quite a while back, I used Jasper to create a speech recognition periodic table fun fact program.  That was before my website was revamped and I lost the original blog post.  (Well, I do have it in my backups, but it’s not really worth reviving at this point.)  I do have a video showing the results, however:

I’ve started to revisit this project, and here are my first thoughts…

## Introduction

I’m working on voice-activated scientific instrumentation, and I would like to have a speech recognition system that is

• easy to install
• easy to maintain
• easy to configure

Jasper was a lot of fun, but there is a tremendous amount of overhead, a (lack of) tremendous amount of documentation, and a (sadly, not lack of) tremendous number of hiccups along the way.  This last point is especially true for those trying to use a Raspberry Pi 2 with Jessie.  Since my project does not require an exhaustive dictionary, and I am only interested in a few commands (at least, in the beginning), I wanted to take a minimalist approach to speech recognition.

## Setup

I’m using Kamino Base, a fresh Jessie install with just a few additional software packages and minimal configuration such as timezone, keyboard layout and ssh access.  I then installed pocketsphinx with `sudo apt-get install pocketsphinx`, which installs version 0.8-5.  No other software is installed at this point.  I have a USB microphone attached and have made sure I can record sound via `arecord` and play sound via `aplay`.  I have made no attempt to change the default audio device or reorder the sound modules, so for `arecord` I need the `-D plughw:1,0` flag to indicate that my mic is device #1.

### Progress

Simply running `pocketsphinx_continuous -adcdev plughw:1,0` gets me some speech recognition.  Nice and simple.  Sadly, the text I want is buried deep within a whole bunch of information that is more-or-less useless to me.  We can get rid of most of it with the `-logfn /dev/null` flag.  Now, I get output that looks like this:

Now, I find the result rather ironic, since I said, “I understand what you say”, but I’ll worry about accuracy later.  Right now, I want to be able to take the output of this command and use it in another program.  My thought is this: what if I filter out all extraneous text and send it to a named pipe?  That way, another program can be aware of the most recently uttered text and use that information as desired.

Along the way, I have learned a bit about sed and named pipes.  Perhaps I’ll provide some more detail at some point, but here are the main facts:

• sed can be used to remove any lines that don’t contain the useful information.  In this case, the useful information has a nice identifier: 9 digits followed by a colon and a space.
• there’s plenty of information on the web about Linux and named pipes.  This project, however, required what is called a non-blocking pipe, so that the writer does not stall when no reader is attached.  Surprisingly, this can be done with a [relatively straightforward C program](http://stackoverflow.com/q/7360473/2711057).
• In order to redirect sed to the pipe, we need to use the `--unbuffered` flag, so that each line is passed along as soon as it is produced.
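
To make the sed bullet a bit more concrete, here is a rough sketch of the filter run against a few made-up lines.  The sample text and the function name `filter_hypotheses` are my own placeholders; only the "9 digits, colon, space" prefix comes from the output described above, and `--unbuffered` assumes GNU sed.

```shell
# Sketch only: keep lines that start with 11 characters drawn from digits,
# colons and spaces (e.g. "000000001: "), and print whatever follows.
filter_hypotheses() {
    sed --unbuffered -n 's/^[0-9: ]\{11\}\(.*\)/\1/p'
}

# Hypothetical sample input; real pocketsphinx output will differ in detail.
printf '%s\n' 'READY....' 'Listening...' '000000001: hello world' \
    | filter_hypotheses
```

Only the line with the numeric prefix should survive, with the prefix stripped off.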

### The results

So, the first thing we need is to build the ftee command described in the link above.  The code:

/* ftee - clone stdin to stdout and to a named pipe
(c) racic@stackoverflow
WTFPL Licence */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int readfd, writefd;
    struct stat status;
    char *fifonam;
    char buffer[BUFSIZ];
    ssize_t bytes;

    /* A reader closing the pipe must not kill us with SIGPIPE */
    signal(SIGPIPE, SIG_IGN);

    if(2!=argc)
    {
        printf("Usage:\n someprog 2>&1 | %s FIFO\n FIFO - path to a"
            " named pipe, required argument\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    fifonam = argv[1];

    /* Open the read end first so the non-blocking write open below succeeds */
    readfd = open(fifonam, O_RDONLY | O_NONBLOCK);
    if(-1==readfd)
    {
        perror("ftee: readfd: open()");
        exit(EXIT_FAILURE);
    }

    if(-1==fstat(readfd, &status))
    {
        perror("ftee: fstat");
        close(readfd);
        exit(EXIT_FAILURE);
    }

    if(!S_ISFIFO(status.st_mode))
    {
        printf("ftee: %s is not a fifo!\n", fifonam);
        close(readfd);
        exit(EXIT_FAILURE);
    }

    writefd = open(fifonam, O_WRONLY | O_NONBLOCK);
    if(-1==writefd)
    {
        perror("ftee: writefd: open()");
        close(readfd);
        exit(EXIT_FAILURE);
    }

    /* The write end is open now, so our own read end is no longer needed */
    close(readfd);

    while(1)
    {
        bytes = read(STDIN_FILENO, buffer, sizeof(buffer));
        if (bytes < 0 && errno == EINTR)
            continue;
        if (bytes <= 0)
            break;

        /* Leave the byte count in bytes untouched so both writes use it */
        if(-1==write(STDOUT_FILENO, buffer, bytes))
            perror("ftee: writing to stdout");
        if(-1==write(writefd, buffer, bytes))
        {
            /* Ignored on purpose: no reader attached, or the pipe is full */
        }
    }
    close(writefd);
    return(0);
}

which is compiled with `gcc -o ftee ftee.c`.  I put the resulting binary in ~/bin, which I have in my PATH variable.
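
As an aside on why ftee is needed at all: with an ordinary redirect, a process writing to a named pipe simply stalls until something opens the pipe for reading.  Here is a small sketch that shows the stall on a throwaway pipe.  The function name and the scratch directory are my own inventions, and it assumes GNU coreutils `timeout`, which exits with status 124 when it has to kill the command.

```shell
# Succeeds if a plain write to the FIFO at $1 is still stuck after 1 second,
# i.e. nothing has opened the pipe for reading.
writer_blocks_without_reader() {
    status=0
    timeout 1 sh -c 'echo hello > "$1"' sh "$1" || status=$?
    [ "$status" -eq 124 ]
}

demo_dir=$(mktemp -d)        # scratch directory, not the real /tmp/speech
mkfifo "$demo_dir/pipe"
if writer_blocks_without_reader "$demo_dir/pipe"; then
    echo "writer blocked: no reader attached"
fi
rm -rf "$demo_dir"
```

In the speech pipeline, that stall would freeze everything upstream whenever no program is listening, which is the situation ftee is meant to avoid.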

Next, we make a pipe using `mkfifo /tmp/speech` and then run the following command:

pocketsphinx_continuous -adcdev plughw:1,0 -logfn /dev/null | sed --unbuffered -n 's/^[0-9: ]\{11\}\(.*\)/\1/p' | ftee /tmp/speech

In another shell, I can use `cat < /tmp/speech` to view the text recognized by pocketsphinx.  Here’s what’s going on: `-adcdev plughw:1,0` tells pocketsphinx to use my mic, which is device #1; `-logfn /dev/null` hides the INFO lines.  We then pipe the output to sed.  The `--unbuffered` flag makes sed pass each line along as soon as it is ready instead of holding it in a buffer, and `-n` prevents printing of output unless we say otherwise.  The sed regular expression says to search for a line beginning with 9 digits, a colon and a space (`^[0-9: ]\{11\}`, which strictly matches any 11 characters drawn from digits, colons and spaces) and to capture whatever is left (`\(.*\)`), because we want that returned (`\1`) and printed (`p`).  Lastly, we pipe the output to ftee, which copies it to both the terminal and the pipe.
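
Instead of `cat`, the reading side could be a small script that reacts to each line.  What follows is only a sketch of that idea, not the program I end up writing: the function names and the command words "start" and "stop" are placeholders I made up for illustration.

```shell
# Map one recognized utterance ($1) to an action.  The keywords here are
# hypothetical; a real instrument would have its own vocabulary.
handle_utterance() {
    case "$1" in
        *start*) echo "action: start" ;;
        *stop*)  echo "action: stop" ;;
        *)       echo "ignored: $1" ;;
    esac
}

# Read recognized text line by line from the named pipe given as $1.
listen_on() {
    while IFS= read -r line; do
        handle_utterance "$line"
    done < "$1"
}
```

With the pipeline above running, `listen_on /tmp/speech` in a second shell would handle each utterance as it arrives.  Since ftee keeps its end of the pipe open, the loop should not see end-of-file and will keep going until it is interrupted.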

### Conclusion

So, at this point I have a very compact set of commands with minimal overhead that provides access to the most recent text recognized by the speech-to-text engine.  The next step is to create a program that will take advantage of this feature.
