Friday 28 February 2014

Word Error Rate (WER) and Word Recognition Rate (WRR) with Python

"WAcc(WRR) and WER as defined above are, the de facto standard most often used in speech recognition."

WER has been developed and is used to check a speech recognition's engine accuracy. It works by calculating the distance between the engine's reults - called the hypothesis - and the real text - called the reference.

The distance function is based on the Levenshtein Distance (for finding the edit distance between words). The WER, like the Levenshtein distance, defines the distance by the amount of minimum operations that has to been done for getting from the reference to the hypothesis. Unlike the Levenshtein distance, however, the operations are on words and not on individual characters. The possible operations are:

Deletion: A word was deleted. A word was deleted from the reference.

Insertion: A word was added. An aligned word from the hypothesis was added.

Substitution: A word was substituted. A word from the reference was substituted with an aligned word from the hypothesis.

Also unlike the Levenshtein distance, the WER counts the deletions, insertion and substitutions done, instead of just summing up the penalties. To do that, we'll have to first create the table for the Levenshtein distance algorithm, and then backtrace in it through the shortest route to [0,0], counting the operations on the way.

Then, we'll use the formula to calculate the WER:


From this, the code is self explanatory:

def wer(ref, hyp ,debug=False):
    r = ref.split()
    h = hyp.split()
    #costs will holds the costs, like in the Levenshtein distance algorithm
    costs = [[0 for inner in range(len(h)+1)] for outer in range(len(r)+1)]
    # backtrace will hold the operations we've done.
    # so we could later backtrace, like the WER algorithm requires us to.
    backtrace = [[0 for inner in range(len(h)+1)] for outer in range(len(r)+1)]

    OP_OK = 0
    OP_SUB = 1
    OP_INS = 2
    OP_DEL = 3
    
    # First column represents the case where we achieve zero
    # hypothesis words by deleting all reference words.
    for i in range(1, len(r)+1):
        costs[i][0] = DEL_PENALTY*i
        backtrace[i][0] = OP_DEL
        
    # First row represents the case where we achieve the hypothesis
    # by inserting all hypothesis words into a zero-length reference.
    for j in range(1, len(h) + 1):
        costs[0][j] = INS_PENALTY * j
        backtrace[0][j] = OP_INS
    
    # computation
    for i in range(1, len(r)+1):
        for j in range(1, len(h)+1):
            if r[i-1] == h[j-1]:
                costs[i][j] = costs[i-1][j-1]
                backtrace[i][j] = OP_OK
            else:
                substitutionCost = costs[i-1][j-1] + SUB_PENALTY # penalty is always 1
                insertionCost    = costs[i][j-1] + INS_PENALTY   # penalty is always 1
                deletionCost     = costs[i-1][j] + DEL_PENALTY   # penalty is always 1
                
                costs[i][j] = min(substitutionCost, insertionCost, deletionCost)
                if costs[i][j] == substitutionCost:
                    backtrace[i][j] = OP_SUB
                elif costs[i][j] == insertionCost:
                    backtrace[i][j] = OP_INS
                else:
                    backtrace[i][j] = OP_DEL
                
    # back trace though the best route:
    i = len(r)
    j = len(h)
    numSub = 0
    numDel = 0
    numIns = 0
    numCor = 0
    if debug:
        print("OP\tREF\tHYP")
        lines = []
    while i > 0 or j > 0:
        if backtrace[i][j] == OP_OK:
            numCor += 1
            i-=1
            j-=1
            if debug:
                lines.append("OK\t" + r[i]+"\t"+h[j])
        elif backtrace[i][j] == OP_SUB:
            numSub +=1
            i-=1
            j-=1
            if debug:
                lines.append("SUB\t" + r[i]+"\t"+h[j])
        elif backtrace[i][j] == OP_INS:
            numIns += 1
            j-=1
            if debug:
                lines.append("INS\t" + "****" + "\t" + h[j])
        elif backtrace[i][j] == OP_DEL:
            numDel += 1
            i-=1
            if debug:
                lines.append("DEL\t" + r[i]+"\t"+"****")
    if debug:
        lines = reversed(lines)
        for line in lines:
            print(line)
        print("#cor " + str(numCor))
        print("#sub " + str(numSub))
        print("#del " + str(numDel))
        print("#ins " + str(numIns))
    return (numSub + numDel + numIns) / (float) (len(r))
    wer_result = round( (numSub + numDel + numIns) / (float) (len(r)), 3)
    return {'WER':wer_result, 'Cor':numCor, 'Sub':numSub, 'Ins':numIns, 'Del':numDel}
The code is based on the Java implementation of the algorithm by romanows.

Monday 24 February 2014

Brodcasting Shared "Board" with Google App Engine and Channel API

"Google App Engine's Channel API just works."
It's hard not to be constantly amazed by how easy backend web programming becomes when using the Google App Engine. The same is true, and even more so, for the Channel API.

This project is the first project in which I tried to implement the Channel API. At first, the whole "letting go and let Google do everything for you" had me a little bit on edge. Even right when starting, the api requires you to refer to a non-existent location:

   <script type="text/css" type="/_ah/channel/jsapi"></script>

But you should just let GAE worry about that. It links it to a real .js file when running on both localhost (via the dev_appserver.py) and when online.
If I were a more experienced App Engine user, I guess the '_ah' should have calmed me down. It seems that's where all the "magic pages" are hosted.
If, like me, you find it hard to believe something you can't see, you can view the code it's linking to at https://talkgadget.google.com/talkgadget/channel.js

Now let's get started:
1. Main page request
When the page (this example only includes one page) is requested, I want to create a channel using the api. a channel is a persistent connection between a client and the server. The use is practically similar to web sockets although the technology is very different.
For each client I'm creating a random UUID as a client id, and the channel name is based on that uuid.
To create several boards or "rooms" (i.e chat-rooms), the channel name will be comprised of the client id and a room id.


channel_name = boardId + '/' + clientId
token = channel.create_channel(channel_name)

So every client has his own channel. The client side javascript now has to connect to this channel, but even before that - I want to save this channel on the server, so I could send messages through it when I need (as "comets"). To do this, each room gets a memcache entry. The memcache only saves strings, so, like in GAE Blog: Pushing Update with the Channel API I'm going to save the client list as a stringified json

#load saved channels from the memcache
channels = json.loads(memcache.get(boardId) or '{}')
# add this channel, save a timestamp of when it was created
channels[clientId] = str(datetime.datetime.now()) 
# update the memcache - stringify to json again and save.
memcache.set(boardId, json.dumps(channels)) 

I'm going to send the client the HTML page as a response, and I'll be using jinja to pass some arguments on to the page. To connect to the channel, the client will need the token we created, and I'm also going to pass the client id argument.
template_values = {'token': token,
                       'clientId': clientId,
                       }
template = jinja_environment.get_template('main.html')
self.response.out.write(template.render(template_values))
2. Connect through the javascript client. We're going to be good script kiddies and do exactly what Google suggested. This code is copied directly from the api tutorial.

     function openConnection() {
          channel = new goog.appengine.Channel('{{ token }}');
          socket = channel.open();
          socket.onopen = onOpened;
          socket.onmessage = onMessage;
      }
      onMessage = function (m) {
          message = JSON.parse(m.data);
          if (message.type == "NOTE_ADDED") {
             addNote(message.note)
         }
      }
      sendMessage = function (path, data) {
          data['userId'] = "{{ userId }}";
          $.post(path, data);
      };

      onOpened = function () {
          connected = true;
      };
the openConnection function can be called on load. This function uses a Channel API function that opens a Comet connection to the python server.
   <body onload="openConnection();">
and we're connected!

3. Broadcast changes to the board to all clients
we need a new handler to handle post in the python server.

class AddNote(webapp2.RequestHandler):
    def post(self, boardId):
        note = self.request.get('note')
        # message to broadcast to everyone - note added
        message = {'type' : 'NOTE_ADDED', 'note' : note}
        # stringify the message so it can be sent over the channel
        messagejson = json.dumps(message)
        # get connected channels
        channels = json.loads(memcache.get(boardId) or '{}')
        for clientId in channels:
            # for each channel, sent the message
            channel.send_message(boardId+'/'+clientId, messagejson)
#...
application = webapp2.WSGIApplication(
                                     [('/(\w+)/', MainPage),
                                      ('/(\w+)/addnote', AddNote)]
Now, the JavaScript client will handle the message received event:

      onMessage = function (m) {
          message = JSON.parse(m.data);
   if (message.type == "NOTE_ADDED") {
  addNote(message.note)
   }
      }
5. Send
For the javascript client to add a message it just needs to run

      sendMessage("addnote", {
              note: document.getElementById("notecontent").value
          });
And that's it!
When a client sends an addnote message to the server(Step 5), the server broadcasts it as a NOTE_ADDED message to all the clients throught the Channel API (Step 4). Quick, simple, real-time connection!
Thank you Google.