Load and Run an LSTM Model

One of the many good examples in CNTK is the language modeling exercise in Examples/Text/PennTreebank. The documentation for this one is a bit sparse, and the example is really just a demo of how easy it is to use CNTK's “Simple Network Builder” to define an LSTM network and train it with stochastic gradient descent on data from the Penn Treebank Project. One command starts the learning:

cntk configFile=../Config/rnn.cntk

Doing so trains the network, tests it, and saves the model. However, to see the model data in an easily readable form you need a trivial addition to the config file: add the following dumpnode command, which writes a dump file to a directory of your choosing.

dumpnode=[
    action = "dumpnode"
    modelPath = "$ModelDir$/rnn.dnn"
    outputFile = "$OutputDir$/modeltext/dump"
]
However, we have already run it, and the model data is in a zipped file here:
https://1drv.ms/u/s!AkRG9Zk_IOUagsYYVxAGC4HJiL8a3w

This file is about 50 MB; unpack it into a directory called models. This program does not require CNTK to run.
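A few lines of Python are enough to unpack the download. This is a minimal sketch that assumes the archive is named rnn_model_dump.zip (the real filename may differ) and extracts it into a models directory.

In [ ]:
import os
import zipfile

archive = "rnn_model_dump.zip"   # hypothetical name -- use the file you downloaded
if not os.path.isdir("models"):
    os.makedirs("models")
with zipfile.ZipFile(archive) as z:
    z.extractall("models")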

In [1]:
import numpy as np
import numpy.linalg as la
import math

The following is a utility to pull the word out of a line in the vocab file; the word is the third whitespace-delimited field on each line.

In [6]:
modelpath = "path to your models"   # set this to the directory containing the dump files
In [7]:
def pullword(l):
    # Return the third whitespace-delimited field of line l,
    # which is where the vocab file stores the word itself.
    i = 0
    while l[i] == ' ' or l[i] == '\t': i += 1    # skip leading whitespace
    while l[i] != ' ' and l[i] != '\t': i += 1   # skip the first field
    while l[i] == ' ' or l[i] == '\t': i += 1    # skip whitespace
    while l[i] != ' ' and l[i] != '\t': i += 1   # skip the second field
    while l[i] == ' ' or l[i] == '\t': i += 1    # skip whitespace
    strm = ""
    while l[i] != ' ' and l[i] != '\t':          # collect the third field
        strm = strm + l[i]
        i += 1
    return strm
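As a quick check, here is what pullword does on a sample line. The exact layout of vocab.txt may differ; the sample line below is made up for illustration, and the function simply returns the third whitespace-delimited field.

In [ ]:
sample_line = " 42\t1234\tbank\t0"   # hypothetical vocab line
print(pullword(sample_line))         # -> bank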

opentensor is a function that reads one of the trained model dump files generated by CNTK and returns it as a NumPy array.

In [8]:
def opentensor(path):
    # Read a whitespace-separated matrix dump into a NumPy array.
    with open(path) as file:
        El = [[float(digit) for digit in line.split()] for line in file]
    return np.array(El)
In [9]:
E = opentensor(modelpath + '/E0.txt')
bO = opentensor(modelpath + '/bo0.txt')
WHO = opentensor(modelpath + '/WHO.txt')
WCO = opentensor(modelpath + '/WCO0.txt')
WXF = opentensor(modelpath + '/WXF.txt')
bF = opentensor(modelpath + '/bf0.txt')
WHF = opentensor(modelpath + '/WHF0.txt')
WCF = opentensor(modelpath + '/WCF0.txt')
WXI = opentensor(modelpath + '/WXIO.txt')
WHI = opentensor(modelpath + '/WHI.txt')
WCI = opentensor(modelpath + '/WCIO.txt')
WXC = opentensor(modelpath + '/WXC.txt')
WXO = opentensor(modelpath + '/WXO.txt')
WHC = opentensor(modelpath + '/WHC0.txt')
bC = opentensor(modelpath + '/bc0.txt')
bI = opentensor(modelpath + '/bi0.txt')
W2 = opentensor(modelpath + '/W2.txt')
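A quick sanity check on the shapes confirms that the files loaded correctly: E should be 150x10000, the gate matrices map the 150-dimensional embedding and the 200-dimensional hidden state into the 200-dimensional cell, and W2 maps the hidden state back onto the 10000-word vocabulary.

In [ ]:
print(E.shape)     # expect (150, 10000)
print(WXI.shape)   # expect (200, 150)
print(WHI.shape)   # expect (200, 200)
print(W2.shape)    # expect (200, 10000)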

Next, open the vocabulary file and create a list of the words.

In [10]:
wordlines = [line.rstrip('\n') for line in open(modelpath + '/vocab.txt', 'r')]
In [11]:
wordlist = []
for l in wordlines:
    wordlist.append(pullword(l))
In [12]:
worddict = { wordlist[i]: i for i in range(len(wordlist))}

The vocabulary has 10000 words, and E is a 150x10000 matrix whose columns are the learned compact representations (embeddings) of the words. getvec takes an English word and looks it up in the word dictionary. If the word is there, it returns the corresponding column of E as a vector of length 150.

In [13]:
def getvec(word, E):
    # Return the 150-element embedding column for word, or None if the word is unknown.
    try:
        ind = worddict[word]
    except KeyError:
        print("word " + word + " not in dictionary")
        return None
    V = E[:, ind]
    V.shape = (150, 1)
    return V
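Since the columns of E are word embeddings, we can also use them directly. The following sketch (not part of the original model) ranks the vocabulary by cosine similarity to a given word's embedding and returns the closest words.

In [ ]:
def closestwords(word, E, topn=5):
    # Rank every vocabulary word by the cosine similarity of its
    # embedding column to the embedding of the given word.
    v = getvec(word, E)
    if v is None:
        return []
    v = v[:, 0] / la.norm(v)
    sims = []
    for i in range(len(wordlist)):
        u = E[:, i]
        sims.append((np.dot(u, v) / la.norm(u), wordlist[i]))
    sims.sort(key=lambda tup: -tup[0])
    return [w for (_, w) in sims[:topn]]

print(closestwords('bank', E))   # the first entry will be 'bank' itself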
In [14]:
def Sigmoid(x):
  return 1 / (1 + np.exp(-x))

The output of the RNN is a vector of length 10000; output[i] represents the relative likelihood that word i is the best word to follow the string so far. getwordsfromoutput returns the top five candidate words.

In [15]:
def getwordsfromoutput(output):
    # Pair each score with its word index, sort by descending score,
    # and return the words with the five highest scores.
    lst = []
    for i in range(10000):
        lst.extend([(output[0, i], i)])
    dotsl = sorted(lst, key=lambda tup: -tup[0])
    st = []
    for i in range(5):
        st.extend([wordlist[dotsl[i][1]]])
    return st
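An equivalent and more idiomatic way to do this is np.argsort on the output row; this short variant picks the same top words as the loop above.

In [ ]:
def getwordsfromoutput_np(output, topn=5):
    # Indices of the topn largest scores, largest first.
    best = np.argsort(-output[0, :])[:topn]
    return [wordlist[i] for i in best]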

rnn is a direct translation of the LSTM equations (including the peephole terms WCI, WCF, and WCO, which are applied elementwise). The only difference is that we use an English word as input and return a list of five possible next words as output.

In [16]:
def rnn(word, old_h, old_c):
    # One step of the LSTM, using the weight matrices loaded above.
    # In the CNTK model the input feature is a sparse [10000 x *] one-hot vector;
    # here we look up the 150-dimensional embedding of the word directly.
    Xvec = getvec(word, E)

    # Input and forget gates (WCI and WCF are peephole terms, applied elementwise).
    i = Sigmoid(np.matmul(WXI, Xvec) + np.matmul(WHI, old_h) + WCI * old_c + bI)
    f = Sigmoid(np.matmul(WXF, Xvec) + np.matmul(WHF, old_h) + WCF * old_c + bF)

    # New cell state: forget part of the old cell and add the new candidate values.
    c = f * old_c + i * (np.tanh(np.matmul(WXC, Xvec) + np.matmul(WHC, old_h) + bC))

    # Output gate and new hidden state.
    o = Sigmoid(np.matmul(WXO, Xvec) + np.matmul(WHO, old_h) + (WCO * c) + bO)
    h = o * np.tanh(c)

    # Project the 200-dimensional hidden state onto the 10000-word vocabulary
    # and extract an ordered list of the five best possible next words.
    q = h.copy()
    q.shape = (1, 200)
    output = np.matmul(q, W2)
    outlist = getwordsfromoutput(output)
    return h, c, outlist
In [17]:
import random
from random import randint

This takes any word as a starting point and constructs a sentence defined by the sequence generated by the RNN.

For each next word we randomly pick one of the top five suggested by the RNN.

In [21]:
c = np.zeros(shape = (200, 1))
h = np.zeros(shape = (200, 1))
output = np.zeros(shape = (10000, 1))
word = 'big'
sentence = word
for _ in range(100):
    h, c, outlist = rnn(word, h, c)
    word = outlist[randint(0, 4)]
    sentence = sentence + " " + word

print(sentence + ".")
big computer makers ' service businesses ' capacity under a plan covering its core parts since january was down almost half partly during october after hurricane all reported declines across its other areas where the <unk> will help reduce debt or interest rates while it has risen above overnight losses at c$ through an average life above its quarterly earnings rose sharply because of discontinued operations according early in london according with a company official </s> michael e. a. a. d. calif. vice president marketing group inc. santa bank holding co. palo r. john w. a. j. brown & sons ltd.
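Because each next word is drawn at random from the candidate list, every run produces a different sentence; seeding Python's random number generator before the loop makes a run repeatable.

In [ ]:
random.seed(1)   # any fixed seed makes the generated sentence repeatable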