## Load and Run an LSTM Model

One of the many good examples in CNTK is the language modeling exercise in Examples/Text/PennTreebank. The documentation for this one is a bit sparse, and the example is really just a demo of how easy it is to use their "Simple Network Builder" to define an LSTM network and train it with stochastic gradient descent on data from the Penn Treebank Project. One command starts the learning:

cntk configFile=../Config/rnn.cntk

Doing so trains the network, tests it, and saves the model. However, to see the model data in an easily readable form you need a trivial addition to the config file: add the following dumpnode command to write a dump file to a directory of your choosing.

dumpnode = [
    action = "dumpnode"
    modelPath = "$ModelDir$/rnn.dnn"
    outputFile = "$OutputDir$/modeltext/dump"
]

However, we have already run it, and the model data is in a zipped file here:
https://1drv.ms/u/s!AkRG9Zk_IOUagsYYVxAGC4HJiL8a3w

This file is about 50 MB, so place its contents in a directory called models. This program does not require CNTK to run.

In [1]:
import numpy as np
import numpy.linalg as la
import math


The following is a utility to pull the word from a line in the vocab file.

In [6]:
modelpath = "path to your models"

In [7]:
def pullword(l):
    # return the third whitespace-separated field of a vocab line
    i = 0
    # skip leading whitespace, then the first field
    while i < len(l) and (l[i] == ' ' or l[i] == '\t'): i += 1
    while i < len(l) and l[i] != ' ' and l[i] != '\t': i += 1
    # skip whitespace, then the second field
    while i < len(l) and (l[i] == ' ' or l[i] == '\t'): i += 1
    while i < len(l) and l[i] != ' ' and l[i] != '\t': i += 1
    # skip whitespace before the third field
    while i < len(l) and (l[i] == ' ' or l[i] == '\t'): i += 1
    # collect the third field
    strm = ""
    while i < len(l) and l[i] != ' ' and l[i] != '\t':
        strm = strm + l[i]
        i += 1
    return strm
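For well-formed lines, the character-by-character loops above just grab the third whitespace-separated field, so an equivalent sketch using `str.split` looks like this (the sample line below is hypothetical, not taken from the actual vocab file):

```python
def pullword(l):
    # equivalent to the loop-based version for well-formed lines:
    # return the third whitespace-separated field
    return l.split()[2]

print(pullword("0\t1234\thello\t10"))  # hello
```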


OpenTensor is a function that opens the trained model files generated by cntk.

In [8]:
def opentensor(path):
    # read a whitespace-separated matrix of floats, one row per line
    with open(path) as file:
        El = [[float(digit) for digit in line.split()] for line in file]
    return np.array(El)
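As a quick sanity check (not part of the original notebook), we can round-trip a small matrix through a temporary file and confirm that opentensor recovers its shape and values:

```python
import os
import tempfile
import numpy as np

def opentensor(path):
    # read a whitespace-separated matrix of floats, one row per line
    with open(path) as file:
        El = [[float(digit) for digit in line.split()] for line in file]
    return np.array(El)

# write a 2x2 matrix in the same plain-text format the CNTK dump uses
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
    f.write("1.0 2.0\n3.0 4.0\n")
    path = f.name

A = opentensor(path)
os.remove(path)
print(A.shape)  # (2, 2)
```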

In [9]:
E = opentensor(modelpath + '/E0.txt')
bO = opentensor(modelpath + '/bo0.txt')
WHO = opentensor(modelpath + '/WHO.txt')
WCO = opentensor(modelpath + '/WCO0.txt')
WXF = opentensor(modelpath + '/WXF.txt')
bF = opentensor(modelpath + '/bf0.txt')
WHF = opentensor(modelpath + '/WHF0.txt')
WCF = opentensor(modelpath + '/WCF0.txt')
WXI = opentensor(modelpath + '/WXIO.txt')
WHI = opentensor(modelpath + '/WHI.txt')
WCI = opentensor(modelpath + '/WCIO.txt')
WXC = opentensor(modelpath + '/WXC.txt')
WXO = opentensor(modelpath + '/WXO.txt')
WHC = opentensor(modelpath + '/WHC0.txt')
bC = opentensor(modelpath + '/bc0.txt')
bI = opentensor(modelpath + '/bi0.txt')
W2 = opentensor(modelpath + '/W2.txt')


Next, open the vocabulary file and create a list of the words.

In [10]:
wordlines = [line.rstrip('\n') for line in open(modelpath + '/vocab.txt', "rb")]

In [11]:
wordlist = []
for l in wordlines:
    wordlist.append(pullword(l))

In [12]:
worddict = { wordlist[i]: i for i in range(len(wordlist))}


The vocabulary has size 10000, and E is a 150x10000 matrix that has learned the compact representation of each word. getvec takes an English word and looks in the wordlist to see if it is there. If so, it returns the corresponding column vector of length 150.

In [13]:
def getvec(word, E):
    # look up the word's index, then return its embedding column as 150 x 1
    try:
        ind = worddict[word]
    except KeyError:
        print "word " + word + " not in dictionary"
        return
    V = E[:, ind]
    V.shape = (150, 1)
    return V
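The same lookup pattern can be checked on a toy example with made-up dimensions (a 3-word vocabulary and 4-dimensional embeddings, instead of the real 10000 and 150):

```python
import numpy as np

# toy vocabulary and embedding matrix (hypothetical, for illustration only)
wordlist = ['the', 'cat', 'sat']
worddict = {wordlist[i]: i for i in range(len(wordlist))}
E = np.arange(12, dtype=float).reshape(4, 3)  # 4 x 3: one column per word

def getvec(word, E):
    # return the embedding column for the word as a column vector
    ind = worddict[word]
    return E[:, ind].reshape(-1, 1)

print(getvec('cat', E).ravel())  # [ 1.  4.  7. 10.]
```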

In [14]:
def Sigmoid(x):
    return 1 / (1 + np.exp(-x))
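A brief sanity check (not in the original notebook): the sigmoid maps 0 to 0.5 and squashes everything into (0, 1), applied elementwise to arrays:

```python
import numpy as np

def Sigmoid(x):
    # logistic function, applied elementwise
    return 1 / (1 + np.exp(-x))

print(Sigmoid(0.0))                           # 0.5
print(Sigmoid(np.array([-10.0, 0.0, 10.0])))  # close to [0, 0.5, 1]
```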


The output vector of the RNN has length 10000. output[i] represents the relative likelihood that word i is the best one to follow the string so far. getwordsfromoutput returns the top five candidate words.

In [15]:
def getwordsfromoutput(output):
    # pair each score with its word index, sort by score descending,
    # and return the five highest-scoring words
    lst = []
    for i in range(10000):
        lst.append((output[0, i], i))
    dotsl = sorted(lst, key=lambda tup: -tup[0])
    st = []
    for i in range(5):
        st.append(wordlist[dotsl[i][1]])
    return st
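The sort-based selection above can also be written with np.argsort; this is an equivalent sketch on a toy 5-word vocabulary (the names here are hypothetical, not from the model):

```python
import numpy as np

def topwords(output, wordlist, k=5):
    # indices of the k largest scores, in descending order
    idx = np.argsort(-output[0])[:k]
    return [wordlist[i] for i in idx]

scores = np.array([[0.1, 0.9, 0.3, 0.7, 0.5]])
words = ['a', 'b', 'c', 'd', 'e']
print(topwords(scores, words))  # ['b', 'd', 'e', 'c', 'a']
```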


rnn is a direct translation of the LSTM equations. The only difference is that we use an English word as input and return a list of five possible next words as output.

In [16]:
def rnn(word, old_h, old_c):
    # features = SparseInputValue -> [10000 x *]
    Xvec = getvec(word, E)

    # input and forget gates
    i = Sigmoid(np.matmul(WXI, Xvec) + np.matmul(WHI, old_h) + WCI * old_c + bI)
    f = Sigmoid(np.matmul(WXF, Xvec) + np.matmul(WHF, old_h) + WCF * old_c + bF)

    # new cell state
    c = f * old_c + i * (np.tanh(np.matmul(WXC, Xvec) + np.matmul(WHC, old_h) + bC))

    # output gate and hidden state
    o = Sigmoid(np.matmul(WXO, Xvec) + np.matmul(WHO, old_h) + (WCO * c) + bO)

    h = o * np.tanh(c)

    # extract ordered list of five best possible next words
    q = h.copy()
    q.shape = (1, 200)
    output = np.matmul(q, W2)
    outlist = getwordsfromoutput(output)
    return h, c, outlist
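To see that the gate equations produce the right shapes, here is a self-contained toy version of the same cell with made-up random weights and tiny dimensions (embedding size 4, hidden size 3, instead of the real 150 and 200):

```python
import numpy as np

def Sigmoid(x):
    return 1 / (1 + np.exp(-x))

# toy dimensions and random weights (the real model uses 150 and 200)
rng = np.random.RandomState(0)
nx, nh = 4, 3
WXI, WXF, WXC, WXO = (rng.randn(nh, nx) for _ in range(4))
WHI, WHF, WHC, WHO = (rng.randn(nh, nh) for _ in range(4))
WCI, WCF, WCO = (rng.randn(nh, 1) for _ in range(3))
bI, bF, bC, bO = (rng.randn(nh, 1) for _ in range(4))

def lstm_cell(Xvec, old_h, old_c):
    # the same gate equations used in rnn() above
    i = Sigmoid(np.matmul(WXI, Xvec) + np.matmul(WHI, old_h) + WCI * old_c + bI)
    f = Sigmoid(np.matmul(WXF, Xvec) + np.matmul(WHF, old_h) + WCF * old_c + bF)
    c = f * old_c + i * np.tanh(np.matmul(WXC, Xvec) + np.matmul(WHC, old_h) + bC)
    o = Sigmoid(np.matmul(WXO, Xvec) + np.matmul(WHO, old_h) + WCO * c + bO)
    h = o * np.tanh(c)
    return h, c

x = rng.randn(nx, 1)
h = np.zeros((nh, 1))
c = np.zeros((nh, 1))
h, c = lstm_cell(x, h, c)
print(h.shape, c.shape)  # (3, 1) (3, 1)
```

Because h is an output gate times a tanh, every component of h stays strictly inside (-1, 1).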

In [17]:
import random
from random import randint


This takes any word as a starting point and constructs a sentence that is defined by the sequence generated by the RNN.

For the next word we randomly pick one of the top five suggested by the RNN.

In [21]:
c = np.zeros(shape=(200, 1))
h = np.zeros(shape=(200, 1))
output = np.zeros(shape=(10000, 1))
word = 'big'
sentence = word
for _ in range(100):
    h, c, outlist = rnn(word, h, c)
    word = outlist[randint(0, 4)]
    sentence = sentence + " " + word

print sentence + "."

big computer makers ' service businesses ' capacity under a plan covering its core parts since january was down almost half partly during october after hurricane all reported declines across its other areas where the <unk> will help reduce debt or interest rates while it has risen above overnight losses at c\$ through an average life above its quarterly earnings rose sharply because of discontinued operations according early in london according with a company official </s> michael e. a. a. d. calif. vice president marketing group inc. santa bank holding co. palo r. john w. a. j. brown & sons ltd.
