$\newcommand{\xv}{\mathbf{x}} \newcommand{\Xv}{\mathbf{X}} \newcommand{\yv}{\mathbf{y}} \newcommand{\zv}{\mathbf{z}} \newcommand{\av}{\mathbf{a}} \newcommand{\Wv}{\mathbf{W}} \newcommand{\wv}{\mathbf{w}} \newcommand{\tv}{\mathbf{t}} \newcommand{\Tv}{\mathbf{T}} \newcommand{\muv}{\boldsymbol{\mu}} \newcommand{\sigmav}{\boldsymbol{\sigma}} \newcommand{\phiv}{\boldsymbol{\phi}} \newcommand{\Phiv}{\boldsymbol{\Phi}} \newcommand{\Sigmav}{\boldsymbol{\Sigma}} \newcommand{\Lambdav}{\boldsymbol{\Lambda}} \newcommand{\half}{\frac{1}{2}} \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}$
Steve Kommrusch
There are certain problem categories where having synthetic examples of input can improve the ability of an algorithm to optimally process the data space. One recent example in the literature is the CLGEN program [1] which generates synthetic GPU kernels using an LSTM network trained on code from github. The generated programs are then used to improve compilation techniques because a broader sample of possible programs can be analyzed by the compile step. Synthetic data can also be useful when directly using the data in question may have privacy implications (such as health data associated with specific patients).
For this project, I am interested in training a neural network to generate new examples of 3D shapes that fit within a broad learned category. I will implement the neural network using the Torch framework [2] as it is a common platform used broadly in the community. I have already ported HW3 from CS480 into Torch and many concepts and functions are easily remapped. For example, Torch has many of the same array access functions that numpy has [3]. This project ipynb file was edited using 'itorch notebook' instead of 'jupyter notebook' used in CS480.
While the model I hope to produce should be generally applicable to learning and creating 3D shapes, for this project I plan to use only this specific training shape set:
I am interested in exploring the best algorithms to use for this generation. I have not found LSTMs to be easy to use in Torch (examples and tutorials were cumbersome to use and often did not work correctly), and so will not be testing image generation with LSTMs (see appendix B for more details). But preliminary testing on 2D shape generation with a standard 3-layer fully connected network showed promise. The 'Methods' section will discuss algorithm ideas a bit more, but I am interested in exploring autoencoding through a bottleneck layer and various Torch training approaches for learned 3D image generation. I will try to find the algorithm and network parameters that result in the best images generated in the shortest time. To that end, I will use the CUDA options within Torch (which are easy to use). (Appendix A covers Torch and CUDA experiments I have done. The 'Results' section includes some discussion of performance stability I had with CUDA).
Some key algorithm questions are:
Some key data questions are:
This data and algorithms are interesting for several reasons. As noted, the general problem of synthetic data generation is useful in certain medical applications, which relates to research I am doing outside of CS480. Also, the algorithms best used for 3D shape learning are interesting even more broadly as large 3D data sets arise in many contexts and learning how to efficiently interact with this data using neural networks will be broadly valuable.
I broke the problem of generating 3D images within a trained range into these steps:
Regarding steps 4,5, and 6, below is a figure showing the final network that I found worked best for this problem. The autoencoder network has 7 layers of neurons with 2048,1024,64,4,64,1024, and 2048 neurons in each layer. This network is trained to generate the same output as input. The bottleneck layer can be observed to see how well spaced the source images are in the 4 dimensional feature space. Then the same weights learned for the autoencoder can be copied to a new network for image generation using 4 inputs. Since the network uses tanh for nonlinearity, the range of value on the 4 inputs is -1 to 1. 4 random numbers from -1 to 1 can be fed in to the generation network to create a novel image in a style similar to those that the network trained on. Images similar to a particular input image can be generated by varying the 4 feature values slightly from the input's mapping.
itorch.image("Autoencode.png")
The next cells create functions for generating the sphere, cube, rod, and cone image groups discussed in the introduction. I generate the 100 48x48x48 images. The first images are spheres and the last are large cones connected to a sphere with a rod. Sizes and orientation are randomized. During training, the order that images are presented to the network is randomized.
-- Here are the torch packages required for executing the
-- cells in this notebook
require 'torch'
require 'nn' -- neural networks
require 'optim' -- Improved optimizers including Conjugate Gradient
require 'cutorch' -- Improves performance by using GPU compute
require 'cunn'
-- Set up functions that allow shapes of various sizes, offsets, and rotations to be built
sphere = function(r,xoff,yoff,zoff,i)
local rsq = r*r
for xi = 1-xoff,size-xoff do
x = xi-(size/2)-0.5
for yi = 1-yoff,size-yoff do
y = yi-(size/2)-0.5
for zi = 1-zoff,size-zoff do
z = zi - (size/2)-0.5
if (x*x + y*y + z*z <rsq) then
X[i][xi+xoff][yi+yoff][zi+zoff] = 1.0
end
end
end
end
end
cube = function(r,xoff,yoff,zoff,rotation,i)
for xi = 1-xoff,size-xoff do -- size of world to position in
for yi = 1-yoff,size-yoff do
for zi = 1-zoff,size-zoff do
if (rotation < 0.25) then
x = xi - (size/2) - 0.5
y = yi - (size/2) - 0.5
z = zi - (size/2) - 0.5
elseif (rotation < 0.5) then
x = 1.414 * (xi - (size/2) - 0.5)
y = zi + yi - size - 1
z = zi - yi
elseif (rotation < 0.75) then
x = xi + zi - size - 1
y = 1.414 * (yi - (size/2) - 0.5)
z = zi - xi
else
x = xi - yi
y = xi + yi - size - 1
z = 1.414 * (zi - (size/2) - 0.5)
end
if (x > -r and x < r and y > -r and y < r and z > -r and z < r) then
X[i][xi+xoff][yi+yoff][zi+zoff] = 1.0
end
end
end
end
end
rod = function(r,xoff,yoff,zoff,rotation,i)
local rsq = r*r/4 -- This results in a thin cylinder
for xi = 1-xoff,size-xoff do
for yi = 1-yoff,size-yoff do
for zi = 1-zoff,size-zoff do
if (rotation < 0.33) then
x = xi - (size/2) - 0.5
y = yi - (size/2) - 0.5
z = zi - (size/2) - 0.5
elseif (rotation < 0.67) then
x = zi - (size/2) - 0.5
y = xi - (size/2) - 0.5
z = yi - (size/2) - 0.5
else
x = yi - (size/2) - 0.5
y = zi - (size/2) - 0.5
z = xi - (size/2) - 0.5
end
if (x > -r and x < r and y*y + z*z < rsq) then
X[i][xi+xoff][yi+yoff][zi+zoff] = 1.0
end
end
end
end
end
cone = function(r,xoff,yoff,zoff,rotation,i)
local rsq
for xi = 1-xoff,size-xoff do
for yi = 1-yoff,size-yoff do
for zi = 1-zoff,size-zoff do
if (rotation < 0.167) then
x = xi - (size/2) - 0.5
y = yi - (size/2) - 0.5
z = zi - (size/2) - 0.5
elseif (rotation < 0.333) then
x = -(xi - (size/2) - 0.5)
y = yi - (size/2) - 0.5
z = zi - (size/2) - 0.5
elseif (rotation < 0.5) then
x = zi - (size/2) - 0.5
y = xi - (size/2) - 0.5
z = yi - (size/2) - 0.5
elseif (rotation < 0.667) then
x = -(zi - (size/2) - 0.5)
y = xi - (size/2) - 0.5
z = yi - (size/2) - 0.5
elseif (rotation < 0.833) then
x = yi - (size/2) - 0.5
y = zi - (size/2) - 0.5
z = xi - (size/2) - 0.5
else
x = -(yi - (size/2) - 0.5)
y = zi - (size/2) - 0.5
z = xi - (size/2) - 0.5
end
rsq = (r-x)*(r-x)/4 -- radius of yz circle grows as x changes
if (x > -r and x < r and y*y + z*z < rsq) then
X[i][xi+xoff][yi+yoff][zi+zoff] = 1.0
end
end
end
end
end
-- Generate input training images
images = 100*6 -- Must be multiple of 6 to allow XYZ order permutations
size = 48
X = torch.Tensor(images,size,size,size):fill(0)
for i = 1,images/6 do
shape = 6.0*i/images -- 7% sphere, 9% each 3 simple shapes, 11% for 6 shape pairs
rotation = torch.uniform() -- random orientation of single shape or position of pair
r = (torch.uniform()+1)*(size/4) -- single shape 50-100% of available radius
-- Double shapes have large and small radii, and rod rotation
rlarge = torch.uniform()*size*0.075+size*0.225 -- 60% to 45% of available radius
rsmall = torch.uniform()*size*0.05+size*0.175 -- 45% to 35% of available radius
if (rotation < 0.17) then xdir=1 elseif (rotation < 0.33) then xdir=-1 else xdir=0 end
if (rotation < 0.33) or (rotation >= 0.67) then zdir=0
elseif (rotation < 0.5) then zdir=-1 else zdir=1 end
if (rotation < 0.67) then ydir=0 elseif (rotation < 0.83) then ydir=-1 else ydir=1 end
if (shape < 0.07) then -- Sphere
sphere(r,0,0,0,i)
elseif (shape < 0.16) then -- cube with random 45 degree rotation
cube(r,0,0,0,rotation,i)
elseif (shape < 0.25) then -- rod with 90 degree rotation
rod(r,0,0,0,rotation,i)
elseif (shape < 0.34) then -- cone with 90 degree rotation
cone(r,0,0,0,rotation,i)
elseif (shape < 0.45) then -- large sphere connected to small sphere by rod
rod(size/4,0,0,0,rotation,i)
sphere(rlarge,xdir*size*0.2,ydir*size*0.2,zdir*size*0.2,i)
sphere(rsmall,-xdir*size*0.25,-ydir*size*0.25,-zdir*size*0.25,i)
elseif (shape < 0.56) then -- large cube connected to small cube by rod
rod(size/4,0,0,0,rotation,i)
cube(rlarge,xdir*size*0.2,ydir*size*0.2,zdir*size*0.2,torch.uniform(),i)
cube(rsmall,-xdir*size*0.25,-ydir*size*0.25,-zdir*size*0.25,torch.uniform(),i)
elseif (shape < 0.67) then -- large cone connected to small cone by rod
rod(size/4,0,0,0,rotation,i)
cone(rlarge,xdir*size*0.2,ydir*size*0.2,zdir*size*0.2,torch.uniform(),i)
cone(rsmall,-xdir*size*0.25,-ydir*size*0.25,-zdir*size*0.25,torch.uniform(),i)
elseif (shape < 0.78) then -- large sphere connected to small cube by rod
rod(size/4,0,0,0,rotation,i)
sphere(rlarge,xdir*size*0.2,ydir*size*0.2,zdir*size*0.2,i)
cube(rsmall,-xdir*size*0.25,-ydir*size*0.25,-zdir*size*0.25,torch.uniform(),i)
elseif (shape < 0.89) then -- large cube connected to small cone by rod
rod(size/4,0,0,0,rotation,i)
cube(rlarge,xdir*size*0.2,ydir*size*0.2,zdir*size*0.2,torch.uniform(),i)
cone(rsmall,-xdir*size*0.25,-ydir*size*0.25,-zdir*size*0.25,torch.uniform(),i)
else -- large cone connected to small sphere by rod
rod(size/4,0,0,0,rotation,i)
cone(rlarge,xdir*size*0.2,ydir*size*0.2,zdir*size*0.2,torch.uniform(),i)
sphere(rsmall,-xdir*size*0.25,-ydir*size*0.25,-zdir*size*0.25,i)
end
end
Here are a few images showing 3 of the 48x48x48 images. Each of the 3 images has 48 2D images, one 2D image for each slice of the 3D image. The first image shows a sphere, the last 2 show 2 different examples of a large cube connected to a small cone.
itorch.image(X[1])
itorch.image(X[81])
itorch.image(X[82])
The full 48x48x48 source images are unnecessarily large for training, so the step below averages 8 voxels together to create 24x24x24 images for training. Also, 5 permutations (reflection and rotations) of the initial randomly oriented image are produced. This mimics how a limited input image set can be used to improve network training.
-- TIME NOTE: This cell takes about 2 minutes to run on my laptop
-- Generate 24x24x24 images by down-sampling 48x48x48
X24 = torch.Tensor(images,size/2,size/2,size/2):fill(0)
for i = 1,images/6 do for x = 1,size/2 do for y = 1,size/2 do for z = 1,size/2 do
X24[i*6][x][y][z] = (X[i][x*2][y*2][z*2] + X[i][x*2-1][y*2][z*2]
+X[i][x*2][y*2-1][z*2] + X[i][x*2-1][y*2-1][z*2]
+X[i][x*2][y*2][z*2-1] + X[i][x*2-1][y*2][z*2-1]
+X[i][x*2][y*2-1][z*2-1] + X[i][x*2-1][y*2][z*2-1])/8
X24[i*6-5][1+size/2-x][z][y] = X24[i*6][x][y][z] -- Permutation speeds up this generation
X24[i*6-4][y][x][z] = X24[i*6][x][y][z] -- step and emulates methods for creating
X24[i*6-3][1+size/2-y][z][x] = X24[i*6][x][y][z] -- more training samples from a limited
X24[i*6-2][z][x][y] = X24[i*6][x][y][z] -- source data set.
X24[i*6-1][1+size/2-z][y][x] = X24[i*6][x][y][z]
end end end end
In addition to being easier to train on, the 24x24x24 images are easier to display. Below is one image of each shape group and one of its 5 permutations. Each row has 24 2D images side-by-side to allow all 24 2D slices of one image to be shown in a row.
for i = 1,91,10 do -- Print example of all 10 shapes
itorch.image(X24[i*6]:reshape(24,24*24)) -- One of the 100 random sized/rotated originals
itorch.image(X24[i*6-3]:reshape(24,24*24)) -- One of 5 rotate/reflects of the original
end
We have images, now below I define a neural network for training. In the Results section I'll summarize experiments with parameters, but this neural network was, ultimately, the best configuration I found.
net = nn.Sequential() -- 'net' is the name for the autoencoder network
net:add(nn.Reshape(24*24*24))
net:add(nn.Linear(24*24*24, 2048))
net:add(nn.Tanh()) -- Hyperbolic tangent for non-linearity
net:add(nn.Linear(2048,1024))
net:add(nn.Tanh())
net:add(nn.Linear(1024,64))
net:add(nn.Tanh())
net:add(nn.Linear(64,4)) -- Bottleneck layer has 4 neurons
net:add(nn.Tanh())
net:add(nn.Linear(4,64))
net:add(nn.Tanh())
net:add(nn.Linear(64,1024))
net:add(nn.Tanh())
net:add(nn.Linear(1024,2048))
net:add(nn.Tanh())
net:add(nn.Linear(2048, 24*24*24))
net:add(nn.Reshape(24,24,24))
criterion = nn.MSECriterion() -- Use mean square error function to train on
-- Transfer everything to CUDA memory organization to aid performance
net = net:cuda()
criterion = criterion:cuda()
X24=X24:cuda()
dataset={} -- The 'trainer' class wants input/target organized in this format
function dataset:size() return images end
for i = 1,images do
dataset[i] = {X24[i],X24[i]}
end
-- Use criterion and net in quick test to catch simple typos
predict=net:forward(X24)
print(criterion:forward(predict, X24)) -- MSE of untrained network
itorch.image(predict[1]:reshape(24,24*24)) -- show example output of untrained network
The first network I tested is a simple training loop as documented in the Torch repository [10].
-- TIME NOTE: 100 iterations takes about 104 seconds on my laptop
timer = torch.Timer();
for i = 0, 100 do
-- feed the neural network and the criterion
mse = criterion:forward(net:forward(X24), X24)
-- train over this example in 3 steps
-- (1) zero the accumulation of the gradients
net:zeroGradParameters()
-- (2) accumulate gradients
net:backward(X24, criterion:backward(net.output, X24))
-- (3) update parameters with learning rate
net:updateParameters(0.01)
if (i%50 == 0) then
print("Iteration "..i..": mse="..mse)
end
end
print('Time elapsed: ' .. timer:time().real .. ' seconds')
The nn.StochasticGradient trainer is also explained in the Torch repository [10] and runs slower per iteration but converges better than the loop built in steps. But the nn package does not include a conjugate gradient option. (It would be interesting to know more details of this package, but I found that the optim package worked better and so didn't explore the issue.)
-- TIME NOTE: 2 iterations takes about 138 seconds on my laptop
timer = torch.Timer()
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.01
trainer.maxIteration = 2 -- Set to 2 for an example, see results section for experiments
trainer:train(dataset)
print('Time elapsed: ' .. timer:time().real .. ' seconds')
Having seen the improvement the conjugate gradient method can have over stochastic gradient decent (see Appendix A), I learned from references [11] and [12] that the 'optim' package provides a cg function that can be used for training. First, I tested with optim's stochastic gradient decent as shown in the following cells.
x, dl_dx = net:getParameters()
sgd_params = {
learningRate = 0.01
}
stepsgd = function(batch_size)
local current_loss = 0
local shuffle = torch.randperm(dataset:size())
batch_size = batch_size or 1000
for t = 1,dataset:size(),batch_size do
-- setup inputs for this mini-batch
-- no need to setup targets, since they are the same
local size = math.min(t + batch_size, 1+ dataset:size()) - t
local inputs = torch.Tensor(size, 24, 24, 24):cuda()
for i = 1,size do
inputs[i] = dataset[shuffle[i+t-1]][1]
end
local feval = function(x_new)
-- reset data
if x ~= x_new then x:copy(x_new) end
dl_dx:zero()
-- perform mini-batch gradient descent
local loss = criterion:forward(net:forward(inputs), inputs)
net:backward(inputs, criterion:backward(net.output, inputs))
return loss, dl_dx
end
_, fs = optim.sgd(feval, x, sgd_params)
-- fs is a table containing value of the loss function
-- (just 1 value for the SGD optimization)
current_loss = current_loss + fs[1]
end
return current_loss
end
--TIME NOTE: 100 iterations takes about 110 seconds on my laptop
timer = torch.Timer()
max_iters=100
do
for i = 1,max_iters do
local loss = stepsgd()
if (i%50 == 0) then
print(string.format('Epoch: %d Current loss: %4f', i, loss))
end
end
end
print('Time elapsed: ' .. timer:time().real .. ' seconds')
Finally, the very best training I got was using the optim.cg package which is used by the cells below. As discussed in the results section, these 2 cells describe the method I used for my best autoencoder training results. Running this notebook sequentially does a little training with 4 different methods, but just using the following 2 cells after redefining the neural network 'net' results in the lowest error.
x, dl_dx = net:getParameters()
cg_params = {
maxIter = 20
}
stepcg = function(batch_size)
local current_loss = 0
local shuffle = torch.randperm(dataset:size())
batch_size = batch_size or 1000
for t = 1,dataset:size(),batch_size do
-- setup inputs for this mini-batch
-- no need to setup targets, since they are the same
local size = math.min(t + batch_size, 1+ dataset:size()) - t
local inputs = torch.Tensor(size, 24, 24, 24):cuda()
for i = 1,size do
inputs[i] = dataset[shuffle[i+t-1]][1]
end
local feval = function(x_new)
-- reset data
if x ~= x_new then x:copy(x_new) end
dl_dx:zero()
-- perform mini-batch gradient descent
local loss = criterion:forward(net:forward(inputs), inputs)
net:backward(inputs, criterion:backward(net.output, inputs))
return loss, dl_dx
end
_, fs = optim.cg(feval, x, cg_params)
-- fs is a table containing value of the loss function
current_loss = current_loss + fs[1]
end
return current_loss
end
-- TIME NOTE: Running 100 iterations takes about 500 seconds on my laptop with 'fast' CUDA
timer = torch.Timer()
max_iters=1000
for i = 1,max_iters do
local loss = stepcg()
if (i%1000 == 0) then
print(string.format('Epoch: %d Current loss: %4f', i, loss))
end
end
print('Time elapsed: ' .. timer:time().real .. ' seconds')
I have now shown the data setup and autoencoder neural network methodology I developed. In the results section, I'll analyze the training and experimentation in more detail. Also, I leave the description of the 'gen' network used to generate images and the post-process step to create final 48x48x48 generated outputs until the later half of the 'Results' section as they will be more understandable after seeing the results of the autoencoder training.
During the setup and configuration of this project, I explored various approaches to learn which path would yield the best training with the least amount of time.
1) My first attempts at training were with the loop that explicitly does net:backward and net:updateParameters steps. I found that even over 100,000 iterations on autoencoder models with initial layers 1,000 and 4,000 neurons did not converge well. An example detail from this initial testing was that 500 training iterations with a network consisting of layers with 1024,512,64,8,3,8,64,512,1024 neurons took 160 seconds and yielded 0.1508 from the MSECriterion (lower is better, image results were very poor). With CUDA, the same 500 iterations took 11.6 sec and yielded a similar 0.1502 MSE.
2) Next, based on internet research, I trained with nn.StochasticGradient(net,criterion):train feature. 100 iterations with a 1024,512,64,8,3,8,64,512,1024 network on 100 images took 730 sec and yielded 0.1137 from MSECriterion. With CUDA, the same 100 iterations took 76 sec and yielded 0.1195 MSE. Below are more experiment results with CUDA on the nn.StochasticGradient approach. An MSE of 0.0452 has about the right size for the objects, but the shape is still a formless blob.
Network layer sizes | 100 iter time (s) | 100 iter MSE | 1900 more time (s) | 1900 more MSE |
---|---|---|---|---|
1024,512,64,8,3,8,64,512,1024 | 76 | 0.1195 | 1422 | 0.0452 |
4096,512,64,8,3,8,64,512,4096 | 285 | 0.0770 | 5740 | 0.0447 |
512,64,8,3,8,64,512 | 38 | 0.1336 | 763 | 0.0527 |
3) In preparation for this project I had tested 2D image autoencoding with the conjugate gradient code provided in CS480 by professor Chuck Anderson. Given that long training times with SGD were not yielding good results, I researched conjugate gradient in Torch and found that the 'optim' package supports SGD and CG. With CUDA, 20,000 optim.sgd iterations ran in 355 sec and got 0.1021 with 512,64,8,3,8,64,512. 2,000 SGD iterations took 34 seconds to get 0.1489. 200 CG iterations took 127 seconds and got 0.0295 MSE - the best result yet and in a reasonable time!
4) And now for something completely different. During optim.cg testing my CUDA performance fell off for some reason. I've seen this before and it may be related to getting the GPU clock stuck in a low-power and low-frequency mode. I thought the latest approved Ubuntu updates might help, and agreed to the download, but then my system would no longer start up the windowing system, and I lost wireless and Ethernet even when in console mode. Fortunately, I had a backup of this project file that was only 1 hour of work old. After surfing for answers and trying various patches and edits, I ended up re-installing Ubuntu 16.04.2 and reloading CUDA and TORCH (and Anaconda and Java). It only cost me about 8 wall clock hours and maybe 4 hours of effort. Lesson learned - backing up your work is good! Sadly, after the reinstall, CUDA was still slower than the earlier numbers for a couple days, then it got better for a day, then it got worse again. After the slowdown, 500 iterations with the explicit net:backward calls for 1024,512,64,8,3,8,64,512,1024 took 180 sec without CUDA and 90 seconds with CUDA (compared to 160 and 11.6 from note 1 above). The general recommendation on the internet for such slowdown is to update drivers and/or try to reinstall Torch, but I don't want to risk breaking my system again.
5) Given the great results with optim.cg, I tested the circle, square, rod, cone 100 image training with various parameters:
Network layer sizes | CG iters | Train iters | time (s) | MSE | Comment |
---|---|---|---|---|---|
512,64,8,3,8,64,512 | 20 | 200 | 127(old) | 0.0295 | Pre-slowdown CUDA |
512,64,8,3,8,64,512 | 20 | 200 | 1069(new) | 0.0205 | Newer CUDA behavior |
512,64,8,3,8,64,512 | 20 | 5000 | 24882 | 0.0079 | Matched 100 simple shapes OK |
1024,512,64,8,3,8,64,512,1024 | 20 | 200 | 2103 | 0.0272 | Worse than smaller |
512,64,8,2,8,64,512 | 20 | 200 | 1093 | 0.0229 | 2 feature neurons |
512,32,2,32,512 | 20 | 200 | 1037 | 0.0301 | Worse than larger |
512,128,16,2,16,128,512 | 20 | 200 | 1051 | 0.0218 | Bit better than 512,64,8 |
512,128,32,8,2,8,32,128,512 | 20 | 200 | 1034 | 0.0381 | Worse than smaller |
256,64,8,2,8,64,256 | 20 | 200 | 627 | 0.0358 | Worse than 512 |
256,64,8,2,8,64,256 | 10 | 200 | 322 | 0.0479 | Worse than 20 CG |
256,64,8,2,8,64,256 | 40 | 100 | 630 | 0.0388 | Bit worse than 20/200 |
7) After the above experiments with the 4 simple shapes, I added the 6 shape pairs as planned, and created the 600 image training set described in the methodology section.
Network layer sizes | CG iters | Train iters | time (s) | MSE | Comment |
---|---|---|---|---|---|
256,64,8,2,8,64,256 | 20 | 200 | 1998 | 0.0354 | Harder to converge complex shapes |
512,128,16,3,16,128,512 | 20 | 2000 | ~30000 | ~0.0130 | Not quite done overnight |
512,128,16,2,16,128,512 | 20 | 100 | 1820 | 0.0396 | Worse than 200 w/ 256 in layer |
256,128,16,2,16,128,256 | 20 | 100 | 1050 | 0.0390 | Faster than 100 w/512 |
256,128,16,2,16,128,256 | 20 | 10 | 57 | 0.0496 | Initial set of 10 |
256,128,16,2,16,128,256 | 20 | 500 | 2926 | 0.0246 | Correct object size and count, but not shape |
256,128,16,2,16,128,256 | 20 | 50 | 294 | 0.0500 | Randomness made worse than 10 iterations |
256,128,16,2,16,128,256 | 20 | 50 | 36 | 0.0481 | CUDA's back! View random loss range |
256,128,16,2,16,128,256 | 20 | 50 | 37 | 0.0467 | Repeat, compare to 0.0481 |
256,128,16,2,16,128,256 | 20 | 100 | 75 | 0.0390 | Baseline for param explore |
128,128,16,2,16,128,128 | 20 | 100 | 18 | 0.0394 | Similar to 256 |
128,16,2,16,128 | 20 | 100 | 46 | 0.0391 | Similar to more layers |
512,16,2,16,512 | 20 | 100 | 137 | 0.0305 | Slower but better |
512,32,2,32,512 | 20 | 100 | 136 | 0.0315 | Maybe same |
512,2,512 | 20 | 100 | 134 | 0.0380 | Worse, too few layers |
512,32,2,32,512 | 20 | 100 | 136 | 0.0321 | Repeat, compare to 0.0315 |
512,16,2,16,512 | 20 | 100 | 135 | 0.0386 | Repeat, compare to 0.0305 |
2048,16,2,16,2048 | 20 | 100 | 494 | 0.0297 | More nodes seems better |
2048,32,2,32,2048 | 20 | 100 | 500 | 0.0275 | 32 layer seems better |
2048,64,2,64,2048 | 20 | 100 | 496 | 0.0288 | 32 layer seems better |
1024,32,2,32,1024 | 20 | 100 | 241 | 0.0303 | Faster, a bit worse than 2048 |
1024,32,3,32,1024 | 20 | 100 | 245 | 0.0285 | 3 feature have good variance in plots |
1024,32,4,32,1024 | 20 | 100 | 245 | 0.0258 | 4 feature have good variance in plots |
2048,32,4,32,2048 | 20 | 100 | 499 | 0.0196 | |
2048,32,4,32,2048 | 20 | 2000 | 9986 | 0.0061 | Shapes recognizable |
2048,32,4,32,2048 | 20 | 7000 | 33499 | 0.0035 | Shape pairs look good |
4096,64,4,64,4096 | 20 | 10 | n/a | n/a | Out of memory error |
2048,32,4,32,2048 | 20 | 200 | 7322 | 0.0156 | Ack! CUDA's slow again! |
2048,32,4,32,2048 | 20 | 900 | 33814 | 0.0145 | Final overnight training with slow CUDA |
I ran a few more experiments after the CS480 course ended. I went ahead and updated Ubuntu with the latest changes, insured CUDA was up to date and libraries were correct in LD_LIBRARY_PATH, reran 'luarocks install' on all torch required components; CUDA performance seems fast and stable for now. Note that runs above and below often have different random 600 images, as well as random network starting weights, so the same network can train to different MSE's, making comparisons difficult. In the end I prefer the 2048,1024,64,4,64,1024,2048 code - it converges well and has a good bottleneck layer 4-neuron distribution. I suspect the larger network may handle more naturally occurring shapes better, and 64 neurons perhaps allows for the 4-neuron distribution to be processed better.
Network layer sizes | CG iters | Train iters | time (s) | MSE | Comment |
---|---|---|---|---|---|
4096,64,4,64,4096 | 20 | 50 | n/a | n/a | Out of memory error |
3100,64,4,64,3100 | 20 | 50 | n/a | n/a | Out of memory error |
2600,64,4,64,2600 | 20 | 80 | n/a | n/a | Out of memory error |
2048,64,4,64,2048 | 20 | 100 | 505 | 0.0208 | Fast CUDA! 64 seems worse than 32 |
2304,64,4,64,2304 | 20 | 90 | n/a | n/a | Out of memory error |
2176,64,4,64,2176 | 20 | 95 | 537 | 0.0261 | Worse than 2048 |
2048,32,4,32,2048 | 20 | 100 | 500 | 0.0239 | Variability rather large |
2176,64,4,64,2176 | 20 | 100 | 562 | 0.0304 | Initially worse than 2048 |
2176,64,4,64,2176 | 20 | 100 | ? | 0.0316 | Data down to the 2000 iter row from same run |
2176,64,4,64,2176 | 20 | 200 | ? | 0.0272 | |
2176,64,4,64,2176 | 20 | 360 | ? | 0.0195 | Compare to 100 iter, 0.0196 for 2048,32 above |
2176,64,4,64,2176 | 20 | 590 | ? | 0.0155 | Compare to 200 iter, 0.0156 for 2048,32 above |
2176,64,4,64,2176 | 20 | 700 | ? | 0.0145 | Compare to 900 iter, 0.0145 for 2048,32 above |
2176,64,4,64,2176 | 20 | 900 | ? | 0.0133 | |
2176,64,4,64,2176 | 20 | 1000 | ? | 0.0128 | |
2176,64,4,64,2176 | 20 | 1500 | ? | 0.0100 | |
2176,64,4,64,2176 | 20 | 2000 | 11239 | 0.0090 | Not as good as 2048,32 after 2000 iter |
2176,1024,64,4,64,1024,2176 | 20 | 100 | n/a | n/a | Out of memory error |
2048,1024,64,4,64,1024,2048 | 20 | 100 | 541 | 0.0174 | Different 600 images from 2176,64 run above |
2048,1024,64,4,64,1024,2048 | 20 | 100 | ? | 0.0288 | Data down to 2000 iter row from same run |
2048,1024,64,4,64,1024,2048 | 20 | 200 | ? | 0.0224 | |
2048,1024,64,4,64,1024,2048 | 20 | 700 | ? | 0.0103 | |
2048,1024,64,4,64,1024,2048 | 20 | 900 | ? | 0.0091 | |
2048,1024,64,4,64,1024,2048 | 20 | 1000 | ? | 0.0086 | |
2048,1024,64,4,64,1024,2048 | 20 | 1500 | ? | 0.0068 | |
2048,1024,64,4,64,1024,2048 | 20 | 2000 | 10400 | 0.0061 | |
2048,1024,64,4,64,1024,2048 | 20 | 3000 | ? | 0.0052 | Continuing 8000 more after 2000 start |
2048,1024,64,4,64,1024,2048 | 20 | 7000 | ? | 0.0039 | Worse than 2048,32 result |
2048,1024,64,4,64,1024,2048 | 20 | 10000 | 49977 | 0.0034 | Total time for 10,000 iters |
2048,32,4,32,2048 | 20 | 100 | 503 | 0.0251 | Same 600 images from 2048,1024,64 run above |
2048,32,4,32,2048 | 20 | 200 | ? | 0.0209 | Data from 100 iter to 10000 is same run |
2048,32,4,32,2048 | 20 | 700 | ? | 0.0137 | |
2048,32,4,32,2048 | 20 | 900 | ? | 0.0128 | |
2048,32,4,32,2048 | 20 | 1000 | 5034 | 0.0121 | |
2048,32,4,32,2048 | 20 | 2000 | ? | 0.0094 | |
2048,32,4,32,2048 | 20 | 3000 | ? | 0.0084 | |
2048,32,4,32,2048 | 20 | 7000 | ? | 0.0051 | Worse than previous 2048,32 and 2048,1024 |
2048,1024,64,4,64,1024,2048 | 20 | 16000 | 71876 | 0.0024 | New 600 images |
Below is a plot of some of the above data to aid visualization. I compare 3 of the networks that I gathered a fair amount of trials for.
Plot = require("itorch.Plot")
local plot = Plot() -- Plot selected data from experiments above
plot:circle({0,100 ,100 ,200 ,900 ,2000 ,7000 ,100 ,200 ,700 ,900 ,1000 ,2000 ,3000 ,7000},
{0,0.0196,0.0239,0.0156,0.0145,0.0061,0.0035,0.0251,0.0209,0.0137,0.0128,0.0121,0.0094,0.0084,0.0051},
'red','2048,32')
plot:circle({0,100 ,100 ,200 ,360 ,590 ,700 ,900 ,1000 ,1500 ,2000},
{0,0.0304,0.0316,0.0272,0.0195,0.0155,0.0145,0.0133,0.0128,0.0100,0.0090},
'green','2176,64')
plot:circle({0,100 ,100 ,200 ,700 ,900 ,1000 ,1500, 2000, 3000, 7000, 10000},
{0,0.0174,0.0288,0.0224,0.0103,0.0091,0.0086,0.0068,0.0061,0.0052,0.0039,0.0034},
'blue','2048,1024,64')
plot:title('2048,32 vs 2176,64 vs 2048,1024,64')
plot:xaxis('Training iterations'):yaxis('MSE')
plot:legend(true)
plot:draw()
itorch.image("MSE.png") -- note that 'plot' is not saved in notebook, so load in a saved copy
In addition to the testing above, I also tried mixing SGD and CG a bit - to see if some SGD iteration work might help the CG move away from a local minimum, but I didn't find that too helpful. I ran some more training during the day, hence the final MSE is better than the last entry in the chart above. Note that when CUDA runs fast, overnight training got to trained images with good quality and an MSE error of 0.0035.
predict = net:forward(X24) -- Use the network on the training data
print("MSE for trained network used in remaining notebook cells: "..criterion:forward(predict, X24))
for i = 1,91,10 do -- Print 1 of each of 10 shapes and predicted shape
itorch.image(X24[i*6]:reshape(24,24*24))
itorch.image(predict[i*6]:reshape(24,24*24))
end
-- The shapes shown are: sphere, cube, rod, cone, 2 spheres, 2 cubes, 2 cones,
-- large sphere and small cube, large cube and small cone, large cone and small sphere
Below are diagrams showing the position of the 600 images in X24 in the 4-dimensional feature space. I plot neuron 1 vs neuron 2 in one plot and neuron 3 vs neuron 4 in the other. The pair of 2D plots is a simplification of the full 4 dimensional space, but it gives a sense of the distribution of the images. As the network trained the 4 dimensional space seemed to get used better and become less sparse - an indication that the network was learning more useful ways to condense the inputs down to 4 dimensions. Note that a goal of this project relates to having an expressive but dense set of feature neurons. I want to have a space where any random location will map to a 'reasonable' image given the training inputs. 4 neurons seems to be a good number for this problem - the space still has some sparsity to it, indicating there is room for more learning and differentiation, but 4 values result in good training results and are not too sparse, as will be explored next.
predict = net:forward(X24)
features = net:get(9).output
spheres=features[{{1,36},{}}]
cubes=features[{{37,90},{}}]
rods=features[{{91,144},{}}]
cones=features[{{145,198},{}}]
spheresphere=features[{{199,264},{}}]
cubecube=features[{{265,330},{}}]
conecone=features[{{331,396},{}}]
spherecube=features[{{397,462},{}}]
cubecone=features[{{463,528},{}}]
conesphere=features[{{529,600},{}}]
Plot = require("itorch.Plot")
local plot = Plot()
plot:circle(spheres[{{},1}],spheres[{{},2}],'red','Spheres')
plot:circle(cubes[{{},1}],cubes[{{},2}],'green','Cubes')
plot:circle(rods[{{},1}],rods[{{},2}],'blue','Rods')
plot:circle(cones[{{},1}],cones[{{},2}],'black','Cones')
plot:circle(spheresphere[{{},1}],spheresphere[{{},2}],'cyan','SphereSphere')
plot:circle(cubecube[{{},1}],cubecube[{{},2}],'magenta','CubeCube')
plot:circle(conecone[{{},1}],conecone[{{},2}],'yellow','ConeCone')
plot:circle(spherecube[{{},1}],spherecube[{{},2}],'orange','SphereCube')
plot:circle(cubecone[{{},1}],cubecone[{{},2}],'purple','CubeCone')
plot:circle(conesphere[{{},1}],conesphere[{{},2}],'grey','ConeSphere')
plot:title('Feature neuron values for training images')
plot:xaxis('Neuron 1'):yaxis('Neuron 2')
plot:legend(true)
plot:draw()
plot = Plot()
plot:circle(spheres[{{},3}],spheres[{{},4}],'red','Spheres')
plot:circle(cubes[{{},3}],cubes[{{},4}],'green','Cubes')
plot:circle(rods[{{},3}],rods[{{},4}],'blue','Rods')
plot:circle(cones[{{},3}],cones[{{},4}],'black','Cones')
plot:circle(spheresphere[{{},3}],spheresphere[{{},4}],'cyan','SphereSphere')
plot:circle(cubecube[{{},3}],cubecube[{{},4}],'magenta','CubeCube')
plot:circle(conecone[{{},3}],conecone[{{},4}],'yellow','ConeCone')
plot:circle(spherecube[{{},3}],spherecube[{{},4}],'orange','SphereCube')
plot:circle(cubecone[{{},3}],cubecone[{{},4}],'purple','CubeCone')
plot:circle(conesphere[{{},3}],conesphere[{{},4}],'grey','ConeSphere')
plot:title('Feature neuron values for training images')
plot:xaxis('Neuron 3'):yaxis('Neuron 4')
plot:legend(true)
plot:draw()
itorch.image("Neurons12.png") -- 'plot' is not saved in notebook, so load in a saved copy
itorch.image("Neurons34.png") -- 'plot' is not saved in notebook, so load in a saved copy
Now we have observed the values of the 4 feature neurons from the trained autoencoder. Torch provides a way to set the weights and bias values in one network based on the training in another one. First, we will create a 4 input network with the layers to the output matching the autoencoder (see the network diagram in the introduction for reference).
gen = nn.Sequential()
gen:add(nn.Linear(4,64))
gen:add(nn.Tanh())
gen:add(nn.Linear(64,1024))
gen:add(nn.Tanh())
gen:add(nn.Linear(1024,2048))
gen:add(nn.Tanh())
gen:add(nn.Linear(2048, 24*24*24))
gen:add(nn.Reshape(24,24,24))
Next we simply copy the parameters from the autoencoder output layers to the new network that can be used to generate 3D images.
print(net:parameters()) -- Print weight and bias tensors for autoencoder network
print(gen:parameters()) -- Print weight and bias tensors for generation network
for i = 1,8 do -- Assign weights and biases in generation network from trained autoencoder
gen:parameters()[i]:copy(net:parameters()[i+8])
end
To test this approach, below are the 4 feature neuron values for one of the more complex images (large cube and small cone). The cell after that generates 8 images: the first is the original 24x24x24 training input to the autoencoder; the second is the output of the autoencoder for that image; the 3rd image shows that the generator network produces exactly the same output as the autoencoder when fed with the same 4 feature values; the 4th through 8th images show how the generated image is affected as one of the feature inputs is varied. As training progresses, the 4-coordinate feature value for the image moves around a bit, as improved feature mappings are found.
print(net:get(9).output[486])
itorch.image(X24[486]:reshape(24,24*24)) -- original image
itorch.image(net.output[486]:reshape(24,24*24))
for i=0,5 do -- Check that gen outputs same image with same inputs, then vary 1 input a bit
itorch.image(gen:forward(torch.Tensor({-0.1864,-0.8316,0.6090,-0.0173+0.2*i})):reshape(24,24*24))
end
To explore the image generation more, below are 10 images generated from the 'gen' network. The first is the image from location 0,0,0,0 and the other 9 are images generated from random inputs to 'gen'.
gen_10 = torch.Tensor(15,size/2,size/2,size/2)
gen_10[1] = gen:forward(torch.Tensor({0,0,0,0})) -- Center of generaton space
for i = 2,10 do -- 9 'random' generated images
gen_10[i] = gen:forward(torch.Tensor({torch.uniform()*2-1,torch.uniform()*2-1,
torch.uniform()*2-1,torch.uniform()*2-1}))
end
for i = 1,10 do
itorch.image(gen_10[i]:reshape(24,24*24))
end
The final step for the 48x48x48 image generation is to decompress the 24x24x24 outputs from 'gen' into the final desired format. The step below is an algorithm that I played around with a bit (using various weightings for the original voxel value and neighbor voxel values). In the end, the algorithm expands the 'gen' output voxels into 8 voxels and cleans up some of the grey backgrounds from the network output.
-- Regenerate 48x48x48 images from 24x24x24 images - do for
-- original X data, autoencoder output, and generative output.
-- To save time, just do 10 examples of each type for viewing.
X24_10 = torch.Tensor(10,size/2,size/2,size/2)
X_10 = torch.Tensor(10,size,size,size)
X48 = torch.Tensor(10,size,size,size)
predict_10 = torch.Tensor(10,size/2,size/2,size/2)
predict48 = torch.Tensor(10,size,size,size)
gen48 = torch.Tensor(10,size,size,size)
neighbors = torch.Tensor(3,3,3)
voxelavg = function(orig,corner,edge1,edge2,edge3,face1,face2,face3)
retval = orig - 0.4 + (corner*2+edge1+edge2+edge3+face1+face2+face3)/9
if (retval > 1.0) then -- strongly on voxel
return 1.0
elseif (retval > 0.5) then -- partly on voxel
return retval
else
return 0 -- off voxel
end
end
decompress = function(in24,out48,i)
for x = 1,size/2 do for y = 1,size/2 do for z = 1,size/2 do
-- Process X24 into X48 to evaluate this decompression step
neighbors:fill(0) -- Start all dark, applies to 'out-of-bounds' values
for dx = -1,1 do for dy = -1,1 do for dz = -1,1 do
if (x+dx > 0 and x+dx <= size/2 and y+dy >0 and y+dy <= size/2
and z+dz > 0 and z+dz <= size/2) then
neighbors[dx+2][dy+2][dz+2] = in24[i][x+dx][y+dy][z+dz]
end
end end end
for dx = -1,0 do for dy = -1,0 do for dz = -1,0 do -- decompress to 8 voxels
out48[i][x*2+dx][y*2+dy][z*2+dz] = voxelavg(in24[i][x][y][z],
neighbors[3+2*dx][3+2*dy][3+2*dz],
neighbors[3+2*dx][3+2*dy][2],neighbors[3+2*dx][2][3+2*dz],neighbors[2][3+2*dy][3+2*dz],
neighbors[3+2*dx][2][2],neighbors[2][3+2*dy][2],neighbors[2][2][3+2*dz])
end end end
end end end
end
-- TIME NOTE: This cell takes about 2 minutes on my laptop
for i = 1,10 do
X24_10[i]:copy(X24[i*60-54]) -- Sample each shape type
X_10[i]:copy(X[i*10-9]) -- One of the 100 originals
predict_10[i]:copy(predict[i*60-54])
decompress (X24_10,X48,i)
decompress (predict_10,predict48,i)
decompress (gen_10,gen48,i)
end
X_10=X_10:cuda()
X48=X48:cuda()
predict48=predict48:cuda()
print("Test criterion of same matrix: "..criterion:forward(X_10,X_10))
print("Test compress/decompress result: "..criterion:forward(X_10,X48))
print("Test decompressed autoencoder results: "..criterion:forward(X_10,predict48))
-- Show example 48x48x48 images
itorch.image(X_10[9]) -- Original cube and cone is interesting (spheres are 'easy' to learn)
itorch.image(X24_10[9]:reshape(24,24*24)) -- 24x24x24 image from 48x48x48 original
itorch.image(X48[9]) -- Reconstructed original (no neural net, just decompression)
itorch.image(predict_10[9]:reshape(24,24*24)) -- 24x24x24 autoencoder output same cube and cone
itorch.image(predict48[9]) -- Reconstructed 48x48x48 from autoencoder output
itorch.image(gen_10[2]:reshape(24,24*24)) -- Randomly generated image from 'gen' network
itorch.image(gen48[2]) -- 48x48x48 version of random 'gen' output
itorch.image(gen_10[3]:reshape(24,24*24)) -- 2nd randomly generated image from 'gen' network
itorch.image(gen48[3]) -- 48x48x48 version of 2nd random 'gen' output
This approach presents a valid way to generate 3D shapes that are within a broadly defined group of learned objects, as well as a way to generate similar objects relative to an existing sample object. The input and use mechanisms were created so that a new set of 48x48x48 input shapes could be provided for training this system. There are various stages of Torch code that are used by this algorithm which could be optimized, but the main need for optimization is the training loop using conjugate gradients. Even when CUDA is working well, I expect there remain opportunities for improved performance given good compiler or library coding techniques. This code can be used to augment sparse image repositories in various useful ways.
[1] CLGEN site with overview diagrams, slide deck, and code: https://github.com/ChrisCummins/clgen
[2] Main page for Torch: http://torch.ch/
[3] Tables with numpy to Torch conversions: https://github.com/torch/torch7/wiki/Torch-for-Numpy-users
[4] Description of the Torch nn class: https://github.com/torch/nn/blob/master/doc/overview.md#nn.overview.dok
[5] Torch tutorial on nn and cuda: http://ronan.collobert.com/torch/
[6] Overview of Torch cudnn modules (like LSTM): https://github.com/soumith/cudnn.torch
[7] Documentation on Torch nn modules https://github.com/torch/nn/blob/master/doc/convolution.md#nn.TemporalConvolution
[8] Example usage for Torch cudnn.LSTM and other modules: https://github.com/soumith/cudnn.torch/blob/master/test/test_rnn.lua
[9] A Torch autoencoder: http://rnduja.github.io/2015/11/06/torch-autoencoder/
[10] Example of training a network in torch: https://github.com/torch/nn/blob/master/doc/training.md
[11] Example of using optim package for training a network: https://github.com/vic-w/torch-practice/blob/master/mnist/trainMnist.lua
[12] Description of optimization algorithms for optim in Torch: https://github.com/torch/optim/blob/master/doc/algos.md
My experiments for this project were informed by some testing I did with CS480 HW3 using Torch with CPU and GPU modes.
My laptop is the system these experiments were run on, it has this HW:
- Intel Core i7-6700HQ @2.6GHz (4 cores, 8 threads, 3.5GHz max)
- NVIDIA GeForce GTX 965M (1,024 CUDA cores, 1.15GHz max)
The neural network model from HW3 had 18 inputs, 1 layer of 15 neurons, 1 layer with 10 neurons, and 1 output. It was trained for 1000 iterations. (The HW3 code uses SCG instead of SGD for the backpropagation, which likely improves the rate of convergence, but shouldn't affect the performance discussed in this email).
- HW3 code using SCG from NeuralNetworks.py: 4.02s
- HW3 code using SGD from NeuralNetworks.py: 2.03s
- Torch CPU using SGD: 2.05s
- Torch GPU using SGD: 0.42s
- HW3 code for SCG trained to same RMSE as SGD: 0.19s
As another note, I ran CPU matrix multiply in Torch compared with CUDA 1,000x200,000, 200,000x1,000 to produce 1,000x1,000. The GPU was over 10X faster:
- CPU: 2.79s
- GPU: 0.21s
This performance was visible with some of the results in the main report above; although CUDA sometimes seems to get stuck in a low-performance mode on my laptop. Here are some samples of experiments repeated here for easy reference.
- Train a network with 13,824 inputs through 9 layers using updateParameters loop:
- CPU time: 160 seconds
- GPU time: 11.6 seconds
- Train nn.StochasticGradient:
- CPU time: 730 seconds
- GPU time: 76 seconds
-- Example code showing how to set up matrices for CUDA use in Torch
a=torch.rand(1000,200000)
b=torch.rand(200000,1000)
c=torch.Tensor(1000,1000)
timer = torch.Timer() -- the Timer starts to count now
c:mm(a,b)
print(c[5][5])
print('Time elapsed using CPU: ' .. timer:time().real .. ' seconds')
a=torch.rand(1000,200000):cuda()
b=torch.rand(200000,1000):cuda()
c=torch.Tensor(1000,1000):cuda()
timer = torch.Timer() -- the Timer starts to count now
c:mm(a,b)
print(c[4][4])
print('Time elapsed using CUDA: ' .. timer:time().real .. ' seconds')
a=torch.rand(10,20):cuda() -- shrink arrays so large arrays are garbage collected
b=torch.rand(20,10):cuda()
c=torch.Tensor(10,10):cuda()
For CS480 HW3, I analyzed stock price correlations and I thought that would be a good data set to test LSTM usage in Torch. For part of this project I had wanted to test LSTM use to generate 3D images since CLGEN [1] had success with that approach. In the end, I could not get meaningful learning to occur using either the cudnn:LSTM nor rnn:LSTM package. I suspect something about the data organization or LSTM parameter set was not correct in my code. In any case, for the case of 3D images, I expect that LSTMs would not have been ideal anyway, so I shifted my focus to the networks discussed in the main body of this project report. Below is some of the code I was attempting to use for LSTM training, based on references [6],[7], and [8].
-- This cell is example code and may not run well if attempted.
require 'cudnn'
lstm = cudnn.LSTM(18, 9, 3)
lstm = lstm:cuda()
print(lstm)
rnn = cudnn.RNN(18,9,3)
rnn = rnn:cuda()
print(rnn)
module = nn.Module(18,9,3)
module = module:cuda()
print(module)
LSTMtrain=torch.Tensor(90,16,18):cuda()
LSTMT=torch.Tensor(90,16,1):cuda()
for i = 1,90 do
for j = 1,16 do
LSTMtrain[{i,j,{}}] = Xtrain[{i*16+j-16,{}}]
LSTMT[{i,j,1}] = T[i*16+j-16]
end
end
w,dw = lstm:getParameters()
criterion = nn.MSECriterion():cuda()
timer = torch.Timer() -- the Timer starts to count now
for i = 0, 1000 do -- Train for 1000 epochs
-- reset LSTM state
lstm:reset()
-- feed it to the neural network and the criterion
criterion:forward(lstm:forward(LSTMtrain), LSTMT)
-- train over this example in 3 steps
-- (1) zero the accumulation of the gradients
dw:zero()
-- (2) accumulate gradients
lstm:backward(LSTMtrain, criterion:backward(lstm.output, LSTMT))
-- (3) update parameters with a 0.02 learning rate
lstm:updateParameters(0.02)
end
print('Time elapsed: ' .. timer:time().real .. ' seconds')