Artificial Intelligence
In the following, N is a feedforward neural network architecture taking a vector
$x^T = ( x_1 \;\; x_2 \;\; \cdots \;\; x_n )$
of n inputs. The complete collection of weights for the network is denoted w and
the output produced by the network when applied to input x using weights w is
denoted N(w, x). The number of outputs is arbitrary. We have a sequence s of m
labelled training examples
$s = ((x_1, l_1), (x_2, l_2), \ldots, (x_m, l_m))$
where the $l_i$ denote vectors of desired outputs. Let $E(w; (x_i, l_i))$ denote some
measure of the error that N makes when applied to the ith labelled training
example. Assuming that each node in the network computes a weighted summation
of its inputs, followed by an activation function, such that the node j in the network
computes a function
$$g\left( w_0^{(j)} + \sum_{i=1}^{k} w_i^{(j)} \, \mathrm{input}(i) \right)$$
of its k inputs, where g is some activation function, derive in full the
backpropagation algorithm for calculating the gradient
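$$\frac{\partial E}{\partial w} = \left( \frac{\partial E}{\partial w_1} \;\; \frac{\partial E}{\partial w_2} \;\; \cdots \;\; \frac{\partial E}{\partial w_W} \right)^T$$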
for the ith labelled example, where $w_1, \ldots, w_W$ denote the complete collection of
W weights in the network.
[20 marks]
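
A derived gradient can be sanity-checked numerically. The sketch below is one illustrative way to do so, not a model answer: it assumes a sigmoid activation for g and the squared error E = ½‖N(w, x) − l‖², neither of which is fixed by the question, and it stores the weights as nested lists (a hypothetical layout chosen for readability). The names forward, backprop and numeric_grad are likewise illustrative. The backward pass uses the usual quantity δ_j = ∂E/∂a_j, where a_j is the weighted sum computed at node j, and compares the result against central finite differences.

```python
import math

def g(a):
    # Assumed activation function: logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-a))

def g_prime(a):
    s = g(a)
    return s * (1.0 - s)

def forward(weights, x):
    # weights[layer][j] = [w0, w1, ..., wk] for node j of that layer.
    # Returns the weighted sums (acts) and node outputs (zs); zs[0] is the input x.
    zs, acts = [list(x)], []
    for layer in weights:
        inputs = zs[-1]
        a = [node[0] + sum(w * v for w, v in zip(node[1:], inputs)) for node in layer]
        acts.append(a)
        zs.append([g(v) for v in a])
    return acts, zs

def error(weights, x, label):
    # Assumed error measure: half the squared distance to the desired outputs.
    _, zs = forward(weights, x)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(zs[-1], label))

def backprop(weights, x, label):
    acts, zs = forward(weights, x)
    # Output nodes: delta_j = (output_j - l_j) * g'(a_j) for the assumed error.
    delta = [(o - t) * g_prime(a) for o, t, a in zip(zs[-1], label, acts[-1])]
    grads = [None] * len(weights)
    for L in reversed(range(len(weights))):
        inputs = zs[L]
        # dE/dw_0 = delta_j (bias), dE/dw_i = delta_j * input(i).
        grads[L] = [[d] + [d * v for v in inputs] for d in delta]
        if L > 0:
            # Hidden nodes: propagate delta back through the weights of layer L.
            delta = [
                g_prime(acts[L - 1][i])
                * sum(delta[j] * weights[L][j][i + 1] for j in range(len(delta)))
                for i in range(len(acts[L - 1]))
            ]
    return grads

def numeric_grad(weights, x, label, eps=1e-6):
    # Central finite differences, one weight at a time.
    grads = []
    for layer in weights:
        layer_grads = []
        for node in layer:
            node_grads = []
            for i in range(len(node)):
                old = node[i]
                node[i] = old + eps
                e_plus = error(weights, x, label)
                node[i] = old - eps
                e_minus = error(weights, x, label)
                node[i] = old
                node_grads.append((e_plus - e_minus) / (2 * eps))
            layer_grads.append(node_grads)
        grads.append(layer_grads)
    return grads

# Arbitrary example values: 2 inputs, 2 hidden nodes, 1 output node.
weights = [[[0.1, 0.2, -0.3], [0.0, 0.4, 0.5]], [[-0.1, 0.6, -0.7]]]
x = [1.0, -2.0]
label = [1.0]
analytic = backprop(weights, x, label)
numeric = numeric_grad(weights, x, label)
```

Agreement between analytic and numeric to several decimal places suggests the backward recursion is implemented consistently with the forward pass; it does not replace the derivation the question asks for.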