Multilayer Perceptrons with nodes of the following kind



COMPUTER SCIENCE TRIPOS Part IB – 2017 – Paper 4
Artificial Intelligence (SBH)
This question is about neural networks. We consider initially multilayer perceptrons
with nodes of the following kind.
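[Figure: a single node with inputs $z_0 = 1, z_1, z_2, \ldots, z_n$ and
corresponding weights $w_0, w_1, w_2, \ldots, w_n$. The node computes the
activation $a = \sum_{i=0}^{n} w_i z_i$ and outputs $z = \sigma(a)$.]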
(a) Derive an expression for the gradient $\partial E_i(\mathbf{w}) / \partial w_j$
for weight $w_j$ in an output node when $E_i(\mathbf{w})$ is the error for the
$i$th example
$$E_i(\mathbf{w}) = \frac{1}{2}\left(y_i - h(\mathbf{w}; \mathbf{x}_i)\right)^2,$$
$h(\mathbf{w}; \mathbf{x}_i)$ is the output of the complete network for the $i$th
example, and $\sigma(a) = a$. You need only derive the expression for the output
node. [3 marks]
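As a sanity check on the kind of expression part (a) asks for, the sketch below compares a candidate closed form against a finite-difference estimate. It assumes the chain-rule result $\partial E_i(\mathbf{w})/\partial w_j = -(y_i - h)\,z_j$ for a linear output node with inputs $z_0, \ldots, z_n$. The function names and sample numbers are illustrative and not part of the question.

```python
def output(w, z):
    """Linear output node: h = a = sum_i w_i * z_i, because sigma(a) = a."""
    return sum(wi * zi for wi, zi in zip(w, z))


def squared_error(w, z, y):
    """E = 0.5 * (y - h)^2 for a single example."""
    return 0.5 * (y - output(w, z)) ** 2


def analytic_grad(w, z, y):
    """Candidate closed form (an assumption to be checked): -(y - h) * z_j."""
    h = output(w, z)
    return [-(y - h) * zj for zj in z]


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    grads = []
    for j in range(len(w)):
        up = list(w)
        down = list(w)
        up[j] += eps
        down[j] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


# Hypothetical example values; z[0] = 1 plays the role of the bias input.
z = [1.0, 0.5, -2.0]
w = [0.1, -0.3, 0.8]
y = 1.5

print(analytic_grad(w, z, y))
print(numeric_grad(lambda v: squared_error(v, z, y), w))
```

The two printed lists should agree to within finite-difference error if the candidate expression is right.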
(b) Derive an expression for the gradient $\partial E_i(\mathbf{w}) / \partial w_j$
for weight $w_j$ in an output node when $\sigma(a) = 1/(1 + \exp(-a))$ and the
error for the $i$th example is
$$E_i(\mathbf{w}) = -\left(y_i \log h(\mathbf{w}; \mathbf{x}_i) + (1 - y_i) \log\left(1 - h(\mathbf{w}; \mathbf{x}_i)\right)\right).$$
You may use the fact that $d\sigma(a)/da = \sigma(a)(1 - \sigma(a))$. You need
only derive the expression for the output node. [7 marks]
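The same kind of numerical check can be sketched for part (b). It takes the error to be the standard cross-entropy, $E = -\left(y \log h + (1 - y)\log(1 - h)\right)$, and assumes the chain rule gives $\partial E_i(\mathbf{w})/\partial w_j = (h - y_i)\,z_j$ for a sigmoid output node. The names and numbers are again illustrative.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def output(w, z):
    """Sigmoid output node: h = sigma(sum_i w_i * z_i)."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)))


def cross_entropy(w, z, y):
    """E = -(y log h + (1 - y) log(1 - h)) for a single example."""
    h = output(w, z)
    return -(y * math.log(h) + (1.0 - y) * math.log(1.0 - h))


def analytic_grad(w, z, y):
    """Candidate closed form (an assumption to be checked): (h - y) * z_j."""
    h = output(w, z)
    return [(h - y) * zj for zj in z]


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    grads = []
    for j in range(len(w)):
        up = list(w)
        down = list(w)
        up[j] += eps
        down[j] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


# Hypothetical example values; z[0] = 1 plays the role of the bias input.
z = [1.0, 0.5, -2.0]
w = [0.1, -0.3, 0.8]
y = 1.0

print(analytic_grad(w, z, y))
print(numeric_grad(lambda v: cross_entropy(v, z, y), w))
```

The factor $\sigma(a)(1 - \sigma(a))$ from the hint cancels against the derivative of the logarithms, which is why the candidate form has the same shape as the squared-error case with a linear output.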
(c) In the standard backpropagation algorithm the central quantity of interest for
each node $N$ is $\delta = \partial E_i(\mathbf{w})/\partial a$. It is proposed
that, instead of using nodes in the form presented above, we introduce functions
$\phi_i$ and construct multilayer networks from nodes that compute $z = \sigma(a)$
where
$$a = \sum_{i=0}^{n} w_i \phi_i(\mathbf{z}).$$
Here, $\mathbf{z}^T = [\, z_0 \;\; z_1 \;\; \cdots \;\; z_n \,]$ and the functions
$\phi_i$ are fixed, having no further parameters. A multilayer perceptron is
constructed from nodes of this kind. Give a detailed, general derivation of the
formula for computing $\delta$ for a non-output node $N$ in this network, assuming
you know the values of $\delta$ for the nodes connected to the output of $N$.
[10 marks]
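One possible outline of the chain-rule argument for part (c) is sketched below. The notation is an assumption introduced here, not taken from the question: node $N$ has activation $a_N$ and output $z_N = \sigma(a_N)$; the nodes $k = 1, \ldots, m$ are those that receive $z_N$ as one component of their input vector $\mathbf{z}^{(k)}$; node $k$ has weights $w^{(k)}_i$, activation $a_k$, and known $\delta_k$. It also assumes the error depends on $a_N$ only through these $a_k$.

```latex
\begin{align*}
\delta_N
  &= \frac{\partial E_i(\mathbf{w})}{\partial a_N}
   = \sum_{k=1}^{m} \frac{\partial E_i(\mathbf{w})}{\partial a_k}
     \frac{\partial a_k}{\partial a_N}
   = \sum_{k=1}^{m} \delta_k \frac{\partial a_k}{\partial a_N}, \\
\frac{\partial a_k}{\partial a_N}
  &= \frac{\partial a_k}{\partial z_N}\,\frac{d z_N}{d a_N}
   = \sigma'(a_N) \sum_{i=0}^{n} w^{(k)}_i
     \frac{\partial \phi_i(\mathbf{z}^{(k)})}{\partial z_N}, \\
\delta_N
  &= \sigma'(a_N) \sum_{k=1}^{m} \delta_k \sum_{i=0}^{n} w^{(k)}_i
     \frac{\partial \phi_i(\mathbf{z}^{(k)})}{\partial z_N}.
\end{align*}
```

As a check on this outline, choosing $\phi_i(\mathbf{z}) = z_i$ makes every partial derivative vanish except the one for the component equal to $z_N$, and the last line reduces to the usual backpropagation formula $\delta_N = \sigma'(a_N) \sum_k \delta_k w^{(k)}_N$, where $w^{(k)}_N$ denotes the weight on the connection from $N$ to node $k$.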