COMPUTER SCIENCE TRIPOS Part IB – 2022 – Paper 6
Artificial Intelligence (sbh11)
March 20, 2023
This question addresses a variation on the usual multilayer perceptron.
[Figure: the input vector, split into p groups x^(1), ..., x^(p) of n elements each, feeds p hidden nodes that share the weights (w, w0); hidden node k computes σ(a_k). The hidden outputs σ(a_1), ..., σ(a_p) feed a single output node with weights (v, v0), which applies σ to produce y.]
The input vector is divided into $p$ groups, each with $n$ elements. Let $x^{(j)}_i$ be the $i$th element in the $j$th group, and let $x^{(j)} = (\,x^{(j)}_1 \,\cdots\, x^{(j)}_n\,)^T$. There are $p$ nodes in the hidden layer, which share a single weight vector $w$ and bias $w_0$. Thus the $k$th hidden node computes $\sigma(a_k)$, where $\sigma$ is an activation function and
$$a_k = w^T x^{(k)} + w_0.$$
Let $a = (\,a_1 \,\cdots\, a_p\,)^T$. The output node then combines the hidden nodes in the usual way using weights $v$ and $v_0$ to produce $y = \sigma(a)$, where
$$a = \sum_{i=1}^{p} v_i \sigma(a_i) + v_0.$$
Collecting all the parameters of the network into a single vector θ, the error for a
single labelled example is E(θ).
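The forward pass of this weight-sharing network can be sketched in NumPy. This is an illustrative implementation, not part of the question: it assumes a logistic sigmoid for $\sigma$ (the question leaves $\sigma$ abstract) and represents the input as a $p \times n$ array whose rows are the groups $x^{(k)}$.

```python
import numpy as np

def sigma(z):
    # Logistic activation -- an assumed choice; the question leaves sigma abstract.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, w0, v, v0):
    """Forward pass for the weight-sharing network.

    x  : array of shape (p, n) -- the p input groups, one per row
    w  : shared hidden weight vector, shape (n,)
    w0 : shared hidden bias (scalar)
    v  : output weights, shape (p,)
    v0 : output bias (scalar)
    Returns (y, a, hidden_a) where hidden_a[k] = w^T x^(k) + w0,
    a = sum_i v_i * sigma(hidden_a[i]) + v0, and y = sigma(a).
    """
    hidden_a = x @ w + w0          # a_k = w^T x^(k) + w0, shape (p,)
    a = v @ sigma(hidden_a) + v0   # output pre-activation
    y = sigma(a)
    return y, a, hidden_a
```

Note that every hidden node reuses the same `(w, w0)`; only the input group differs, which is exactly the parameter sharing the question describes.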
(a) Show that the value of $\delta = \partial E(\theta)/\partial a$ is
$$\delta = \sigma'(a)\,\frac{\partial E(\theta)}{\partial y}.$$
[3 marks]
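The required result is one application of the chain rule, assuming $E(\theta)$ depends on $a$ only through the output $y = \sigma(a)$:

```latex
\delta = \frac{\partial E(\theta)}{\partial a}
       = \frac{\partial E(\theta)}{\partial y}\,\frac{\partial y}{\partial a}
       = \sigma'(a)\,\frac{\partial E(\theta)}{\partial y},
\qquad \text{since } y = \sigma(a).
```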
(b) Find expressions for the partial derivatives of E(θ) with respect to the
parameters of the single output node. [5 marks]
(c) Show that the partial derivatives $\delta_i = \partial E(\theta)/\partial a_i$ for the hidden nodes are
$$\delta_i = \delta\, v_i\, \sigma'(a_i).$$
[5 marks]
(d) Find expressions for the partial derivatives of E(θ) with respect to the
parameters of the hidden nodes. [7 marks]
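A numerical sanity check of the backpropagation formulas in parts (a) and (c) can be made with finite differences. The loss below is an assumption for illustration only: the question leaves $E(\theta)$ abstract, so we take squared error $E = \tfrac{1}{2}(y - t)^2$ with a hypothetical target $t$, and a logistic $\sigma$.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigma(z):
    s = sigma(z)
    return s * (1.0 - s)   # derivative of the logistic function

def error(y, t):
    return 0.5 * (y - t) ** 2   # assumed squared-error loss

rng = np.random.default_rng(0)
p, n = 3, 4
x = rng.normal(size=(p, n))            # p groups of n inputs
w, w0 = rng.normal(size=n), 0.1        # shared hidden weights and bias
v, v0 = rng.normal(size=p), -0.2       # output weights and bias
t = 0.7                                 # hypothetical target

# Forward pass.
ha = x @ w + w0                         # hidden pre-activations a_1..a_p
a = v @ sigma(ha) + v0                  # output pre-activation
y = sigma(a)

# Analytic derivatives from parts (a) and (c).
delta = dsigma(a) * (y - t)             # delta = sigma'(a) * dE/dy
delta_i = delta * v * dsigma(ha)        # delta_i = delta * v_i * sigma'(a_i)

# Central finite differences on each hidden pre-activation a_i.
eps = 1e-6
fd = np.empty(p)
for i in range(p):
    ha_plus, ha_minus = ha.copy(), ha.copy()
    ha_plus[i] += eps
    ha_minus[i] -= eps
    e_plus = error(sigma(v @ sigma(ha_plus) + v0), t)
    e_minus = error(sigma(v @ sigma(ha_minus) + v0), t)
    fd[i] = (e_plus - e_minus) / (2 * eps)

# The analytic and numerical gradients should agree to high precision.
print(np.max(np.abs(delta_i - fd)))
```

The same perturbation trick applied to the components of $w$, $w_0$, $v$, and $v_0$ checks the remaining derivatives asked for in parts (b) and (d).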