[Deep Learning] Implementing the SGD (Stochastic Gradient Descent) Algorithm Step by Step

2025-08-31 09:54:02


What SGD Does

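SGD is an optimization algorithm: it automatically adjusts a model's parameters to improve the model's performance. In this article we implement SGD on the MNIST_SAMPLE dataset.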
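To make "automatically adjusts the parameters" concrete, here is a minimal, dependency-free sketch of the underlying idea on a one-parameter toy problem. The function names and the toy loss are assumptions made for illustration, not part of the original article, and the sketch uses plain (full-batch) gradient descent rather than the stochastic variant built below.

```python
def loss(w):
    """Toy loss (hypothetical) with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Analytic derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, lr=0.1, steps=100):
    """Repeatedly move w a small step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))
```

Starting from w = 0, each step shrinks the distance to the minimum, so the parameter ends up very close to 3. The rest of the article applies the same loop to a model with 785 parameters and a loss computed on mini-batches.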

The MNIST_SAMPLE Dataset

MNIST_SAMPLE contains only images of the digits 3 and 7.

from fastai.vision.all import *
from fastbook import *
import torch

# Download and unpack the dataset
path = untar_data(URLs.MNIST_SAMPLE)
matplotlib.rc('image', cmap='Greys')

# Training images: load, stack into one tensor, scale pixels to [0, 1]
threes = (path/'train'/'3').ls().sorted()
sevens = (path/'train'/'7').ls().sorted()
seven_tensors = [tensor(Image.open(o)) for o in sevens]
three_tensors = [tensor(Image.open(o)) for o in threes]
stacked_sevens = torch.stack(seven_tensors).float()/255
stacked_threes = torch.stack(three_tensors).float()/255

# Validation images, prepared the same way
valid_3_tens = torch.stack([tensor(Image.open(o)) for o in (path/'valid'/'3').ls()])
valid_3_tens = valid_3_tens.float() / 255
valid_7_tens = torch.stack([tensor(Image.open(o)) for o in (path/'valid'/'7').ls()])
valid_7_tens = valid_7_tens.float() / 255

# Flatten each 28x28 image into a row of 784 values; label a 3 as 1 and a 7 as 0
train_x = torch.cat([stacked_threes, stacked_sevens]).view(-1, 28*28)
train_y = tensor([1]*len(threes) + [0]*len(sevens)).unsqueeze(1)
dset = list(zip(train_x, train_y))
dl = DataLoader(dset, 128)

valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)
valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)
valid_dset = list(zip(valid_x, valid_y))
valid_dl = DataLoader(valid_dset, 128)

The Seven Steps of SGD

Step 1. Initialize the model parameters

Initialize the parameters randomly with torch.randn(), then call requires_grad_() so that PyTorch tracks gradients for them.

def init_params(size, std=1.0):
    return (torch.randn(size)*std).requires_grad_()

bias = init_params(1)
weights = init_params((28*28,1))

Step 2. Compute the predictions

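First define a simple network model with a single fully connected layer, then use it to compute predictions. Why is a bias needed? With y = w*x, an input of x = 0 always yields a prediction of 0, which hampers training; with y = w*x + b, the bias makes the model more flexible.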

We test the function on the first four training images. In PyTorch, @ is the matrix-multiplication operator.

def linear1(xb):
    return xb@weights + bias

batch = train_x[:4]
preds = linear1(batch)
preds
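The shapes involved, and the role of the bias, can be checked in isolation. The sketch below is an illustration under assumptions: it uses freshly generated random weights and an all-zero batch instead of the MNIST tensors, and the name `linear` is hypothetical. It shows that a (4, 784) batch times a (784, 1) weight matrix yields one prediction per image, and that for an all-zero input the prediction is exactly the bias.

```python
import torch

torch.manual_seed(0)                 # make the random values reproducible
weights = torch.randn(28 * 28, 1)    # one weight per pixel
bias = torch.randn(1)                # a single offset, broadcast over the batch


def linear(xb):
    """Single fully connected layer: matrix product plus bias."""
    return xb @ weights + bias


batch = torch.zeros(4, 28 * 28)      # four all-black "images"
preds = linear(batch)

print(preds.shape)                                 # one prediction per image
print(torch.allclose(preds, bias.expand(4, 1)))    # zero input -> output is the bias
```

Without the bias term, every row of `preds` would be 0 here no matter what the weights are, which is the limitation described above.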

Step 3. Compute the loss

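The loss measures how close the predictions are to the targets, using some loss function. We first define the loss function: it squashes each prediction into a value between 0 and 1; if the target is the digit 3 (label 1), the loss is the distance between the prediction and 1; if the target is the digit 7 (label 0), the prediction itself is its distance from 0. The sigmoid method maps any input value, positive or negative, to a value between 0 and 1.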

def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()

loss = mnist_loss(preds, train_y[:4])
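To see how this loss behaves numerically, the following dependency-free sketch re-implements the same formula with plain Python floats. The names `sigmoid` and `mnist_loss_py` and the sample inputs are assumptions chosen for illustration; the tensor version above is the one used for training.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mnist_loss_py(raw_preds, targets):
    """Mean distance between sigmoid(prediction) and its 0/1 target."""
    distances = []
    for z, t in zip(raw_preds, targets):
        p = sigmoid(z)
        # Target 1 (digit 3): distance to 1. Target 0 (digit 7): distance to 0.
        distances.append(1.0 - p if t == 1 else p)
    return sum(distances) / len(distances)


confident_right = mnist_loss_py([4.0, -4.0], [1, 0])
undecided = mnist_loss_py([0.0, 0.0], [1, 0])
confident_wrong = mnist_loss_py([-4.0, 4.0], [1, 0])
print(confident_right, undecided, confident_wrong)
```

Confident correct predictions give a loss near 0, undecided predictions (raw value 0) give exactly 0.5, and confident wrong predictions give a loss near 1, so lowering the loss pushes predictions toward their targets.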

Step 4. Compute the gradients

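The gradients tell us how to change the model parameters: they indicate the direction of the adjustment. We do not compute them by hand, because the deep learning library does it for us; we only need to call requires_grad_() when initializing the parameters (equivalent to setting requires_grad=True), and PyTorch will then store each parameter's gradient. The backward() method runs backpropagation: calling it computes the gradients of the parameters in every layer, and because the parameters have requires_grad=True, the new gradients are stored automatically in their .grad attribute. Computing the derivatives of a neural network in this way is called the backward pass.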

loss.backward()
weights.grad.shape, weights.grad.mean(), bias.grad
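The same mechanism is easier to verify on a function whose derivative is known by hand. The sketch below is an illustration with made-up values, not part of the original walkthrough: for y = x1² + x2² + x3², the derivative with respect to each xi is 2·xi, and backward() reproduces exactly that.

```python
import torch

# A tiny "parameter" tensor whose gradient PyTorch should track
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()   # scalar output: y = x1^2 + x2^2 + x3^2
y.backward()         # backpropagation fills x.grad with dy/dx

print(x.grad)        # tensor([2., 4., 6.]), i.e. 2 * x
```

In the article's setting, `loss` plays the role of `y`, and `weights` and `bias` play the role of `x`.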

Wrap this code in a calc_grad function so it can be called in a modular way:

def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()

calc_grad(batch, train_y[:4], linear1)
weights.grad.mean(), bias.grad

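Why do the two outputs differ? loss.backward() adds the newly computed gradients to whatever is already stored in .grad. The gradients from the first call were still there, so the second call accumulated on top of them and produced a different result. We therefore need to reset the parameters' gradients to 0 before the next gradient computation.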
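The accumulation can be reproduced in a few lines. This is a minimal sketch with a made-up two-element tensor (the names `first`, `second`, and `third` are for illustration only): the same computation is run twice without zeroing, then once more after zeroing.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()       # gradient of sum(3 * w) is 3 for each element

(w * 3).sum().backward()     # same computation, gradient not zeroed in between
second = w.grad.clone()      # the new gradient was added to the stored one

w.grad.zero_()               # reset in place
(w * 3).sum().backward()
third = w.grad.clone()       # back to the value from the first call

print(first, second, third)
```

The second result is twice the first, and zeroing restores the expected value, which is why the training loop resets gradients after every update.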

weights.grad.zero_()
bias.grad.zero_();

Step 5. Update the model parameters

The learning rate determines how large each parameter update is (it is also called the step size). It is usually set to a small value.
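How much the step size matters can be seen on a toy problem. The sketch below is an assumption-laden illustration, not taken from the original article: it minimizes f(w) = w² (gradient 2w) from w = 1 for 20 steps with three hypothetical learning rates.

```python
def run_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with a fixed learning rate; return the final w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w    # gradient of w**2 is 2*w
    return w


too_small = run_descent(lr=0.01)   # moves toward 0, but slowly
reasonable = run_descent(lr=0.4)   # gets very close to 0
too_large = run_descent(lr=1.1)    # overshoots further on every step

print(too_small, reasonable, too_large)
```

With a tiny learning rate the parameter barely moves in 20 steps, with a moderate one it converges quickly, and with one that is too large each update overshoots the minimum and the value grows instead of shrinking. Which learning rate counts as "too large" depends on the loss, so the thresholds here apply to this toy function only.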

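lr = 1.
# Update the parameters themselves, not their gradients;
# going through .data keeps the update out of gradient tracking
weights.data -= weights.grad * lr
bias.data -= bias.grad * lr

Step 6. Repeat Steps 2-5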

Here we split the whole training set into mini-batches and feed them to the network one batch at a time. Why do it this way?

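- Predicting on the whole training set at once takes too much time and too much memory.
- Training on one image at a time makes the gradients unstable and imprecise.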

So we train by iterating over subsets of the training set, that is, we split the dataset into mini-batches.
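What a data loader does when it splits a dataset into mini-batches can be sketched in a few lines of plain Python. The helper `iter_minibatches`, its `shuffle` and `seed` parameters, and the ten-item toy dataset are assumptions for illustration; this is not how fastai's DataLoader is implemented, and the article's own loader is created without shuffling.

```python
import random


def iter_minibatches(samples, batch_size, shuffle=True, seed=0):
    """Yield consecutive slices of (optionally shuffled) samples."""
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(indices)   # seeded for reproducibility
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]


data = list(range(10))                          # stand-in for 10 training items
batches = list(iter_minibatches(data, batch_size=4))
print([len(b) for b in batches])                # [4, 4, 2]
```

Every item appears in exactly one batch per pass, and the last batch is smaller when the dataset size is not a multiple of the batch size.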

Now we update the parameters using the entire training set.

Split the training set into many mini-batches, then train on them one after another.

def train_epoch(model, lr, params):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad*lr   # gradient step
            p.grad.zero_()        # reset before the next batch

params = weights, bias
for i in range(10):
    train_epoch(linear1, lr, params)

Step 7. Stop

This is the most basic form of SGD. The fastai library already provides it; we only need to pass opt_func=SGD when creating the Learner.
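The article stops after a fixed number of epochs. Another common way to decide when to stop is to watch the loss and halt once it no longer improves. The sketch below is one possible criterion under stated assumptions: the helper `train_until_converged`, its `tol` and `max_epochs` parameters, and the one-parameter toy problem are all hypothetical and not part of the original code.

```python
def train_until_converged(step_fn, max_epochs=100, tol=1e-4):
    """Call step_fn() once per epoch; stop when the loss improves by less than tol."""
    prev_loss = float("inf")
    loss = prev_loss
    for epoch in range(1, max_epochs + 1):
        loss = step_fn()
        if prev_loss - loss < tol:     # improvement too small: stop early
            return epoch, loss
        prev_loss = loss
    return max_epochs, loss            # hit the epoch budget


# Toy "epoch": one gradient step on (w - 3)**2, returning the new loss
state = {"w": 0.0}


def one_epoch(lr=0.1):
    w = state["w"]
    w -= lr * 2.0 * (w - 3.0)
    state["w"] = w
    return (w - 3.0) ** 2


epochs_run, final_loss = train_until_converged(one_epoch)
print(epochs_run, final_loss)
```

On this toy problem the loop ends well before the 100-epoch budget. In practice the monitored quantity is usually the validation loss or a validation metric rather than the training loss.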

Training linear_model on the MNIST_SAMPLE Dataset

Wrapping the Code for the Seven Steps in a Class

class SGD_Optim:
    def __init__(self,params,lr): self.params,self.lr = list(params),lr

    def step(self, *args, **kwargs):
        for p in self.params: p.data -= p.grad.data * self.lr

    def zero_grad(self, *args, **kwargs):
        for p in self.params: p.grad = None

# The model must exist before its parameters are handed to the optimizer
linear_model = nn.Linear(28*28,1)
opt = SGD_Optim(linear_model.parameters(), lr)

Metrics

The loss exists mainly to drive training; a metric, introduced here, gives us an intuitive read on the model's performance on the validation set.

Validation accuracy

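Here we use the fraction of correct predictions (the accuracy) as the metric. Validation works like training: we split the validation set into mini-batches and let the model compute the predictions preds for each one. After the sigmoid, a value greater than 0.5 means the digit 3, otherwise the digit 7. The mean of the correct predictions is the validation accuracy.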
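The thresholding rule can be traced by hand on a few made-up values. This dependency-free sketch uses hypothetical names (`sigmoid`, `batch_accuracy_py`) and invented predictions; it mirrors the rule described above rather than replacing the tensor version.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def batch_accuracy_py(raw_preds, targets):
    """Fraction of predictions whose thresholded value matches the 0/1 target."""
    correct = [(sigmoid(z) > 0.5) == bool(t) for z, t in zip(raw_preds, targets)]
    return sum(correct) / len(correct)


# Raw model outputs and labels (1 = digit 3, 0 = digit 7)
acc = batch_accuracy_py([2.0, -1.0, 0.3, -0.7], [1, 0, 0, 1])
print(acc)   # 0.5: the first two are right, the last two are wrong
```

Because sigmoid(z) > 0.5 exactly when z > 0, the threshold amounts to checking the sign of the raw model output.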

def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb
    return correct.float().mean()

Validation function

def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]
    # combine all the accuracies in this list into a single 1-dimensional tensor
    return round(torch.stack(accs).mean().item(), 4)

Training linear_model

Typically, one round of training consists of a training epoch (train_epoch) followed by a validation epoch (validate_epoch). Here we train linear_model for 20 epochs.

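# Create the model and its optimizer so training starts from fresh parameters
linear_model = nn.Linear(28*28,1)
opt = SGD_Optim(linear_model.parameters(), lr)

# One-argument train_epoch that delegates the update to the optimizer
def train_epoch(model):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        opt.step()
        opt.zero_grad()

def train_model(model, epochs):
    for i in range(epochs):
        train_epoch(model)
        print(validate_epoch(model), end=' ')

train_model(linear_model, 20)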

Training the Model with learner.fit

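In fastai, the built-in learner.fit method already does what our train_model function does. To use it we need a Learner, and a Learner requires a DataLoaders object, so we first create the DataLoaders, then initialize a Learner, and finally call fit.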

dls = DataLoaders(dl, valid_dl)
learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,
                loss_func=mnist_loss, metrics=batch_accuracy)
learn.fit(10, lr=lr)

fastai already implements SGD, so passing opt_func=SGD is all it takes to use the SGD optimization algorithm.

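The accuracy at epoch 10 is 0.967615, about the same as at epoch 10 above. In other words, fastai's built-in classes and functions let us write less code, so getting a model trained goes somewhat faster, but they do not bring a significant gain in accuracy.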
