Why does a copy of a network's weights in PyTorch get updated automatically after backpropagation?

Question:

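I wrote the following code as a test because my original network uses a ModuleDict, and depending on the index I feed it, only part of that network is sliced out and trained.

I wanted to make sure that only the sliced layers update their weights, so I wrote some test code to double check. Well, I am getting some weird results. Say my model has 2 layers, where layer 1 is an FC layer and layer 2 is a Conv2d. If I slice the network and use only layer 2, I would expect layer 1's weights to stay unchanged because they are unused, and layer 2's weights to be updated after one training step.

So my plan was to use a for loop to grab all the weights from the network before training, and then do the same after one optimizer.step(). Both times I store the weights in two completely separate Python lists so that I can compare them later. For some reason, the two lists are exactly the same when I compare them with torch.equal(). I thought this might be because there is still some kind of hidden link in memory, so I tried calling .detach() on the weights as I grab them in the loop, but the result is still the same. Layer 2's weights should differ in this case, because the first list should contain the weights from the network before training.

Note that in the code below I am actually using layer1 and ignoring layer2.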

Full code:

import torch
import torch.nn as nn
import torch.optim as optim

class mymodel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 5)
        self.layer2 = nn.Conv2d(1, 5, 4, 2, 1)
        self.act = nn.Sigmoid()

    def forward(self, x):
        # only layer1 and act are used; layer2 is ignored, so only layer1's and act's weights should be updated
        x = self.layer1(x)
        x = self.act(x)
        return x

model = mymodel()

weights = []

for param in model.parameters(): # loop the weights in the model before updating and store them
    print(param.size())
    weights.append(param)

criterion = nn.BCELoss() # criterion and optimizer setup
optimizer = optim.Adam(model.parameters(), lr = 0.001)

foo = torch.randn(3, 10) #fake input
target = torch.rand(3, 5) # fake target in [0, 1], as BCELoss expects

result = model(foo) #predictions and comparison and backprop
loss = criterion(result, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()


weights_after_backprop = [] # weights after backprop
for param in model.parameters():
    weights_after_backprop.append(param) # only layer1's weight should update, layer2 is not used

for i in zip(weights, weights_after_backprop):
    print(torch.equal(i[0], i[1]))

# Prints True for every pair, although the entries for "layer1" and "act" should differ.
# I also tried calling param.detach() in the loop, but I got the same result.

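Answer:

You have to clone the parameters; otherwise you only copy a reference. model.parameters() yields the parameter tensors themselves, and optimizer.step() updates them in place, so both lists end up pointing at the same, already updated tensors. Calling .detach() does not help either, because a detached tensor still shares its storage with the original.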
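The difference between a reference, a detached view, and a clone can be seen in isolation. The following is only a minimal sketch: the single nn.Linear layer and the names alias, detached, and snapshot are illustrative assumptions, and the in-place add_ stands in for the kind of in-place update that optimizer.step() performs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(10, 5)  # hypothetical stand-in for one layer of the model

alias = layer.weight                      # the very same Parameter object
detached = layer.weight.detach()          # new tensor object, but shared storage
snapshot = layer.weight.detach().clone()  # independent copy of the values

# In-place update of the parameter, similar in spirit to optimizer.step()
with torch.no_grad():
    layer.weight.add_(1.0)

print(alias is layer.weight)                # True: same object
print(torch.equal(detached, layer.weight))  # True: shared storage sees the update
print(torch.equal(snapshot, layer.weight))  # False: the clone kept the old values
```

Only the cloned tensor keeps the values from before the update, which is why the comparison in the question always printed True.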

weights = []

for param in model.parameters():
    weights.append(param.clone())

criterion = nn.BCELoss() # criterion and optimizer setup
optimizer = optim.Adam(model.parameters(), lr=0.001)

foo = torch.randn(3, 10) # fake input
target = torch.rand(3, 5) # fake target in [0, 1], as BCELoss expects

result = model(foo) # predictions and comparison and backprop
loss = criterion(result, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()


weights_after_backprop = [] # weights after backprop
for param in model.parameters():
    weights_after_backprop.append(param.clone()) # only layer1's weight should update, layer2 is not used

for i in zip(weights, weights_after_backprop):
    print(torch.equal(i[0], i[1]))

This gives:

False
False
True
True
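As a variation on the same check, the snapshot can be taken by parameter name, which makes it easier to see which layer changed. This is a hedged sketch rather than part of the original answer: the class name MyModel, the use of copy.deepcopy on state_dict(), and the changed dictionary are assumptions made for illustration, and the target is drawn with torch.rand so that it lies in [0, 1].

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim


class MyModel(nn.Module):
    """Same layout as the model in the question; layer2 is never used."""

    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 5)
        self.layer2 = nn.Conv2d(1, 5, 4, 2, 1)
        self.act = nn.Sigmoid()

    def forward(self, x):
        return self.act(self.layer1(x))


torch.manual_seed(0)
model = MyModel()

# Deep copy so the snapshot does not share storage with the live parameters
before = copy.deepcopy(model.state_dict())

criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

foo = torch.randn(3, 10)   # fake input
target = torch.rand(3, 5)  # fake target in [0, 1]

loss = criterion(model(foo), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Compare each named parameter against its pre-training snapshot
changed = {
    name: not torch.equal(before[name], param.detach())
    for name, param in model.named_parameters()
}
for name, flag in changed.items():
    print(name, "changed" if flag else "unchanged")

# The unused layer never received a gradient, so the optimizer skipped it
print(model.layer2.weight.grad is None)
```

Under these assumptions, the layer1 entries are reported as changed and the layer2 entries as unchanged, matching the False, False, True, True pattern above.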
