I am working on a video animation project with PyTorch. My dataset consists of 3904x60 MFCC audio features (input) and the corresponding 3904x3 video features (output). The goal is to train a neural network so that, given an unseen audio feature vector, the model maps it to its corresponding video feature vector. In other words, the network performs a 60-to-3 feature mapping. I built the network following this tutorial:
class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        # (batch, 1, 60) -> (batch, 32, 30) after pooling
        self.layer1 = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2))
        # (batch, 32, 30) -> (batch, 64, 15) after pooling
        self.layer2 = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(15 * 64, 1000)  # 15 timesteps * 64 channels after two pools
        self.fc2 = nn.Linear(1000, 3)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)   # flatten to (batch, 15 * 64)
        out = self.drop_out(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out
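For reference, a quick shape check with a random dummy batch (just a sketch, not my real data; the batch size of 4 is arbitrary) confirms the flattened size of 15 * 64 and a (batch, 3) output:

import torch

model = ConvNet()
dummy = torch.randn(4, 1, 60)   # (batch, 1 channel, 60 MFCC coefficients)
with torch.no_grad():
    out = model(dummy)          # layer1 halves 60 -> 30, layer2 halves 30 -> 15
print(out.shape)                # torch.Size([4, 3])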
My training code looks like this:
model = ConvNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for i, (a, v) in enumerate(train_loader):
        # Run the forward pass
        a = a.float()
        v = v.long()
        outputs = model(a.view(a.size(0), 1, a.size(1)))
        loss = criterion(outputs, v)
        loss_list.append(loss.item())

        # Backprop and perform Adam optimisation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track the accuracy
        total = v.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct = (predicted == v).sum().item()
        acc_list.append(correct / total)

        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item(),
                          (correct / total) * 100))
But during training I get this error:
---
/Users/soumith/miniconda2/conda-bld/pytorch_1532623076075/work/aten/src/THNN/generic/ClassNLLCriterion.c:21: multi-target not supported
I set the batch size to 4, so in each iteration a and v should be a 4 x 60 tensor and a 4 x 3 tensor, respectively. How can I fix this?
The problem is probably in the target you are passing to nn.CrossEntropyLoss(). v is, as you say, a 4 x 3 tensor, and that does not look right here. In loss = criterion(outputs, v), the loss function expects v to be a mini-batch tensor of class indices, i.e. one value per sample describing a class from 0 to C-1. See the "Shape" section at https://pytorch.org/docs/stable/nn.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss:

Target: (N), where each value satisfies 0 ≤ targets[i] ≤ C−1
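As a quick illustration of that shape contract (a standalone sketch with made-up tensors, not your data): CrossEntropyLoss expects raw scores of shape (N, C) and a target of shape (N) holding class indices, so a 4 x 3 integer target like your v triggers exactly this kind of error.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
outputs = torch.randn(4, 3)                 # (N, C): raw scores for 4 samples, 3 classes

targets_ok = torch.tensor([0, 2, 1, 0])     # (N,): one class index per sample, in [0, C-1]
print(criterion(outputs, targets_ok))       # fine

targets_bad = torch.randint(0, 3, (4, 3))   # (N, 3): one value per output dimension
# criterion(outputs, targets_bad)           # raises the multi-target / target-size error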