09-多分类问题

image-20210303165309014

数据集

手写数字集:

image-20210303171439663

一共有10个类别。

softmax layer

设$Z^l\in\mathbb{R}^K$是第$l$层的线性输出,则这个多分类函数为:

P(y=i)=eZij=0K1eZj,i{0,,K1}P(y=i)=\frac{e^{Z_i}}{\sum\limits^{K-1}_{j=0}e^{Z_j}},i\in \{0,\cdots,K-1\}

为了满足的条件$P(y=i)\ge0$,且$\sum\limits_{i=0}^{N}P(y=i)=1$,其中$N$为类别总数。

损失函数 - Cross Entropy

Loss(y,y^)=ylogy^Loss(y,\hat y) = -y\log{\hat y}
image-20210303175145206
y = np.array([1.,0.,0.])
z = np.array([0.2,0.1,-0.1])
y_pred = np.exp(z) / np.exp(z).sum()
loss = (-y*np.log(y_pred)).sum()
print(loss)

通过pytorch进行调用torch.nn.CrossEntropyLoss()

y = torch.LongTensor([0])
z = torch.Tensor([[0.2,0.1,-0.1]])
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(z,y)
print(loss)

使用交叉熵,定义$y$需要使用长整型变量,其中的$0$表示第$0$个分类。

举个实例

import torch
criterion = torch.nn.CrossEntropyLoss()
Y = torch.LongTensor([2,0,1])

Y_pred1 = torch.Tensor([ 
    [0.1,0.2,0.9],# 2
    [1.1,0.1,0.2],# 0
    [0.2,2.1,0.1],# 1
])
Y_pred2 = torch.Tensor([
    [0.8,0.2,0.3],
    [0.2,0.3,0.5],
    [0.2,0.2,0.5]
])

l1 = criterion(Y_pred1,Y)
l2 = criterion(Y_pred2,Y)
print("Batch Loss1 = {},Batch Loss2 = {}".format(l1.data,l2.data))

CrossEntropyLoss VS NLLLoss

CrossEntropyLoss <==> LogSoftmax + NLLLoss

实战

导入相应包

import torch
from torchvision import transforms # 对图像进行处理
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F # 使用激活函数
import torch.optim as optim

预处理数据

batch_size = 64

# 用于处理图像
transform = transforms.Compose([
    transforms.ToTensor(), 
    transforms.Normalize((0.1307,),(0.3081,))
])

# 导入训练数据
train_dataset = datasets.MNIST(root='./数据集/mnist/',
							  train=True,
							  download=True,
                              transform = transform)
# 打乱顺序
train_loader = DataLoader(train_dataset,
                         shuffle=True,
                         batch_size=batch_size)

# 导入测试数据
test_dataset = datasets.MNIST(root='./数据集/mnist/',
							  train=False,
							  download=True,
                              transform = transform)
# 打乱顺序
test_loader = DataLoader(test_dataset,
                        shuffle=True,
                        batch_size=batch_size)

通过transforms进行处理图像,主要是改变PIL ImageTensor,从单通道变成多通道,使用transforms.ToTensor()进行实现。

image-20210304101427030

transforms.Normalize((0.1307,),(0.3081,)),其中第一个参数表示均值,第二个参数表示方差,这个函数用途是进行归一化,即映射到0/1分布。

设计模型

image-20210304102023388
class Net(torch.nn.Module):
    
    def __init__(self,):
        super(Net,self).__init__()
        self.l1 = torch.nn.Linear(784,512)
        self.l2 = torch.nn.Linear(512,256)
        self.l3 = torch.nn.Linear(256,128)
        self.l4 = torch.nn.Linear(128,64)
        self.l5 = torch.nn.Linear(64,10)
        
    def forward(self,x):
        x = x.view(-1,784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        x = self.l5(x)
        return x
    
model = Net()

构造损失函数和优化器

criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(),lr=0.01,momentum=0.5)

训练和测试

  • 训练

def train(epoch):
    running_loss = 0.0
    for batch_idx,data in enumerate(train_loader,0):
        inputs,target = data
        
        optimizer.zero_grad()
        
        # forward + backward + update
        outputs = model(inputs)
        
        loss = criterion(outputs,target)
        loss.backward()
        
        optimizer.step()
        
        running_loss += loss.item()
        
        if batch_idx % 300 == 299:
            print('[{:d},{:5d}] loss:{:.3f}'.format(
            		epoch+1,batch_idx+1,running_loss/300))
            running_loss = 0.0
  • 测试

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images,labels = data
            outputs = model(images)
            _,predicted = torch.max(outputs.data,dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
    print("Accuracy on test set:{:.2%}".format(correct/total))
  • 运行

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()

特征提取

  • 傅里叶变换

  • wavelet

  • CNN:自动提取

作业

kaggle多分类问题:https://www.kaggle.com/c/otto-group-product-classification-challenge/data

最后更新于

这有帮助吗?