3 Neural Networks, Activation Functions, and Loss Functions

#NeuralNetwork #ReLU #PyTorch #Sigmoid #CrossEntropy #Xavier

1 Basic Neural Network Layers

import torch
import torch.nn as nn

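Function	Description	Example
`nn.Linear(in_features, out_features, bias)`	Fully connected linear layer. Takes the number of input and output features; the last dimension of the input must equal `in_features`. `bias` controls whether a bias term is added (default `True`).	`fc = nn.Linear(100, 50)` `x = torch.randn(32, 100)` `out = fc(x)`
`.weight` `.bias`	The trainable parameters inside `nn.Linear`. `.weight` has shape `(out_features, in_features)`; `.bias` has shape `(out_features,)`.
`nn.ReLU()`	ReLU activation, $\max(x, 0)$.
`nn.Sequential()`	Chains several layers into a single module.	`model = nn.Sequential(nn.Linear(4, 5), nn.ReLU(), nn.Linear(5, 3), nn.ReLU(), nn.Linear(3, 1))` `output = model(input_data)`
`nn.BatchNorm1d(num_features)`	Batch normalization (BN) layer. Requires the number of features.
`nn.Dropout(p)`	Dropout layer. `p` is the probability of zeroing each element (default 0.5).
Example: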

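model = nn.Sequential(
    nn.Linear(4, 5),
    nn.BatchNorm1d(5),
    nn.ReLU(),
    nn.Dropout(0.3),   # zeroes each activation of this layer with probability 0.3 (training only)
    nn.Linear(5, 3),
    nn.BatchNorm1d(3),
    nn.ReLU(),
    nn.Dropout(0.5),   # zeroes each activation of this layer with probability 0.5 (training only)
    nn.Linear(3, 1)
)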
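
A quick way to sanity-check a stack like this is to push a random batch through it. The sketch below assumes a batch of 8 samples with 4 features (both numbers are arbitrary, as is the seed). It also illustrates that `BatchNorm1d` and `Dropout` behave differently under `model.train()` and `model.eval()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

model = nn.Sequential(
    nn.Linear(4, 5),
    nn.BatchNorm1d(5),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(5, 3),
    nn.BatchNorm1d(3),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(3, 1),
)

x = torch.randn(8, 4)  # hypothetical batch: 8 samples, 4 features

# Parameter shapes of the first Linear layer:
# weight is (out_features, in_features), bias is (out_features,)
first = model[0]
weight_shape = tuple(first.weight.shape)
bias_shape = tuple(first.bias.shape)

model.train()  # Dropout is active, BatchNorm normalizes with batch statistics
train_out = model(x)

model.eval()   # Dropout is disabled, BatchNorm uses its running statistics
with torch.no_grad():
    eval_a = model(x)
    eval_b = model(x)

print(weight_shape, bias_shape, tuple(train_out.shape))
print(torch.equal(eval_a, eval_b))  # two eval-mode passes give the same result
```

Note that `BatchNorm1d` needs more than one sample per batch in training mode, so a batch size of 1 would raise an error there.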

2 Common Activation Layers

Function	Description
`nn.LeakyReLU(negative_slope)`	$\mathrm{LeakyReLU}(x) = x$ if $x > 0$, else $\alpha x$, where $\alpha$ (`negative_slope`) is a small positive number, typically between 0.01 and 0.2.
`nn.SiLU()`	$\mathrm{SiLU}(x) = \frac{x}{1 + e^{-x}}$
`nn.Sigmoid()`	$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$
`nn.Tanh()`	$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$; equivalently $\tanh(x) = 2\,\mathrm{Sigmoid}(2x) - 1$
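
The formulas in this table can be checked numerically. The following is a minimal sketch that compares each built-in layer against its closed form on an arbitrary grid of inputs; the value `alpha = 0.1` is just an illustrative choice.

```python
import torch
import torch.nn as nn

x = torch.linspace(-5.0, 5.0, steps=101)  # arbitrary test grid

# SiLU(x) = x / (1 + exp(-x)) = x * Sigmoid(x)
silu_ok = torch.allclose(nn.SiLU()(x), x * torch.sigmoid(x), atol=1e-6)

# tanh(x) = 2 * Sigmoid(2x) - 1
tanh_ok = torch.allclose(nn.Tanh()(x), 2 * torch.sigmoid(2 * x) - 1, atol=1e-6)

# LeakyReLU(x) = x if x > 0 else alpha * x
alpha = 0.1
leaky_ok = torch.allclose(nn.LeakyReLU(alpha)(x), torch.where(x > 0, x, alpha * x))

print(silu_ok, tanh_ok, leaky_ok)
```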

3 Common Loss Layers

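Function	Description	Example
`nn.BCEWithLogitsLoss(pos_weight)` Sigmoid + cross-entropy	Binary classification usually combines Sigmoid with cross-entropy. Use this function instead of composing the two by hand, because it is numerically more stable. `pos_weight` rescales the penalty on positive examples when labels are imbalanced; in multi-label classification it is a tensor with one entry per class^[1].	`pos_weight = torch.tensor([1.5, 3.0, 1.0])` `criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)` `loss = criterion(pred, target)`
`nn.CrossEntropyLoss(weight, ignore_index)` Softmax + cross-entropy	Multi-class classification usually combines Softmax with the negative log-likelihood (cross-entropy). `weight`: per-class weights, playing the same role as `pos_weight` above. `ignore_index`: a target value that is skipped when computing the loss.
`nn.L1Loss()`	$\mathrm{loss} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_{i} - y_{i}|$. Robust to outliers. The derivative is discontinuous at zero error, but this has little effect in practice.
`nn.MSELoss()`	$\mathrm{loss} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}$. Smooth everywhere, but sensitive to outliers.
`nn.SmoothL1Loss(beta)`	For the formula, see^[2].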

$$\mathrm{loss}_{i}=\left\{
\begin{aligned}
&\frac{0.5(\hat{y}_{i}-y_{i})^{2}}{\beta}, && |\hat{y}_{i}-y_{i}|<\beta,\\
&|\hat{y}_{i}-y_{i}|-0.5\beta, && \text{otherwise.}
\end{aligned}
\right.$$
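
The relationships described in this section can be verified on random data. The sketch below is illustrative only: the tensor shapes, the seed, `beta = 0.5`, and the class counts are all made-up values. It checks that the fused losses match their hand-composed versions, that `nn.SmoothL1Loss` matches the piecewise definition, and it applies the class-weight formula from footnote [1].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed

# (a) BCEWithLogitsLoss equals Sigmoid followed by BCELoss
logits = torch.randn(6, 3)
target = torch.randint(0, 2, (6, 3)).float()
bce_fused = nn.BCEWithLogitsLoss()(logits, target)
bce_manual = nn.BCELoss()(torch.sigmoid(logits), target)

# (b) CrossEntropyLoss equals LogSoftmax followed by NLLLoss; it expects raw logits
scores = torch.randn(6, 3)
labels = torch.randint(0, 3, (6,))
ce_fused = nn.CrossEntropyLoss()(scores, labels)
ce_manual = F.nll_loss(F.log_softmax(scores, dim=1), labels)

# (c) SmoothL1Loss against the piecewise definition, averaged over elements
beta = 0.5
pred, y = torch.randn(10), torch.randn(10)
diff = (pred - y).abs()
smooth_manual = torch.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).mean()
smooth_fused = nn.SmoothL1Loss(beta=beta)(pred, y)

# (d) Class weights: total samples / (number of classes * samples in that class)
counts = torch.tensor([80.0, 15.0, 5.0])  # made-up class counts
class_weight = counts.sum() / (len(counts) * counts)
weighted_loss = nn.CrossEntropyLoss(weight=class_weight)(scores, labels)

print(bce_fused.item(), bce_manual.item())
print(ce_fused.item(), ce_manual.item())
print(smooth_fused.item(), smooth_manual.item())
print(class_weight)
```

Rare classes receive the largest weights under this formula, so mistakes on them contribute more to the loss.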

4 Custom Neural Networks

4.1 Custom Layers

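PyTorch ships with layers such as `nn.Linear()`, `nn.Conv2d()`, and `nn.LSTM()`, but sometimes we want to define a layer of our own.
Every custom layer must subclass `nn.Module` and needs two methods: `__init__` and `forward`.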

class CustomLayer(nn.Module):
    def __init__(self, *args):
        super().__init__()
        # initialize parameters and submodules here

    def forward(self, x):
        # forward pass
        return x

For example, here is a hand-written implementation of `nn.Linear()`:

import math

import torch
import torch.nn as nn

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features

        # nn.Parameter registers the tensors as trainable parameters
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.empty(out_features))

        # torch.empty leaves the memory uninitialized, so initialize explicitly
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        nn.init.zeros_(self.bias)

    def forward(self, x):
        # y = x W^T + b, where x has shape (..., in_features)
        return x @ self.weight.t() + self.bias

`nn.Parameter` wraps a tensor as a trainable parameter and registers it with the module.
The weight is initialized with `nn.init.kaiming_uniform_`.
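
To see that the hand-written layer behaves like the built-in one, the sketch below copies its parameters into an `nn.Linear` and compares outputs. The class is repeated so the snippet runs on its own; the sizes (100 inputs, 50 outputs, batch of 32) are arbitrary choices.

```python
import math

import torch
import torch.nn as nn

class MyLinear(nn.Module):
    """Same layer as in the text, repeated so this sketch is self-contained."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.empty(out_features))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        nn.init.zeros_(self.bias)

    def forward(self, x):
        return x @ self.weight.t() + self.bias

mine = MyLinear(100, 50)
ref = nn.Linear(100, 50)

# nn.Parameter registers the tensors, so they appear in named_parameters()
param_shapes = {name: tuple(p.shape) for name, p in mine.named_parameters()}

# Copy the parameters across; both layers should then compute the same output
with torch.no_grad():
    ref.weight.copy_(mine.weight)
    ref.bias.copy_(mine.bias)

x = torch.randn(32, 100)
same = torch.allclose(mine(x), ref(x), atol=1e-5)

# After backward(), gradients are attached to the wrapped tensors
mine(x).sum().backward()
has_grad = mine.weight.grad is not None and mine.bias.grad is not None

print(param_shapes, same, has_grad)
```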

A layer without parameters looks like this:

class LeakyReLUEfficient(nn.Module):
    def __init__(self, negative_slope=0.01):
        super().__init__()
        self.negative_slope = negative_slope
        
    def forward(self, x):
        return torch.maximum(self.negative_slope * x, x)
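
As a rough check, the parameter-free layer can be compared with the built-in `nn.LeakyReLU`. The sketch repeats the class so it runs on its own; the slope 0.2 and the input grid are arbitrary. One caveat worth noting in a comment: the `torch.maximum` trick matches LeakyReLU only when the slope lies between 0 and 1.

```python
import torch
import torch.nn as nn

class LeakyReLUEfficient(nn.Module):
    """Parameter-free layer from the text, repeated so the sketch is self-contained."""
    def __init__(self, negative_slope=0.01):
        super().__init__()
        self.negative_slope = negative_slope

    def forward(self, x):
        # max(alpha * x, x) equals LeakyReLU only when 0 <= alpha <= 1
        return torch.maximum(self.negative_slope * x, x)

x = torch.linspace(-3.0, 3.0, steps=13)
custom = LeakyReLUEfficient(0.2)
builtin = nn.LeakyReLU(0.2)

matches = torch.allclose(custom(x), builtin(x))

# negative_slope is a plain Python float, not an nn.Parameter, so nothing is trainable
n_params = sum(p.numel() for p in custom.parameters())

print(matches, n_params)
```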

4.2 Initializing Model Parameters

Function	Description
`nn.init.zeros_(tensor)`	Fills the tensor with zeros. Rarely used for weights, although common for biases.
`nn.init.uniform_(tensor, a, b)` `nn.init.normal_(tensor, mean, std)`	Random initialization from a uniform or normal distribution
`nn.init.xavier_uniform_(tensor, gain)` `nn.init.xavier_normal_(tensor, gain)`	Xavier initialization
`nn.init.kaiming_uniform_(tensor, a, mode, nonlinearity)` `nn.init.kaiming_normal_(tensor, a, mode, nonlinearity)`	Kaiming initialization
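
One common way to use these functions is to write a small helper and pass it to `Module.apply`, which visits every submodule. The sketch below is one possible pattern, not the only one: the helper name `init_weights`, the layer sizes, and the choice of Kaiming for ReLU layers are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Hypothetical helper: Kaiming init for Linear weights, zeros for biases."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, mode="fan_in", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

torch.manual_seed(0)  # arbitrary seed
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))
model.apply(init_weights)  # apply() calls the helper on every submodule

# Kaiming normal with fan_in and ReLU targets std = sqrt(2 / fan_in)
kaiming_std = model[0].weight.std().item()
expected_std = (2.0 / 256) ** 0.5

# Xavier uniform draws from U(-b, b) with b = gain * sqrt(6 / (fan_in + fan_out))
gain = nn.init.calculate_gain("tanh")
w = torch.empty(128, 256)
nn.init.xavier_uniform_(w, gain=gain)
bound = gain * (6.0 / (256 + 128)) ** 0.5

print(kaiming_std, expected_std)
print(w.abs().max().item(), bound)
```

The functions ending in an underscore modify the tensor in place, which is why the helper returns nothing.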

4.3 Custom Models

Layers can be combined like building blocks. Below is one implementation of a residual block:

class ResidualBlock(nn.Module):
    def __init__(self, hidden_dim: int, dropout: float = 0.1):
        """
        Initialize the residual block.
        hidden_dim (int): feature dimension of the hidden layers
        dropout (float): dropout probability
        """
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim)
        )
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        out = self.net(x)
        out += residual
        return torch.relu(out)
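
Because the block keeps the feature dimension unchanged, several of them can be stacked between an input projection and an output head. The sketch below is one way to do that; `ResidualMLP` and its argument names are hypothetical, and the block is repeated in compact form so the snippet runs on its own.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block from the text, repeated so the sketch is self-contained."""
    def __init__(self, hidden_dim: int, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # skip connection: add the input back before the final activation
        return torch.relu(self.net(x) + x)

class ResidualMLP(nn.Module):
    """Hypothetical model: input projection, a few residual blocks, output head."""
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int, num_blocks: int = 2):
        super().__init__()
        self.stem = nn.Linear(in_dim, hidden_dim)
        self.blocks = nn.Sequential(*[ResidualBlock(hidden_dim) for _ in range(num_blocks)])
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.blocks(self.stem(x)))

model = ResidualMLP(in_dim=10, hidden_dim=32, out_dim=1)
x = torch.randn(16, 10)  # arbitrary batch: 16 samples, 10 features
out = model(x)
out.sum().backward()     # gradients propagate back to the stem layer

stem_has_grad = model.stem.weight.grad is not None
print(tuple(out.shape), stem_has_grad)
```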

A common formula for class weights: weight of a class = total number of samples / (number of classes * number of samples in that class) ↩︎
`nn.SmoothL1Loss` is defined piecewise: quadratic where $|\hat{y}_{i}-y_{i}|<\beta$ and linear elsewhere. ↩︎