Dropout正則化

一、Dropout正則化項

在深度學習領域中，眾所周知所有的神經網路都可能會發生過擬合的現象，而Dropout就是一種解決過擬合問題的正則化方法。Dropout就是在每層神經網路中隨機去掉一部分的神經元，在訓練過程中，被去掉的神經元不參與前向傳播和後向傳播，即該神經元對神經網路的權重更新沒有貢獻。

Dropout正則化項的數學表示為：

    W^{*} = \mathop{\arg \min}_{W} \frac{1}{N} \sum_{i=1}^{N} L(y_{i}, f(x_{i}, W)) + \frac{\lambda}{2} \lVert W \rVert_2^2 + \frac{\mu}{2} L(D(W,Z))

其中，$\lambda$和$\mu$都是正則化係數，$L$是損失函數，$W$是模型的參數，$D$是dropout函數，$Z$是包含dropout概率的參數。

二、Dropout正則化的理解

在深度學習里，每個神經元都是由上一層的所有神經元線性組合而來的，每個神經元都承擔了一部分的參數權重。如果某個神經元的參數權重過大，就容易產生過擬合的現象。而Dropout就是通過隨機去掉神經元，讓模型中的神經元不會過度擬合，增加模型的魯棒性，並通過取平均等方式減少神經元產生的參數權重衝擊。

三、Dropout正則化方法

Dropout正則化的方法是在神經網路中隨機捨棄一部分神經元，使得該模型在訓練過程中並不會對任何一個神經元過度依賴，從而緩解過擬合的情況。在計算前向傳遞和後向傳遞時，我們只考慮留下的神經元，在計算梯度時，我們不考慮捨棄的神經元。在測試過程中，所有的神經元都被考慮在內，只不過每個神經元的輸出都要乘以一定的因子，這個因子是訓練階段神經元被留下的概率。

四、Dropout正則化代碼

下面是使用PyTorch實現Dropout正則化的代碼：

    import torch.nn as nn
    
    class Net(nn.Module):
        def __init__(self):
            super(Net,self).__init__()
            self.fc1 = nn.Linear(10, 5)
            self.dropout1 = nn.Dropout(p=0.5)
            self.fc2 = nn.Linear(5, 2)
            self.relu = nn.ReLU()

        def forward(self, x):
            x = self.fc1(x)
            x = self.dropout1(x)
            x = self.relu(x)
            x = self.fc2(x)
            return x

在以上代碼中，我們定義了一個Net類，在初始化函數中定義了三層神經網路結構，在第一層後加入了一個概率為0.5的dropout正則化方法 dropout1。在前向傳播的時候，dropout正則化會計算每個神經元的dropout因子，並且讓被捨棄的神經元在反向傳播中不參與梯度計算。

五、Dropout正則化LSTM

LSTM通常用於處理序列數據，並且由於LSTM的神經元之間存在一定的依賴關係，因此對LSTM神經網路進行dropout正則化更容易發生過擬合現象。解決這個問題的辦法就是使用recurrent_dropout參數。recurrent_dropout可以理解為在每個時間步t，在第i個單元應用dropout。使用recurrent_dropout參數的LSTM實現如下：

    model = Sequential()
    model.add(LSTM(128, input_shape=(timesteps, data_dim), return_sequences=True))
    model.add(Dropout(0.4, seed=1001))
    model.add(LSTM(64, input_shape=(timesteps, data_dim), return_sequences=True, dropout=0.25, recurrent_dropout=0.25))
    model.add(Dropout(0.4, seed=1001))
    model.add(LSTM(32, input_shape=(timesteps, data_dim), return_sequences=False, dropout=0.25, recurrent_dropout=0.25))

六、Dropout正則化加在哪

在神經網路中，Dropout正則化可以加到全連接層中，也可以加到卷積層中。我們可以結合實際情況來進行針對性的調整。

七、Dropout正則化的原理

Dropout正則化的原理非常簡單，就是在神經網路的訓練階段，給定一個dropout概率，對每一層的神經元進行隨機匹配的掉落。被掉落的神經元對損失函數的梯度不產生貢獻，因此模型對每個特徵的依賴會分散到不同的神經元上。這樣可以避免過度擬合的問題，增強模型的泛化能力。

八、Dropout正則化有什麼用

Dropout正則化作為一種防止過擬合的方法，其具有以下的優點：

簡單易實現：Dropout正則化本質上是一種在訓練過程中將神經元關閉的技術，因此不需要額外的計算資源。
提高模型的表現：Dropout正則化可以避免過擬合的現象，提高模型的泛化能力。
有效緩解特徵之間的耦合：Dropout正則化可以有效地讓模型中的神經元不會過度擬合，增加模型的魯棒性，並通過取平均等方式減少神經元產生的參數權重衝擊。

九、Dropout正則化Matlab代碼

以下是使用Matlab代碼實現Dropout正則化的例子：

    net.layers = {
        struct('type', 'conv', 'weights', {{(randn(5,5,3,64)-0.5)*sqrt(2/size(imdb.images.data,1)) zeros(1, 64, 'single')}}, 'stride', 1, 'pad', 2)
        struct('type', 'relu')
        struct('type', 'pool', 'method', 'max', 'pool', [3 3], 'stride', 2, 'pad', 1)                                          
        struct('type', 'conv', 'weights', {{(randn(5,5,64,64)-0.5)*sqrt(2/25/64) zeros(1, 64, 'single')}}, 'stride', 1, 'pad', 2)   
        struct('type', 'relu')
        struct('type', 'pool', 'method', 'max', 'pool', [3 3], 'stride', 2, 'pad', 1)                                                                                                                                             
        struct('type', 'conv', 'weights', {{(randn(5,5,64,128)-0.5)*sqrt(2/25/64) zeros(1, 128, 'single')}}, 'stride', 1, 'pad', 2)
        struct('type', 'relu')
        struct('type', 'conv', 'weights', {{(randn(5,5,128,128)-0.5)*sqrt(2/25/128) zeros(1, 128, 'single')}}, 'stride', 1, 'pad', 2)
        struct('type', 'relu')
        struct('type', 'conv', 'weights', {{(randn(5,5,128,128)-0.5)*sqrt(2/25/128) zeros(1, 128, 'single')}}, 'stride', 1, 'pad', 2)
        struct('type', 'relu')
        struct('type', 'pool', 'method', 'max', 'pool', [3 3], 'stride', 2, 'pad', 1)   

        struct('type', 'dropout', 'fraction', 0.5)                  

        struct('type', 'conv', 'weights', {{(randn(4,4,128,256)-0.5)*sqrt(2/25/128) zeros(1, 256, 'single')}}, 'stride', 1, 'pad', 0)
        struct('type', 'relu')
        struct('type', 'conv', 'weights', {{(randn(4,4,256,256)-0.5)*sqrt(2/25/256) zeros(1, 256, 'single')}}, 'stride', 1, 'pad', 0)
        struct('type', 'relu')
        struct('type', 'conv', 'weights', {{(randn(4,4,256,256)-0.5)*sqrt(2/25/256) zeros(1, 256, 'single')}}, 'stride', 1, 'pad', 0)
        struct('type', 'relu')
        struct('type', 'pool', 'method', 'max', 'pool', [3 3], 'stride', 2, 'pad', 0)                                       

        struct('type', 'dropout', 'fraction', 0.5)                  

        struct('type', 'conv', 'weights', {{(randn(4,4,256,512)-0.5)*sqrt(2/25/256) zeros(1, 512, 'single')}}, 'stride', 1, 'pad', 0)
        struct('type', 'relu')
        struct('type', 'conv', 'weights', {{(randn(4,4,512,512)-0.5)*sqrt(2/25/512) zeros(1, 512, 'single')}}, 'stride', 1, 'pad', 0)
        struct('type', 'relu')
        struct('type', 'pool', 'method', 'max', 'pool', [3 3], 'stride', 2, 'pad', 0)            

        struct('type', 'dropout', 'fraction', 0.5)

        struct('type', 'conv', 'weights', {{(randn(4,4,512,512)-0.5)*sqrt(2/25/512) zeros(1, 512, 'single')}}, 'stride', 1, 'pad', 0)
        struct('type', 'relu')
        struct('type', 'conv', 'weights', {{(randn(1,1,512,10)-0.5)*sqrt(2/512) zeros(1, 10, 'single')}}, 'stride', 1, 'pad', 0)       
        struct('type', 'softmaxloss')                         
    };

以上的Matlab代碼中定義了一個卷積神經網路，其中有兩個dropout層，fraction參數表示Dropout的比例。

原創文章，作者：小藍，如若轉載，請註明出處：https://www.506064.com/zh-tw/n/239320.html