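Understanding and Using Grad-CAM in Depth

Deep learning algorithms have achieved great success in many fields, most notably in computer vision. Deep convolutional neural networks (CNNs) are powerful at feature extraction and pattern recognition, but their black-box nature has become a widespread problem: it is hard to understand why a model produces a particular result and which parts of an image it attends to.

To address this, researchers have proposed many techniques for explaining how convolutional neural networks work, one of which is Grad-CAM (Gradient-weighted Class Activation Mapping). Grad-CAM is a visualization method that makes a CNN's output interpretable: it highlights the image regions the network relies on and shows how those regions contribute to a classification or regression result.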

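I. How Grad-CAM Works

Understanding the basic principle of Grad-CAM is important. Its core idea is to find a spatial importance map that reflects how each location in the image contributes to the network's output score. Concretely, Grad-CAM back-propagates the gradient of the target class score to a convolutional layer, averages the gradient of each channel over the spatial dimensions to obtain one weight per channel, and then takes the weighted sum of that layer's feature maps, followed by a ReLU. The result is an importance score for every spatial position that is tied to the output score and reflects how much each image region matters.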
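
The weighting step described above can be checked by hand on a tiny example. The following is a minimal NumPy sketch, not the article's implementation: grad_cam_map is a hypothetical helper name, and the activations and gradients are made-up numbers standing in for what a real network would produce.

```python
import numpy as np


def grad_cam_map(activations, gradients):
    """Combine feature maps and gradients of shape (C, H, W) into an (H, W) map."""
    # Channel weights: global average of the gradients over the spatial axes
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over channels, then ReLU
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Scale to [0, 1]; an all-zero map is returned unchanged
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input (made-up numbers): 2 channels on a 2x2 grid
activations = np.array([[[1.0, 2.0], [0.0, 1.0]],
                        [[0.0, 1.0], [3.0, 1.0]]])
gradients = np.array([[[0.5, 0.5], [0.5, 0.5]],        # channel weight +0.5
                      [[-1.0, -1.0], [-1.0, -1.0]]])   # channel weight -1.0
cam = grad_cam_map(activations, gradients)
print(cam.tolist())  # [[1.0, 0.0], [0.0, 0.0]]
```

Only the top-left cell survives: everywhere else the negatively weighted second channel cancels or outweighs the first, and the ReLU clips the result to zero.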

The core Grad-CAM code is as follows:

import cv2
import numpy as np
import torch


class GradCAM:
    def __init__(self, model, candidate_layers):
        self.model = model
        # candidate_layers: list of names from model.named_modules(), e.g. ['layer4']
        self.extractor = ModelOutputs(model, candidate_layers)

    def forward(self, input):
        return self.model(input)

    def __call__(self, input, index=None):
        # Forward pass; the hooks record the feature maps of the chosen layers
        features, output = self.extractor(input)

        if index is None:
            # Default to the highest-scoring class of the first sample
            index = int(output.argmax(dim=1)[0])

        # One-hot mask so that only the target class score is back-propagated
        one_hot = torch.zeros_like(output)
        one_hot[0, index] = 1
        score = torch.sum(one_hot * output)

        self.model.zero_grad()
        score.backward(retain_graph=True)

        # Gradients and activations of the last chosen layer, shape (1, C, H, W)
        grads_val = self.extractor.get_gradients()[-1].detach().cpu().numpy()
        target = features[-1].detach().cpu().numpy()[0]
        # Channel weights: average of the gradients over height and width
        weights = np.mean(grads_val, axis=(2, 3))[0, :]
        cam = np.sum(target * weights[:, None, None], axis=0)
        cam = np.maximum(cam, 0)  # ReLU keeps only positive evidence
        cam = cv2.resize(cam, (input.shape[3], input.shape[2]))
        cam = cam - np.min(cam)
        cam = cam / (np.max(cam) + 1e-8)  # guard against an all-zero map
        return cam

ModelOutputs is a wrapper class that lets us obtain the feature maps of the chosen convolutional layers together with the model's final output. Its code is:

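class ModelOutputs:
    def __init__(self, model, candidate_layers):
        self.model = model
        self.candidate_layers = list(candidate_layers)
        self.gradients = dict()
        self.activation_maps = dict()

        for name, module in self.model.named_modules():
            if name in self.candidate_layers:
                # register_backward_hook is deprecated; use the full backward hook
                module.register_full_backward_hook(self._get_gradients(name))
                module.register_forward_hook(self._get_activation(name))

    def _get_gradients(self, name):
        def hook(module, grad_input, grad_output):
            # Gradient of the back-propagated score w.r.t. this layer's output
            self.gradients[name] = grad_output[0].detach()
        return hook

    def _get_activation(self, name):
        def hook(module, input, output):
            self.activation_maps[name] = output.detach()
        return hook

    def get_gradients(self):
        # Call after backward(); same order as candidate_layers
        return [self.gradients[name] for name in self.candidate_layers]

    def __call__(self, x):
        # Run the whole model once; the forward hooks fill activation_maps
        self.gradients.clear()
        self.activation_maps.clear()
        output = self.model(x)
        features = [self.activation_maps[name] for name in self.candidate_layers]
        return features, output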

The _get_gradients hook in this class captures the gradients. This is the groundwork we need before using Grad-CAM for visualization.
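
To see what the two hooks actually capture, the mechanism can be tried in isolation. This is a minimal sketch under stated assumptions: the small nn.Sequential network and the captured dictionary are made up for illustration, and only the standard PyTorch calls register_forward_hook and register_full_backward_hook are used.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy CNN, used only to show the hook mechanics
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 5),
).eval()

captured = {}


def save_activation(module, inputs, output):
    # Feature maps produced by the hooked layer during the forward pass
    captured["activation"] = output.detach()


def save_gradient(module, grad_input, grad_output):
    # Gradient of the back-propagated score w.r.t. the hooked layer's output
    captured["gradient"] = grad_output[0].detach()


target = model[2]  # the second convolution
handles = [
    target.register_forward_hook(save_activation),
    target.register_full_backward_hook(save_gradient),
]

x = torch.randn(1, 3, 16, 16)
scores = model(x)
top = int(scores.argmax(dim=1)[0])
scores[0, top].backward()  # back-propagate only the top class score

for handle in handles:
    handle.remove()  # detach the hooks once the tensors are captured

print(captured["activation"].shape, captured["gradient"].shape)
# torch.Size([1, 8, 16, 16]) torch.Size([1, 8, 16, 16])
```

The activation and its gradient have the same shape, which is what allows Grad-CAM to average the gradient per channel and use it to weight the corresponding feature map.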

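II. Strengths and Weaknesses of Grad-CAM

Grad-CAM has several advantages. Most importantly, it is a general-purpose visualization method that can be applied to essentially any convolutional neural network architecture. It requires neither retraining nor any modification of the network architecture, which means it works well alongside other machine learning tools.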

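Grad-CAM is also not hard to implement. It relies on standard backpropagation, plus a few simple tricks such as a ReLU activation, to turn the output score and the feature maps into something that can be visualized. It also works with more complex architectures and frameworks.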

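One drawback of Grad-CAM is that it depends on the spatial feature maps of a convolutional layer. It assumes the model is built mainly from convolutional and fully connected layers; layers whose outputs have no such spatial layout (for example recurrent or gating layers) cannot be visualized this way directly. In addition, the implementation above only covers the layers named when the GradCAM object is created, so to visualize a different layer you need to change the layer name passed in the code.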
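
Because the layer to visualize is selected by name, it can help to list the names a model exposes before choosing one. The sketch below uses a small made-up network (the names stem, act, block, and pool are hypothetical) together with the standard named_modules() iterator; with a real model the printed names will differ.

```python
from collections import OrderedDict

import torch.nn as nn

# Hypothetical toy network with a nested block
model = nn.Sequential(OrderedDict([
    ("stem", nn.Conv2d(3, 8, kernel_size=3)),
    ("act", nn.ReLU()),
    ("block", nn.Sequential(nn.Conv2d(8, 16, kernel_size=3), nn.ReLU())),
    ("pool", nn.AdaptiveAvgPool2d(1)),
]))

# Names of all convolutional layers, in the dotted form named_modules() reports
conv_names = [name for name, module in model.named_modules()
              if isinstance(module, nn.Conv2d)]
print(conv_names)  # ['stem', 'block.0']
```

Any of these names can be passed in the list of layers to hook, since the hooks are registered by comparing against the same dotted names.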

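III. Applications of Grad-CAM

1. Visualizing image classification results

The most common use of Grad-CAM is visualizing image classification results. The procedure is simple: run the GradCAM class with your image and your classifier model. Example code: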

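import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Newer torchvision versions prefer the weights= argument over pretrained=True
model = models.resnet50(pretrained=True).to(device).eval()

# image_path: path to your input image
img = Image.open(image_path).convert('RGB')

# Image preprocessing
preprocessing = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

img_tensor = preprocessing(img).unsqueeze(0).to(device)

# Build Grad-CAM on the last convolutional stage
grad_cam = GradCAM(model=model, candidate_layers=['layer4'])
output = model(img_tensor)

# Take the predicted class index from the output scores
pred_index = int(output.argmax(dim=1)[0])

# Compute the Grad-CAM heatmap for the predicted class, values in [0, 1]
cam = grad_cam(img_tensor, index=pred_index)

# Overlay the heatmap (Grad-CAM) on the resized original image
rgb = np.asarray(img.resize((224, 224)), dtype=np.float32) / 255
heatmap = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
heatmap = np.float32(cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)) / 255
result = 0.5 * heatmap + 0.5 * rgb

# Show the results
plt.figure(figsize=(10, 10))
plt.subplot(2, 1, 1)
plt.imshow(heatmap)
plt.subplot(2, 1, 2)
plt.imshow(result)
plt.show()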
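
The overlay step itself is just alpha blending of a colored heatmap with the image. As a rough NumPy-only sketch, assuming an RGB image and a heatmap both scaled to [0, 1]: overlay_heatmap is a hypothetical helper, and it uses a simple blue-to-red color map instead of a full colormap.

```python
import numpy as np


def overlay_heatmap(image, cam, alpha=0.5):
    """Blend an (H, W) heatmap in [0, 1] onto an (H, W, 3) RGB image in [0, 1]."""
    # Simple two-color map: low values are blue, high values are red
    heatmap = np.stack([cam, np.zeros_like(cam), 1.0 - cam], axis=-1)
    return (1.0 - alpha) * image + alpha * heatmap


image = np.full((2, 2, 3), 0.5)            # uniform gray image
cam = np.array([[1.0, 0.0], [0.5, 0.0]])   # made-up heatmap
blended = overlay_heatmap(image, cam)
print(blended[0, 0], blended[0, 1])  # [0.75 0.25 0.25] [0.25 0.25 0.75]
```

A larger alpha makes the heatmap dominate, while a smaller one keeps more of the original image visible.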

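2. Analyzing neural network models

It is also worthwhile to analyze what the different stages of a network, with their convolutional, pooling, and batch normalization layers, respond to in an image. With Grad-CAM we can intuitively inspect how the features at each level influence the output and check whether the model really attends to the important information in the image.

The following code visualizes a specific convolutional layer: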

import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms


def show_cam_on_image(img, mask, save_path):
    """Overlay a [0, 1] heatmap on an RGB image in [0, 1] and save the result."""
    heatmap = cv2.applyColorMap(np.uint8(255 * mask), cv2.COLORMAP_JET)
    heatmap = np.float32(heatmap) / 255      # BGR, as expected by cv2.imwrite
    overlay = heatmap + img[..., ::-1]       # convert the RGB image to BGR
    overlay = overlay / np.max(overlay)
    cv2.imwrite(save_path, np.uint8(255 * overlay))


def get_cam(model, img_path, target_layer):
    """
    Generate the Grad-CAM of a specific layer.
    :param model: CNN classifier in eval mode
    :param img_path: path of the input image
    :param target_layer: module name, e.g. layer4, layer3, layer2, layer1
    """
    device = next(model.parameters()).device
    grad_cam = GradCAM(model=model, candidate_layers=[target_layer])
    img = Image.open(img_path).convert('RGB')
    preprocessing = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    img_tensor = preprocessing(img).unsqueeze(0).to(device)
    output = model(img_tensor)
    # A single image gives a single predicted label; the index must be an int
    pred_class = int(output.argmax(dim=1)[0])

    # Compute the Grad-CAM heatmap, a 2-D array of shape (224, 224)
    cam = grad_cam(img_tensor, index=pred_class)

    # Rescale the original image to [0, 1] at the same size
    grad_img = cv2.resize(np.asarray(img, dtype=np.float32), (224, 224))
    grad_img -= grad_img.min()
    grad_img /= grad_img.max()

    # Overlay the Grad-CAM on the image
    show_cam_on_image(grad_img, cam, 'Result.jpg')


# Newer torchvision versions prefer the weights= argument over pretrained=True
model = models.resnet50(pretrained=True)
model = model.to('cuda' if torch.cuda.is_available() else 'cpu').eval()

get_cam(model, image_path, "layer4")

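IV. Conclusion

Grad-CAM is a powerful tool for explaining model outputs. It helps us understand what a convolutional neural network has learned, and the heatmaps can inform decisions about training, optimization, and hyperparameter tuning aimed at improving accuracy.

When deep learning models are applied to real problems, people usually want a balance between accuracy and interpretability. As a visualization technique, Grad-CAM provides important information for interpreting deep learning models. Its strengths are that it is easy to implement and general enough to apply to a wide range of CNN models; its weakness is the set of limitations described above.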

Original article by VWXK. If you repost it, please credit the source: https://www.506064.com/zh-hk/n/134361.html