Introduction to the VOC2012 Dataset and Its Applications

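I. Overview of the VOC2012 Dataset

VOC2012 is the dataset used in the PASCAL Visual Object Classes (VOC) Challenge 2012, and its data is hosted by the University of Oxford. It covers 20 object categories. The images were collected from the internet (mainly Flickr) and vary widely in resolution, content, lighting, background and occlusion. This makes the dataset suitable for object detection and image segmentation, including semantic segmentation. VOC2012 gives computer vision researchers and engineers a common experimental platform, and it has become one of the standard benchmarks for evaluating object detection and segmentation methods.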

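II. How to Use the VOC2012 Dataset

Using VOC2012 for object detection and image segmentation involves the following steps:

1. Download the VOC2012 train/val archive: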

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

2. Extract the archive. It unpacks into a VOCdevkit/VOC2012 directory:

tar -xvf VOCtrainval_11-May-2012.tar

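3. Parse the annotation files. After extraction, VOC2012 contains Annotations (one XML file per image), ImageSets (lists of image IDs for each split) and JPEGImages (the images themselves). The following function parses one XML file from Annotations:

import xml.etree.ElementTree as ET


def parse_annotation(annotation_path):
    tree = ET.parse(annotation_path)
    root = tree.getroot()

    # Image file name and size
    image_path = root.find('filename').text
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    depth = int(size.find('depth').text)

    # One entry per annotated object
    objects = []
    for obj in root.findall('object'):
        obj_struct = {}
        obj_struct['name'] = obj.find('name').text
        # Default to 0 in case an object has no <difficult> tag
        difficult = obj.find('difficult')
        obj_struct['difficult'] = int(difficult.text) if difficult is not None else 0
        # Call float() first in case a coordinate is written with a decimal point
        obj_struct['xmin'] = int(float(obj.find('bndbox/xmin').text))
        obj_struct['ymin'] = int(float(obj.find('bndbox/ymin').text))
        obj_struct['xmax'] = int(float(obj.find('bndbox/xmax').text))
        obj_struct['ymax'] = int(float(obj.find('bndbox/ymax').text))
        objects.append(obj_struct)
    return image_path, objects, width, height, depth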
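The sketch below shows the XML layout that `parse_annotation` reads. It parses a hand-written annotation string with `ET.fromstring` instead of a file. The XML is an illustrative, trimmed example in the VOC layout, not a real file from the dataset. `parse_annotation_string` is a hypothetical helper that returns the same fields in a more compact shape.

```python
import xml.etree.ElementTree as ET

# Illustrative, hand-written annotation in the VOC XML layout (not a real dataset file)
SAMPLE_XML = """
<annotation>
    <filename>example.jpg</filename>
    <size><width>500</width><height>375</height><depth>3</depth></size>
    <object>
        <name>dog</name>
        <difficult>0</difficult>
        <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
    </object>
    <object>
        <name>person</name>
        <difficult>1</difficult>
        <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
    </object>
</annotation>
"""


def parse_annotation_string(xml_text):
    """Hypothetical helper: same fields as parse_annotation, read from a string."""
    root = ET.fromstring(xml_text.strip())
    filename = root.find('filename').text
    size = root.find('size')
    width, height = int(size.find('width').text), int(size.find('height').text)
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        difficult = obj.find('difficult')
        objects.append({
            'name': obj.find('name').text,
            'difficult': int(difficult.text) if difficult is not None else 0,
            # (xmin, ymin, xmax, ymax); float() first in case of decimal coordinates
            'bbox': tuple(int(float(box.find(k).text)) for k in ('xmin', 'ymin', 'xmax', 'ymax')),
        })
    return filename, width, height, objects


filename, width, height, objects = parse_annotation_string(SAMPLE_XML)
print(filename, width, height)
for o in objects:
    print(o['name'], o['difficult'], o['bbox'])
```

In the real files, `<difficult>1</difficult>` marks objects that are hard to recognize. These objects are usually excluded when detection results are scored.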

4. Read the image IDs from ImageSets and build the JPEGImages paths for the training and validation splits. Under ImageSets/Main, the train/val archive provides train.txt, val.txt and trainval.txt, and each line is one image ID. The test set was distributed separately, so the archive has no test.txt.

import os

voc_root = './VOCdevkit/VOC2012'
main_dir = os.path.join(voc_root, 'ImageSets', 'Main')


def read_image_ids(list_file):
    # Each non-empty line of a split file is one image ID, e.g. "2008_000008"
    with open(list_file) as f:
        return [line.strip() for line in f if line.strip()]


train_image_names = read_image_ids(os.path.join(main_dir, 'train.txt'))
val_image_names = read_image_ids(os.path.join(main_dir, 'val.txt'))

train_image_paths = [os.path.join(voc_root, 'JPEGImages', n + '.jpg') for n in train_image_names]
val_image_paths = [os.path.join(voc_root, 'JPEGImages', n + '.jpg') for n in val_image_names]
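Before training, it can help to pair each image with its annotation and confirm that both files exist. The sketch below does this for one split file and reports any IDs that are missing files. `build_split` is a hypothetical helper. The demo builds a tiny fake VOC tree of empty placeholder files in a temporary directory, so it runs without the real dataset.

```python
import os
import tempfile


def build_split(voc_root, split):
    """Hypothetical helper: (jpg_path, xml_path) pairs for ImageSets/Main/<split>.txt."""
    list_file = os.path.join(voc_root, 'ImageSets', 'Main', split + '.txt')
    pairs, missing = [], []
    with open(list_file) as f:
        for line in f:
            image_id = line.strip()
            if not image_id:
                continue
            jpg = os.path.join(voc_root, 'JPEGImages', image_id + '.jpg')
            xml = os.path.join(voc_root, 'Annotations', image_id + '.xml')
            # Keep the pair only if both the image and its annotation exist
            if os.path.isfile(jpg) and os.path.isfile(xml):
                pairs.append((jpg, xml))
            else:
                missing.append(image_id)
    return pairs, missing


# Demo on a fake VOC tree: img_a has both files, img_b has neither
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'VOCdevkit', 'VOC2012')
    for sub in ('ImageSets/Main', 'JPEGImages', 'Annotations'):
        os.makedirs(os.path.join(root, sub))
    with open(os.path.join(root, 'ImageSets', 'Main', 'train.txt'), 'w') as f:
        f.write('img_a\nimg_b\n')
    for name in ('JPEGImages/img_a.jpg', 'Annotations/img_a.xml'):
        open(os.path.join(root, name), 'w').close()
    pairs, missing = build_split(root, 'train')
    print(len(pairs), missing)
```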

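III. Object Categories in VOC2012

VOC2012 contains the following 20 object categories:

Person: people
Bird: birds
Car: cars
Bus: buses
Cow: cows
Sheep: sheep
Aeroplane: airplanes
Bicycle: bicycles
Horse: horses
Motorbike: motorcycles
Potted Plant: potted plants
Dining Table: dining tables
Cat: cats
Dog: dogs
Boat: boats
Train: trains
Sofa: sofas
Chair: chairs
Bottle: bottles
TV/Monitor: televisions and monitors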
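In the XML `<name>` tags, the class names are lowercase, and Potted Plant, Dining Table and TV/Monitor are written as `pottedplant`, `diningtable` and `tvmonitor`. The sketch below builds a complete name-to-label mapping. It assumes the alphabetical order that the `label_map` in the detection code below starts with: labels 1-20, with 0 reserved for background. `VOC_CLASSES` and `rev_label_map` are illustrative names, not part of any library.

```python
# The 20 VOC class names as they appear in the XML <name> tags, in alphabetical order
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
    'bus', 'car', 'cat', 'chair', 'cow',
    'diningtable', 'dog', 'horse', 'motorbike', 'person',
    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]

# Detection labels: 0 is reserved for background, classes are numbered 1-20
label_map = {name: i + 1 for i, name in enumerate(VOC_CLASSES)}
# Inverse mapping, handy for turning predicted label IDs back into names
rev_label_map = {i: name for name, i in label_map.items()}

print(len(VOC_CLASSES), label_map['cat'], rev_label_map[15])
```

The same index order, with 0 for background, is used for the class indices stored in the segmentation masks.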

IV. Application Examples

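The following examples use VOC2012 for object detection and semantic segmentation. They assume this directory layout:

VOCdevkit
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    └── SegmentationObject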

Object detection code:

import os

import torch
import torchvision
from PIL import Image

# parse_annotation() is the function defined in step 3 above


def get_transform(train):
    # Only converts the PIL image to a tensor; `train` is accepted so that
    # training-time augmentation could be added later, but it is unused here
    transforms = [torchvision.transforms.ToTensor()]
    return torchvision.transforms.Compose(transforms)


class VOCDataset(torch.utils.data.Dataset):
    def __init__(self, data_folder, split, transform=None):
        # The train/val archive only ships train/val/trainval lists (no test.txt)
        self.split = split.upper()
        assert self.split in {'TRAIN', 'VAL', 'TRAINVAL'}

        self.year = "2012"
        self.data_folder = data_folder
        self.transform = transform

        # VOC2012 has 20 classes, mapped to labels 1-20 (0 is reserved for background)
        self.label_map = {
            "aeroplane": 1,
            "bicycle": 2,
            "bird": 3,
            "boat": 4,
            "bottle": 5,
            "bus": 6,
            "car": 7,
            "cat": 8,
            # ...
        }

        # e.g. ImageSets/Main/train.txt; each line is an image ID
        data_file = os.path.join(data_folder, 'ImageSets', 'Main', self.split.lower() + '.txt')
        with open(data_file, 'r') as f:
            self.image_names = [line.strip() + '.jpg' for line in f if line.strip()]

    def __getitem__(self, index):
        image_path = os.path.join(self.data_folder, 'JPEGImages', self.image_names[index])
        annotation_path = os.path.join(self.data_folder, 'Annotations', self.image_names[index].replace('.jpg', '.xml'))

        # Parse the XML annotation file
        image_name, objects, image_width, image_height, image_depth = parse_annotation(annotation_path)

        # Convert everything into torch.Tensors
        boxes = torch.zeros((len(objects), 4), dtype=torch.float32)
        labels = torch.zeros((len(objects),), dtype=torch.int64)
        difficulties = torch.zeros((len(objects),), dtype=torch.uint8)

        for i, object_dict in enumerate(objects):
            boxes[i, 0] = object_dict['xmin']
            boxes[i, 1] = object_dict['ymin']
            boxes[i, 2] = object_dict['xmax']
            boxes[i, 3] = object_dict['ymax']
            labels[i] = self.label_map[object_dict['name']]
            # Use the parsed <difficult> flag when available, otherwise 0
            difficulties[i] = object_dict.get('difficult', 0)

        image = Image.open(image_path).convert("RGB")
        # Apply the transformations
        if self.transform is not None:
            image = self.transform(image)

        return image, boxes, labels, difficulties

    def __len__(self):
        return len(self.image_names)


def collate_fn(batch):
    # Images have different numbers of boxes, so keep per-sample tuples instead of stacking
    return tuple(zip(*batch))


val_dataset = VOCDataset(data_folder='./VOCdevkit/VOC2012', split='VAL', transform=get_transform(train=False))
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=8, shuffle=False, collate_fn=collate_fn)

# Load a Faster R-CNN pre-trained on COCO (torchvision >= 0.13 uses `weights`;
# older versions used pretrained=True). It predicts COCO category IDs, not label_map.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT)
model.eval()


def predict_my_image(image_path, confidence_threshold=0.5):
    # Load the image and convert it to a tensor
    image = Image.open(image_path).convert("RGB")
    image = get_transform(train=False)(image)
    # The model expects a list of 3xHxW tensors
    with torch.no_grad():
        predictions = model([image])
    # One dict per input image, with 'boxes', 'labels' and 'scores'
    pred = {k: v.to(torch.device('cpu')) for k, v in predictions[0].items()}
    # Keep every box whose own score is above the threshold
    keep = pred['scores'] > confidence_threshold
    for i, (label, score, box) in enumerate(zip(pred['labels'][keep], pred['scores'][keep], pred['boxes'][keep])):
        print(f"Prediction {i}: label={label.item()}, score={score.item():.3f}, box={box.tolist()}")
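When reading predictions, the confidence threshold applies to each box, not to each image. Whether a kept box is correct is then usually judged by its overlap with a ground-truth box. The VOC detection criterion counts a prediction as correct when its intersection over union (IoU) with a ground-truth box of the same class exceeds 0.5. The sketch below shows both steps on plain Python lists, with no framework required. `filter_detections` and `iou` are hypothetical helpers, and the boxes and scores are made-up numbers. The sketch treats coordinates as continuous, whereas the official devkit's overlap computation treats them as inclusive pixel indices, so values can differ slightly.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, labels, scores, threshold=0.5):
    """Hypothetical helper: keep every box whose own score exceeds the threshold."""
    return [(b, l, s) for b, l, s in zip(boxes, labels, scores) if s > threshold]


# Made-up model outputs for one image
boxes = [(10, 10, 110, 110), (50, 50, 150, 150), (0, 0, 20, 20)]
labels = [12, 12, 15]
scores = [0.92, 0.61, 0.30]

kept = filter_detections(boxes, labels, scores, threshold=0.5)
gt_box = (12, 8, 108, 112)  # made-up ground-truth box
for box, label, score in kept:
    overlap = iou(box, gt_box)
    print(label, score, round(overlap, 3), overlap > 0.5)
```

Here two boxes pass the score threshold, but only the first overlaps the ground-truth box enough to count as a match.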

Semantic segmentation code:

from PIL import Image
import numpy as np
import os
import torch
import torchvision

class VOCSemanticSegmentation(torch.utils.data.Dataset):
    """
    Dataset for semantic segmentation on Pascal VOC 2012
    """

    def __init__(self, data_folder, split="TRAIN"):
        # The train/val archive provides train/val/trainval lists under ImageSets/Segmentation
        self.split = split.upper()
        assert self.split in {"TRAIN", "VAL", "TRAINVAL"}
        self.data_folder = data_folder
        self.image_name_list_file = os.path.join(
            data_folder, "ImageSets", "Segmentation", self.split.lower() + ".txt")

        with open(self.image_name_list_file, "r") as f:
            self.image_names = [x.strip() for x in f if x.strip()]

    def __len__(self):
        return len(self.image_names)

    def __getitem__(self, index):
        # Load the image and its class mask
        image_name = self.image_names[index]
        image_path = os.path.join(self.data_folder, "JPEGImages", image_name + ".jpg")
        target_path = os.path.join(self.data_folder, "SegmentationClass", image_name + ".png")

        image = Image.open(image_path).convert("RGB")
        # SegmentationClass PNGs are palette ("P" mode) images, so the raw pixel
        # values are already class indices: 0 = background, 1-20 = object classes,
        # 255 = void/object boundary (commonly passed as ignore_index to the loss)
        target = Image.open(target_path)
        labels = np.array(target, dtype=np.int64)

        # Convert everything into torch.Tensors and return
        image = torchvision.transforms.functional.to_tensor(image)
        labels = torch.from_numpy(labels)

        return image, labels
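The PNGs under SegmentationClass are palette images. Each pixel stores a class index, and the palette assigns that index a colour. Values such as 128 or 192 therefore appear only when a mask is read as colours rather than as raw indices. If a mask has already been converted to RGB, the commonly used VOC colormap can be rebuilt by spreading the bits of each class index over R, G and B, and then inverted. The sketch below does this on a made-up 2×2 mask. `voc_colormap` and `rgb_mask_to_labels` are hypothetical helper names.

```python
def voc_colormap(n=256):
    """VOC-style palette: entry i is the RGB colour used for class index i."""
    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # Bits 0, 1, 2 of c go to R, G, B, filling each from the top bit down
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap


def rgb_mask_to_labels(rgb_pixels, n_classes=21, void_index=255):
    """Map rows of (R, G, B) tuples back to class indices; unknown colours become void."""
    cmap = voc_colormap()
    lookup = {cmap[i]: i for i in range(n_classes)}
    lookup[cmap[void_index]] = void_index
    return [[lookup.get(px, void_index) for px in row] for row in rgb_pixels]


cmap = voc_colormap()
print(cmap[0], cmap[1], cmap[15], cmap[255])
print(rgb_mask_to_labels([[(0, 0, 0), (192, 128, 128)], [(224, 224, 192), (128, 0, 0)]]))
```

Reading the PNG directly as indices, as the dataset class above does, avoids this round trip. The colormap is mainly useful for visualizing predictions in the same colours as the ground truth.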

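Summary

This concludes the introduction to VOC2012 and how to use it. The dataset has become one of the benchmarks of computer vision and is widely used for object detection and image segmentation. I hope this article helps with your study and research.

Original article by PUNX. If you repost it, please credit the source: https://www.506064.com/zh-tw/n/132819.html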
