
Convolutional Neural Networks and Convolutional Autoencoders

Convolutional Neural Networks

Fully Connected Networks vs. Convolutional Networks

Fully connected neural networks have several problems when applied to image processing:

  • Too many parameters: all neurons in two adjacent layers are connected to each other, and every connection has its own independent parameter.
  • Positional information between pixels is ignored: the key pixels for recognizing an image are usually concentrated in one region, so there is no need to pay excessive attention to pixels in other regions and learn unnecessary weights.

Properties of convolutional neural networks (a parameter-count sketch follows the list):

  • Local connectivity: as opposed to full connectivity
  • Weight sharing: a group of connections can share the same weight
  • Pooling: further downsamples the data, reducing the number of parameters
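
To make the first two properties concrete, here is a minimal sketch comparing parameter counts; the 28 × 28 input and single-channel layers are illustrative assumptions, not part of the original post:

from torch import nn

# Fully connected: every input pixel connects to every output unit.
fc = nn.Linear(28 * 28, 28 * 28)
# Convolutional: one shared 3 x 3 kernel slides over the whole image.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fc))    # 615440 = 784 * 784 weights + 784 biases
print(count(conv))  # 10 = 3 * 3 weights + 1 bias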

Typical Architecture

A typical convolutional neural network is shown in the figure below:

That is: Conv => Pooling => Conv => Pooling => Fully Connected => Fully Connected

Convolution

Suppose we have a 5 × 5 black-and-white image, where each pixel is 0 or 1 for white or black. "Conv" means convolving the image with a filter (also called a kernel; assume it is 3 × 3) to obtain a Feature Map. The parameters in the filter can be regarded as weights that the machine must learn by itself. The GIF below shows one pass of the convolution.

Notice that the filter moves across the image one pixel at a time, i.e. the stride is 1. Each image also has a depth: if the image is color rather than black-and-white, the depth can be set to 3, i.e. 5 × 5 × 3, with each channel representing one of the RGB colors. The filter should then have a matching depth. The figure below shows a 7 × 7 × 3 image (a 5 × 5 image after one round of zero padding) convolved with two 3 × 3 × 3 filters at stride 2, producing a 3 × 3 × 2 output.
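
As a sanity check on these sizes, the following minimal sketch reproduces the example with nn.Conv2d; the random input is an illustrative stand-in for the image. The output width follows O = (I - K + 2P) / S + 1:

import torch
from torch import nn

x = torch.randn(1, 3, 5, 5)  # a 5 x 5 RGB image (batch size 1)

# padding=1 produces the 7 x 7 zero-padded image from the text;
# out_channels=2 corresponds to the two 3 x 3 x 3 filters.
conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3,
                 stride=2, padding=1)

print(conv(x).shape)  # torch.Size([1, 2, 3, 3]), i.e. a 3 x 3 x 2 output
# O = (I - K + 2P) / S + 1 = (5 - 3 + 2 * 1) / 2 + 1 = 3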

As for the convolution process itself, as shown in the figure below, the 36 pixels of a 6 × 6 image can be flattened out and viewed like the neural networks discussed before. The first value of the Feature Map produced by Filter 1 can then be seen as the output of a neuron connected only to input pixels 1, 2, 3, 7, 8, 9, 13, 14 and 15 (the "3" node in the figure); the "-1" node below it is analogous. Moreover, since these outputs come from the same filter, different neurons share the same weight parameters; connections drawn in the same color in the figure use the same weight. This explains why convolutional neural networks have the properties of local connectivity and weight sharing.

(Figure: convolution viewed as a locally connected network with shared weights)
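
This view is easy to verify numerically. The sketch below (random image and kernel, purely illustrative) checks that the first Feature Map value equals the weighted sum over the top-left 3 × 3 patch:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(1, 1, 6, 6)   # a 6 x 6 single-channel image
kernel = torch.randn(1, 1, 3, 3)  # Filter 1

feature_map = F.conv2d(image, kernel)  # stride 1 -> 4 x 4 output

# The first output value is a "neuron" connected only to the top-left
# 3 x 3 patch (pixels 1, 2, 3, 7, 8, 9, 13, 14, 15 when the 36 pixels
# are numbered row by row), weighted by the shared kernel.
patch = image[0, 0, :3, :3]
manual = (patch * kernel[0, 0]).sum()
print(torch.allclose(feature_map[0, 0, 0, 0], manual))  # True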

Pooling

The most commonly used variant is Max Pooling, which keeps only the maximum value within each region of the input to reduce the number of parameters. Besides Max Pooling there is also Mean (Average) Pooling, which takes the mean instead.
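
A minimal sketch of both variants, using an assumed 4 × 4 input; each 2 × 2 block is reduced to a single value:

import torch
from torch import nn

x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

print(nn.MaxPool2d(kernel_size=2)(x))  # [[6., 8.], [14., 16.]]
print(nn.AvgPool2d(kernel_size=2)(x))  # [[3.5, 5.5], [11.5, 13.5]]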

Code Implementation

FashionMNIST is again used for testing.

import numpy as np
from torch import nn
import torch
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="./data",
    train=True,
    download=True,  # download on first run; skipped if the data already exists
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="./data",
    train=False,
    download=True,
    transform=ToTensor()
)

labels_map = {  # class index -> readable label (for reference; unused below)
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}

train_dataloader = DataLoader(training_data, batch_size=100, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=100, shuffle=True)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.Conv = nn.Sequential(
            # O = (I - K + 2P) / S + 1
            # output width = (input width - kernel width + 2 * padding) / stride + 1

            # batch x 1 x 28 x 28
            nn.Conv2d(
                in_channels=1,
                out_channels=16,
                kernel_size=7,
                stride=1,
                padding=3  # width and height unchanged: (28 - 7 + 2 * 3) / 1 + 1 = 28
            ), nn.ReLU(), nn.MaxPool2d(kernel_size=2),

            # batch x 16 x 14 x 14
            nn.Conv2d(
                in_channels=16,
                out_channels=32,
                kernel_size=7,
                stride=1,
                padding=3
            ), nn.ReLU(), nn.MaxPool2d(kernel_size=2),
            # batch x 32 x 7 x 7
        )
        self.Linear = nn.Sequential(
            nn.Linear(32 * 7 * 7, 7 * 7), nn.ReLU(),
            nn.Linear(7 * 7, 10)
        )

    def forward(self, data: torch.FloatTensor):
        data = self.Conv(data)
        data = torch.flatten(data, 1)
        data = self.Linear(data)
        return data

net = Net()
print(net)
# define the optimizer and the loss function
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss_func = nn.CrossEntropyLoss()

# Convert labels to one-hot vectors to compute MSELoss; CrossEntropyLoss handles this internally
# def one_hot_y(batch_y):
#     # 100 => 100 x 10
#     length = len(batch_y)
#     y = torch.zeros(length, 10)
#     for row in range(length):
#         column = batch_y[row]
#         y[row][column] = 1
#     # print(y)
#     return y

for epoch in range(10):
    loss = None
    # batch_pixel: 100 x 1 x 28 x 28, batch_label: 100
    for batch_pixel, batch_label in train_dataloader:
        # batch_label = one_hot_y(batch_label)  # only needed for MSELoss
        batch_prediction = net(batch_pixel)

        loss = loss_func(batch_prediction, batch_label)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print("step: {0} , loss: {1}".format(epoch + 1, loss.item()))

# measure accuracy on the test set
net.eval()
total_accuracy = []
with torch.no_grad():
    for batch_pixel, batch_label in test_dataloader:
        batch_prediction = net(batch_pixel)

        # the class with the highest score is the prediction
        predicted = batch_prediction.argmax(dim=1)
        accuracy = (predicted == batch_label).float().mean().item()

        total_accuracy.append(accuracy)
        print("accuracy:{:.2%}".format(accuracy))

print("total accuracy:{:.2%}".format(np.mean(total_accuracy)))

After training for 10 epochs, the output is total accuracy:91.30%.

Autoencoders

An autoencoder can be described as g(f(x)) = r, where f is the encoder (producing the code h = f(x)), g is the decoder, and the training objective is for the output r to be close to the original input x.

One way to obtain useful features from an autoencoder is to constrain the dimension of the code h to be smaller than that of the input x; such an autoencoder is called undercomplete. Training this lossy representation forces the autoencoder to learn the most important features of the data.
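
A minimal sketch of an undercomplete autoencoder; the 784 → 32 bottleneck is an illustrative assumption, not a value from this post:

import torch
from torch import nn

# f: the encoder compresses x (784-d) into a smaller code h (32-d);
# g: the decoder reconstructs r = g(f(x)) from h.
encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(1, 784)
h = encoder(x)  # the undercomplete code: dim(h) < dim(x)
r = decoder(h)  # reconstruction; training would minimize MSELoss(r, x)
print(h.shape, r.shape)  # torch.Size([1, 32]) torch.Size([1, 784])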

Autoencoders can be used for reverse image search ("search by image"): the similarity of two images is judged by comparing their encoded representations.
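
For instance, one simple way to compare two codes is cosine similarity; the 128-dimensional codes below are illustrative assumptions, not from the original post:

import torch
import torch.nn.functional as F

# Hypothetical: code_a and code_b stand for the flattened encoder
# outputs of two images; values closer to 1 mean more similar images.
code_a, code_b = torch.randn(128), torch.randn(128)
print(F.cosine_similarity(code_a, code_b, dim=0).item())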

Deconvolution (transposed convolution) is a special kind of forward convolution: the input image is first enlarged by inserting zeros at a certain ratio, the kernel is then rotated, and an ordinary forward convolution is applied.
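
In PyTorch this corresponds to nn.ConvTranspose2d, whose output size follows O = (I - 1) × S - 2P + K. A minimal sketch (random input, illustrative only) showing that it undoes the shape change of a matching Conv2d:

import torch
from torch import nn

x = torch.randn(1, 1, 5, 5)
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2)             # 5 -> 2
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)  # 2 -> 5

y = conv(x)
print(y.shape)          # torch.Size([1, 1, 2, 2])
print(deconv(y).shape)  # torch.Size([1, 1, 5, 5]); O = (I - 1) * S - 2P + K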

Convolutional Autoencoders

This article explains deconvolution very clearly, although the parameters in PyTorch seem to differ slightly: 什么是deconvolution

Code:

from torch import nn
import torch
import imageio
import cv2

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def read_single_to_tensor(image_path, device):
    image = imageio.imread(image_path)  # grayscale image, shape (H, W)
    # cv2.imshow("img", image)
    # cv2.waitKey()
    image = torch.from_numpy(image).to(device)
    image = image.unsqueeze(0)  # add channel dim -> (1, H, W)
    image = image.unsqueeze(0)  # add batch dim   -> (1, 1, H, W)

    return image / 255  # scale uint8 pixels to floats in [0, 1]

image_tensor = read_single_to_tensor("pic/0.png", device=device)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # input image size: 22 x 58; the comments below track (height, width) after each layer
        self.encoder = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=8, kernel_size=2, stride=2),  # 11 29

                                     nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3),  # 9 27
                                     nn.BatchNorm2d(16),
                                     nn.ReLU(),

                                     nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),  # 7 25
                                     nn.BatchNorm2d(32),
                                     nn.ReLU(),

                                     nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=2),  # 3 12
                                     nn.BatchNorm2d(32),
                                     nn.ReLU(),
                                     )
        self.decoder = nn.Sequential(nn.ConvTranspose2d(in_channels=32, out_channels=32, kernel_size=3, stride=2),
                                     nn.BatchNorm2d(32),
                                     nn.ReLU(),

                                     nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=3),
                                     nn.BatchNorm2d(16),
                                     nn.ReLU(),

                                     nn.ConvTranspose2d(in_channels=16, out_channels=8, kernel_size=3),
                                     nn.BatchNorm2d(8),
                                     nn.ReLU(),

                                     nn.ConvTranspose2d(in_channels=8, out_channels=1, kernel_size=2, stride=2),
                                     nn.BatchNorm2d(1),
                                     nn.Sigmoid(),
                                     )

    def forward(self, data: torch.FloatTensor):
        encode = self.encoder(data)
        decode = self.decoder(encode)
        return encode, decode

net = Net()
net.to(device)

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
loss_func = nn.MSELoss()

for epoch in range(50):
    image_prediction = net(image_tensor)
    loss = loss_func(image_prediction[1], image_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 5 == 0:
        # periodically save the current reconstruction as an image
        new_img = image_prediction[1].squeeze(0).squeeze(0).cpu().detach().numpy()
        cv2.imwrite('new' + str(epoch) + '.png', new_img * 255)

Output
