Implementing RoI Pooling, from Paper to Code

RoI Pooling
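Once we have the feature map and the proposals, each proposal is projected onto the feature map and then resized once, so that every proposal yields a feature map of the same fixed size. In Faster R-CNN, the class head predicts whether a region proposal is foreground or background, while the regression head learns deltas relative to the anchor, namely the offset of the center and the scaling of the width and height.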

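During projection, the size and position of a proposal are expressed relative to the input image, not the feature map. We first need to convert them to the proposal's actual position on the feature map, and then resize the region extracted for that proposal.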

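Given a feature map and a set of proposals, RoI pooling returns the pooled feature representation. The region proposal network predicts objectness and the box regression offsets (relative to the anchors). These offsets are combined with the anchors to generate the proposals. The proposals are usually expressed at the scale of the input image rather than the scale of the feature layer, so they need to be scaled down to the feature map before downstream CNN layers can extract features from them.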
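
As a rough illustration of how offsets and anchors combine into a proposal, here is a minimal sketch that assumes the standard Faster R-CNN parameterization (center shifts scaled by the anchor size, log-space width and height). The function name `decode_anchor` and the `(cx, cy, w, h)` anchor layout are assumptions made for this example, not something taken from the article's code.

```python
import math

def decode_anchor(anchor, deltas):
    """Apply (tx, ty, tw, th) deltas to an anchor given as (cx, cy, w, h).

    Hypothetical helper: assumes the usual Faster R-CNN box encoding.
    """
    cx_a, cy_a, w_a, h_a = anchor
    tx, ty, tw, th = deltas
    # Center offsets are expressed as fractions of the anchor size
    cx = cx_a + tx * w_a
    cy = cy_a + ty * h_a
    # Width and height are scaled by exp(delta)
    w = w_a * math.exp(tw)
    h = h_a * math.exp(th)
    # Return the proposal in corner form (x1, y1, x2, y2)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Zero deltas reproduce the anchor itself, here in corner form
print(decode_anchor((100.0, 100.0, 64.0, 32.0), (0.0, 0.0, 0.0, 0.0)))
# → (68.0, 84.0, 132.0, 116.0)
```

The result is still in input-image coordinates, which is why the scaling step described above is needed before pooling.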

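On the original image we have the geometry of the proposal, that is, the coordinates of its center point and its width and height. To project it, we divide the coordinates in the original image by the downsampling factor, here 32x downsampling; if a coordinate does not divide evenly, it is rounded to an integer.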
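
The projection can be sketched in a few lines. This example assumes the box is given in corner form `(x1, y1, x2, y2)` and rounds up with `math.ceil`, mirroring the `torch.ceil` call used in the article's code; other implementations round differently. The name `project_box` is made up for this illustration.

```python
import math

def project_box(box, stride=32):
    """Map a box (x1, y1, x2, y2) from image pixels to feature-map cells.

    Hypothetical helper: divides by the downsampling stride and rounds up.
    """
    return tuple(math.ceil(c / stride) for c in box)

# 64 and 320 divide evenly by 32; 100 and 420 do not and are rounded up
print(project_box((64, 100, 320, 420)))
# → (2, 4, 10, 14)
```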

import math
import torch


class TorchROIPool(object):
    def __init__(self, output_size, scaling_factor):
        # Side length of the square pooled output
        self.output_size = output_size
        # Ratio of feature-map size to input-image size, e.g. 1/32
        self.scaling_factor = scaling_factor

    def _roi_pool(self, features):
        """Max-pool a cropped region to a fixed output_size x output_size map.

        Args:
            features (torch.Tensor): cropped region of shape (C, h, w)
        Returns:
            tuple: pooled values and the flat argmax index inside each bin,
                both of shape (C, output_size, output_size)
        """
        # Number of channels, height and width of the cropped region
        num_channels, h, w = features.shape

        # Bin size along each axis (may be fractional)
        w_stride = w / self.output_size
        h_stride = h / self.output_size

        # Output buffers for the pooled values and their argmax indices
        out_shape = (num_channels, self.output_size, self.output_size)
        res = features.new_zeros(out_shape)
        res_idx = torch.zeros(out_shape, dtype=torch.long, device=features.device)

        for i in range(self.output_size):
            for j in range(self.output_size):
                # Floor the start and ceil the end so that every bin covers
                # at least one cell, then convert to int for slicing
                w_start = int(math.floor(j * w_stride))
                w_end = int(math.ceil((j + 1) * w_stride))
                h_start = int(math.floor(i * h_stride))
                h_end = int(math.ceil((i + 1) * h_stride))

                # Clamp the bin to the bounds of the region
                w_start = min(max(w_start, 0), w)
                w_end = min(max(w_end, 0), w)
                h_start = min(max(h_start, 0), h)
                h_end = min(max(h_end, 0), h)

                patch = features[:, h_start:h_end, w_start:w_end]
                max_val, max_idx = torch.max(patch.reshape(num_channels, -1), dim=1)
                res[:, i, j] = max_val
                res_idx[:, i, j] = max_idx
        return res, res_idx

    def __call__(self, feature_layer, proposals):
        """Given a feature layer and a set of proposals, return the pooled
        representations of the proposals. Proposals are scaled by the
        scaling factor before pooling.

        Args:
            feature_layer (torch.Tensor): feature map of shape (1, C, H, W);
                only the first image in the batch is used
            proposals (torch.Tensor): shape (N, 4); each row is a bounding
                box (x1, y1, x2, y2) in input-image coordinates
        Returns:
            torch.Tensor: shape (N, C, output_size, output_size)
        """
        _, num_channels, _, _ = feature_layer.shape

        # Scale the proposals to feature-map coordinates. Rounding with
        # torch.ceil yields the integer indices that slicing requires.
        scaled_proposals = torch.ceil(proposals.float() * self.scaling_factor).long()

        res = feature_layer.new_zeros(
            (len(proposals), num_channels, self.output_size, self.output_size))

        # Loop over the proposals
        for idx in range(len(proposals)):
            # Plain Python ints are used as slice indices
            x1, y1, x2, y2 = scaled_proposals[idx].tolist()
            # Add 1 so the end indices of the proposal are included
            extracted_feat = feature_layer[0, :, y1:y2 + 1, x1:x2 + 1]
            res[idx], _ = self._roi_pool(extracted_feat)
        return res
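
To make the bin arithmetic in `_roi_pool` easier to follow, here is a small pure-Python sketch of the same floor/ceil scheme on a single-channel grid. The helpers `bin_edges` and `max_pool_2d` are illustrative names invented for this example; they are not part of the class above and ignore channels and argmax indices.

```python
import math

def bin_edges(length, output_size):
    """Split `length` cells into `output_size` bins as (start, end) pairs."""
    stride = length / output_size
    return [(math.floor(i * stride), min(math.ceil((i + 1) * stride), length))
            for i in range(output_size)]

def max_pool_2d(region, output_size):
    """Max-pool a 2D list of numbers down to output_size x output_size."""
    h, w = len(region), len(region[0])
    rows, cols = bin_edges(h, output_size), bin_edges(w, output_size)
    return [[max(region[y][x] for y in range(r0, r1) for x in range(c0, c1))
             for (c0, c1) in cols]
            for (r0, r1) in rows]

# A 3x5 region holding the values 0..14 in row-major order
region = [[r * 5 + c for c in range(5)] for r in range(3)]

print(bin_edges(5, 2))         # → [(0, 3), (2, 5)]
print(max_pool_2d(region, 2))  # → [[7, 9], [12, 14]]
```

Because the start is floored and the end is ceiled, neighbouring bins can overlap by one cell when the region size is not a multiple of the output size, as the `(0, 3)` and `(2, 5)` pair shows.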

Author: 才高八斗的鸵鸟. Original article: https://segmentfault.com/a/1190000041811641
