How to add a new layer to Caffe (scale layer)

Posted: 2016-05-21 11:08

How to configure the structure of each layer in Caffe

I recently finished installing Caffe on my computer. A neural network is built from different kinds of layers, and each kind has its own parameters, so I put together a short summary based on the documentation on the Caffe website.
1. Vision Layers
1.1 Convolution
Type: CONVOLUTION
Example:
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1          # learning rate multiplier for the filters
  blobs_lr: 2          # learning rate multiplier for the biases
  weight_decay: 1      # weight decay multiplier for the filters
  weight_decay: 0      # weight decay multiplier for the biases
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}
blobs_lr: learning-rate multipliers. In the example above the weights use the same learning rate that the solver supplies at run time, and the biases use twice that rate.
weight_decay: weight-decay multipliers. In the example above the multiplier is 1 for the filters and 0 for the biases.
Important parameters of the convolution layer
Required:
num_output (c_o): the number of filters
kernel_size (or kernel_h and kernel_w): the size of each filter
Optional:
weight_filler [default type: 'constant' value: 0]: how the weights are initialized
bias_filler: how the biases are initialized
bias_term [default true]: whether to enable the bias term
pad (or pad_h and pad_w) [default 0]: the number of pixels to add to each side of the input
stride (or stride_h and stride_w) [default 1]: the stride of the filter
group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.
Size change through the convolution:
Input: n * c_i * h_i * w_i
Output: n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o is computed in the same way.
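The output-size formula above can be checked with a small helper. This is only a sketch: the function name conv_output_size is made up for illustration and is not part of Caffe, and the 227-pixel input used in the note below is an assumed value.

```cpp
#include <cassert>

// Output extent along one spatial axis, following
// h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1 with integer division.
// Hypothetical helper for illustration; not a Caffe function.
int conv_output_size(int input, int kernel, int pad, int stride) {
    assert(stride > 0);
    return (input + 2 * pad - kernel) / stride + 1;
}
```

With the conv1 settings from the example (kernel_size 11, stride 4, no padding) and an assumed 227-pixel input, conv_output_size(227, 11, 0, 4) gives 55.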
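1.2 Pooling
Type: POOLING
Example:
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    kernel_size: 3 # pool over a 3x3 region
    stride: 2      # step two pixels (in the bottom blob) between pooling regions
  }
}
Important parameters of the pooling layer
Required:
kernel_size (or kernel_h and kernel_w): the size of the pooling window
Optional:
pool [default MAX]: the pooling method; currently MAX, AVE, and STOCHASTIC are available
pad (or pad_h and pad_w) [default 0]: the number of pixels to add to each side of the input
stride (or stride_h and stride_w) [default 1]: the stride of the pooling window
Size change through pooling:
Input: n * c * h_i * w_i
Output: n * c * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o is computed in the same way. The number of channels does not change.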
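1.3 Local Response Normalization (LRN)
Local Response Normalization normalizes over a local input region (an activation a is divided by a normalization term to produce the new activation b). There are two forms: in one the region spans adjacent channels (cross-channel LRN), in the other it is a spatial region within a single channel (within-channel LRN).
Formula: each input value is divided by (1 + (alpha / n) * sum_i x_i^2)^beta, where n is the size of the local region and the sum runs over the region centered on that value (zero-padded where necessary).
Optional:
local_size [default 5]: for cross-channel LRN, the number of neighboring channels to sum over; for within-channel LRN, the side length of the spatial region to sum over
alpha [default 1]: the scaling parameter
beta [default 0.75]: the exponent
norm_region [default ACROSS_CHANNELS]: which LRN form to use, ACROSS_CHANNELS or WITHIN_CHANNEL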
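As a rough illustration of the cross-channel form, the sketch below normalizes the values at a single spatial position, one value per channel. It assumes the divisor (1 + (alpha / n) * sum x^2)^beta with n equal to local_size and a zero-padded window; lrn_across_channels is a hypothetical name, not Caffe's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cross-channel LRN at one spatial position: x holds one value per channel.
// Each value is divided by (1 + (alpha / local_size) * sum of squares)^beta,
// where the sum covers the window of local_size channels centered on it.
// Channels outside the valid range contribute zero (zero padding).
std::vector<double> lrn_across_channels(const std::vector<double>& x,
                                        int local_size, double alpha, double beta) {
    assert(local_size > 0);
    const int channels = static_cast<int>(x.size());
    const int half = local_size / 2;
    std::vector<double> out(x.size());
    for (int c = 0; c < channels; ++c) {
        double sum_sq = 0.0;
        for (int k = c - half; k <= c + half; ++k) {
            if (k >= 0 && k < channels) sum_sq += x[k] * x[k];
        }
        out[c] = x[c] / std::pow(1.0 + alpha / local_size * sum_sq, beta);
    }
    return out;
}
```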
2. Loss Layers
Deep learning drives learning by minimizing the loss between the output and the target.
2.1 Softmax
Type: SOFTMAX_LOSS
2.2 Sum-of-Squares / Euclidean
Type: EUCLIDEAN_LOSS
2.3 Hinge / Margin
Type: HINGE_LOSS
Examples:
# L1 norm (the default)
layers {
  name: "loss"
  type: HINGE_LOSS
  bottom: "pred"
  bottom: "label"
}
# L2 norm
layers {
  name: "loss"
  type: HINGE_LOSS
  bottom: "pred"
  bottom: "label"
  top: "loss"
  hinge_loss_param {
    norm: L2
  }
}
Optional:
norm [default L1]: choose the L1 or L2 norm
Inputs:
n * c * h * w Predictions
n * 1 * 1 * 1 Labels
Output:
1 * 1 * 1 * 1 Computed Loss
2.4 Sigmoid Cross-Entropy
Type: SIGMOID_CROSS_ENTROPY_LOSS
2.5 Infogain
Type: INFOGAIN_LOSS
2.6 Accuracy and Top-k
Type: ACCURACY
Computes the accuracy of the output with respect to the target. It is not actually a loss and has no backward step.
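3. Activation / Neuron Layers
In general, activation layers are element-wise operations: the input and output have the same size, and the layer is usually a non-linear function.
3.1 ReLU / Rectified-Linear and Leaky-ReLU
Type: RELU
Example:
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
Optional:
negative_slope [default 0]: the factor applied to inputs below zero, instead of setting them to 0.
ReLU is currently the most widely used activation function, mainly because it converges faster while giving comparable results.
The standard ReLU is max(x, 0). More generally, the layer outputs x when x > 0 and negative_slope * x when x <= 0. The RELU layer supports in-place computation: the top and bottom blobs can be the same blob, which saves memory.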
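The ReLU rule can be written out as a one-line function. This is a minimal sketch of the formula only; the name relu is chosen for illustration and this is not Caffe's layer code.

```cpp
#include <cassert>

// ReLU with an optional leak: x for x > 0, negative_slope * x otherwise.
// With negative_slope = 0 this reduces to the standard max(x, 0).
double relu(double x, double negative_slope = 0.0) {
    assert(negative_slope >= 0.0);
    return x > 0.0 ? x : negative_slope * x;
}
```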
3.2 Sigmoid
Type: SIGMOID
Example:
layers {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: SIGMOID
}
The SIGMOID layer computes sigmoid(x) for each input x.
3.3 TanH / Hyperbolic Tangent
Type: TANH
Example:
layers {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: TANH
}
The TANH layer computes tanh(x) for each input x.
3.4 Absolute Value
Type: ABSVAL
Example:
layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: ABSVAL
}
The ABSVAL layer computes abs(x) for each input x.
3.5 Power
Type: POWER
Example:
layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: POWER
  power_param {
    power: 1
    scale: 1
    shift: 0
  }
}
Optional:
power [default 1]
scale [default 1]
shift [default 0]
The POWER layer computes (shift + scale * x) ^ power for each input x.
3.6 BNLL
Type: BNLL
Example:
layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: BNLL
}
The BNLL (binomial normal log likelihood) layer computes log(1 + exp(x)) for each input x.
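The POWER and BNLL formulas translate directly into code. The functions below are an illustrative sketch with made-up names. The BNLL version rewrites log(1 + exp(x)) as x + log(1 + exp(-x)) for positive x, which has the same value but avoids overflow; that rewrite is a choice made for this example.

```cpp
#include <cassert>
#include <cmath>

// POWER layer formula: (shift + scale * x) ^ power.
double power_layer(double x, double power = 1.0, double scale = 1.0, double shift = 0.0) {
    return std::pow(shift + scale * x, power);
}

// BNLL formula: log(1 + exp(x)), evaluated in an overflow-safe way.
double bnll(double x) {
    return x > 0.0 ? x + std::log1p(std::exp(-x)) : std::log1p(std::exp(x));
}
```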
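4. Data Layers
Data enters Caffe through data layers, which sit at the bottom of the network. Data can come from efficient databases (LevelDB or LMDB) or directly from memory. When efficiency is not critical, data can also be read from disk in HDF5 format or as ordinary image files.
4.1 Database
Type: DATA
Required:
source: the name of the directory containing the database
batch_size: the number of inputs to process at one time
Optional:
rand_skip: skip up to this number of inputs at the beginning; this is useful for asynchronous stochastic gradient descent (SGD)
backend [default LEVELDB]: choose whether to use LEVELDB or LMDB
4.2 In-Memory
Type: MEMORY_DATA
Required:
batch_size, channels, height, width: specify the size of the input chunks to read from memory
The memory data layer reads data directly from memory, without copying it. In order to use it, one must call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) in order to specify a source of contiguous data (as 4D row major array), which is read one batch-sized chunk at a time.
4.3 HDF5 Input
Type: HDF5_DATA
Required:
source: the name of the file to read from
batch_size: the number of inputs to process at one time
4.4 HDF5 Output
Type: HDF5_OUTPUT
Required:
file_name: the name of the output file
The HDF5 output layer works differently from the other layers in this section: it writes its input blobs to disk.
4.5 Images
Type: IMAGE_DATA
Required:
source: the name of a text file in which each line gives an image file name and a label
batch_size: the number of images in one batch
Optional:
rand_skip: skip up to this number of inputs at the beginning; this is useful for asynchronous stochastic gradient descent (SGD)
shuffle [default false]
new_height, new_width: resize all images to this size
4.6 Windows
Type: WINDOW_DATA
4.7 Dummy
Type: DUMMY_DATA
The dummy layer is used for development and debugging. For its parameters see DummyDataParameter.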
5. Common Layers
5.1 Inner Product (fully connected layer)
Type: INNER_PRODUCT
Example:
layers {
  name: "fc8"
  type: INNER_PRODUCT
  blobs_lr: 1          # learning rate multiplier for the filters
  blobs_lr: 2          # learning rate multiplier for the biases
  weight_decay: 1      # weight decay multiplier for the filters
  weight_decay: 0      # weight decay multiplier for the biases
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "fc7"
  top: "fc8"
}
Required:
num_output (c_o): the number of filters
Optional:
weight_filler [default type: 'constant' value: 0]: how the weights are initialized
bias_filler: how the biases are initialized
bias_term [default true]: whether to enable the bias term
Size change through the inner product layer:
Input: n * c_i * h_i * w_i
Output: n * c_o * 1 * 1
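5.2 Splitting
Type: SPLIT
The splitting layer splits one input blob into multiple output blobs. It is used when one blob has to be fed into several layers.
5.3 Flattening
Type: FLATTEN
The flattening layer turns an input of size n * c * h * w into a simple vector of size n * (c*h*w) * 1 * 1.
5.4 Concatenation
Type: CONCAT
Example:
layers {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: CONCAT
  concat_param {
    concat_dim: 1
  }
}
Optional:
concat_dim [default 1]: 0 concatenates along num, 1 concatenates along channels
Size change through the concatenation layer:
Input: K blobs, the i-th of size n_i * c_i * h * w
If concat_dim = 0: (n_1 + n_2 + ... + n_K) * c_1 * h * w, and all inputs must have the same c_i.
If concat_dim = 1: n_1 * (c_1 + c_2 + ... + c_K) * h * w, and all inputs must have the same n_i.
The concatenation layer joins multiple blobs into a single blob.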
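The shape rule for concatenation can be sketched as a small function over (n, c, h, w) shapes. Shape and concat_shape are names invented for this example and do not come from Caffe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::array<int, 4> Shape;  // n, c, h, w

// Shape of the blob produced by concatenating the inputs along concat_dim
// (0 = num, 1 = channels). All other dimensions must match across inputs.
Shape concat_shape(const std::vector<Shape>& inputs, int concat_dim) {
    assert(!inputs.empty());
    assert(concat_dim == 0 || concat_dim == 1);
    Shape out = inputs[0];
    for (std::size_t i = 1; i < inputs.size(); ++i) {
        for (int d = 0; d < 4; ++d) {
            if (d != concat_dim) assert(inputs[i][d] == out[d]);
        }
        out[concat_dim] += inputs[i][concat_dim];
    }
    return out;
}
```

For example, concatenating blobs of shape 2 * 3 * 4 * 4 and 2 * 5 * 4 * 4 along channels gives 2 * 8 * 4 * 4.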
5.5 Slicing
The SLICE layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices.
5.6 Elementwise Operations
Type: ELTWISE
5.7 Argmax
Type: ARGMAX
5.8 Softmax
Type: SOFTMAX
5.9 Mean-Variance Normalization
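A brief look at adding a HeatmapData layer to Caffe (Part 2)
Following on from "Adding a PrecisionRecallLoss layer to Caffe (Part 1)", we continue:
This article uses the existing implementation in data_heatmap.cpp and data_heatmap.hpp as an example of how to write your own layer.
=================================================================================================================================
1. As usual, we first add the parameter and the message to caffe.proto: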
message LayerParameter {
optional string name = 1; // the layer name
optional string type = 2; // the layer type
repeated string bottom = 3; // the name of each bottom blob
repeated string top = 4; // the name of each top blob
// The train / test phase for computation.
optional Phase phase = 10;
// The amount of weight to assign each top blob in the objective.
// Each layer assigns a default value, usually of either 0 or 1,
// to each top blob.
repeated float loss_weight = 5;
// Specifies training parameters (multipliers on global learning constants,
// and the name and other settings used for weight sharing).
repeated ParamSpec param = 6;
// The blobs containing the numeric parameters of the layer.
repeated BlobProto blobs = 7;
// Specifies on which bottoms the backpropagation should be skipped.
// The size must be either 0 or equal to the number of bottoms.
repeated bool propagate_down = 11;
// Rules controlling whether and when a layer is included in the network,
// based on the current NetState.  You may specify a non-zero number of rules
// to include OR exclude, but not both.  If no include or exclude rules are
// specified, the layer is always included.  If the current NetState meets
// ANY (i.e., one or more) of the specified rules, the layer is
// included/excluded.
repeated NetStateRule include = 8;
repeated NetStateRule exclude = 9;
// Parameters for data pre-processing.
optional TransformationParameter transform_param = 100;
// Parameters shared by loss layers.
optional LossParameter loss_param = 101;
// Options to allow visualisation (the visualisation parameters are just these two)
optional bool visualise = 200 [ default = false ];
optional uint32 visualise_channel = 201 [ default = 0 ];
// Layer type-specific parameters.
// Note: certain layers may have more than one computational engine
// for their implementation. These layers include an Engine type and
// engine parameter for selecting the implementation.
// The default for the engine is set by the ENGINE switch at compile-time.
optional AccuracyParameter accuracy_param = 102;
optional ArgMaxParameter argmax_param = 103;
optional BatchNormParameter batch_norm_param = 139;
optional BiasParameter bias_param = 141;
optional ConcatParameter concat_param = 104;
optional ContrastiveLossParameter contrastive_loss_param = 105;
optional ConvolutionParameter convolution_param = 106;
optional CropParameter crop_param = 144;
optional DataParameter data_param = 107;
optional DropoutParameter dropout_param = 108;
optional DummyDataParameter dummy_data_param = 109;
optional EltwiseParameter eltwise_param = 110;
optional ELUParameter elu_param = 140;
optional EmbedParameter embed_param = 137;
optional ExpParameter exp_param = 111;
optional FlattenParameter flatten_param = 135;
optional HeatmapDataParameter heatmap_data_param = 145; // parameters of our own layer
optional HDF5DataParameter hdf5_data_param = 112;
optional HDF5OutputParameter hdf5_output_param = 113;
optional HingeLossParameter hinge_loss_param = 114;
optional ImageDataParameter image_data_param = 115;
optional InfogainLossParameter infogain_loss_param = 116;
optional InnerProductParameter inner_product_param = 117;
optional InputParameter input_param = 143;
optional LogParameter log_param = 134;
optional LRNParameter lrn_param = 118;
optional MemoryDataParameter memory_data_param = 119;
optional MVNParameter mvn_param = 120;
optional PoolingParameter pooling_param = 121;
optional PowerParameter power_param = 122;
optional PReLUParameter prelu_param = 131;
optional PythonParameter python_param = 130;
optional ReductionParameter reduction_param = 136;
optional ReLUParameter relu_param = 123;
optional ReshapeParameter reshape_param = 133;
optional ScaleParameter scale_param = 142;
optional SigmoidParameter sigmoid_param = 124;
optional SoftmaxParameter softmax_param = 125;
optional SPPParameter spp_param = 132;
optional SliceParameter slice_param = 126;
optional TanHParameter tanh_param = 127;
optional ThresholdParameter threshold_param = 128;
optional TileParameter tile_param = 138;
optional WindowDataParameter window_data_param = 129;
}
Then add the HeatmapDataParameter message after the LayerParameter message:
// VGG heatmap params (the parameters of our own layer)
message HeatmapDataParameter {
optional bool segmentation = 1000 [default = false];
optional uint32 multfact = 1001 [default = 1];
optional uint32 num_channels = 1002 [default = 3];
optional uint32 batchsize = 1003;
optional string root_img_dir = 1004;
optional bool random_crop = 1005;
// image augmentation type
optional bool sample_per_cluster = 1006;
// image sampling type
optional string labelinds = 1007 [default = ''];
// if specified, only use these regression variables
optional string source = 1008;
optional string meanfile = 1009;
optional string crop_meanfile = 1010;
optional uint32 cropsize = 1011 [default = 0];
optional uint32 outsize = 1012 [default = 0];
optional float scale = 1013 [ default = 1 ];
optional uint32 label_width = 1014 [ default = 1 ];
optional uint32 label_height = 1015 [ default = 1 ];
optional bool dont_flip_first = 1016 [ default = true ];
optional float angle_max = 1017 [ default = 0 ];
optional bool flip_joint_labels = 1018 [ default = true ];
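}
Explanation of each parameter:
segmentation: whether to apply segmentation; default false. The segmentation masks are assumed to be in the segs/ directory.
multfact: a joint coordinate in the ground truth multiplied by multfact gives the position in the image, and a position in the image divided by multfact gives the joint coordinate. The default is 1, meaning joint coordinates and image coordinates use the same scale.
num_channels: the number of image channels; default 3
batchsize: the batch size
root_img_dir: the root directory that holds the image files
random_crop: whether to crop the image randomly (true means random crop, otherwise center crop)
sample_per_cluster: the image sampling type (whether to sample uniformly over the clusters)
labelinds: label indices (if specified, only these regression variables are used)
source: the txt file that lists the file paths after shuffling
meanfile: the path of the mean file
crop_meanfile: the path of the mean file after cropping
cropsize: the crop size
outsize: default 0 (the size the cropped image is resized to; 0 means no resizing)
scale: default 1; after the preprocessing steps (mean subtraction, crop, resize) the pixel values are multiplied by this scale to give the final image
label_width: the width of the heatmap
label_height: the height of the heatmap
dont_flip_first: do not swap the position of the first joint; default true
angle_max: the maximum angle by which the image is rotated for data augmentation; default 0 degrees
flip_joint_labels: default true (on horizontal flips, swap the left and right joints)
There are also the visualisation settings used for testing: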
// Update the next available ID when you add a new LayerParameter field.
// LayerParameter next available layer-specific ID: 139 (last added: tile_param)
message LayerParameter {
optional string name = 1; // the layer name
optional string type = 2; // the layer type
repeated string bottom = 3; // the name of each bottom blob
repeated string top = 4; // the name of each top blob
// The train / test phase for computation.
optional Phase phase = 10;
// The amount of weight to assign each top blob in the objective.
// Each layer assigns a default value, usually of either 0 or 1,
// to each top blob.
repeated float loss_weight = 5;
// Specifies training parameters (multipliers on global learning constants,
// and the name and other settings used for weight sharing).
repeated ParamSpec param = 6;
// The blobs containing the numeric parameters of the layer.
repeated BlobProto blobs = 7;
// Specifies on which bottoms the backpropagation should be skipped.
// The size must be either 0 or equal to the number of bottoms.
repeated bool propagate_down = 11;
// Rules controlling whether and when a layer is included in the network,
// based on the current NetState.  You may specify a non-zero number of rules
// to include OR exclude, but not both.  If no include or exclude rules are
// specified, the layer is always included.  If the current NetState meets
// ANY (i.e., one or more) of the specified rules, the layer is
// included/excluded.
repeated NetStateRule include = 8;
repeated NetStateRule exclude = 9;
// Parameters for data pre-processing.
optional TransformationParameter transform_param = 100;
// Parameters shared by loss layers.
optional LossParameter loss_param = 101;
// Options to allow visualisation (the visualisation parameters)
optional bool visualise = 200 [ default = false ];
optional uint32 visualise_channel = 201 [ default = 0 ];
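One part not mentioned so far is V1LayerParameter. Add the two entries marked with my comments to it. This message is a great help when extending Caffe, although the upgrade_proto file that implements the upgrade is written in a style that differs from the rest of the code: it is nothing but if statements.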
// DEPRECATED: use LayerParameter.
message V1LayerParameter {
repeated string bottom = 2;
repeated string top = 3;
optional string name = 4;
repeated NetStateRule include = 32;
repeated NetStateRule exclude = 33;
enum LayerType {
ABSVAL = 35;
ACCURACY = 1;
ARGMAX = 30;
CONCAT = 3;
CONTRASTIVE_LOSS = 37;
CONVOLUTION = 4;
DATA_HEATMAP = 40; // added by us
DECONVOLUTION = 39;
DROPOUT = 6;
DUMMY_DATA = 32;
EUCLIDEAN_LOSS = 7;
ELTWISE = 25;
FLATTEN = 8;
HDF5_DATA = 9;
HDF5_OUTPUT = 10;
HINGE_LOSS = 28;
IM2COL = 11;
IMAGE_DATA = 12;
INFOGAIN_LOSS = 13;
INNER_PRODUCT = 14;
MEMORY_DATA = 29;
MULTINOMIAL_LOGISTIC_LOSS = 16;
POOLING = 17;
POWER = 26;
RELU = 18;
SIGMOID = 19;
SIGMOID_CROSS_ENTROPY_LOSS = 27;
SILENCE = 36;
SOFTMAX = 20;
SOFTMAX_LOSS = 21;
SPLIT = 22;
SLICE = 33;
TANH = 23;
WINDOW_DATA = 24;
THRESHOLD = 31;
}
optional LayerType type = 5;
repeated BlobProto blobs = 6;
repeated string param = 1001;
repeated DimCheckMode blob_share_mode = 1002;
enum DimCheckMode {
STRICT = 0;
PERMISSIVE = 1;
}
repeated float blobs_lr = 7;
repeated float weight_decay = 8;
repeated float loss_weight = 35;
optional AccuracyParameter accuracy_param = 27;
optional ArgMaxParameter argmax_param = 23;
optional ConcatParameter concat_param = 9;
optional ContrastiveLossParameter contrastive_loss_param = 40;
optional ConvolutionParameter convolution_param = 10;
optional DataParameter data_param = 11;
optional HeatmapDataParameter heatmap_data_param = 43; // parameters of our own layer
optional DropoutParameter dropout_param = 12;
optional DummyDataParameter dummy_data_param = 26;
optional EltwiseParameter eltwise_param = 24;
optional ExpParameter exp_param = 41;
optional HDF5DataParameter hdf5_data_param = 13;
optional HDF5OutputParameter hdf5_output_param = 14;
optional HingeLossParameter hinge_loss_param = 29;
optional ImageDataParameter image_data_param = 15;
optional InfogainLossParameter infogain_loss_param = 16;
optional InnerProductParameter inner_product_param = 17;
optional LRNParameter lrn_param = 18;
optional MemoryDataParameter memory_data_param = 22;
optional MVNParameter mvn_param = 34;
optional PoolingParameter pooling_param = 19;
optional PowerParameter power_param = 21;
optional ReLUParameter relu_param = 30;
optional SigmoidParameter sigmoid_param = 38;
optional SoftmaxParameter softmax_param = 39;
optional SliceParameter slice_param = 31;
optional TanHParameter tanh_param = 37;
optional ThresholdParameter threshold_param = 25;
optional WindowDataParameter window_data_param = 20;
optional TransformationParameter transform_param = 36;
optional LossParameter loss_param = 42;
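optional V0LayerParameter layer = 1;
}
2. With the parameters in place, next come the declaration and the implementation of the HeatmapData layer.
Before going through the implementation, we need to know what the training data looks like. Having seen the parameters, look at the format of the training data:
Here is a sample:
train/FILE.jpg 123,144,165,123,66,22 372.296,720,1,480,0.53333 0
The sample is explained below.
Fields are separated by spaces.
The first field is the image path: train/FILE.jpg
The second field is the joint coordinates: 123,144,165,123,66,22
The third field holds the crop and scale parameters, namely x_left,x_right,y_left,y_right,scaling_fact: 372.296,720,1,480,0.53333
Note: the crop coordinates in the third field actually refer to the mean image. The mean image is cropped, the crop is enlarged to the size of the original image, and the enlarged crop is then subtracted from the original image. This way there is nothing to worry about when the original image itself is cropped.
The fourth field is the cluster class, used when images are sampled uniformly over clusters during training: 0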
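To make the line format concrete, here is a minimal parsing sketch for one annotation line. The Annotation struct and the functions split_floats and parse_annotation are hypothetical names introduced for this example; the layer itself reads the file with its own code.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// One parsed annotation line (illustrative type, not part of the layer).
struct Annotation {
    std::string image_path;
    std::vector<float> joints;     // x1,y1,x2,y2,...
    std::vector<float> crop_info;  // x_left,x_right,y_left,y_right,scaling_fact
    int cluster;
};

// Split a comma-separated list of numbers into floats.
std::vector<float> split_floats(const std::string& csv) {
    std::vector<float> values;
    std::istringstream ss(csv);
    std::string item;
    while (std::getline(ss, item, ',')) {
        values.push_back(static_cast<float>(std::atof(item.c_str())));
    }
    return values;
}

// Parse "path joints crop cluster"; returns false if a field is missing.
bool parse_annotation(const std::string& line, Annotation* out) {
    assert(out != NULL);
    std::istringstream ss(line);
    std::string joints, crop, cluster;
    if (!(ss >> out->image_path >> joints >> crop >> cluster)) return false;
    out->joints = split_floats(joints);
    out->crop_info = split_floats(crop);
    out->cluster = std::atoi(cluster.c_str());
    return true;
}
```

Applied to the sample line, this yields six joint values (three joints), five crop values, and cluster class 0.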
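The crop settings in the configuration file:
transform_param {
  mirror: true
  crop_size: 227
  mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}
The above is the data layer definition of CaffeNet. It uses mirroring and crop_size, and it also defines a mean_file. With crop_size, training can crop both the central region of interest and the corner features, and mirror produces mirrored images, which helps make up for a small dataset. Someone asked about crop_size and mean_file in the git issues: at first you could not define a crop and use a mean_file at the same time, but this was later improved. Moreover, mean_file and crop_size have little to do with each other; it is enough that the mean_file was built from your training set. The data is presumably processed with the mean_file first, and the crop is applied afterwards. Loading the file ilsvrc_2012_mean.npy under python/caffe/ through the Python interface and printing its shape gives 3*256*256. This shows that the mean file was built from the original dataset and does not match the crop_size of 227, yet training is not affected. So you can build the mean_file from the original dataset first and then choose the crop size you want, without worrying about mismatched sizes.
The declaration, data_heatmap.hpp: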
// Copyright 2014 Tomas Pfister
#ifndef CAFFE_HEATMAP_HPP_
#define CAFFE_HEATMAP_HPP_

#include <string>
#include <utility>
#include <vector>
#include <opencv2/core/core.hpp>

#include "caffe/layer.hpp"
#include "caffe/common.hpp"
#include "caffe/data_transformer.hpp"
#include "caffe/filler.hpp"
#include "caffe/internal_thread.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe
{

// Derives from BasePrefetchingDataLayer
template<typename Dtype>
class DataHeatmapLayer: public BasePrefetchingDataLayer<Dtype>
{
public:
    explicit DataHeatmapLayer(const LayerParameter& param)
        : BasePrefetchingDataLayer<Dtype>(param) {}
    virtual ~DataHeatmapLayer();
    virtual void DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
                                const vector<Blob<Dtype>*>& top);
    virtual inline const char* type() const { return "DataHeatmap"; }
    virtual inline int ExactNumBottomBlobs() const { return 0; }
    virtual inline int ExactNumTopBlobs() const { return 2; }

protected:
    // Virtual function that actually loads one batch of data into the Batch
    virtual void load_batch(Batch<Dtype>* batch);

    // The functions below are our own helpers; all are called from load_batch
    // Filename of current image
    inline void GetCurImg(string& img_name, std::vector<float>& img_class, std::vector<float>& crop_info, int& cur_class);
    inline void AdvanceCurImg();
    // Visualise point annotations
    inline void VisualiseAnnotations(cv::Mat img_annotation_vis, int numChannels, std::vector<float>& cur_label, int width);
    // Random number generator
    inline float Uniform(const float min, const float max);
    // Rotate image for augmentation
    inline cv::Mat RotateImage(cv::Mat src, float rotation_angle);

    // Global vars
    shared_ptr<Caffe::RNG> rng_data_;
    shared_ptr<Caffe::RNG> prefetch_rng_;
    vector<std::pair<std::string, int> > lines_;
    int lines_id_;
    int datum_channels_;
    int datum_height_;
    int datum_width_;
    int datum_size_;
    int num_means_;
    int cur_class_;
    vector<int> labelinds_;
    // Vector of mean images, one per video
    vector<cv::Mat> mean_img_;
    // Whether the per-video mean is subtracted
    bool sub_mean_;  // true if the mean should be subtracted
    // Whether to sample uniformly within each class
    bool sample_per_cluster_; // sample separately per cluster?
    string root_img_dir_;
    // With sample_per_cluster_ enabled, this vector holds, for each class,
    // the index of an image drawn at random from that class.
    // Example: if class 1 contains 10 images, a random number in [0,9]
    // is generated and used as the index of the sampled image, and that
    // image is taken from class 1 for processing.
    // In effect this array maps a class to a random image index in that class.
    vector<int> cur_class_img_; // current class index
    // Index of the current image, used during processing
    int cur_img_; // current image index
    // Mapping from image index (numbered from 0) to class
    vector<int> img_idx_map_; // current image indices for each class

    // array of lists: one list of image names per class
    // This long type looks intimidating, so take it apart level by level:
    // the outermost level is a vector, so it is accessed by index;
    // the second level is also a vector, so it needs an index as well;
    // the third level is a pair: first is the image name (a string),
    // second is another pair;
    // in that fourth-level pair, first is a vector (the labels) accessed by
    // index, and second is again a pair;
    // in that fifth-level pair, first is a vector (the crop info) and
    // second is the int class value.
    vector< vector< pair<string, pair<vector<float>, pair<vector<float>, int> > > > > img_list_;

    // vector of (image, label) pairs
    // The outer level is a vector, so use an index;
    // the second level is a pair, so use first (the image name) or second;
    // the third level is a pair, so again use first or second:
    // first is a vector (the labels), accessed by index;
    // second is a pair of the crop info vector and the int class value.
    vector< pair<string, pair<vector<float>, pair<vector<float>, int> > > > img_label_list_;
};

}

#endif /* CAFFE_HEATMAP_HPP_ */
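Reaching into the nested pair type is easier to follow with named accessors. The sketch below assumes the nesting (image name, (labels, (crop info, class))) that the layer builds with std::make_pair; the typedef Sample and the four accessor functions are hypothetical and do not exist in the layer.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Same nesting as one element of img_label_list_:
// (image name, (labels, (crop info, cluster class)))
typedef std::pair<std::string,
        std::pair<std::vector<float>,
        std::pair<std::vector<float>, int> > > Sample;

const std::string& image_name(const Sample& s) { return s.first; }
const std::vector<float>& joint_labels(const Sample& s) { return s.second.first; }
const std::vector<float>& crop_info(const Sample& s) { return s.second.second.first; }
int cluster_class(const Sample& s) { return s.second.second.second; }
```

This is why the layer code reads the number of label values as img_label_list_[0].second.first.size().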
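Before going into the implementation in detail, here is the overall flow:
1) The setup function first reads the parameters from the proto. This gives the batch size, the width and height of the heatmap, the size of the crop, the size the cropped image is resized to, whether images are sampled per class, the root directory of the images, and so on.
It also reads, for every image, the file path, the joint coordinates, the crop position, and the sampling class.
If sampling is done per class, it also builds an array that maps each class index to an image index.
In addition it reads the mean of each video from a file and stores the means in a vector, so that the mean can be subtracted from the images when the data is loaded. Finally it sets the shapes of the top blobs.
2) The load_batch function is where the data is actually read and preprocessed. Preprocessing covers optional segmentation of the image, cropping the mean image, enlarging the cropped patch to the size of the image, and subtracting that cropped and enlarged per-video mean from the image. Experts say that subtracting the mean can improve results by about 3 points; I am not sure exactly why.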
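The per-video means sit in one flat buffer, and the indexing in the layer code is easy to misread. The sketch below assumes the layout used by that code: a header of 2 * num_means values (height and width of each mean), followed by the mean images stored in n, c, h, w order. The function mean_index is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <cstddef>

// Position of pixel (i, j), channel c, of mean image n in the flat buffer.
// The first 2 * num_means entries are the per-mean (height, width) header.
std::size_t mean_index(int num_means, int channels, int mean_height, int mean_width,
                       int n, int c, int i, int j) {
    assert(n >= 0 && n < num_means);
    assert(c >= 0 && c < channels);
    assert(i >= 0 && i < mean_height);
    assert(j >= 0 && j < mean_width);
    const std::size_t offset = 2 * static_cast<std::size_t>(num_means);
    return offset +
        ((static_cast<std::size_t>(n) * channels + c) * mean_height + i) * mean_width + j;
}
```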
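The implementation, the .cpp file:
// Copyright 2015 Tomas Pfister
#include <fstream>   // NOLINT(readability/streams)
#include <iostream>  // NOLINT(readability/streams)
#include <algorithm>
#include <cstdlib>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>

#include "caffe/layers/data_layer.hpp"
#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/util/rng.hpp"
#include "caffe/layers/data_heatmap.hpp"
#include "caffe/util/benchmark.hpp"

namespace caffe
{

template <typename Dtype>
DataHeatmapLayer<Dtype>::~DataHeatmapLayer() {
    this->StopInternalThread();
}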
// Read the settings from the parameter file and initialise the layer
template<typename Dtype>
void DataHeatmapLayer<Dtype>::DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
        const vector<Blob<Dtype>*>& top) {
    HeatmapDataParameter heatmap_data_param = this->layer_param_.heatmap_data_param();

    // Shortcuts
    // label index string (i.e. which joints to use?)
    const std::string labelindsStr = heatmap_data_param.labelinds();
    // batch size
    const int batchsize = heatmap_data_param.batchsize();
    // width of the heatmap
    const int label_width = heatmap_data_param.label_width();
    // height of the heatmap
    const int label_height = heatmap_data_param.label_height();
    // crop size
    const int size = heatmap_data_param.cropsize();
    // size after the crop has been resized
    const int outsize = heatmap_data_param.outsize();
    // batch size of the labels
    const int label_batchsize = batchsize;
    // whether to sample per cluster
    sample_per_cluster_ = heatmap_data_param.sample_per_cluster();
    // root path of the image files
    root_img_dir_ = heatmap_data_param.root_img_dir();

    // initialise rng seed
    const unsigned int rng_seed = caffe_rng_rand();
    srand(rng_seed);

    // get label inds to be used for training
    std::istringstream labelss(labelindsStr);
    LOG(INFO) << "using joint inds:";
    while (labelss)
    {
        std::string s;
        if (!std::getline(labelss, s, ',')) break;
        labelinds_.push_back(atof(s.c_str()));
        LOG(INFO) << atof(s.c_str());
    }

    // load GT
    // shuffle file
    // load the ground-truth file, i.e. the joint coordinate file
    std::string gt_path = heatmap_data_param.source();
    LOG(INFO) << "Loading annotation from " << gt_path;
    std::ifstream infile(gt_path.c_str());
    string img_name, labels, cropInfos, clusterClassStr;
    if (!sample_per_cluster_)  // not sampling at random by class
    {
        // sequential sampling
        // fields: file name, joint coordinates, crop position, cluster class
        while (infile >> img_name >> labels >> cropInfos >> clusterClassStr)
        {
            // read comma-separated list of regression labels (joint coordinates)
            std::vector<float> label;
            std::istringstream ss(labels);
            int labelCounter = 1;
            while (ss)
            {
                // read one number
                std::string s;
                if (!std::getline(ss, s, ',')) break;
                // keep the value if labelinds_ is empty or contains this index
                if (labelinds_.empty() || std::find(labelinds_.begin(), labelinds_.end(), labelCounter) != labelinds_.end())
                {
                    label.push_back(atof(s.c_str()));
                }
                labelCounter++;  // count of values read
            }

            // read cropping info
            std::vector<float> cropInfo;
            std::istringstream ss2(cropInfos);
            while (ss2)
            {
                std::string s;
                if (!std::getline(ss2, s, ',')) break;
                cropInfo.push_back(atof(s.c_str()));
            }

            int clusterClass = atoi(clusterClassStr.c_str());
            // image path, joint coordinates, crop info, class
            img_label_list_.push_back(std::make_pair(img_name, std::make_pair(label, std::make_pair(cropInfo, clusterClass))));
        }

        // initialise image counter to 0
        cur_img_ = 0;
    }
    else
    {
        // uniform sampling w.r.t. classes
        // The images fall into several classes, each holding several images,
        // and one image is picked at random from a class.
        while (infile >> img_name >> labels >> cropInfos >> clusterClassStr)
        {
            // the class given in the annotation file
            int clusterClass = atoi(clusterClassStr.c_str());
            if (clusterClass + 1 > img_list_.size())
            {
                // expand the array
                img_list_.resize(clusterClass + 1);
            }

            // read comma-separated list of regression labels
            // (the joint coordinates) into the vector label
            std::vector<float> label;
            std::istringstream ss(labels);
            int labelCounter = 1;
            while (ss)
            {
                std::string s;
                if (!std::getline(ss, s, ',')) break;
                if (labelinds_.empty() || std::find(labelinds_.begin(), labelinds_.end(), labelCounter) != labelinds_.end())
                {
                    label.push_back(atof(s.c_str()));
                }
                labelCounter++;
            }

            // read cropping info into the vector cropInfo
            std::vector<float> cropInfo;
            std::istringstream ss2(cropInfos);
            while (ss2)
            {
                std::string s;
                if (!std::getline(ss2, s, ',')) break;
                cropInfo.push_back(atof(s.c_str()));
            }

            // each clusterClass has its own vector holding its images
            img_list_[clusterClass].push_back(std::make_pair(img_name, std::make_pair(label, std::make_pair(cropInfo, clusterClass))));
        }  // end of while

        // number of image classes
        const int num_classes = img_list_.size();

        // init image sampling
        cur_class_ = 0;
        // cur_class_img_ holds the index of the image picked at random in each class
        cur_class_img_.resize(num_classes);

        // init image indices for each class
        for (int idx_class = 0; idx_class < num_classes; idx_class++)
        {
            // pick a random image within the class?
            if (sample_per_cluster_)
            {
                // img_list_[idx_class].size() is the number of images in this class;
                // draw a random index in [0, number of images in the class)
                cur_class_img_[idx_class] = rand() % img_list_[idx_class].size();
                // log the number of images in the class
                LOG(INFO) << idx_class << " size: " << img_list_[idx_class].size();
            }
            else
            {
                cur_class_img_[idx_class] = 0;
            }
        }
    }

    if (!heatmap_data_param.has_meanfile())  // is there a mean file?
    {
        // if no mean, assume input images are RGB (3 channels)
        this->datum_channels_ = 3;
        sub_mean_ = false;
    }
    else
    {
        // Implementation of per-video mean removal
        // This whole block reads the per-video mean file into cv::Mat structures
        sub_mean_ = true;
        // path of the mean file, taken from the parameters
        string mean_path = heatmap_data_param.meanfile();
        LOG(INFO) << "Loading mean file from " << mean_path;
        BlobProto blob_proto, blob_proto2;
        Blob<Dtype> data_mean;
        // read into a BlobProto, then convert it to data_mean
        ReadProtoFromBinaryFile(mean_path.c_str(), &blob_proto);
        data_mean.FromProto(blob_proto);
        LOG(INFO) << "mean file loaded";

        // read config
        this->datum_channels_ = data_mean.channels();
        // number of means: one mean per video
        num_means_ = data_mean.num();
        LOG(INFO) << "num_means: " << num_means_;

        // copy the per-video mean images to an array of OpenCV structures
        const Dtype* mean_buf = data_mean.cpu_data();

        // extract means from beginning of proto file
        // height of the images in the mean file
        const int mean_height = data_mean.height();
        // width of the images in the mean file
        const int mean_width = data_mean.width();
        // array of heights
        int mean_heights[num_means_];
        // array of widths
        int mean_widths[num_means_];

        // offset in memory to mean images
        const int meanOffset = 2 * (num_means_);
        for (int n = 0; n < num_means_; n++)
        {
            mean_heights[n] = mean_buf[2 * n];
            mean_widths[n] = mean_buf[2 * n + 1];
        }

        // save means as OpenCV-compatible files
        // Store the blob read from the binary proto file in cv::Mat objects.
        // mean_img_ ends up holding the means of all videos.
        // First allocate the memory:
        for (int n = 0; n < num_means_; n++)
        {
            cv::Mat mean_img_tmp_;
            mean_img_tmp_.create(mean_heights[n], mean_widths[n], CV_32FC3);
            mean_img_.push_back(mean_img_tmp_);
            LOG(INFO) << "per-video mean file array created: " << n << ": " << mean_heights[n] << "x" << mean_widths[n] << " (" << size << ")";
        }
        LOG(INFO) << "mean: " << mean_height << "x" << mean_width << " (" << size << ")";

        // Then copy the actual values:
        for (int n = 0; n < num_means_; n++)
        {
            for (int i = 0; i < mean_heights[n]; i++)
            {
                for (int j = 0; j < mean_widths[n]; j++)
                {
                    for (int c = 0; c < this->datum_channels_; c++)
                    {
                        mean_img_[n].at<cv::Vec3f>(i, j)[c] = mean_buf[meanOffset + ((n * this->datum_channels_ + c) * mean_height + i) * mean_width + j]; //[c * mean_height * mean_width + i * mean_width + j];
                    }
                }
            }
        }
        LOG(INFO) << "mean file converted to OpenCV structures";
    }

    // init data
    // reshape the data blobs
    this->transformed_data_.Reshape(batchsize, this->datum_channels_, outsize, outsize);
    top[0]->Reshape(batchsize, this->datum_channels_, outsize, outsize);
    for (int i = 0; i < this->PREFETCH_COUNT; ++i)
        this->prefetch_[i].data_.Reshape(batchsize, this->datum_channels_, outsize, outsize);
    this->datum_size_ = this->datum_channels_ * outsize * outsize;

    // init label
    int label_num_channels;
    if (!sample_per_cluster_)  // not sampling uniformly by class
        // number of joint coordinate values (the count of numbers, not of
        // coordinates; divide by 2 to get the number of coordinates)
        label_num_channels = img_label_list_[0].second.first.size();
    else  // sampling uniformly by class
        // number of joint values of image 0 in class 0
        label_num_channels = img_list_[0][0].second.first.size();
    label_num_channels /= 2;  // number of joints

    // Set the outputs to the matching sizes:
    // top[0] holds batchsize images
    // top[1] holds batchsize heatmaps (one channel per joint)
    // label batch size, number of joints as channels, heatmap height, heatmap width
    top[1]->Reshape(label_batchsize, label_num_channels, label_height, label_width);
    for (int i = 0; i < this->PREFETCH_COUNT; ++i)
        this->prefetch_[i].label_.Reshape(label_batchsize, label_num_channels, label_height, label_width);

    LOG(INFO) << "output data size: " << top[0]->num() << "," << top[0]->channels() << "," << top[0]->height() << "," << top[0]->width();
    LOG(INFO) << "output label size: " << top[1]->num() << "," << top[1]->channels() << "," << top[1]->height() << "," << top[1]->width();
    LOG(INFO) << "number of label channels: " << label_num_channels;
    LOG(INFO) << "datum channels: " << this->datum_channels_;
}
// Using the information gathered during setup, read the actual image files and
// joint positions, then convert the joint positions into heatmap labels.
template<typename Dtype>
void DataHeatmapLayer<Dtype>::load_batch(Batch<Dtype>* batch) {
    CPUTimer batch_timer;
    batch_timer.Start();
    CHECK(batch->data_.count());
    HeatmapDataParameter heatmap_data_param = this->layer_param_.heatmap_data_param();
    // Pointers to blobs' float data (image data and labels)
    Dtype* top_data = batch->data_.mutable_cpu_data();
    Dtype* top_label = batch->label_.mutable_cpu_data();
    cv::Mat img, img_res, img_annotation_vis, img_mean_vis, img_vis, img_res_vis, mean_img_this, seg, segTmp;
    // Shortcuts to params
    // Whether to display the loaded images (for debugging)
    const bool visualise = this->layer_param_.visualise();
    // Factor the final pixel values are multiplied by
    const Dtype scale = heatmap_data_param.scale();
    // Number of images read per batch
    const int batchsize = heatmap_data_param.batchsize();
    // Heatmap height
    const int label_height = heatmap_data_param.label_height();
    // Heatmap width
    const int label_width = heatmap_data_param.label_width();
    // Maximum rotation angle, in degrees
    const float angle_max = heatmap_data_param.angle_max();
    // Whether to leave the first joint (e.g. the head) out of the left/right swap
    const bool dont_flip_first = heatmap_data_param.dont_flip_first();
    // Whether to swap left/right joint labels when the image is mirrored
    const bool flip_joint_labels = heatmap_data_param.flip_joint_labels();
    // Joint coordinates are multiplied by multfact to obtain image pixels
    const int multfact = heatmap_data_param.multfact();
    // Whether to apply a segmentation mask to the image
    const bool segmentation = heatmap_data_param.segmentation();
    // Side length of the crop
    const int size = heatmap_data_param.cropsize();
    // The crop is resized to outsize x outsize
    const int outsize = heatmap_data_param.outsize();
    const int num_aug = 1;
    // Resize factor from crop to output
    const float resizeFact = (float)outsize / (float)size;
    // Whether to crop at a random position
    const bool random_crop = heatmap_data_param.random_crop();
    // Shortcuts to global vars
    const bool sub_mean = this->sub_mean_;
    const int channels = this->datum_channels_;
    // What coordinates should we flip when mirroring images?
    // For pose estimation with joints assumes i=0,1 are for head, and i=2,3 left wrist, i=4,5 right wrist etc
    // in which case dont_flip_first should be set to true.
    int flip_start_ind;
    if (dont_flip_first) flip_start_ind = 2;
    else flip_start_ind = 0;
    if (visualise)
    {
        cv::namedWindow("original image", cv::WINDOW_AUTOSIZE);
        cv::namedWindow("cropped image", cv::WINDOW_AUTOSIZE);
        cv::namedWindow("interim resize image", cv::WINDOW_AUTOSIZE);
        cv::namedWindow("resulting image", cv::WINDOW_AUTOSIZE);
    }
    // collect "batchsize" images
    std::vector<float> cur_label, cur_cropinfo;
    std::string img_name;
    int cur_class;  // filled in by GetCurImg below
    // loop over non-augmented images
    // (fetch batchsize images and preprocess each one)
    for (int idx_img = 0; idx_img < batchsize; idx_img++)
    {
        // get image name and class
        // (file name, joint labels, crop info and class)
        this->GetCurImg(img_name, cur_label, cur_cropinfo, cur_class);
        // get number of channels for image label
        // This is the number of coordinate values, i.e. twice the number of joints.
        int label_num_channels = cur_label.size();
        // Join the root directory and the file name, then read the image into img.
        std::string img_path = this->root_img_dir_ + img_name;
        DLOG(INFO) << "img: " << img_path;
        img = cv::imread(img_path, CV_LOAD_IMAGE_COLOR);
        // show image
        if (visualise)
        {
            img_annotation_vis = img.clone();
            this->VisualiseAnnotations(img_annotation_vis, label_num_channels, cur_label, multfact);
            cv::imshow("original image", img_annotation_vis);
        }
        // use if seg exists
        // Segmentation masks are stored in the segs directory and read into seg.
        if (segmentation)
        {
            std::string seg_path = this->root_img_dir_ + "segs/" + img_name;
            std::ifstream ifile(seg_path.c_str());
            // Skip this file if segmentation doesn't exist
            if (!ifile.good())
            {
                LOG(INFO) << "file " << seg_path << " does not exist!";
                idx_img--;
                this->AdvanceCurImg();
                continue;
            }
            ifile.close();
            seg = cv::imread(seg_path, CV_LOAD_IMAGE_GRAYSCALE);
        }
        int width = img.cols;
        int height = img.rows;
        // size is the crop size.
        // If the crop is wider than the image, x_border is negative and the image is padded further down.
        int x_border = width - size;
        int y_border = height - size;
        // convert from BGR to RGB
        cv::cvtColor(img, img, CV_BGR2RGB);
        // to float
        img.convertTo(img, CV_32FC3);
        if (segmentation)
        {
            segTmp = cv::Mat::zeros(img.rows, img.cols, CV_32FC3);
            int threshold = 40;
            // Build the binary segmentation mask.
            seg = (seg > threshold);
            // Apply the mask: pixels where the mask is set are overwritten with zeros.
            segTmp.copyTo(img, seg);
        }
        if (visualise)
            img_vis = img.clone();
        // subtract per-video mean if used
        int meanInd = 0;
        if (sub_mean)
        {
            // This reveals the naming rule for videos: each video is a directory
            // whose name is a number. For a path such as images/1/xxx.jpg the
            // parsed number is 1, and subtracting 1 gives the zero-based index
            // of the matching mean image in mean_img_.
            std::string delimiter = "/";
            std::string img_name_subdirImg = img_name.substr(img_name.find(delimiter) + 1, img_name.length());
            std::string meanIndStr = img_name_subdirImg.substr(0, img_name_subdirImg.find(delimiter));
            meanInd = atoi(meanIndStr.c_str()) - 1;
            // subtract the cropped mean
            mean_img_this = this->mean_img_[meanInd].clone();
            DLOG(INFO) << "Image size: " << width << "x" << height;
            DLOG(INFO) << "Crop info: " << cur_cropinfo[0] << " " <<
                cur_cropinfo[1] << " " << cur_cropinfo[2] << " " << cur_cropinfo[3] << " " << cur_cropinfo[4];
            DLOG(INFO) << "Crop info after: " << cur_cropinfo[0] << " " <<
                cur_cropinfo[1] << " " << cur_cropinfo[2] << " " << cur_cropinfo[3] << " " << cur_cropinfo[4];
            DLOG(INFO) << "Mean image size: " << mean_img_this.cols << "x" << mean_img_this.rows;
            DLOG(INFO) << "Cropping: " << cur_cropinfo[0] - 1 << " " << cur_cropinfo[2] - 1 << " " << width << " " << height;
            // crop and resize mean image
            // cur_cropinfo stores x_left, x_right, y_left, y_right, whereas
            // cv::Rect takes x, y, w, h, so the values have to be converted.
            cv::Rect crop(cur_cropinfo[0] - 1, cur_cropinfo[2] - 1, cur_cropinfo[1] - cur_cropinfo[0], cur_cropinfo[3] - cur_cropinfo[2]);
            mean_img_this = mean_img_this(crop);  // this performs the crop
            cv::resize(mean_img_this, mean_img_this, img.size());
            DLOG(INFO) << "Cropped mean image.";
            // Subtract the mean image after it has been cropped and resized to the
            // size of the input image. (The author of this walkthrough was unsure
            // of the reasoning behind this step.)
            img -= mean_img_this;
            DLOG(INFO) << "Subtracted mean image.";
            if (visualise)
            {
                img_vis -= mean_img_this;
                img_mean_vis = mean_img_this.clone() / 255;
                cv::cvtColor(img_mean_vis, img_mean_vis, CV_RGB2BGR);
                cv::imshow("mean image", img_mean_vis);
            }
        }
        // pad images that aren't wide enough
        // If the crop is wider than the image, pad the right-hand side of the image.
        if (x_border < 0)
        {
            DLOG(INFO) << "padding " << img_path << " -- not wide enough.";
            // Prototype for reference:
            // void copyMakeBorder( const Mat& src, Mat& dst,
            //                      int top, int bottom, int left, int right,
            //                      int borderType, const Scalar& value=Scalar() );
            cv::copyMakeBorder(img, img, 0, 0, 0, -x_border, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
            width = img.cols;
            x_border = width - size;
            // add border offset to joints
            // Only the x coordinates are touched, hence i += 2. Since x_border
            // was recomputed just above, it is 0 here when the padded width equals size.
            for (int i = 0; i < label_num_channels; i += 2)
                cur_label[i] = cur_label[i] + x_border;
            DLOG(INFO) << "new width: " << width << "   x_border: " << x_border;
            if (visualise)  // show the padded image
            {
                img_vis = img.clone();
                cv::copyMakeBorder(img_vis, img_vis, 0, 0, 0, -x_border, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
            }
        }
        DLOG(INFO) << "Entering jitter loop.";
        // loop over the jittered versions
        // (the joint positions are turned into heatmaps at the end of each iteration)
        for (int idx_aug = 0; idx_aug < num_aug; idx_aug++)
        {
            // augmented image index in the resulting batch
            const int idx_img_aug = idx_img * num_aug + idx_aug;
            // Start from the joint coordinates read from the file. The image is
            // cropped and then resized below, so the joint coordinates must be
            // cropped and scaled in the same way; the result is kept in cur_label_aug.
            std::vector<float> cur_label_aug = cur_label;
            if (random_crop)
            {
                // random sampling
                DLOG(INFO) << "random crop sampling";
                // horizontal flip, applied at random with probability 1/2
                if (rand() % 2)
                {
                    // A positive flipCode (1 here) flips around the y-axis, i.e. horizontally.
                    cv::flip(img, img, 1);
                    if (visualise)
                        cv::flip(img_vis, img_vis, 1);
                    // "flip" annotation coordinates
                    for (int i = 0; i < label_num_channels; i += 2)
                        // width is the image width in pixels, so width / multfact is the width
                        // in joint units; subtracting the joint's x from it gives the mirrored x.
                        cur_label_aug[i] = (float)width / (float)multfact - cur_label_aug[i];
                    // "flip" annotation joint numbers
                    // assumes i=0,1 are for head, and i=2,3 left wrist, i=4,5 right wrist etc
                    // where coordinates are (x,y)
                    // (the left and right joints swap places in the label array as well)
                    if (flip_joint_labels)
                    {
                        float tmp_x, tmp_y;
                        for (int i = flip_start_ind; i < label_num_channels; i += 4)
                        {
                            CHECK_LT(i + 3, label_num_channels);
                            tmp_x = cur_label_aug[i];
                            tmp_y = cur_label_aug[i + 1];
                            cur_label_aug[i] = cur_label_aug[i + 2];
                            cur_label_aug[i + 1] = cur_label_aug[i + 3];
                            cur_label_aug[i + 2] = tmp_x;
                            cur_label_aug[i + 3] = tmp_y;
                        }
                    }
                }
                // left-top coordinates of the crop [0;x_border] x [0;y_border]
                int x0 = 0, y0 = 0;
                x0 = rand() % (x_border + 1);
                y0 = rand() % (y_border + 1);
                // do crop
                cv::Rect crop(x0, y0, size, size);
                // NOTE: no full copy performed, so the original image buffer is affected by the transformations below
                // (img_crop shares memory with img, so changes made through img_crop also show up in img)
                cv::Mat img_crop(img, crop);
                // "crop" annotations
                // Open question from the author: what happens when a joint falls outside the crop?
                for (int i = 0; i < label_num_channels; i += 2)
                {
                    cur_label_aug[i] -= (float)x0 / (float)multfact;
                    cur_label_aug[i + 1] -= (float)y0 / (float)multfact;
                }
                // show image
                if (visualise)
                {
                    DLOG(INFO) << "cropped image";
                    cv::Mat img_vis_crop(img_vis, crop);
                    cv::Mat img_res_vis = img_vis_crop / 255;
                    cv::cvtColor(img_res_vis, img_res_vis, CV_RGB2BGR);
                    this->VisualiseAnnotations(img_res_vis, label_num_channels, cur_label_aug, multfact);
                    cv::imshow("cropped image", img_res_vis);
                }
                // rotations
                // Rotate the image by an angle drawn uniformly from [-angle_max, angle_max].
                float angle = Uniform(-angle_max, angle_max);
                cv::Mat M = this->RotateImage(img_crop, angle);
                // also flip & rotate labels
                // (visit every joint coordinate pair)
                for (int i = 0; i < label_num_channels; i += 2)
                {
                    // convert to image space
                    float x = cur_label_aug[i] * (float)multfact;
                    float y = cur_label_aug[i + 1] * (float)multfact;
                    // apply the 2x3 rotation matrix (cv::getRotationMatrix2D returns doubles)
                    cur_label_aug[i] = M.at<double>(0, 0) * x + M.at<double>(0, 1) * y + M.at<double>(0, 2);
                    cur_label_aug[i + 1] = M.at<double>(1, 0) * x + M.at<double>(1, 1) * y + M.at<double>(1, 2);
                    // convert back to joint space
                    cur_label_aug[i] /= (float)multfact;
                    cur_label_aug[i + 1] /= (float)multfact;
                }
                img_res = img_crop;
            } else {  // centre crop (the crop is taken from the middle of the image)
                // deterministic sampling
                DLOG(INFO) << "deterministic crop sampling (centre)";
                // centre crop
                const int y0 = y_border / 2;
                const int x0 = x_border / 2;
                DLOG(INFO) << "cropping image from " << x0 << "x" << y0;
                // do crop
                cv::Rect crop(x0, y0, size, size);
                cv::Mat img_crop(img, crop);
                DLOG(INFO) << "cropping annotations.";
                // "crop" annotations
                // The joint annotations have to be cropped along with the image.
                for (int i = 0; i < label_num_channels; i += 2)
                {
                    // Divide the offset by multfact to express it in joint units, then subtract it.
                    // Open question from the author: what if no joint is left inside the crop?
                    cur_label_aug[i] -= (float)x0 / (float)multfact;
                    cur_label_aug[i + 1] -= (float)y0 / (float)multfact;
                }
                if (visualise)
                {
                    cv::Mat img_vis_crop(img_vis, crop);
                    cv::Mat img_res_vis = img_vis_crop.clone() / 255;
                    cv::cvtColor(img_res_vis, img_res_vis, CV_RGB2BGR);
                    this->VisualiseAnnotations(img_res_vis, label_num_channels, cur_label_aug, multfact);
                    cv::imshow("cropped image", img_res_vis);
                }
                img_res = img_crop;
            }  // end of else
            // show image
            if (visualise)
            {
                cv::Mat img_res_vis = img_res / 255;
                cv::cvtColor(img_res_vis, img_res_vis, CV_RGB2BGR);
                this->VisualiseAnnotations(img_res_vis, label_num_channels, cur_label_aug, multfact);
                cv::imshow("interim resize image", img_res_vis);
            }
            DLOG(INFO) << "Resizing output image.";
            // resize to output image size
            cv::Size s(outsize, outsize);
            cv::resize(img_res, img_res, s);
            // "resize" annotations
            // The image was rescaled, so the joint coordinates are rescaled by the same factor.
            for (int i = 0; i < label_num_channels; i++)
                cur_label_aug[i] *= resizeFact;
            // show image
            if (visualise)
            {
                cv::Mat img_res_vis = img_res / 255;
                cv::cvtColor(img_res_vis, img_res_vis, CV_RGB2BGR);
                this->VisualiseAnnotations(img_res_vis, label_num_channels, cur_label_aug, multfact);
                cv::imshow("resulting image", img_res_vis);
            }
            // show image
            if (visualise && sub_mean)
            {
                cv::Mat img_res_meansub_vis = img_res / 255;
                cv::cvtColor(img_res_meansub_vis, img_res_meansub_vis, CV_RGB2BGR);
                cv::imshow("mean-removed image", img_res_meansub_vis);
            }
            // multiply by scale
            // After mean subtraction, cropping and resizing, the pixel values are
            // multiplied by scale to give the final image.
            if (scale != 1.0)
                img_res *= scale;
            // resulting image dims
            const int channel_size = outsize * outsize;
            const int img_size = channel_size * channels;
            // store image data
            // (copy the processed image into top_data in channel-major order)
            DLOG(INFO) << "storing image";
            for (int c = 0; c < channels; c++)
                for (int i = 0; i < outsize; i++)
                    for (int j = 0; j < outsize; j++)
                        top_data[idx_img_aug * img_size + c * channel_size + i * outsize + j] = img_res.at<cv::Vec3f>(i, j)[c];
            // store label as gaussian
            // (each joint becomes one heatmap channel containing a Gaussian blob)
            DLOG(INFO) << "storing labels";
            const int label_channel_size = label_height * label_width;
            const int label_img_size = label_channel_size * label_num_channels / 2;
            cv::Mat dataMatrix = cv::Mat::zeros(label_height, label_width, CV_32FC1);
            float label_resize_fact = (float) label_height / (float) outsize;
            float sigma = 1.5;
            for (int idx_ch = 0; idx_ch < label_num_channels / 2; idx_ch++)
            {
                // Multiply by multfact to go from joint units to pixels of the outsize image,
                // then by label_resize_fact to go from the outsize image to the heatmap resolution.
                float x = label_resize_fact * cur_label_aug[2 * idx_ch] * multfact;
                float y = label_resize_fact * cur_label_aug[2 * idx_ch + 1] * multfact;
                for (int i = 0; i < label_height; i++)
                    for (int j = 0; j < label_width; j++)
                    {
                        // Flat index into top_label. The row stride is label_height, as in
                        // the source, which matches label_width only for square heatmaps.
                        int label_idx = idx_img_aug * label_img_size + idx_ch * label_channel_size + i * label_height + j;
                        float gaussian = ( 1 / ( sigma * sqrt(2 * M_PI) ) ) * exp( -0.5 * ( pow(i - y, 2.0) + pow(j - x, 2.0) ) * pow(1 / sigma, 2.0) );
                        gaussian = 4 * gaussian;
                        // store into top_label
                        top_label[label_idx] = gaussian;
                        if (idx_ch == 0)
                            dataMatrix.at<float>((int)j, (int)i) = gaussian;
                    }
            }
        } // jittered versions loop
        DLOG(INFO) << "next image";
        // move to the next image
        // (AdvanceCurImg advances the "current image" cursor)
        this->AdvanceCurImg();
        if (visualise)
            cv::waitKey(0);
    } // original image loop
    batch_timer.Stop();
    DLOG(INFO) << "Prefetch batch: " << batch_timer.MilliSeconds() << " ms.";
}
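The label step at the end of load_batch can be tried out without Caffe or OpenCV. The following is a minimal sketch, not the layer's code: the function name RenderJointHeatmap is made up for illustration, and unlike the listing it uses the heatmap width as the row stride, so it also works for non-square heatmaps. The factor 4 and the Gaussian normalisation follow the listing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: renders one joint as a Gaussian blob on a
// height x width heatmap stored row-major. (x, y) is the joint position
// in heatmap pixels. The factor 4 mirrors the scaling in the listing.
std::vector<float> RenderJointHeatmap(int height, int width,
                                      float x, float y, float sigma) {
    assert(height > 0 && width > 0 && sigma > 0.0f);
    const float kPi = 3.14159265358979323846f;
    const float norm = 1.0f / (sigma * std::sqrt(2.0f * kPi));
    std::vector<float> heatmap(static_cast<std::size_t>(height) * width);
    for (int i = 0; i < height; ++i) {      // row index (y direction)
        for (int j = 0; j < width; ++j) {   // column index (x direction)
            const float d2 = (i - y) * (i - y) + (j - x) * (j - x);
            heatmap[static_cast<std::size_t>(i) * width + j] =
                4.0f * norm * std::exp(-0.5f * d2 / (sigma * sigma));
        }
    }
    return heatmap;
}
```

Calling RenderJointHeatmap(64, 64, x, y, 1.5f) once per joint and concatenating the results gives the same channel layout that the layer writes into top_label for one image.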
// Get the path, joint labels, crop info and class of the current image.
template<typename Dtype>
void DataHeatmapLayer<Dtype>::GetCurImg(string& img_name, std::vector<float>& img_label, std::vector<float>& crop_info, int& img_class)
{
    if (!sample_per_cluster_)
    {
        img_name = img_label_list_[cur_img_].first;
        img_label = img_label_list_[cur_img_].second.first;
        crop_info = img_label_list_[cur_img_].second.second.first;
        img_class = img_label_list_[cur_img_].second.second.second;
    }
    else
    {
        img_class = cur_class_;
        // Note that cur_class_img_ is used here. For each class it holds an image
        // index that was initialised in SetUp to a random number in
        // [0, number of images in that class - 1].
        img_name = img_list_[img_class][cur_class_img_[img_class]].first;
        img_label = img_list_[img_class][cur_class_img_[img_class]].second.first;
        crop_info = img_list_[img_class][cur_class_img_[img_class]].second.second.first;
    }
}
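The chains of .first and .second in GetCurImg are easier to follow once the record type is spelled out. The sketch below assumes each record is a nested std::pair of (image name, (joint labels, (crop info, class id))), which is what the accessors suggest; the names ImageRecord, CurrentImage and Unpack are hypothetical and do not appear in the layer.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Assumed record layout, inferred from the accessor chains:
// (image name, (joint coordinates, (crop info, class id))).
typedef std::pair<std::string,
        std::pair<std::vector<float>,
        std::pair<std::vector<float>, int> > > ImageRecord;

// Hypothetical flat view of one record, with named fields.
struct CurrentImage {
    std::string name;
    std::vector<float> label;      // x0, y0, x1, y1, ...
    std::vector<float> crop_info;
    int img_class;
};

// Unpacks the nested pairs into named fields.
CurrentImage Unpack(const ImageRecord& rec) {
    assert(rec.second.first.size() % 2 == 0);  // labels come in (x, y) pairs
    CurrentImage cur;
    cur.name = rec.first;
    cur.label = rec.second.first;
    cur.crop_info = rec.second.second.first;
    cur.img_class = rec.second.second.second;
    return cur;
}
```

A small struct like CurrentImage is one way to avoid the positional accessors; whether the layer should be refactored that way is a separate design decision.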
// Advancing simply moves the index forward, wrapping around at the end.
template<typename Dtype>
void DataHeatmapLayer<Dtype>::AdvanceCurImg()
{
    if (!sample_per_cluster_)
    {
        if (cur_img_ < img_label_list_.size() - 1)
            cur_img_++;
        else
            cur_img_ = 0;
    }
    else
    {
        const int num_classes = img_list_.size();
        if (cur_class_img_[cur_class_] < img_list_[cur_class_].size() - 1)
            cur_class_img_[cur_class_]++;
        else
            cur_class_img_[cur_class_] = 0;
        // move to the next class
        if (cur_class_ < num_classes - 1)
            cur_class_++;
        else
            cur_class_ = 0;
    }
}
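The per-cluster branch of AdvanceCurImg is a round-robin over classes with a separate wrapping position inside each class. A minimal sketch of that idea follows, assuming every cluster is non-empty; the class name ClusterCursor is hypothetical, and positions start at 0 here, whereas the walkthrough says the layer initialises them randomly in SetUp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: visits clusters in round-robin order while keeping
// an independent, wrapping position inside each cluster.
class ClusterCursor {
 public:
    explicit ClusterCursor(const std::vector<std::size_t>& cluster_sizes)
        : sizes_(cluster_sizes), pos_(cluster_sizes.size(), 0), cluster_(0) {
        assert(!sizes_.empty());
        for (std::size_t k = 0; k < sizes_.size(); ++k)
            assert(sizes_[k] > 0);  // empty clusters are not supported
    }

    std::size_t cluster() const { return cluster_; }
    std::size_t index() const { return pos_[cluster_]; }

    // Step the position inside the current cluster, then move to the next
    // cluster. The modulo takes the place of the if/else wrap-around.
    void Advance() {
        pos_[cluster_] = (pos_[cluster_] + 1) % sizes_[cluster_];
        cluster_ = (cluster_ + 1) % sizes_.size();
    }

 private:
    std::vector<std::size_t> sizes_;
    std::vector<std::size_t> pos_;
    std::size_t cluster_;
};
```

Using the modulo also sidesteps the unsigned size() - 1 expression in the listing, which would wrap around to a very large value for an empty list.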
// Visualise the joint positions.
template<typename Dtype>
void DataHeatmapLayer<Dtype>::VisualiseAnnotations(cv::Mat img_annotation_vis, int label_num_channels, std::vector<float>& img_class, int multfact)
{
    // One colour per joint; this table supports at most 8 joints.
    const static cv::Scalar colors[] = {
        CV_RGB(0, 0, 255),
        CV_RGB(0, 128, 255),
        CV_RGB(0, 255, 255),
        CV_RGB(0, 255, 0),
        CV_RGB(255, 128, 0),
        CV_RGB(255, 255, 0),
        CV_RGB(255, 0, 0),
        CV_RGB(255, 0, 255)
    };
    int numCoordinates = int(label_num_channels / 2);
    // Collect the joint positions in centers.
    // (std::vector replaces the variable-length array, which is not standard C++.)
    std::vector<cv::Point> centers(numCoordinates);
    for (int i = 0; i < label_num_channels; i += 2)
    {
        int coordInd = int(i / 2);
        centers[coordInd] = cv::Point(img_class[i] * multfact, img_class[i + 1] * multfact);
        // Draw a small circle on each joint.
        cv::circle(img_annotation_vis, centers[coordInd], 1, colors[coordInd], 3);
    }
    // connecting lines
    // Joints 1, 3, 5 form one arm and joints 2, 4, 6 form the other,
    // so at least 7 joints are expected here.
    cv::line(img_annotation_vis, centers[1], centers[3], CV_RGB(0, 255, 0), 1, CV_AA);
    cv::line(img_annotation_vis, centers[2], centers[4], CV_RGB(255, 255, 0), 1, CV_AA);
    cv::line(img_annotation_vis, centers[3], centers[5], CV_RGB(0, 0, 255), 1, CV_AA);
    cv::line(img_annotation_vis, centers[4], centers[6], CV_RGB(0, 255, 255), 1, CV_AA);
}
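The arm layout used for drawing (joints 1, 3, 5 versus 2, 4, 6) is the same left/right pairing that load_batch relies on when it mirrors an image. The sketch below isolates that bookkeeping on a plain vector; the function name MirrorJointLabels is hypothetical, and where the layer aborts via CHECK_LT on an incomplete pair, this version simply stops.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: updates joint labels for a horizontally mirrored image.
// labels holds (x, y) pairs as labels[2k], labels[2k + 1], in joint units.
// flip_start is the first array index that belongs to a left/right pair
// (2 when joint 0 is the head and must stay in place).
void MirrorJointLabels(std::vector<float>& labels, float image_width,
                       float multfact, std::size_t flip_start,
                       bool swap_left_right) {
    assert(labels.size() % 2 == 0 && multfact > 0.0f);
    // Mirror every x coordinate around the vertical image axis.
    for (std::size_t i = 0; i < labels.size(); i += 2)
        labels[i] = image_width / multfact - labels[i];
    if (!swap_left_right) return;
    // After mirroring, the former left joint is on the right, so swap each
    // left/right pair. An incomplete trailing pair is left untouched.
    for (std::size_t i = flip_start; i + 3 < labels.size(); i += 4) {
        std::swap(labels[i], labels[i + 2]);
        std::swap(labels[i + 1], labels[i + 3]);
    }
}
```

With dont_flip_first set, flip_start would be 2, so the head keeps index 0 and only its x coordinate is mirrored.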
// Uniform distribution over [min, max]
template<typename Dtype>
float DataHeatmapLayer<Dtype>::Uniform(const float min, const float max) {
    float random = ((float) rand()) / (float) RAND_MAX;
    float diff = max - min;
    float r = random * diff;
    return min + r;
}
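As an alternative to rand(), the same sampling can be written with the C++11 random facilities. This is only a sketch of a possible replacement, not what the layer does: the function name UniformSample is hypothetical, and passing the engine in explicitly is a design choice that makes seeding and per-thread use easier to control.

```cpp
#include <cassert>
#include <random>

// Hypothetical alternative to Uniform(): draws a float uniformly from
// [min, max) with an explicitly supplied engine instead of rand().
float UniformSample(std::mt19937& rng, float min, float max) {
    assert(min <= max);
    if (min == max) return min;  // degenerate range, e.g. angle_max of 0
    std::uniform_real_distribution<float> dist(min, max);
    return dist(rng);
}
```

A caller would keep one std::mt19937 per prefetch thread and pass it to each call, for example UniformSample(rng, -angle_max, angle_max).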
// Rotate the image in place and return the rotation matrix that was used.
template<typename Dtype>
cv::Mat DataHeatmapLayer<Dtype>::RotateImage(cv::Mat src, float rotation_angle)
{
    cv::Mat rot_mat(2, 3, CV_32FC1);
    cv::Point center = cv::Point(src.cols / 2, src.rows / 2);
    double scale = 1;
    // Get the rotation matrix with the specifications above
    rot_mat = cv::getRotationMatrix2D(center, rotation_angle, scale);
    // Rotate the warped image
    cv::warpAffine(src, src, rot_mat, src.size());
    return rot_mat;
}

INSTANTIATE_CLASS(DataHeatmapLayer);
REGISTER_LAYER_CLASS(DataHeatmap);

} // namespace caffe
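RotateImage returns the 2x3 matrix so that load_batch can move the joint labels with the image. The arithmetic can be checked without OpenCV. The sketch below rebuilds the matrix from the formula given in the OpenCV documentation for cv::getRotationMatrix2D and applies it to a point; the type and function names (Affine2x3, Point2d, RotationMatrix2D, Apply) are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

struct Point2d { double x; double y; };
struct Affine2x3 { double m[2][3]; };

// Same matrix as documented for cv::getRotationMatrix2D:
//   [  a  b  (1 - a) * cx - b * cy ]
//   [ -b  a  b * cx + (1 - a) * cy ]
// with a = scale * cos(angle), b = scale * sin(angle), angle in degrees.
Affine2x3 RotationMatrix2D(Point2d center, double angle_deg, double scale) {
    assert(scale > 0.0);
    const double kPi = 3.14159265358979323846;
    const double rad = angle_deg * kPi / 180.0;
    const double a = scale * std::cos(rad);
    const double b = scale * std::sin(rad);
    Affine2x3 M = {{{a, b, (1.0 - a) * center.x - b * center.y},
                    {-b, a, b * center.x + (1.0 - a) * center.y}}};
    return M;
}

// Applies the affine map to one point, as load_batch does for each joint.
Point2d Apply(const Affine2x3& M, Point2d p) {
    Point2d out;
    out.x = M.m[0][0] * p.x + M.m[0][1] * p.y + M.m[0][2];
    out.y = M.m[1][0] * p.x + M.m[1][1] * p.y + M.m[1][2];
    return out;
}
```

With a positive angle, a point to the right of the centre moves upwards on screen (smaller y), which is the counter-clockwise convention OpenCV documents for image coordinates; the centre itself stays fixed.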
3. Finally, how is this layer used in the network configuration file?
layer {
  name: "data"
  type: "DataHeatmap"  # the layer type is DataHeatmap
  top: "data"
  top: "label"
  visualise: false  # whether to visualise the images
  include: { phase: TRAIN }
  heatmap_data_param {
    source: "/data/tp/flic/train_shuffle.txt"
    root_img_dir: "/mnt/ramdisk/tp/flic/"
    batchsize: 14
    cropsize: 248
    outsize: 256
    sample_per_cluster: false
    random_crop: true
    label_width: 64
    label_height: 64
    segmentation: false
    flip_joint_labels: true
    dont_flip_first: true
    angle_max: 40
    multfact: 1  # set to 282 if using preprocessed data from website
  }
}
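To see how cropsize, outsize and the label size interact, the coordinate chain from load_batch can be written out for a single joint. This is a minimal sketch under the assumption that no flip or rotation is applied; HeatmapGeometry, Point2f and JointToHeatmap are hypothetical names, and the per-axis label factors are a small generalisation of the listing, which uses label_height / outsize for both axes.

```cpp
#include <cassert>

// Hypothetical bundle of the geometry-related config values.
struct HeatmapGeometry {
    int cropsize;      // side of the square crop taken from the image
    int outsize;       // side of the network input after resizing
    int label_width;   // heatmap width
    int label_height;  // heatmap height
    float multfact;    // joint units -> image pixels
};

struct Point2f { float x; float y; };

// Maps a joint from label-file units to heatmap pixels for a crop whose
// top-left corner is (x0, y0) in image pixels. Flip and rotation are omitted.
Point2f JointToHeatmap(const HeatmapGeometry& g, Point2f joint,
                       float x0, float y0) {
    assert(g.cropsize > 0 && g.outsize > 0 && g.multfact > 0.0f);
    const float resize_fact = static_cast<float>(g.outsize) / g.cropsize;
    const float label_fact_x = static_cast<float>(g.label_width) / g.outsize;
    const float label_fact_y = static_cast<float>(g.label_height) / g.outsize;
    // Crop in joint units, then rescale along with the image.
    const float x = (joint.x - x0 / g.multfact) * resize_fact;
    const float y = (joint.y - y0 / g.multfact) * resize_fact;
    // Back to pixels, then down to the heatmap resolution.
    Point2f out;
    out.x = label_fact_x * x * g.multfact;
    out.y = label_fact_y * y * g.multfact;
    return out;
}
```

With the values from the config above (cropsize 248, outsize 256, 64x64 labels, multfact 1), a joint at the centre of the crop, (124, 124), lands at roughly (32, 32), the centre of the heatmap.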
First, take a look at the author's original configuration file:
name: "HeatmapFusionNet"
layer {
  name: "data"
  type: "DataHeatmap"
  top: "data"
  top: "label"
  visualise: false
  include: { phase: TRAIN }
  heatmap_data_param {
    source: "/data/tp/flic/train_shuffle.txt"
    root_img_dir: "/mnt/ramdisk/tp/flic/"
    batchsize: 14
    cropsize: 248
    outsize: 256
    sample_per_cluster: false
    random_crop: true
    label_width: 64
    label_height: 64
    segmentation: false
    flip_joint_labels: true
    dont_flip_first: true
    angle_max: 40
    multfact: 1  # set to 282 if using preprocessed data from website
  }
}
layer {
  name: "data"
  type: "DataHeatmap"
  top: "data"
  top: "label"
  visualise: false
  include: { phase: TEST }
  heatmap_data_param {
    source: "/data/tp/flic/test_shuffle.txt"
    root_img_dir: "/mnt/ramdisk/tp/flic/"
    batchsize: 1
    cropsize: 248
    outsize: 256
    sample_per_cluster: false
    random_crop: false
    label_width: 64
    label_height: 64
    segmentation: false
    dont_flip_first: true
    angle_max: 0
    multfact: 1  # set to 282 if using preprocessed data from website
  }
}
#########################################################
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 128
    kernel_size: 5
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    kernel_size: 2
  }
}
#########################################################
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 128
    kernel_size: 5
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    kernel_size: 2
  }
}
#########################################################
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 128
    kernel_size: 5
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
#########################################################
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    kernel_size: 9
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
#########################################################
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 512
    kernel_size: 9
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
#########################################################
layer {
  name: "conv6"
  type: "Convolution"
  bottom: "conv5"
  top: "conv6"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    kernel_size: 1
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "conv6"
  top: "conv6"
}
#########################################################
layer {
  name: "conv7"
  type: "Convolution"
  bottom: "conv6"
  top: "conv7"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    kernel_size: 1
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "conv7"
  top: "conv7"
}
#########################################################
layer {
  name: "conv8"
  type: "Convolution"
  bottom: "conv7"
  top: "conv8"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 7
    kernel_size: 1
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu8"
  type: "ReLU"
  bottom: "conv8"
  top: "conv8"
}
#########################################################
layer {
  name: "loss_heatmap"
  type: "EuclideanLossHeatmap"
  bottom: "conv8"
  bottom: "label"
  bottom: "data"
  top: "loss_heatmap"
  visualise: false
  loss_weight: 1
}
#########################################################
layer {
  name: "concat_fusion"
  type: "Concat"
  bottom: "conv3"
  bottom: "conv7"
  top: "concat_fusion"
  concat_param {
    concat_dim: 1
  }
}
#########################################################
layer {
  name: "conv1_fusion"
  type: "Convolution"
  bottom: "concat_fusion"
  top: "conv1_fusion"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 64
    kernel_size: 7
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu1_fusion"
  type: "ReLU"
  bottom: "conv1_fusion"
  top: "conv1_fusion"
}
#########################################################
layer {
  name: "conv2_fusion"
  type: "Convolution"
  bottom: "conv1_fusion"
  top: "conv2_fusion"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 64
    kernel_size: 13
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu2_fusion"
  type: "ReLU"
  bottom: "conv2_fusion"
  top: "conv2_fusion"
}
#########################################################
layer {
  name: "conv3_fusion"
  type: "Convolution"
  bottom: "conv2_fusion"
  top: "conv3_fusion"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 128
    kernel_size: 13
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu3_fusion"
  type: "ReLU"
  bottom: "conv3_fusion"
  top: "conv3_fusion"
}
#########################################################
layer {
  name: "conv4_fusion"
  type: "Convolution"
  bottom: "conv3_fusion"
  top: "conv4_fusion"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    kernel_size: 1
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu4_fusion"
  type: "ReLU"
  bottom: "conv4_fusion"
  top: "conv4_fusion"
}
#########################################################
layer {
  name: "conv5_fusion"
  type: "Convolution"
  bottom: "conv4_fusion"
  top: "conv5_fusion"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 7
    kernel_size: 1
    weight_filler { type: "gaussian" }
    bias_filler { type: "constant" }
  }
}
#########################################################
layer {
  name: "loss_fusion"
  type: "EuclideanLossHeatmap"
  bottom: "conv5_fusion"
  bottom: "label"
  bottom: "data"
  top: "loss_fusion"
  visualise: false
  loss_weight: 3
}
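When adapting a network like this, it helps to check that the spatial size of the final prediction matches the label size (64x64 here for a 256x256 input). The sketch below implements the output-size formula quoted earlier for convolution, plus the rounded-up variant Caffe uses for pooling with zero padding. The pad and stride values do not appear in the listing above, so the numbers used with these functions are assumptions; ConvOutputSize and PoolOutputSize are hypothetical helper names.

```cpp
#include <cassert>

// Convolution output size: (in + 2 * pad - kernel) / stride + 1, rounded down.
int ConvOutputSize(int in, int kernel, int pad, int stride) {
    assert(in > 0 && kernel > 0 && pad >= 0 && stride > 0);
    assert(in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}

// Pooling output size without padding: Caffe rounds the division up,
// so a partial window at the border still produces an output.
int PoolOutputSize(int in, int kernel, int stride) {
    assert(in >= kernel && kernel > 0 && stride > 0);
    return (in - kernel + stride - 1) / stride + 1;
}
```

For example, assuming a 5x5 convolution with pad 2 and stride 1 (which keeps 256 at 256) and two 2x2 poolings with stride 2, the size goes 256, 128, 64, which would be consistent with label_width and label_height of 64.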
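The next step is to add the remaining layers one by one, following the number of layers the author's network requires.
The paper contains other components as well. I was not able to actually invoke this layer; I only got it to compile. One last point: once the layer is implemented, it has to be registered. The Caffe version used here is the latest one. If an OpenCV-related error shows up along the way, edit the Makefile; the fix is shown in the figure below: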