Proposed method
In this paper, we propose ‘exclusive regularization’, which enlarges the distance between samples of different classes in order to improve feature discriminability.
Suppose $W \in \mathbb{R}^{D\times C}$ is the weight matrix of the classification layer, which maps $D$-dimensional features to $C$-dimensional class confidence scores, and let $W_i$ and $W_j$ denote the $i$-th and $j$-th columns of $W$. The exclusive loss is defined as: $$ \begin{equation} \mathcal{L}_{r}(W) = \frac{1}{C}\sum_i \max_{j\neq i} \frac{W_i \cdot W_j}{\|W_i\|_2 \, \|W_j\|_2}, \label{eq:l-exc} \end{equation} $$ that is, the average over all classes of each class's largest cosine similarity to any other class.
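As a sanity check of the definition, the following minimal sketch evaluates the exclusive loss in plain Python on toy weights. The names `cosine` and `exclusive_loss` and the two toy weight sets are illustrative assumptions, not part of the released code. Class vectors are passed as a list of the columns $W_i$, and each pairwise term is computed as a cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exclusive_loss(class_vectors):
    """Mean over classes of the largest cosine similarity to another class.

    class_vectors: list of C vectors, one per class (the columns W_i of W).
    """
    num_class = len(class_vectors)
    total = 0.0
    for i in range(num_class):
        total += max(
            cosine(class_vectors[i], class_vectors[j])
            for j in range(num_class)
            if j != i
        )
    return total / num_class


# Toy weights (hypothetical values): D = 2 features, C = 3 classes.
# "crowded": neighbouring class vectors are only 45 degrees apart.
crowded = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# "spread": the three class vectors are 120 degrees apart from each other.
spread = [[1.0, 0.0],
          [-0.5, math.sqrt(3) / 2],
          [-0.5, -math.sqrt(3) / 2]]

loss_crowded = exclusive_loss(crowded)
loss_spread = exclusive_loss(spread)
print(f"crowded: {loss_crowded:.4f}, spread: {loss_spread:.4f}")
```

In the crowded layout every class has a neighbour at 45 degrees, so the loss is $\cos 45^\circ \approx 0.707$. In the evenly spread layout every pair has cosine $-0.5$, so the loss drops to $-0.5$. Minimizing the loss therefore favours class vectors that are spread apart.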
Interestingly, the same idea was adopted by two other concurrent works at CVPR 2019:
- UniformFace: Learning Deep Equidistributed Representation for Face Recognition
- Unequal-Training for Deep Face Recognition With Long-Tailed Noisy Data
Illustration:
As depicted in the figure above, our method "pushes" the representations of different identities away from each other, improving inter-class separability.
A demonstrative implementation in PyTorch:
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ExclusiveLinear(nn.Module):
    def __init__(self, feat_dim=512, num_class=10572, norm_data=True, radius=20):
        super(ExclusiveLinear, self).__init__()
        self.num_class = num_class
        self.feat_dim = feat_dim
        self.norm_data = norm_data
        self.radius = float(radius)
        self.weight = nn.Parameter(torch.randn(self.num_class, self.feat_dim))
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)

    def forward(self, x):
        # Each row of weight_norm is a unit-length class vector, so
        # cos[i, j] is the cosine similarity between classes i and j.
        weight_norm = F.normalize(self.weight, p=2, dim=1)
        cos = torch.mm(weight_norm, weight_norm.t())
        cos = cos.clamp(-1, 1)  # clamp is not in-place; keep the result
        # Mask the diagonal on a detached copy so that cos itself is not
        # modified in place (detach() alone shares storage with cos).
        cos1 = cos.detach().clone()
        diag = torch.arange(self.num_class, device=cos.device).view(-1, 1)
        cos1.scatter_(1, diag, -100)
        # For each class, pick its most similar other class.
        _, indices = torch.max(cos1, dim=1)
        mask = torch.zeros_like(cos1)
        mask.scatter_(1, indices.view(-1, 1), 1)
        # Average of the per-class maximum cosine similarities.
        exclusive_loss = torch.dot(cos.view(cos.numel()), mask.view(mask.numel())) / self.num_class
        if self.norm_data:
            x = F.normalize(x, p=2, dim=1)
            x = x * self.radius
        return F.linear(x, weight_norm), exclusive_loss
Merits of our method:
- Easily improves inter-class separability and feature discriminability without hyper-parameter tuning.
- Computationally light when the number of identities is small. On CASIA-WebFace, the extra overhead our method introduces is negligible.
- Improves performance on top of SphereFace[2] and center loss[1].
- Easy to implement and straightforward to interpret.
Weaknesses of our method:
- Inefficient and memory-hungry on datasets with a large number of identities. Computing the exclusive loss in Eq. $\ref{eq:l-exc}$ requires a $C\times C$ cosine similarity matrix ("cos" in the code above), where $cos_{i,j}$ is the cosine similarity between $W_i$ and $W_j$. When the number of identities $C$ is large, this matrix becomes very large, so computing the exclusive loss is slow and consumes a lot of memory.
- Brings only insignificant improvement on top of ArcFace[3]. One possible reason is that the additive margin in ArcFace controls the between-class margin at a much finer granularity. Well-tuned decision margins already lead to large between-class variance, especially when the number of classes (identities) is large (see the figure above). In contrast, SphereFace uses a multiplicative coefficient $m$ to set the margin of the decision boundary between classes, and $m$ must be an integer, so the margin can only be adjusted coarsely.
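To give a rough sense of scale for the memory weakness, the sketch below estimates the size of one dense $C\times C$ float32 matrix. The helper name `dense_matrix_bytes` and the 100,000 and 1,000,000 identity counts are hypothetical; 10572 is the default `num_class` in the implementation above. The estimate covers a single matrix only, whereas the forward pass also allocates a mask of the same shape, plus whatever autograd retains.

```python
def dense_matrix_bytes(num_class, bytes_per_element=4):
    """Bytes needed for one dense num_class x num_class matrix (float32 by default)."""
    return num_class * num_class * bytes_per_element


# 10572 is the default num_class above; the larger counts are hypothetical.
for num_class in (10572, 100_000, 1_000_000):
    gib = dense_matrix_bytes(num_class) / 2**30
    print(f"C = {num_class:>9,d}: {gib:10.2f} GiB per matrix")
```

Because the cost grows quadratically in $C$, one matrix takes roughly 0.42 GiB at 10572 classes but about 37 GiB at 100,000 classes, which is why the overhead is manageable on CASIA-WebFace yet becomes prohibitive on much larger datasets.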
Citation:
If our method is helpful to your research, please consider citing:
@InProceedings{zhao2019regularface,
author = {Zhao, Kai and Xu, Jingyi and Cheng, Ming-Ming},
title = {RegularFace: Deep Face Recognition via Exclusive Regularization},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
References:
[1] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pages 499–515. Springer, 2016.
[2] Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. SphereFace: Deep hypersphere embedding for face recognition. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, 2017.
[3] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, 2019.