深度学习之hierarchicalsoftmax

一、什么是hierarchicalsoftmax

hierarchicalsoftmax是一种用于优化神经网络中softmax函数计算速度的方法。在传统的softmax函数中，需要对每个候选类别计算概率，这导致计算量呈指数级增长。Huffman树是一种二叉树结构，旨在通过分配更短的编码来最小化字符编码的平均长度。基于Huffman树，hierarchicalsoftmax可以将softmax计算复杂度减少为O(log(n))，其中n是类别总数。

在hierarchicalsoftmax中，所有可能的输出类别都被视为二叉树的节点。每个节点都有一段唯一的编码。在推断时，softmax操作沿着树从根节点开始移动，直到找到输出节点并计算对应的概率。

通俗地理解，hierarchicalsoftmax可以看作是将原本softmax中的每个类别映射为一个节点，然后用二叉树的形式展示。每个节点都可以得到一个唯一的binary code。在实际中，用hierarchicalsoftmax代替传统softmax可以大幅度地减少参数大小和模型复杂度，从而加速模型训练和推理。

二、hierarchicalsoftmax的优点

1.减少模型参数：hierarchicalsoftmax通过二叉树结构来组织类别标签，有效降低了softmax的计算复杂度。相应的，也能减少模型的参数数量和计算时间。

2.更快的训练和推理速度：传统softmax方法需要计算每个输出类别的概率值，而hierarchicalsoftmax只需要向下遍历Huffman树即可。因此，hierarchicalsoftmax可以显著减少计算量，提高训练和推理效率。

3.适合处理大规模分类问题：由于传统的softmax方法需要计算所有可能的类别的概率值，因此对于大规模分类问题计算量过大，而hierarchicalsoftmax可以在常规硬件设备上处理上百万个类别的分类问题。

三、如何使用hierarchicalsoftmax

在tensorflow中，可以通过设置softmax_weights和softmax_biases的参数实现hierarchicalsoftmax。先用一个batch对模型进行一次forward，通过实例化HuffmanTree类，将训练数据传入。创建完成Huffman树后，即可计算对应节点的编码和概率值。


import tensorflow as tf
from tensorflow.contrib.framework import nest
from tensorflow.contrib.rnn import LSTMStateTuple
from tensorflow.python.ops.rnn import dynamic_rnn

logit = tf.contrib.layers.fully_connected(
    inputs=last_outputs,
    num_outputs=output_dimension,
    activation_fn=None,
    weights_initializer=tf.truncated_normal_initializer(stddev=1e-4),
    biases_initializer=tf.zeros_initializer(),
    scope='hierarchical_softmax_logit'
)

# create a softmax weight matrix for each branch
hierarchical_softmax_weights = [tf.Variable(
    tf.truncated_normal([branch_size, output_dimension], stddev=1e-4),
    name="hierarchical_softmax_weights_%d" % i)
for i, branch_size in enumerate(huffman_tree.branch_sizes)]

# split the variables into a list for each branch
hierarchical_softmax_weights_branches = nest.pack_sequence_as(
    structure=huffman_tree.branch_sizes,
    flat_sequence=hierarchical_softmax_weights)

# compute the logits for each branch
logits = nest.map_structure(
    lambda w: tf.matmul(last_outputs, w, transpose_b=True),
    hierarchical_softmax_weights_branches)

# induce a softmax on them
softmaxes = nest.map_structure(
    lambda l: tf.nn.softmax(l, dim=1),
    logits)

# assign unique paths from the root node to all of the leafs
hierarchical_paths = huffman_tree.paths()

# get the full word embeddings for each unique word in the tree
full_embeddings = tf.gather(
    params=full_embeddings,
    indices=huffman_tree.word_ids())

weights_t = tf.transpose(hierarchical_softmax_weights_branches, [1, 0, 2])
weights_flat = tf.reshape(weights_t, [-1, output_dimension])

biases_flat = tf.Variable(
    tf.zeros([tf.reduce_sum(huffman_tree.branch_sizes)]),
    name="hierarchical_softmax_biases")

hierarchical_softmax_biases_branches = tf.split(
    biases_flat, huffman_tree.branch_sizes)

biases = nest.pack_sequence_as(
    structure=hierarchical_softmax_weights_branches,
    flat_sequence=hierarchical_softmax_biases_branches)

l_prods = nest.map_structure(
    lambda s, l: tf.matmul(l, s, transpose_b=True), hierarchical_paths, softmaxes)

prods = tf.reduce_prod(l_prods, axis=0)

dot = tf.matmul(full_embeddings, weights_flat, transpose_b=True)

z = tf.add(dot, biases_flat)

pred = tf.multiply(z, prods)

prediction = tf.nn.softmax(pred, 1)

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=pred)

四、hierarchicalsoftmax的局限性和应用

1.局限性：由于hierarchicalsoftmax是依靠Huffman树构建的，因此其对类别分布的偏置和采样方式较为敏感。在类别分布不均衡的情况下，Huffman树的构建往往会是非常非常慢，甚至不可用。

2.应用：hierarchicalsoftmax在大规模分类问题中表现出了优异的性能。例如，可以通过构建超大型的分类词典以实现高级的文本语言建模。hierarchicalsoftmax也可以用于其他类型的分类问题，例如多标签分类。

五、小结

hierarchicalsoftmax是一种用于提高softmax计算速度的算法。相比传统softmax，改进方案通过构建Huffman树，将分类问题以一种更加简洁的方式来展示。在大规模分类问题中，hierarchicalsoftmax是一种值得尝试的算法。

原创文章，作者：小蓝，如若转载，请注明出处：https://www.506064.com/n/237035.html