Hierarchical Softmax in Deep Learning

I. What is hierarchical softmax

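Hierarchical softmax is a method for speeding up the softmax computation in neural networks. A conventional softmax must compute a score for every candidate class and normalize over all of them, so its cost grows linearly with the number of classes, O(n). A Huffman tree is a binary tree that minimizes the average code length by assigning shorter codes to more frequent symbols. By arranging the classes in a Huffman tree, hierarchical softmax reduces the cost of computing the probability of one class to roughly O(log(n)), where n is the total number of classes.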
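To make the Huffman construction concrete, here is a minimal sketch that builds the codes with Python's standard `heapq` module. The function name `huffman_codes` and the toy word frequencies are assumptions made for illustration; they do not come from any particular library.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {label: bit string} for a dict of label -> frequency.

    Labels must not be tuples, because tuples mark internal nodes here.
    """
    tiebreak = count()  # unique counter so the heap never compares nodes
    heap = [(f, next(tiebreak), label) for label, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least frequent subtrees.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse on children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record its code
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

# Hypothetical word counts: frequent words should get short codes.
freqs = {"the": 50, "cat": 20, "sat": 15, "mat": 10, "zebra": 5}
codes = huffman_codes(freqs)
for word, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(word, code)
```

The code length of a class equals the number of binary decisions hierarchical softmax has to evaluate for it, which is why frequent classes are cheap and rare ones cost more.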

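In hierarchical softmax, every possible output class is a leaf of the binary tree, and each leaf has a unique code: the sequence of left/right branches on the path from the root. Each internal node acts as a binary classifier that decides which branch to take. To compute the probability of a class, the model walks from the root to that class's leaf and multiplies the branch probabilities along the way.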
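The walk from the root to a leaf can be sketched in a few lines of NumPy. Everything named here (`paths`, `node_vectors`, `hidden`, the four-class tree) is a hypothetical setup chosen for illustration, with a sigmoid at each internal node giving the probability of taking the right branch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical tree over 4 classes. Each class is a leaf, described by the
# internal nodes on its root-to-leaf path and the branch taken at each one
# (0 = left, 1 = right). Node 0 is the root.
paths = {
    "a": [(0, 0)],
    "b": [(0, 1), (1, 0)],
    "c": [(0, 1), (1, 1), (2, 0)],
    "d": [(0, 1), (1, 1), (2, 1)],
}

rng = np.random.default_rng(0)
hidden = rng.normal(size=8)              # hidden vector produced by the network
node_vectors = rng.normal(size=(3, 8))   # one weight vector per internal node

def class_probability(label):
    """Multiply the branch probabilities along the path to the leaf."""
    p = 1.0
    for node, bit in paths[label]:
        p_right = sigmoid(node_vectors[node] @ hidden)
        p *= p_right if bit == 1 else 1.0 - p_right
    return p

probs = {label: class_probability(label) for label in paths}
total = sum(probs.values())
print(probs, total)
```

Because the two branches at every node share probability mass that sums to one, the leaf probabilities form a valid distribution without any normalization over all classes.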

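Put simply, hierarchical softmax maps each class of the original softmax to a leaf of a binary tree, so that each class gets a unique binary code. In practice, replacing the conventional softmax with hierarchical softmax greatly reduces the amount of computation per training example, which speeds up training and the scoring of a given class at inference time. The number of parameters stays about the same, since the n output vectors are replaced by n - 1 internal-node vectors.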

II. Advantages of hierarchical softmax

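1. Lower computational cost: hierarchical softmax organizes the class labels in a binary tree, which lowers the cost of the output layer from O(n) to about O(log(n)) per example. Only the internal-node vectors on one root-to-leaf path are read and updated at each step, although the total parameter count is about the same as for a conventional softmax.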

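2. Faster training and inference: a conventional softmax computes a probability for every output class, whereas hierarchical softmax only needs to walk down one path of the Huffman tree. This significantly reduces the amount of computation and makes training, and scoring a given class at inference time, more efficient.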

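3. Suited to large-scale classification: because a conventional softmax computes probabilities for all possible classes, it becomes too expensive for very large label sets, whereas hierarchical softmax can handle classification over millions of classes on ordinary hardware.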
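To put rough numbers on the savings listed above, the sketch below computes the frequency-weighted average path length of a Huffman tree without building the tree explicitly. The function name `expected_huffman_depth` and the Zipf-like frequencies are assumptions made for illustration.

```python
import heapq
import math

def expected_huffman_depth(freqs):
    """Average root-to-leaf path length, weighted by class frequency."""
    total = float(sum(freqs))
    heap = list(freqs)
    heapq.heapify(heap)
    cost = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        cost += merged  # each merge pushes every leaf below it one level deeper
        heapq.heappush(heap, merged)
    return cost / total

n = 10_000
zipf = [1.0 / rank for rank in range(1, n + 1)]  # assumed Zipf-like class frequencies
avg_depth = expected_huffman_depth(zipf)
balanced_depth = math.ceil(math.log2(n))

print("flat softmax scores per example:", n)
print("balanced tree decisions per example:", balanced_depth)
print("Huffman tree decisions per example (average):", round(avg_depth, 2))
```

With skewed frequencies the Huffman average falls below the depth of a balanced tree, and both are far smaller than the n scores a flat softmax has to compute.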

III. How to use hierarchical softmax

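In TensorFlow, hierarchical softmax can be implemented by defining the softmax weights and biases yourself (softmax_weights and softmax_biases). First run one forward pass of the model on a batch, then instantiate a HuffmanTree class and pass it the training data. Note that HuffmanTree is a helper class you supply yourself, not part of TensorFlow. Once the Huffman tree is built, the code and probability of each node can be computed.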


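# NOTE: this snippet targets TensorFlow 1.x; tf.contrib was removed in TensorFlow 2.0.
# It assumes that last_outputs, output_dimension, full_embeddings, labels and a
# user-defined huffman_tree object (providing branch_sizes, paths() and word_ids())
# are already defined by the surrounding model code.
import tensorflow as tf
from tensorflow.contrib.framework import nest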

logit = tf.contrib.layers.fully_connected(
    inputs=last_outputs,
    num_outputs=output_dimension,
    activation_fn=None,
    weights_initializer=tf.truncated_normal_initializer(stddev=1e-4),
    biases_initializer=tf.zeros_initializer(),
    scope='hierarchical_softmax_logit'
)

# create a softmax weight matrix for each branch
hierarchical_softmax_weights = [
    tf.Variable(
        tf.truncated_normal([branch_size, output_dimension], stddev=1e-4),
        name="hierarchical_softmax_weights_%d" % i)
    for i, branch_size in enumerate(huffman_tree.branch_sizes)
]

# split the variables into a list for each branch
hierarchical_softmax_weights_branches = nest.pack_sequence_as(
    structure=huffman_tree.branch_sizes,
    flat_sequence=hierarchical_softmax_weights)

# compute the logits for each branch
logits = nest.map_structure(
    lambda w: tf.matmul(last_outputs, w, transpose_b=True),
    hierarchical_softmax_weights_branches)

# induce a softmax on them
softmaxes = nest.map_structure(
    lambda l: tf.nn.softmax(l, axis=1),  # `dim` is the deprecated name of `axis`
    logits)

# assign unique paths from the root node to all of the leafs
hierarchical_paths = huffman_tree.paths()

# get the full word embeddings for each unique word in the tree
full_embeddings = tf.gather(
    params=full_embeddings,
    indices=huffman_tree.word_ids())

# NOTE: stacking the list into one tensor only works if every branch has the same size
weights_t = tf.transpose(hierarchical_softmax_weights_branches, [1, 0, 2])
weights_flat = tf.reshape(weights_t, [-1, output_dimension])

biases_flat = tf.Variable(
    tf.zeros([tf.reduce_sum(huffman_tree.branch_sizes)]),
    name="hierarchical_softmax_biases")

hierarchical_softmax_biases_branches = tf.split(
    biases_flat, huffman_tree.branch_sizes)

biases = nest.pack_sequence_as(
    structure=hierarchical_softmax_weights_branches,
    flat_sequence=hierarchical_softmax_biases_branches)

l_prods = nest.map_structure(
    lambda s, l: tf.matmul(l, s, transpose_b=True), hierarchical_paths, softmaxes)

prods = tf.reduce_prod(l_prods, axis=0)

dot = tf.matmul(full_embeddings, weights_flat, transpose_b=True)

z = tf.add(dot, biases_flat)

pred = tf.multiply(z, prods)

prediction = tf.nn.softmax(pred, 1)

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=pred)
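
The listing above depends on objects defined elsewhere, so as a complement here is a self-contained NumPy sketch of one training loop for hierarchical softmax. It is not the code above rewritten: it assumes a complete binary tree with heap-style node numbering instead of a Huffman tree, and the names `path_to_class` and `hs_loss_and_grads` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def path_to_class(c, n_classes):
    """Internal nodes and branch bits from the root to the leaf of class c.

    Assumes heap indexing on a complete binary tree (n_classes is a power
    of two): internal nodes are 1..n_classes-1, class c is leaf n_classes + c.
    """
    node, path = n_classes + c, []
    while node > 1:
        path.append((node // 2, node % 2))  # (parent, 1 if we are its right child)
        node //= 2
    return path[::-1]

def hs_loss_and_grads(hidden, node_vectors, path):
    """Negative log-likelihood of one class and gradients for the node vectors."""
    loss, grads = 0.0, {}
    for node, bit in path:
        p_right = sigmoid(node_vectors[node] @ hidden)
        p = p_right if bit == 1 else 1.0 - p_right
        loss -= np.log(p)
        grads[node] = (p_right - bit) * hidden  # d(loss)/d(node vector)
    return loss, grads

n_classes, dim, lr = 8, 16, 0.5
rng = np.random.default_rng(0)
hidden = rng.normal(size=dim)                                # fixed hidden vector
node_vectors = rng.normal(scale=0.1, size=(n_classes, dim))  # row 0 is unused
target = 5
path = path_to_class(target, n_classes)

losses = []
for _ in range(20):
    loss, grads = hs_loss_and_grads(hidden, node_vectors, path)
    losses.append(loss)
    for node, g in grads.items():   # only the nodes on the path are updated
        node_vectors[node] -= lr * g

print(path)
print(losses[0], losses[-1])
```

Each step touches only the log2(8) = 3 node vectors on the target's path, which is where the speed-up over a flat softmax comes from.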

IV. Limitations and applications of hierarchical softmax

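1. Limitations: because hierarchical softmax relies on a Huffman tree built from class frequencies, it is sensitive to how skewed the class distribution is and to how the frequencies were sampled. Building the tree itself is cheap, O(n log(n)), but with a heavily imbalanced distribution the rare classes end up deep in the tree, so each of them needs many binary decisions and its probability is the product of many factors, which can hurt accuracy on those classes.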

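2. Applications: hierarchical softmax performs well on large-scale classification problems. For example, it makes language modeling over a very large vocabulary practical. It may also be applied to other kinds of classification problems, such as multi-label classification.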
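
One further caveat on the limitations side, offered as a sketch rather than a result from the text: at prediction time, following the likelier branch at each node is fast but does not necessarily reach the most probable class, so an exact top-1 prediction may still require scoring every leaf. The three-class tree and its probabilities below are made up for illustration.

```python
# Hypothetical 3-class tree. Each internal node stores P(go right).
# root --left--> "A"
#      --right-> inner --left--> "B"
#                      --right-> "C"
tree = ("node", 0.6, "A", ("node", 0.5, "B", "C"))

def leaf_probabilities(node, prob=1.0):
    """Exact probability of every leaf: product of branch probabilities."""
    if isinstance(node, str):
        return {node: prob}
    _, p_right, left, right = node
    out = leaf_probabilities(left, prob * (1.0 - p_right))
    out.update(leaf_probabilities(right, prob * p_right))
    return out

def greedy_decode(node):
    """Follow the likelier branch at each node; visits a single path only."""
    while not isinstance(node, str):
        _, p_right, left, right = node
        node = right if p_right >= 0.5 else left
    return node

exact = leaf_probabilities(tree)
best = max(exact, key=exact.get)   # exhaustive search over all leaves
greedy = greedy_decode(tree)       # O(depth) walk down the tree

print(exact)
print("exact argmax:", best, "greedy:", greedy)
```

Here class A holds the largest share of probability, yet the greedy walk goes right at the root and ends in a different leaf.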

V. Summary

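Hierarchical softmax is an algorithm for speeding up the softmax computation. Compared with a conventional softmax, it builds a Huffman tree and recasts the classification problem as a short sequence of binary decisions. For large-scale classification problems, hierarchical softmax is worth trying.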

Original article by 小藍. If you repost it, please credit the source: https://www.506064.com/zh-tw/n/237035.html