Murmur哈希

维基百科,自由的百科全书
跳转至: 导航搜索

MurmurHash 是一种非加密哈希函数,适用于一般的哈希检索操作。[1][2][3] 由Austin Appleby在2008年发明,[4][5] 并出现了多个变种,[6] 都已经发布到了公有领域(public domain)。与其它流行的哈希函数相比,对于规律性较强的key,MurmurHash的随机分布特征表现更良好。[7]

变种[编辑]

当前的版本是MurmurHash3,[8][9] 能够产生出32-bit或128-bit哈希值。

较早的MurmurHash2[10]能产生32-bit或64-bit哈希值。对于大端存储和强制对齐的硬件环境有一个较慢的MurmurHash2可以用。MurmurHash2A 变种增加了Merkle–Damgård 构造,所以能够以增量方式调用。 有两个变种产生64-bit哈希值:MurmurHash64A,为64位处理器做了优化;MurmurHash64B,为32位处理器做了优化。MurmurHash2-160用于产生160-bit 哈希值,而MurmurHash1已经不再使用。

实现[编辑]

最初的实现是C++的,但是被移植到了其他的流行语言上,包括 Python,[11] C,[12] C#,[9][13] Perl,[14] Ruby,[15] PHP,[16] Haskell,[17]Scala[18]Java[19][20]JavaScript[21][22]等。

这个算法已经被若干开源计划所采纳,最重要的有libstdc++ (4.6版)、Perl[23]nginx (不早于1.0.1版)[24]Rubinius[25]、 libmemcached (MemcachedC语言客户端驱动)[26]、maatkit[27]Hadoop[1]、Kyoto Cabinet[28]以及RaptorDB[29]

算法[编辑]

Murmur3_32(key, len, seed)
    c1 \gets 0xcc9e2d51
    c2 \gets 0x1b873593
    r1 \gets 15
    r2 \gets 13
    m \gets 5
    n \gets 0xe6546b64
 
    hash \gets seed

    for each fourByteChunk of key
        k \gets fourByteChunk

        k \gets k * c1
        k \gets (k << r1) OR (k >> (32-r1))
        k \gets k * c2

        hash \gets hash XOR k
        hash \gets (hash << r2) OR (hash >> (32-r2))
        hash \gets hash * m + n

    with any remainingBytesInKey
        remainingBytes \gets SwapEndianOrderOf(remainingBytesInKey)
        remainingBytes \gets remainingBytes * c1
        remainingBytes \gets (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes \gets remainingBytes * c2

        hash \gets hash XOR remainingBytes
 
    hash \gets hash XOR len

    hash \gets hash XOR (hash >> 16)
    hash \gets hash * 0x85ebca6b
    hash \gets hash XOR (hash >> 13)
    hash \gets hash * 0xc2b2ae35
    hash \gets hash XOR (hash >> 16)

参考[编辑]

  1. ^ 1.0 1.1 Hadoop in Java. Hbase.apache.org. 24 July 2011 [13 January 2012]. 
  2. ^ Chouza et al.
  3. ^ Couceiro et al. (PDF). [13 January 2012] <span style="font-family: sans-serif; cursor: default; color:#555; font-size: 0.8em; bottom: 0.1em; font-weight: bold;" title="连接到(葡萄牙文)网页">((葡萄牙文). 
  4. ^ MurmurHash on GooglePages. Murmurhash.googlepages.com. [13 January 2012]. 
  5. ^ Tanjent (tanjent) wrote,3 March 2008 13:31:00. MurmurHash first announcement. Tanjent.livejournal.com. [13 January 2012]. 
  6. ^ MurmurHash2-160. Simonhf.wordpress.com. 25 September 2010 [13 January 2012]. 
  7. ^ Which hashing algorithm is best for uniqueness and speed. stackexchange.com. 
  8. ^ MurmurHash3 on smhasher. 
  9. ^ 9.0 9.1 Horvath, Adam. MurMurHash3, an ultra fast hash algorithm for C# / .NET. Aug 10, 2012. 
  10. ^ MurmurHash2 on smhasher. 
  11. ^ pyfasthash in Python. Google. [13 January 2012]. 
  12. ^ C implementation in qLibc by Seungyoung Kim. 
  13. ^ Landman, Davy. Davy Landman in C#. Landman-code.blogspot.com. [13 January 2012]. 
  14. ^ Toru Maesaka in Perl. Search.cpan.org. [13 January 2012]. 
  15. ^ Bruce Williams <http://codefluency.com>, for Ruby Central <http://rubycentral.org>. Ruby. Rubyforge.org. 3 May 2009 [13 January 2012]. 
  16. ^ Murmurhash3 PHP extension. Murmur.vaizard.org. [13 January 2012]. 
  17. ^ Haskell. Hackage.haskell.org. [13 January 2012]. 
  18. ^ Scala standard library implementation. 14 December 2012. 
  19. ^ MurmurHash3 in Java, part of Guava
  20. ^ Derek Young in Java, public domain
  21. ^ raycmorgan (owner). Javascript implementation by Ray Morgan. Gist.github.com. [13 January 2012]. 
  22. ^ garycourt. MurmurHash.js by Gary Court. Github.com. [13 January 2012]. 
  23. ^ perl5176delta. [31 December 2012]. 
  24. ^ nginx. [13 January 2012]. 
  25. ^ Rubinius. [29 February 2012]. 
  26. ^ libmemcached
  27. ^ maatkit. Google. 24 March 2009 [13 January 2012]. 
  28. ^ Kyoto Cabinet specification. Fallabs.com. 4 March 2011 [13 January 2012]. 
  29. ^ Gholam, Mehdi. RaptorDB CodeProject page. Codeproject.com. 13 November 2011 [13 January 2012].