Why does Java's hashCode in String use 31 as a multiplier?

Why is 31 used as a multiplier?
Using 73 or 37 instead of 31 might be better, because it leads to denser code: the two LEA instructions only take 6 bytes, vs. the 7 bytes for the move+shift+subtract sequence used for multiplication by 31. One possible caveat is that the 3-argument LEA instructions used here became slower on Intel's Sandy Bridge architecture, with an increased latency of 3 cycles. Moreover, 73 is Sheldon Cooper's favorite number.
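As a sketch of the decomposition behind that claim (the class and method names here are illustrative, not from any JDK source): 37 and 73 each break into two "value plus scaled value" steps, which is exactly the address arithmetic a single LEA instruction encodes.

```java
// Illustrative sketch: why multiplying by 37 or 73 maps to two x86 LEA
// instructions. Each step has the form a + b*scale with scale in {2, 4, 8}.
public class LeaDecomposition {
    static int times37(int x) {
        int t = x + (x << 3); // t = 9*x  -> one LEA: x + x*8
        return (t << 2) + x;  // 37*x     -> one LEA: x + t*4
    }

    static int times73(int x) {
        int t = x + (x << 3); // t = 9*x  -> one LEA: x + x*8
        return (t << 3) + x;  // 73*x     -> one LEA: x + t*8
    }

    public static void main(String[] args) {
        System.out.println(times37(5)); // 185
        System.out.println(times73(5)); // 365
    }
}
```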
Neil Coffey explains why 31 is used under "Ironing out the bias". A table there summarizes the performance of the various hash functions described above for three data sets. The performance metric shown in the table is the "average chain size" over all elements in the hash table, i.e., the expected number of comparisons per lookup. Looking at this table, it's clear that all of the functions except for the current Java function and the two broken versions of Weinberger's function offer excellent, nearly indistinguishable performance.
I strongly conjecture that this performance is essentially the "theoretical ideal", which is what you'd get if you used a true random number generator in place of a hash function. I'd rule out the WAIS function as its specification contains pages of random numbers, and its performance is no better than any of the far simpler functions.
Any of the remaining six functions seems like an excellent choice, but we have to pick one. I suppose I'd rule out Vo's variant and Weinberger's function because of their added complexity, albeit minor.

Hashes boil down to multiplication and modulus operations, which means that you never want to use numbers with common factors if you can help it.
In other words, relatively prime numbers provide an even distribution of answers.

In the latest versions of the JDK, 31 is still used. I'm not sure, but I would guess they tested some sample of prime numbers and found that 31 gave the best distribution over some sample of possible Strings. This is also because 31 has a nice property: its multiplication can be replaced by a bitwise shift and a subtraction (31 * i == (i << 5) - i), which is faster than a standard multiplication.
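A minimal sketch of the polynomial hash with that strength reduction applied (the class and method names are mine, but the 31-based recurrence matches what String.hashCode specifies):

```java
// Polynomial string hash with 31*h rewritten as (h << 5) - h,
// the strength reduction a compiler or JIT can apply automatically.
public class Hash31 {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i); // h = 31*h + c
        }
        return h;
    }

    public static void main(String[] args) {
        // Matches the JDK's own result for the same input.
        System.out.println(hash("hello") == "hello".hashCode()); // true
    }
}
```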
Using prime number multipliers when computing the hash decreases the probability that your multiplier and the table size N share divisors, which would make the result of the modulus operation less uniformly distributed.
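To make the shared-divisor point concrete, here is a hypothetical demo (the numbers and names are mine, not from the answer): hashing all two-letter lowercase keys into a 64-bucket table with multiplier 32, which shares factors with 64, versus 31, which is coprime to it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: with a table size of 64, a multiplier sharing
// factors with 64 (here 32) reaches fewer buckets than a coprime one.
public class MultiplierDemo {
    static int buckets(int multiplier) {
        Set<Integer> used = new HashSet<>();
        for (char a = 'a'; a <= 'z'; a++)
            for (char b = 'a'; b <= 'z'; b++)
                used.add((multiplier * a + b) % 64);
        return used.size(); // distinct buckets hit by the 676 keys
    }

    public static void main(String[] args) {
        System.out.println(buckets(32)); // only 52 of 64 buckets reached
        System.out.println(buckets(31)); // all 64 buckets reached
    }
}
```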
Others have pointed out the nice property that multiplication by 31 can be done by a shift and a subtraction. I just want to point out that there is a mathematical term for such primes (one less than a power of two): Mersenne primes. See the instruction tables from Agner Fog. That's why GCC seems to optimize multiplications by Mersenne primes by replacing them with shifts and subtractions, see here.
However, in my opinion, such a small prime is a bad choice for a hash function. With a relatively good hash function, you would expect to have randomness at the higher bits of the hash.
However, with the Java hash function, there is almost no randomness at the higher bits with shorter strings and still highly questionable randomness at the lower bits. This makes it more difficult to build efficient hash tables. See this nice trick you couldn't do with the Java hash function. Some answers mention that they believe it is good that 31 fits into a byte.
This is actually useless, since modern CPUs multiply entire 64-bit registers regardless of the multiplier's size. Does a smaller value increase randomness in the middle-lower bits? Maybe, but it also seems to greatly increase the number of possible collisions.

One could list many different issues, but they generally boil down to two core principles not being fulfilled well: confusion and diffusion. But is it fast? Probably, since it doesn't do much. It's not a great hash algorithm, but it's good enough, and better than the 1.0-era code.
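The short-string claim above is easy to check with an illustrative snippet (mine, not from the answer): for short ASCII strings the 31-based hash stays numerically small, so the upper bits of the 32-bit result are all zero.

```java
// For short strings, the polynomial hash is numerically small, leaving
// the high bits of the 32-bit hash code entirely unpopulated.
public class HighBits {
    public static void main(String[] args) {
        for (String s : new String[] {"a", "ab", "abc", "abcd"}) {
            int h = s.hashCode();
            System.out.printf("%-5s leading zero bits: %d%n",
                    s, Integer.numberOfLeadingZeros(h));
        }
        // "abcd" hashes to 2987074, which fits in 22 bits:
        // the top 10 bits of the int are zero.
    }
}
```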
By multiplying, bits are shifted to the left. This uses more of the available space of hash codes, reducing collisions. By not using a power of two, the lower-order, rightmost bits are populated as well, to be mixed with the next piece of data going into the hash.

Joshua Bloch investigated the performance of different hash functions with regard to the resulting "average chain size" in a hash table. In the end he basically had to choose one, and so he took P(31), since it seemed to perform well enough.
Even though P(33) was not really worse, and multiplication by 33 is equally fast to calculate (just a shift by 5 and an addition), he opted for 31 since 33 is not a prime: "Of the remaining four, I'd probably select P(31), as it's the cheapest to calculate on a RISC machine (because 31 is the difference of two powers of two). P(33) is similarly cheap to calculate, but its performance is marginally worse, and 33 is composite, which makes me a bit nervous."
So the reasoning was not as rational as many of the answers here seem to imply. But we're all good at coming up with rational reasons after gut decisions, and even Bloch might be prone to that.

Actually, 37 would work pretty well! Multiplying by 37 splits into two add-and-scale steps (x * 37 = (x * 9) * 4 + x, with x * 9 = x + x * 8), and both steps correspond to one LEA x86 instruction each, so this is extremely fast.
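The power-of-two point a few paragraphs up can also be illustrated with a small sketch (a toy of mine, not production code): with multiplier 32, the low five bits of the hash come from the last character alone, whereas 31 mixes every character into them.

```java
// Contrast a power-of-two multiplier (32) with 31: under 32, the low
// five bits depend only on the final character, so strings sharing a
// last letter collide in any table of size 32.
public class LowBits {
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("ax", 32) & 31); // 24 ('x' & 31)
        System.out.println(hash("bx", 32) & 31); // 24 -- same bucket
        System.out.println(hash("ax", 31) & 31); // 23
        System.out.println(hash("bx", 31) & 31); // 22 -- differs
    }
}
```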