This is Benjamin Huygir's Typepad Profile.
Join Typepad and start following Benjamin Huygir's activity
Benjamin Huygir
Recent Activity
(better late than never...) There was a comment above something to the effect that if the distribution of the hash was not uniform (unbiased) then the math gets a whole lot messier and depends on the hash. But as I was diagraming the calculation for probability of at least one collision it occurred to me that the math must already be messy (not so clean as the simple factorial calculation of the series)... Think about it... if you have 10 slots total and 5 are occupied, you would calculate a probability of a collision on the next insert at 50%. No problem there - we stated clearly that there are 5 occupied slots. BUT if we wanted to calculate the probability of at least one collision so far, the straight math gives you 70%... but only if you make the paradoxical assumption that there are 5 occupied slots - in which case you beat the odds because you had no collisions. If you in fact have a collision along the way, then the number of open slots is greater than you are basing the calculations on. See? If along the way of inserting 5 items into a 10 slot hashtable I in fact have a collision (and odds are good I do), then after 5 insertions I have 6 open slots, not 5. That make the calculation for the next insertion and the additive probability so far change (quite a bit for such a small hashtable). Someone please tell me how to reconcile this so that some "real world" numbers come out of this. The worst case example of this is that you never have a collision until you hit m = n + 1 at which point the calculation is moot because every insert from then on must collide... but up until then you could have been beating the odds even at probabilities approaching 1... but that isn't realistic. So the pure math does not reflect reality. bh
Benjamin Huygir is now following The Typepad Team
Sep 29, 2011