CS253: Software Development with C++

Spring 2021

Hashing



Leonardo da Vinci’s Mona Lisa and John the Baptist

Hashing in General

To hash an object:

Typical Hash Table

A hash table starts out as an array of pointers (five, for instance), all initially null:

         0     1     2     3     4
      ┌─────┬─────┬─────┬─────┬─────┐
      │  ●  │  ●  │  ●  │  ●  │  ●  │
      └─────┴─────┴─────┴─────┴─────┘

Typical Hash Table

After adding "animal" and "vegetable":

         0     1     2     3     4
      ┌─────┬─────┬─────┬─────┬─────┐
      │  ●  │     │  ●  │     │  ●  │
      └─────┴──┼──┴─────┴──┼──┴─────┘
               │           │
               ∨           ∨
         ┌────────┐  ┌───────────┐
         │ animal │  │ vegetable │
         └────────┘  └───────────┘

Typical Hash Table

After adding "mineral":

         0     1     2     3     4
      ┌─────┬─────┬─────┬─────┬─────┐
      │  ●  │     │  ●  │     │  ●  │
      └─────┴──┼──┴─────┴──┼──┴─────┘
               │           │
               ∨           ∨
         ┌────────┐  ┌─────────┐   ┌───────────┐
         │ animal │  │ mineral │──>│ vegetable │
         └────────┘  └─────────┘   └───────────┘


Expanding the Table

So What?

Hashing in C++

unordered_set<int> p = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto n : p)
    cout << n << ' ';
19 17 11 7 5 3 13 2 

I Care

OK, let’s say that we care. We can ask the set how its elements are spread across the buckets:

unordered_set<int> p = {2, 3, 5, 7, 11, 13, 17, 19};
cout << "Buckets: " << p.bucket_count() << '\n'
     << "Size: " << p.size() << '\n'
     << "Load: " << p.load_factor() << " of "
     << p.max_load_factor() << '\n';
for (size_t b = 0; b<p.bucket_count(); b++)
    if (p.bucket_size(b))
        cout << "Bucket " << b << ": "
             << p.bucket_size(b) << " items\n";
for (auto n : p)
    cout << n << ' ';
Buckets: 11
Size: 8
Load: 0.727273 of 1
Bucket 0: 1 items
Bucket 2: 2 items
Bucket 3: 1 items
Bucket 5: 1 items
Bucket 6: 1 items
Bucket 7: 1 items
Bucket 8: 1 items
19 17 11 7 5 3 13 2 

Variable Number of Buckets

The number of buckets (usually a prime number) grows with the amount of data the table must hold. Here, reserve(r) asks for enough buckets to hold r elements:

unordered_set<int> us;
for (int r = 1; r <= 1e6; r*=10) {
    us.reserve(r);
    cout << r << ' ' << us.bucket_count() << '\n';
}
1 2
10 11
100 103
1000 1031
10000 10273
100000 107897
1000000 1056323

Load Factor

unordered_set::load_factor()
Returns the current load factor for this hash table, defined as unordered_set::size()/unordered_set::bucket_count().
unordered_set::max_load_factor()
Returns the maximum load factor tolerated before the table rehashes (adds buckets).
Called with a float argument, it sets the maximum load factor instead.

Load Factor Demo

unordered_multiset<double> us;
for (int i=0; i<1e6; i++)
    us.insert(drand48());
cout << us.size()            << '\n'
     << us.bucket_count()    << '\n'
     << us.load_factor()     << '\n'
     << us.max_load_factor() << '\n';
1000000
1447153
0.691012
1

Once we study random numbers, we’ll see better ways of generating such things.

What are the Hash Values?

Hashing converts a value of any type (integer, floating-point, vector, set, struct MyData, etc.) to an unsigned integer; std::hash returns a size_t.

We can find out the hash values, if we care:

cout << hash<int>()(253)       << '\n'
     << hash<int>()(-253)      << '\n'
     << hash<double>()(253.0)  << '\n'
     << hash<float>()(253.0F)  << '\n'
     << hash<long>()(253L)     << '\n'
     << hash<unsigned>()(253U) << '\n'
     << hash<char>()('a')      << '\n'
     << hash<bool>()(true)     << '\n'
     << hash<string>()("253")  << '\n'
     << hash<string>()("")     << '\n'
     << hash<int *>()(new int) << '\n';
253
18446744073709551363
12026514335406308073
3703063408979182286
253
253
97
1
1899958766268164750
6142509188972423790
37954240

Not everything

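Not every type is hashable. (One caveat: C++17 added a std::hash<std::nullptr_t> specialization, so the nullptr example may compile with a newer compiler in C++17 mode.)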

cout << hash<ostream>()(cout);
c.cc:1: error: use of deleted function 'std::hash<std::basic_ostream<char> 
   >::hash()'
cout << hash<nullptr_t>()(nullptr);
c.cc:1: error: use of deleted function 'std::hash<std::nullptr_t>::hash()'
int a[] = {11,22};
cout << hash<int[]>()(a);
c.cc:2: error: use of deleted function 'std::hash<int []>::hash()'

User-defined Types

The standard library doesn’t know how to hash your own types:

struct Point { float x, y; } p = {1.2, 3.4};

int main() {
    cout << hash<Point>()(p);
}
c.cc:4: error: use of deleted function 'std::hash<Point>::hash()'

However, it can be taught.

User-defined Types

We can create a template specialization for std::hash<Point>:

struct Point { float x, y; } p = {1.2, 3.4};

template <>
struct std::hash<Point> {
    size_t operator()(const Point &p) const {
       return hash<float>()(p.x) ^ hash<float>()(p.y);
    }
};

int main() {
    cout << hash<Point>()(p);
}
11708950365973905104

User-defined Types

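It still fails: unordered_set also needs operator== (through std::equal_to<Point>) to decide whether two Points are the same: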

struct Point { float x, y; } p = {1.2, 3.4};

template <>
struct std::hash<Point> {
    size_t operator()(const Point &p) const {
       return hash<float>()(p.x) ^ hash<float>()(p.y);
    }
};

int main() {
    unordered_set<Point> us;
    us.insert(p);
}
In file included from /usr/include/c++/8/string:48,
                 from c.cc:1:
/usr/include/c++/8/bits/stl_function.h: In instantiation of 'constexpr bool std::equal_to<_Tp>::operator()(const _Tp&, const _Tp&) const [with _Tp = Point]':
/usr/include/c++/8/bits/hashtable_policy.h:1460:   required from 'static bool std::__detail::_Equal_helper<_Key, _Value, _ExtractKey, _Equal, _HashCodeType, true>::_S_equals(const _Equal&, const _ExtractKey&, const _Key&, _HashCodeType, std::__detail::_Hash_node<_Value, true>*) [with _Key = Point; _Value = Point; _ExtractKey = std::__detail::_Identity; _Equal = std::equal_to<Point>; _HashCodeType = long unsigned int]'
/usr/include/c++/8/bits/hashtable_policy.h:1844:   required from 'bool std::__detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal, _H1, _H2, _Hash, _Traits>::_M_equals(const _Key&, std::__detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal, _H1, _H2, _Hash, _Traits>::__hash_code, std::__detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal, _H1, _H2, _Hash, _Traits>::__node_type*) const [with _Key = Point; _Value = Point; _ExtractKey = std::__detail::_Identity; _Equal = std::equal_to<Point>; _H1 = std::hash<Point>; _H2 = std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; _Traits = std::__detail::_Hashtable_traits<true, true, true>; std::__detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal, _H1, _H2, _Hash, _Traits>::__hash_code = long unsigned int; std::__detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal, _H1, _H2, _Hash, _Traits>::__node_type = std::__detail::_Hash_node<Point, true>]'
/usr/include/c++/8/bits/hashtable.h:1562:   required from 'std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__node_base* std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::_M_find_before_node(std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::size_type, const key_type&, std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__hash_code) const [with _Key = Point; _Value = Point; _Alloc = std::allocator<Point>; _ExtractKey = std::__detail::_Identity; _Equal = std::equal_to<Point>; _H1 = std::hash<Point>; _H2 = std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; _RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = std::__detail::_Hashtable_traits<true, true, true>; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__node_base = std::__detail::_Hash_node_base; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::size_type = long unsigned int; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::key_type = Point; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__hash_code = long unsigned int]'
/usr/include/c++/8/bits/hashtable.h:649:   required from 'std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__node_type* std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::_M_find_node(std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::size_type, const key_type&, std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__hash_code) const [with _Key = Point; _Value = Point; _Alloc = std::allocator<Point>; _ExtractKey = std::__detail::_Identity; _Equal = std::equal_to<Point>; _H1 = std::hash<Point>; _H2 = std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; _RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = std::__detail::_Hashtable_traits<true, true, true>; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__node_type = std::__detail::_Hash_node<Point, true>; typename _Traits::__hash_cached = std::integral_constant<bool, true>; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::size_type = long unsigned int; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::key_type = Point; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__hash_code = long unsigned int]'
/usr/include/c++/8/bits/hashtable.h:1830:   required from 'std::pair<typename std::__detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal, _H1, _H2, _Hash, _Traits>::iterator, bool> std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::_M_insert(_Arg&&, const _NodeGenerator&, std::true_type, std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::size_type) [with _Arg = const Point&; _NodeGenerator = std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<Point, true> > >; _Key = Point; _Value = Point; _Alloc = std::allocator<Point>; _ExtractKey = std::__detail::_Identity; _Equal = std::equal_to<Point>; _H1 = std::hash<Point>; _H2 = std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; _RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = std::__detail::_Hashtable_traits<true, true, true>; typename std::__detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal, _H1, _H2, _Hash, _Traits>::iterator = std::__detail::_Node_iterator<Point, true, true>; std::true_type = std::integral_constant<bool, true>; std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::size_type = long unsigned int]'
/usr/include/c++/8/bits/hashtable_policy.h:834:   required from 'std::__detail::_Insert_base<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__ireturn_type std::__detail::_Insert_base<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::insert(const value_type&) [with _Key = Point; _Value = Point; _Alloc = std::allocator<Point>; _ExtractKey = std::__detail::_Identity; _Equal = std::equal_to<Point>; _H1 = std::hash<Point>; _H2 = std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; _RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = std::__detail::_Hashtable_traits<true, true, true>; std::__detail::_Insert_base<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::__ireturn_type = std::pair<std::__detail::_Node_iterator<Point, true, true>, bool>; std::__detail::_Insert_base<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::value_type = Point]'
/usr/include/c++/8/bits/unordered_set.h:421:   required from 'std::pair<typename std::_Hashtable<_Value, _Value, _Alloc, std::__detail::_Identity, _Pred, _Hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<std::__not_<std::__and_<std::__is_fast_hash<_Hash>, std::__is_nothrow_invocable<const _Hash&, const _Tp&> > >::value, true, true> >::iterator, bool> std::unordered_set<_Value, _Hash, _Pred, _Alloc>::insert(const value_type&) [with _Value = Point; _Hash = std::hash<Point>; _Pred = std::equal_to<Point>; _Alloc = std::allocator<Point>; typename std::_Hashtable<_Value, _Value, _Alloc, std::__detail::_Identity, _Pred, _Hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<std::__not_<std::__and_<std::__is_fast_hash<_Hash>, std::__is_nothrow_invocable<const _Hash&, const _Tp&> > >::value, true, true> >::iterator = std::__detail::_Node_iterator<Point, true, true>; std::unordered_set<_Value, _Hash, _Pred, _Alloc>::value_type = Point]'
c.cc:12:   required from here
/usr/include/c++/8/bits/stl_function.h:356: error: no match for 'operator==' in 
   '__x == __y' (operand types are 'const Point' and 'const Point')

User-defined Types

Now, unordered_set works with a Point:

struct Point { float x, y; } p = {1.2, 3.4};

template <>
struct std::hash<Point> {
    size_t operator()(const Point &p) const {
       return hash<float>()(p.x) ^ hash<float>()(p.y);
    }
};

bool operator==(const Point &a, const Point &b) {
    return a.x==b.x && a.y==b.y;
}

int main() {
    unordered_set<Point> us;
    us.insert(p);
}

The Rules