CS253: Software Development with C++

Fall 2022

Iterators

CS253 Iterators

Inclusion

Use of advance() or next() requires:

    
#include <iterator>

The free functions begin(), end(), and size() are defined in all header files that define a container, such as:

    
#include <array>
#include <deque>
#include <forward_list>
#include <iterator>
#include <list>
#include <map>
#include <set>
#include <string>
#include <string_view>
#include <unordered_map>
#include <unordered_set>
#include <vector>

Foundation

class Foo {
  public:
    auto bar() const { return "Yow!\n"; }
    int x = 42;
    using cash = float;
};

Foo::cash salary = 2.43;
Foo f;
cout << salary << ' ' << f.x << ' ' << f.bar();
2.43 42 Yow!

Array Traversal

You have an array, and you want to traverse (walk through) it.

int a[] = {2, 3, 5, 7, 11, 13, 17, 19};
for (int i=0; i!=8; ++i)
    cout << a[i] << ' ';
2 3 5 7 11 13 17 19 

Problems with indexing

Array Traversal via Pointers

int a[] = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[8]; ++p)
    cout << *p << ' ';
2 3 5 7 11 13 17 19 

vector traversal

You can traverse a vector in the same way, since the vector elements are contiguous:

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[8]; ++p)
    cout << *p << ' ';
2 3 5 7 11 13 17 19 

or, avoiding the magic number 8:

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int *p = &a[0]; p != &a[a.size()]; ++p)
    cout << *p << ' ';
2 3 5 7 11 13 17 19 

Types and Methods

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (vector<int>::iterator it = a.begin(); it != a.end(); ++it)
    cout << *it << ' ';
2 3 5 7 11 13 17 19 

auto is your friend

This is prettier:

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto it = a.begin(); it != a.end(); ++it)
    cout << *it << ' ';
2 3 5 7 11 13 17 19 

All we did was use auto to tell the compiler to give it the same type that a.begin() returns, namely vector<int>::iterator.

for loop

This is (nearly) exactly the same, and gorgeous:

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

The for loop is defined to use .begin() and .end(), just as the previous code did.

It’s simple

Really, a for loop like that (sometimes called a for…each loop) is just a mechanical source-level transformation. The compiler rewrites this:

set<string> s = {"my", "dog", "has", "fleas"};
for (auto v : s)
    cout << v << '\n';
dog
fleas
has
my

as this:

set<string> s = {"my", "dog", "has", "fleas"};
for (auto it = s.begin(); it != s.end(); ++it) {
    auto v = *it;
    cout << v << '\n';
}
dog
fleas
has
my

Really

See this error message:

double zulu;
for (auto v : zulu)  // 🦡
    cout << v;
c.cc:2: error: ‘begin’ was not declared in this scope

“begin”? Why is it complaining about “begin”? Because the compiler rewrote that for loop to use .begin() and .end().

Variations

Which is best? Why?

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (int v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (const auto v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (auto &v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 

vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19};
for (const auto &v : a)
    cout << v << ' ';
2 3 5 7 11 13 17 19 
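
One way to see the difference between these variations is to try to change the elements inside the loop. The following is only a sketch; the function names doubled_by_value and doubled_by_reference are made up for this example.

```cpp
#include <vector>

// The loop variable is a copy: changing it leaves the container alone.
std::vector<int> doubled_by_value(std::vector<int> a) {
    for (auto v : a)
        v *= 2;          // modifies only the local copy
    return a;
}

// The loop variable is a reference: changing it changes the element.
std::vector<int> doubled_by_reference(std::vector<int> a) {
    for (auto &v : a)
        v *= 2;          // modifies the element inside the container
    return a;
}
```

Given {2, 3, 5}, the first function returns {2, 3, 5} and the second returns {4, 6, 10}. With const auto or const auto &, the line v *= 2 would not compile. Copying an int is cheap, but for larger elements, such as strings, const auto & avoids a copy per element.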

Other containers

The same iterator code works for all STL containers.

forward_list<char> l = {'a', 'c', 'k', 'J'};
for (auto it = l.begin(); it != l.end(); ++it)
    cout << *it << ' ';
a c k J 

unordered_set<string> u = {"a", "c", "k", "J"};
for (auto it = u.begin(); it != u.end(); ++it)
    cout << *it << ' ';
J k c a 

set<char> s = {'a', 'c', 'k', 'J'};
for (auto it = s.begin(); it != s.end(); ++it)
    cout << *it << ' ';
J a c k 

“Jack” is ASCII-sorted.

Of the top 100 ♀ + top 100 ♂ U.S. names, 1918–2017, the ones in ASCII order: Amy, Ann, Betty, Billy, Gary, Harry, Henry, Jack, Jerry, Kelly, Larry, Mary, Roy, Scott, and Terry.

iterator type

iterator vs. pointer

Iterator categories

Standard iterator categories, least to most powerful:

Not C++ types: descriptions, used to categorize iterators.

Iterator categories

ForwardIterator:
forward_list, unordered_set

BidirectionalIterator:
list, set, map

RandomAccessIterator:
std::array, vector, string, deque

advance()

++ works on all iterators, but + and += only work on RandomAccessIterators:

list<int> li{11,22,33,44,55,66,77,88,99};
auto b = li.begin();
b += 5;  // 🦡
cout << *b;
c.cc:3: error: no match for ‘operator+=’ in ‘b += 5’ (operand types are 
   ‘std::_List_iterator<int>’ and ‘int’)

But, sometimes, you want to do that, even if it’s inefficient:

list<int> li{00,11,22,33,44,55,66};
auto b = li.begin();
advance(b, 5);
cout << *b;
55

advance() just works, no matter what the type of the iterator. It does simple addition for a RandomAccessIterator, and loops otherwise.

next()

next() is like advance(), but it returns the advanced iterator instead of changing its argument:

list<int> li{00,11,22,33,44,55,66};
auto b = li.begin();
cout << *(next(b, 5));
55

next() does not modify its iterator argument. Think of advance() as +=, whereas next() is more like +.
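
A small sketch of that difference, using a made-up function name, next_leaves_argument_alone:

```cpp
#include <iterator>
#include <list>

// Returns true if next() left its argument alone,
// and advance() then moved that same iterator.
bool next_leaves_argument_alone() {
    std::list<int> li{0, 11, 22, 33, 44, 55, 66};
    auto b = li.begin();
    auto n = std::next(b, 5);       // n refers to 55
    bool unchanged = (b == li.begin() && *n == 55);
    std::advance(b, 5);             // now b itself refers to 55
    return unchanged && b == n;
}
```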

begin() & end()

string s = "bonehead";
cout << "First char: " << *s.begin() << '\n';
cout << "Badness: " << *s.end() << '\n';
First char: b
Badness: ␀

string s = "genius";
cout << "First char: " << *s.begin() << '\n';
cout << "Last char:  " << *(s.end()-1) << '\n';
First char: g
Last char:  s

Intervals

Let’s get the mathematically important concept of a half-open interval clear:

Intervals

It’s all about [square brackets] vs. (round parens).
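
In the half-open interval [first, last), first is included and last is not. The sketch below walks such an interval with plain pointers. The name count_in is invented for this example.

```cpp
// Count the elements in the half-open interval [first, last).
// first is included; last is one past the final element.
int count_in(const int *first, const int *last) {
    int n = 0;
    for (const int *p = first; p != last; ++p)
        ++n;
    return n;
}
```

For an array a of eight ints, count_in(a+2, a+5) is 3, which is simply 5−2, and count_in(a+4, a+4) is 0. An empty interval needs no special case: it is just first == last.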

Half-open intervals

Most C++ containers have ctors that take half-open intervals.

string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string::iterator c = alpha.begin()+2, f = c+3;
string foo(c, f);
cout << *c << ' ' << *f << ' ' << foo << '\n';
C F CDE

-or-

string alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
auto c = &alpha[2], f = c+3;
cout << *c << ' ' << *f << ' ' << string(c,f) << '\n';
C F CDE

foo is not CDEF, because it’s a half-open interval.

Are both f variables the same type?

The first is string::iterator, the second is char *.

Are those the same?

Not necessarily. Maybe so, maybe no.
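
The compiler can answer the question for any given implementation. This is a sketch, assuming C++17; the two variable names are invented here.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// True only if this implementation's string::iterator is a plain char *.
constexpr bool iterator_is_pointer =
    std::is_same_v<std::string::iterator, char *>;

// True everywhere: dereferencing a string::iterator yields a char &,
// just as dereferencing a char * does.
constexpr bool yields_char_ref =
    std::is_same_v<decltype(*std::declval<std::string::iterator>()), char &>;
```

The value of iterator_is_pointer depends on the library. The compiler error messages elsewhere in these slides mention __normal_iterator, which suggests that the library used for them wraps the pointer in a class.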

Half-open Intervals

int a[] = {00, 11, 22, 33, 44, 55, 66, 77};
vector<int> v(a+2, a+5);
for (auto n : v)
    cout << n << ' ';
22 33 44 
00 11 22 33 44 55 66 77
      ^        ^
      start    end

The end location is not included.

.front() & .back()

Some containers have .front() and .back(), which return references to the first and last elements.

list<double> c = {2.3, 5.7, 11.13, 17.19, 23.29};
cout << "First: " << c.front() << '\n'
     << "Last:  " << c.back()  << '\n';
First: 2.3
Last:  23.29

string action = "Gripping";
action.front() = 'T';
cout << action;
Tripping

Comparisons, part one

This won’t work:

list<string> l = {"kappa", "alpha", "gamma"};
for (auto it = l.begin(); it < l.end(); ++it)  // 🦡
    cout << *it << ' ';
c.cc:2: error: no match for ‘operator<’ in ‘it < 
   l.std::__cxx11::list<std::__cxx11::basic_string<char> >::end()’ (operand 
   types are ‘std::_List_iterator<std::__cxx11::basic_string<char> >’ and 
   ‘std::__cxx11::list<std::__cxx11::basic_string<char> >::iterator’)

Read the message. It says that < isn’t defined for those iterators.

Comparisons, part two

This will work:

list<string> l = {"kappa", "alpha", "gamma"};
for (auto it = l.begin(); it != l.end(); ++it)
    cout << *it << ' ';
kappa alpha gamma 

list<>::iterator is only a BidirectionalIterator, not a RandomAccessIterator, and so < isn’t defined. What would it compare? The addresses of the linked list nodes? That’s not useful.

Constructors

Nearly all containers accept a pair of iterators as ctor arguments. These do not have to be iterators for the same type of container.

char date[] = __DATE__;    // E.g., May  3 2024
string now(date, date+6);
cout << "month & day: " << now << '\n';

string day_of_month(now.begin()+4, now.begin()+6);
cout << "day: " << day_of_month << '\n';

multiset<int> ms(now.begin(), now.end());
for (auto n : ms)
    cout << n << ' ';
cout << '\n';
month & day: May  3
day:  3
32 32 51 77 97 121 
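
The same idea works between any two container types, as long as the element types are compatible. A sketch, with invented function names unique_sorted and to_vector:

```cpp
#include <list>
#include <set>
#include <vector>

// Build a sorted, duplicate-free set from a list, via the list's iterators.
std::set<int> unique_sorted(const std::list<int> &l) {
    return std::set<int>(l.begin(), l.end());
}

// Build a vector from a set, via the set's iterators.
std::vector<int> to_vector(const std::set<int> &s) {
    return std::vector<int>(s.begin(), s.end());
}
```

Given the list {3, 1, 3, 2, 1}, unique_sorted() returns the set {1, 2, 3}.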

begin() and end() functions

Let’s copy from a C array to a C++ string:

int fido[] = {'d','o','g'};
string current(fido, fido+2);
cout << current << '\n';
do

Oh dear, I counted wrong. Why am I counting!? Am I a computer?

int fido[] = {'d','o','g'};
string current(fido.begin(), fido.end());  // 🦡
cout << current << '\n';
c.cc:2: error: request for member ‘begin’ in ‘fido’, which is of 
   non-class type ‘int [3]’

Alas, fido has no methods.

begin() and end() functions

There are also free functions begin() and end(), which work on arrays (not pointers) and all standard containers:

int fido[] = {'d','o','g'};
string current(begin(fido), end(fido));
cout << current << '\n';
dog

Try that again

It was crazy to separate char values. Let’s just use a C string:

char fido[] = "DOG";
string current(begin(fido), end(fido));
cout << current << '\n';
DOG␀

␀? What fresh hell is this?

␀ is how these web pages display '\0', the null character.

What is sizeof(fido)? Did you count the null character ('\0') at the end? We’re not asking for the length of the C-string, which is clearly 3. Instead, we’re asking how many bytes the variable fido occupies: 4.

Pointer invalidation

Consider this poor code:

int *p = new int(42);
cout << "Before: " << *p << '\n';
delete p;
cout << "After:  " << *p << '\n';
Before: 42
After:  8882

Iterator invalidation

vector<long> v = {253};
vector<long>::iterator it = v.begin();

cout << "Before: " << *it << '\n';

for (long i=1; i<1000; i++)
    v.push_back(i);

cout << "After:  " << *it << '\n';
Before: 253
After:  7983

Reservation

Using .reserve() pre-allocates memory:

vector<long> v = {253};
v.reserve(1005);
auto it = v.begin();

cout << "Before: " << *it << '\n';

for (long i=1; i<1000; i++)
    v.push_back(i);

cout << "After:  " << *it << '\n';
Before: 253
After:  253
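
Why does .reserve() help? When a vector runs out of capacity, it allocates a new block of memory and moves its elements there, so old iterators refer to memory that is no longer in use. The sketch below watches the address of the data to detect such a move. The name storage_moved is invented for this example.

```cpp
#include <cstdint>
#include <vector>

// Returns true if adding `extra` elements moved the vector's storage.
// `reserve_first` says whether to pre-allocate room for all of them.
bool storage_moved(int extra, bool reserve_first) {
    std::vector<long> v = {253};
    if (reserve_first)
        v.reserve(v.size() + extra);
    // Save the address as an integer, so that we never touch a stale pointer.
    auto before = reinterpret_cast<std::uintptr_t>(v.data());
    for (long i = 1; i <= extra; i++)
        v.push_back(i);
    return reinterpret_cast<std::uintptr_t>(v.data()) != before;
}
```

storage_moved(1000, true) is false, because the reserved capacity is never exceeded. storage_moved(1000, false) is very likely true, but that depends on how the implementation grows its vectors.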

How often?

How often does re-allocation happen? We can find out, for any particular implementation:

vector<int> v;

for (int i=1; i<=1000; i++) {
    auto before = v.capacity();
    v.push_back(i);
    auto after = v.capacity();
    if (before != after)
        cout << i << ' ' << after << '\n';
}
1 1
2 2
3 4
5 8
9 16
17 32
33 64
65 128
129 256
257 512
513 1024

.capacity(): how many elements the allocated memory can hold
.size(): how many elements are actually stored

How often?

Similarly, because a std::string is not much more than a vector<char>:

string s;

cout << s.size() << ' ' << s.capacity() << '\n';
for (int i=1; i<10'000; i++) {
    auto before = s.capacity();
    s += 'x';
    auto after = s.capacity();
    if (before != after)
        cout << s.size() << ' ' << after << '\n';
}
0 15
16 30
31 60
61 120
121 240
241 480
481 960
961 1920
1921 3840
3841 7680
7681 15360

What happened to our nice powers of two?

Curious String Behavior

for (string s; s.size()<60; s+="abcde")
    cout << s.size() << ' ' << (void *) s.data() << '\n';
0 0x7fffb31a0320
5 0x7fffb31a0320
10 0x7fffb31a0320
15 0x7fffb31a0320
20 0x225e2c0
25 0x225e2c0
30 0x225e2c0
35 0x225e2f0
40 0x225e2f0
45 0x225e2f0
50 0x225e2f0
55 0x225e2f0

Casting? Why can’t we use << s.data() or << &s[0] to get the address of the string data?
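
A likely answer: << treats a char pointer as a C string, and prints the characters that it points to, not the address. Casting to void * selects the overload that prints an address. A sketch that captures both results in strings, using invented function names:

```cpp
#include <sstream>
#include <string>

// What << produces for a char pointer: the characters, not the address.
std::string shown_as_chars(const std::string &s) {
    std::ostringstream oss;
    oss << s.data();
    return oss.str();
}

// What << produces for a void pointer: some representation of the address.
std::string shown_as_address(const std::string &s) {
    std::ostringstream oss;
    oss << static_cast<const void *>(s.data());
    return oss.str();
}
```

shown_as_chars("abcde") is just "abcde". The format of the address is up to the implementation; it is often hexadecimal, as in the output above.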

Small String Optimization

Possible small string implementation:

class string {
  private:
    size_t size;            // actual string length
    union {
      struct {
        size_t alloc_size;  // amount of heap data
        char *heap;         // ptr to heap data
      } s;
      char local[16];       // or, put it here
    };
  public:
    char& operator[](size_t i) {
        // short strings live in local; longer ones live on the heap
        return (size < sizeof(local)) ? local[i] : s.heap[i];
    }
    // and many more methods
};
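
Whether a real std::string works this way can be probed by asking if the character data lies inside the string object itself. This is only a sketch: the name stored_inside is invented, and the answer for short strings depends on the implementation.

```cpp
#include <cstdint>
#include <string>

// Does the character data live inside the string object itself?
bool stored_inside(const std::string &s) {
    auto obj   = reinterpret_cast<std::uintptr_t>(&s);
    auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars >= obj && chars < obj + sizeof(s);
}
```

For a 1000-character string the result must be false, since the data cannot fit inside the object. For a short string, an implementation that uses the small string optimization would return true.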

Order Calculation

Big-O

Does it scale?

vector<char> v;
int copies = 0, iterations = 1'000'000;

for (int i=0; i<iterations; i++) {
    auto before = v.capacity();
    v.push_back(i);
    if (before != v.capacity())
        copies += before;
}

cout << double(copies)/iterations;
1.04858
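
Where does that number come from? If we assume that the capacity starts at 0, becomes 1, and then doubles each time (as in the output under “How often?”), then we can simulate the copying without a vector. The name copies_when_doubling is invented for this sketch.

```cpp
// Total element copies caused by re-allocation while pushing n items,
// ASSUMING that the capacity starts at 0, becomes 1, and then doubles.
long copies_when_doubling(long n) {
    long copies = 0, capacity = 0;
    for (long size = 0; size < n; size++) {
        if (size == capacity) {              // full: must re-allocate
            copies += capacity;              // copy the existing elements
            capacity = capacity ? 2 * capacity : 1;
        }
    }
    return copies;
}
```

For a million items, this yields 1 + 2 + 4 + … + 524288 = 1048575 copies, or about 1.05 copies per item. Under this doubling assumption the total is always less than twice the number of items, so the cost of a push_back(), averaged over many calls, is constant.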

Pre-allocation helps

Again, if we know how many items we’re going to add, we can .reserve() the space:

vector<int> v;
v.reserve(900);

cout << v.size() << ' ' << v.capacity() << '\n';
for (int i=1; i<1000; i++) {
    auto before = v.capacity();
    v.push_back(i);
    auto after = v.capacity();
    if (before != after)
        cout << v.size() << ' ' << after << '\n';
}
0 900
901 1800

const

Why does this fail?

class Foo {
    int sum() const {
        int total = 0;
        for (vector<int>::iterator it = data.begin(); it != data.end(); ++it)  // 🦡
            total += *it;
        return total;
    }
    vector<int> data;
};
c.cc:4: error: conversion from ‘__normal_iterator<const int*,[...]>’ to 
   non-scalar type ‘__normal_iterator<int*,[...]>’ requested

const

Restate the problem

This is the same problem:

const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (vector<int>::iterator it = v.begin(); it != v.end(); ++it)  // 🦡
    cout << *it << ' ';
c.cc:2: error: conversion from ‘__normal_iterator<const int*,[...]>’ to 
   non-scalar type ‘__normal_iterator<int*,[...]>’ requested

Solution #1

One solution—use the correct type:

const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

Solution #2

A better solution—let auto figure it out:

const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.begin(); it != v.end(); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

Solution #3

The best solution—avoid all of this:

const vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto val : v)
    cout << val << ' ';
1 1 2 3 5 8 13 21 34 

.cbegin() and .cend()

vector<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.cbegin(); it != v.cend(); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 
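
Here, v is not const, so .begin() would yield a plain iterator, whereas .cbegin() yields a const_iterator regardless. A compile-time sketch of that claim, assuming C++17; the names vec, begin_is_plain, and cbegin_is_const are invented.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

using vec = std::vector<int>;

// On a non-const vector, .begin() yields a plain iterator ...
constexpr bool begin_is_plain =
    std::is_same_v<decltype(std::declval<vec &>().begin()), vec::iterator>;

// ... whereas .cbegin() yields a const_iterator.
constexpr bool cbegin_is_const =
    std::is_same_v<decltype(std::declval<vec &>().cbegin()), vec::const_iterator>;
```

Both values are true. An attempt to assign through a const_iterator, such as *it = 0, would not compile.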

.rbegin() and .rend()

array<int, 9> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.rbegin(); it != v.rend(); ++it)
    cout << *it << ' ';
34 21 13 8 5 3 2 1 1 

list<int> v = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = v.crbegin(); it != v.crend(); ++it)
    cout << *it << ' ';
34 21 13 8 5 3 2 1 1 

Don’t get too excited

Plain old data types:

How about old C-style data?

int a[] = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = a.begin(); it != a.end(); ++it)  // 🦡
    cout << *it << ' ';
c.cc:2: error: request for member ‘begin’ in ‘a’, which is of non-class 
   type ‘int [9]’

That failed miserably, because a C-style array is not a class object, and so it has no methods.

Free functions

Fortunately, the begin(), end(), and size() free functions work for containers and C arrays.

begin(thing) is defined as, conceptually (through the magic of templates):

    if thing is a C-style array
    then
        address of the start of the array
    else
        thing.begin()

Similarly, end(thing) is:

    if thing is a C-style array
    then
        address just after the end of the array
    else
        thing.end()

Free functions

int a[] = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = begin(a); it != end(a); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

These work for objects, as well:

deque<int> s = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto it = begin(s); it != end(s); ++it)
    cout << *it << ' ';
1 1 2 3 5 8 13 21 34 

What use is that? Generality for the sake of generality? No, it’s so that for loops will work for arrays as well as objects (even user-defined classes) with .begin() and .end() methods.
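
For instance, here is a sketch of a user-defined class that a for loop can traverse. The class Primes and the function sum are invented for this example.

```cpp
// Hypothetical class: all a for loop needs is begin() and end().
class Primes {
    int data[5] = {2, 3, 5, 7, 11};
  public:
    const int *begin() const { return data; }
    const int *end() const { return data + 5; }
};

int sum(const Primes &p) {
    int total = 0;
    for (auto v : p)      // uses p.begin() and p.end()
        total += v;
    return total;
}
```

sum(Primes{}) returns 28. The iterators here are plain pointers, which is fine: the for loop needs only !=, ++, and * to work on whatever begin() and end() return.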

for loop

Consider any for loop:

forward_list<int> fl = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (double v : fl)
    cout << v << ' ';
1 1 2 3 5 8 13 21 34 

The compiler turns that into something like this, with free functions, not methods:

forward_list<int> fl = {1, 1, 2, 3, 5, 8, 13, 21, 34};
for (auto iter = begin(fl), e = end(fl); iter != e; ++iter) {
    double v = *iter;
    cout << v << ' ';
}
1 1 2 3 5 8 13 21 34 

which works for any container type, even C-style arrays. end() is efficiently called only once.

Efficiency

class Foo {
  public:
    Foo& operator++() {
        // whatever needs to be done
        return *this;
    }
    Foo operator++(int) {
        const auto save = *this;
        ++*this;
        return save;
    }
};