
CS253 Algorithms

First published computer algorithm, by Ada Lovelace

Inclusion

Algorithms are defined in:

    #include <algorithm>

A few numeric algorithms are defined in:

    #include <numeric>

Definition

In Computer Science, algorithm means “how to do something”.
In C++, algorithm refers to templatized functions from the <algorithm> and <numeric> header files.
There are many algorithms available. We will focus on a few:
- count()
- count_if()
- find()
- find_if()
- copy()
- copy_if()
- replace()
- transform()
- sort()
- unique()

Arguments

Algorithms generally take their input from half-open iterator ranges, which always (😟) come first.
For output, algorithms take a single iterator, which says where the output starts.
A second iterator indicating the end of the output is not required, since the length of the output is determined by the size of the input, possibly filtered in some way, as in copy_if().
Additional arguments may specify a value to look for, a predicate to select items, etc.

count()

vector v = {1, 1,2, 1,2,3, 1,2,3,4};
cout << count(v.begin(), v.end(),                 1) << '\n'
     << count(v.begin(), v.begin()+v.size()/2,  2.0) << '\n'
     << count(&v[0],     1+&v.back(),          true) << '\n';

4
2
4

count() counts how many times a thing is found. 🤯
The first two arguments form a half-open interval, which is exactly what .begin() and .end() give, since .end() “points” one past the last element.
Each element in the interval is compared to the third argument, which does not have to be the same type as the items in the range.
The interval can be two iterators into any sort of container. It’s OK as long as the first iterator can be incremented, dereferenced, and compared to the second iterator, and will eventually become equal to the second.
Pointers are iterators, so pointers into C arrays, C strings, or the heap are ok.
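Those requirements (compare with !=, increment with ++, dereference with *) are all that a count()-style loop needs. Here is a minimal sketch of such a loop under the hypothetical name my_count. It is an illustration, not the library’s actual implementation, and it works the same on vector iterators and on raw pointers into a C string:

```cpp
#include <cstring>
#include <vector>

// Hypothetical hand-written equivalent of std::count, for illustration only.
// It needs nothing from the iterators except !=, ++, and *.
template <typename Iter, typename T>
long my_count(Iter first, Iter last, const T &target) {
    long n = 0;
    for (; first != last; ++first)   // stops when first reaches last
        if (*first == target)        // target may be a different type than the elements
            ++n;
    return n;
}

// Works on vector iterators...
long ones_in_vector() {
    std::vector<int> v = {1, 1,2, 1,2,3, 1,2,3,4};
    return my_count(v.begin(), v.end(), 1);
}

// ...and on raw pointers into a C string.
long ls_in_c_string() {
    const char *s = "Kokopelli";
    return my_count(s, s + std::strlen(s), 'l');
}
```

An empty range (first == last) simply returns 0 without ever dereferencing anything.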

count_if()

bool small(int n) {
    return n < 5;
}

int main() {
    multiset ms = {3,1,4,1,5,9,2,6,5,3,5,8,9,7,9};
    cout << count_if(ms.begin(), ms.end(), small) << '\n'
         << count_if(ms.begin(), ms.end(),
                     [](int n){return n>7;}) << '\n';
}

6
4

count_if() is like count(), except it takes a predicate (a function that returns a bool) instead of a target value.
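A predicate can also be a lambda that captures a local variable, so the threshold doesn’t have to be hard-coded into a function like small(). A minimal sketch, with the hypothetical helper name count_above:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: how many elements of v exceed limit?
// The lambda captures limit by value, so one predicate serves any threshold.
long count_above(const std::vector<int> &v, int limit) {
    return std::count_if(v.begin(), v.end(),
                         [limit](int n) { return n > limit; });
}
```

With the digits from the multiset above, count_above(data, 7) gives the same answer as the second count_if() call.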

find()

The find() algorithm searches a half-open range for a value.
If it finds the value, it returns:
- not an index to the value found ✘
- not a pointer to the value found ✘
- an iterator that “points” to the value found. ✔️
- What type of iterator? The same type that you gave it to indicate the range.
If it can’t find the value, it returns:
- not a 0 or −1 ✘
- not a null pointer ✘
- not a pointer ✘
- the second iterator given; the end of the half-open interval. ✔️
OK, technically, if you give find() raw pointers, then it does return the same type, namely, a pointer.
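To make the return-type rules concrete, here is a small sketch: with vector iterators, find() returns a vector iterator, and with raw pointers into a C array it returns a raw pointer. Subtracting the start of the range turns either result into an index. The helper names, and the choice of −1 for “absent”, are hypothetical conventions of this sketch, not of find():

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical helpers. Returning -1 for "absent" is OUR convention;
// find() itself reports failure by returning the end of the range.

// With vector iterators, find() returns a vector iterator.
std::ptrdiff_t index_in_vector(const std::vector<int> &v, int target) {
    auto it = std::find(v.begin(), v.end(), target);
    return it == v.end() ? -1 : it - v.begin();
}

// With raw pointers into a C array, find() returns a raw pointer.
std::ptrdiff_t index_in_array(int target) {
    static const int a[] = {2, 3, 5, 7, 11, 13};
    auto p = std::find(std::begin(a), std::end(a), target);
    static_assert(std::is_same_v<decltype(p), const int *>);  // a pointer, not an index
    return p == std::end(a) ? -1 : p - std::begin(a);
}
```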

find()

vector primes{ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
              43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };

auto it = find(primes.begin(), primes.end(), 13);
if (it == primes.end())
    cout << "Not found\n";
else
    cout << "Found "<< *it << " at " << it-primes.begin() << '\n';

Found 13 at 5

An algorithm, not a method! Some containers have a .find() method, which is preferred if it exists. All that the poor find() algorithm can do is search linearly, from front to back, in O(n) time, but set::find() can take advantage of a set’s binary tree structure to perform the search in O(log n) time, and unordered_set::find() simply uses magic (hashing) to take O(1) time on average.
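A sketch of the difference: all three hypothetical functions below answer “is x in s?”, but only the middle one uses the find() algorithm, and it ignores the set’s structure entirely:

```cpp
#include <algorithm>
#include <set>
#include <unordered_set>

// Member function: uses the set's tree structure, O(log n).
bool contains_fast(const std::set<int> &s, int x) {
    return s.find(x) != s.end();
}

// Algorithm: knows nothing about trees, so it walks the set front to back, O(n).
bool contains_slow(const std::set<int> &s, int x) {
    return std::find(s.begin(), s.end(), x) != s.end();
}

// Member function of unordered_set: hashes x, O(1) on average.
bool contains_hashed(const std::unordered_set<int> &s, int x) {
    return s.find(x) != s.end();
}
```

The answers always agree; only the cost differs.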

find()

vector primes{ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
              43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };

int *start = &primes[3], *finish = &primes[20];

cout << "Search the interval [" << *start << ',' << *finish << ")\n"
     << "It includes " << *start << ", but not " << *finish << '\n';

auto it = find(start, finish, 13);
if (it == finish)
    cout << "Not found, *it=" << *it << '\n';
else
    cout << "Found "<< *it << " at " << it-start << '\n';

Search the interval [7,73)
It includes 7, but not 73
Found 13 at 2

find()

vector primes{ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
              43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };

int *start = &primes[3], *finish = &primes[20];

cout << "Search the interval [" << *start << ',' << *finish << ")\n"
     << "It includes " << *start << ", but not " << *finish << '\n';

auto it = find(start, finish, 14);
if (it == finish)
    cout << "Not found, *it=" << *it << '\n';
else
    cout << "Found "<< *it << " at " << it-start << '\n';

Search the interval [7,73)
It includes 7, but not 73
Not found, *it=73

find_if()

bool pred(int n) {
    return n > 50;                  // Should find 53
}

int main() {
    set primes{ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
               43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };
    if (auto it = find_if(primes.begin(), primes.end(), pred); it == primes.end())
        cout << "Failure\n";
    else
        cout << "Found " << *it << '\n';
}

Found 53

Note the C++17 if statement: if (init; condition).
find_if() is like find(), but it takes a predicate, not a target value.
find() and find_if() stop at the first success. Can’t return all the matches!
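Since find_if() stops at the first success, finding every match means calling it again, starting just past the previous hit. A minimal sketch, using the hypothetical helper name match_positions, that collects the positions of all matches:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: find_if() stops at the first success, so to get the
// position of EVERY match we call it again, starting just past the last hit.
template <typename Pred>
std::vector<std::ptrdiff_t> match_positions(const std::vector<int> &v, Pred pred) {
    std::vector<std::ptrdiff_t> positions;
    auto it = std::find_if(v.begin(), v.end(), pred);
    while (it != v.end()) {
        positions.push_back(it - v.begin());
        it = std::find_if(it + 1, v.end(), pred);   // resume after this match
    }
    return positions;
}
```

If only the matching values are needed, copy_if() gets them in one call. The loop earns its keep when you need the positions themselves.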

copy()

string str = "bonehead";
set alpha = {'P', 'D', 'Q'};
copy(alpha.begin(), alpha.end(), str.begin());
cout << str << '\n';

DPQehead

string alpha = "abcdefghijklmnopqrstuvwxyz";
string initials = "JRA";
copy(initials.begin(), initials.begin()+2, alpha.begin()+20);
cout << alpha << '\n';

abcdefghijklmnopqrstJRwxyz

The iterator arguments don’t just have to be .begin() and .end().
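copy() also returns something useful: an iterator one past the last element it wrote. A sketch (letters_written is a hypothetical name) that uses it to report how much of the destination was overwritten:

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>

// copy() returns an iterator one past the last element it wrote, which tells
// the caller exactly how much of the destination was overwritten.
// The caller must make sure str is long enough: copy() never grows it.
std::ptrdiff_t letters_written(std::string &str, const std::set<char> &letters) {
    auto stop = std::copy(letters.begin(), letters.end(), str.begin());
    return stop - str.begin();
}
```

Called on "bonehead" with {'P', 'D', 'Q'}, it returns 3 and leaves "DPQehead", matching the first example above.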

copy_if()

copy_if() is like copy(), except that it doesn’t copy everything.
Instead it takes a predicate that determines whether or not to copy a given element.
- A predicate is a function that returns bool.
copy_if() takes three iterators and a predicate:
copy_if(source-begin, source-end, dest-begin, predicate)

Why isn’t there a dest-end ?

It isn’t needed. source-begin and source-end say how much to copy. Anyway, we might not even copy all of that!

First attempt

Let’s ensure that we know how to use copy() before moving on to copy_if():

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
copy(foo.begin(), foo.end(), bar.begin());  // 🦡
cout << bar << "\n";

SIGSEGV: Segmentation fault

Of course, plain bar = foo; would have worked nicely.

Why did the example fail?

There is no space allocated in bar. You can’t allocate space by pretending that it exists.

Second attempt

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
bar.resize(foo.size());
copy(foo.begin(), foo.end(), bar.begin());
cout << bar << "\n";

I have to ration my Diet Mountain Dew!
I have to ration my Diet Mountain Dew!

There—we allocated enough space in bar, via string::resize().
Fine, we’ve mastered copy(), so we’ll move on to copy_if().

Third attempt

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
bar.resize(foo.size());
// Don’t copy vowels:
copy_if(foo.begin(), foo.end(), bar.begin(),
        [](char c){return "aeiouAEIOU"s.find(c)==string::npos;} );

cout << bar << "\n";

I have to ration my Diet Mountain Dew!
 hv t rtn my Dt Mntn Dw!␀␀␀␀␀␀␀␀␀␀␀␀␀␀

Hooray, copy_if() worked!

Hey, what’s with those ␀ characters?

.resize() filled the string with '\0' characters, which are displayed as ␀ here. Your terminal may simply ignore them and not display them at all. copy_if() never changes bar.size(), so the 14 unused '\0' characters are still there.
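To check that explanation, the third attempt can be wrapped in a function (the name drop_vowels_padded is hypothetical) so that the size and the leftover '\0' characters can be inspected directly:

```cpp
#include <algorithm>
#include <string>

// The third attempt, wrapped in a function so its result can be inspected.
std::string drop_vowels_padded(const std::string &foo) {
    std::string bar;
    bar.resize(foo.size());                    // foo.size() copies of '\0'
    std::copy_if(foo.begin(), foo.end(), bar.begin(),
                 [](char c) { return std::string("aeiouAEIOU").find(c) == std::string::npos; });
    return bar;                                // size is still foo.size()
}
```

For the Mountain Dew sentence, the result is still 38 characters long: 24 kept characters followed by 14 '\0' characters, one for each vowel that wasn’t copied.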

Fourth attempt

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";

string bar(foo.size(), 'X');
// Don’t copy vowels:
auto it = copy_if(foo.begin(), foo.end(), bar.begin(),
                  [](char c){return "aeiouAEIOU"s.find(c)==string::npos;} );

// Make bar the correct size:
bar.resize(it-bar.begin());
cout << bar << "\n";

I have to ration my Diet Mountain Dew!
 hv t rtn my Dt Mntn Dw!

We resized bar to the correct size.
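Another common approach, not used in these slides, avoids sizing entirely. std::back_inserter (from <iterator>) produces an output iterator that calls push_back() on each write, so the destination grows exactly as needed. A sketch, with the hypothetical helper name drop_vowels:

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Alternative to pre-sizing: std::back_inserter gives copy_if() an output
// iterator that calls push_back(), so bar grows exactly as needed.
std::string drop_vowels(const std::string &foo) {
    std::string bar;                           // empty, and that's fine here
    std::copy_if(foo.begin(), foo.end(), std::back_inserter(bar),
                 [](char c) { return std::string("aeiouAEIOU").find(c) == std::string::npos; });
    return bar;
}
```

There are no leftover characters to trim, so no resize() is needed afterward.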

In-place

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";

auto it = copy_if(foo.begin(), foo.end(), foo.begin(),
                  [](char c){return "aeiouyAEIOUY"s.find(c)==string::npos;} );
// Make foo the correct size:
foo.resize(it-foo.begin());
cout << foo << "\n";

I have to ration my Diet Mountain Dew!
 hv t rtn m Dt Mntn Dw!

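We can copy from & to the same location, and it works here. Strictly speaking, though, the standard says copy_if() has undefined behavior when the source and destination ranges overlap. remove_if() is the algorithm designed for filtering in place.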
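A sketch of the in-place version using remove_if() followed by erase(). Note that the predicate is inverted compared to copy_if(): it says which characters to remove, not which to keep. The helper name drop_vowels_in_place is hypothetical:

```cpp
#include <algorithm>
#include <string>

// remove_if() is specified to work within a single range. It shifts the kept
// characters to the front and returns the new logical end; erase() then
// discards the leftovers at the back.
void drop_vowels_in_place(std::string &foo) {
    auto it = std::remove_if(foo.begin(), foo.end(),
                             [](char c) { return std::string("aeiouyAEIOUY").find(c) != std::string::npos; });
    foo.erase(it, foo.end());
}
```

The result is the same " hv t rtn m Dt Mntn Dw!" as above.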

replace()

string fact = "Warren Harding’s middle name was Gamaliel.";
replace(fact.begin(), fact.end(), ' ', '_');
cout << fact << '\n';

Warren_Harding’s_middle_name_was_Gamaliel.

string fact = "Warren Harding’s middle name was Gamaliel.";
replace_if(fact.begin(), fact.end(),
           [](char c) { return c=='o' || c=='a';}, '*');
cout << fact << '\n';

W*rren H*rding’s middle n*me w*s G*m*liel.

transform()

string name = "Joseph Robinette Biden Jr.";
string out;
transform(name.begin(), name.end(), out.begin(),
          [](char c) { return c ^ 040; });  // 🦡
cout << out << '\n';

SIGSEGV: Segmentation fault

Oops! Didn’t allocate any memory in out!

string name = "Joseph Robinette Biden Jr.";
string out(name.size(), 'X');  // fill it
transform(name.begin(), name.end(), out.begin(),
          [](char c) { return c ^ 040; });
cout << out << '\n';

jOSEPH␀rOBINETTE␀bIDEN␀jR␎
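Unlike copy_if(), transform() explicitly allows the output to start at the same place as the input, so a second string isn’t needed at all. A sketch using toupper() (the helper name shout is hypothetical):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// transform() may write to the same range it reads from.
// Taking the char as unsigned char keeps toupper() defined for any char value.
void shout(std::string &s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}
```

Spaces and punctuation pass through unchanged, unlike the c ^ 040 trick above.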

sort()

The sort() algorithm (from the header file <algorithm>) has two forms:

sort(begin, end);
sort(begin, end, comparison-object-or-function);

Or, you can think of the third argument as optional, defaulting to less<whatever>(), where whatever is the type of the things that the iterators point to.

Only a single half-open interval is given.
- It’s an in-place sort.

How do I sort() from container1 to container2?

copy() to container2, then sort() the data there. It’s still O(n log n): the O(n) copy doesn’t change the overall complexity.
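A minimal sketch of that recipe, assuming a list as container1 and a vector as container2 (sorted_copy is a hypothetical name):

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Hypothetical helper: sort from one container into another.
// copy() is O(n), sort() is O(n log n), so the total is O(n log n).
std::vector<int> sorted_copy(const std::list<int> &source) {
    std::vector<int> dest(source.size());             // make room first!
    std::copy(source.begin(), source.end(), dest.begin());
    std::sort(dest.begin(), dest.end());              // in-place, on dest only
    return dest;
}
```

The source list is left in its original order.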

Containers

Of course, some containers, such as set and map, are intrinsically sorted.
You might specify a comparison functor for those containers, but you wouldn’t use the sort() algorithm on them.
However, you might want to apply the sort() algorithm to an unsorted container, such as a std::array, vector, string, or even a C array.
list has its own sort() method, because the sort() algorithm requires random-access iterators, which list doesn’t provide.
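std::sort() needs random-access iterators, and list’s iterators are only bidirectional, so a list must use its own member function. A minimal sketch (ascending and descending are hypothetical names):

```cpp
#include <functional>
#include <list>

// std::sort(l.begin(), l.end()) would not compile: list iterators are
// bidirectional, not random-access. The member function works instead.
std::list<int> ascending(std::list<int> l) {
    l.sort();                          // uses operator<
    return l;
}

std::list<int> descending(std::list<int> l) {
    l.sort(std::greater<int>());       // a comparison OBJECT, as with the algorithm
    return l;
}
```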

Default comparison

string s = "Kokopelli";
sort(s.begin(), s.end());
cout << s << '\n';

Keiklloop

Duplicates are kept, and both upper-case K and lower-case k are there, but not next to each other.

Why aren’t K and k together?

In ASCII, and hence Unicode, A…Z all come before a…z. The order is ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, not AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz

Explicit comparison

string s = "Kokopelli";
sort(s.begin(), s.end(), less<char>);  // 🦡
cout << s << '\n';

c.cc:2: error: expected primary-expression before ‘)’ token

Why doesn’t that work?

What does sort() want for the third argument?

It wants something that the function call operator () works on: a function or functor object.

What kind of thing is less<char>?

It’s a type. Not an object—a type.
However, less<char>() is an object of that type.

Can you pass a type as a function argument?

No. An object of that type, sure, but not a type.

Explicit comparison

string s = "Kokopelli";
sort(s.begin(), s.end(), less<char>());
cout << s << '\n';

Keiklloop

less<char> is a type.
less<char>() is a temporary object of that type.
- The () invoke the ctor—not operator().
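The type/object distinction shows up again with intrinsically sorted containers. A container’s comparison functor goes in its template argument list, which wants a type, while the sort() algorithm’s third argument wants an object. A sketch contrasting the two (via_set and via_sort are hypothetical names; multiset is used so duplicates survive):

```cpp
#include <algorithm>
#include <functional>
#include <set>
#include <string>

// A container's comparison goes in the template argument list: it wants a TYPE.
std::string via_set(const std::string &s) {
    std::multiset<char, std::greater<char>> ms(s.begin(), s.end());
    return std::string(ms.begin(), ms.end());
}

// The sort() algorithm takes a function argument: it wants an OBJECT.
std::string via_sort(std::string s) {
    std::sort(s.begin(), s.end(), std::greater<char>());
    return s;
}
```

Both produce "poollkieK" from "Kokopelli", matching the reverse sort below.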

Reverse sort

string s = "Kokopelli";
sort(s.begin(), s.end(), greater<char>());
cout << s << '\n';

poollkieK

Comparison function

bool lt(char a, char b) {
    return a < b;
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    cout << s << '\n';
}

Keiklloop

λ-function

string s = "Kokopelli";
sort(s.begin(), s.end(),
     [](char a, char b){return a<b;});
cout << s << '\n';
sort(s.begin(), s.end(),
     [](char a, char b){return a>b;});
cout << s << '\n';

Keiklloop
poollkieK

Case folding

bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    cout << s << '\n';
}

eiKklloop

Unique

bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end());
    s.resize(it-s.begin());
    cout << s << '\n';
}

eiKklop

If you want to avoid duplicates, then use unique(), which requires that its input already be in order, because it only removes adjacent duplicates. That way, it can run in O(n) time, as opposed to O(n²) time.
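To see why the input must be sorted first, compare unique() on unsorted and sorted input. A sketch with hypothetical helper names:

```cpp
#include <algorithm>
#include <string>

// unique() only collapses ADJACENT duplicates, so unsorted input keeps some repeats.
std::string unique_only(std::string s) {
    auto it = std::unique(s.begin(), s.end());
    s.resize(it - s.begin());
    return s;
}

// Sorting first makes equal elements adjacent, so every duplicate goes away.
std::string sort_then_unique(std::string s) {
    std::sort(s.begin(), s.end());
    return unique_only(s);
}
```

On "Kokopelli", unique_only() removes only one l (the ll pair is the only adjacent duplicate) giving "Kokopeli", while sort_then_unique() gives "Keiklop".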

Unique

bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}

bool eq(char a, char b) {
    return toupper(a) == toupper(b);
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end(), eq);
    s.resize(it-s.begin());
    cout << s << '\n';
}

eiKlop

Alas, case-independent uniqueness doesn’t come free: we’ve duplicated the calls to toupper().

Unique and DRY

bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}

bool eq(char a, char b) {
    return !lt(a,b) && !lt(b,a);  // a=b ⇔ a≮b ∧ b≮a
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end(), eq);
    s.resize(it-s.begin());
    cout << s << '\n';
}

eiKlop

Duplication of code is a bad thing, but avoiding it might seem to have a cost: two lt() calls, hence four toupper() calls. However, there’s no cost: our smart compiler generated exactly the same code for this eq() as for the previous version.

Generality

It’s not just about strings:

int a[] = {333, 22, 4444, 1};
sort(begin(a), end(a));
for (auto val : a)
    cout << val << '\n';

vector<double> v = {1.2, 0.1, 6.7, 4.555};
sort(v.begin(), v.end(), greater<double>());
for (auto val : v)
    cout << val << '\n';

1
22
333
4444
6.7
4.555
1.2
0.1

Why didn’t I say a.begin()?

Because a is a C array, not a class object, so it has no methods! However, the free functions begin() and end() work on C arrays.
Plain a works as well as begin(a), but begin(a),end(a) is nicely symmetric.

Attitude

These algorithms may strike you as simplistic: “I could write that!”
You could write a for loop as a while loop, but that would just confuse everybody. A for loop has semantic value.
Sure, you could write your own code. But would it be correct? Even the corner cases, like searching an empty range?
Using standard algorithms conveys meaning. Educated C++ programmers recognize the standard algorithms, just as we all know that “brother” means “male sibling”.
Compilers might recognize algorithms and replace them with special machine code. copy for chars might get replaced with ultra-fast looping instructions to copy memory 64 bytes at a time.

CS253: Software Development with C++

Fall 2022
