CS253: Software Development with C++

Fall 2022

Algorithms


CS253 Algorithms


First published computer algorithm, by Ada Lovelace

Inclusion

Algorithms are defined in:

#include <algorithm>

A few numeric algorithms are defined in:

#include <numeric>

Definition

Arguments

count()

vector v = {1, 1,2, 1,2,3, 1,2,3,4};
cout << count(v.begin(), v.end(),                 1) << '\n'
     << count(v.begin(), v.begin()+v.size()/2,  2.0) << '\n'
     << count(&v[0],     1+&v.back(),          true) << '\n';
4
2
4

count_if()

bool small(int n) {
    return n < 5;
}

int main() {
    multiset ms = {3,1,4,1,5,9,2,6,5,3,5,8,9,7,9};
    cout << count_if(ms.begin(), ms.end(), small) << '\n'
         << count_if(ms.begin(), ms.end(),
                     [](int n){return n>7;}) << '\n';
}
6
4

count_if() is like count(), except it takes a predicate (a function that returns a bool) instead of a target value.
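A predicate doesn't have to be hard-wired to one threshold. Here is a minimal sketch, assuming a hypothetical helper named count_below() (not part of the lecture), in which the lambda captures the limit:

```cpp
#include <algorithm>
#include <set>

// Hypothetical helper: count the elements of ms that are less than limit.
// The lambda captures limit by value, so one function serves any threshold.
auto count_below(const std::multiset<int> &ms, int limit) {
    return std::count_if(ms.begin(), ms.end(),
                         [limit](int n) { return n < limit; });
}
```

With the multiset from the slide above, count_below(ms, 5) should agree with count_if(ms.begin(), ms.end(), small).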

find()

vector primes{ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
              43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };

auto it = find(primes.begin(), primes.end(), 13);
if (it == primes.end())
    cout << "Not found\n";
else
    cout << "Found "<< *it << " at " << it-primes.begin() << '\n';
Found 13 at 5

An algorithm; not a method! Some containers have a .find() method, which is preferred, if it exists. All that the poor find() algorithm can do is to search linearly, from front to back, but set::find() can take advantage of a set’s binary tree structure to perform the search in O(log n) time, and unordered_set::find() simply uses magic.
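To make the difference concrete, here is a small sketch with two hypothetical helpers (the names has_linear() and has_tree() are made up for illustration). Both give the same answer; only the amount of work differs:

```cpp
#include <algorithm>
#include <set>

// Algorithm: sees only a pair of iterators, so it walks front to back.
bool has_linear(const std::set<int> &s, int target) {
    return std::find(s.begin(), s.end(), target) != s.end();
}

// Method: knows how a set is organized, so it can search in O(log n).
bool has_tree(const std::set<int> &s, int target) {
    return s.find(target) != s.end();
}
```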

find()

vector primes{ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
              43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };

int *start = &primes[3], *finish = &primes[20];

cout << "Search the interval [" << *start << ',' << *finish << ")\n"
     << "It includes " << *start << ", but not " << *finish << '\n';

auto it = find(start, finish, 13);
if (it == finish)
    cout << "Not found, *it=" << *it << '\n';
else
    cout << "Found "<< *it << " at " << it-start << '\n';
Search the interval [7,73)
It includes 7, but not 73
Found 13 at 2

find()

vector primes{ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
              43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };

int *start = &primes[3], *finish = &primes[20];

cout << "Search the interval [" << *start << ',' << *finish << ")\n"
     << "It includes " << *start << ", but not " << *finish << '\n';

auto it = find(start, finish, 14);
if (it == finish)
    cout << "Not found, *it=" << *it << '\n';
else
    cout << "Found "<< *it << " at " << it-start << '\n';
Search the interval [7,73)
It includes 7, but not 73
Not found, *it=73

find_if()

bool pred(int n) {
    return n > 50;                  // Should find 53
}

int main() {
    set primes{ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
               43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };
    if (auto it = find_if(primes.begin(), primes.end(), pred); it == primes.end())
        cout << "Failure\n";
    else
        cout << "Found " << *it << '\n';
}
Found 53

copy()

string str = "bonehead";
set alpha = {'P', 'D', 'Q'};
copy(alpha.begin(), alpha.end(), str.begin());
cout << str << '\n';
DPQehead
string alpha = "abcdefghijklmnopqrstuvwxyz";
string initials = "JRA";
copy(initials.begin(), initials.begin()+2, alpha.begin()+20);
cout << alpha << '\n';
abcdefghijklmnopqrstJRwxyz

The iterator arguments don’t just have to be .begin() and .end().

copy_if()

Why isn’t there a dest-end?

It isn’t needed. source-begin and source-end say how much to copy. Anyway, we might not even copy all of that!

First attempt

Let’s ensure that we know how to use copy() before moving on to copy_if():

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
copy(foo.begin(), foo.end(), bar.begin());  // 🦡
cout << bar << "\n";
SIGSEGV: Segmentation fault

Of course, plain bar = foo; would have worked nicely.

Why did the example fail?

There is no space allocated in bar. You can’t allocate space by pretending that it exists.

Second attempt

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
bar.resize(foo.size());
copy(foo.begin(), foo.end(), bar.begin());
cout << bar << "\n";
I have to ration my Diet Mountain Dew!
I have to ration my Diet Mountain Dew!

Third attempt

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";
string bar;
bar.resize(foo.size());
// Don’t copy vowels:
copy_if(foo.begin(), foo.end(), bar.begin(),
        [](char c){return "aeiouAEIOU"s.find(c)==string::npos;} );

cout << bar << "\n";
I have to ration my Diet Mountain Dew!
 hv t rtn my Dt Mntn Dw!␀␀␀␀␀␀␀␀␀␀␀␀␀␀

Hooray, copy_if() worked!

Hey, what’s with those ␀ characters?

.resize() filled the string with '\0' characters, which display as ␀ here. Your terminal may simply ignore them and not display them. copy_if() overwrote only the front of bar; it did not change bar.size().
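One way to see this for yourself is to count what copy_if() wrote. This is a sketch only; the helper name written_by_third_attempt() is invented for this example:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: repeat the third attempt, and report how many
// characters copy_if() actually wrote into bar.
std::size_t written_by_third_attempt(const std::string &foo, std::string &bar) {
    bar.clear();
    bar.resize(foo.size());              // foo.size() copies of '\0'
    auto it = std::copy_if(foo.begin(), foo.end(), bar.begin(),
        [](char c) {
            return std::string("aeiouAEIOU").find(c) == std::string::npos;
        });
    // it marks the end of the copied data; bar.size() is still foo.size().
    return static_cast<std::size_t>(it - bar.begin());
}
```

The difference between bar.size() and the returned count is the number of leftover '\0' characters at the end.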

Fourth attempt

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";

string bar(foo.size(), 'X');
// Don’t copy vowels:
auto it = copy_if(foo.begin(), foo.end(), bar.begin(),
                  [](char c){return "aeiouAEIOU"s.find(c)==string::npos;} );

// Make bar the correct size:
bar.resize(it-bar.begin());
cout << bar << "\n";
I have to ration my Diet Mountain Dew!
 hv t rtn my Dt Mntn Dw!

We resized bar to the correct size.

In-place

string foo = "I have to ration my Diet Mountain Dew!";
cout << foo << "\n";

auto it = copy_if(foo.begin(), foo.end(), foo.begin(),
                  [](char c){return "aeiouyAEIOUY"s.find(c)==string::npos;} );
// Make foo the correct size:
foo.resize(it-foo.begin());
cout << foo << "\n";
I have to ration my Diet Mountain Dew!
 hv t rtn m Dt Mntn Dw!

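Here, we copied from & to the same location, and it worked, because the destination never gets ahead of the source. Strictly speaking, though, the standard requires that copy_if()’s source and destination ranges not overlap, so remove_if() followed by .erase() is the safer way to filter in place.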
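For filtering a container in place, the standard library's own tool is remove_if(), which is allowed to read and write the same range. A minimal sketch, using a hypothetical helper named drop_vowels(); note that the predicate is inverted, because it now says what to discard rather than what to keep:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: delete vowels (and y) from s, in place.
void drop_vowels(std::string &s) {
    auto is_vowel = [](char c) {
        return std::string("aeiouyAEIOUY").find(c) != std::string::npos;
    };
    // remove_if() slides the keepers to the front and returns the new end;
    // erase() then discards the leftover tail.
    s.erase(std::remove_if(s.begin(), s.end(), is_vowel), s.end());
}
```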

replace()

string fact = "Warren Harding’s middle name was Gamaliel.";
replace(fact.begin(), fact.end(), ' ', '_');
cout << fact << '\n';
Warren_Harding’s_middle_name_was_Gamaliel.
string fact = "Warren Harding’s middle name was Gamaliel.";
replace_if(fact.begin(), fact.end(),
           [](char c) { return c=='o' || c=='a';}, '*');
cout << fact << '\n';
W*rren H*rding’s middle n*me w*s G*m*liel.

transform()

string name = "Joseph Robinette Biden Jr.";
string out;
transform(name.begin(), name.end(), out.begin(),
          [](char c) { return c ^ 040; });  // 🦡
cout << out << '\n';
SIGSEGV: Segmentation fault

Oops! Didn’t allocate any memory in out!

string name = "Joseph Robinette Biden Jr.";
string out(name.size(), 'X');  // fill it
transform(name.begin(), name.end(), out.begin(),
          [](char c) { return c ^ 040; });  // flip bit 040 (octal), the ASCII case bit
cout << out << '\n';
jOSEPH␀rOBINETTE␀bIDEN␀jR␎
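The letters swapped case nicely, but the spaces and the period got mangled: a space is 040, so 040 ^ 040 yields '\0' (shown as ␀), and '.' is 056, which becomes 016 (shown as ␎). A possible fix, sketched here with a hypothetical helper named swap_case(), is to flip the bit only for letters:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: swap upper & lower case, leaving everything else alone.
std::string swap_case(const std::string &in) {
    std::string out(in.size(), 'X');     // allocate the destination first
    std::transform(in.begin(), in.end(), out.begin(),
        [](unsigned char c) -> char {
            // Only letters have a case bit worth flipping.
            return std::isalpha(c) ? c ^ 040 : c;
        });
    return out;
}
```

This assumes ASCII and the default "C" locale, as the original example does.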

sort()

The sort() algorithm (from the header file <algorithm>) has two forms:

  • sort(begin, end)
  • sort(begin, end, comparison)

Or, you can think of the third argument as optional, defaulting to less<whatever>(), where whatever is the type of the things that the iterators point to.

How do I sort() from container1 to container2?

copy() to container2, then sort() the data there. The copy is O(n), so the whole job is still O(n log n).
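Copy-then-sort might look like the sketch below. The helper name sorted_copy() and the choice of list and vector are assumptions made for illustration:

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Hypothetical helper: return the contents of src, sorted, leaving src alone.
std::vector<int> sorted_copy(const std::list<int> &src) {
    std::vector<int> dest(src.size());                // make room first
    std::copy(src.begin(), src.end(), dest.begin());  // O(n)
    std::sort(dest.begin(), dest.end());              // O(n log n)
    return dest;
}
```

As with the copy() examples earlier, the destination must have room before the copy starts.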

Containers

Default comparison

string s = "Kokopelli";
sort(s.begin(), s.end());
cout << s << '\n';
Keiklloop

Note the duplicates, and that both upper-case K and lower-case k appear.

Why aren’t K and k together?

In ASCII, and hence Unicode, A–Z all come before a–z. The order is ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, not AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz.

Explicit comparison

string s = "Kokopelli";
sort(s.begin(), s.end(), less<char>);  // 🦡
cout << s << '\n';
c.cc:2: error: expected primary-expression before ‘)’ token

Why doesn’t that work?

What does sort() want for the third argument?

It wants something that the function call operator () works on: a function or functor object.

What kind of thing is less<char>?

It’s a type. Not an object—a type.
However, less<char>() is an object of that type.

Can you pass a type as a function argument?

No. An object of that type, sure, but not a type.
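The type/object distinction is easier to see with a named object. A small sketch; the function names a_before_b() and sorted_with_object() are invented for this example:

```cpp
#include <algorithm>
#include <functional>
#include <string>

// less<char> is a type; cmp is an object of that type.
// The object can be called like a function.
bool a_before_b() {
    std::less<char> cmp;
    return cmp('a', 'b');            // same as 'a' < 'b'
}

// An object, named or temporary, is what sort() wants as its third argument.
std::string sorted_with_object(std::string s) {
    std::less<char> cmp;
    std::sort(s.begin(), s.end(), cmp);
    return s;
}
```

Writing less<char>() simply creates such an object on the spot, without giving it a name.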

Explicit comparison

string s = "Kokopelli";
sort(s.begin(), s.end(), less<char>());
cout << s << '\n';
Keiklloop

Reverse sort

string s = "Kokopelli";
sort(s.begin(), s.end(), greater<char>());
cout << s << '\n';
poollkieK

Comparison function

bool lt(char a, char b) {
    return a < b;
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    cout << s << '\n';
}
Keiklloop

λ-function

string s = "Kokopelli";
sort(s.begin(), s.end(),
     [](char a, char b){return a<b;});
cout << s << '\n';
sort(s.begin(), s.end(),
     [](char a, char b){return a>b;});
cout << s << '\n';
Keiklloop
poollkieK

Case folding

bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    cout << s << '\n';
}
eiKklloop

Unique

bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end());
    s.resize(it-s.begin());
    cout << s << '\n';
}
eiKklop

If you want to avoid duplicates, then use unique(). It only removes adjacent duplicates, so its input must already be in order. That way, it can run in O(n) time, as opposed to O(n²) time.
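To see why the order matters, try unique() without sorting first. This sketch uses a hypothetical helper named squeeze(), which applies unique() and trims the string, but does not sort:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: apply unique() and trim, without sorting first.
std::string squeeze(std::string s) {
    auto it = std::unique(s.begin(), s.end());
    s.resize(it - s.begin());
    return s;
}
```

Given "Kokopelli", only the adjacent ll collapses; the two o characters are not neighbors, so both survive.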

Unique

bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}

bool eq(char a, char b) {
    return toupper(a) == toupper(b);
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end(), eq);
    s.resize(it-s.begin());
    cout << s << '\n';
}
eiKlop

Alas, case-independent uniqueness doesn’t come free: we’ve duplicated the calls to toupper().

Unique and DRY

bool lt(char a, char b) {
    return toupper(a) < toupper(b);
}

bool eq(char a, char b) {
    return !lt(a,b) && !lt(b,a);  // a=b ⇔ a≮b ∧ b≮a
}

int main() {
    string s = "Kokopelli";
    sort(s.begin(), s.end(), lt);
    auto it = unique(s.begin(), s.end(), eq);
    s.resize(it-s.begin());
    cout << s << '\n';
}
eiKlop

Duplication of code is a bad thing, but avoiding it looks costly: each eq() call makes two lt() calls, hence four toupper() calls. In practice, there was no cost: our smart compiler generated exactly the same code for this eq() as for the previous version.
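The same trick works for any “less than” function, not just this one. Here is a sketch, assuming C++14 or later; the helper equivalent_under() is hypothetical, and this lt() adds the unsigned char cast that toupper() formally expects:

```cpp
#include <cctype>

// Case-folding "less than", as above, with an unsigned char cast added.
bool lt(char a, char b) {
    return std::toupper(static_cast<unsigned char>(a))
         < std::toupper(static_cast<unsigned char>(b));
}

// Hypothetical helper: build an "equivalent" predicate from any "less than".
template <typename Less>
auto equivalent_under(Less less_than) {
    return [less_than](auto a, auto b) {
        return !less_than(a, b) && !less_than(b, a);   // a≮b ∧ b≮a
    };
}
```

The result of equivalent_under(lt) could then be passed to unique() in place of a hand-written eq().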

Generality

It’s not just about strings:

int a[] = {333, 22, 4444, 1};
sort(begin(a), end(a));
for (auto val : a)
    cout << val << '\n';
1
22
333
4444

vector<double> v = {1.2, 0.1, 6.7, 4.555};
sort(v.begin(), v.end(), greater<double>());
for (auto val : v)
    cout << val << '\n';
6.7
4.555
1.2
0.1
Why didn’t I say a.begin()?
  • Because a is a C array. It’s not an object—no methods! However, the free functions begin() and end() work on C arrays.
  • Plain a works as well as begin(a), but begin(a),end(a) is nicely symmetric.
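Both spellings can be compared side by side. A minimal sketch, with a made-up helper name and a fixed array size of four to match the example above:

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper: sort two four-element C arrays in equivalent ways.
void sort_both_ways(int (&a)[4], int (&b)[4]) {
    std::sort(std::begin(a), std::end(a));   // free functions
    std::sort(b, b + 4);                     // plain pointers: b decays to &b[0]
}
```

The pointer version has to know the array's size, which is one more reason to like begin(a), end(a).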

Attitude