CS253: Software Development with C++

Fall 2022

Strings

CS253 Strings

Inclusion

To use strings, you need to:

    
#include <string>

Use of string_view objects requires:

    
#include <string_view>

The rare uses of strcmp() or strlen() require:

    
#include <cstring>

Operators

Java programmers aren’t used to mutable strings with operators:

string a = "alpha", b("beta"), g{"gamma"}, parens = "()";
b += "<"+g+'>';
auto result = parens[0]+a+b+parens[1];
result[3] = '*';
cout << result << '\n';
if (b < g) cout << "Good!\n";
(al*habeta<gamma>)
Good!

Old vs. New

// C strings
char q[] = "gamma delta";
const char *p = "alpha beta";   // a string literal is const in C++
printf("%zu chars; first char is %c\n", strlen(q), q[0]);
printf("%zu chars; ninth char is %c\n", strlen(p), p[8]);
11 chars; first char is g
10 chars; ninth char is t
// C++ strings
string s = "ceti alpha six";
cout << s.size()   << " chars; third char is " << s[2]     << '\n'
     << s.length() << " chars; last char is "  << s.back() << '\n';
14 chars; third char is t
14 chars; last char is x

Why C strings?

C strings

char foo[10] = "xyz", bar[] = "pdq";
cout << sizeof(foo) << ' ' << strlen(foo) << '\n';
cout << sizeof(bar) << ' ' << strlen(bar) << '\n';
10 3
4 3

C strings

C++ strings

How NOT to define a C++ string

Welcome to C++, which is not Java:

string riley = new string;  // 🦡
cout << riley;
c.cc:1: error: conversion from ‘std::string*’ {aka 
   ‘std::__cxx11::basic_string<char>*’} to non-scalar type 
   ‘std::string’ {aka ‘std::__cxx11::basic_string<char>’} requested

How NOT to define a C++ string

Don’t do this, either, though it does work:

string joy = string("sadness");  // 🦡
cout << joy;
sadness

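That spells out an anonymous temporary string on the right-hand side and then initializes joy from it. Before C++17, the compiler was allowed, but not required, to skip the copy/move and the destruction of the temporary; since C++17 it must skip them. Either way, the extra string( ) is just noise.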

Sure, it works, but … no!

How to define a C++ string

Do it like this:

string fear = "disgust";
cout << fear;
disgust

or, if you don’t have a value for the string at first:

string anger;
anger = "Bing Bong";
cout << anger;
Bing Bong

Java programmers are trained to treat objects differently than other types. Shake that off!

Subscripting

Subscripting on a C++ string produces a char:

string course="CS253";
cout << course << '\n';
cout << course[1] << '\n';
CS253
S

which can be modified:

string pet = "cat";
pet[0] = 'r';
cout << pet << '\n';
rat

Note the 'r', not "r". 'x' is a char, "y" is a C-string, a const char *.
Single quotes for single characters.

Indexing

Declare your index variable properly:

string s = "Great horny toads!\n";
for (int i=0; i<s.size(); i++)  // 🦡
    if (s[i] == 'o') s[i] = '*';
cout << s;
c.cc:2: warning: comparison of integer expressions of different signedness: 
   ‘int’ and ‘std::__cxx11::basic_string<char>::size_type’ {aka ‘long 
   unsigned int’}
Great h*rny t*ads!

i is int, which is signed, whereas string::size() returns a size_t, which is unsigned. The compiler dislikes comparing signed to unsigned variables.

Indexing

What’s so bad about comparing signed to unsigned variables? The rules of the language say that the signed operand gets converted to unsigned before the comparison, which produces interesting results:

if (-2 > 3U)  // 🦡
    cout << "Isn’t that surprising!\n";
c.cc:1: warning: comparison of integer expressions of different signedness: 
   ‘int’ and ‘unsigned int’
Isn’t that surprising!
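
To see why -2 > 3U is true, it helps to look at the value that -2 turns into. The sketch below is only an illustration; as_unsigned(), wraps_to_top(), and safe_less() are hypothetical helper names, not part of any library.

```cpp
#include <climits>

// The same conversion that the mixed comparison applies to its signed operand.
unsigned int as_unsigned(int n) {
    return static_cast<unsigned int>(n);   // the value wraps around modulo 2^N
}

// -2 lands two below the wrap-around point, which is UINT_MAX - 1.
bool wraps_to_top() {
    return as_unsigned(-2) == UINT_MAX - 1;
}

// One way to compare a signed value with an unsigned one without surprises.
bool safe_less(int lhs, unsigned int rhs) {
    if (lhs < 0)
        return true;                       // a negative is less than any unsigned
    return static_cast<unsigned int>(lhs) < rhs;
}
```

With these, safe_less(-2, 3U) is true, which is the answer that ordinary arithmetic would lead you to expect.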

Indexing

The solution: make i the right type:

string s = "Great horny toads!\n";
for (size_t i=0; i<s.size(); i++)
    if (s[i] == 'o') s[i] = '*';
cout << s;
Great h*rny t*ads!
Why wouldn’t auto work?

auto would just make i the type of 0, which is int. Sure, you could say auto i=0ULL, but you may as well just use size_t. Besides, size_t is not necessarily the same as unsigned long long.
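
When the index itself is not needed, a range-based for loop avoids the question of its type entirely. This is a minimal sketch; replace_char() is a hypothetical helper written for this example.

```cpp
#include <string>
using namespace std;

// Replace every occurrence of `from` with `to`, with no index variable at all.
string replace_char(string s, char from, char to) {
    for (char &c : s)        // c refers to each char of s in turn
        if (c == from)
            c = to;          // assigning through the reference modifies s
    return s;
}
```

Calling replace_char("Great horny toads!\n", 'o', '*') produces the same starred-out string as the indexed loop.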

Mutable

Unlike Java strings, C++ strings are mutable: they can be modified.

string soup = "Tomato dispue is bisgusting.";
cout << soup << '\n';
soup[7]  = 'b';
soup[10] = 'q';
soup[17] = 'd';
cout << soup << '\n';
Tomato dispue is bisgusting.
Tomato bisque is disgusting.

String methods

string::c_str()               extract C string
string::size()                get length
string::insert()              add chars anywhere
string::erase()               remove chars anywhere
string::replace()             replace chars anywhere
string::substr()              return substring
string::find()                look for a string or character
string::find_first_of()       find next char in a set of chars
string::find_first_not_of()   find next char not in a set of chars
string::find_last_of()        find previous char in a set of chars
string::find_last_not_of()    find previous char not in a set of chars

The string class has many methods. These are only some of them.
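
Here is a small sketch that exercises several of the methods listed above. The helper names tidy() and first_word() are made up for this example; only the string methods themselves come from the standard library.

```cpp
#include <string>
using namespace std;

// Fix a known typo, add a prefix, and drop trailing spaces and punctuation.
string tidy(string s) {
    const auto pos = s.find("dispue");        // string::npos if not found
    if (pos != string::npos)
        s.replace(pos, 6, "bisque");          // replace 6 chars starting at pos
    s.insert(0, ">> ");                       // add chars at the front
    s.erase(s.find_last_not_of(" .!") + 1);   // erase from there to the end
    return s;
}

// Everything before the first space or tab; the whole string if there is none.
string first_word(const string &s) {
    return s.substr(0, s.find_first_of(" \t"));
}
```

For example, tidy("Tomato dispue is tasty. ") yields ">> Tomato bisque is tasty", and first_word("ceti alpha six") yields "ceti".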

Learn those methods

Truth

I freely use C string literals, like this:

void emit(string s) {
    cout << "*** " << s << '\n';
}

int main() {
    emit("Today is a lovely day.");
    return 0;
}
*** Today is a lovely day.

Some Code

char q[80] = "This is a C string.\n";
cout << q;
char r[] = "foobar";
r[3] = '\0';
cout << "r is now \"" << r << "\"\n";
const char *p = "This is also a C string";
cout << p << ", length is " << strlen(p) << '\n';
This is a C string.
r is now "foo"
This is also a C string, length is 23
string s("useless initial value");
s = "This am a C++ string";     // mixed
s[5] = 'i';                     // mutable
s[6] += 6;              // char is integer-like
cout << s << ", length is " << s.size() << '\n';
This is a C++ string, length is 20

Conversions

Converting from a C-style string to a C++ string is easy, because the C++ string object has a constructor that takes a C-style string:

char chip[] = "chocolate";
string dale(chip);
cout << dale << '\n';
chocolate

Conversions

Converting from a C++ string to a C-style string requires a method:

string wall(30, '#');
const char *p = wall;  // 🦡
cout << p << '\n';
c.cc:2: error: cannot convert ‘std::string’ {aka 
   ‘std::__cxx11::basic_string<char>’} to ‘const char*’ in 
   initialization
string wall(30, '#');
const char *p = wall.c_str();
cout << p << '\n';
##############################

string::c_str()

string::c_str() is useful for calling an old-fashioned library function that wants a C-style string.

string command = "date";
system(command);  // 🦡
c.cc:2: error: cannot convert ‘std::string’ {aka 
   ‘std::__cxx11::basic_string<char>’} to ‘const char*’
string command = "date";
system(command.c_str());
Sun May  5 08:53:22 MDT 2024
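
One thing to keep in mind when handing c_str() to a C function: the C function stops at the first '\0', whereas a C++ string knows its own length. The sketch below illustrates that; c_length() is a hypothetical helper. Also, the pointer returned by c_str() should be used before the string is modified or destroyed.

```cpp
#include <cstring>
#include <string>
using namespace std;

// The length that a C library function sees, which ends at the first '\0'.
size_t c_length(const string &s) {
    return strlen(s.c_str());
}
```

For an ordinary string such as "date", c_length() and size() agree. For string("abc\0def", 7), size() is 7 but c_length() is 3.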

string::data()

String Literals

A "string literal" is an anonymous array of constant characters. These are equivalent:

cout << "FN-2187";
FN-2187
const char whatever[] = "FN-2187";
cout << whatever;
FN-2187
const char whatever[] = "FN-2187";
const char *p = &whatever[0];
cout << p;
FN-2187
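
Because a string literal is an array, sizeof can see its full size, including the terminating '\0' that strlen() does not count. A minimal sketch, with the names literal_bytes and literal_length() invented for this example:

```cpp
#include <cstddef>
#include <cstring>
using namespace std;

// sizeof applied to the literal itself: seven visible chars plus the '\0'.
constexpr size_t literal_bytes = sizeof("FN-2187");

// Through a pointer, only strlen() can recover the length.
size_t literal_length() {
    const char *p = "FN-2187";   // the array decays to a pointer to its first char
    return strlen(p);
}
```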

Escape Sequences:

Sequence    Meaning             Sequence      Meaning
\a          bell                \'            '
\b          backspace           \"            "
\f          form feed           \\            \
\n          newline             \ddd          1–3 octal digits
\r          carriage return     \xdd          1–∞ hex digits
\t          horizontal tab      \udddd        Unicode U+dddd
\v          vertical tab        \Udddddddd    Unicode U+dddddddd
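
A few of these sequences in action. This sketch assumes an ASCII-compatible character set (so that 'A' is 65); escapes_agree() is a hypothetical name.

```cpp
#include <string>
using namespace std;

// Each line compares an escape sequence with the character(s) it stands for.
bool escapes_agree() {
    return string("\x41") == "A"             // hex 41 is 'A' in ASCII
        && string("\101") == "A"             // octal 101 is also 65
        && string("\x41" "BC") == "ABC"      // splitting the literal ends the hex escape
        && string("a\tb").size() == 3        // a tab is a single char
        && string("\\").size() == 1          // so is an escaped backslash
        && string("\"").front() == '"';      // and an escaped double quote
}
```

The third line matters because a hex escape consumes every hex digit that follows it, so "\x41BC" would not mean 'A' followed by "BC".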

String Pasting

Two adjacent string literals are merged into one at compile-time:

cout << "alpha beta "  "gamma delta "
        "epsilon\n";
alpha beta gamma delta epsilon
cout << "Business plan:\n\n"
        "1. Collect underpants\n"
        "2. ?\n"
        "3. Profit\n";
Business plan:

1. Collect underpants
2. ?
3. Profit

Raw Strings

A raw string starts with R"( and ends with )". The parens are not part of the string.

cout << R"(Don’t be "afraid" of letters:
\a\b\c\d\e\f\g)";
Don’t be "afraid" of letters:
\a\b\c\d\e\f\g

Cool! Quotes inside of quotes!

However …

What if the string contains a right paren? I want to emit:

    A goatee!  \:-)"  Cool!
cout << R"(A goatee!  \:-))"  Cool!";  // 🦡
c.cc:1: warning: missing terminating " character
c.cc:1: error: missing terminating " character

That didn’t work. The )" at the bottom of the face was taken to be the end of the raw string.

Solution

A raw string starts with:

R"whatever-you-like-up-to-sixteen-chars(

and ends with:

)the-same-up-to-sixteen-chars"
cout << R"X(A goatee!  \:-)"  Cool!)X";
A goatee!  \:-)"  Cool!
cout << R"<COVID-19>(What the #"%'&*)?)<COVID-19>";
What the #"%'&*)?
cout << R"(The degenerate case)";
The degenerate case
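
To convince yourself that a raw string is just another spelling of an ordinary string, compare the two directly. A small sketch; the variable names are invented for this example.

```cpp
#include <string>
using namespace std;

// The same text, written as a raw string and as an ordinary escaped literal.
const string raw     = R"(C:\new\table "quoted")";
const string escaped = "C:\\new\\table \"quoted\"";

// With the delimiter X, the text may contain )" without ending the string.
const string face = R"X(\:-)")X";
```

Here raw == escaped holds, and face contains exactly the five characters \:-)" and nothing else.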

Comparing C-Style Strings

if ("foo" < "bar")  // 🦡
    cout << "😢";
😢

Comparing C-style strings properly.
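
The expression "foo" < "bar" compares the addresses of two arrays, not their contents, so its result says nothing about alphabetical order. To compare the characters, use strcmp() from <cstring>, which returns a negative, zero, or positive value. A minimal sketch, with c_less() and c_equal() as hypothetical wrapper names:

```cpp
#include <cstring>
using namespace std;

// True if a sorts before b, comparing characters rather than addresses.
bool c_less(const char *a, const char *b) {
    return strcmp(a, b) < 0;
}

// True if a and b contain the same characters.
bool c_equal(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

With these, c_less("foo", "bar") is false and c_less("bar", "foo") is true, as expected.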

Comparing C++ std::strings

    <  >  <=  >=  ==  !=

Comparing C++ std::strings

string name = "Conan O’Brien";
if (name == "Conan O’Brien")
    cout << "good 1\n";
if (name < "Zulu")
    cout << "good 2\n";
if (name > "Andy Richter")
    cout << "good 3\n";
if (name == name)
    cout << "good 4\n";
good 1
good 2
good 3
good 4

God help us, another string!

C++17’s string_view is a non-owning read-only view into a C-string or std::string. It’s generally implemented as a char * and a length.

const char *a = "alpha";
string b = "beta";
string_view c = a;
cout << c << '\n';
c = b;
cout << c << '\n';
alpha
beta

string_view purpose

void hero(string_view sv) {
    cout << "Nice work, " << sv << "!"
         << " (len=" << sv.size() << ")\n";
}

int main() {
    hero("Batman"); // C-string
    hero("Robin"s); // C++ string
}
Nice work, Batman! (len=6)
Nice work, Robin! (len=5)

Methods

Timing: converting to const string reference

bool first(const string &csr) { return csr[0]; }

int main() {
    const char s[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i=0; i<10'000'000; i++)
        first(s);
}

Real time: 156 ms

bool first(const string &csr) { return csr[0]; }

int main() {
    string s = "abcdefghijklmnopqrstuvwxyz";
    for (int i=0; i<10'000'000; i++)
        first(s);
}

Real time: 8.83 ms

Timing: converting to string_view

bool first(string_view sv) { return sv[0]; }

int main() {
    const char s[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i=0; i<10'000'000; i++)
        first(s);
}

Real time: 7.54 ms

bool first(string_view sv) { return sv[0]; }

int main() {
    string s = "abcdefghijklmnopqrstuvwxyz";
    for (int i=0; i<10'000'000; i++)
        first(s);
}

Real time: 7.2 ms
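
The slow case is the one where a char array is passed to a function that takes const string &: each call has to construct a temporary string from the array, which is the probable source of the extra time. If you want to try such measurements yourself, here is a rough sketch using <chrono>. It is not how the times above were obtained (the slides do not say); time_ms(), first_ref(), and first_view() are hypothetical names, and the numbers will vary by machine, compiler, and optimization level.

```cpp
#include <chrono>
#include <string>
#include <string_view>
using namespace std;

bool first_ref(const string &csr) { return csr[0]; }
bool first_view(string_view sv)   { return sv[0]; }

// Call f(arg) `calls` times and return the elapsed time in milliseconds.
template <typename F, typename Arg>
double time_ms(F f, const Arg &arg, int calls) {
    volatile bool sink = false;   // discourages the optimizer from deleting the loop
    const auto start = chrono::steady_clock::now();
    for (int i = 0; i < calls; i++)
        sink = f(arg);            // any conversion of arg happens on every call
    const chrono::duration<double, milli> elapsed =
        chrono::steady_clock::now() - start;
    (void) sink;
    return elapsed.count();
}
```

For example, time_ms(first_ref, "abcdefghijklmnopqrstuvwxyz", 10'000'000) times the conversion to a temporary string, and the same call with first_view times the conversion to a string_view.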