CS253: Software Development with C++

Fall 2022

Not Fully Specified

What the language definition does not say.

Unlike many languages, C++ leaves some choices up to the compiler. The standard defines several varieties of not-fully-specified things:

Implementation-defined behavior (§1.9.2): a choice made by the compiler; must be documented.
Unspecified behavior (§1.9.3): a choice made by the compiler; need not be documented.
Undefined behavior (§1.9.4): all bets are off!

Implementation-defined behavior

A choice made by the compiler, which must be documented.

The number of bytes or bits allocated to various types:

cout << sizeof(int) << '\n';
4

Floating-point precision:

float f = 123456789;
f += 3;
f -= 123456789;
cout << f << '\n';
0

Such choices are often heavily influenced by the hardware.

Implementation-defined behavior examples

What happens when a value is too big to store in its signed variable (++s is computed as an int, then the result is converted back to short):

short s = 32767;  // 🦡
cout << ++s;
-32768

Character set (ASCII, EBCDIC, HP Roman-8, Windows-1252, Big5, Shift JIS, various flavors of Unicode, etc.):

switch ('$') {
    case 0x24: cout << "ASCII or UTF-8"; break;
    case 0x5b: cout << "EBCDIC";         break;
    default:   cout << "WTF!?";          break;
}
ASCII or UTF-8

Implementation-defined behavior examples

When >> is used to shift a signed value, what comes in to replace the leftmost (sign) bit? It might be a copy of the sign bit, or it might be just a zero.

cout << (-1 >> 4) << '\n';  // 🦡
-1

system() invokes the command interpreter. Its result really depends on the host operating system.

system("date");
system("hostname");
Sun May  5 14:28:49 MDT 2024
beethoven

Unspecified behavior

A choice made by the compiler that need not be documented, or even consistent. It is generally a “this-or-that” sort of choice.

Do not assume that expressions are evaluated left-to-right:

int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }

int main() {
    return foo()*bar();  // 🦡
}
foobar

Many students assume that expressions must be evaluated left-to-right. This is not true in C++.

Unspecified behavior examples

Do not assume that arguments are evaluated left-to-right:

int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }

void ignore_arguments(int, int) { }

int main() {
    ignore_arguments(foo(), bar());  // 🦡
}
barfoo

Unspecified behavior examples

Declaration order does not determine memory order:

int a,b;
if (&a < &b)  // 🦡
    cout << "a has a lower address\n";
else
    cout << "b has a lower address\n";
b has a lower address

Unspecified behavior examples

I suspect that byte order (little-endian, big-endian) is unspecified, since a program can’t detect byte order without an unspecified operation:

int word = 0x12345678;
short *sp = reinterpret_cast<short *>(&word);
cout << hex << *sp << '\n';
5678

Undefined behavior

Uninitialized variables

If you want a value in a variable, put one there.

// Compiled without warnings or optimization.
int a,b,c,d,e,f,g,h,i,j;
cout << a << ' ' << b << ' ' << c << ' ' << d << ' ' << e << ' '
     << f << ' ' << g << ' ' << h << ' ' << i << ' ' << j << '\n';
0 0 32766 712983792 0 4195904 0 4196528 0 0

Out-of-bounds array access

long a=11, b[] = {22,33}, c=44;
cout << a    << '\n'
     << b[2] << '\n'  // 🦡
     << c    << '\n';
11
44
44

Undefined behavior

Similar code can produce quite different results.

long d[2];
cout << d[1] << endl;        // 🦡
cout << d[100] << endl;      // 🦡
cout << d[1000] << endl;     // 🦡
cout << d[1000000] << endl;  // 🦡
c.cc:2: warning: ‘d’ is used uninitialized
c.cc:1: note: ‘d’ declared here
0
140726466608376
0
SIGSEGV: Segmentation fault

Undefined behavior examples

cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n';  // 🦡 Kaboom!
SIGSEGV: Segmentation fault
Why don’t we see the “Hello, world!”?

Buffering! Output does not go out immediately—that’s inefficient. Instead, the output accumulates, piles up in a buffer, until endl, flush, or program end. Program dies; output is lost. ☹

Interactive output is line-buffered, but these slides send the output to a file, so it’s fully buffered.

Undefined behavior examples

// Shifting too far:
int amount=35;
cout << (1<<amount);  // 🦡
8

The standard states that the shift amount must be less than the width of the (promoted) left operand, and must not be negative.

Why not?

Since shifting is such a common operation, most CPUs have a shift instruction. For 32‑bit values, the shift amount is typically held in a five‑bit field in the instruction (2^5 = 32). Alas, 35 cannot be represented in five bits, and we’re not going to slow down my correct program to check for errors in your faulty code.

Undefined behavior examples

// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n';  // 🦡
c.cc:4: warning: operation on ‘a’ may be undefined
c.cc:4: warning: operation on ‘a’ may be undefined
4
int b;
cout << b << '\n';  // 🦡
c.cc:2: warning: ‘b’ is used uninitialized
0

g++ notices some undefined behavior, but not all. This is a QOI (Quality of Implementation) matter, not a standards-conformance issue.

But, why??

C++ is quite concerned about efficiency.

C++’s attitude is “You break the rules, you pay the price.” It doesn’t hold your hand.

Things Be Changin’

This is undefined behavior in C++14:

// C++ 2014
int i=5;
i = i++;  // 🦡
cout << i;
c.cc:3: warning: operation on ‘i’ may be undefined
c.cc:3: warning: operation on ‘i’ may be undefined
5

C++17, regarding assignment, says “The right operand is sequenced before the left operand”, so the increment finishes before the assignment happens. The assignment then stores the old value that i++ returned, overwriting the increment, and the output of this awful code is guaranteed to be 5:

// C++ 2017
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on ‘i’ may be undefined
c.cc:3: warning: operation on ‘i’ may be undefined
5

The output is what the standard requires, but the warnings show that the compiler (on the web server) hasn’t caught up to the standard.

Not just theoretical

Information from table 4–6 (page 4–11) of the Unisys C Compiler Programming Reference Manual:

Type       Bits  sizeof  Signed Range         Unsigned Max
char          9       1  −255 to 255          511
short        18       2  −2^17+1 to 2^17−1    2^18−1
int          36       4  −2^35+1 to 2^35−1    2^36−2
long         36       4  −2^35+1 to 2^35−1    2^36−2
long long    72       8  −2^71+1 to 2^71−1