CS253: Software Development with C++

Spring 2020

Not Fully Specified


CS253 Not Fully Specified

What the language definition does not say.

The C++ standard defines several kinds of not-fully-specified things:

Implementation-defined (§1.9.2):

A choice made by the compiler, must be documented

Unspecified behavior (§1.9.3):

A choice made by the compiler, need not be documented

Undefined behavior (§1.9.4):

All bets are off!

Implementation-defined behavior

A choice made by the compiler, which must be documented.

// Size of variables:
cout << sizeof(int) << '\n';
4
// Maximum value of a double:
double d = 6e307;
cout << d << '\n' << d*2 << '\n' << d*3;
6e+307
1.2e+308
inf

Such choices are often heavily influenced by the hardware.
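
Rather than hard-coding values like the ones printed above, a program can ask the implementation what it chose. A minimal sketch using sizeof, CHAR_BIT, and std::numeric_limits; the helper name describe_platform is made up for illustration:

```cpp
#include <climits>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: report a few implementation-defined values.
std::string describe_platform() {
    std::ostringstream out;
    out << "bits per byte: " << CHAR_BIT << '\n'
        << "sizeof(int):   " << sizeof(int) << '\n'
        << "max int:       " << std::numeric_limits<int>::max() << '\n'
        << "max double:    " << std::numeric_limits<double>::max() << '\n';
    return out.str();
}
```

Printing the result with cout << describe_platform(); shows whatever this particular compiler and hardware decided.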

Implementation-defined behavior examples

// Out-of-range conversion to a signed type (++s is computed as int, then narrowed):
short s = 32767;
cout << ++s;
-32768
// Character set:
switch ('$') {
    case 0x24: cout << "ASCII or UTF-8\n"; break;
    case 0x5b: cout << "EBCDIC\n";         break;
    default:   cout << "WTF!?\n";          break;
}
ASCII or UTF-8
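
The short example ends up storing 32768 in a type that cannot hold it. A sketch of checking the range before storing, so the program never depends on what the implementation does with the excess; fits_in_short is a hypothetical name:

```cpp
#include <limits>

// Hypothetical helper: true if v can be stored in a short unchanged.
bool fits_in_short(int v) {
    return v >= std::numeric_limits<short>::min()
        && v <= std::numeric_limits<short>::max();
}
```

A caller would test fits_in_short(s + 1) before incrementing s, and decide for itself what to do when the answer is no.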

Implementation-defined behavior examples

// The result of shifting a negative signed value right:
cout << (-1 >> 4) << '\n';
-1
// The result of system():
system("date");
Sat Apr 20 06:31:33 MDT 2024

Unspecified behavior

A choice made by the compiler, need not be documented or consistent, generally a “this-or-that” sort of choice.

// Order of evaluation of an expression (mostly):

int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }

int main() {
    return foo()*bar();
}
foobar
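
If the order matters, it can be forced by giving each call its own statement. A minimal sketch, reusing the slide's foo and bar but recording the calls in a string (an assumption made here so the order can be inspected); ordered_product is a made-up name:

```cpp
#include <string>

std::string trace;  // records call order instead of printing

int foo() { trace += "foo"; return 0; }
int bar() { trace += "bar"; return 0; }

// Each full statement finishes before the next one begins,
// so foo() is always called before bar().
int ordered_product() {
    const int f = foo();
    const int b = bar();
    return f * b;
}
```

After a call to ordered_product(), trace holds "foobar" on every conforming compiler.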

Unspecified behavior examples

// Comparing addresses of different objects:
int a,b;
cout << boolalpha << (&a < &b);
false
// Order of evaluation of function arguments:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }

void ignore_arguments(int, int) { }

int main() {
    ignore_arguments(foo(), bar());
}
barfoo
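
For the address comparison on this slide, the library offers a way out: std::less on pointers yields a strict total order even where the built-in < does not. Which object comes first is still up to the implementation, but the answer is consistent. A small sketch; address_before is a hypothetical name:

```cpp
#include <functional>

// std::less on pointers gives a strict total order,
// even for pointers to unrelated objects.
bool address_before(const int *p, const int *q) {
    return std::less<const int *>()(p, q);
}
```

For two distinct objects a and b, exactly one of address_before(&a, &b) and address_before(&b, &a) is true.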

Unspecified behavior examples

I suspect that byte order (little-endian, big-endian) is unspecified, since a program can’t detect byte order without an unspecified operation:

int word = 0x12345678;
short *usp = reinterpret_cast<short *>(&word);
cout << hex << *usp << '\n';
5678
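
The cast above reads an int through a short *, which the aliasing rules do not permit, so strictly speaking it is undefined rather than unspecified. A minimal sketch of a permitted way to peek at the bytes, assuming 8-bit bytes and that std::uint32_t exists on the platform; first_byte_in_memory is a made-up name:

```cpp
#include <cstdint>
#include <cstring>

// Copy the object representation into unsigned chars with memcpy,
// then look at the byte stored at the lowest address.
unsigned first_byte_in_memory(std::uint32_t word) {
    unsigned char bytes[sizeof word];
    std::memcpy(bytes, &word, sizeof word);
    return bytes[0];
}
```

On a little-endian machine first_byte_in_memory(0x12345678) gives 0x78; on a big-endian one, 0x12.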

Undefined behavior

With undefined behavior, all bets are off! Anything can happen. Consistency is not required. Warnings are not required.

// Uninitialized & out-of-range values:
long long a[25][6];
a[0][0] = 0;
for (int j=0; j<8; j++) {
    for (int i=0; i<30; i++)
        cout << a[i][j] << ' ';
    cout << '\n';
}
0 1 1 140730054572224 4196640 140629562534064 0 1 140730054572536 140629565240176 140629565240080 0 0 0 0 0 0 0 0 0 0 0 4196016 1 2 140730054572512 8037842368 0 0 0 
140629567557680 6295584 0 6294984 6295921 22 0 140730054572520 140730054572520 140629565240216 140629565240120 0 0 0 0 0 0 0 0 0 0 0 140629551142396 4196464 4196573 4294967321 4196262 0 140629565317194 4196078 
4294967295 1 140629550846232 1 4196016 140629562534038 140629560863696 140730054572536 0 140629565240176 140629565240216 0 0 0 0 0 0 0 0 0 0 0 140730054572224 7813586406938797358 140629565315840 4196496 0 3266557683118885826 0 140730054572504 
140629565337489 140730054572520 140629551142544 140629565287566 140629554744608 281470681751424 140629561628256 140629565287566 1000 140629565240216 0 0 0 0 0 0 0 0 0 0 0 0 1 4295032831 0 140629551050117 -3266889061645694014 3270758933826751426 0 140629567574272 
140629550855544 140730054572536 7813586406938797358 2 0 73728 140730054572536 0 256 140629565240080 140629565240176 0 0 0 0 2 0 0 0 0 0 0 140730054572224 140730054572240 4196496 140629562355040 4196032 140728898420736 0 1 
140629567574544 140629565363912 140730054572176 384 140629565287566 140629554744608 281470681751424 32 1000 140629565240120 140629565240216 0 0 0 0 0 0 0 0 0 0 140629554758952 6294984 4196486 4196032 140730054572520 140730054572512 0 0 140730054578886 
1 1 140730054572224 4196640 140629562534064 0 1 140730054572536 140629565240176 140629565240080 0 0 0 0 0 0 0 0 0 0 0 4196016 1 2 140730054572512 8037842368 0 0 0 0 
6295584 0 6294984 6295921 22 0 140730054572520 140730054572520 140629565240216 140629565240120 0 0 0 0 0 0 0 0 0 0 0 140629551142396 4196464 4196573 30064771096 4196262 0 140629565317194 4196078 140730054578894 
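
One way to turn this kind of silent garbage into a visible error is to use std::array and its at() member, which checks every index. A sketch with the same dimensions as the array above; Grid and checked_get are names invented for this example:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Same shape as the slide's array: 25 rows of 6 values.
using Grid = std::array<std::array<long long, 6>, 25>;

// at() checks each index and throws std::out_of_range
// instead of silently reading past the end.
long long checked_get(const Grid &g, std::size_t i, std::size_t j) {
    return g.at(i).at(j);
}
```

Declaring the array as Grid g{}; also zero-initializes it, so checked_get(g, 0, 0) is 0, while checked_get(g, 29, 7) throws.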

Undefined behavior examples

cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n'; // Kaboom!
SIGSEGV: Segmentation fault

Why don’t we see the desired output?

Buffering! Output does not go out immediately, because that would be inefficient. Instead, the output accumulates until endl, flush, a full buffer, or program end. Program dies; output is lost. ☹

Interactive output is line-buffered, but these slides send the output to a file, so it’s fully buffered.

Undefined behavior examples

// Shifting too far:
int amount=35;
cout << (1<<amount);
8

The standard states that you can’t shift by the word size or more, and can’t shift by a negative amount. Why?

Since shifting is such a common operation, most CPUs have a shift instruction. For 32-bit values, the shift amount is typically held in a five-bit field in the instruction (2^5 = 32). Alas, 35 cannot be represented in five bits. A CPU that keeps only the low five bits would treat 35 as 3, which is consistent with the 8 printed above.
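
A program that cannot be sure of the shift amount can check it first. A sketch that treats an over-wide shift as "every bit shifted out"; that policy, and the name shift_left, are assumptions made for this example:

```cpp
#include <limits>

// Hypothetical helper: left shift that returns 0 for an over-wide
// shift instead of running into undefined behavior.
unsigned shift_left(unsigned value, unsigned amount) {
    const unsigned width = std::numeric_limits<unsigned>::digits;
    return amount < width ? value << amount : 0u;
}
```

With a 32-bit unsigned, shift_left(1, 35) returns 0 on every machine, and shift_left(1, 3) still returns 8.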

Undefined behavior examples

// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n';
c.cc:4: warning: operation on 'a' may be undefined
4
int b;
cout << b << '\n';
c.cc:2: warning: 'b' is used uninitialized in this function
0

g++ notices some undefined behavior, not all. This is a QOI (Quality Of Implementation) aspect, but not a standards-conformance issue.
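
The double increment can be written without undefined behavior by modifying the variable once per statement. A sketch, assuming the intent was "increment, read, increment, read, add"; sum_of_two_increments is a made-up name:

```cpp
// One modification of a per statement: no unsequenced writes.
int sum_of_two_increments(int a) {
    ++a;
    const int first = a;
    ++a;
    const int second = a;
    return first + second;
}
```

Under that reading sum_of_two_increments(0) returns 3, not the 4 that g++ happened to print above.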

But, why??

C++ is quite concerned about efficiency.

C++’s attitude is “You break the rules, you pay the price.” It doesn’t hold your hand.

Things Be Changin’

This is undefined behavior in C++14:

int i=5;
i = i++;
cout << i;
c.cc:2: warning: operation on 'i' may be undefined
5

C++17, regarding assignment, says “The right operand is sequenced before the left operand”, so ++ finishes before =: i++ yields the old value 5 and bumps i to 6, then = stores that 5 back into i. The output is guaranteed to be 5:

// c++17
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on 'i' may be undefined
5

Looks like the compiler’s warning hasn’t caught up to the standard.

Not just theoretical

Information from the Unisys C Compiler Programming Reference Manual:

Type       Bits  sizeof  Signed Range          Unsigned Max
char          9       1  −255 to 255           511
short        18       2  −2^17+1 to 2^17−1     2^18−1
int          36       4  −2^35+1 to 2^35−1     2^36−2
long         36       4  −2^35+1 to 2^35−1     2^36−2
long long    72       8  −2^71+1 to 2^71−1
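
Code that has to run on a machine like this cannot assume 8-bit bytes. A minimal sketch that computes storage widths from the implementation's own numbers; storage_bits is a hypothetical name:

```cpp
#include <climits>
#include <cstddef>

// Bits occupied by a T, computed from sizeof and CHAR_BIT
// rather than assuming 8-bit bytes.
template <typename T>
constexpr std::size_t storage_bits() {
    return sizeof(T) * CHAR_BIT;
}

// sizeof(char) is 1 by definition, so this holds everywhere.
static_assert(storage_bits<char>() == CHAR_BIT, "char is one byte");
```

With the figures in the table, storage_bits<int>() would be 4 × 9 = 36 on the Unisys compiler, and 32 on a typical desktop.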