CS253: Software Development with C++                

Spring 2017                

HW 6                

CS253 HW6: Split!                

Description                

Upon reflection, you have decided that the class U, from HW5, is trying to do too many things. It reads data files, decodes UTF-8, parses properties files, counts properties, etc. That’s too much!                 

For this assignment, you will split the functionality of the HW5 U class into two classes, U and P. The U class will handle UTF-8 translations, and the P class will handle properties.                 

More precisely, the U class will accumulate a series of UTF-8 characters (via ctor, .readfile(), and .append()), and return them as either UTF-8 strings (via .get()) or as integer Unicode code points (via .codepoint()).

On the other hand, the P class will read a properties file (via ctor or .readfile()), accumulate property counts (via .count()) and return the count for a given property (via .count()).                 

Most methods of the HW5 U class will have counterparts in the new classes, but some are different.                 

Methods of U                

Your U class must have the following public methods:                 

Default constructor
Accumulated string is initially empty.
Copy constructor
Takes another U, and copies the information.
Filename constructor
Takes a single std::string argument: a data filename. The file is read as if readfile were called. Throws a std::string error message upon error.
Assignment operator
Takes another U, and copies the information.
Destructor
Destroys.
readfile
Takes a std::string filename, and reads the UTF-8 characters from it. Throws a std::string error message upon error. If called again, the data accumulates.
append
Takes a std::string (not a filename), and treats the UTF-8 characters in it as if they were the result of calling readfile. Throws a std::string error message upon error.
get
Takes no arguments, and returns all the accumulated UTF-8 characters as one big std::string.
get
Takes an int index, and returns a std::string containing the UTF-8 character at that point. The first UTF-8 character is at index zero. Throws a std::string error for an invalid index.
get
Like get, but takes two int values: starting and ending positions. They represent a half-open interval: get(7,9) returns a std::string containing two UTF-8 characters, the one at 7 and the one at 8. Throws a std::string error for an invalid argument.
codepoint
Takes an int index, and returns an int value representing the Unicode code point of the character at that index. The first UTF-8 character is at index zero. Throws a std::string error for an invalid index.
It’s like the one-argument .get(), except that it returns an int.
size
Returns total number of UTF-8 characters read, as an int.
empty
Returns true iff the object is empty (no characters).
clear
Removes all data from the object. It should become identical to an object created with a default ctor.
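
As an illustration of what .codepoint() has to compute, here is a minimal sketch of decoding one UTF-8 character by hand. The function name decode_one and its signature are hypothetical and not part of the required interface. It checks lead and continuation bytes only; it does not reject overlong encodings or surrogate code points, which a complete solution may also want to consider.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the required U interface): decode the
// UTF-8 character starting at byte offset pos in s, advance pos past it,
// and return its Unicode code point. Throws a std::string on bad input.
int decode_one(const std::string &s, std::size_t &pos) {
    if (pos >= s.size())
        throw std::string("decode_one: offset past end of data");
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    int extra;  // number of continuation bytes that must follow the lead byte
    int cp;     // code point bits gathered so far
    if (lead < 0x80)                { extra = 0; cp = lead; }         // 0xxxxxxx
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }  // 11110xxx
    else
        throw std::string("decode_one: invalid UTF-8 lead byte");
    if (s.size() - pos <= static_cast<std::size_t>(extra))
        throw std::string("decode_one: truncated UTF-8 sequence");
    for (int i = 1; i <= extra; i++) {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80)                                       // 10xxxxxx
            throw std::string("decode_one: invalid UTF-8 continuation byte");
        cp = (cp << 6) | (c & 0x3F);
    }
    pos += extra + 1;
    return cp;
}
```

Calling it repeatedly with the same pos walks through a string one character at a time, which is one possible way to split accumulated data into indexable characters.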

You may define other methods or data, public or private, as you see fit.                 
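
The half-open interval used by the two-argument get can be sketched as follows. This is only an illustration under stated assumptions: the free function slice is hypothetical, it assumes the characters are already stored one per std::string in a vector, and it treats an empty range (start equal to end) as valid, which the assignment does not specify.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: chars holds one std::string per UTF-8 character.
// Returns the characters in the half-open range [start, end).
std::string slice(const std::vector<std::string> &chars, int start, int end) {
    const int n = static_cast<int>(chars.size());
    if (start < 0 || end > n || start > end)
        throw std::string("slice: invalid range [") + std::to_string(start)
            + "," + std::to_string(end) + ") for "
            + std::to_string(n) + " characters";
    std::string result;
    for (int i = start; i < end; i++)   // end itself is excluded
        result += chars[i];
    return result;
}
```

With eight stored characters, slice(chars, 3, 5) returns the characters at indices 3 and 4 only.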

Methods of P                

Your P class must have the following public methods:                 

Default constructor
No properties so far.
Copy constructor
Takes another P, and copies the information.
Filename constructor
Takes a single std::string argument: a property filename. The file is read as if readfile were called.
Assignment operator
Takes another P, and copies the information.
Destructor
Destroys.
readfile
Takes a std::string property filename, and reads property information. Throws a std::string error message upon error.
props
Takes no arguments; returns a std::set<std::string> of the names of all the possible properties, as read from the properties file.
count
Takes an int codepoint, and counts it. It is not an error to count a codepoint which has no corresponding property.
count
Takes a std::string which is a property name (e.g., Lu) and returns the number of times that characters with that property have been encountered via the other .count().
size
Returns number of unique property names encountered from readfile.
empty
Returns true iff the object is empty (no property names).
clear
Removes all data from the object. It should become identical to an object created with a default ctor.

You may define other methods or data, public or private, as you see fit.                 
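
The two overloads of count can share a name because their argument types differ. The sketch below shows one possible arrangement. The class PropertyCounter and its define method are hypothetical stand-ins: the table is filled by hand here rather than by readfile, and the real P class needs the full set of methods listed above.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-in for P: the code point to property table
// is filled by define() here instead of by readfile().
class PropertyCounter {
  public:
    // Record that code point cp has the named property.
    void define(int cp, const std::string &prop) {
        prop_of_[cp] = prop;
        counts_.insert({prop, 0});  // new properties start with a count of zero
    }
    // Count one occurrence of cp; code points with no property are ignored.
    void count(int cp) {
        const auto it = prop_of_.find(cp);
        if (it != prop_of_.end())
            ++counts_[it->second];
    }
    // Return how many counted code points had this property, or zero if the
    // property is unknown or was never encountered.
    int count(const std::string &prop) const {
        const auto it = counts_.find(prop);
        return it == counts_.end() ? 0 : it->second;
    }
  private:
    std::map<int, std::string> prop_of_;  // code point -> property name
    std::map<std::string, int> counts_;   // property name -> occurrences
};
```

A call such as count("Sm") selects the string overload, since a string literal converts to std::string but not to int.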

Constiness                

Const-correctness, for both classes, both arguments & methods, is your job. For example, it must be possible to call .size() on a const object, or to pass a const string to .readfile().                 
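
A small sketch of what that means in practice. The class Accumulator is hypothetical and exists only to show the signatures; note that its size counts bytes, not UTF-8 characters.

```cpp
#include <string>

// Hypothetical class, only to illustrate const-correct signatures.
class Accumulator {
  public:
    // Taking a const reference promises not to modify the caller's string,
    // so const strings and temporaries can be passed.
    void append(const std::string &text) { data_ += text; }
    // Trailing const: these methods may be called on a const object.
    int size() const { return static_cast<int>(data_.size()); }
    bool empty() const { return data_.empty(); }
  private:
    std::string data_;
};

// Compiles only because size() and empty() are declared const.
int report(const Accumulator &a) { return a.empty() ? 0 : a.size(); }
```

If the trailing const were dropped from size(), the call inside report would fail to compile.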

Other Classes                

Sample Run                

Here is an example of how we may test your code. “%” is my shell prompt.                 

    % tar -xf hw6.tar
    % cp -f some-other-place/main.cc .
    % cat main.cc
    #include "U.h"
    #include "P.h"
    #include <iostream>
    using namespace std;
    const string pub="/s/bach/a/class/cs253/pub/";   // ~ only works in shells
    int main() {
        try {
            U u;
            u.append("a³+b³≠c³");
            P p(pub+"UnicodeData.txt");
            for (int i=0; i<u.size(); i++)
                p.count(u.codepoint(i));
            cout << "Should be 8: " << u.size() << '\n'
                 << "Should be 2: " << p.count("Sm") << '\n'
                 << "Should be b³: " << u.get(3,5) << '\n';
            try {
                u.readfile("/bogus");
            }
            catch (const string &msg) {
                cout << "I expected this: " << msg << '\n';
            }
            return 0;
        }
        catch (const string &msg) {
            cout << "Unexpected error: " << msg << '\n';
        }
    }
    % g++ -Wall *.cc
    % ./a.out
    Should be 8: 8
    Should be 2: 2
    Should be b³: b³
    I expected this: U::readfile: can’t open /bogus
    %

Hints                

Don’t just modify your current U.h and U.cc files. You might need them, someday. Instead, create HW5 and HW6 directories and put the different versions there. mkdir is your friend. Disk space is cheap.                 

Requirements                

  1. Do not turn in a main() function. If you do, you will lose one full point.
  2. No requirements are inherited from previous assignments, except as explicitly mentioned.
  3. The very first three lines of all four source files must be comments, containing, in this order, one per line:
    • your name
    • the date
    • the purpose of this program
  4. Your code must not exit(), or emit any output.
  5. If it can be opened, you may assume that the properties file is properly formatted, as described in HW4. It may be out of order, or have holes.
  6. It is not defined what happens if P::readfile is called twice on the same object.
  7. The behavior of P::count(int) is not defined before P::readfile is called.
  8. The behavior of P::count(int) is not defined if P::readfile has been called twice. That’s because calling P::readfile twice invokes undefined behavior. At that point, all bets are off!
  9. For count, return zero if that property is not defined, or no characters with that property have been encountered. For example, .count("<*>") will return zero, since that’s a very poor property.
  10. Do not turn in a main() function. If you do, you will lose one full point.
  11. You may not assume that the input files contain valid UTF-8 encoded characters.
  12. You may assume that the Unicode characters described by the properties file are in the range U+0000–U+10FFFF.
  13. You may not use C-style <stdio.h> or <cstdio> facilities, such as scanf, fopen, and getchar.
    • Instead, use C++ facilities such as ifstream.
  14. You may not use C-style dynamic memory via malloc, calloc, realloc, free, and the like.
    • Use new/delete/new[]/delete[], if you must do your own memory allocation.
    • Containers are easier!
  15. No global variables.
  16. You may not leak memory.
  17. Do not turn in a main() function. If you do, you will lose one full point.
  18. Do it yourself:
    • You may not use u16string, u32string, wstring, wcout, wchar_t, etc.
    • You may not use any of the C++ wide-character facilities.
    • You may not use an external program via popen, system, or the like.
    • You have to do any UTF-8 character decoding/encoding yourself.
  19. For readability, don’t use ASCII int constants (65) instead of char constants ('A') for printable characters.
  20. Several instances of these classes must be able to co-exist, without interfering with each other.
  21. Your header files must not pollute the namespace.
  22. Strings thrown as errors must contain reasonable error messages, describing the error. "Something went wrong" is not good enough.
  23. We will compile your code as shown above.
    • If that generates warnings, you will lose a point.
    • If that generates errors, you will lose all points.
  24. Do not turn in a main() function. If you do, you will lose one full point.
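
Several of the requirements above (ifstream instead of <stdio.h>, no namespace pollution in headers, descriptive std::string errors) can be seen together in one short sketch. The function slurp and the include-guard name are hypothetical, and the exact wording of the error message is an assumption, not a required format.

```cpp
#ifndef SLURP_H_INCLUDED
#define SLURP_H_INCLUDED

#include <fstream>
#include <iterator>
#include <string>

// Header-style code: no "using namespace std;" here, every name is qualified,
// so files that include this header do not get std names dumped on them.

// Hypothetical helper: return the entire contents of a file as a string.
// Throws a std::string that names both the function and the file.
inline std::string slurp(const std::string &filename) {
    std::ifstream in(filename);
    if (!in)
        throw std::string("slurp: can't open ") + filename;
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

#endif /* SLURP_H_INCLUDED */
```

A caller can catch the error with catch (const std::string &msg) and report it, as the sample run does.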

Ask                

If you have any questions about the requirements, ask. In the real world, your programming tasks will almost always be vague and incompletely specified. Same here.                 

How to submit your homework:                

You will turn in at least four files: U.cc, U.h, P.cc, and P.h.

That’s a lot of files, so construct a tar file, like this:

    tar -cvf hw6.tar U.cc U.h P.cc P.h other-files

and turn it in.                 

Use web checkin, or Linux checkin:                 

    ~cs253/bin/checkin HW6 hw6.tar

How to receive negative points:                

Turn in someone else’s work.                 

Modified: 2017-04-05T16:08                 
