CS457 - Fall 2008
Project 3: TeleMed - Sharing Hospital Records Using a P2P Network
Assigned: Nov 3, 2008
Due: Mon, Dec 1, 2008, midnight (extended from Mon, Nov 24, 2008)
Version 1.12
11/30/08: fixed pong behavior. However, we will also accept the early version as correct. Do NOT change what you have implemented.
11/19/08: fixed wording in Part II that duplicated ping messages
11/14/08: added some clarifications and slides


Grading policy


Introduction

Healthcare providers and payers have long realized the benefits computer-based patient records could provide. Virtual patient records provide a user with virtual access to data possibly scattered around the world. On August 21, 1996, the U.S. enacted into law a process that will govern the adoption of national standards for health-related electronic commerce.

One of the most basic standardization issues for international health care transactions involves how to handle the many different person identifiers existing in disparate medical record systems. Even within a single hospital, different departments may have separate electronic records for the same person, each with a different identification number for that person. Historically, health care providers dealt with this issue by creating a Master Patient Index (MPI) that used a limited set of demographic data (for example, name, gender, date of birth, and so on) to help retrieve the disparate elements of a patient's records. Originally, MPIs were manual and were primarily used to link patient episodes for continuous care. Maintaining the continuity of patient care is still the most important role of automated MPIs. However, existing MPI software applications typically rely on proprietary solutions, involving a centralized approach that uses a new indexing number to cross-reference multiple disparate records, rather than taking a distributed nonproprietary approach to solving the problem.

TeleMed, an early prototype of a virtual patient record, was developed in collaboration with physicians at the National Jewish Medical and Research Center (NJC) in Denver, CO. TeleMed uses a media-rich patient record to allow multiple physicians, possibly located remotely across a wide-area network, to consult on a patient record. Consultations can take place interactively in real time, or offline using textual or audio annotations combined with graphical markers in the record.

The original prototype demonstrated the feasibility of virtual patient records, but it did not reach the stage of full functionality with actual patient records on a day-to-day basis. Los Alamos National Laboratory plans to deploy a more functional version of TeleMed in some of the small clinics and hospitals of northern New Mexico. The Northern New Mexico Rural Telemedicine Project (NNMRTP), led by the Northern New Mexico Community College, aims to develop the information infrastructure required to support telemedicine services in nine counties. This project will provide an important testbed for a collaboratory based on the TeleMed prototype.

TeleMed dynamically assembles a chronologically oriented graphical patient record using data gathered from several different remote locations. It assigns icons and other graphical features as placeholders in the chronological record to mark the onset and duration of events it finds recorded in the data. The assigned graphic icons then become user interface components for selecting which information to retrieve for more detailed viewing. Analogous to hyperlinks on a Web page, these icons provide access to information on demand. Object Request Brokers (ORBs) then mediate the interactive access linking the icons in the graphical patient record to the distributed databases that provide the persistent object storage for the multimedia data. The physicians using TeleMed interact with a seamless longitudinal record that gives no indication that multiple databases were used. The latest version of TeleMed supports real-time interactive collaborations between multiple users. Multiple physicians at remote locations can simultaneously view, edit, and annotate the patient data. Furthermore, each physician can see the data another physician has entered as well as monitor some of the other physician's interactions with various user interface windows. This allows physicians to engage in collaborative electronic discussions so that referrals and consultations can occur in a natural manner. This is the concept which underlies the design of TeleMed.

The above information was taken from http://www.corba.org/.

For your final project this semester, you will be building a simulation of the TeleMed system. More specifically, you will be implementing this simulation in a Gnutella-like fashion (see the Gnutella protocol specification, pdf). You will need to read the Gnutella protocol specification before beginning this project so you can understand the packet structure that will be used. You WILL NOT implement the entire functionality of Gnutella; you will implement only the following messages: Ping, Pong, Query, and QueryHit.

Part I - Building the Network

For Part I, you will be building up the network structure for this simulation. You will have one manager process that will act as your introduction service. The manager will start up when the program starts and will read the manager.init file, from which it will spawn some number of hospital nodes. Each hospital node will then read its initialization information over a TCP connection with the manager. Finally, each hospital will request its neighbor list and patient records from the manager.

An example manager.init file follows. Each line that starts with a # is a comment and must be skipped. There can be any number of comments anywhere in the file. Note that any amount of whitespace (spaces or tabs) can separate words or numbers. You should NOT assume a particular format for the city, state, and country information other than that it will be an ASCII string of at most 32 characters.


# begin manager.init
# total number of nodes
5
# timeout value in seconds
4
# node and drop rate for each node
# drop rate will be an integer, 0..99%
# this section ends with 0 0
1 5
# For example, node 1 above has a given drop rate of 5%

#Here are the remaining nodes 2..5 and their drop probabilities
2 22
3 1
4 3
5 70
0 0

# neighbors - for use by the manager, ends with 0 0
1 2 3 4
2 1 3 5
3 1 2 4 5
4 1 3
5 2 3
0 0

# initialization information for each hospital, not in any particular order.
# hospital number
1
# initialization information
#Location of hospital (City Code (3 chars), State Code, Country Code):
NYC NY US
# Patient ID number (8 digits) that this hospital has possession of, followed by
# their last name, first name, gender, and DOB (represented as MMDDYYYY)
12345678 Jones John M 02101950
23414577 Smith Susan F 12301987
19202900 Larson Jane F 06071966
12843567 Lewis Richard M 10311934
# A line of all zeros indicates the end of the patient records for that hospital
00000000

# Another hospital
2
DAL TX US
34184532 Gibson Mel M 07211965
83425729 Roberts Julia F 03181966
00000000
3
WAS DC US
12983008 Bush Laura F 01011958
43978241 Bush George M 05031955
11335578 Cheney Dick M 09091950
94455387 Powell Colin M 02021945
32785900 Rice Condoleezza F 04061960
30046812 Clinton Hillary F 12181949
00000000
4
LON WM UK
22901113 Blair Tony M 12311940
60040302 Winters Emma F 05201990
77331230 Thomas William M 04301955
00000000
5
ABQ NM US
42981034 Sanchez Juan M 02021930
98745345 Fitzgerald Rose F 02291997
11554389 Terrell Mark M 03142000
00000000
# end manager.init
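Because comments may appear anywhere and any whitespace may separate fields, a line-oriented reader that skips comment lines before anything else keeps the rest of the parsing simple. The sketch below parses the header sections of manager.init (node count, timeout, drop rates, and neighbor lists) from an in-memory copy of the file. MAX_NODES, InitHeader, next_line, parse_ints, and parse_header are hypothetical names; the sketch assumes well-formed input, as the assignment allows, and treats any line whose first non-blank character is '#' as a comment.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 32   /* hypothetical upper bound on the number of hospitals */

typedef struct {
    int n_nodes;
    int timeout_s;
    int drop_pct[MAX_NODES + 1];          /* indexed by node number */
    int nbrs[MAX_NODES + 1][MAX_NODES];   /* neighbor numbers for each node */
    int n_nbrs[MAX_NODES + 1];
} InitHeader;

/* Copy the next line that is neither blank nor a '#' comment into buf.
   *pos walks through the whole file held in memory; returns 0 at the end. */
static int next_line(const char **pos, char *buf, size_t cap) {
    while (**pos) {
        const char *start = *pos;
        const char *nl = strchr(start, '\n');
        size_t len = nl ? (size_t)(nl - start) : strlen(start);
        const char *s = start;
        *pos = nl ? nl + 1 : start + len;
        while (s < start + len && isspace((unsigned char)*s)) s++;
        if (s == start + len || *s == '#') continue;   /* blank line or comment */
        if (len >= cap) len = cap - 1;
        memcpy(buf, start, len);
        buf[len] = '\0';
        return 1;
    }
    return 0;
}

/* Read up to max integers from a line; any spaces or tabs may separate them. */
static int parse_ints(const char *line, int *out, int max) {
    int n = 0;
    while (n < max) {
        char *end;
        long v = strtol(line, &end, 10);               /* skips leading whitespace */
        if (end == line) break;                        /* no more numbers */
        out[n++] = (int)v;
        line = end;
    }
    return n;
}

/* Parse node count, timeout, drop rates and neighbor lists. The hospital
   records that follow are left unread at *pos. Returns 0 on success. */
static int parse_header(const char **pos, InitHeader *h) {
    char line[256];
    int v[MAX_NODES + 1];
    memset(h, 0, sizeof *h);
    if (!next_line(pos, line, sizeof line) || parse_ints(line, v, 1) != 1) return -1;
    h->n_nodes = v[0];
    if (!next_line(pos, line, sizeof line) || parse_ints(line, v, 1) != 1) return -1;
    h->timeout_s = v[0];
    /* "node drop%" pairs until the "0 0" terminator */
    while (next_line(pos, line, sizeof line) && parse_ints(line, v, 2) == 2) {
        if (v[0] == 0) break;
        if (v[0] < 1 || v[0] > MAX_NODES) return -1;
        h->drop_pct[v[0]] = v[1];
    }
    /* "node n1 n2 ..." lines until the "0 0" terminator */
    while (next_line(pos, line, sizeof line)) {
        int k = parse_ints(line, v, MAX_NODES + 1);
        if (k < 1 || v[0] == 0) break;
        if (v[0] < 1 || v[0] > MAX_NODES) return -1;
        for (int i = 1; i < k && h->n_nbrs[v[0]] < MAX_NODES; i++)
            h->nbrs[v[0]][h->n_nbrs[v[0]]++] = v[i];
    }
    return 0;
}
```

After parse_header returns, the same next_line reader can continue through the per-hospital sections (hospital number, location line, patient lines, 00000000).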


Each hospital will need to know which patient records it possesses.

To build up the network, you will start with the manager process. The manager process is responsible for forking the hospital processes. Additionally, the manager should bind and listen to a TCP port that it will use for communication with the hospitals (a separate TCP connection for each hospital). You can assume that there will be no errors in the init file. We may post test files to help you, but your project will be graded on test files that are unknown to you.

After the manager program creates a TCP port to communicate with the hospitals, it should read the initialization file called manager.init. As shown above, the first non-comment line gives the number of hospitals to fork; it is followed by the timeout value, the per-node drop rates, the neighbor lists, and finally each hospital's location and patient records. The manager should fork one process for each hospital. Each hospital should then create a UDP port for data transfer between hospitals, open a TCP connection to the manager, convey its UDP port information, read its neighbor list from the manager, and then receive all of its patient information. The manager may assign hospital numbers in the order in which connection requests arrive from the newly created hospitals. Each hospital should read this configuration information over TCP from the manager in a format that you specify (document this format in your README file).
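One common way for a hospital to obtain a UDP port it can report to the manager is to bind to port 0, which lets the kernel pick a free port, and then ask for the assigned number with getsockname(). The sketch below assumes all processes run on the same host (they are forked from one manager), so it binds to the loopback address; open_udp_port is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to a port chosen by the kernel (port 0) and report that
   port so the hospital can send it to the manager over TCP.
   Returns the socket descriptor, or -1 on error. */
static int open_udp_port(unsigned short *port_out) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* assumes all nodes run on one host */
    addr.sin_port = htons(0);                        /* 0 = let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);               /* host byte order */
    return fd;
}
```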

(New) Neighbor Discovery Phase: As hospitals come alive, they connect back to the manager at its TCP listening port and receive their ID numbers. In addition, hospitals 2..N receive the UDP port number of the previous hospital. In other words, hospital 1 receives no UDP port; hospital 2 receives hospital 1's UDP port; hospital 3 receives hospital 2's port; and so on. Once the manager has spoken to all hospitals, it signals "begin discovery phase" to all of them. At this point all hospitals send (and receive) Ping messages to (from) their neighbors in this chain. Received Pings are forwarded along the chain. If a hospital receives a Ping originated by a neighbor designated by the manager (recall that the manager conveys neighbor information to each hospital), it sends a Pong to that neighbor to establish a link. For example, suppose the manager told hospital 1 that hospital 3 is a neighbor. When hospital 3 receives a Ping from hospital 1 (which may arrive after being forwarded by hospital 2), hospital 3 sends a Pong to hospital 1. When hospital 1 receives the Pong, it verifies that hospital 3 is supposed to be its neighbor (according to the information hospital 1 received from the manager) and establishes a link to hospital 3. Establishing a link means storing hospital 3's UDP port number in hospital 1's neighbor list. Here are some slides that try to capture the link establishment.
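The two checks in this phase (answer a Ping only if its originator is a designated neighbor, and store a link only if a Pong comes from a designated neighbor) can be sketched as below. This assumes your Ping and Pong payloads carry the originating or responding hospital's ID and UDP port, which is a design choice you would make and document; the Discovery struct and the function names are hypothetical.

```c
#include <stdbool.h>

#define MAX_HOSP 32   /* hypothetical upper bound on hospital IDs */

/* Hypothetical per-hospital state for the discovery phase. */
typedef struct {
    int self_id;
    int designated[MAX_HOSP];                /* neighbor IDs sent by the manager */
    int n_designated;
    unsigned short link_port[MAX_HOSP + 1];  /* indexed by hospital ID; 0 = no link yet */
} Discovery;

static bool is_designated(const Discovery *d, int id) {
    for (int i = 0; i < d->n_designated; i++)
        if (d->designated[i] == id) return true;
    return false;
}

/* A Ping originated by origin_id arrived. Reply with a Pong only if the
   manager listed origin_id as a neighbor (forwarding happens either way). */
static bool should_pong(const Discovery *d, int origin_id) {
    return origin_id != d->self_id && is_designated(d, origin_id);
}

/* A Pong from responder_id, reachable at port, arrived. Store the link only
   if the manager said responder_id is our neighbor. */
static bool on_pong(Discovery *d, int responder_id, unsigned short port) {
    if (responder_id < 1 || responder_id > MAX_HOSP || !is_designated(d, responder_id))
        return false;
    d->link_port[responder_id] = port;
    return true;
}
```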

NOTE: the early version of the slides was inconsistent with Gnutella, because it had Pong messages going back directly to the Ping originator. This is wrong: Pong messages should follow the reverse path to the Ping originator. However, we will accept both solutions as correct for this project.
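Reverse-path routing in Gnutella works by remembering, for each Descriptor ID a node has seen, the connection it arrived on; a Pong (or QueryHit) carrying the same ID is then sent back that way. The same table also lets a node recognize and drop duplicate Pings and Queries. The sketch below keeps a fixed-size ring of UDP ports keyed by the 16-byte ID; RouteTable, ROUTE_SLOTS, and the function names are hypothetical, and a port of 0 is assumed to mean "originated here".

```c
#include <string.h>

#define ID_LEN 16         /* Gnutella Descriptor ID length */
#define ROUTE_SLOTS 256   /* hypothetical table size; oldest entries are overwritten */

typedef struct {
    unsigned char id[ID_LEN];
    unsigned short from_port;   /* UDP port the descriptor arrived from; 0 = we sent it */
    int used;
} Route;

typedef struct {
    Route slot[ROUTE_SLOTS];
    int next;                   /* ring-buffer write position */
} RouteTable;

static int route_index(const RouteTable *t, const unsigned char *id) {
    for (int i = 0; i < ROUTE_SLOTS; i++)
        if (t->slot[i].used && memcmp(t->slot[i].id, id, ID_LEN) == 0) return i;
    return -1;
}

/* Record a Ping or Query. Returns 1 if it is new, 0 if it was already seen
   (a duplicate arriving over another path, which should be dropped). */
static int route_remember(RouteTable *t, const unsigned char *id, unsigned short from_port) {
    if (route_index(t, id) >= 0) return 0;
    Route *r = &t->slot[t->next];
    t->next = (t->next + 1) % ROUTE_SLOTS;
    memcpy(r->id, id, ID_LEN);
    r->from_port = from_port;
    r->used = 1;
    return 1;
}

/* Where to send a Pong/QueryHit carrying this ID: the stored port, 0 if we
   originated the request, or -1 if the ID is unknown (drop the reply). */
static int route_back(const RouteTable *t, const unsigned char *id) {
    int i = route_index(t, id);
    return i < 0 ? -1 : (int)t->slot[i].from_port;
}
```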

In summary, the manager will open a TCP port to listen on (store its number in a variable, such as a global, before forking so that all forked processes have access to it), fork the hospitals, which in turn will create a UDP port, connect back to the manager via its listening port, relay their UDP port information, and retrieve their information from the manager.
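A minimal sketch of the manager side of this sequence, assuming everything runs on one host: the listening socket is created on a kernel-chosen port before fork(), so each child already knows the port number, and each child connects back over TCP. hospital_main is a hypothetical placeholder for the hospital's real work (creating its UDP port, reporting it, and reading its configuration); here it only writes a short greeting so the exchange can be observed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill in a loopback address for the given port (host byte order). */
static void loopback_addr(struct sockaddr_in *a, unsigned short port) {
    memset(a, 0, sizeof *a);
    a->sin_family = AF_INET;
    a->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a->sin_port = htons(port);
}

/* Manager: listen on a kernel-chosen TCP port BEFORE forking, so every
   child inherits the port number as an ordinary variable. */
static int open_tcp_listener(unsigned short *port_out) {
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    loopback_addr(&a, 0);
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 || listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(a.sin_port);
    return fd;
}

/* Hypothetical placeholder for the hospital process body. */
static void hospital_main(int mgr_fd) {
    const char msg[] = "READY\n";
    ssize_t n = write(mgr_fd, msg, sizeof msg - 1);
    (void)n;   /* error handling omitted in this sketch */
}

/* Fork one hospital. The child drops the inherited listening socket,
   connects back to the manager, runs hospital_main() and exits. */
static pid_t fork_hospital(int listen_fd, unsigned short mgr_port) {
    pid_t pid = fork();
    if (pid != 0) return pid;                  /* parent, or -1 on error */
    close(listen_fd);
    struct sockaddr_in a;
    loopback_addr(&a, mgr_port);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *)&a, sizeof a) < 0) _exit(1);
    hospital_main(fd);
    close(fd);
    _exit(0);
}
```

The manager would call fork_hospital once per hospital, then accept() one connection per child; if it assigns IDs in accept order, that matches the "sequence connection requests come in" option above.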

Output for this stage: At the end of Part I, the manager should create a file called man-P1.out that has one line per hospital that it spawned. Also, each hospital should create a file called hospX-P1.out that has the initialization information that it received from the manager. Each line of the file will correspond to a hospital and will have the hospital location (City State Country), the hospital number (node number), and the UDP port for the hospital. Additionally, each hospital will print out its list of patients that it received from the manager in this file.

Part II: Locating and Transferring the Data

Each hospital has patient records for a list of patients. For this part of the project, we will assume that each patient's record is only at one hospital.

Once the network has initialized in Part I, each hospital will read its record requirements out of the file hospX.req (where X is the node number of the hospital). These are the patient records that it needs to find on the network using the Gnutella protocol. Following is a sample hospX.req file:


12983008
77331230
12111111
00000000


The file ends with '00000000'. The eight-digit numbers before that line are the patients that you need to find on the network. The hospital that has the information will send you back the patient's ID number, last name, first name, gender, and DOB (in a format that you specify). If the information is not found on the network, write a not-found record as shown below. Write this information to a hospX-P2.out file, prefixing each found record with the location of the hospital that holds it, as follows:


WAS DC US 12983008 Bush Laura F 01011958
LON WM UK 77331230 Thomas William M 04301955
**NOT FOUND** 12111111
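Reading hospX.req and writing each hospX-P2.out line are small enough to sketch together. PatientRecord, read_requests, and format_result are hypothetical names; the record layout simply mirrors the fields shown in manager.init plus the owning hospital's location, and field sizes assume at most 32 characters per string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record as returned by the owning hospital, plus its location. */
typedef struct {
    char city[33], state[33], country[33];
    char id[9], last[33], first[33], gender[2], dob[9];
} PatientRecord;

/* Read 8-digit patient IDs from hospX.req until the 00000000 terminator. */
static int read_requests(FILE *f, char ids[][9], int max) {
    char tok[64];
    int n = 0;
    while (n < max && fscanf(f, "%63s", tok) == 1) {
        if (strcmp(tok, "00000000") == 0) break;
        if (strlen(tok) != 8) continue;          /* skip anything malformed */
        memcpy(ids[n++], tok, 9);                /* 8 digits + '\0' */
    }
    return n;
}

/* Format one hospX-P2.out line; r == NULL means the search failed. */
static void format_result(char *out, size_t cap, const char *req_id, const PatientRecord *r) {
    if (r == NULL)
        snprintf(out, cap, "**NOT FOUND** %s", req_id);
    else
        snprintf(out, cap, "%s %s %s %s %s %s %s %s", r->city, r->state, r->country,
                 r->id, r->last, r->first, r->gender, r->dob);
}
```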


To find a record on the network, the hospital will follow the Gnutella protocol by first asking its neighbors, then having each neighbor ask its neighbors, and so on, until the record is found and returned to the requester or until the TTL of the request expires. Your implementation will differ from standard Gnutella in that none of your transmissions will be done over HTTP. All of your transmissions should take place over UDP.

Use the aforementioned Gnutella specification to build the requests you send to your neighbors. Send your Query messages with a TTL of 7 to ensure they reach the entire network. If a neighbor has a hit, it will respond with a QueryHit packet that contains the port number of the node with the requested information. You can then download the information directly from that node (over UDP, as noted above). If your neighbor doesn't have the information, it will decrement the TTL and forward the packet to all of its neighbors except the one from which it came.
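The forwarding step can be sketched as two small functions: one applies the Gnutella TTL/Hops rule (decrement TTL and increment Hops before passing a descriptor on; stop once TTL would reach 0), and one picks every neighbor except the sender. Because the example topology contains cycles (for instance 1-2-3), a node should also discard Queries whose Descriptor ID it has already seen before calling these. HopFields, prepare_forward, and forward_targets are hypothetical names.

```c
#include <stdint.h>

/* The two header fields the forwarding rule changes. */
typedef struct {
    uint8_t ttl;    /* hops the descriptor may still travel */
    uint8_t hops;   /* hops already taken */
} HopFields;

/* Decrement TTL and increment Hops before passing a descriptor on; once TTL
   would reach 0 it is not forwarded. Returns 1 (fields updated) to forward. */
static int prepare_forward(HopFields *h) {
    if (h->ttl <= 1) return 0;
    h->ttl--;
    h->hops++;
    return 1;
}

/* Copy every neighbor port except the one the Query came from into out.
   Returns how many targets were written. */
static int forward_targets(const unsigned short *nbr_ports, int n,
                           unsigned short from_port, unsigned short *out) {
    int k = 0;
    for (int i = 0; i < n; i++)
        if (nbr_ports[i] != from_port) out[k++] = nbr_ports[i];
    return k;
}
```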

Logging Network Traffic: Each node will need to keep a log of the traffic that it sees. For each hospital, keep a file called hospX-P2.log that records every event that happens at the node. The log needs to be specific, but concise. For each packet that the node sees, make an entry in the hospX-P2.log file in the following format:

<MessageID:FunctionID:TTL-Remaining:Hops-Already-Taken:DataLength>
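The log fields map directly onto the Gnutella descriptor header: a 16-byte Descriptor ID, a 1-byte Payload Descriptor (0x00 Ping, 0x01 Pong, 0x40 Push, 0x80 Query, 0x81 QueryHit), 1-byte TTL, 1-byte Hops, and a 4-byte little-endian Payload Length, 23 bytes in all. The sketch below packs and unpacks that header for a UDP datagram and renders one log entry; printing the ID and function as hex and the rest in decimal is an assumption, as are the names DescHeader, hdr_pack, hdr_unpack, and format_log.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Payload Descriptor values from the Gnutella v0.4 specification. */
enum { PING = 0x00, PONG = 0x01, PUSH = 0x40, QUERY = 0x80, QUERY_HIT = 0x81 };
#define HDR_LEN 23   /* 16 ID + 1 type + 1 TTL + 1 hops + 4 payload length */

typedef struct {
    uint8_t id[16];      /* Descriptor ID (MessageID in the log) */
    uint8_t func;        /* Payload Descriptor (FunctionID) */
    uint8_t ttl, hops;
    uint32_t len;        /* payload length (DataLength) */
} DescHeader;

static void hdr_pack(const DescHeader *h, uint8_t out[HDR_LEN]) {
    memcpy(out, h->id, 16);
    out[16] = h->func;
    out[17] = h->ttl;
    out[18] = h->hops;
    for (int i = 0; i < 4; i++)                  /* little-endian length */
        out[19 + i] = (uint8_t)(h->len >> (8 * i));
}

static void hdr_unpack(const uint8_t in[HDR_LEN], DescHeader *h) {
    memcpy(h->id, in, 16);
    h->func = in[16];
    h->ttl = in[17];
    h->hops = in[18];
    h->len = 0;
    for (int i = 0; i < 4; i++)
        h->len |= (uint32_t)in[19 + i] << (8 * i);
}

/* One log entry: MessageID:FunctionID:TTL-Remaining:Hops-Already-Taken:DataLength */
static void format_log(char *buf, size_t cap, const DescHeader *h) {
    size_t off = 0;
    for (int i = 0; i < 16 && off < cap; i++)
        off += (size_t)snprintf(buf + off, cap - off, "%02x", (unsigned)h->id[i]);
    if (off < cap)
        snprintf(buf + off, cap - off, ":%02x:%u:%u:%lu", (unsigned)h->func,
                 (unsigned)h->ttl, (unsigned)h->hops, (unsigned long)h->len);
}
```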

At the end of the log file will be a count of how many packets of each type that the node saw in the form:

Ping packets received: XXX
Ping packets sent: XXX
Pong packets received: XXX
Pong packets sent: XXX
Push packets received: XXX
Push packets sent: XXX
Query packets received: XXX
Query packets sent: XXX
Hit packets received: XXX
Hit packets sent: XXX
Total packets processed: XXX

Where XXX represents the total number of packets for that count.
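Keeping the counts in per-type arrays makes the summary a single loop. The sketch below writes the lines listed above; it assumes "Total packets processed" means sent plus received, which is one reasonable reading rather than a stated rule, and Counters and write_summary are hypothetical names.

```c
#include <stdio.h>

enum { T_PING, T_PONG, T_PUSH, T_QUERY, T_HIT, T_COUNT };

static const char *const kTypeName[T_COUNT] = { "Ping", "Pong", "Push", "Query", "Hit" };

typedef struct {
    unsigned long recv[T_COUNT];
    unsigned long sent[T_COUNT];
} Counters;

/* Append the end-of-log summary and return the total. "Processed" is taken
   to mean sent + received here, which is an assumption. */
static unsigned long write_summary(FILE *log, const Counters *c) {
    unsigned long total = 0;
    for (int t = 0; t < T_COUNT; t++) {
        fprintf(log, "%s packets received: %lu\n", kTypeName[t], c->recv[t]);
        fprintf(log, "%s packets sent: %lu\n", kTypeName[t], c->sent[t]);
        total += c->recv[t] + c->sent[t];
    }
    fprintf(log, "Total packets processed: %lu\n", total);
    return total;
}
```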

Output for this stage: You will need to create a hospX-P2.out for each hospital after the data has been located that includes all of the patient information. Also, you will need to have a log file for each node.

Part III - Adding Reliability (Extra credit see grading policy)

In a scenario such as health care, where patients' lives are on the line, it is imperative that medical records get transferred quickly and accurately from the remote site to the requesting site. Therefore, in Part III we will be adding reliability to the system. You will need to ensure that the medical records reach the requester by using reliable flooding with ACKs and timeouts. You will also simulate real networks by having nodes drop a percentage of their traffic.

Since file transfers in Gnutella are done over HTTP and we are using UDP transfers in this project, it is up to you to determine how the packets will be modified to carry an ACK and a sequence number. Do you have to add a sequence number, or is there a field already in the packet that can act as the sequence number? (Be sure to document this in your README.) The receiver must ACK (via UDP) the message and sequence number when it gets it. The sender must set a timer and resend the packet if it has not been ACKed. You will get the length of the timeout from the manager.init file. For the README file, think about how differences in this value (higher or lower) will affect the amount of traffic on your network.
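The sender-side bookkeeping for ACKs and timeouts can be kept in a small table of unacknowledged packets, each with a deadline. The sketch below takes the current time as a parameter (in seconds) so the logic can be tested without real clocks; the caller is assumed to keep a copy of each packet's bytes per slot so it can resend it. The seq field stands for whatever value you choose to match ACKs on; RetxQueue and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 64   /* hypothetical limit on outstanding packets */

/* One sent packet that has not been acknowledged yet. */
typedef struct {
    int in_use;
    uint32_t seq;           /* whatever value you choose to match ACKs on */
    unsigned short dest;    /* destination UDP port */
    double deadline;        /* resend when now >= deadline (seconds) */
    int tries;              /* retransmissions so far */
} Pending;

typedef struct {
    Pending p[MAX_PENDING];
    double timeout;         /* taken from manager.init */
} RetxQueue;

/* Start the timer for a packet just sent. Returns its slot, or -1 if full. */
static int retx_add(RetxQueue *q, uint32_t seq, unsigned short dest, double now) {
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!q->p[i].in_use) {
            q->p[i] = (Pending){1, seq, dest, now + q->timeout, 0};
            return i;
        }
    }
    return -1;
}

/* An ACK for (seq, dest) arrived: stop its timer. Returns 1 if it matched. */
static int retx_ack(RetxQueue *q, uint32_t seq, unsigned short dest) {
    for (int i = 0; i < MAX_PENDING; i++) {
        if (q->p[i].in_use && q->p[i].seq == seq && q->p[i].dest == dest) {
            q->p[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}

/* List the slots whose timer expired (the caller resends those packets)
   and re-arm each timer. Returns how many slots were written to out. */
static int retx_expired(RetxQueue *q, double now, int *out, int max) {
    int n = 0;
    for (int i = 0; i < MAX_PENDING && n < max; i++) {
        Pending *e = &q->p[i];
        if (e->in_use && now >= e->deadline) {
            e->deadline = now + q->timeout;
            e->tries++;
            out[n++] = i;
        }
    }
    return n;
}
```

A shorter timeout makes this table fire more often, which is one concrete way to reason about the README question on traffic.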

You should figure out a way to determine when the exchanges have finished. This can be as simple as waiting for several seconds, but you must document your method in the README file.
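The "wait for several seconds" approach can be written as a receive loop that stops once the socket has been silent for a chosen quiet period, using select() with a timeout. This is only one option and has obvious limits: the quiet period should be comfortably longer than the retransmission timeout, and a real node would also service its resend timers inside the loop. run_until_quiet and its handle callback are hypothetical names.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Receive datagrams on fd, passing each to handle(), until nothing has
   arrived for quiet_ms milliseconds. Returns the number of datagrams seen. */
static int run_until_quiet(int fd, long quiet_ms,
                           void (*handle)(const char *buf, ssize_t n)) {
    char buf[2048];
    int seen = 0;
    for (;;) {
        fd_set rd;
        struct timeval tv;
        FD_ZERO(&rd);
        FD_SET(fd, &rd);
        tv.tv_sec = quiet_ms / 1000;               /* select() may modify tv, */
        tv.tv_usec = (quiet_ms % 1000) * 1000;     /* so reset it every pass */
        int r = select(fd + 1, &rd, NULL, NULL, &tv);
        if (r == 0) break;                         /* quiet period elapsed: assume done */
        if (r < 0) {
            if (errno == EINTR) continue;
            break;
        }
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n >= 0) {
            seen++;
            if (handle) handle(buf, n);
        }
    }
    return seen;
}
```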

The drop rate is given to you in the manager.init file for each node. Discards should happen randomly at the node on a per-packet basis, not deterministically. For example, with a 5% discard rate, each packet must suffer a 5% chance of being dropped. You cannot implement this with a counter that drops every 20th packet. Dropping should be implemented on the receive side (i.e. on receipt, randomly drop or receive the packet). The packet drops happen only for the UDP traffic (data requests and transfer), not for the configuration manager TCP traffic.
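A per-packet random drop can be an independent Bernoulli trial on each received UDP packet, with no counters involved. One pitfall worth noting: forked children inherit the parent's rand() state, so each hospital should reseed after fork() or all of them will drop the same packet positions. The sketch below mixes the time and process ID into the seed; seed_drop_rng and should_drop are hypothetical names.

```c
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Call once in each hospital AFTER fork(): children inherit the parent's
   rand() state, so without reseeding they would all drop the same packets. */
static void seed_drop_rng(void) {
    srand((unsigned)time(NULL) ^ ((unsigned)getpid() << 16));
}

/* Independent trial per received UDP packet: returns 1 (discard) with
   probability drop_pct/100. No counters, so no "every 20th packet" pattern. */
static int should_drop(int drop_pct) {
    if (drop_pct <= 0) return 0;
    double u = (double)rand() / ((double)RAND_MAX + 1.0);   /* uniform in [0, 1) */
    return u * 100.0 < (double)drop_pct;
}
```

On receipt, the node would call should_drop with its own rate from manager.init and, if it returns 1, count the packet as dropped and discard it without processing.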

You will know you are in Part III if ANY of the nodes in the manager.init file has a non-zero drop rate. If all of the drop rates are zero then you need to produce the output files for Part II. Else, produce the output files for Part III. These output files are exclusive.

Logging Network Traffic: Each node will need to keep a log of the traffic that it sees. For each hospital, keep a file called hospX-P3.log that records every event that happens at the node. The log needs to be specific, but concise. For each packet that the node sees, make an entry in the hospX-P3.log file in the following format:

MessageID:FunctionID:TTL-Remaining:Hops-Already-Taken:DataLength

At the end of the log file will be a count of how many packets of each type that the node saw in the form:

Ping packets received: XXX
Ping packets sent: XXX
Ping packets dropped: XXX
Pong packets received: XXX
Pong packets sent: XXX
Pong packets dropped: XXX
Push packets received: XXX
Push packets sent: XXX
Push packets dropped: XXX
Query packets received: XXX
Query packets sent: XXX
Query packets dropped: XXX
Hit packets received: XXX
Hit packets sent: XXX
Hit packets dropped: XXX
Total packets dropped: XXX
Total packets processed: XXX
Given drop rate of packets: XXX
Actual drop rate of packets: XXX

Where XXX represents the total number of packets for that count. (Given drop rate is that given to you by the manager, whereas the Actual drop rate is the percentage of packets that were actually dropped on the network)
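One way to compute the actual drop rate, assuming every arriving UDP packet is either dropped or accepted (and accepted packets are what the "received" counts record), is the share of arrivals that were dropped. actual_drop_rate is a hypothetical name.

```c
/* Actual drop rate in percent: dropped / (dropped + accepted) * 100.
   Returns 0 when no packets arrived at all. */
static double actual_drop_rate(unsigned long dropped, unsigned long accepted) {
    unsigned long arrived = dropped + accepted;
    return arrived ? 100.0 * (double)dropped / (double)arrived : 0.0;
}
```

With few packets, this value can differ noticeably from the given rate, since each drop is an independent random trial.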

Output for this stage: You will need to create a hospX-P3.out for each hospital after the data has been located that includes all of the patient information. Also, you will need to have a log file for each node named hospX-P3.log.


Information for your README file:

  1. Be sure to include your name, ID number, and email address.
  2. Discuss your implementation. What assumptions (if any) did you have to make? How does your implementation differ from the requirements?
  3. List each function in your code and give a two-sentence description of what it does.
  4. If you implement Part 3, how did the message counts compare with and without the loss rates? Was that what you expected?
  5. Are there any idiosyncrasies of your project? You should list under what conditions your project fails, if any. What input limitations does it have?
  6. How did you determine that the computation had terminated? What was good or bad about this mechanism?
  7. Discuss your programming experience on this project. Which parts of the project did you find most difficult? Why? Did you find any surprises?
  8. Any additional information from your implementation that you need to include.


Important Notes for this Project:

  1. Do not attempt to copy or use portions of anyone else's code. We have a very sophisticated software package that detects plagiarism from anyone in the class or any other Gnutella software packages that are available. Cheating will result in an automatic dismissal from the class with a failing grade. Don't risk it!
  2. You may not use any "borrowed" code from libraries. You can use STL, glib and libc. Any other code usage must first be cleared with the TA or prof.
  3. You must include a Makefile with your implementation. The grader will type 'make' and will look at the code and output generated. If 'make' doesn't work, you will receive a failing grade.
  4. We will only grade code that compiles cleanly. We will not do any debugging of your code to get it to work once you submit it.
  5. Errors such as obviously inefficient implementations or memory leaks will be penalized even if the program runs to completion.
  6. No extensions will be given to the deadline, so start programming NOW!