THE DANGEROUS DOSSIER: RISKS OF INDISCRIMINATE DATA ACCUMULATION

by JAMES LEE PETERSON

copyright 1991, all rights reserved

It is not news that for many years information has been collected about each citizen by various government and business entities. Keeping records on the citizenry, of course, dates back thousands of years. Records pertaining to births, deaths and taxes are routinely uncovered in ancient archaeological sites. But something remarkable about the nature of record keeping has changed quite recently, something deserving sober attention: the rise of computer-based data gathering and correlation.

Until recently, most records were kept in paper files in large cabinets consuming great amounts of space, and proportionally great amounts of effort were required to maintain, store, reproduce and disseminate them. A number of practical limitations related to storage space and the time required to search for information then existed. Because of this, retaining any particular piece of information about anyone demanded a very good reason, and an especially useful purpose. It also took an almost Herculean effort to track information concerning any individual across different record keeping systems. It might take a team of investigators weeks to collect and correlate law enforcement, employment, telephone, and medical information concerning a given person. Since it was such an involved and labor-intensive process to bring all this information together, doing so, again, required very special motivation. In those days it was not difficult for an individual to keep his own affairs private from large organizations because Big Brother was not yet watching. He did not yet have the technical means to do so. But those days are gone.

Over the last 25 years the situation has changed dramatically. Today's computer technology has created the potential for seriously damaging, limiting, and redefining individual freedom. Practical difficulties previously associated with data collection have all but disappeared. Records that were once unwieldy, expensive, labor-intensive, slow, and incompatible have become compact, cheap, easy, fast, and coordinated. The very practical and technical problems that once protected us from unwarranted intrusions into our private lives have all but vanished.

Information of all kinds is now being assembled about every citizen, and, as in the past, some is collected for specific purposes, but now a large part of it is collected simply because it is technically feasible to do it. If information is easy and cheap to collect, why not collect as much as possible? After all, who knows what "good" use it might later be put to? And since the growth of this technology is unplanned and largely unregulated, no one is in any position to ask whether or not such indiscriminate data collection is desirable or harmful to individuals or to society as a whole.

Look at the kind of information about you sitting right now in computers somewhere: FBI and police files, credit reports, medical records, bank files including all your credit card transactions, video rental records, marriage records, property records, telephone records, internal revenue records, insurance company records, personnel records, motor vehicle records, mail-order and publication subscription records, and in some cases even individual grocery and department store purchase records. Somewhere, someone has collected information about nearly every facet of each of our lives. Some of this is the kind of information that used to be kept in paper folders in large metal filing cabinets, but much of it would have been previously considered trivial or for other reasons not worth keeping. And there are other crucial differences from the days of physical record-keeping: today's records are especially easy to access; and they are also easily combined with other similar kinds of computer records. Since much of this information is indexed by a supposedly unique number, assigned to every citizen (the social security number), it is technically possible to put all this information into a single gargantuan database inside which the life of every citizen would be an open book. Because it is possible, steps will be, and already are, taken to make it so. This unified database will not burst onto the scene, however, but will come into being slowly, insidiously, and in step-wise fashion. All it takes are reasons for combining existing databases. Such reasons already exist, and their motivation is almost always economic. It is already the case that purveyors of credit reports endeavor to include as much information as possible about their "subjects"; and insurance companies have completely clear-cut economic incentives to monitor the "risky" behavior of their clients. Thus cross-referencing different data collections will be done in stages, with data accumulating in just such "clumps" as a perceived need arises. It will be done because it is possible to do it, but it will be done also because it is easy and cheap to do.

So what are the dangers of this inevitable large-scale data collection? As I see it, there are many dangers, but principally there are three: mis-collection, misinterpretation, and faulty distribution.

First, mis-collection. Since data comes from many different and not necessarily related sources, and for many reasons, there is no inherent discrimination among or order imposed on what is collected, and thus there is no quality control exercised over it. Data is gathered from sites and sources far and wide, gathered with attention to and concern for accuracy ranging from high precision to negligent carelessness, and from sources ranging from reliable to wholly untrustworthy. In the end, the trustworthiness of any particular piece of information finds a level commensurate with the information it gets thrown in with. Mixing good information with bad creates a database which is in truth only as reliable as its least reliable data. Ironically, there is at the same time a tendency on the part of the people who collect it to present it as being as reliable as the most reliable piece of information in the collection. And because the information's points of origin are so wide-ranging and varied, and because so many steps and stages are involved in its compilation, no one in the collection process need take any special responsibility for the accuracy of any particular bit of data. When a lot of hands are involved in the collection process it is quite an easy matter for them to point fingers of responsibility elsewhere.

Note too that people tend to view any information, irrespective of its source, as somehow more credible by virtue of the simple fact that it has been recorded in the first place. The assumption is made (a carry-over from the days when data collecting was hard) that if something was determined to be valuable enough to record, it must therefore be correct. Thus we tend to place greater trust in things written than in things said, and we liken data stored inside computers to things written in books.

But this is, of course, exactly the wrong analogy. Putting data into computers is not an expensive and difficult procedure like book publishing, where care and thought go into what is written in the first place, and printed in the second. Once data is entered, good or bad, it is trivial to keep it indefinitely, add to it, move it around, and make it widely available. In most cases it is easier to retain faulty data than to edit or process it. Many items show up in computer databases with even greater ease than the thoughtless off-hand spoken comment. But such is the nature of the information collection and laundering business. In the field of computer information gathering, a silk purse can easily be made of any sow's ear. In this arena accuracy takes a back seat to the act of collection itself.

Once collected, any data can be made more dubious by misinterpretation. This is the second danger. In a recent book, Ian Will summed up the situation by pointing out that in the information collection business

...each snippet of information is an unspecified accusation intended at some future time to support a specified allegation against the person it relates to. (p.131)

To be sure, law enforcement agencies have great enthusiasm for such undigested data. This is a major reason why such accumulations of indiscriminate "information" are so dangerous. They have the force of reversing the very notion of the presumption of innocence. And they do this by making people "look bad" on the basis of information that is taken out of context or is incomplete. Such incriminating "evidence" is just what every legal investigation thrives on.

Data records are dangerous often not because of the information contained in them, but because of what is not contained in them. That is, they are liable to misinterpretation. Perfectly innocent information can paint a totally misleading picture merely by juxtaposition. This is how a series of minor truths can be converted into great lies. The jurist Learned Hand warned of the dangers of a time when "denunciation without specification or backing takes the place of evidence" (cf. Kafka's The Trial). That time is nearly upon us, if it has not already arrived. It is only natural for people to occasionally fill in the blanks when confronted with incomplete information, and even to place "some sinister significance on the most innocuous human behavior" (Will, 1990, p.137).

Unprocessed official material is a dagger pointed at the privacy and freedom of every individual. If I were photographed in Madison, Wisconsin in 1970 standing in the back of a crowd of anti-war protesters at a rally that a suspect in the 1970 bombing of Sterling Hall also attended, what would be false in a statement included in my FBI file that I was "seen with Armstrong," the suspected campus bomber, on such and such a date in the Library Courtyard? Nothing would be technically false about such a data entry, but it would be misleading, since not only did I not know Armstrong was at the same rally, I did not know him at all, much less was I at that location for the purpose of meeting him. Yet an accumulation of such misleading entries can leave an impression vastly different from the truth. And the reason such inaccurate impressions are so easily conveyed is that we always assume care was taken in the compilation of data in the first place.

Now that we've sloppily managed to collect false information, and true but incomplete and misleading information, in our database, let us make this data even more dangerous by spreading it about haphazardly. This leads us to the third danger: faulty or indiscriminate dissemination.

Today, databases have almost no controls over who may access them. Information about your credit history, your bank records, your DMV records and police records is accessible by nearly anyone with the right "contacts." Leaks of data to the unauthorized are routine. Let's consider the credit reporting agencies, who, by law (Fair Credit Reporting Act of 1970), are to restrict access to credit reports to only those with a "legitimate business purpose." Since the law fails to define "legitimate" in this context, it could mean anything. Consequently, anyone could claim such a purpose, and many often do.

All this makes possible a transformation of American society, such that half-truths lie in wait to be used against anyone the state wishes to pressure or silence. But most citizens, aware of the dangers of government data collection and its misuse through the examples of actual totalitarian, communist, and fascist states, would be hesitant to give the government the authority to create, much less use, a central database on the citizenry. Aside from a number of relatively unknown law enforcement and national security databases already in existence, the government is legally prevented from most forms of overt mass collection of personal data. And what forms do exist, such as census data and tax records, have legal safeguards on their distribution (though even these safeguards are currently under attack by well-meaning legislators).

Not that such "safeguards" ought to lull us into complacency. Why should we give private organizations, through inaction, a power we would never dream of giving to our government? Laws such as the Privacy Act of 1974 curtail government intrusions on personal privacy, but place no restrictions upon what information private enterprise may require from you in order to do business with it, or upon what information may be collected about you, or to whom such information may be sold or distributed. In many respects the potential harm that might be done with impunity by private enterprise under present laws can be far worse than anything the government can legally do to you.

The forces of capitalism make it possible that the interests of some large and powerful business entity, such as an insurance company, might be served by keeping track of people's activities, in order to avoid taking financial risks. Of course, in the process, any particular individual's privacy might be harmed by such activities, and by inaccurate information kept about them, but from the perspective of the goals and interests of the business, violations of individual rights from time to time are acceptable risks of doing business as long as corporate profits can be maintained. Once a large private database is established by one industry, there surely would be profit in selling access to its data to other parties, such as employers, who would love to be able to weed out prospective employees who are risks either of health or character. Landlords and mortgage companies might also benefit by knowing what sort of people they are doing business with.

The danger is in part, of course, that false, inaccurate or incomplete information remains false, inaccurate or incomplete. It does not improve with age. On the contrary, it becomes rather more dangerous as its questionable source becomes harder and harder to determine. After a while, vintage misinformation may take on the appearance of divine truth -- its origins obscured by the mists of time.

But collection of any data, even completely accurate data, is capable of causing harm to individuals. The fact that people lose control over data about them once it is collected means that control over a portion of their personal autonomy has been lost. Once collected, there is no telling what use a piece of data might be put to. And that means that personal knowledge surreptitiously transferred to unknown parties may be interpreted and used in ways which restrict the choices of the individual to whom it applies or against whom it is used. Suppose, for instance, that a database is established that combines information about credit card purchases with marketing information collected at grocery stores about items purchased, purportedly to "better serve the customer" by offering coupons tailored to one's purchase patterns. Suppose further that one charges large quantities of liquor on behalf of an injured friend, or for a company Christmas party, and soon after buys large quantities of cheese and prime rib at the grocer's as gifts. If a health insurance provider had access to this information, what would stop them from refusing you insurance, or raising your rates based upon such evidence of an unhealthy lifestyle? The very fact of their access to such information about a person diminishes that person's freedom and autonomy.

This is precisely the reason why the argument that one needs no privacy if one has "nothing to hide" is so pernicious. Once information is collected, one loses control over its uses and over any conclusions that might be drawn on its basis. It is therefore perfectly true that even saints have something to hide.

Many say that it is too late, that our privacy has already been lost, but that is simply not correct. The potential harm to privacy and individualism posed by this technology is not yet even partially realized. So what can we do to protect ourselves from invasions of privacy and loss of freedom due to data collection? We can be careful about what information we volunteer about ourselves. We can ask to be removed from mailing lists. We can pay cash whenever possible. The less information we give out about ourselves, the less chance that information can be used against us. Never give your social security number to anyone who has no demonstrable legal right to it (basically, the Social Security Administration and the IRS, or people who must report to the IRS, are the only ones who actually need this number). Question strongly the need of anyone to know your address, phone number, bank account numbers, credit card numbers, or anything personal about you. Write your government representatives and urge them to pass laws restricting the collection and distribution of information not only by government but by private institutions as well. Information about you is part of you -- it is your property, and you should ethically and legally be allowed some measure of control over it.

Computer technology is capable of being an instrument of control and authoritarianism, compelling people to obey and conform, or an instrument in service to individual power and creativity. As the Firesign Theatre troupe once opined, it is "a power so great it can be used for good or evil." In the end, it will most likely be used to serve both.