Database Discredited
Author(s):
Waldo Roth
Introduction
As an industrial engineering consultant hired by a credit
bureau ("RWT"), you have been asked to analyze problems that
have occurred with their 20 million record credit file. RWT
management became concerned when the following situation came
to their attention. 1, 2
A couple moving to a retirement community has an eye on
their "dream home." Because they have a good credit history,
they assume that they will have no trouble obtaining a mortgage
to purchase this home through a local bank in their new
community. A routine credit check through RWT reveals that in
fact they are a bad credit risk. When a representative from the
local bank pursues the case, she discovers that the couple has
been mis-identified in the RWT database, which has confused
them with another party having a very bad credit history. In
making amends, the local bank approves the loan, but by now the
home has already been sold to someone else. The couple is
heartbroken and worse yet, they continue to experience credit
problems for some time.
Back to Top
Numerical and/or Design Problem(s)
- RWT has called you in as a consultant to make
recommendations. Where do you begin?
- What design flaws in the database have allowed this
problem to occur?
- Management at RWT claims they have only 1 error per
100,000 records in their databank. How would you develop an
experimental design test this claim?
- After determining whether RWT is correct in their claim
that they have only one error per 100,000 records, determine
how many bad records they have in their major file at the
present time. What implications does this have, if any?
- At what cost per record do you decide they need to rework
the database? What other data or assumptions do you need to
make before recommending a solution?
Assume you need
- a cost per record to update,
- the number of errors per year (which can be estimated), and
- the cost per error found by users.
- You also need to know the rate of updates or new
records per month or year. Assume there are 300,000
accesses or updates per month to the database.
- Also assume it costs $10/record to clean up the
database plus $50,000 in fixed costs. Finally, assume the
cost for insurance, lawyers. etc. for each bad record found
is $100,000.
- Compare the two costs, draw conclusions, and recommend a
course of action in a one page memo to RWT management. Or
write a dialogue for a discussion of the matter with
RWT.
- Estimate the time required to clean up the database. Can
you design a solution that would not take the database
off-line for that amount of time?
Back to Top
Questions on Ethics and Professionalism
- What are the relevant facts?
- List the "stockholders" -- those with something to lose
(or win) in this case.
- Who benefits here? Who is harmed here? (There may be
multiple answers.)
- What ethical principles from your textbook apply to this
case?
- How could you prevent this situation from happening
again?
Back to Top
Solutions to the Numerical Problems
- You would first want to find out how often such errors
occur and what the source of the typical error is in the
system (e.g. data entry, updates, software, indices,
etc.)
- This assumes there is a design flaw. There may not be
such a flaw at all.
- First, verify how many records they actually have. Then
you would take approximately 50 random records from the
20,000,000 records they claim to have. Then, every 20 mea/X =
50 records should be chosen. Hence, selecting "every
400,000th" record (X = 400K) in some random fashion would
yield 50 records. Later, if a pattern emerges as to what type
of records are in error, a subset of those could be randomly
tested.
- This would mean they have 200 bad records. That is really
rather impressive if it turns out to be correct. Also, the
type of error would be significant. A small address error may
be trivial as compared to a pointer to the wrong person's
record. If all 200 errors are bad pointers, RWT needs a major
software overhaul.
- If they have 300,000 transactions per month and that is
where the errors occur, then there will be 3 x 12 or 36
errors per year entered into the system. However, the far
tougher problem will be in finding these errors!
- Using the assumptions made earlier, one may conclude that
it will take about 60 years to check and verify the entire
static file and to justify the cost of clean-up.
If one assumes $1 million in cost per loss, it will still
take 6 years to break-even. One can ignore the $50,000 fixed
cost as irrelevant in the calculation.
- A software solution might be implemented by taking the
system down over a weekend. Otherwise, the snail's pace of
the cleanup and the use of any other pseudo-manual system may
immobilize employees.
Back to Top
Possible answers to ethical questions:
- Responsibility for this data rests with the credit bureau (RWT).
- Everyone listed in the problem and the public as well.
- Probably not. These mistakes will occur in any file system.
- No one benefits.
- GIGO means errors must be dealt with and resolved in an ethical
manner. Good software engineering design from the outset SHOULD
solve the pointer problems for errors in the data. Ongoing
spot checking is probably the only way to GIGO in the file, plus
consistent cleanup of errors as they are found.
- One probably can't!
Appendices, Attachments, and Bibliography
Back to Top
These problems were originally developed as part of an
NSF-funded project to create numerical problems that raise
ethical issues for use in engineering and other course
assignments. The problems presented here have been edited
slightly for clarity.
Cite this page:
Waldo Roth
"Database Discredited"
Online Ethics Center for Engineering
2/16/2006 8:43:35 AM
National Academy of Engineering
Accessed: Tuesday, January 06, 2009
<www.onlineethics.org/CMS/profpractice/ppcases/numericalprob/Database.aspx>