Heiko Holtkamp, Peter B. Ladkin
RVS-J-98-05
"Houston, we have a problem."
We describe the Year 2000 problem (the millennium bug, Y2K),
what can be done about it,
and what systems it is likely to affect. These pages are intended
for novices as well as experts, with short descriptions of the
computer-technical background linked where necessary.
These pages accompany the Bielefeld seminar
"Das Y2K-Problem" ("The Y2K Problem"). We are reviewing and developing strategies for
handling the Year 2000 problem.
We shall continue to build these pages.
The Year 2000 problem arises from representing years in dates with only
two digits: 27.04.98 instead of 27.04.1998. This leads to date ambiguity:
is 01.01.00 the 1st January 1900, or the 1st January 2000? People using
such date representations rarely have a problem with this ambiguity,
since they usually
have enough background information to figure out which is meant.
However, it is a problem for computer programs, chips and other
digital hardware that have not been specifically designed to handle
this ambiguity, because computer systems work rigidly - they can't
use `background information' or identify `ambiguity', unless
this is specifically built in to their design.
A more precise description of the problem is that it is one example
of a date representation discontinuity (DRD). The features of
a DRD are that
- Calculations are performed in which part of the date is implicit;
- Such calculations involve dates with two distinct `implicit' parts;
- This situation is incorrectly handled by the program or hardware,
or not handled at all.
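The three features above can be seen in a minimal sketch. The function names are hypothetical and chosen only for illustration; the point is that a calculation on two-digit years is correct while both dates share the same implicit century and goes wrong as soon as they do not.

```python
def years_between(start_yy: int, end_yy: int) -> int:
    """Elapsed years computed from two-digit years; the century is implicit."""
    return end_yy - start_yy


def years_between_full(start_year: int, end_year: int) -> int:
    """The same calculation with the century made explicit."""
    return end_year - start_year


# Both dates share the implicit part "19": the result is correct.
print(years_between(95, 98))            # 3

# The dates have two distinct implicit parts ("19" and "20"),
# and the calculation does not handle this.
print(years_between(98, 0))             # -98

# With four-digit years there is no implicit part to get wrong.
print(years_between_full(1998, 2000))   # 2
```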
A DRD is an example of a
data discontinuity problem,
itself in turn an example of a
data ambiguity problem.
Such problems are a consequence of how data are represented by
computers. Those who don't know may consult our gentle introduction to
Data Representation in Computers.
The Year 2000 problem is not the only DRD problem we need to worry about now.
Many computer systems (software and hardware) will have problems earlier.
Ambiguous End-of-File Markers
For example, much financial and administrative software, especially database software,
uses a date of 9.9.99 (or similar constructs such as 1.1.11, 99.99.99, 00.00.00)
as an end-of-file marker. This is a particularly clear example
of data ambiguity being deliberately designed in,
when it could have been avoided.
Such systems could have problems handling the date 9th
September 1999.
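A minimal sketch of how such a marker can collide with real data follows. The record layout, the marker value and both function names are assumptions made for illustration, not taken from any particular product.

```python
SENTINEL = "9.9.99"  # hypothetical end-of-file marker, as described above


def read_until_sentinel(lines):
    """Stop reading at the first line equal to the in-band marker."""
    records = []
    for line in lines:
        if line == SENTINEL:
            break
        records.append(line)
    return records


def read_with_explicit_end(lines):
    """Variant in which the end of data is signalled out of band (None here)."""
    records = []
    for line in lines:
        if line is None:
            break
        records.append(line)
    return records


dates = ["7.9.99", "8.9.99", "9.9.99", "10.9.99"]

# The genuine record for 9th September 1999 is mistaken for the end of file.
print(read_until_sentinel(dates + [SENTINEL]))   # ['7.9.99', '8.9.99']

# With an out-of-band marker, all four records survive.
print(read_with_explicit_end(dates + [None]))
```

The second variant avoids the ambiguity because the end marker is not a value that a date field could legitimately hold.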
32-bit Unix and C
Many Unix systems (the 32-bit ones in particular) have a DRD on
19th January 2038 when the internal time representation reaches its
limit and `rolls over'.
UNIX calculates dates using the number of seconds since
1.1.1970 0:00:00 Coordinated Universal Time (Greenwich Mean Time).
This is stored as a 32-bit signed integer, which will reach its
maximum possible value (a 0 sign bit followed by 31 1's) on 19.01.2038 and
then `roll over'. On typical two's-complement hardware it rolls over to its
most negative value (a 1 sign bit followed by 31 0's), which represents
a date in December 1901. This issue also concerns many C and C++ programs, which
use the same date representation.
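The date of this roll-over can be checked with a short calculation. The sketch below assumes a two's-complement counter, which is typical but not guaranteed by the C language; the helper `to_int32` is a hypothetical name that simulates the wrap-around in Python, whose own integers do not overflow.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Wrap an integer as a 32-bit two's-complement counter would (assumption)."""
    return (value + 2**31) % 2**32 - 2**31


# Last second that a signed 32-bit counter can represent.
last_second = EPOCH + timedelta(seconds=INT32_MAX)
print(last_second)   # 2038-01-19 03:14:07+00:00

# One second later the counter wraps to its most negative value.
wrapped = to_int32(INT32_MAX + 1)
after_wrap = EPOCH + timedelta(seconds=wrapped)
print(wrapped)       # -2147483648
print(after_wrap)    # 1901-12-13 20:45:52+00:00
```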
GPS, the global satellite navigation system
Navigation is increasingly dependent on a satellite data system run by the
United States armed forces, known as the Global Positioning System (GPS).
A GPS system uses data from a number of satellites it can `see' in the
sky to determine precisely where it is. It is extensively used by
trucking companies to tell where their trucks are, by automobiles for
`moving map' displays showing the car's position, and of course by
aircraft to determine position. It is also increasingly used by
commercial aircraft as a primary means of navigation when approaching
to land at an airport in `instrument conditions' (when the airport cannot
be seen, or only sporadically, because of clouds, darkness or both).
Because ensuring clearance between the aircraft and earth-based obstacles
or mountains is essential `on approach', a GPS date discontinuity could
have safety-critical consequences.
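GPS counts dates by counting weeks since midnight at the start of 6 January 1980
(the night of 5 to 6 January), and stores
this count in a 10-bit unsigned integer, which can hold the values 0 to 1023.
That entails a date discontinuity
between week 1023 and week 1024 since that date, when the stored count returns to 0.
This will happen at the end of
21 August 1999 (apparently at 23:59:47).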
GPS equipment that does not specifically handle this
discontinuity may have problems at that time.
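The timing of the GPS week roll-over can be reproduced as a sketch. It assumes that the week count starts at 00:00:00 on 6 January 1980 and that GPS time was 13 seconds ahead of UTC in August 1999; both constants are stated here as assumptions rather than read from any receiver specification.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)               # assumed start of week 0
WEEK_BITS = 10                                 # width of the week counter
GPS_MINUS_UTC = timedelta(seconds=13)          # assumed offset in August 1999


def broadcast_week(full_week: int) -> int:
    """Week number as it fits into the 10-bit counter."""
    return full_week % (1 << WEEK_BITS)


print(broadcast_week(1023))   # 1023
print(broadcast_week(1024))   # 0, indistinguishable from January 1980

# Moment at which the counter returns to 0, in GPS time and in UTC.
rollover_gps = GPS_EPOCH + timedelta(weeks=1 << WEEK_BITS)
rollover_utc = rollover_gps - GPS_MINUS_UTC
print(rollover_gps)           # 1999-08-22 00:00:00
print(rollover_utc)           # 1999-08-21 23:59:47
```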
Another problem with the year 2000 that is not a DRD problem occurs
because many programs won't recognise that the year 2000 is a leap year.
As the U.K. Health and Safety Executive says: Please pay no attention
to anyone who tells you otherwise. There is a February 29th, 2000.
Some programs have implemented mistaken rules for leap years.
A year is a leap year when
- the year is divisible by 4,
- unless the year is also divisible by 100, in which case it is not a leap year,
- except when the year is also divisible by 400, in which case it is a leap year after all
(see, for example,
Claus
Tondering: Frequently asked questions about Calendars.)
This method of calculating leap years is standardised in the
European Standard EN 28601 (which is German standard DIN 5008).
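The rule can be written as a one-line test. The sketch below sets it beside one plausible mistaken rule, which stops at the century exception; the mistaken variant is an illustration of the kind of error described, not code from any specific program.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_year_mistaken(year: int) -> bool:
    """Hypothetical faulty rule that forgets the 400-year exception."""
    return year % 4 == 0 and year % 100 != 0


for year in (1900, 1996, 1999, 2000):
    print(year, is_leap_year(year), is_leap_year_mistaken(year))

# The standard library agrees with the full rule.
print(all(is_leap_year(y) == calendar.isleap(y) for y in range(1600, 2401)))
```

The two rules agree for every year from 1901 to 2099 except 2000, which is why a faulty implementation can go unnoticed for so long.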
Programs that implement shortsighted bugfixes (for example, shifting the
`time window', say by subtracting 30 from every date), without paying attention
to the boundaries of the window, will sooner or later have the same problem
over again.
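A time window of this kind is often implemented with a pivot year. The following is a minimal sketch, assuming a pivot of 30 to match the figure mentioned above; the function name and the choice of centuries are illustrative assumptions.

```python
def expand_year(yy: int, pivot: int = 30) -> int:
    """Interpret a two-digit year: below the pivot as 20yy, otherwise as 19yy."""
    if not 0 <= yy <= 99:
        raise ValueError(f"not a two-digit year: {yy}")
    return 2000 + yy if yy < pivot else 1900 + yy


print(expand_year(98))   # 1998
print(expand_year(0))    # 2000
print(expand_year(29))   # 2029

# The discontinuity has only moved: the year after 2029 is read as 1930.
print(expand_year(30))   # 1930
```

The window postpones the problem to its upper boundary, and it also misreads any genuine date before its lower boundary, such as a birth year of 1925.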
Briefly, specific problems with the year 2000 are:
- two-place representation of years (a data ambiguity problem)
- inappropriate calculations on date fields (a DRD problem)
- `overflow' of the date fields
- incorrect calculation of leap years
- lack of use of a standard date representation
The last point in the problem list,
the lack of a standard for date representations,
is not necessarily a Year 2000 problem. However, adherence to
a standard would (have) solve(d) most of the Year 2000 problems.
There is such a standard. The International Standard ISO 8601
requires the representation "YYYY-MM-DD".
This is prescribed by European Norm EN 28601 and German Industry Norm
DIN 5008.
This standard has two DRDs: at year 0 AD and at year 10,000 AD.
The standard does not prescribe what internal representation
this shall have (signed integer, unsigned integer,
etc.), but the date format itself is explicitly unsigned - AD is assumed.
The DRD at 0 could be a problem for archaeologists or
ancient-historians, whose databases may need to represent dates
BC (that is, as a signed integer).
We will leave it to the reader to determine if the DRD at 10,000AD
is problematic.
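One practical benefit of the "YYYY-MM-DD" representation is that plain text comparison orders dates correctly within the years 0000 to 9999. The sketch below uses Python's standard `date.isoformat()` and contrasts it with a two-digit year-first format; the sample dates are arbitrary.

```python
from datetime import date

stamps = [date(2000, 1, 1), date(1998, 4, 27), date(1999, 12, 31)]

# ISO 8601 strings sort as text in the same order as the dates themselves.
iso = [d.isoformat() for d in stamps]
print(sorted(iso))        # ['1998-04-27', '1999-12-31', '2000-01-01']

# With two-digit years the same text sort puts the year 2000 first.
short = [d.strftime("%y-%m-%d") for d in stamps]
print(sorted(short))      # ['00-01-01', '98-04-27', '99-12-31']
```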
There are more fine-grained representations of time than by days.
If one measures time in seconds, then it can be very hard to
determine in advance when a DRD will occur. The reason is that
Coordinated Universal Time (UTC) adds leap seconds according
to need. Leap seconds are added by convention, and are dependent
upon physical processes (changes in the rotation of the earth, and
of the earth about the sun), that are imperfectly predictable
(indeed, imperfectly understood).
Thus it is not possible to determine today when leap
seconds will be needed in the future. For more information,
see the RISKS article
A definitive clarification of time measurement
in volume 19(14), 14 May 1997, by John Laverty and Peter Ladkin.
Many companies and computer users have realised the problem too late
to perform proper checks - or haven't realised it at all yet!
Any system which uses dates or a clock can be vulnerable.
One of the problems is that the clock may be hidden.
Here is a quick run through some issues.
High-Level-Language Software
Software written in a high-level language can be inspected
to see if it explicitly uses two-digit dates. However, much software
is written without a precise specification of all data structures it
uses. It may assume such date representations, without any overt indication
that it is designed around such a representation. In this case, one
hopes that testing the software when the system clock has been set
forward to near midnight on 31.12.1999 will produce observable effects.
However, there can be effects which are not immediately observable, but
which only appear later, say when certain transactions occur.
For example, a financial loan system might calculate at the end of the year
2000 that you owe it, not one year's interest, but a backlog of 101 years'
interest, and send you the bill early in 2001.
Real-Time Systems
Real-time systems are dependent on a system clock. After
precautions have been taken to limit the possibility of adverse
consequences, the system can be taken off-line, the system clock can
be set forward to near midnight on 31.12.1999, and the system restarted.
However, as above, there can be certain effects which only become apparent
after a considerable time. For example, there could be maintenance
software which `wakes up' every 24 hours, looks at the record of system
behavior, and calculates what and where maintenance needs to be performed.
When this software wakes up at the end of the first day of the new
millennium, it may calculate some values mistakenly, based on mistaken
date representations in sensors and other parts of the system, calculate
that something untoward has gone on, and shut the whole system down
in emergency mode. Such behavior may not be seen unless a `roll-over'
test is allowed to run for more than 24 hours.
Hidden Clocks
Even more difficult to detect are those cases in which the processor
or underlying hardware has a clock, whose presence the system
never makes use of, and has no access to, but which is nevertheless there,
and on which the underlying hardware depends for its correct operation.
The most appropriate way to determine the presence of such clocks,
and thus of such Year 2000 vulnerabilities,
is to ask the original manufacturer whether one is present.
This is not always easy, especially for those systems whose
manufacturers no longer support them (so-called `legacy systems').
In these cases, one must do the best
one can to find out for oneself whether there are such dependencies,
whether based on old records of any sort, or on tests.
Safety-Critical Systems
Safety-critical systems require particular care when being taken
off-line for tests, or when being tested. A good guide to the
analysis and testing of such systems for Year 2000 compliance
is the series of two books,
Safety and the Year 2000 and
Testing Safety-Related Control Systems for Year-2000 Compliance,
available from the electronic bookstore
www.hsebooks.co.uk
of the publisher, the UK Health and Safety Executive (HSE).
Legacy Systems: An Example
An example of legacy systems which definitely have clocks
is the FAA's En-Route Air Traffic Display
Systems, based on IBM 9020E machines (more than 30 years old) or
Raytheon 760 machines (more than 25 years old).
These
systems have been deemed Year 2000 noncompliant by the original
manufacturers. They may be replaced by the new DSR system before the
millennium end.
PC-based Systems
Perhaps the most common example of legacy systems, albeit ones whose
software is understood to a much larger extent than most, are older PCs
running versions of Windows 3.1 or earlier. In contrast to later
versions of Windows, Windows 3.1 is applications software, which
runs under the operating system MS-DOS (later versions of Windows run
as the operating system, mostly replacing the function of MS-DOS).
If you have one of these
systems, be sure to take Year 2000 precautions. There may be Year 2000
dependencies in the BIOS, but happily these are changeable by the user.
However, old PC software may no longer be supported by the company that
programmed it, if indeed that company is still in business. You may
simply be advised that that software is or could be Y2K-vulnerable, and
to buy the updated version (if there is one). However, your hardware
may not be supported by the new software version.
Relying on System Vendors
If one can rely on hardware and software vendors to give accurate
information about the Y2K-compliance of their products, then the
job of determining Y2K compliance of one's system is made much easier.
However, how does one tell whether to trust the vendors?
Much may depend on the vendor's reputation, and record for addressing
Y2K problems, and also on the risk level of one's system application.
A hospital life-support system should probably be thoroughly analysed
for Y2K compliance, no matter how reputable the hardware and software
manufacturers. But one can probably just wait and see if a video game
machine is Y2K compliant.
A Curious Response
One of the most curious replies to queries about Y2K problems is reputed
to have been given by a Thai businessman: they don't have a problem
because they don't use the Julian calendar
(International Herald Tribune,
"Distracted Asia Ignores The Millennium Bug" by Thomas Crampton,
17-18.10.1998, p1. This anecdote is attributed to Iain Anderson, a
British government advisor on millennium issues, on p16, article
`No hiding place', of
The Economist's
survey of the millennium bug,
Time Runs Out, 19 September 1998).
We leave it to the reader to determine
the fallacy in this reasoning.
No one has determined how many resources will be used up dealing with the
Millennium Bug. Estimates run to many billions of dollars worldwide.
Neither is anyone able to determine the extent of the problem. Estimates
from computer professionals have ranged from
- "it's not really a problem worth worrying about" (letter
writer to
The Economist journal
in response to their survey of the millennium bug,
Time Runs Out, 19 September 1998);
to
- "everyone I have talked to who has looked hard into it has found
a bigger problem than they thought they had" (Martyn Thomas
in conversation with P. B. Ladkin, July 1998).
Economic Consequences
One thing is clear - the millennium bug is in either case a resource hog.
If the bug has been overestimated as a technical problem,
then billions of
dollars worth of human and other resources will have been spent chasing
a will-o'-the-wisp. And if it has been underestimated, then those who have not
analysed their systems adequately in preparation for it will suffer
greatly - and this suffering may be passed on to the general public.
Either way, it is a human and economic problem of significant size,
said by The Economist to be the most expensive single technological failure,
in constant dollars, of all time.
Given the published experience of those who have diligently searched for Y2K
problems in their systems, we judge it more likely that the millennium
bug has been technically underestimated than that it has been overestimated
(cf. the quote from Thomas, above).
Although one will simply have to deal with whatever the consequences of the
millennium bug will be, avoiding data discontinuity problems in computer
systems is straightforward.
Data types, including ranges, must be explicitly specified either in the
requirements or when designing a computer system; and it should be
determined that overflows would be explicitly announced and correctly
handled.
(Partial lack of such measures accounted recently for the Ariane Flight
501 failure: see
The Ariane 5 Accident: A
Programming Problem? by Peter Ladkin, Research Report RVS-J-98-02,
and An Analysis of the Ariane 5 Flight 501 Failure - A System
Engineering Perspective by Gérard Le Lann, IEEE Symposium
and Workshop in Engineering of Computer-Based systems, 1997.)
Such measures are a routine part of
a normal rigorous specification process. Such a specification process is
adequate if it is possible to prove, formally or informally, that the
design of the system fulfils its requirements, and if the coding of
the system fulfils its design specification, and if the hardware
correctly executes the programs according to
the semantics of the language in which the code is written.
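The two measures recommended above, an explicitly specified range and an overflow that is announced rather than silent, can be sketched as follows. The range 0 to 9999 is borrowed from the four-digit year format discussed earlier; the exception and function names are hypothetical.

```python
YEAR_MIN, YEAR_MAX = 0, 9999   # explicitly specified range (assumption)


class YearRangeError(OverflowError):
    """Raised when a year leaves the specified range."""


def checked_year(value: int) -> int:
    """Return the value unchanged, or announce that it is out of range."""
    if not YEAR_MIN <= value <= YEAR_MAX:
        raise YearRangeError(f"year {value} outside {YEAR_MIN}..{YEAR_MAX}")
    return value


def add_years(year: int, delta: int) -> int:
    """Year arithmetic in which both the input and the result are range-checked."""
    return checked_year(checked_year(year) + delta)


print(add_years(1999, 1))      # 2000

try:
    add_years(9999, 1)
except YearRangeError as err:
    print("overflow announced:", err)
```

The caller still has to decide what correct handling means for its application; the sketch only ensures that the overflow cannot pass unnoticed.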
Software engineers experienced in the use of formal methods have
been advocating and using such procedures, especially for safety-critical
and other mission-critical systems, for decades. Such methods have
by and large been used in the design of security-critical systems,
and are beginning to be applied more widely for safety-critical
system development, although some domains (military avionics systems) are
more advanced in this respect than others (hospital health care apparatus).
The emergence of the millennium bug may serve to give momentum to
this effort. It provides a significant argument for rigorous specification,
requirements and design analysis either on economic grounds (more formal
development would have saved most of the resources used later for
post hoc analysis of Y2K vulnerabilities),
or on grounds of the dire consequences of the millennium bug. It seems
prudent at this point to hope for the former but plan for the latter.
Martyn Thomas
has written some short but splendid articles
attempting to categorise the extent of Year 2000 problems. We
include them below by permission and with gratitude.
Peter de Jager has collected and linked a vast amount
of information concerning Y2K, and wrote as early as September 1993
that time was running out to `solve' the Year 2000 problem
(for example, in a
quote from his article
Doomsday 2000).