NOTE: This is an archived copy of the Dragonfire
Internet Services website. Dragonfire is no longer in operation; please
do not attempt to contact any of the addresses listed on this site.
Dragonfire System Event Log
Note: All times on this page are given in Coordinated Universal
Time (UTC), also known as Greenwich Mean Time (GMT).
Current system status
No known problems.
Please note that ftp://ftp.dragonfire.net/users/... URLs will
no longer function; we found it necessary to disable this method of FTP
file access to improve server performance. You should instead use your
hostname in FTP URLs; for example: ftp://accountname.dragonfire.net/.
Often, problems accessing Dragonfire are the result of Internet backbone
troubles. The following network service providers have pages listing the
status of their networks. We would appreciate links to any additional
network status pages.
Historical system events (most recent first)
Note that while we try to cover all system and service outages, some
problems do not leave any traces of their existence if they resolve
themselves before we detect or are notified of them. Also note that, as a
rule, outages which pertain to a single service only and which are less
than ten minutes in duration are not listed here.
- 30 November 1998
- Server Bahamut rebooted to fix a kernel bug preventing
access to some sites. Server downtime: 1.5 minutes.
- 30 November 1998
- Counters are restored.
- 27 November 1998, 13:40 UTC
- Scheduled maintenance; in order to complete the maintenance,
services are disabled for some time. Total downtime: 4 hours
(approximately).
- 23 November 1998, 02:24 UTC
- Server palantir rebooted. Server downtime:
2 minutes.
- 22 November 1998, 04:05 UTC
- Mail service is temporarily disabled. Due to a system
misconfiguration, a certain message was being repeatedly sent from
the system to itself, replicating in the process and taking up
large amounts of disk space and CPU time. The problem turned out
to be easily solvable, but in order to restore the system to normal
it was necessary to empty the mail queue. This may have resulted
in a small number of normal messages being lost as well; we
apologize for this. If you have not received a message you were
expecting, you may want to ask the sender to resend it. Service
downtime: 15 minutes.
- 18 November 1998, 23:34 UTC
- Server palantir rebooted. Server downtime:
1 minute.
- 13 November 1998, 15:11 UTC
- Server palantir rebooted. Server downtime:
10 minutes.
- 11 November 1998, 06:54 UTC
- Both servers are taken out by a power failure, and for unknown
reasons, do not return when the power is restored. Total downtime:
2 hours, 14 minutes.
- 8 November 1998, 20:47 UTC
- Both servers rebooted. Total downtime: 5 minutes.
- 5 November 1998, 18:35 UTC
- Server palantir noticed to be down and rebooted. Server
downtime: unknown.
- 1 November 1998, 17:04 UTC
- Server palantir rebooted. Server downtime:
15 minutes.
- 30 October 1998, 21:25 UTC
- DNS and HTTP servers restarted. Service downtime:
10 minutes.
- 28 October 1998, 00:26 UTC
- Server palantir stops responding. Rebooting the server
does not solve the problem, but after some investigation, it is
determined that server Bahamut was "eating" packets
intended for palantir. A reboot of Bahamut
solves the problem. Server Bahamut downtime:
11 minutes; server palantir downtime: 1 hour,
33 minutes.
- 24 October 1998, 14:44 UTC
- Server palantir rebooted to fix weirdness with the FTP
daemon. Server downtime: 1 minute.
- 23 October 1998, 13:26 UTC
- A routing problem, apparently with sprintlink.net, prevents
some access to Dragonfire for about 15 minutes.
- 21 October 1998, 15:00 UTC
- Some virtual servers were down for approximately 3 hours as
the result of a bug in the monitoring program, which has since been
fixed.
- 17 October 1998
- Mail delivery service suspended temporarily for minor software
configuration changes. No mail was lost as a result of this
maintenance.
- 16-17 October 1998
- Due to server modifications for moving all users to basic domain
service, HTTP service for various sites was down intermittently.
- 7 October 1998, 20:00 UTC
- Both servers reboot. Total downtime: 30 minutes. (Some
virtual servers remain down for an additional 20-25 minutes.)
- 3 October 1998, 22:47 UTC
- Server palantir rebooted. Server downtime:
2 minutes.
- 26 September 1998, 01:06 UTC
- Server palantir rebooted. Server downtime:
1 minute.
- 24 September 1998, 01:24 UTC
- HTTP daemon upgraded to a new version. HTTP service downtime:
5 minutes.
- 23 September 1998, 22:05 UTC
- Server Bahamut reboots. Server downtime:
11 minutes.
- 22 September 1998, 10:13 UTC
- Filesystem corruption is discovered on server palantir,
and all services on that machine are taken offline to repair the
damage. No user files are believed to have been affected, although
this damage did affect the ability of certain users to create
directories or use CGI scripts. Server downtime: 10 minutes.
- 18 September 1998, 08:21 UTC
- Server palantir rebooted. Server downtime: 1 minute.
- 17 September 1998, 02:18 UTC
- Perl binary replaced on server palantir after reports of
failing CGIs led to the discovery that the binary was apparently
corrupted.
- 15 September 1998, 03:11 UTC
- Server palantir rebooted. Server downtime: 1 minute.
- 10 September 1998, 04:21 UTC
- Some virtual servers were down for approximately 25 minutes.
The cause of this is unknown.
- 30 August 1998, 05:05 UTC
- Server palantir is rebooted when FTP service begins
acting up. Server downtime: 1 minute.
- 25 August 1998, 13:43 UTC
- Server Bahamut (which had been experiencing no
difficulties) was mistakenly rebooted by a CAIS technician. Server
downtime: 11 minutes.
- 25 August 1998, 07:23 UTC
- Server palantir goes down and needs to be rebooted.
Server downtime: 6 hours, 27 minutes.
- 21 August 1998, 22:54 UTC
- Server palantir rebooted when FTP slows to a crawl.
Server downtime: 3 minutes.
- 18-19 August 1998
- Logs indicate that some virtual domains on server Bahamut
were inaccessible for a total of 8 hours, from 11:00 UTC to
15:00 UTC each day. The cause of this is unknown.
- 18 August 1998, 08:53 UTC
- Server palantir rebooted when it is discovered that the
HTTP servers on that machine are refusing to start, possibly due to
memory corruption. Server downtime: 2 minutes, with some
prior (unknown) downtime for HTTP servers on that machine.
- 13 August 1998, 17:18 UTC
- Server Bahamut rebooted as the simplest solution after a
runaway server process causes many other system processes to die.
Server downtime: 3 minutes.
- 10 August 1998, 17:14 UTC
- Server Bahamut is rebooted due to system overload.
Server downtime: 13 minutes.
- 9 August 1998, 10:00-17:00 UTC
- Logs indicate that some virtual domains on server Bahamut
were inaccessible for 7 hours. The cause of this is
unknown.
- 5 August 1998, 22:00 UTC
- Server palantir is restarted after going down for an unknown
reason.
Estimated server downtime: 16 hours.
- 5 August 1998, 17:10 UTC
- Server Bahamut rebooted due to an unknown system error. The
system may have been inaccessible for a short time before, extending
the actual downtime.
Server downtime: 14 minutes.
- 4 August 1998, 17:37 UTC
- Both servers are rebooted. System activity indicates that HTTP
servers may have overloaded the system.
Server downtime: 6 minutes.
- 25 July 1998, 17:49 UTC
- Server palantir rebooted due to NFS-related problems.
Server downtime: 3 minutes.
- 15 July 1998, 17:48 UTC
- Server palantir crashes, and is rebooted. System
activity suggests that the problem may be NFS-related, and we are
therefore working on reducing NFS traffic between the servers.
Server downtime: 11 minutes.
- 11 July 1998, 08:20 UTC
- Server palantir crashes again, and is rebooted again.
Server downtime: 4 hours, 45 minutes.
- 9 July 1998, 13:44 UTC
- Server palantir crashes, is rebooted, and minor filesystem
damage is fixed. Server downtime: 1 hour, 26 minutes.
- 6 July 1998, 16:47 UTC
- Server palantir is rebooted as part of a test of new
system monitoring software. Server downtime: 1 minute.
- 4 - 5 July 1998
- Server palantir fails, unfortunately during a holiday
break when the servers are not being watched as closely. The server
is discovered disabled (not actually down, but nearly so) at 22:48
UTC on 5 July 1998 and is rebooted and repaired. Server
downtime: 33 hours (estimated from reports).
- 30 June - 1 July 1998
- A misconfigured mail delivery program is not taken into account
during system cleanup, causing mail delivery to some Dragonfire
administrative addresses to fail. User accounts were not
affected by this error.
- 26 June 1998, 12:23 UTC
- An HTTP daemon serving a number of virtual servers is seen to be
down and is restarted. Prior downtime is unknown, since the
system monitoring tool had not reported any problems. The tool is
reconfigured to look harder for and yell more loudly about
problems.
- 24 June 1998, 02:50 UTC
- Server palantir fails. Inexplicably, a hard reset does
not start it up again, but it recovers after being power-cycled.
Server downtime: 29 minutes.
- 23 June 1998, 18:51 UTC
- Server Bahamut begins experiencing the same problems
experienced earlier in the day by server palantir, and is
immediately reset to stave off any additional service problems.
Server downtime: 13 minutes.
- 23 June 1998, 09:10 UTC
- An unknown HTTP server problem causes the main server,
www.dragonfire.net, as well as several virtual servers, to
stop responding. Eventually this is traced to what appears to be a
rare system bug occurring on server palantir, causing the
init process (the "overseer" of the entire system) to
freeze and leave many server processes unable to function properly.
The system is given a hard reset, followed by correction of the
filesystem corruption thereby caused. No user files are believed
to have been damaged by this incident. Service downtime: 6
hours (approximately).
Note that we are looking into establishing virtual servers for
all users to limit the damage that can be caused if the
www.dragonfire.net server processes fail again.
- 18 June 1998, 19:31 UTC
- Server Bahamut rebooted due to odd error conditions
appearing. Server downtime: 1.5 minutes.
- 14 June 1998
- A bizarre NFS problem, cause still unknown, takes out a few virtual
servers for most of the day before it is corrected.
- 8 June 1998, 15:21 UTC
- Servers taken down for replacement of the UPS which had apparently
not been doing its job. Total downtime: 47 minutes.
- 8 June 1998, 11:23 UTC
- Servers reboot. (Another power flicker?) Total downtime:
10 minutes.
- 5 June 1998, 19:44 UTC
- Servers hit by a power flicker. Total downtime: 10 minutes.
- 2 June 1998, 14:30 UTC
- Servers hit by another backup power test at CAIS. Total downtime:
10 minutes.
- 1 June 1998, 18:33 UTC
- Server palantir reboots. Server downtime:
2 minutes.
- 31 May 1998, 22:37 UTC
- Server palantir rebooted because it was acting strangely,
though FTP and web service were still functioning. Server downtime:
2 minutes.
- 30 May 1998, 06:46 UTC
- Server Bahamut goes down. Attempts to restart it appear
to be successful, but only for a short time; eventually it is
determined that the power supply is faulty and needs to be
replaced. We theorize that this failure is due to voltage spikes
resulting from the multiple occurrences of power loss in CAIS's
weekly power tests. However, whatever caused this failure, we were
unfortunately not prepared to deal with it, and hence it takes a
good deal of time to restore the server to normal operation.
Server downtime: 16 hours, 17 minutes.
- 29 May 1998, 08:41 UTC
- Server Bahamut goes down and needs to be manually
rebooted. Server downtime: 2 hours, 20 minutes.
- 27 May 1998, 00:02 UTC
- 11 virtual servers are downed for approximately one hour to move
them from one server to the other.
- 26 May 1998, 14:30 UTC
- Both servers are downed again by a CAIS power test. A call to
CAIS's NOC reveals that an electrician is scheduled to move VMA's
power to CAIS's backup power system before next week's test. Total
downtime: 10 minutes.
- 25 May 1998, 03:51 UTC
- Both servers reboot. Total downtime: 10 minutes.
- 19 May 1998, 14:30 UTC
- Servers are downed due to another backup generator test at CAIS.
Total downtime: 10 minutes. A configuration error keeps
one web account inaccessible for the next eight hours; this error
has since been corrected.
- 12 May 1998, 14:27 UTC
- A backup generator test at CAIS brings down both servers. A CAIS
technician says that CAIS will be requiring all colocation
customers to use CAIS's backup generator system, which should bring
an end to these episodes of downtime. Total downtime:
15 minutes.
- 5 May 1998, 13:44 UTC
- Another power failure strikes, bringing both servers down.
Discussion with a technician at CAIS, the provider of the physical
network link and colocation space, reveals a startling fact: Verio
Mid-Atlantic (the company formerly known as ClarkNet, which
provides the logical network connection) is not purchasing backup
UPS power from them. According to VMA, they have their own backup
power supply at CAIS, but it failed. Total downtime: 2 hours,
41 minutes.
- 27 April 1998, 08:21 UTC
- Server palantir also needs to be rebooted because of a
disk error. Server downtime: 13 minutes.
- 27 April 1998, 07:32 UTC
- Server Bahamut needs to be rebooted because of a disk
error (possibly a result of the power outage). Server downtime:
22 minutes.
- 27 April 1998, 05:27 UTC
- Both servers are brought down by a power failure. Total downtime:
1 hour, 13 minutes.
- 23 April 1998, 08:35 UTC
- Server Bahamut crashes again. The timing and
frequency of these crashes, especially in light of the prior
stability of the server (it had been up for over 40 days before the
April 7 self-reboot), suggest to us that the server is under some
sort of attack. ICMP packets (i.e. support for pings and
traceroutes) are filtered out to see if that helps. Server
downtime: 4 hours, 55 minutes.
- 23 April 1998, 01:21 UTC
- Server Bahamut rebooted after kernel warnings pop up in the
system log. Server downtime: 2.5 minutes.
- 20 April 1998, 13:22 UTC
- Server Bahamut reboots; the root drive is not clean after
the reboot and has to be checked manually. Server downtime:
59 minutes.
- 18 April 1998, 05:54 UTC
- Server Bahamut bites the bucket and gets rebooted again. A
new kernel (the latest in the "development" series of Linux kernels)
is compiled to test on that server. The new kernel fails
spectacularly due to conflicts with old software; the software
problems are resolved, but the kernel itself then fails. The
original kernel is restored.
Total downtime: 14 hours (approximately).
- 17 April 1998, ~22:00 UTC
- Both servers stop responding at about the same time for unknown
reasons, and have to be reset manually. Total downtime:
7 hours (approximately).
- 14 April 1998, 14:14 UTC
- Server palantir rebooted to fix an odd problem preventing
a web server process which had failed earlier in the day from
restarting. Server downtime: 1 minute. (The two
sites served by that particular process were also down for several
hours prior to the reboot.)
- 7 April 1998
- Server Bahamut reboots. Server downtime: 1 minute.
- 28 March 1998
- Server palantir taken down for installation of a
replacement hard drive. Server downtime: 70 minutes.
- 26 March 1998
- Several people report that they cannot reach the web server. We
can find no local cause for this; all processes are running
normally and there are no unusual entries in the system log.
This suggests that the reported problems were caused by either
Internet connectivity failures or a multiple service failure on
Dragonfire (e.g. logging facility, HTTP daemon monitor, and HTTP
daemon). System monitor scripts are fine-tuned to prevent (or at
least greatly reduce the possibility of) such failures occurring
again.
- 24 March 1998, 03:29 UTC
- Server palantir rebooted. Server downtime:
6 minutes.
- 21 March 1998, 15:17 UTC
- Server palantir crashes, and has to be rebooted. Server
downtime: 32 minutes.
- 8 March 1998, 04:03 UTC
- Server palantir rebooted. Server downtime:
2 minutes. Indications are that one or two HTTP servers may
have been down for some time previously.
- 6 March 1998, 08:43 UTC
- Server palantir rebooted. Server downtime:
2 minutes.
- 1 March 1998, 18:29 UTC
- Server palantir rebooted due to apparent memory
corruption. Server downtime: 1 minute.
- 25 February 1998, 16:11 UTC
- Text/graphics counter (/cgi-bin/mcounter) recompiled and
reinstalled, as the text counter seems to be working well.
- 25 February 1998, 05:17 UTC
- Text counter (/cgi-bin/tcounter) code reviewed,
recompiled, and experimentally reinstalled.
- 22 February 1998, 20:34 UTC
- Server palantir reboots. Server downtime: 1.5
minutes.
- 22 February 1998
- Intermittent (and possibly extended) service outages occur through
about half of the day due to a disk-full condition on server
Bahamut. Estimated total web/POP service downtime:
6 hours.
- 21 February 1998, 20:01 UTC
- One HTTP daemon on server palantir killed and restarted
because it had apparently stopped responding to requests.
- 17 February 1998, 07:45 UTC
- Server palantir rebooted when FTP sessions suddenly start
dying for no apparent reason. Server downtime: 1.5 minutes
(FTP outage extends approximately 10 minutes farther back).
- 12 February 1998, 02:09 UTC
- Server palantir reboots. Server downtime:
1.5 minutes.
- 5 February 1998, 00:25 UTC
- Server palantir stops responding, and needs to be
rebooted. The cause of the freeze is unknown. An unexpected
side effect of file sharing between the servers causes HTTP
daemons on the other server (Bahamut) to stop responding.
Total downtime: 80 minutes.
- 3 February 1998, 16:00 UTC
- Corrected configuration problem denying access to anonymous FTP
files through ftp.dragonfire.net, and disallowed anonymous
FTP to main server files through dragonfire.net (virtual
servers on the same machine are still accessible).
- 3 February 1998, 04:43 UTC
- images.dragonfire.net virtual server (which serves
Dragonfire logos) misconfiguration fixed after it was reported
down.
- 1 February 1998, 23:08 UTC
- Server palantir rebooted. Server downtime: 13
minutes.
- 31 January 1998, 00:59 UTC
- A new server, palantir.dragonfire.net, is brought online.
- 30 January 1998, 17:01 UTC
- Main server taken down for a short time during installation of a
new server. Total downtime: 15 minutes (approximately).
- 22 January 1998, 21:30 UTC
- Re-enabling the FTP server caused the load to shoot back up again.
At this point, the problem appears more likely to be excessive
disk thrashing from many FTP processes trying to access different
files at once alongside the HTTP servers. Anonymous FTP has been
disabled until we find a way to reduce the thrashing.
- 22 January 1998, 20:01 UTC
- All FTP processes terminated again as the load problem continues.
After process termination, the load drops quickly back to normal,
suggesting either a problem in the FTP service or an FTP-based
denial of service attack. FTP server downtime: 43 minutes.
- 22 January 1998, 17:25 UTC
- All FTP processes are terminated due to extreme amounts of swapping
on the server; the anonymous FTP user limit is then reduced from
200 to 100 users to prevent a repeat of the problem.
- 21-22 January 1998
- Delays in mail delivery introduced by system load are found and
corrected. Note that no mail was lost; however, some messages may
have been delayed by one or more days.
- 20 January 1998
- ClarkNet provides a new set of IP addresses which bypass the ACSI
backbone. Servers are reconfigured to listen to the new IP
addresses as of 21 January, 01:10 UTC.
- 15 January 1998, 05:00 UTC
- Counters disabled again after the HTTP servers grind to a halt.
- 15 January 1998, 04:41 UTC
- Counters re-enabled to stop the flood of complaints from people who
hadn't bothered to read this page.
- 10 January 1998, 05:17 UTC
- Web counter temporarily disabled upon discovery of many "junk"
counter entries and unusually large numbers of calls to the
counter.
- 9 January 1998, 00:15 UTC
- FTP/POP server mysteriously dies; since it has never failed before,
the outage is not noticed until much later. POP service downtime:
17 hours, 25 minutes (approximately).
- 2 January 1998, 15:28 UTC
- Server rebooted with a newer kernel to see if that stops whatever
is slowing down the server. Server downtime: 20 minutes.
- 1 January 1998, ~16:30 UTC and continuing
- Denial of service attack? All server processes
on Dragonfire slow down; the delays cannot be traced to any program
running on the server itself, suggesting some sort of external
cause.
- 1 January 1998, 02:15 UTC
- A bug is fixed in the HTTP server software, apparently eliminating
the earlier problems with freezes.
- 29 December 1997, 19:24 UTC
- A configuration file for the mail server is mysteriously corrupted,
temporarily suspending receipt of mail. No mail is lost as a result
of this error.
- 24 December 1997, 18:50 UTC
- Web server upgraded to a new release. Web server downtime: 15
minutes, plus occasional restarts to fix configuration or
installation problems over the next several hours.
- 24 December 1997, 16:28 UTC
- After some intermittent service, the virtual domain server problem
seems to have been resolved, and virtual domains are once again
available as they should be.
- 24 December 1997, 14:21 UTC
- The virtual domain server crashed inexplicably, and we have not
been able to restart it. We are actively working to resolve this
problem.
- 23 December 1997
- Nameservice was reported to be down for a large part of the
morning. We have not been able to identify any system faults
during that period; we can only guess that this stems from a
problem with our network access provider, which also provides
primary nameservice for the dragonfire.net domain.
- 10 December 1997, 00:47 UTC
- Server taken down to replace a defective SCSI cable. Total
downtime: 14 minutes.
- 9 December 1997, 21:34 UTC
- Most services temporarily suspended to fix a filesystem error.
Total downtime: 10 minutes.
- 9 December 1997, 20:45 UTC
- Server reboots. Total downtime: 4 minutes.
- 29 November 1997, 13:11 UTC
- Server reboots. Total downtime: 12 minutes.
- 26 November 1997, 14:54 UTC
- images.dragonfire.net virtual server brought back online
manually (again), this time fixing the problem for good.
- 22 November 1997, 02:56 UTC
- Server crashes and has to be brought back online manually. We are
investigating this and the previous crash as a possible
denial-of-service attack against Dragonfire, though we are not
hopeful of gaining much information after the fact. Total
downtime: 53 minutes.
- 20 November 1997, 17:28 UTC
- images.dragonfire.net virtual server (which serves
Dragonfire logos) brought back online manually after it failed to
come up normally during restart.
- 20 November 1997, 12:26 UTC
- Server crashes, and does not restart as normal. Total downtime:
1 hour, 37 minutes.
- 19 November 1997, 15:53 UTC
- Various configuration parameters on the HTTP server are modified
to increase server performance. Web server downtime: 2
minutes.
- 18 November 1997, 20:07 UTC
- DNS change reversed as it seemed to have adverse effects.
- 18 November 1997, 11:55 UTC
- Traffic to www.dragonfire.net split among four IP
addresses to balance HTTP server load.
- 7 November 1997, 01:48 UTC
- Server reboots. Total downtime: 12 minutes.
- 25 October 1997
- An incorrect HTTP daemon is started up after system boot, and a
few domain sites are unavailable for most of the day.
- 24 October 1997, 10:45 UTC
- ClarkNet, during maintenance on their equipment, knocks a cable
loose from Dragonfire which effectively brings the system down.
Total downtime: 15 hours, 54 minutes.
- 23 October 1997, 04:07 UTC
- System crashes due to disk drive errors (on the same disk which we
tried to replace on multiple previous occasions but could not due
to bad replacement drives). The drive is replaced and data
transferred. Total downtime: 13 hours, 47 minutes.
- 18 October 1997
- HTTP daemon restored to standard operation. With the exception of
the fact that the server must still be restarted on occasion due
to an apparent memory leak, the problems of the past month appear
to be resolved.
- 9 October 1997, 16:00 UTC
- HTTP daemon is replaced with a newer version which seems to
function considerably better.
- 27 September 1997, 13:51 UTC
- Server is taken down for scheduled maintenance to install a new
9GB disk drive. Total downtime: 3 hours, 45 minutes.
- 14 September 1997, 19:10 UTC
- The primary HTTP daemon is taken down momentarily to test the
effectiveness of an experimental updated version. Web server
downtime: 5 minutes.
- 14-15 September 1997
- Dragonfire's HTTP daemon begins to show signs of stress, restarting
itself several times an hour.
- 11 September 1997, 03:47 UTC
- Server reboots. Total downtime: 1 minute. A bug
in the HTTP server causes wrong pages to be served for an
additional 20-25 minutes before it is found and corrected.
- 4 September 1997, 03:36 UTC
- Server reboots. Total downtime: 1.5 minutes; the
HTTP daemon takes a bit longer to recover.
- 17 August 1997, 22:20 UTC
- A strong thunderstorm takes out power to Dragonfire momentarily.
Total downtime: 10 minutes.
- 15 August 1997, 04:14 UTC
- Following unspecified maintenance work by ClarkNet, one of
Dragonfire's hard disks was found to be damaged. Analysis and
temporary repairs took most of the day. Total downtime:
21 hours, 22 minutes.
- 31 July 1997, 14:25 UTC
- Dragonfire is shut down (intentionally, this time) to allow
ClarkNet to replace the defective UPS. Total downtime:
2.5 minutes.
- 31 July 1997, 12:55 UTC
- The UPS (Uninterruptible Power Supply) provided by ClarkNet for
Dragonfire and several other machines serviced by ClarkNet
overloads, taking Dragonfire down with it. Total downtime:
27 minutes.
- 5 July 1997, 13:18 UTC
- Server is shut down for an upgrade to 128MB RAM and installation of
a new tape drive. Total downtime: 43 minutes.
- 3 July 1997, 21:40 UTC
- Dragonfire's HTTP daemon crashes and does not restart as usual.
Web server downtime: 4 hours, 44 minutes.
- 1 July 1997, 17:15 UTC
- Server is rebooted to cure a small but persistent resource leak.
Total downtime: 1.5 minutes.
- 1 June 1997, 10:27 UTC
- Server reboots. Total downtime: 9 minutes.
- 27 May 1997, ~03:30 UTC
- Due to problems caused by BBN (bbnplanet.net), one of the
Internet backbone providers, Dragonfire is cut off from at least a
large part of North America. Normal routing is restored around
17:30 UTC.
- 16 May 1997, 05:53 UTC
- Due to an error in configuration while switching IP addresses, the
server becomes unreachable and must be power-cycled. Total
downtime: 1 hour, 23 minutes. The problem is
exacerbated by unannounced ClarkNet network maintenance, causing
continued occasional network dropouts for a while.
- 3 May 1997, 23:56 UTC
- Server reboots. Total downtime: 46 minutes.
- 24 April 1997, 02:48 UTC
- During a network upgrade, ClarkNet changes the IP address of the
router to which Dragonfire is connected, thus cutting Dragonfire
off from the Internet, and fails to inform us of the change until
considerably later. Total downtime: 11 hours, 47
minutes.
- 23 April 1997, 20:40 UTC
- Server goes down due to a loose cable. Total downtime: 37
minutes.
- 20 April 1997
- We receive a report that AGIS's network, through which most of
Dragonfire's Internet traffic travels, is under attack. Due to
this, many people may well encounter delays accessing Dragonfire.
- 20 April 1997, 16:23 UTC
- Server rebooted to get rid of a runaway process. Total downtime:
1 minute.
- 19 April 1997, 20:52 UTC
- Server reboots. Total downtime: 9 minutes.
- 31 March 1997, 19:04 UTC
- Server crashes, and reboots automatically. Total downtime:
43 minutes.
- 28 March 1997, 18:07 UTC
- Server is moved to another network center. Total downtime:
1 hour, 39 minutes (plus DNS propagation time).
- 27 March 1997, 05:21 UTC
- Server crashes. Total downtime: 11 hours, 16 minutes.
- 21 March 1997, 17:27 UTC
- Server is moved to a new network center. Total downtime:
6 hours, 46 minutes (plus DNS propagation time).
- 15 March 1997, 13:04 UTC
- Server is taken down shortly after schedule to move drives over to
a new case. Total downtime: 1 hour, 40 minutes.
- 11 March 1997, 00:05 UTC
- Server is taken down again due to (yet more) problems with the
drive. All data on the drive is moved to other filesystems, and
the drive is removed. Total downtime: 1 hour, 38
minutes.
- 10 March 1997, 08:00 UTC
- Server is taken down for scheduled emergency maintenance to repair
the drive. The drive is fixed with little loss. Total downtime:
2 hours.
- 9 March 1997, 18:04 UTC
- The troubled drive fails again, and is temporarily taken offline to
prevent it from bringing the rest of the system down. Total
downtime: 1 hour, 5 minutes.
- 8 March 1997, 02:10 UTC
- One of Dragonfire's drives experiences a transient failure (likely
from heat caused by the malfunctioning fan). Total downtime:
30 minutes.
- 8 March 1997, 00:55 UTC
- New fan removed due to bad performance. Total downtime: 18
minutes.
- 7 March 1997, 14:33 UTC
- Server taken down momentarily to adjust new fan. Total downtime:
1.5 minutes.
- 7 March 1997, 12:11 UTC
- Server taken down to install a new cooling fan. Total downtime:
18 minutes.
- 26 February 1997, 10:05 UTC
- Server rebooted to test a new and (hopefully) improved kernel.
Total downtime: 4 minutes.
- 22 February 1997, 10:58 UTC
- HTTP daemon died unexpectedly and did not return. We were not
made aware of the problem before the following day and thus could
not fix it at the time. The server itself crashed about a day
later. Total downtime: 11 hours, 40 minutes; web
server downtime: 36 hours, 39 minutes.
- 21 February 1997, 00:39 UTC
- Server crashed for unknown reasons and rebooted automatically.
Total downtime: 10 minutes.
- 19 February 1997, 17:25 UTC
- Kernel was modified and server rebooted to attempt to fix the
recent crashes. Total downtime: 1.5 minutes.
- 19 February 1997, 15:01 UTC
- Server crashed for unknown reasons. Total downtime: 1 hour,
1 minute.
- 19 February 1997, 07:36 UTC
- HTTP daemon crashed for unknown reasons, and was restarted manually
as soon as the problem was noticed. Total downtime: 3 hours,
30 minutes.
- 19 February 1997, 03:27 UTC
- Server crashed unexpectedly; stable kernel was reinstalled. Total
downtime: 28 minutes.
- 16 February 1997, 10:24 UTC
- Server was rebooted multiple times to test added system
functionality. Total downtime: 20 minutes.
- 15 February 1997, 16:22 UTC
- Server was rebooted to install a new kernel version. Total
downtime: 4 minutes.
- 9 February 1997, 14:48 UTC
- Server was shut down for minor rearrangement of drives to improve
air flow. Total downtime: 5 minutes.
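
The downtime figures in the entries above follow a few fixed patterns ("2 hours, 14 minutes", "1.5 minutes", "14 hours (approximately)"). For readers who want to total them, the following is a minimal sketch of how such a phrase could be converted to minutes. It was not part of the original site; the function name downtime_minutes is hypothetical, and it assumes it is given only the duration phrase from a single entry rather than the whole entry text.

```python
import re

# Minutes represented by each unit that appears in the log's downtime figures.
_UNIT_MINUTES = {"hour": 60.0, "minute": 1.0}

# A number (optionally with a decimal part) followed by "hour(s)" or "minute(s)".
# \s+ also matches a line break, since some figures wrap across lines.
_PART = re.compile(r"(\d+(?:\.\d+)?)\s+(hour|minute)s?")


def downtime_minutes(text):
    """Return the downtime described by `text` in minutes, or None if none is found.

    Hypothetical helper: the phrase format is inferred from the log entries.
    """
    parts = _PART.findall(text)
    if not parts:
        return None  # e.g. "unknown"
    return sum(float(number) * _UNIT_MINUTES[unit] for number, unit in parts)


if __name__ == "__main__":
    for phrase in ("2 hours, 14 minutes", "1.5 minutes", "unknown"):
        print(phrase, "->", downtime_minutes(phrase))
```

Ranges such as "20-25 minutes" are not handled specially; only the number directly before the unit is read, so such phrases would need separate treatment.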