
Re: Constant reboots on FreeBSD




> I have talked about this problem in the past but I am still unable to 
> figure out what could be causing the reboots.
> I have configured several Proxy servers using FreeBSD 4.0 with 
> squid 2.3 and apache 1.3.12. They are all configured to do NAT 
> and firewall.
...
> Does anyone have any ideas? I can't find anything in the logs or 
> any core dumps.

Brian's excellent suggestions should help you to get going again.
I'd like to make a few more comments on the 'no logs, no coredumps' part.

First, personally I strongly recommend running servers using a serial
console, not the standard PC keyboard/screen. This has several advantages:
- You can use another PC as console server, and add logging on the tip
  session that connects to the console of the squid server.
  This means that every console message can be logged, even if the 
  original box reboots; you _will_ see that last message it spits out.
- You can even detach/attach remotely (e.g. from the network).
  Check the attach/detach functionality of screen(1).
- You'll be able to squeeze many more machines into a rack, as
  monitors are *big*.
- A computer monitor is a fire hazard (more than a computer itself).
  Not having a room full of those is a Good Thing. 
  Saves a lot of power, too.
Perhaps the serial console will help you get the panic message.
Please report what it is if you catch it.
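
To illustrate the logging side, here is a minimal sketch in Python. It
assumes the console output reaches the script as a stream of lines (for
example, piped in from whatever serial session runs on the console
server). The names stamp and log_console are made up for this example;
this is not something tip or screen provide themselves.

```python
import time


def stamp(line, now=None):
    """Prefix one console line with a local timestamp."""
    if now is None:
        now = time.time()
    ts = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s %s" % (ts, line.rstrip("\r\n"))


def log_console(source, logfile):
    """Copy every line from source to logfile.

    The log is flushed after each line, so the last message before a
    reboot is not left sitting in a buffer on the console server.
    Returns the number of lines written.
    """
    count = 0
    for line in source:
        logfile.write(stamp(line) + "\n")
        logfile.flush()
        count += 1
    return count
```

You would call log_console with sys.stdin as the source and a file
opened in append mode as the log. The per-line flush is the point:
the message you care about is the very last one.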

Second, making core dumps is a two-stage process. At the time of the crash,
the box will write its memory to the swap partition. For this reason,
the swap partition must be at least as large as the physical memory
in the box.
During reboot time, the box checks whether it should copy the swap space
to a core dump file. I'm more familiar with BSD/OS obviously, but
I just checked a FreeBSD box (hi, randy!) and /etc/rc checks for the
existence of the directory /var/crash, which usually isn't there by default 
(not sure about FreeBSD's default). Be aware that these dump files, too, 
are as big as the physical memory, so you may want to create 
a crash directory somewhere else and symlink /var/crash to it.
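
The two size requirements above can be sketched as a quick sanity check.
This is only an illustration in Python: the function names are made up,
the sizes are example figures, and you have to supply the real RAM and
swap sizes yourself, since the sketch does not query the system for them.

```python
import os
import shutil


def dump_fits_in_swap(ram_bytes, swap_bytes):
    """The crash dump is written to swap, so swap must be >= RAM."""
    return swap_bytes >= ram_bytes


def crash_dir_ok(crash_dir, ram_bytes):
    """True if crash_dir exists and has room for a dump as big as RAM.

    os.path.isdir follows symlinks, so a /var/crash that is a symlink
    to a directory on a larger filesystem is checked at its target.
    """
    if not os.path.isdir(crash_dir):
        return False
    return shutil.disk_usage(crash_dir).free >= ram_bytes


MB = 1024 * 1024
# Example figures only: 256 MB of RAM and 512 MB of swap.
print(dump_fits_in_swap(256 * MB, 512 * MB))
print(crash_dir_ok("/var/crash", 256 * MB))
```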

You should be able to test whether making dumps works at all;
see the man pages of reboot(8) (check 'reboot -d') and dumpon(8).

Also, make sure you have the kernel tree you used to compile the kernel
at hand, as it is necessary to analyze these dumps.

It is possible, but very rare, that core dumps work in general but that
the box doesn't make a dump when the bug triggers (double faults,
stack overflows and the like).

It might also be useful to see if you can trigger the problem.
This makes analysis a lot easier.

Again, this is the 'long approach', which I use because I hack kernels
for a living, but if you'd like to try this path I might be able to help out 
a bit.

Geert Jan


-----
This is the afnog mailing list, managed by Majordomo 1.94.4

To send a message to this list, e-mail afnog at afnog.org
To send a request to majordomo, e-mail majordomo at afnog.org and put
your request in the body of the message (i.e use "help" for help)

This list is maintained by owner-afnog at afnog.org