
[Help-gnu-radius] Radius dies frequently


From: Dan Beldiman
Subject: [Help-gnu-radius] Radius dies frequently
Date: Tue, 25 Jan 2005 03:21:54 +0100

Hi,

Since last week, my radius server has frequently stopped working.
I am using GNU Radius version 1.2 on Linux with MySQL, and it has been
working fine for a long time.

On Tuesday last week, GNU Radius died for the first time.
The radius server was not completely unresponsive: it replied to all requests
with DENY messages (which prevented a failover).

The log files showed the following:


Jan 18 06:49:28 Main.info: Reloading configuration now
Jan 18 06:49:28 Main.info: Terminating the subprocesses
Jan 18 06:49:28 Main.notice: child 23102 exited with status 0
Jan 18 06:49:28 Main.notice: child 23098 exited with status 0
Jan 18 06:49:28 Main.notice: child 23094 exited with status 0
(this line repeats for about 12 child processes)



The log file showed some strange messages for the time before the outage:

Jan 18 06:29:47 Auth.notice: Killing unresponsive ACCT child 20528
Jan 18 06:29:47 Auth.notice: Killing unresponsive ACCT child 21260
Jan 18 06:29:47 Main.notice: child 21260 terminated on signal 9
Jan 18 06:29:47 Main.notice: child 20528 terminated on signal 9
Jan 18 06:29:47 Auth.notice: (Access-Request local 138 "NE-Radius-Test"): Login OK [NE-Radius-Test]
Jan 18 06:29:49 Auth.notice: (Access-Request local 70 "notExist"): No such user [notExist]
Jan 18 06:29:59 Auth.notice: Killing unresponsive ACCT child 22794
Jan 18 06:29:59 Auth.notice: Killing unresponsive ACCT child 22800
Jan 18 06:29:59 Auth.notice: Killing unresponsive ACCT child 22802
Jan 18 06:29:59 Main.notice: child 22802 terminated on signal 9
Jan 18 06:29:59 Main.notice: child 22800 terminated on signal 9
Jan 18 06:29:59 Main.notice: child 22794 terminated on signal 9
Jan 18 06:30:00 Auth.notice: (Access-Request VPN1 59 "Test" CLID=xxx CSID=x): Login OK [TEST]
Jan 18 06:30:01 Acct.notice: Killing unresponsive ACCT child 22804
Jan 18 06:30:01 Main.notice: child 22804 terminated on signal 9
Jan 18 06:30:04 Acct.info: login: entry for NAS 172.xxx port 26508 duplicate
Jan 18 06:30:07 Acct.info: login: entry for NAS 172.xxx port 26508 duplicate
Jan 18 06:30:14 Acct.info: login: entry for NAS 172.xxx port 26508 duplicate
Jan 18 06:30:17 Acct.notice: Killing unresponsive ACCT child 22832
Jan 18 06:30:17 Main.notice: child 22832 terminated on signal 9
Jan 18 06:30:17 Acct.info: login: entry for NAS 172.xxx port 26508 duplicate
Jan 18 06:30:28 Acct.notice: Killing unresponsive ACCT child 22835
Jan 18 06:30:28 Acct.notice: Killing unresponsive ACCT child 22837
Jan 18 06:30:28 Acct.notice: Killing unresponsive ACCT child 22839
Jan 18 06:30:28 Main.notice: child 22839 terminated on signal 9
Jan 18 06:30:28 Main.notice: child 22837 terminated on signal 9
Jan 18 06:30:28 Main.notice: child 22835 terminated on signal 9
Jan 18 06:30:50 Auth.notice: Killing unresponsive ACCT child 22841
Jan 18 06:30:50 Auth.notice: Killing unresponsive ACCT child 22843
Jan 18 06:30:50 Main.notice: child 22841 terminated on signal 9
Jan 18 06:30:50 Main.notice: child 22843 terminated on signal 9
Jan 18 06:30:51 Auth.notice: (Access-Request VPN1 62 "Test" CLID=xxx CSID=x): Login OK [TEST]

About 15 minutes later it died.
Checking the past log files, I noticed that every morning around 6:30, and
only at this time, the radius server was killing unresponsive ACCT child
processes.
I do not know if this is related to radius no longer responding, but the
radius server failed two more times at around 6:45 AM, and once at a
different time.
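
To check whether the kills really cluster around 6:30, the log can be tallied by time of day. The following is only a rough sketch: it assumes the line layout seen in the excerpts above ("Mon DD HH:MM:SS Category.level: message"), and the function name kills_per_slot and the 15-minute slot size are made up for illustration, not part of GNU Radius.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jan 18 06:29:47 Auth.notice: Killing unresponsive ACCT child 20528
# The layout is assumed from the excerpts in this message.
KILL_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d{1,2}) (?P<hour>\d{2}):(?P<minute>\d{2}):\d{2} "
    r"\w+\.\w+: Killing unresponsive \w+ child (?P<pid>\d+)"
)


def kills_per_slot(lines, slot_minutes=15):
    """Count 'Killing unresponsive' messages per (date, time slot).

    Hypothetical helper: each slot is labelled by its start time, so with
    the default size 06:29:47 falls into the "06:15" slot.
    """
    counts = Counter()
    for line in lines:
        m = KILL_RE.match(line)
        if m is None:
            continue  # not a kill message, or a wrapped continuation line
        minute = int(m.group("minute")) // slot_minutes * slot_minutes
        slot = f"{m.group('hour')}:{minute:02d}"
        counts[(m.group("date"), slot)] += 1
    return counts
```

Feeding it the lines of a log file (for example, `kills_per_slot(open(path))`, where path is wherever the radius log is kept) gives a count per day and slot, which makes it easy to see whether anything shows up outside the early-morning window.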

The last messages in the log file are always something like this:
Jan 25 00:04:48 Acct.info: Child exiting on timeout.
Jan 25 00:04:49 Acct.info: Child exiting on timeout.
Jan 25 00:04:49 Acct.info: Child exiting on timeout.
Jan 25 00:08:53 Acct.error: Child received malformed header (len = 0, error = Success)
Jan 25 00:08:53 Acct.error: Child received malformed header (len = 0, error = Success)


I found similar problems in the mailing list archives, but for older versions
of the radius server.
Does anyone have a clue what the problem might be?

Thanks

Dan





