
Error Log Analysis

2024-02-04 Source: 步旅网
Current analysis of the problem:

1. The main control board (1/13) of 9312-A suddenly developed a hardware fault, which caused the board to reset repeatedly.

Jan 19 2012 14:29:07 Quidway %%01CSSM/4/STACKBACKUP(l)[33]:This cluster CSS compete result is backup.

Jan 19 2012 14:29:15 Quidway %%01ALML/4/CLOCKFAULT(l)[50]:The "CLK_33M_CHK" sensor15 of MPU board[1/13] detect clock signal fault

Jan 19 2012 14:29:15 Quidway %%01ALML/4/CLOCKFAULT(l)[51]:The "CLK_125M_CHK" sensor16 of MPU board[1/13] detect clock signal fault

Jan 19 2012 14:29:15 Quidway %%01ALML/4/CLOCKFAULT_RESUME(l)[55]:The "CLK_125M_CHK" sensor16 of MPU board[1/13] detect clock signal fault resume

Jan 19 2012 14:29:15 Quidway %%01ALML/4/CLOCKFAULT(l)[56]:The "CLK_125M_CHK" sensor16 of MPU board[1/13] detect clock signal fault

Jan 19 2012 14:29:15 Quidway %%01ALML/3/CPU_RESET(l)[57]:The canbus node of MPU board[1/13] detects that CPU was reset.
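When working through logs like these, it helps to split every line into fields before filtering. The Python sketch below is a minimal parser. Its regular expression is inferred only from the sample lines in this article: a timestamp, a host name, an optional "%%01" prefix, then MODULE/level/BRIEF, an optional "(l)" flag, an optional "[sequence]" number, and finally the message. Treat it as an assumption, not an official specification of the VRP log format. The names parse_line and LogEntry are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout inferred from the sample lines in this article (an assumption,
# not an official description of the VRP log format).
LOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?:%%\d{2})?(?P<module>[A-Z0-9]+)/(?P<level>\d)/(?P<brief>[A-Z0-9_]+)"
    r"(?:\((?P<flag>[^)]*)\))?"   # optional "(l)"-style flag
    r"(?:\[(?P<seq>\d+)\])?"      # optional "[50]"-style sequence number
    r":(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    time: datetime
    host: str
    module: str
    level: int
    brief: str
    seq: Optional[int]
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one single-line log record; return None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        time=datetime.strptime(m["ts"], "%b %d %Y %H:%M:%S"),
        host=m["host"],
        module=m["module"],
        level=int(m["level"]),
        brief=m["brief"],
        seq=int(m["seq"]) if m["seq"] else None,
        message=m["msg"].strip(),
    )


# Usage: the CPU reset record from board 1/13.
print(parse_line("Jan 19 2012 14:29:15 Quidway %%01ALML/3/CPU_RESET(l)[57]:"
                 "The canbus node of MPU board[1/13] detects that CPU was reset."))
```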

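2. The reset of this board also caused the standby main control board (1/14) of 9312-A to reset abnormally. This was most likely triggered by the resets of board 1/13. We suspect that because 1/13 kept resetting, it automatically rolled back to the old software version, which left the active and standby boards running different versions and triggered the reset.

This issue has been fixed in V1R6 and later versions.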

Jan 19 2012 14:29:41 Quidway %%01ALML/4/ENTRESET(l):MPU frame[1] board[14] is reset, The reason is: VRP reset selfboard because of find exception.

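3. At this point both main control boards in frame 1 had reset, which caused the stack to split. After the split, board 1/14 started up again, and once startup completed the stack merged again.

During the merge, the whole of frame 2 was reset. This is required by the stacking mechanism.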

Jan 19 2012 14:39:05 Quidway %%01ALML/4/ENTRESET(l):LPU frame[2] board[5] is reset, The reason is: Reset for CSS management.

Jan 19 2012 14:39:05 Quidway %%01ALML/4/ENTRESET(l):LPU frame[2] board[8] is reset, The reason is: Reset for CSS management.

Jan 19 2012 14:39:05 Quidway %%01ALML/4/ENTRESET(l):MPU frame[2] board[14] is reset, The reason is: Reset for CSS management.

Jan 19 2012 14:39:06 Quidway %%01ALML/4/ENTRESET(l):MPU frame[2] board[13] is reset, The reason is: Reset for CSS management.
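If you group the ENTRESET records by frame, the whole-frame reset in step 3 is easy to spot. Frame 1 shows a single exception-driven reset of board 14. Every board in frame 2 reports "Reset for CSS management" within about a second. The Python sketch below does that grouping. Its regular expression is written only against the message text shown in this article, and resets_by_frame is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Hypothetical helper; the pattern matches the ENTRESET message text shown
# in this article (both "is reset," and "is reset." occur in the logs).
RESET_RE = re.compile(
    r"(?P<kind>MPU|LPU) frame\[(?P<frame>\d+)\] board\[(?P<board>\d+)\] "
    r"is reset[.,] The reason is: (?P<reason>.+)$"
)


def resets_by_frame(lines):
    """Map frame number -> list of (board type, board number, reset reason)."""
    result = defaultdict(list)
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            reason = m["reason"].strip().rstrip(".")
            result[int(m["frame"])].append((m["kind"], int(m["board"]), reason))
    return dict(result)


log = [
    "Jan 19 2012 14:29:41 Quidway %%01ALML/4/ENTRESET(l):MPU frame[1] board[14] is reset, The reason is: VRP reset selfboard because of find exception.",
    "Jan 19 2012 14:39:05 Quidway %%01ALML/4/ENTRESET(l):LPU frame[2] board[5] is reset, The reason is: Reset for CSS management.",
    "Jan 19 2012 14:39:05 Quidway %%01ALML/4/ENTRESET(l):LPU frame[2] board[8] is reset, The reason is: Reset for CSS management.",
    "Jan 19 2012 14:39:05 Quidway %%01ALML/4/ENTRESET(l):MPU frame[2] board[14] is reset, The reason is: Reset for CSS management.",
    "Jan 19 2012 14:39:06 Quidway %%01ALML/4/ENTRESET(l):MPU frame[2] board[13] is reset, The reason is: Reset for CSS management.",
]

for frame, boards in sorted(resets_by_frame(log).items()):
    print(f"frame {frame}: {boards}")
```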

4. In short, the fault on 1/13 triggered the reset of 1/14, and the reset of 1/14 in turn triggered the reset of frame 2.

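5. Upgrading to V1R6 should resolve the problem described above. However, the logs show that stacking was not enabled on both frames until about 01:23 on the 20th (A at 01:08:53, B at 01:23:28). Power alarms then appear at 01:49:17, 02:12:29, 02:21:42 and 02:32:05, so we suspect that someone manually powered off entire frames.

Stacking was then disabled on both switches (undo css enable at 02:24:03 on A and at 02:29:18 on B) and was never re-enabled. Since then, each switch has been running as a standalone frame.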
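The reasoning in item 5 depends on placing the stacking commands and power alarms from both switches on one timeline. The Python sketch below performs that merge. The keyword table, the timeline function and the "A"/"B" labels are our own assumptions. The sample lines are abridged from the detailed logs that follow, and the matching assumes plain double quotes with no backslash escapes.

```python
from datetime import datetime

# Keyword -> event label (our own choice). Assumes plain double quotes in the text.
EVENTS = {
    'Command="css enable"': "stacking enabled",
    'Command="undo css enable"': "stacking disabled",
    '"AC MODE PROTEC"': "power alarm",
}


def timeline(logs):
    """logs maps a switch label ("A"/"B") to its raw log lines.
    Returns (time, switch, event) tuples in chronological order."""
    events = []
    for switch, lines in logs.items():
        for line in lines:
            for needle, label in EVENTS.items():
                if needle in line:
                    # The first four fields look like "Jan 20 2012 01:08:53".
                    ts = datetime.strptime(" ".join(line.split()[:4]), "%b %d %Y %H:%M:%S")
                    events.append((ts, switch, label))
    return sorted(events)


logs = {
    "A": [
        'Jan 20 2012 01:08:53 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="css enable")',
        'Jan 20 2012 01:49:17 SwitchA %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME1/PWR2] detects a fault.',
        'Jan 20 2012 02:24:03 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="undo css enable")',
    ],
    "B": [
        'Jan 20 2012 01:23:28 SwitchB %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="css enable")',
        'Jan 20 2012 02:21:42 SwitchB %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME2/PWR1] detects a fault.',
        'Jan 20 2012 02:29:18 SwitchB %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="undo css enable")',
    ],
}

for ts, switch, event in timeline(logs):
    print(ts.strftime("%H:%M:%S"), switch, event)
```

Matching the full Command="..." text, and not just the words "css enable", keeps "undo css enable" from being counted as an enable.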

Detailed analysis:

Frame B ran standalone and did not start stacking until 01:23 on the 20th.

Jan 19 2012 21:13:18 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="startup system-software cfcard:/s9300v100r006c00spc800.cc slave-board")

Jan 19 2012 21:13:22 Quidway %%01SHELL/6/DISPLAY_CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="display startup")

Jan 19 2012 21:13:35 Quidway %%01SHELL/6/DISPLAY_CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="display current-configuration")

Jan 19 2012 21:13:40 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="system-view")

Frame A did not enable stacking until 01:08:53 on the 20th.

Jan 19 2012 22:01:01 Quidway BASETRAP/4/CPUUSAGERESUME:OID 1.3.6.1.4.1.2011.5.25.129.2.4.2 CPU utilization resumed from exceeding the pre-alarm threshold.(Index=70516745, BaseUsagePhyIndex=0, UsageType=1, UsageIndex=0, PhysicalName="MPU Board 13", Severity=6, ProbableCause=154, RelativeResource="", EventType=4, UsageValue=73, UsageUnit=1, UsageThreshold=80)

Jan 19 2012 22:01:07 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="get S9300V100R006C00SPC800.CC")

Jan 19 2012 22:02:31 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="quit")

Jan 19 2012 22:02:32 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="dir")

A log:

Jan 20 2012 01:04:46 Quidway BASETRAP/4/CPUUSAGERESUME:OID 1.3.6.1.4.1.2011.5.25.129.2.4.2 CPU utilization recovered to the normal range.(Index=68419593, BaseUsagePhyIndex=0, UsageType=1, UsageIndex=0, Severity=6, ProbableCause=154, EventType=4,PhysicalName=LPU Board 5, RelativeResource=""

Jan 20 2012 01:08:19 Quidway %%01SHELL/6/DISPLAY_CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="display device")

Jan 20 2012 01:08:47 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="system-view")

Jan 20 2012 01:08:53 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="css enable")

B log:

Jan 20 2012 01:23:21 SwitchB %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="save")

Jan 20 2012 01:23:22 SwitchB %%01HWCM/5/TRAPLOG(l):OID 1.3.6.1.4.1.2011.6.10.2.1 configure changed. (EventIndex=9, CommandSource=1, ConfigSource=2, ConfigDestination=4)

Jan 20 2012 01:23:26 SwitchB %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="system-view")

Jan 20 2012 01:23:28 SwitchB %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="css enable")

Jan 20 2012 01:23:29 SwitchB %%01VFS/5/DEV_UNREG(l):Device slave#flash: unregistration finished.

Jan 20 2012 01:23:29 SwitchB %%01VFS/5/DEV_UNREG(l):Device slave#cfcard: unregistration finished.

B log:

Jan 20 2012 01:26:08 SwitchA %%01CSSM/4/STACKBACKUP(l)[326]:This cluster CSS compete result is backup. (elected as the backup frame)

Jan 20 2012 01:56:59 SwitchB %%01CSSM/4/STACKMASTER(l):This cluster CSS compete result is master.

A log:

Jan 20 2012 01:25:15 SwitchA %%01CSSM/4/STACKMASTER(l):This cluster CSS compete result is master. (elected as the master frame)

Self slot:25, CSS status: master

Matser:[1,25], backup:[2,27]

At 01:49, slot 25 lost power and a master/backup switchover took place; B became the master frame.

Jan 20 2012 01:49:17 SwitchA %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME1/PWR1] detects a fault.

Jan 20 2012 01:49:17 SwitchA %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME1/PWR2] detects a fault.

Jan 20 2012 01:49:18 SwitchA %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME1/PWR3] detects a fault.

%2012-Jan-20 01:56:29.790.2 SwitchA 01SOURCE/6/TASKREGSUC(D)[64]:Succeed to create framework task LSPMLsp management.

===== current int switch info (slot: 25) =====

Reset reason is power off(after reset), StartKind is Cold Reset.

%2012-Jan-20 01:56:29.790.3 SwitchA 01SOURCE/6/TASKREGSUC(D)[65]:Succeed to create framework task RSVP task.

Jan 20 2012 01:57:52 SwitchB %%01CSSM/4/STACKBACKUP(l)[333]:This cluster CSS compete result is backup.

The power problem caused this frame to be re-elected as the backup frame.

Jan 20 2012 02:13:02 SwitchB %%01ALML/4/ENTRESET(l):MPU frame[1] board[13] is reset. The reason is: Reset for no heart.

Jan 20 2012 02:12:29 SwitchB %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME1/PWR1] detects a fault.

Jan 20 2012 02:12:29 SwitchB %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME1/PWR2] detects a fault.

Jan 20 2012 02:12:30 SwitchB %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME1/PWR3] detects a fault.

Jan 20 2012 02:19:02 SwitchB %%01CSSM/4/STACKBACKUP(l):This cluster CSS compete result is backup.

B: lost power.

Jan 20 2012 02:21:42 SwitchB %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME2/PWR1] detects a fault.

Jan 20 2012 02:21:43 SwitchB %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME2/PWR2] detects a fault.

Jan 20 2012 02:21:44 SwitchB %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 of [FRAME2/PWR3] detects a fault.
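The suspicion of a manual whole-frame power-off comes from a specific pattern: every power module of one frame raises an "AC MODE PROTEC" fault within a second or two. The Python sketch below turns that observation into a simple heuristic. It clusters the power alarms per frame and reports a burst only when all PSUs of the frame alarm within a short window. The 5-second window and the count of three PSUs per frame (PWR1 to PWR3, as seen in these logs) are assumptions, and whole_frame_power_losses is a hypothetical name. A match is consistent with the frame losing input power, but it does not prove that someone switched it off by hand.

```python
import re
from datetime import datetime, timedelta

# Assumed values: PWR1-PWR3 per frame as in these logs, 5-second burst window.
PSUS_PER_FRAME = 3
WINDOW = timedelta(seconds=5)
PWR_RE = re.compile(r'"AC MODE PROTEC" sensor\d+ of \[FRAME(?P<frame>\d+)/PWR(?P<psu>\d+)\]')


def whole_frame_power_losses(lines):
    """Return (frame, first alarm time) for each burst in which every PSU
    of a frame reported an "AC MODE PROTEC" fault within WINDOW."""
    alarms = {}  # frame -> [(time, psu), ...]
    for line in lines:
        m = PWR_RE.search(line)
        if m:
            ts = datetime.strptime(" ".join(line.split()[:4]), "%b %d %Y %H:%M:%S")
            alarms.setdefault(int(m["frame"]), []).append((ts, int(m["psu"])))

    hits = []
    for frame, items in alarms.items():
        items.sort()
        start, psus = None, set()
        for ts, psu in items:
            if start is not None and ts - start > WINDOW:
                # Close the previous burst; keep it only if all PSUs alarmed.
                if len(psus) >= PSUS_PER_FRAME:
                    hits.append((frame, start))
                start, psus = None, set()
            if start is None:
                start = ts
            psus.add(psu)
        if start is not None and len(psus) >= PSUS_PER_FRAME:
            hits.append((frame, start))
    return sorted(hits, key=lambda hit: hit[1])


# (host, time, frame, psu), abridged from the power alarms in this article.
ALARMS = [
    ("SwitchA", "01:49:17", 1, 1), ("SwitchA", "01:49:17", 1, 2), ("SwitchA", "01:49:18", 1, 3),
    ("SwitchB", "02:12:29", 1, 1), ("SwitchB", "02:12:29", 1, 2), ("SwitchB", "02:12:30", 1, 3),
    ("SwitchB", "02:21:42", 2, 1), ("SwitchB", "02:21:43", 2, 2), ("SwitchB", "02:21:44", 2, 3),
]
sample = [
    f'Jan 20 2012 {t} {host} %%01ALML/4/IOFAULT(l):The "AC MODE PROTEC" sensor3 '
    f"of [FRAME{frame}/PWR{psu}] detects a fault."
    for host, t, frame, psu in ALARMS
]

for frame, start in whole_frame_power_losses(sample):
    print(f"frame {frame}: all PSUs alarmed starting {start:%H:%M:%S}")
```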

A: the stack ports went link-down.

Jan 20 2012 02:21:46 SwitchB CSSM/4/STACKLINKDOWN:OID 1.3.6.1.4.1.2011.5.25.183.3.3.2.1 1/13 CSS port 1 down.

Jan 20 2012 02:21:46 SwitchB CSSM/4/STACKLINKDOWN:OID 1.3.6.1.4.1.2011.5.25.183.3.3.2.1 1/13 CSS port 3 down.

Stacking disabled:

B:

Jan 20 2012 02:29:13 SwitchB %%01RSVP/7/SND_HA_BATCHBK_OVER(l):Sent batch backup end event to HA.

Jan 20 2012 02:29:15 SwitchB %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="system-view")

Jan 20 2012 02:29:18 SwitchB %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="undo css enable")

A:

Jan 20 2012 02:23:52 Quidway %%01HWCM/5/TRAPLOG(l):OID 1.3.6.1.4.1.2011.6.10.2.1 configure changed. (EventIndex=1, CommandSource=3, ConfigSource=4, ConfigDestination=2)

Jan 20 2012 02:23:54 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="system-view")

Jan 20 2012 02:24:03 Quidway %%01SHELL/5/CMDRECORD(l):Record command information. (Task=co0 , Ip=**, User=**, Command="undo css enable")

After this, both switches ran as standalone frames.
