I have an iQstor 2880 SAN device. Recently I was logged into one of the controllers and noticed some strange error messages showing up on the console:
10:49:11, Wednesday, 02/10/2010 : EXCEPTION: Dram Error detected: count=19 cause=4000 esr_c_0004=20000 esr_c_000C=0 esr_c_Lcause=1 esr_c_Lerr=7cc2a2b6
Dram Error being handled: count=19 cause=4000 esr_c_0004=20000 esr_c_000C=0 esr_c_Lcause=1 esr_c_Lerr=7cc2a2b6
Dram Error recovered: count=19 cause=4000 esr_c_0004=20000 esr_c_000C=0 esr_c_Lcause=1 esr_c_Lerr=7cc2a2b6
If you see errors like this, it indicates that the controller you are logged into has bad RAM. The important thing to note is that these memory errors show up only on the console of the controller with the bad RAM. The messages are not written to syslog, and they are not copied to the consoles of the other controllers. The best way to verify which controller has the bad RAM is to open a telnet session to each controller and leave both sessions up for a while. Once the error appears on one of the consoles, you have confirmation of which controller needs its memory swapped out.
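If your telnet client can log each session to a file, you don't have to sit and watch both consoles. Here is a rough sketch in Python that scans captured console text for the "Dram Error" messages shown above and reports which controller produced them. The controller labels and the function names are my own inventions for illustration, not anything iQstor ships, and the pattern is based only on the one sample message in this post, so other firmware versions may word it differently.

```python
import re

# Matches the three message stages seen in the sample console output.
# Assumption: the wording is taken from a single captured message.
DRAM_ERROR = re.compile(r"Dram Error (detected|being handled|recovered):")


def dram_error_summary(console_text):
    """Count Dram Error messages in captured console text, grouped by stage."""
    summary = {}
    for stage in DRAM_ERROR.findall(console_text):
        summary[stage] = summary.get(stage, 0) + 1
    return summary


def controllers_with_errors(captures):
    """Return the labels of controllers whose captured text has Dram Errors.

    `captures` maps a hypothetical controller label (for example
    "controller_a") to the text logged from that controller's console.
    """
    return [name for name, text in captures.items()
            if dram_error_summary(text)]
```

To use it, read each saved session log into a string, build a dictionary such as {"controller_a": text_a, "controller_b": text_b}, and pass it to controllers_with_errors. Any controller it returns is a candidate for a memory swap.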
Now, if only iQstor would get these errors to trigger an alert, copy them to both consoles (with detail of which controller has the bad RAM), and write them to syslog.