Hardware locks: Comments and Possible Solutions

Do you know where your system board is tonight?

Updated February 14, 1998 -- Have a Heart!


February 11, 1998


Overheating

Memory

BIOS

Resident Applications

Video Cards - Millennium

Knowledge Acquisition


Email: lae2@psu.edu
http://www.personal.psu.edu/lae2 

Overheating:  About a year ago I informed SuperMicro that a loose heatsink on the voltage regulator (VR) was causing locking in a P5STD system.  The system randomly locked after about one hour of use.  The loose heatsink disrupted thermal conductance from the VR to the heatsink.  Since I tightened the heatsink hardware, this system has run 24/7 for a year without a lock.  SuperMicro changed their manufacturing protocol for mounting the VR heatsink.  Nevertheless, I inspect and tweak the VR hardware on all newly acquired boards.

At about this same time, I built a CMS system that, although stable, gave off a burning odor.  It turned out that the VR was out of alignment with the heatsink.  The hardware was tight, but the mounting surfaces were not perfectly congruent.  Reshaping the VR mounting leads oriented the VR perfectly orthogonal to the system board.  This corrected the divergence of mating surfaces between the VR and heatsink.  The burning odor disappeared.

The main computer in my office was built using the first HX83 released to the public.  Today, this computer is used heavily for image analysis and typical office suite applications.  Plus, it is a 24/7 server for the FX83/HX83 Forums and other interactive Web pages.  Currently, the CPU is an Intel 233MMX clocked at 3.5X75 (266).  The HX83 runs stable and fast when paired with compatible parts.

More out of concern than need, I recently purchased a used HX83.  The original owner had trouble with lockups.  I made the purchase in the wake of a hinted trip to the desert with a .44 Magnum and this poor HX83.  I narrowly intervened in time to avert yet another newsgroup copycat killing.  I think it is high time that special interest groups petition lawmakers to put a stop to the carnage directed toward SuperMicro system boards.  We, as ordinary citizens, can do our part by rescuing the misunderstood.  But first, we must become sensitive to the danger signals indicating a board in trouble.  Perusal of Usenet is perhaps the most efficient way to spot a board at risk.  After 20 posts betraying extreme frustration, an owner might refer to the desert, target practice, or even liquid nitrogen.  There is little time to lose.  The concerned citizen must spring into action.

Upon receiving the "misbehaving" board, I immediately gave it a warm bath and a cup of hot soup.  Next, I inspected the VR hardware.  The mounting hardware was snug, but not nearly as tight as I prefer.  More importantly, the mating surface of the VR was not perfectly aligned to the heatsink.  There was a visible gap between the top edge of the VR and the surface of the heatsink.  I realigned the mating surfaces and reinstalled the heatsink using thermal conductive paste.  The beginnings of a new life are underway for this board (2/13/98).  Hopefully, there will be a happy ending as this wayward orphan integrates into a new home.  I only hope that the emotional scars do not run too deep (and that I am eligible for a tax deduction).

Recently, I built two "budget" systems using the STE.  Classic Pentiums are available for less than $100.00.  Used STEs are in the $60.00 range.  Even less, if you can rescue one from a potential STE assassin.  You can build a very capable system with minimal expenditure.  There are about a dozen SM Socket 7 boards at my workplace.  Several of these systems use the STE system board.  These systems were built (or rebuilt) by me.  As a matter of routine, I inspect the VR/heatsink hardware.  I verify that the nut is tight and that the mating surfaces are in alignment.  For good measure, the heatsink is reinstalled using thermal paste.  It is important to note that one of the VRs is electrically isolated from the heatsink.  A gasket separates this VR from the heatsink.  There should be about 500 ohms resistance between the VR and heatsink.  The other VR is in direct continuity with the heatsink.

I use a time-honored technique for assessing CPU temperature -- touching the heatsink.  The entire heatsink should feel evenly warm, but not burning hot.  Be alarmed if either the CPU or the VR heatsink is cool.  This might indicate poor thermal conduction between the active device and the heatsink.  Use a thermal conductive paste, tape, or epoxy between the CPU and heatsink.  The Pentium fan/heatsink manufactured by PC-Power and Cooling is excellent.  To ensure the best possible ventilation, the front case fan should be hardwired.  A quality case is a must.  The best AT case, for the money, is the SuperMicro midtower 300W.  This case has three large fans -- one at the front, another at the rear, and the power supply fan.  Unfortunately, the extra cooling required by highly configured systems causes audible turbulence.  With adequate cooling, it is ordinary for the 200MMX to run at 3X75 and for the 233MMX to run at 3.5X75 (266).  However, the 233MMX has more stringent memory requirements than the 200MMX.  The memory has to be compatible and of high quality.  Plus, the BIOS must be set up appropriately.



Memory:  For whatever reason, SuperMicro system boards (perhaps all boards) vary in the degree to which they tolerate different brands of memory.  I highly recommend ECC memory.  Unfortunately, you cannot always rely on price to predict performance.  Memory that will not POST in SM PII boards might run great in SM Socket 7 boards.  Memory that fails to POST on Slot 1 might work fine in Socket 7.  Memory that fails in Slot 1 and Socket 7 might work in Socket 8.  I used to hassle with returns.  Now I shelve "problem" memory and use it in a different configuration.  Once you find memory compatible with running wide open, consider purchasing more of the same batch.  However, this same memory might fail in a different configuration.  I am referring to highly optimized and often overclocked systems.  With compatible memory, favorable BIOS settings, solid software, and effective cooling, the STE/HX83 forms the heart of a very stable system.  In my estimation, the majority of hardware instabilities are memory related.  Hopefully, with the newer SDRAM, memory compatibility issues are minimal relative to 72 pin DRAM.  A recent ad campaign for Kingston suggests that compatibility issues with SDRAM might be greater than ever.  The faster speed and synchronization open the door for the smallest of timing errors to cause instability.

Stable operation in OSR2 running at 83MHz is difficult to achieve.  The system I routinely configure for 83MHz operation is the HX83/P166.  This configuration is solid.  Stability at higher clock ratios is difficult to attain.  Scott Evans is known for overclocking the 233MMX chip to 3.5X83 (292).  Scott uses Crucial Micron Memory (50nS).
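Core clock is simply the multiplier times the bus clock, so the configurations mentioned on this page are easy to tabulate.  The following is a minimal sketch, assuming nominal bus clocks of 66.67, 75, and 83.33 MHz for the settings labeled 66, 75, and 83 (these fractional values and the names `BUS_MHZ` and `core_mhz` are assumptions for illustration, not measurements).  The round labels used in the text, such as 266 and 292, are close to these products rather than exact; 3.5X75, for instance, works out to 262.5.

```python
# Nominal bus clocks in MHz; the fractional values are assumed, not measured.
BUS_MHZ = {"66": 200.0 / 3, "75": 75.0, "83": 250.0 / 3}


def core_mhz(multiplier, bus_label):
    """Core clock in MHz = multiplier x nominal bus clock."""
    return multiplier * BUS_MHZ[bus_label]


# Configurations mentioned in the text, written as (multiplier, bus label).
configs = [(3.0, "75"), (3.5, "75"), (3.5, "83"), (4.5, "75"), (3.5, "66")]
for mult, bus in configs:
    print(f"{mult}X{bus} -> {core_mhz(mult, bus):.1f} MHz")
```

The bus clock also sets memory timing, which is one reason an 83MHz setting is harder on memory than 75MHz at the same multiplier.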

The FX83/PII300 is a bit finicky.  It clearly likes either Toshiba or Micron ECC EDO (Megatrends AMM).  This puzzles me.  The ECC EDO should require more overhead.  I would expect it to be more prone to timing errors.  Yet, I have to detune EDO (several different brands tested including AMM) whereas the AMM ECC EDO runs wide open (50nS) at 4.5X75.  Any other memory has to be set up at 70nS.  Some types of EDO will not even POST.  It would be fun to run benchmarks at 4.0X83 or 4.5X83, but my FX83 is not quite stable running OSR2 at 83MHz.  Perhaps the Crucial 50nS ECC EDO would provide just the edge.  But, $250.00 for this experiment is a bit steep.

Note: Recently, I configured a stable and overclocked FX83/PII300 using standard EDO memory (50nS).  In a moment of budgetary insanity, I bid on six 32M EDO SIMMs from an auction.  My bid won at $49.00.  I received two pairs of identically marked SIMMs (128M) and one more pair (64M) of identically marked SIMMs.  This memory was intended for upgrading existing systems.  For grins, I tried this memory in my FX83.  The 64M pair would not POST, but ran fine in an overclocked P6SNE/Pro180 at 3.5X66.  The 128M pairs failed the memory test.  One of the SIMMs was bad regardless of the system in which it was installed.  It was simply a bad SIMM.  This left me with one pair (64M) from the 128M matched set.  To my surprise, the FX83 accepted this pair.  Even more surprising, this memory runs at 50nS in an overclocked (4.5X75) configuration.  Go figure.  Should you buy memory from an auction?  Two of six SIMMs worked in an FX83.  One of six SIMMs was blatantly bad.



BIOS:  Arriving at the most stable BIOS settings without compromising performance can be a long and tedious process.  A major help is adhering to a standard and valid measure of stability.  I rely on the Ziff Davis Benchmarking suite for this.  The ultimate test is whether a particular configuration will run, in demo mode, WinBench, Winstone, and 3D WinBench for 24 hours or more.  This can be difficult to achieve for any computer whether overclocked or running at specification.  Adding memory to an already stable and highly optimized system is apt to involve additional BIOS adjustments.  Recently, I overclocked a Pentium 233MMX to 3.5X75 (266MHz).  I previously achieved this with minimal fanfare in another HX83/233MMX.  This second system was more of a challenge.  Scandisk would not run correctly from a floppy (I never let a new configuration "see" the hard drive until Scandisk is shown to work properly when run from a bootable floppy disk).  There was not a BIOS setup available that resulted in stable operation.  Finally, I removed the 128M of installed EDO and replaced it with Toshiba ECC PEDO.  I use ECC PEDO memory in overclocked FX83/PII300 systems.  The 266MHz configuration worked great after installing the ECC PEDO.  WinBench97, WB98, 3DWB98, and WS98 (and the HX/FX83 Forum) all run without a glitch.  I can't overstress how important favorable memory is for successful overclocking.  Was the original EDO memory bad?  No, chances are that it will work fine in a different configuration.  In fact, this same EDO worked great for months in an HX83/200MMX running at 3.0X75.
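The 24-hour criterion can be expressed as a simple loop: repeat a full benchmark pass until the time is up, and stop at the first failure.  The following is a minimal sketch of that idea only.  `run_pass` is a hypothetical callable standing in for "one complete benchmark pass"; it is not part of the Ziff Davis suite, whose demo mode does its own looping.  The `clock` parameter exists so the loop can be checked with a fake clock.

```python
import time


def soak_test(run_pass, hours=24.0, clock=time.monotonic):
    """Repeat run_pass() until `hours` have elapsed or a pass fails.

    run_pass is a hypothetical callable that performs one benchmark pass
    and returns True on success.  Returns (stable, completed_passes).
    """
    deadline = clock() + hours * 3600.0
    completed = 0
    while clock() < deadline:
        if not run_pass():
            return False, completed
        completed += 1
        # Flush each line so the count survives if the machine locks hard.
        print(f"pass {completed} ok", flush=True)
    return True, completed
```

A hard lock never returns from `run_pass`, so in practice the last pass count printed is the figure to compare between BIOS settings.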

An important, but sometimes overlooked, BIOS setting is "turn around insertion."  It matters even when running at specification.  The optimal setting varies according to BIOS version and memory type.



Resident Applications:  Disable resident software or "initialization" applications.  Except for a possible path statement, the autoexec.bat and config.sys files should be empty.  The "run=" and "load=" entries in WIN.INI should be blank.  Check for resident applications by pressing CTRL-ALT-DEL in OSR2.  Use the "End Task" button to unload all resident software except Explorer and Systray.  Remove all applications from the "startup" folder.  Back up the system registry.  Use REGEDIT.EXE to delete applications set to run during the OSR2 boot.  The Run key is located at HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run.
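Checking the "run=" and "load=" lines by eye is easy on one machine and tedious on a dozen.  These lines live in the [windows] section of WIN.INI.  The following is a minimal sketch that reads the text of such a file and reports anything named on either line; the function name `startup_entries` and the sample program path are made up for illustration.  It assumes entries are separated by spaces, so a path containing spaces would be split incorrectly.

```python
import configparser


def startup_entries(win_ini_text):
    """Return the programs named on the load= and run= lines of WIN.INI text."""
    parser = configparser.ConfigParser(
        interpolation=None,    # leave any % characters in values alone
        strict=False,          # tolerate duplicate sections or keys
        allow_no_value=True,   # tolerate lines that have no "="
    )
    parser.read_string(win_ini_text)
    found = {}
    for key in ("load", "run"):
        value = parser.get("windows", key, fallback="") or ""
        found[key] = value.split()
    return found


# Hypothetical sample: load= is blank, run= names one resident program.
sample = "[windows]\nload=\nrun=C:\\TOOLS\\MONITOR.EXE\n"
for key, programs in startup_entries(sample).items():
    print(key, "->", programs if programs else "blank")
```

Both lists should come back empty on a system prepared for benchmarking.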

Video Cards - Millennium:  I recently completed two STE/166 systems.  Each of these systems is configured with a Matrox Millennium 4M and the Matrox 3.80 drivers.  Both of these systems were stable prior to installing DirectX 5.0.  Both systems started locking on graphic events immediately after installing DirectX 5.0.  This happened even though I did not replace the display driver.  Furthermore, disabling DirectX 5.0 did not solve the locking problem.  DirectX 5.0 is written so that you cannot uninstall it using Control Panel.  You can only disable it.  Funny how MS sometimes violates standards set by MS.  Restoring stability required that I uninstall and then reinstall the Matrox drivers.



Here is yet another locking story involving the STE.  There is a relatively happy ending.    I built the following system about a month ago. 

P5STE  Classic Pentium 166,  64M OKI EDO,  2940AU,  ST32230N,  Mil4M 

Everything appeared to be working fine.  Winbench 98 and Winstone 98 completed all WinMarks without incident.  I tested and used the system for a month before releasing it into laboratory use.  Finishing the configuration involved networking software.  I loaded Communicator 4.04.  Right away the system locked in Communicator.  It still was stable in WB98.  Knowing that turn around insertion is often a tricky BIOS setting, I enabled it.  Presto, Communicator worked, but now WB98 was failing.  Reversing turn around insertion reversed the results.

Right away, I suspected memory.  Although Communicator appeared to be the problem, I knew there would be no fix for that.  I planned to benchmark the Nine R3D in this system.  I installed DirectX 5.0 just to see if I could make things worse (as in the past).  As expected, this really mucked things up.  I have seen this before.

The Nine R3D is an incredibly stable video card.  I have configured these cards in Pentium MMX systems, Pentium Pro systems, and Pentium II systems.  This was to be the first Classic Pentium.  Even though I suspected a memory problem, I tried the 9R3D without swapping memory.  Not so much to my surprise, the system ran flawlessly with the 9R3D.  This is with DirectX 5.0 and the beta 417p drivers.  Communicator is happy and WB98 runs and runs.

Is the Millennium bad?  The 3.80 drivers were installed.  Downgrading to 3.63 and 3.70 did not affect the locks.  I have successfully configured several STEs with Millennium 1 cards.   Most likely, different memory would cure the locks that happen with the Millennium.  Others might argue that these kinds of interactions don't occur.  I don't know the engineering behind it, but I have seen this before. 

Is the OKI memory bad?  Not with the Nine Revolution 3D as the video card.  I paid a premium for this memory.  Chances are it will be fine in a Socket 8 system.  In my experience, Socket 8 is like "Mikey": it will eat anything.



Knowledge Acquisition:  Research compatibility before purchasing parts.  DejaNews is great for researching system boards before you purchase.  Email those who experienced locks and those who didn't.  Try to determine the differences (then don't build an IDE based system :)).  Avoid the memory types, configurations, CPUs, video cards, drivers, BIOS versions, etc. that turn up in reports of locks.  Include the targeted applications in your search.  Many users upgrade the system board using existing peripherals.  Verify compatibility before choosing the system board.
 