This is Orcmid's Typepad Profile.
Orcmid
Recent Activity
In the mid-70s, Dick Morse (Mrs. Morse's son Helmut, as he was dubbed by Hugh Rundell) and I talked through the idea of having software fire drills built into systems. The idea, at the time, was that there were points in protocols where errors could be injected to ensure that the recovery procedures worked, and also that operators saw such errors often enough (but always recoverably) to know what it was like (and to avoid the Maytag repairman syndrome).
I actually designed a real-time subsystem for operating multiple terminals off of a Xerox 530 minicomputer in which there were fire-drill points. It was a valuable design exercise, but I never needed to pull a fire drill. It happened that the heuristics for estimating the size of data blocks needed to satisfy a terminal request or response guessed wrong often enough that the recovery code for that case was exercised regularly, it was visible (to those who knew what was happening), and it recovered properly. Meanwhile, dropped responses from the controller, a situation that could have been injected, happened often enough on their own that we never had to inject them.
We did expose a problem in the hardware architecture, however. The terminal controller was on the other side of a cheapo adapter that provided no way for the minicomputer to force a reset of the controller. So if the controller (or the adapter) went unresponsive, all we knew was that all of our requests were timing out, and all we could do was slowly shut down all of the sessions as if the terminal operators had simply walked away without logging off.
My interest in this kind of fire drill was inspired by an earlier experience in the late 60s, when Sperry Univac was building a System/360 semi-clone. (It was not plug compatible; it could use some of the same devices but not the operating system.)
In the test center, where early production machines were being used to develop the operating system, including all of the device drivers, IBM disk drives were used until we took delivery of our own. Everything went along great until the newly manufactured competitive drives were installed. These drives were not as reliable, and the OS started crashing because the error-recovery paths in the drivers had never been exercised, and they failed.
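The fire-drill idea described above can be illustrated with a small sketch. This is not the Xerox 530 design, only a minimal Python illustration under stated assumptions: the names `FireDrill`, `DroppedResponse`, `controller_request`, and `send_with_recovery` are all hypothetical, the injected fault is a dropped controller response, and the recovery procedure is assumed to be a simple bounded retry.

```python
import random


class DroppedResponse(Exception):
    """Hypothetical fault: the controller never answered the request."""


class FireDrill:
    """Hypothetical fire-drill point that injects a recoverable fault
    with probability `rate` each time it is consulted."""

    def __init__(self, rate, rng=None):
        self.rate = rate
        self.rng = rng or random.Random()
        self.injected = 0  # how many drills have been pulled so far

    def should_fail(self):
        if self.rng.random() < self.rate:
            self.injected += 1
            return True
        return False


def controller_request(payload, drill):
    """Stand-in for one request/response exchange with the controller."""
    # Fire-drill point: pretend the response was dropped.
    if drill.should_fail():
        raise DroppedResponse(payload)
    return "ack:" + payload


def send_with_recovery(payload, drill, max_attempts=5):
    """Send a request, retrying on a dropped response.

    Returns (reply, attempts). If every attempt fails, raises
    RuntimeError, the point at which sessions would be shut down.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return controller_request(payload, drill), attempt
        except DroppedResponse:
            continue  # recovery path, exercised by the drill
    raise RuntimeError("controller unresponsive")


if __name__ == "__main__":
    drill = FireDrill(rate=0.2)
    reply, attempts = send_with_recovery("hello", drill)
    print(reply, "after", attempts, "attempt(s);", drill.injected, "drill(s)")
```

Keeping `rate` low is what keeps a drill recoverable: the retry budget in `max_attempts` should comfortably outlast a run of injected faults, so operators see the recovery without losing a session.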
Commented Apr 25, 2011 on Working with the Chaos Monkey at Coding Horror