
System reliability, availability and robustness are often not well understood by system architects, engineers and developers. They often don't understand what drives customers' availability expectations, how to frame verifiable availability/robustness requirements, how to manage and budget availability/robustness, how to methodically architect and design systems that meet robustness requirements, and so on. The book takes a very pragmatic approach of framing reliability and robustness as a functional aspect of a system so that architects, designers, developers and testers can address it as a concrete, functional attribute of a system, rather than an abstract, non-functional notion.

CONTENTS: Preface. Acknowledgements.

Section 1 Reliability Basics.
1 Reliability and Availability Concepts. 1.1 Reliability and Availability. 1.2 Faults, Errors and Failures. 1.3 Error Severity. 1.4 Failure Recovery. 1.5 Highly Available Systems. 1.6 Quantifying Availability. 1.7 Outage Attributability. 1.8 Hardware Reliability. 1.9 Software Reliability. 1.10 Problems. 1.11 For Further Study.
2 System Basics. 2.1 Hardware and Software. 2.2 External Entities. 2.3 System Management. 2.4 System Outages. 2.5 Service Quality. 2.6 Total Cost of Ownership. 2.7 Problems.
3 What Can Go Wrong. 3.1 Failures in the Real World. 3.2 Eight-Ingredient Framework. 3.3 Mapping Ingredients to Error Categories. 3.4 Applying Error Categories. 3.5 Error Category: Field Replaceable Unit (FRU) Hardware. 3.6 Error Category: Programming Errors. 3.7 Error Category: Data Error. 3.8 Error Category: Redundancy. 3.9 Error Category: System Power. 3.10 Error Category: Network. 3.11 Error Category: Application Protocol. 3.12 Error Category: Procedures. 3.13 Summary. 3.14 Problems. 3.15 For Further Study.

Section 2 Reliability Concepts.
4 Failure Containment and Redundancy. 4.1 Units of Design. 4.2 Failure Recovery Groups. 4.3 Redundancy. 4.4 Summary. 4.5 Problems. 4.6 For Further Study.
5 Robust Design Principles. 5.1 Robust Design Principles. 5.2 Robust Protocols. 5.3 Robust Concurrency Controls. 5.4 Overload Control. 5.5 Process, Resource and Throughput Monitoring. 5.6 Data Auditing. 5.7 Fault Correlation. 5.8 Failed Error Detection, Isolation or Recovery. 5.9 Geographic Redundancy. 5.10 Security, Availability and System Robustness. 5.11 Procedural Considerations. 5.12 Problems. 5.13 For Further Study.
6 Error Detection. 6.1 Detecting Field Replaceable Unit (FRU) Hardware Faults. 6.2 Detecting Programming and Data Faults. 6.3 Detecting Redundancy Failures. 6.4 Detecting Power Failures. 6.5 Detecting Networking Failures. 6.6 Detecting Application Protocol Failures. 6.7 Detecting Procedural Failures. 6.8 Problems. 6.9 For Further Study.
7 Analyzing and Modeling Reliability and Robustness. 7.1 Reliability Block Diagrams. 7.2 Qualitative Model of Redundancy. 7.3 Failure Mode and Effects Analysis. 7.4 Availability Modeling. 7.5 Planned Downtime. 7.6 Problems. 7.7 For Further Study.

Section 3 Design for Reliability.
8 Reliability Requirements. 8.1 Background. 8.2 Defining Service Outages. 8.3 Service Availability Requirements. 8.4 Detailed Service Availability Requirements. 8.5 Service Reliability Requirements. 8.6 Triangulating Reliability Requirements. 8.7 Problems.
9 Reliability Analysis. 9.1 Step 1: Enumerate Recoverable Modules. 9.2 Step 2: Construct Reliability Block Diagrams. 9.3 Step 3: Characterize Impact of Recovery. 9.4 Step 4: Characterize Impact of Procedures. 9.5 Step 5: Audit Adequacy of Automatic Failure Detection and Recovery. 9.6 Step 6: Consider Failures of Robustness Mechanisms. 9.7 Step 7: Prioritizing Gaps. 9.8 Reliability of Sourced Modules and Components. 9.9 Problems.
10 Reliability Budgeting and Modeling. 10.1 Downtime Categories. 10.2 Service Downtime Budget. 10.3 Availability Modeling. 10.4 Update Downtime Budget. 10.5 Robustness Latency Budgets. 10.6 Problems.
11 Robustness & Stability Testing. 11.1 Robustness Testing. 11.2 Context of Robustness Testing. 11.3 Factoring Robustness Testing. 11.4 Robustness Testing in the Development Process. 11.5 Robustness Testing Techniques. 11.6 Selecting Robustness Test Cases. 11.7 Analyzing Robustness Test Results. 11.8 Stability Testing. 11.9 Release Criteria. 11.10 Problems.
12 Closing the Loop. 12.1 Analyzing Field Outage Events. 12.2 Reliability Roadmapping. 12.3 Problems.
13 Design for Reliability Case Study. 13.1 System Context. 13.2 System Reliability Requirements. 13.3 Reliability Analysis. 13.4 Downtime Budgeting. 13.5 Availability Modeling. 13.6 Reliability Roadmap. 13.7 Robustness Testing. 13.8 Stability Testing. 13.9 Reliability Review. 13.10 Reliability Report. 13.11 Release Criteria. 13.12 Field Data Analysis.
14 Conclusion. 14.1 Overview of Design for Reliability. 14.2 Concluding Remarks. 14.3 Problems.
15 Appendix Assessing Design for Reliability Diligence. 15.1 Assessment Methodology. 15.2 Reliability Requirements. 15.3 Reliability Analysis. 15.4 Reliability Modeling and Budgeting. 15.5 Robustness Testing. 15.6 Stability Testing. 15.7 Release Criteria. 15.8 Field Availability. 15.9 Reliability Roadmap. 15.10 Hardware Reliability.

Abbreviations. References. Photo Credits. Index. About the Author.
- ISBN: 978-0-470-60465-6
- Publisher: John Wiley & Sons
- Binding: Hardcover
- Pages: 348
- Publication date: 12/11/2010
- No. of volumes: 1
- Language: English