A BRIEF OVERVIEW OF SYSTEMS RELIABILITY


By Haya Altaleb, Óbuda University, Doctoral School on Safety and Security Sciences, H-1081 Budapest, Népszínház street 8, haya.altaleb@phd.uni-obuda.hu

and Rajnai Zoltán, Óbuda University, Doctoral School on Safety and Security Sciences, H-1081 Budapest, Népszínház street 8, rajnai.zoltan@bgk.uni-obuda.hu

Abstract
The concept of reliability and safety emerged later than other engineering branches. This paper gives a brief review of systems reliability: its origin, significance, and impact on industries. Reliability gives a quantitative measure of performance, expressed as the failure rate (or failure frequency) or the time between failures. Studying the reasons why system elements fail is important for enhancing reliability. The authors also discuss present and future challenges in reliability. Finally, some recommendations are given for increasing systems reliability.

INTRODUCTION

Reliability is defined according to IEEE standards as the ability of a system or component to perform its required functions under stated conditions for a specified period of time. Reliability can also be expressed in terms of the number of failures over a period. The key elements of the definition are ability, required function, stated conditions, and specified period of time. Ability is expressed quantitatively as a probability. The required function relates to expected performance. Stated conditions typically refer to the environmental conditions of operation. The specified period of time is also referred to as the mission time, which gives the expected duration of the operation.[5]

Classic measures of reliability are the failure rate (or failure frequency), the mean time to failure, and the time between failures. Even though reliability provides a quantitative measure of performance, these values should be interpreted on a relative basis rather than as absolute values. It is also necessary to distinguish between quality and reliability. Quality is conformance to specifications before operation starts, that is, at time t = 0. Reliability can be described as a projection of quality over time: the product continues to meet the customer's expectations over its lifetime.[6]
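The classic measures named above can be illustrated with a short Python sketch. It assumes a constant failure rate (the exponential model), which the text does not prescribe, and the function names and field data are hypothetical.

```python
import math


def failure_rate(num_failures: int, total_operating_hours: float) -> float:
    """Observed failure rate: failures per operating hour."""
    return num_failures / total_operating_hours


def mean_time_between_failures(num_failures: int, total_operating_hours: float) -> float:
    """Observed mean time between failures, in hours."""
    return total_operating_hours / num_failures


def reliability_constant_rate(mission_time: float, lam: float) -> float:
    """R(t) = exp(-lambda * t). ASSUMPTION: the failure rate is constant."""
    return math.exp(-lam * mission_time)


if __name__ == "__main__":
    # Hypothetical field data: 5 failures over 10,000 operating hours.
    lam = failure_rate(5, 10_000.0)
    mtbf = mean_time_between_failures(5, 10_000.0)
    print(f"failure rate = {lam} per hour, MTBF = {mtbf} hours")
    # Probability of surviving a 100-hour mission under the assumed model.
    print(f"R(100 h) = {reliability_constant_rate(100.0, lam):.4f}")
```

Under this assumed model, the probability of surviving a mission as long as the MTBF is only about 37%, which is one reason the measures are better compared on a relative basis than read as absolute guarantees.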

Systems reliability can be increased by using redundancy, without changing the individual units that form the system. One of the most widely used forms of redundancy is the cold standby system, which finds applications in various industries.[11] In their studies of redundant systems, Liebowitz (1966) and Minc et al. (1968) assumed that a unit enters repair immediately after it fails. A two-unit priority standby redundant system with a repairable non-priority unit was addressed by Nakagawa and Osaki (1975), who considered its stochastic behaviour, the distribution of the time to system failure, and the expected number of system failures over a finite interval.[11]
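The effect of redundancy can be sketched numerically. The example below is a simplified illustration, not the models of the cited studies: it assumes independent, non-repairable units, and for the cold standby case it also assumes identical units with a constant failure rate and perfect switching. All names and numbers are hypothetical.

```python
import math


def parallel_reliability(unit_reliabilities):
    """Active redundancy: the system fails only if every unit fails.

    ASSUMPTION: units fail independently of each other.
    """
    prob_all_fail = 1.0
    for r in unit_reliabilities:
        prob_all_fail *= (1.0 - r)
    return 1.0 - prob_all_fail


def cold_standby_reliability(t, lam, n_units=2):
    """Cold standby with n identical units, one operating at a time.

    ASSUMPTIONS: constant failure rate lam, perfect switching, no repair,
    standby units cannot fail while idle. The system survives to time t
    if fewer than n_units failures have occurred (a Poisson count).
    """
    x = lam * t
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n_units))


if __name__ == "__main__":
    lam, t = 0.001, 1000.0          # hypothetical failure rate and mission time
    single = math.exp(-lam * t)     # one unit on its own
    print(f"single unit:  {single:.4f}")
    print(f"two parallel: {parallel_reliability([single, single]):.4f}")
    print(f"cold standby: {cold_standby_reliability(t, lam):.4f}")
```

Under these assumptions the cold standby arrangement outperforms active parallel redundancy for the same two units, because the spare does not age while it waits.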

Figure 1: Systems reliability concept.

RELIABILITY THEORY HISTORY

The concept of reliability and safety emerged later than other engineering branches. At Bell Labs in the 1920s, Dr W.A. Shewhart inspired the rise of statistical quality control, while W. Weibull conceived the Weibull distribution to represent the fatigue of materials. The concept that ‘the axiom that a chain is no stronger than its weakest link is one with essential mathematical implications’ was introduced by Pierce in 1926.[13]

The first predictive reliability models were developed in Germany while Wernher von Braun, one of the most well-known rocket scientists, was working on the V1 missile.[3] Afterwards, von Braun introduced the concept of redundancy to improve the reliability of systems. During World War II, over 50% of the defence equipment was found to be out of order because of electronic system failures, specifically vacuum tube failures. The unreliability of the vacuum tube acted as a catalyst for the rise of reliability engineering. In the 1950s, reliability was born as a branch of engineering in the USA. Two years later, the Advisory Group on Reliability of Electronic Equipment (AGREE) was created by the Department of Defense (DOD) and the American electronics industry.[1]

AGREE recommended modularity in design, reliability growth, and demonstration tests to enhance reliability, and also gave a classical definition of reliability. This study reached the aerospace industry and triggered several applications in the electronics industry. The first conference on ‘quality control and reliability’ was held in this period, and the first journal in the area, ‘IEEE Transactions on Reliability’, was published by the Institute of Electrical and Electronics Engineers.[1]

Later, in 1961, the concept of ‘Fault Tree Analysis (FTA)’ was introduced by H.A. Watson at Bell Telephone Laboratories to evaluate the control system of the Minuteman I Intercontinental Ballistic Missile (ICBM) launching system. FTA is still considered one of the pillars of safety and risk assessment today and is widely used in the nuclear and aerospace industries. In the early 1960s, the aerospace industry introduced the failure mode and effects analysis (FMEA) method, which later became popular in the automotive industry. Following the Apollo 1 disaster in 1967, the aerospace industry started to use a systematic approach to evaluating risk called ‘Probabilistic Risk Assessment (PRA)’. The nuclear industry adopted PRA concepts in the 1970s, and one more branch of reliability engineering emerged: software reliability, which is concerned with software development, testing, and improvement.[8]

BASIC RELIABILITY TERMS

Reliability Concepts
The Weibull distribution is named after the Swedish engineer Waloddi Weibull, who is famous for his pioneering work on reliability and life analysis. It is a popular tool for modelling lifetimes. The 3-parameter Weibull cumulative distribution function (CDF) is given by:

$$F(t) = 1 - e^{-\left(\frac{t-\gamma}{\eta}\right)^{\beta}}$$

This is also referred to as the unreliability and designated Q(t) by some authors. Recalling that the reliability function of a distribution is simply one minus the CDF, the reliability function for the 3-parameter Weibull distribution is then given by:

$$R(t) = e^{-\left(\frac{t-\gamma}{\eta}\right)^{\beta}}$$

Depending on the values of its parameters, the Weibull distribution can be used to model a variety of life behaviours. The values of the shape parameter, β, and the scale parameter, η, affect such distribution characteristics as the shape of the curve, the reliability, and the failure rate; the third parameter, γ, is the location parameter.[10]
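The influence of the shape parameter can be checked with a small Python sketch of the 3-parameter Weibull functions. The reliability and unreliability follow the equations above; the failure-rate expression is the standard Weibull hazard function, which the text mentions but does not write out. Function names and parameter values are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta, gamma=0.0):
    """R(t) = exp(-((t - gamma) / eta) ** beta); equals 1 up to t = gamma."""
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / eta) ** beta))


def weibull_unreliability(t, beta, eta, gamma=0.0):
    """F(t) = Q(t) = 1 - R(t)."""
    return 1.0 - weibull_reliability(t, beta, eta, gamma)


def weibull_failure_rate(t, beta, eta, gamma=0.0):
    """h(t) = (beta / eta) * ((t - gamma) / eta) ** (beta - 1), for t > gamma."""
    if t <= gamma:
        raise ValueError("failure rate is evaluated only for t > gamma")
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1.0)


if __name__ == "__main__":
    eta = 1000.0  # hypothetical scale parameter, in hours
    for beta in (0.5, 1.0, 3.0):
        rates = [weibull_failure_rate(t, beta, eta) for t in (100.0, 500.0, 900.0)]
        print(f"beta={beta}: R(500)={weibull_reliability(500.0, beta, eta):.4f}, "
              f"failure rates={[f'{h:.2e}' for h in rates]}")
```

With these functions the failure rate falls over time for β < 1, stays constant for β = 1, and rises for β > 1.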

Initially, failures are due to problems in workmanship or poor quality control. Then most systems reach a constant failure rate, where failures are caused by the environment and chance events and can be reduced by design and redundancy. Finally, systems wear out: failures are caused by fatigue, corrosion, and ageing, and can be reduced by derating, preventive maintenance (PM), parts replacement, and design technology.

Typical "quality over time" behaviour follows a bathtub curve of failures, as illustrated in Figure 2.

Figure 2: Bathtub Curve. Where β<1 means infant mortality, β=1 means useful life, β>1 means wear-out.

 

Basic reliability terms
Basic Reliability terms include:

 

THE IMPORTANCE OF RELIABILITY AND SAFETY

The consequences of failures range from negligible inconvenience and cost to personal injury, environmental disasters, major economic loss, and deaths. The Fukushima-Daiichi nuclear disaster, the Chernobyl accident, the Deepwater Horizon oil spill, the Bhopal gas tragedy, and the space shuttle Columbia disaster are all considered major accidents.[2] The main causes of failure include bad engineering design, human errors, faulty manufacturing, improper use, inadequate testing, poor maintenance, and lack of protection against excessive stress. Manufacturers, designers, and end users strive to reduce the occurrence and recurrence of failures. To decrease failures in engineering systems, it is important to know ‘why’ and ‘how’ failures occur, as well as how often they may occur. Safety deals with the impact after a failure, whereas reliability deals with the failure itself.[7]

WHY DO SYSTEMS ELEMENTS FAIL?

There are numerous reasons why systems fail. Knowing the potential causes of failure, as far as possible, is fundamental to preventing them. It is rarely possible to anticipate all of the causes, so the uncertainty involved must also be taken into consideration. Throughout the entire life cycle, from design through manufacturing to service, the reliability engineering effort should address all anticipated and unanticipated causes of failure to ensure that their occurrence is prevented or reduced.

System design is one of the most important reasons for system element failures. A design might be fragile or inherently incapable, consume excessive power, suffer resonance at the wrong frequency, and so on. The list of possible reasons is endless, and every design problem presents the potential for errors and omissions. The more complex the design, the higher this potential.[14]

Overstress also leads to system failures: when the applied stress exceeds the strength, failure will occur. For example, an electronic component will fail if the applied electrical stress (voltage, current) exceeds its capability, and a mechanical strut will buckle if the applied compression stress exceeds its buckling strength. Failures can also be caused by time-dependent mechanisms. Battery run-down, creep caused by simultaneous high temperature and tensile stress (as in turbine disks and fine solder joints), and progressive drift of electronic part parameter values are examples of such mechanisms. Finally, failures have been caused by errors such as incorrect requirements or software coding, faulty assembly or testing, insufficient or incorrect maintenance, and incorrect use.[6]
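The overstress condition described above, failure when the applied stress exceeds the strength, is often quantified with a stress-strength interference calculation. The sketch below assumes that stress and strength are independent and normally distributed, which is a modelling choice not made in the text; the function names and numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stress_strength_reliability(mean_strength, sd_strength, mean_stress, sd_stress):
    """Probability that strength exceeds stress.

    ASSUMPTION: stress and strength are independent normal random variables,
    so their difference is normal with the combined standard deviation.
    """
    safety_margin = (mean_strength - mean_stress) / math.hypot(sd_strength, sd_stress)
    return normal_cdf(safety_margin)


if __name__ == "__main__":
    # Hypothetical strut: strength 500 +/- 30 MPa, applied stress 400 +/- 40 MPa.
    r = stress_strength_reliability(500.0, 30.0, 400.0, 40.0)
    print(f"reliability = {r:.4f}, probability of overstress failure = {1.0 - r:.4f}")
```

Reducing the variability of either quantity raises the safety margin in the same way as increasing the mean strength, which is one way to read the effect of better quality control and derating.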

PRESENT CHALLENGES AND FUTURE NEEDS OF RELIABILITY AND SAFETY

Most current studies on reliability and risk focus on assessing the level of safety and comparing it with explicit or implicit standards. Moreover, reliability studies are increasingly used in the maintenance and operation of engineering systems.

Reliability and safety assessments are valuable for managing risk and supporting decision making for the safe, economical, and effective design and operation of complex engineering systems such as chemical plants, nuclear power plants, aeronautical systems, and defence equipment. Typical applications include determining critical parts for safety and reliability management, design evaluations for comparison with standards, evaluation of inspection and maintenance intervals, and residual life estimation. Despite these many potential applications, reliability and safety studies have some limitations. Their accuracy is greatly influenced by uncertainties in data and models, unjustified assumptions, and incompleteness in the analysis. Representing the complex behaviour of systems with mathematical models can involve simplifying assumptions and idealizations of rather complex processes and phenomena. These simplifications and idealizations lead to inaccurate reliability and risk estimates, the impact of which must be appropriately addressed if the assessment is to serve as a tool in the decision-making process.[9][4]

There is also a need for a cultural breakthrough to convince plant and system managers of the benefits of resource-intensive reliability and risk studies. This can be achieved by standardizing approaches and offering tools that guide practitioners. Finally, designing and applying technically feasible solutions to industrial-scale problems will reduce the distance between theory and reality.[12]

CONCLUSION

The authors have given a summarized overview of systems reliability: its origin, significance, and impact on industries. The paper has also addressed some critical questions related to the main concepts of the topic, such as why system elements fail. On this basis, the authors offer the following recommendations for increasing system reliability:

  1. Train your employees well to avoid human errors.
  2. Use fewer components during the design phase, for example by simplifying the system or by using more complex (possibly custom-designed) integrated circuits.
  3. Use better components, with better quality.
  4. Use enough questions to assess competence.
  5. Have a stable environment.
  6. Measure reliability and use preventive and corrective maintenance.
  7. Improve software reliability.

REFERENCES

[1] “Chapter 2 Literature Review 2.1 Origin of ReliabilityTheory.”

[2] Chernov, Dmitry, and Didier Sornette. 2016. “Examples of Risk Information Concealment Practice.” In Man-Made Catastrophes and Risk Information Concealment, 9–245. Springer International Publishing. https://doi.org/10.1007/978-3-319-24301-6_2

[3] “Germany Conducts First Successful V-2 Rocket Test – HISTORY.” n.d. Accessed May 19, 2020. https://www.history.com/this-day-in-history/germany-conductsfirst-successful-v-2-rocket-test

[4] “IAEA (1992) Procedure for Conducting Probabilistic Safety Assessment of Nuclear Power Plants (Level 1). International Atomic Energy Agency, Vienna, Safety Series No. 50-P-4.” 1992.

[5] IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, 1990. New York: Institute of Electrical and Electronics Engineers.

[6] “Introduction to Reliability Engineering – Reliabilityweb: A Culture of Reliability.” n.d. Accessed May 10, 2020. https://reliabilityweb.com/articles/entry/introduction_to_reliability_engineering

[7] Morag, Ido, Peter Chemweno, Liliane Pintelon, and Mohammad Sheikhalishahi. 2018. “Identifying the Causes of Human Error in Maintenance Work in Developing Countries.” International Journal of Industrial Ergonomics 68 (November): 222–30. https://doi.org/10.1016/j.ergon.2018.08.014

[8] Moranda, PB. 1975. “Prediction of Software Reliability during Debugging.” In The Annual Reliability Maintenance Symposium, 327–32.

[9] “NASA (2002) Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners. Version 1.1, NASA Report.” n.d.

[10] Patrick Breheny. 2018. “The Weibull Distribution – ReliaWiki.” 2018. http://reliawiki.org/index.php/The_Weibull_Distribution

[11] Ram, Mangey, and Monika Manglik. 2016. “Reliability Measures Analysis of an Industrial System under Standby Modes and Catastrophic Failure.” International Journal of Operations Research and Information Systems 7(3): 36–56. https://doi.org/10.4018/ijoris.2016070103

[12] Ushakov, Igor. 2000. “Reliability: Past, Present, Future.” In Recent Advances in Reliability Theory, 3–21. Birkhäuser Boston. https://doi.org/10.1007/978-1-4612-1384-0_1

[13] Villemeur A. 1992. “Methods and Techniques.” Reliability, Maintainability, and Safety Assessment 1 (Wiley, New York).

[14] “Why Systems Fail – BenMeadowcroft.Com.” n.d. Accessed May 13, 2020. http://www.benmeadowcroft.com/reports/systemfailure/

 

ABOUT THE AUTHORS

Ms. Haya Altaleb is a doctoral researcher at the Doctoral School on Safety and Security Sciences at Óbuda University and a lecturer in the Bánki Donát Faculty of Mechanical and Safety Engineering at Óbuda University (Budapest), where she completed her master's degree in engineering with first rank honors (summa cum laude) and is now pursuing her doctoral degree in safety and security sciences. She has worked in the field of renewable energy as a photovoltaic design engineer to promote clean energy in Jordan, and her research interests include risk assessment, safety and security science, autonomous vehicle systems, disaster geomatics, and exploring the cultural and innovative boundaries between data and society. Ms. Altaleb is a member of the Association of Energy Engineers and a member of the CEDS Advisory Board. She models excellence as an early career woman in science by expanding her professional skill set and area of expertise above and beyond her graduate school program.

Prof. Dr. Zoltan Rajnai is Dean of the Bánki Donát Faculty of Mechanical and Safety Engineering. He holds a PhD degree in military science from Miklós Zrínyi University of National Defense. In addition to his distinguished educational record, the Senate of Politehnica University Timișoara conferred on him the title of Honorary Professor on 12 February 2020. Prof. Dr. Zoltan Rajnai is a lecturer in the Doctoral School on Safety and Security Sciences at Óbuda University (Budapest). His research interests include the protection of critical infrastructure, information security, and the security of communication networks in qualified periods. He has been a member of Association (HEA) and the AFCE (Armed Forces Communications and Electronics Assoc) International Military Communications and Electronics Association. He is also a member of the Tivadar Puskás Fraternal Association News and of the HTEZrinski Group, which he headed from 2004 to 2006. Since 2001 he has been a member of the public body of the Hungarian Academy of Sciences. He is a member of the Board of Trustees of the János Bolyai Honvéd Foundation and a founder of the Tivadar Puskás Technical College. Since 1999 he has continuously participated in international research projects related to his field.




Publication date: May 2021