
Operational Resilience: What It Is and Why It's Important

One of the biggest challenges organizations face is the intensifying digital interconnectedness of people, processes, organizations, and risks. Because risks are interconnected with other firms and with financial market infrastructure, a single failure can have far-reaching consequences. The ensuing reputational and financial damage could create a domino effect across the entire industry. Organizations need to accept that cyber breaches and failures of systems, processes, and third-party services will happen.

Unexpected outages, natural hazards, catastrophic events, and operational mismanagement will cause disruption that impacts customers, stakeholders, and the broader economy.

Effectively managing risk depends on how well an organization integrates its people, processes, and technology. The risks that emerge at the links between them must be identified and controlled through a strong, multi-layered risk strategy that combines the latest tools and technologies.

With an effective operational resilience program, organizations can not only keep the “known unknowns” in check but also strengthen business continuity.

Key Takeaways 

* Operational resilience refers to the comprehensive approach by which organizations can effectively reflect, respond, course-correct, and move forward when faced with adverse risk events and business disruptions. 

* There is a significant uptick in regulatory activity and focus on operational resilience around the world. Authorities are issuing guidance, laws, and regulations to ensure that organizations can not only prevent adverse events but also recover quickly from them.

* Organizations need not build their operational resilience programs from scratch. They can integrate the principles of operational resilience into their existing risk management program.

What is Operational Resilience?

Operational resilience refers to the ability of an organization and its people to withstand, adapt to, and recover from unexpected disruptions such as cyberattacks, natural disasters, or technical failures.

Why is Operational Resilience Important?

Operational resilience goes beyond business continuity and operational risk management. It aims to minimize the impact on consumers and the wider economy. Major disruptive events such as the COVID-19 pandemic have clearly highlighted the need for organizations to ensure continuity of operations by embedding operational resilience as part of their organizational DNA.

The following reasons highlight why it is important for organizations to build operational resilience.
 

 

1. Risk and Control Identification and Documentation

Operational incidents not only have a significant financial impact but can also disrupt entire markets and systems. According to IBM, the global average cost of a data breach in 2023 was $4.45 million. The systemic nature of such incidents is illustrated by a New York Fed study, which found that if the systems of any one of the five most active US banks were disrupted, there would be significant spillover to other banks, affecting 38% of the network on average. These incidents can also have a long tail, with a lasting impact on shareholder value and on operational risk capital requirements.

By implementing a robust operational resilience program, an organization can ensure a systematic and well-defined process for identifying potential risks and establishing effective controls to proactively mitigate those risks.
 

2. Increased Regulatory Focus

In the last few years, operational resilience has been a key priority for regulatory authorities around the globe. Recent regulatory developments on operational resilience include the EU’s Digital Operational Resilience Act (DORA), the FCA policy statement (PS21/3) and PRA policy statement (PS6/21) in the UK, and an interagency paper published by financial regulatory authorities in the US, among others. 

Today the regulatory focus is shifting as well. Regulators want to see not only how effectively organizations can prevent disruptive events but also how quickly they can recover from them. 

Regulatory authorities expect organizations to understand their vulnerabilities and to invest in protecting themselves, their consumers, and the market, so that they preserve the public interest and maintain continuity in the supply of products, even during operational disruptions.


 

3. Core Program Constructs

Any operational resilience program needs to be aligned with the overall strategy of the organization so that it can drive and support investment decisions and day-to-day operations. To succeed with this approach, businesses require direct, efficient engagement from the board, the front line, and the extended enterprise. The goal is to manage the volatility of the impact generated by problems associated with “business-threatening events.” This calls for a comprehensive risk program that accommodates operational risk management, business continuity management, and third-party risk management.
 

5 Steps to Build an Operational Resilience Framework


Building a comprehensive operational resilience framework requires organizations to first identify critical business services and operations that need to be safeguarded during disruptive events. Next, they need to set impact tolerances and define key metrics. They then need to understand the interrelationships and dependencies of processes and business functions, both internally and externally. Scenario planning and analysis can help further fine-tune the framework by identifying potential points of failure. Lastly, the entire framework, along with key roles, responsibilities, and accountabilities, should be communicated across the enterprise.


1. Define Key Business Services / Critical Economic Functions (CEFs)

As organizations prepare to streamline and improve their operational resilience program, the first step is to identify the relevant key business services, meaning those which, if disrupted, could cause substantial harm to the organization, consumers, and the business environment. The concept of potential harm is core to operational resilience and forms the crux of the program, because all subsequent processes depend on correctly identifying these CEFs.

To effectively do this, organizations will need to:
 

  • Map the organizational hierarchy, business objectives, market expectations, and supervisory objectives, and align them with your organization’s risk appetite. This gives your organization a clear understanding of its resilience and of how each business service aligns with the overall business strategy.
  • Identify the users of each service and effectively engage the front line as their participation is critical to the process.
  • Bring the critical insights together in one view. This gives your organization the information needed to further refine strategic and critical initiatives in line with its risk exposure level. It also provides visibility over the dependent processes, systems, people, and related third parties that could affect business objectives.


2. Set Impact Tolerance and Risk Metrics

Multiple known and unknown factors contribute to critical disruptions that may put the organization at risk. Forecasting, pre-empting, managing, or mitigating these factors is highly important if organizations are to report accurately on their stability.

Organizations need to consider the following when setting impact tolerances and risk metrics:
 

  • Set tolerances with full visibility and prioritize operations and investments. This is important as many organizations are now being asked to optimize their investment dollars and need to make comprehensive decisions that can be validated.
  • Gain visibility over business services and processes. These need to be ranked and approved by the board, taking into account value-based impacts that threaten the firm’s viability, volume-based impacts that cause harm to consumers and market participants, and time-based impacts that cause instability in the financial system.
  • Use a logical, rational approach to setting tolerances that covers all interconnected areas and processes. This enables realistic scenarios for analyzing impact tolerances, gives a quick view of the types of risk that may affect the organization’s industry, and helps predict impacts on the overall stability of the economy. At the same time, the approach should provide an accurate understanding of the scope of organizational impact, correlated with impacts on customers and partnerships.
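The value-, volume-, and time-based impacts described above can be expressed as explicit thresholds per business service. The following Python sketch is a minimal illustration, assuming each important business service has one hypothetical threshold per dimension; the class names, field names, and figures are illustrative and not taken from any regulation or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactTolerance:
    """Hypothetical tolerances for one important business service (illustrative fields)."""
    max_loss: float          # value-based: financial loss the firm can absorb
    max_customers: int       # volume-based: customers who can be affected
    max_outage_hours: float  # time-based: longest tolerable disruption


@dataclass(frozen=True)
class Disruption:
    """Observed or projected impact of a disruption, e.g. the output of a scenario."""
    loss: float
    customers: int
    outage_hours: float


def breached_dimensions(tol: ImpactTolerance, d: Disruption) -> list[str]:
    """Return the impact dimensions that exceed tolerance; an empty list means 'within tolerance'."""
    checks = [
        ("value", d.loss > tol.max_loss),
        ("volume", d.customers > tol.max_customers),
        ("time", d.outage_hours > tol.max_outage_hours),
    ]
    return [name for name, exceeded in checks if exceeded]


# Illustrative figures only.
payments = ImpactTolerance(max_loss=1_000_000, max_customers=10_000, max_outage_hours=4)
outage = Disruption(loss=250_000, customers=2_500, outage_hours=6)
print(breached_dimensions(payments, outage))  # ['time']
```

Keeping each dimension as a separate check makes it clear to the board which kind of harm a disruption would cause, rather than collapsing everything into a single score.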


3. Understand Dependencies - Upstream and Downstream

Companies operate in a dynamic environment today. Building a relational data framework that maps the people, processes, systems, and third parties required to deliver each business service is an important step in understanding dependencies. Crucial to building business resilience is understanding internal and external interconnections and perspectives, while ensuring that the full picture exists, is kept current, and reflects all relevant changes.

Since organizations are increasingly dependent on third-party providers and outsourcing of some functions, such an approach can help navigate the risks presented by third and fourth parties.

The following best practices can help gain a better understanding of upstream and downstream dependencies:
 

  • Use technology to gain a single, unified view of all the vital processes the organization has identified, based on the major factors it needs or wants to be resilient to. To understand the roadblocks, make sure everything is connected and understood by examining both the horizontal and vertical views of critical capabilities.
  • Ensure the organization takes a risk-based and proportionate approach to third- and fourth-party vendors. This requires considering the nature, scale, and complexity of its operations so that it continues to meet its obligations. Firms that use these providers must take reasonable care to organize and control their affairs responsibly and effectively, with adequate risk management systems.
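A relational dependency map can be sketched as a directed graph in which each node lists what it directly relies on. The minimal Python example below assumes a hypothetical map whose node names carry type prefixes (`service:`, `process:`, `system:`, `people:`, `vendor:`). It walks the graph breadth-first to list every upstream dependency of a business service, and it flags fourth parties, taken here to mean vendors that the firm reaches only through another vendor.

```python
from collections import deque

# Hypothetical dependency map: each node lists what it directly depends on.
DEPENDS_ON: dict[str, list[str]] = {
    "service:retail-payments": ["process:payment-authorization", "people:ops-team"],
    "process:payment-authorization": ["system:core-banking", "vendor:card-processor"],
    "system:core-banking": ["vendor:cloud-host"],
    "vendor:card-processor": ["vendor:data-center-provider"],  # the vendor's own supplier
    "people:ops-team": [],
    "vendor:cloud-host": [],
    "vendor:data-center-provider": [],
}


def upstream(node: str, graph: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first search returning every transitive dependency of `node` with its hop distance."""
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep not in distance:  # visit shared dependencies only once
                distance[dep] = distance[current] + 1
                queue.append(dep)
    del distance[node]  # the service itself is not its own dependency
    return distance


def fourth_parties(graph: dict[str, list[str]]) -> set[str]:
    """Vendors that the firm reaches only through another vendor (a vendor's vendor)."""
    def is_vendor(n: str) -> bool:
        return n.startswith("vendor:")

    direct = {d for n, deps in graph.items() if not is_vendor(n) for d in deps if is_vendor(d)}
    via_vendor = {d for n, deps in graph.items() if is_vendor(n) for d in deps if is_vendor(d)}
    return via_vendor - direct


for name, hops in sorted(upstream("service:retail-payments", DEPENDS_ON).items(), key=lambda kv: kv[1]):
    print(hops, name)
print(fourth_parties(DEPENDS_ON))  # {'vendor:data-center-provider'}
```

The hop distance gives a rough sense of how far removed a dependency is from the service it supports, which can help when deciding how much oversight each third or fourth party warrants.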


4. Leverage Scenarios for Potential Points of Failure

When looking for points of failure, it is important to establish the real impact on the organization and to build a better understanding of its risk appetite and capabilities.

Consider the following when building scenarios for potential points of failure:
 

  • Include past failures, both within and outside the organization’s control, to help build operational resilience and provide better visibility across processes.
  • Bring together distinct parts of your organization by examining business continuity management, data management, digital risk management, and third-party risk management together. This clarifies what is realistically possible and makes it easier to track interdisciplinary risk scenarios.
  • Identify scenarios for impact tolerance related to people, processes, systems, and third parties, using the relational data framework. This can help assess the impact of inter-relationships. Overlaying the scenarios on your business framework can increase your understanding of where stakeholders come into play.
  • Understand how the risk appetite range can inform action plans to mitigate risks. Plot the information obtained from risk scenarios based on the vitality of each service, measurements of dependence, and microeconomic intelligence. Then define the action plan using data points that cover internal capital adequacy assessment, recovery prioritization, the governance framework, culture, corporate structure, controls, and the regulatory framework to build a strong business contingency plan.
  • Keep the focus on validating the risk scenarios against the business objectives of the organization, ensuring that the scenarios address business impacts. Scenario-based testing using questionnaires, simulations, expert tabletop exercises, and thematic views are useful ways of testing response and recovery capabilities.
  • Leverage the organization’s Business Continuity Management (BCM) and Disaster Recovery (DR) teams. These teams should carry out scenario analyses, exercises, and testing, and the same amount of testing is needed to build a stronger operational resilience team. The idea is to bring the same level of thinking and analysis found in BCM and DR practices to operational resilience.
  • Understand the weak links in a resilience plan by taking employees out of their comfort zone and making them work in an unfamiliar situation. This can lead to a better understanding of the complexity, business criticality, usage frequency, visible areas, defect-prone areas, and other measurable success criteria of your operational resilience plan.
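A simple scenario test can combine a dependency map with impact tolerances: fail one node, walk downstream to find the business services that rely on it, and compare the outage with each service’s time-based tolerance. The Python sketch below is illustrative only. The graph, tolerance values, and function names are hypothetical, and it assumes, as a simplification, that every dependent service is disrupted for exactly as long as the failed node.

```python
from collections import deque

# Hypothetical map: node -> direct dependencies (the same shape as a relational dependency map).
DEPENDS_ON = {
    "service:retail-payments": ["system:core-banking", "vendor:card-processor"],
    "service:online-banking": ["system:core-banking", "vendor:cloud-host"],
    "service:mortgage-origination": ["system:loan-platform"],
    "system:core-banking": ["vendor:cloud-host"],
    "system:loan-platform": [],
    "vendor:card-processor": [],
    "vendor:cloud-host": [],
}

# Hypothetical time-based impact tolerances: maximum tolerable outage, in hours.
TOLERANCE_HOURS = {
    "service:retail-payments": 4,
    "service:online-banking": 8,
    "service:mortgage-origination": 48,
}


def impacted_services(failed: str, graph: dict[str, list[str]]) -> set[str]:
    """Walk dependencies in reverse (downstream) to find every business service relying on `failed`."""
    dependents: dict[str, list[str]] = {}
    for node, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    seen, queue = {failed}, deque([failed])
    while queue:
        for parent in dependents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return {n for n in seen if n.startswith("service:")}


def run_scenario(failed: str, outage_hours: float) -> dict[str, bool]:
    """For each impacted service, report whether the scenario stays within its impact tolerance."""
    return {svc: outage_hours <= TOLERANCE_HOURS[svc]
            for svc in sorted(impacted_services(failed, DEPENDS_ON))}


# Scenario: the cloud host is unavailable for 6 hours.
print(run_scenario("vendor:cloud-host", outage_hours=6))
# {'service:online-banking': True, 'service:retail-payments': False}
```

Reversing the edges once and then running a breadth-first search keeps each scenario cheap to evaluate, so the same check can be repeated for every system and third party in the map to surface the weak links.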


5. Communicate the Plan and Stakeholder Map

A communication plan forms an integral element in any risk management strategy.

Formulate your communication plan and stakeholder map by:

  • Identifying key internal and external stakeholders and building crisis communication plans for both groups. External customers should also have clarity on the alternatives available to them in such an event.
  • Providing evidence for the identification of important business services and demonstrating to the regulators that scenario tests on plausible events have been conducted for all critical business services.
  • Making testing integral to the operational resilience process, and documenting and demonstrating the organization’s ability to remain within impact tolerances.
     

Why a Connected GRC Approach Is the Answer

Effectively executing the above steps by integrating GRC to support business objectives can be a powerful differentiator. Technology provides a scalable platform and the data model needed to build a relational data framework and to align the organizational hierarchy, business services, market expectations, and strategic and regulatory objectives. The right GRC platform further simplifies this process with a single, panoramic view of the hierarchy of business processes and their functionality, enabling organizations to comprehensively evaluate their impact on strategic and supervisory objectives. Organizations can readily gain tangible insights for identifying core or critical functions. They also gain insight into the risk or relevance ratings of important services, which can help identify critical economic functions. A GRC platform can also simplify the capture, reporting, and tracking of business anomalies, empowering and equipping the front line. 

Make Resilience a Top Priority

Enterprise-wide risk management frameworks in many organizations are capable enough to effectively manage operational resilience. Sustaining these plans will require integration of enhanced preventative, responsive, recovery, and learning capabilities. Here are some key considerations for organizations:

  • To attain a holistic view of risks, consolidate risk identification through service mapping and stress testing. Risk data from service mapping and service risk assessments, combined with internal and external sources such as threat intelligence, incident data, and loss events, is a valuable asset for operational resilience.

  • Leverage high-quality, readily available risk and control data from cloud applications and infrastructure. This makes it possible to streamline processes using advanced technologies and analytics, including AI and ML techniques.

  • Enable easy understanding of large data sets to provide continuous monitoring of threats and vulnerabilities and ensure there is a more data-driven and fact-based risk assessment.

  • Organizations can build and implement a pervasive approach to operational resilience risk identification, assessment, monitoring, and mitigation with MetricStream’s Operational Resilience Solution. MetricStream brings all aspects of the operational resilience framework onto a single unified platform by seamlessly embedding risk management practices into compliance, cybersecurity, vendor risk management, and business continuity planning to prepare for and prevent potential disruptions.

FAQ

1. What is the difference between operational resilience and business continuity?

Operational resilience is a proactive approach to ensuring an organization has the essential measures in place to quickly identify, analyze, prevent, respond to, and recover from business disruptions. Business continuity is reactive and involves implementing pre-determined response measures in the aftermath of an event. 

2. What are the key components of operational resilience?

The key components of operational resilience are:

  • Identify important business services
  • Map dependencies
  • Set impact tolerances
  • Identify and test scenarios
  • Analyze findings and build resilience

As a best practice, organizations should also integrate operational resilience principles into their overarching risk management framework, strengthening resilience across cyber, operational, and other areas.

3. Who is responsible for operational resilience?

The ultimate responsibility for an organization’s operational resilience lies with the board and senior management. It is essential for the board and the top management to have insights into critical business operations and services, impact tolerances, and key metrics. Also, they need to set the tone from the top to ensure the operational resilience program is implemented effectively across the firm.
