Cyber risk has moved up the priority list and taken center stage in boardroom discussions, driven by the rapid digital transformation of organizations and their growing data dependency and interconnectedness. The COVID-19 pandemic and the resulting remote working environment have only aggravated the challenges for security teams as the entire workforce moved home, beyond the reach of the office firewall. In these unprecedented times, ensuring robust cyber defense infrastructure to protect critical assets is of paramount importance.
We recently conducted a survey to take the pulse of the current state of IT and cyber risk management programs at organizations. Here are the key takeaways from the survey:
It is encouraging to see that switching to digitized and centralized GRC solutions is among the top priorities of organizations this year. These solutions can help improve risk visibility and foresight, facilitate continuous monitoring of IT and cyber controls, and streamline overall cyber risk and compliance management. Innovative features, such as support for mobility, real-time reporting, advanced risk analytics, regulatory notifications, and more, further assist executive management and the board in quick and efficient decision-making.
“The ultimate goal isn’t to avoid cyber risk but rather transforming it into strategic advantage—because things can and will inevitably go wrong at some point. But if organizations build their cyber resilience—the ability to not just prevent cyberattacks but also minimizing the impact of security incidents and ensuring continued business operations in the aftermath of attacks—that’s when they can truly thrive and create business value,” an excerpt from the report reads.
Our flagship event, GRC Summit, was held recently and brought together the best in the industry to share risk management strategies and best practices, and how to build better governed, more risk-aware, compliant, and resilient enterprises that thrive on risk.
Unsurprisingly, cyber risk has emerged as one of the top risks faced by organizations today, and risk leaders believe that it will continue to dominate the risk strategies going forward. To that end, security experts discussed some of the key considerations for ensuring a robust cybersecurity program:
The best-prepared organizations in the world today are those that use risk as their competitive advantage. Quantifying cyber risks in a manner that makes sense to the executive board and helps them make sound cybersecurity investment decisions is critical for organizations to thrive in today’s digital world. The Cyber Risk Quantification capability of MetricStream IT and Cyber Risk Management can make it considerably easier for organizations to quantify cyber risks in monetary terms, which can then be easily communicated to the top management and board.
Kafka is an open-source, real-time streaming messaging system built around the publish-subscribe model. In a service-oriented architecture, instead of subsystems establishing direct connections with each other, the producer subsystem communicates information via a distributed server. That server brokers the information, helps move an enormous number of messages with low latency and fault tolerance, and allows one or more consumers to consume these messages concurrently.
Kafka does an excellent job of fault tolerance: it ensures that messages are not lost by partitioning, replicating, and distributing the data across multiple brokers.
In distributed systems, failures are inevitable, whether a DB connection failure, a failed network call, or an outage in a downstream dependency, especially in a microservices ecosystem.
Several issues can occur on the consumer side. When implementing the Kafka consumer, the following scenarios need special handling:
Downstream Service or Data Store Failure
The consumer is not able to process the message because a downstream microservice API is unavailable or returns an error, or a DB it's trying to connect to is down or unresponsive.
This blog post discusses some of the error handling mechanisms that we implemented as a part of the MetricStream Platform to improve the robustness and resiliency of the Platform.
Data Format Changes or Event Version Incompatibility
The consumer expects the message payload to be in a certain format, but the producer has changed that format, e.g., a required field has been removed. As a result, the consumer is unable to deserialize the message sent by the producer.
Unable to reach Kafka cluster
The producer may fail to push a message to a topic due to a network partition or the unavailability of the Kafka cluster. In such cases there is a high chance of messages being lost, so we need a retry mechanism to avoid data loss. The approach we take is to store the message in a temporary secondary store (DB or cache) and retry from that store, attempting to write the message to the main topic again.
Clogged processing
When we are required to process many messages in real time, repeatedly failed messages can clog processing. The worst offenders consistently exceed the retry limit, which also means that they take the longest and use the most resources. Without a success response, the Kafka consumer will not commit a new offset, and the batches containing these bad messages are blocked because they are re-consumed again and again.
Difficulty retrieving retry metadata
It can be cumbersome to obtain metadata on the retries, such as timestamps and the retry number. If requests continue to fail retry after retry, we want to collect these failures in a dead-letter queue (DLQ) for visibility and diagnosis. A DLQ should allow listing to view the contents of the queue, purging to clear those contents, and merging to reprocess the dead-lettered messages, allowing comprehensive resolution of all failures caused by a shared issue.
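The three DLQ operations described above can be sketched in memory as follows. This is only an illustration and not the Platform's implementation: the names InMemoryDeadLetterQueue and DeadLetter, and the metadata fields they carry, are hypothetical, and a real DLQ would be backed by a Kafka topic or a database rather than a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical record of one dead-lettered message plus its retry metadata.
final class DeadLetter {
    final String payload;
    final int retryCount;       // retries attempted before giving up
    final long failedAtMillis;  // timestamp of the final failure
    final String lastError;

    DeadLetter(String payload, int retryCount, long failedAtMillis, String lastError) {
        this.payload = payload;
        this.retryCount = retryCount;
        this.failedAtMillis = failedAtMillis;
        this.lastError = lastError;
    }
}

// Hypothetical in-memory DLQ; a real one would sit on a topic or a DB table.
final class InMemoryDeadLetterQueue {
    private final List<DeadLetter> entries = new ArrayList<>();

    synchronized void add(DeadLetter entry) {
        entries.add(entry);
    }

    // Listing: return a snapshot so callers can inspect without mutating the queue.
    synchronized List<DeadLetter> list() {
        return new ArrayList<>(entries);
    }

    // Purging: drop everything and report how many entries were removed.
    synchronized int purge() {
        int removed = entries.size();
        entries.clear();
        return removed;
    }

    // Merging: hand each payload back to a reprocessor, oldest first.
    // An entry is removed only after the reprocessor accepted it.
    synchronized int merge(Consumer<String> reprocessor) {
        int merged = 0;
        while (!entries.isEmpty()) {
            reprocessor.accept(entries.get(0).payload); // if this throws, the entry stays queued
            entries.remove(0);
            merged++;
        }
        return merged;
    }
}
```

Keeping the retry count, timestamp, and last error next to the payload is what makes the listing useful for diagnosis; the reprocessor passed to merge would typically republish to the main topic.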
To address the problem of blocked batches, we set up a distinct retry queue using a separately defined Kafka topic. Under this paradigm, when a consumer handler returns a failed response for a given message after a certain number of retries, the consumer publishes that message to its corresponding retry topic. The handler then returns true to the original consumer, which commits its offset.
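A minimal sketch of this hand-off is shown below. It assumes a handler that returns true on success, and it uses an in-memory queue as a stand-in for the retry topic; the class name RetryTopicHandler and the maxAttempts parameter are hypothetical and are not part of any Kafka API.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical sketch: the "retry topic" is an in-memory queue standing in for a Kafka topic.
final class RetryTopicHandler {
    private final Predicate<String> handler; // returns true when the message was processed
    private final int maxAttempts;           // in-place attempts before handing off
    final Queue<String> retryTopic = new ArrayDeque<>();

    RetryTopicHandler(Predicate<String> handler, int maxAttempts) {
        this.handler = handler;
        this.maxAttempts = maxAttempts;
    }

    // Always returns true so the caller can commit the offset: the message was
    // either processed or parked on the retry topic for later handling.
    boolean onMessage(String message) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (handler.test(message)) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Treat an exception like a failed response and try again.
            }
        }
        retryTopic.add(message); // publish to the retry topic instead of blocking the batch
        return true;
    }
}
```

Because onMessage never reports failure to the original consumer, a single bad message cannot hold back the rest of its batch.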
Here are some of the possible reasons why the Producer API may be unable to send a message:
The approach to recover from the above errors involves building a retry mechanism within the producer to ensure that there is an auto-retry process to try and re-deliver messages and a dead-letter store to save messages that were undeliverable even after the auto-retry process.
The steps involved are (see diagram above):
1. Client invokes the Kafka client's producer API to push a message to the main topic (configured in the producer API).
2. If Kafka throws an exception while pushing the message to the topic, we need a way of handling the error and managing the message in a way that we don't lose it (prevent data loss).
3. When there is an exception returned by Kafka, then the message will be written to a secondary store.
4. Retry policy defines three key things,
5. Based on the retry policy, the message will be pushed back to the secondary store (DB) as long as the max retry limit has not been reached; once the max retry count is reached, the message will be pushed to the dead letter store.
6. The retry consumer implemented internally as part of the framework will read the messages from the retry store and invoke the producer API to push the message to the main topic.
7. The retry will be done by a separate set of threads from a dedicated retry thread pool, which will not interfere with the main threads pushing the data to Kafka topic or consuming data from Kafka topics.
8. The error handling is controlled through a flag that the producer can set at the API level, because certain messages may not be as important as others and we can tolerate losing them, e.g., log messages.
9. There will be a flag "enableRetry" which will be enabled by default; this can be set at the producer API level to enable/disable error handling.
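The producer-side flow in the steps above can be sketched as follows. This is an assumption-laden illustration and not the framework's code: ResilientProducer, retryOnce, and pendingCount are hypothetical names, the kafkaSend callback stands in for the real Kafka producer call, and in-memory collections stand in for the secondary store and the dead letter store. Backoff timing and the dedicated retry thread pool are left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch of a producer wrapper with a retry store and a dead letter store.
final class ResilientProducer {
    // A message waiting in the secondary store, with the retries already made.
    private static final class Pending {
        final String message;
        final int retries;

        Pending(String message, int retries) {
            this.message = message;
            this.retries = retries;
        }
    }

    private final Consumer<String> kafkaSend; // stand-in for the real send; throws on failure
    private final int maxRetries;
    private final Queue<Pending> retryStore = new ArrayDeque<>();
    final List<String> deadLetterStore = new ArrayList<>();

    ResilientProducer(Consumer<String> kafkaSend, int maxRetries) {
        this.kafkaSend = kafkaSend;
        this.maxRetries = maxRetries;
    }

    // Steps 1-3: try the main topic; on failure, park the message unless retry is disabled.
    synchronized void send(String message, boolean enableRetry) {
        try {
            kafkaSend.accept(message);
        } catch (RuntimeException e) {
            if (enableRetry) {
                retryStore.add(new Pending(message, 0));
            }
            // With enableRetry == false the message is deliberately dropped (e.g., log messages).
        }
    }

    // Steps 5-6: one pass of the retry worker over everything currently in the retry store.
    synchronized void retryOnce() {
        int batch = retryStore.size();
        for (int i = 0; i < batch; i++) {
            Pending p = retryStore.poll();
            try {
                kafkaSend.accept(p.message);
            } catch (RuntimeException e) {
                int retries = p.retries + 1;
                if (retries >= maxRetries) {
                    deadLetterStore.add(p.message);                   // retry budget exhausted
                } else {
                    retryStore.add(new Pending(p.message, retries)); // back to the secondary store
                }
            }
        }
    }

    synchronized int pendingCount() {
        return retryStore.size();
    }
}
```

In a real deployment, retryOnce would run on the dedicated retry thread pool mentioned in step 7, on a schedule defined by the retry policy, so that retries never block the threads publishing to the main topic.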
Some of the scenarios where the consumer process could run into errors are:
To handle these errors, the following mechanisms can be followed to improve resiliency:
The following steps are used to retry after a failure on the consumer end, as shown in the above figure:
1. Kafka consumer listener in Service A tries to consume an event/message from the main topic.
2. The consumer in Service A depends on another service (e.g., Service B) or a data store to complete the processing, e.g., it may try to invoke another API on a microservice to fetch or update some data.
3. An exception is thrown while making a call to the microservice (i.e., Service B), either due to a network failure or because the service itself fails internally (e.g., Internal Server Error or Service Unavailable).
Valid Characters for Kafka topics:
Max Allowed Topic Name Length:
Main topics convention:
Resulting Topic Name: org.orders.created
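A topic name such as the one above can be checked before a topic is created. The sketch below mirrors the rules Kafka itself applies to topic names (ASCII letters, digits, '.', '_' and '-', at most 249 characters, and not "." or ".."); the TopicNames helper is a hypothetical name, and the limits are worth confirming against the Kafka version in use.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors Kafka's own topic-name rules.
final class TopicNames {
    static final int MAX_LENGTH = 249;
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]+");

    private TopicNames() {
    }

    static boolean isValid(String name) {
        if (name == null || name.isEmpty() || name.length() > MAX_LENGTH) {
            return false;
        }
        if (name.equals(".") || name.equals("..")) {
            return false; // reserved names
        }
        return LEGAL.matcher(name).matches();
    }
}
```

Kafka also warns when topic names mix '.' and '_', because the two characters can collide in metric names, so sticking to one separator in the naming convention avoids that issue.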
Example of retry policy configuration in KafkaListener annotation:
@KafkaListener(name = "workflow", topics = "forms", group = "workflow", retryPolicy = @SimpleRetryPolicy(retryBackoffMs = 5000, retryCount = 2, exceptions = {KafkaConsumerException.class, IOException.class}))
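One plausible reading of the three policy attributes is sketched below: retry only when the thrown exception is one of the listed types, wait retryBackoffMs between attempts, and give up after retryCount retries. This is our interpretation for illustration only; SimpleRetryExecutor is a hypothetical class, and the framework's actual semantics (for instance, whether retryCount includes the first attempt) may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical executor showing how retryBackoffMs, retryCount and exceptions could interact.
final class SimpleRetryExecutor {
    private final long retryBackoffMs;
    private final int retryCount;
    private final List<Class<? extends Exception>> retryable;

    @SafeVarargs
    SimpleRetryExecutor(long retryBackoffMs, int retryCount, Class<? extends Exception>... retryable) {
        this.retryBackoffMs = retryBackoffMs;
        this.retryCount = retryCount;
        this.retryable = Arrays.asList(retryable);
    }

    // Runs the task once, then up to retryCount more times if it throws a listed exception.
    <T> T call(Callable<T> task) throws Exception {
        int retriesDone = 0;
        while (true) {
            try {
                return task.call();
            } catch (Exception e) {
                boolean canRetry = retryable.stream().anyMatch(type -> type.isInstance(e));
                if (!canRetry || retriesDone >= retryCount) {
                    throw e; // not retryable, or retry budget exhausted
                }
                retriesDone++;
                Thread.sleep(retryBackoffMs); // fixed backoff between attempts
            }
        }
    }
}
```

Restricting retries to listed exception types keeps non-recoverable failures, such as a payload that cannot be deserialized, from consuming the retry budget.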
The following mechanisms can optionally be added to the producer/consumer retry policy.
As the pandemic continues to batter right through into 2021 and businesses return to the next normal with vaccines making their way into our lives, staying on course with compliance becomes even more critical. Why so?
Regulatory and corporate compliance, closely tied to brand image and reputation, tops any organization’s priority list today, as organizations try to steer clear of penalties, work stoppages, and lawsuits in an environment where regulatory complexities are growing. Chief Compliance Officers (CCOs) recognize that the cost of non-compliance is too high to bear in a world that is still facing the scourge of the COVID-19 crisis. CCOs, tasked with guaranteeing adherence while pre-empting risks, understand the value of putting together a risk-based, integrated compliance strategy.
So, let’s look at what makes for a comprehensive compliance strategy. Starting with a risk-based and federated approach, it entails tracking regulatory engagements, keeping policies in sync with new regulations, while not taking the eye off integrity and culture needs.
A federated approach to compliance makes room for a holistic view, where departments across the board collaborate and share compliance information and technology, while also ensuring that the unique compliance needs of each department are met. This is the sign of a truly mature organization because it weeds out duplication of effort, breaks data silos, and offers an opportunity to create a common compliance data architecture.
To put together a tightly-knit compliance strategy, organizations must adopt a risk-based approach. The need of the hour, especially post the pandemic, is a risk-based approach that is customized to suit the needs of each industry type. With the COVID-19 crisis, organizations have woken up to the reality that not only are there record-high regulatory fines to deal with in case of non-compliance, but also that not all risks need the same level of protection.
Informed decision making in an evolving landscape requires creating best practices for managing compliance risk. The three key steps organizations can take to carve out a robust compliance risk management program are:
The pandemic especially requires organizations to reassess and rearchitect their compliance risk profiles, both from a quantitative and a qualitative perspective. What is a good way to acquire a contextual view of risk? It is by putting in place an integrated compliance data model that ensures a link with other risks as well as regulations, policies, processes, controls, objectives, etc. Risks must be linked to their appropriate owners. And, risk computations make it easier for organizations to rank and prioritize compliance risks.
The next step is choosing the appropriate controls to prevent or detect risks better. Well-executed controls stem risks. Compliance management software tools, especially Robotic Process Automation (RPA) tools, have a key role to play here as they help accelerate control assessments by automating and streamlining processes. Compliance management software can help document potential risks and make room for systematic issue investigation and remediation.
Organizations that operate across geographies have their own share of risk reporting complexities to deal with. Real-time reporting is feasible with the use of advanced reporting tools such as graphical dashboards that help view historical as well as real-time data. Organizations are also exploring the use of advanced analytics and machine learning in detecting and predicting compliance risks so that compliance managers stay clued in to ground realities.
Risk mitigation may be the primary responsibility of compliance experts, but all three lines of defense must work in tandem on this. The stronger the business ownership of risk, the better positioned an organization is. An integrated and holistic compliance strategy and program puts workflows around policies, cases, compliance assessments, and other processes on the fast track. And while this happens, organizations must not lose sight of integrity and culture.
Compliance and integrity are two sides of the same coin. Be it the management, the board, or the frontline, each has a role to play in helping the organization imbibe the culture of compliance. While the top management leads from the front by articulating the organization’s core values in an unambiguous and consistent manner, the middle and lower management are the eyes and ears of the organization. The top managers can lean on tools such as employee reviews and customer surveys, while they help employees gauge the importance of accountability, transparency, and desired behaviors. The Board of Directors, on the other hand, can institute formal processes and structures to monitor progress and gaps in compliance and integrity, and take corrective actions where necessary.
The COVID-19 crisis has brought with it a changing compliance landscape. As of May 2020, more than 100 countries had issued over 350 regulatory notifications to deal with the COVID-19 crisis. The key challenge for organizations is to ensure compliance without disrupting operational efficiencies. To keep policies in sync with recently updated regulations at both the global and the federal level, organizations can take a few steps, which are outlined in the graphic below:
Build credibility with regulators with effective regulatory engagement
Organizations need an agile and well-coordinated strategy to effectively track regulatory engagements. To strengthen their regulatory relationships, organizations can:
Organizations that Perform with Integrity™ enjoy brand loyalty of customers, partners as well as employees. MetricStream helps customers build more risk-aware and compliant cultures through a range of governance, risk and compliance (GRC) products and solutions built on an integrated risk platform. Our M7 Regulatory Compliance and Corporate Compliance solutions help organizations strengthen compliance by adopting an integrated approach.
As the pressure on compliance and regulatory engagement management teams grows, our solutions will help you: