Build cyber resilience: NIST CSF & cloud architectures.


Cyber Security Framework Usage

As cyber-attacks continue to rise every year, organizations are increasingly adopting NIST’s Cybersecurity Framework (CSF) to put in place an organization-level plan for addressing business disruptions caused by cyber disasters. According to Forrester, “Ransomware attacks are up 500% in the last year with damage costs expected to soar up to $11 billion in the coming year”. Cybersecurity vendors tend to focus on only a few segments of the CSF, while legacy backup tools fail to properly address the Protection and Recovery mechanisms outlined by the framework. Given the growing frequency and sophistication of attacks such as NotPetya and WannaCry, existing application recovery mechanisms largely fail organizations. In particular, they do not address the recovery of entire application environments. To protect against and recover from cyber disasters such as ransomware, organizations need to deploy new application-level resilience solutions built on cloud-native services. Cloud platforms provide the building blocks the CSF requires for proper Cyber Resilience solutions. This post explores how organizations can combine cloud provider best practices with new resilience solutions built for “always-on” enterprises, such as Appranix, to properly implement the NIST CSF.

Cybersecurity vs Cyber Resilience

As organizations continue to invest in cyber defense, it is important to differentiate between Cybersecurity and Cyber Resilience. Cybersecurity consists of the technologies, processes, and measures designed to protect systems, networks, and data from cybercrime. Most cybersecurity efforts focus on Identification, Detection, and Response so that existing production systems can keep running. Cyber Resilience starts from the recognition that attackers have innovative tools, the element of surprise, and their choice of target, and that they can succeed. It takes a more holistic view in which Protection and Recovery carry more weight. In the event of a successful and potentially disastrous attack, including systems becoming inaccessible, how can an organization continue to offer business services?

Cyber resilience is defined as an organization’s ability to protect against and recover from cyber-triggered business disasters. In recent years, cyber-attacks have shifted from experimentation, fraud, extortion, blackmail, and data exfiltration to more damaging outcomes such as system destruction, data eradication, and data manipulation. The Petya and NotPetya malware showed the world how ransomware attacks can spread and how damaging they can be to a company’s business operations and, ultimately, its core mission. The damages caused by NotPetya reached an estimated $10b, exceeding the $4b–$8b in estimated losses caused by the WannaCry outbreak one month earlier. Organizations must recognize that these events are existential attacks on the business itself, fundamentally threatening the company as a going concern.

CSF Five Stages and Subcategories

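The NIST Cybersecurity Framework organizes its core into five stages (called Functions in the CSF): Identify, Protect, Detect, Respond, and Recover. Each stage is divided into categories and subcategories, and a good implementation of the CSF covers all of them properly.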

Cyber Security Framework Core

This post mainly focuses on the Protect and Recover stages of the CSF and their subcategories. Many tools and services already help organizations with the Identify, Detect, and Respond stages; in fact, the cybersecurity industry as a whole largely focuses on these three stages, which address production environments. However, these tools do not address what happens when an entire production environment is taken over by an attacker.

Main function and Categories

Key Components of the Protect Stage

To protect effectively against attacks, the CSF gives detailed guidance on identity and access management under the PR.AC (Protect.Access Control) category, in subcategories PR.AC-1 through PR.AC-5 (extended to PR.AC-7 in CSF v1.1). PR.AC-5 covers network integrity, including segregation of the production network from the rest of the corporate network. The PR.DS (Protect.Data Security) category has seven (7) subcategories in CSF v1.0 and eight (8) in v1.1. Key ones cover how organizations should protect against data leaks (PR.DS-5) and keep development and test environments separate from production environments (PR.DS-7).

Under Information Protection Processes and Procedures (PR.IP), there are twelve (12) subcategories that clearly detail which protection processes should be implemented. Key ones include:

  • PR.IP-1: A baseline configuration of IT systems is created and maintained.
  • PR.IP-2: A system development life cycle to manage systems is implemented.
  • PR.IP-3: Configuration change control processes are in place.
  • PR.IP-4: Backups of information are conducted, maintained, and tested.
  • PR.IP-7: Protection processes are continuously improved.
  • PR.IP-9: Response plans (incident response and business continuity) and recovery plans (incident recovery and disaster recovery) are in place.
  • PR.IP-10: Response and recovery plans are tested.
  • PR.IP-12: A vulnerability management plan is developed and implemented.

Today, most backup vendors market their legacy tools as protection against cyber-attacks. Their recovery capabilities, however, focus on individual infrastructure components, such as a single virtual machine, file system, or database. Most of these backup systems were created for the data center era, not for the hyperscale cloud era, in which every infrastructure component, whether compute, storage, load balancer, or network, is software-defined and needs to be properly protected and recovered in the event of a cyber disaster. To properly implement the CSF recommendations, organizations need a holistic approach that captures baseline configurations not only of VMs and databases but of the entire application environment. Moreover, environment configurations and change control processes should be maintained and continuously improved to keep up with changes in production environments. This is particularly important as organizations adopt DevOps practices to accelerate the rollout of changes in response to customer requests. PR.IP-2 calls for a system development life cycle to be in place. The best approach for implementing such a life cycle is to use the cloud platform's infrastructure-as-code capabilities, so configurations and deployments can be properly versioned in version control systems.
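
As a rough illustration of the baseline and change-control ideas behind PR.IP-1 and PR.IP-3, here is a minimal sketch that assumes the environment's configuration has already been exported as a plain dictionary. The resource names and fields are hypothetical, not any cloud provider's schema. Each resource is fingerprinted with SHA-256 over canonical JSON, and comparing a saved baseline with a later snapshot reveals added, removed, and changed resources.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Return a SHA-256 digest of a resource config serialized as canonical JSON."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def capture_baseline(environment: dict) -> dict:
    """Map each resource name to the fingerprint of its configuration."""
    return {name: fingerprint(cfg) for name, cfg in environment.items()}


def detect_drift(baseline: dict, current: dict) -> dict:
    """Compare two baselines and report added, removed, and changed resources."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(
            name for name in baseline.keys() & current.keys()
            if baseline[name] != current[name]
        ),
    }


# Hypothetical application environment: network, security, load balancer, and compute.
environment_v1 = {
    "vpc-app": {"cidr": "10.0.0.0/16"},
    "sg-web": {"ingress": [{"port": 443, "source": "0.0.0.0/0"}]},
    "lb-frontend": {"listeners": [443], "targets": ["vm-web-1", "vm-web-2"]},
    "vm-web-1": {"image": "gold-image-2021-03", "size": "medium"},
}
baseline = capture_baseline(environment_v1)

# Simulate an unauthorized change: SSH opened to the internet, plus an unknown VM.
environment_v2 = json.loads(json.dumps(environment_v1))  # deep copy via JSON round trip
environment_v2["sg-web"]["ingress"].append({"port": 22, "source": "0.0.0.0/0"})
environment_v2["vm-web-2"] = {"image": "unknown", "size": "large"}

print(detect_drift(baseline, capture_baseline(environment_v2)))
# {'added': ['vm-web-2'], 'removed': [], 'changed': ['sg-web']}
```

Storing the baseline in the same version control system as the infrastructure-as-code templates, as PR.IP-2 suggests, would tie each drift report to a specific commit; that workflow is an assumption here and is not something the sketch does.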

Legacy backup systems cannot meet most of the CSF Protect recommendations because they ignore environment configurations, the cloud services life cycle, and orchestration requirements. A new system should be put in place that protects complete application environments: not only the systems themselves but also their configurations. Recovery should be performed with proper life cycle management techniques using modern cloud-native tools rather than data center era tools.

Cyber Security Framework Version 1.1

Key Components of the CSF Recover Stage

The CSF Recover stage has one important category called “Recovery Planning (RC.RP)”. To implement RC.RP successfully, organizations need to have the Protect stage properly implemented. In the event of a successful cyber-attack, recovery processes and procedures should be executed to ensure timely restoration of systems. This is only possible if organizations conduct frequent recovery tests, and frequent testing is only possible when adequate capacity is available or allocated for it. This is where the hyperscale, software-defined nature of public cloud platforms plays a major role: they provide on-demand capacity in the same region as the production environment, or in a completely isolated region elsewhere in the country, to test various cyber disaster recovery scenarios.
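
Frequent recovery tests are most useful when each one is judged against explicit targets. The sketch below assumes two hypothetical targets, a maximum data-loss window and a maximum recovery time, and evaluates the result of a test restore run in an isolated region. The class names, timestamps, and thresholds are illustrative assumptions only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTarget:
    max_data_loss: timedelta      # oldest acceptable restore point, relative to the incident
    max_recovery_time: timedelta  # longest acceptable end-to-end restore


@dataclass
class RecoveryTestResult:
    incident_time: datetime   # simulated moment of the attack
    restore_point: datetime   # timestamp of the backup that was restored
    started: datetime         # when the test restore began
    finished: datetime        # when the environment was serving again


def evaluate(result: RecoveryTestResult, target: RecoveryTarget) -> list[str]:
    """Return the targets this drill violated; an empty list means it passed."""
    violations = []
    data_loss = result.incident_time - result.restore_point
    recovery_time = result.finished - result.started
    if data_loss > target.max_data_loss:
        violations.append(f"data loss {data_loss} exceeds limit {target.max_data_loss}")
    if recovery_time > target.max_recovery_time:
        violations.append(f"recovery took {recovery_time}, limit {target.max_recovery_time}")
    return violations


# Example drill in an isolated test region (all values hypothetical).
target = RecoveryTarget(max_data_loss=timedelta(hours=1), max_recovery_time=timedelta(hours=4))
drill = RecoveryTestResult(
    incident_time=datetime(2021, 3, 1, 12, 0),
    restore_point=datetime(2021, 3, 1, 11, 30),
    started=datetime(2021, 3, 1, 12, 15),
    finished=datetime(2021, 3, 1, 17, 0),
)
print(evaluate(drill, target))  # ['recovery took 4:45:00, limit 4:00:00']
```

Because cloud capacity is on demand, a drill like this can be repeated after every significant production change rather than once a year.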


Effective Cyber Resilience Implementation Needs Cloud Platforms


Cloud workload protection needs a completely new backup and recovery system

Cyber Resilience requires a set of technologies that work together at the infrastructure, software platform, and application levels. Given the confusion around existing cybersecurity and backup tools, an effective Cyber Resilience implementation can only be done with the help of hyper-converged, software-defined, hyperscale infrastructure such as public cloud platforms. However, organizations need more than the infrastructure-level cloud services provided by vendors such as AWS, Google Cloud, or Microsoft Azure. They need the following capabilities on top of the cloud platforms to implement comprehensive Cyber Resilience (CR) effectively.

  • Complete environment protection: An effective CR solution should protect an entire cloud application environment, not just parts of the cloud infrastructure such as VMs or file systems. This matters because, in the cloud, workloads move freely within an environment to meet performance and scaling requirements. Protecting an entire application environment involves backing up VMs, disks, network, security, load balancer, identity, and other crucial configurations along with the actual data, so an organization can bring back an entire system after a cyber-attack.

  • Immutable “Gold” images and storage: Unalterable, write-once-read-many (WORM) cloud storage should be used for application data and platform configurations to prevent corruption. Clouds also offer alternate regions that are physically far from production regions, in another part of the country or even on another continent.

  • Air-gapped protection: There has to be proper network isolation between production environments and the storage that holds protected, backed-up data, so that even after a successful attack, organizations can safely recover production data up to a known point in time within their Service Level Objective (SLO) goals.

  • Recovering in another cloud account: Cloud platforms offer a layer of isolation that was not possible in data centers. Organizations can create a separate, isolated cloud account and replicate protected data to it across regions, ensuring that applications can be recovered even if the primary production account is taken over by an attacker.

  • Cloud configuration data: Along with system images and disks, the cloud service configurations of each application environment should be protected so they can be recovered as well. Moreover, automated testing and validation should be in place to detect unauthorized changes and to ensure that the protected data is clean and recoverable.

  • Cloud-native orchestration: End-to-end recovery of the entire cloud application environment should be automated with the platform's cloud-native orchestration system, such as AWS CloudFormation, Google Cloud Deployment Manager, or Azure Resource Manager templates.

  • Monitoring and reporting: Automated dashboards that monitor data, cloud configuration changes, SLO deviations, and snapshot validation status in real time, along with built-in modules that generate audit and compliance reports, will strengthen an organization's Cyber Resilience implementation.
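
Recovering a whole environment means recreating resources in dependency order: the network before the subnets and security rules attached to it, those before the VMs and databases, and the VMs before the load balancer that targets them. Template engines such as CloudFormation derive this ordering from the dependencies declared in a template. The sketch below only illustrates the idea with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+); the dependency map is hypothetical and `restore()` is a placeholder, not a real cloud API call.

```python
from graphlib import TopologicalSorter

# Hypothetical application environment: each resource lists the resources it depends on.
dependencies = {
    "vpc-app": set(),
    "subnet-web": {"vpc-app"},
    "sg-web": {"vpc-app"},
    "db-orders": {"subnet-web", "sg-web"},
    "vm-web-1": {"subnet-web", "sg-web"},
    "vm-web-2": {"subnet-web", "sg-web"},
    "lb-frontend": {"vm-web-1", "vm-web-2"},
}

restored = []  # records the order in which restore() was called


def restore(resource: str) -> None:
    """Placeholder for the real restore step (an assumption, not a cloud API)."""
    restored.append(resource)


def recovery_waves(graph: dict) -> list:
    """Restore resources in dependency order, grouped into waves that could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())  # resources whose dependencies are all restored
        for resource in wave:
            restore(resource)
            sorter.done(resource)
        waves.append(wave)
    return waves


for number, wave in enumerate(recovery_waves(dependencies), start=1):
    print(f"wave {number}: {wave}")
# wave 1: ['vpc-app']
# wave 2: ['sg-web', 'subnet-web']
# wave 3: ['db-orders', 'vm-web-1', 'vm-web-2']
# wave 4: ['lb-frontend']
```

Grouping into waves is a design choice: resources within a wave do not depend on each other, so a real orchestrator could restore them concurrently to shorten recovery time.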

Appranix Cloud Application Resilience

Appranix pioneered an industry-first cloud-native platform that offers true cloud application resilience by protecting and recovering entire cloud application environments. Appranix SaaS, available directly from cloud provider marketplaces, automatically discovers an organization's application environments and creates an application environment time machine for them. In the event of a cyber-attack such as ransomware, organizations can recover an entire application environment, including application state and cloud service configuration data, with a few clicks.

Appranix Cloud Application Environment Time Machine

Appranix not only gives organizations a modern, cloud-native, agentless backup and recovery system but also does what legacy backup systems cannot: protect and recover an entire application environment.

Appranix offers a comprehensive Cyber Resilience solution with:

  • Complete cloud application environment protection: Workloads that move around the application environment through auto-scaling are protected together with the network and security configurations they carry

  • Appranix captures a baseline configuration of the application environment and system images, and keeps an immutable copy in a different region to protect against attacks on the primary region

  • Air-gapped production: Organizations can air-gap the production environment from the backup environment, keeping gold images and configurations in a different cloud region, separated by VPCs

  • Appranix can recover your cloud environment in a completely isolated cloud account. This is the best way to achieve air-gap security and protect backup environments against cyberattacks

  • Appranix uses built-in, ever-evolving cloud-native platform services to protect systems and cloud configurations. Effective recovery of systems is only possible when the cloud service configurations can be recreated along with recovered systems for a full application environment recovery using cloud-native services

  • In order to ensure successful recoveries, Appranix uses auto-generated cloud-native infrastructure-as-code to recreate the cloud infrastructure at a particular point in time using an application environment time machine

  • Appranix continuously monitors the environment and reports through alerts and dashboards. SLO notifications can be configured so system administrators can respond quickly to attacks.
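
The point-in-time idea behind an application environment time machine, and the SLO alerting described above, can be illustrated without reference to any product. The sketch below is not Appranix's implementation; it assumes a hypothetical, time-sorted catalog of environment snapshots, picks the latest snapshot taken at or before a requested time (for example, just before ransomware was detected), and flags when the newest snapshot is older than an assumed SLO window.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical snapshot catalog: capture times (sorted ascending) and matching snapshot IDs.
snapshot_times = [
    datetime(2021, 3, 1, 0, 0),
    datetime(2021, 3, 1, 6, 0),
    datetime(2021, 3, 1, 12, 0),
    datetime(2021, 3, 1, 18, 0),
]
snapshot_ids = ["snap-0000", "snap-0600", "snap-1200", "snap-1800"]


def snapshot_at(point_in_time: datetime) -> Optional[str]:
    """Return the latest snapshot taken at or before point_in_time, or None if none exists."""
    index = bisect_right(snapshot_times, point_in_time)
    return snapshot_ids[index - 1] if index > 0 else None


def slo_breached(now: datetime, slo: timedelta) -> bool:
    """True when there are no snapshots or the newest one is older than the SLO window."""
    return not snapshot_times or now - snapshot_times[-1] > slo


# Ransomware detected at 14:30: restore the environment as it was at the last prior snapshot.
print(snapshot_at(datetime(2021, 3, 1, 14, 30)))  # snap-1200

# Seven hours without a new snapshot breaches an assumed six-hour SLO.
print(slo_breached(datetime(2021, 3, 2, 1, 0), timedelta(hours=6)))  # True
```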

Summary

Legacy backup and recovery systems are inadequate for Cyber Resilience. Organizations need to rethink their capabilities for recovering entire application environments from cyber disasters such as ransomware attacks. As they move more workloads to cloud platforms, they are in a unique position to shed their technical debt and adopt a modern, cloud-native, application-centric resilience system.