How Zurich Insurance Group built a log management solution on AWS


This post is written in collaboration with Clarisa Tavolieri, Austin Rappeport, and Samantha Gignac from Zurich Insurance Group.

The volume and number of logging sources have grown exponentially over the last few years and will continue to increase in the coming years. As a result, customers across all industries are facing multiple challenges such as:

  • Balancing storage costs against meeting long-term log retention requirements
  • Bandwidth issues when moving logs between the cloud and on premises
  • Resource scaling and performance issues when trying to analyze massive amounts of log data
  • Keeping pace with growing storage requirements, while also being able to provide insights from the data
  • Aligning license costs for Security Information and Event Management (SIEM) vendors with log processing, storage, and performance requirements. SIEM solutions help you implement real-time reporting by monitoring your environment for security threats and alerting on threats once detected.

Zurich Insurance Group (Zurich) is a leading multi-line insurer providing property, casualty, and life insurance solutions globally. In 2022, Zurich began a multi-year program to accelerate their digital transformation and innovation through the migration of 1,000 applications to AWS, including core insurance and SAP workloads.

The Zurich Cyber Fusion Center management team faced similar challenges, such as balancing ingestion licensing costs against long-term retention requirements for both business application log and security log data within the existing SIEM architecture. Zurich wanted to identify a log management solution to work in conjunction with their existing SIEM solution. The new approach would need to offer the flexibility to integrate new technologies such as machine learning (ML), the scalability to handle long-term retention at forecasted growth levels, and options for cost optimization. In this post, we discuss how Zurich built a hybrid architecture on AWS incorporating AWS services to satisfy their requirements.

Solution overview

Zurich and AWS Professional Services collaborated to build an architecture that addressed decoupling long-term storage of logs, distributing analytics and alerting capabilities, and optimizing storage costs for log data. The solution was based on categorizing and prioritizing log data into priority levels 1–3, and routing logs to different destinations based on priority. The following diagram illustrates the solution architecture.

Flow of logs from source to destination. All logs are sent to Cribl which routes portions of logs to the SIEM, portions to Amazon OpenSearch, and copies of logs to Amazon S3.

The workflow steps are as follows:

  1. All of the logs (P1, P2, and P3) are collected and ingested into an extract, transform, and load (ETL) service, AWS Partner Cribl’s Stream product, in real time. Capturing and streaming of logs is configured per use case based on the capabilities of the source, such as using built-in forwarders, installing agents, using Cribl Streams, and using AWS services like Amazon Data Firehose. This ETL service performs two functions before data reaches the analytics layer:
    1. Data normalization and aggregation – The raw log data is normalized and aggregated in the required format to perform analytics. The process consists of normalizing log field names, standardizing on JSON, removing unused or duplicate fields, and compressing to reduce storage requirements.
    2. Routing mechanism – Upon completing data normalization, the ETL service applies the necessary routing mechanisms to ingest log data into the respective downstream systems based on category and priority.
  2. Priority 1 logs, such as network detection and response (NDR), endpoint detection and response (EDR), and cloud threat detection services (for example, Amazon GuardDuty), are ingested directly into the existing on-premises SIEM solution for real-time analytics and alerting.
  3. Priority 2 logs, such as operating system security logs, firewall, identity provider (IdP), email metadata, and AWS CloudTrail, are ingested into Amazon OpenSearch Service to enable the following capabilities. Previously, P2 logs were ingested into the SIEM.
    1. Systematically detect potential threats and react to a system’s state through alerting, and integrate those alerts back into Zurich’s SIEM for larger correlation, reducing the amount of data ingested into Zurich’s SIEM by approximately 85%. Eventually, Zurich plans to use ML plugins such as anomaly detection to enhance analysis.
    2. Develop log and trace analytics solutions with interactive queries and visualize results with high adaptability and speed.
    3. Reduce the average time to ingest and the average time to search in a way that accommodates the increasing scale of log data.
    4. In the future, Zurich plans to use OpenSearch’s security analytics plugin, which can help security teams quickly detect potential security threats by using over 2,200 pre-built, publicly available Sigma security rules or by creating custom rules.
  4. Priority 3 logs, such as logs from enterprise applications and vulnerability scanning tools, are not ingested into the SIEM or OpenSearch Service, but are forwarded to Amazon Simple Storage Service (Amazon S3) for storage. These can be queried as needed using one-time queries.
  5. Copies of all log data (P1, P2, P3) are sent in real time to Amazon S3 for highly durable, long-term storage to satisfy the following:
    1. Long-term data retention – S3 Object Lock is used to enforce data retention per Zurich’s compliance and regulatory requirements.
    2. Cost-optimized storage – Lifecycle policies automatically transition data with less frequent access patterns to lower-cost Amazon S3 storage classes. Zurich also uses lifecycle policies to automatically expire objects after a predefined period. Lifecycle policies provide a mechanism to balance the cost of storing data and meeting retention requirements.
    3. Historical data analysis – Data stored in Amazon S3 can be queried to satisfy one-time audit or analysis tasks. Eventually, this data could be used to train ML models to support better anomaly detection. Zurich has done testing with Amazon SageMaker and has plans to add this capability in the near future.
  6. One-time query analysis – Simple audit use cases require historical data to be queried based on different time intervals, which can be performed using Amazon Athena and AWS Glue analytic services. By using Athena and AWS Glue, both serverless services, Zurich can perform simple queries without the heavy lifting of running and maintaining servers. Athena supports a variety of compression formats for reading and writing data. Therefore, Zurich is able to store compressed logs in Amazon S3 to achieve cost-optimized storage while still being able to perform one-time queries on the data.
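
The normalization and routing steps in the workflow can be sketched in a few lines of Python. This is only an illustration of the priority-based routing idea, not Cribl Stream’s configuration or Zurich’s implementation. The source-to-priority mapping, the field aliases, the destination names, and the choice to treat unknown sources as priority 3 are all assumptions made for the example.

```python
import gzip
import json

# Hypothetical mapping of log source categories to priority levels,
# loosely based on the example sources named in the workflow steps.
PRIORITY_BY_SOURCE = {
    "ndr": 1, "edr": 1, "guardduty": 1,
    "os_security": 2, "firewall": 2, "idp": 2,
    "email_metadata": 2, "cloudtrail": 2,
    "business_app": 3, "vuln_scan": 3,
}

# Hypothetical field renames used to normalize field names.
FIELD_ALIASES = {"src": "source_ip", "srcip": "source_ip", "ts": "timestamp"}


def normalize(raw):
    """Lowercase and rename field names, and drop empty fields.

    Fields that map to the same normalized name collapse into one entry.
    """
    out = {}
    for key, value in raw.items():
        if value in (None, ""):
            continue
        name = key.lower()
        out[FIELD_ALIASES.get(name, name)] = value
    return out


def destinations(source):
    """Return the downstream targets for a log source.

    A copy of every log goes to S3. P1 also goes to the SIEM and P2 also
    goes to OpenSearch. Unknown sources are treated as P3 (an assumption).
    """
    priority = PRIORITY_BY_SOURCE.get(source, 3)
    targets = ["s3"]
    if priority == 1:
        targets.append("siem")
    elif priority == 2:
        targets.append("opensearch")
    return targets


def encode(record):
    """Serialize a record as JSON and compress it with gzip."""
    return gzip.compress(json.dumps(record, sort_keys=True).encode("utf-8"))
```

For example, `destinations("cloudtrail")` returns `["s3", "opensearch"]`, while a business application log is only written to S3.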

As a future capability, on-demand complex querying, analysis, and reporting on large historical datasets could be supported using Amazon OpenSearch Serverless. Also, OpenSearch Service supports zero-ETL integration with Amazon S3, where users can query their data stored in Amazon S3 using OpenSearch Service query capabilities.

The solution outlined in this post provides Zurich an architecture that supports scalability, resilience, cost optimization, and flexibility. We discuss these key benefits in the following sections.

Scalability

Given the volume of data currently being ingested, Zurich needed a solution that could satisfy existing requirements and provide room for growth. In this section, we discuss how Amazon S3 and OpenSearch Service help Zurich achieve scalability.

Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. The total volume of data and number of objects you can store in Amazon S3 are virtually unlimited. Based on its unique architecture, Amazon S3 is designed to exceed 99.999999999% (11 nines) of data durability. Additionally, Amazon S3 stores data redundantly across a minimum of three Availability Zones (AZs) by default, providing built-in resilience against widespread disaster. For example, the S3 Standard storage class is designed for 99.99% availability. For more information, check out the Amazon S3 FAQs.

Zurich uses AWS Partner Cribl’s Stream solution to route copies of all log data to Amazon S3 for long-term storage and retention, enabling Zurich to decouple log storage from their SIEM solution, a common challenge facing SIEM solutions today.

OpenSearch Service is a managed service that makes it simple to run OpenSearch without having to manage the underlying infrastructure. Zurich’s current on-premises SIEM infrastructure comprises more than 100 servers, all of which have to be operated and maintained. Zurich hopes to reduce this infrastructure footprint by 75% by offloading priority 2 and 3 logs from their existing SIEM solution.

To support geographies with restrictions on cross-border data transfer and to meet availability requirements, AWS and Zurich worked together to define an Amazon OpenSearch Service configuration that would support 99.9% availability using multiple AZs in a single Region.

OpenSearch Service supports cross-Region and cross-cluster queries, which helps with distributing analysis and processing of logs without moving data, and provides the ability to aggregate information across clusters. Because Zurich plans to deploy multiple OpenSearch domains in different Regions, they will use cross-cluster search functionality to query data seamlessly across different regional domains without moving data. Zurich also configured a connector for their existing SIEM to query OpenSearch, which further allows distributed processing from on premises, and enables aggregation of data across data sources. As a result, Zurich is able to distribute processing, decouple storage, and publish key information in the form of alerts and queries to their SIEM solution without having to ship log data.

In addition, many of Zurich’s business units have logging requirements that could also be satisfied using the same AWS services (OpenSearch Service, Amazon S3, AWS Glue, and Amazon Athena). As such, the AWS components of the architecture were templatized using Infrastructure as Code (IaC) for consistent, repeatable deployment. These components are already being used across Zurich’s business units.

Cost optimization

In thinking about optimizing costs, Zurich had to consider how they would continue to ingest 5 TB per day of security log data just for their centralized security logs. In addition, lines of business needed similar capabilities to meet requirements, which could include processing 500 GB per day.

With this solution, Zurich can control (by offloading P2 and P3 log sources) the portion of logs that is ingested into their primary SIEM solution. As a result, Zurich has a mechanism to manage licensing costs, as well as improve the efficiency of queries by reducing the amount of data the SIEM needs to parse on search.

Because copies of all log data are going to Amazon S3, Zurich is able to take advantage of the different Amazon S3 storage tiers, such as using S3 Intelligent-Tiering to automatically move data among Infrequent Access and Archive Access tiers, to optimize the cost of retaining multiple years’ worth of log data. When data is moved to the Infrequent Access tier, costs are reduced by up to 40%. Similarly, when data is moved to the Archive Instant Access tier, storage costs are reduced by up to 68%.
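
As a rough sketch of how such a policy could be expressed, the following Python function builds an S3 lifecycle configuration that moves objects under a prefix into S3 Intelligent-Tiering and expires them after a retention period. The rule ID, the prefix, and the retention value are hypothetical; Zurich’s actual rules and retention periods are not described in this post.

```python
def build_lifecycle_configuration(prefix, expire_after_days):
    """Build an S3 lifecycle configuration as a plain dictionary.

    The rule moves objects under `prefix` into S3 Intelligent-Tiering as
    soon as possible and expires them after `expire_after_days` days.
    All values passed in are illustrative assumptions.
    """
    if expire_after_days <= 0:
        raise ValueError("expire_after_days must be positive")
    return {
        "Rules": [
            {
                "ID": "log-retention",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Example with a hypothetical prefix and retention period.
config = build_lifecycle_configuration("logs/p3/", 400)
```

A dictionary of this shape can be passed as the `LifecycleConfiguration` argument of the boto3 S3 client’s `put_bucket_lifecycle_configuration` call, along with the bucket name.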

Refer to Amazon S3 pricing for current pricing, as well as for information by Region. Moving data to S3 Infrequent Access and Archive Access tiers provides a significant cost savings opportunity while meeting long-term retention requirements.
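
To make the effect of tiering concrete, the following sketch estimates a monthly storage cost from the amount of data in each access tier. The discounts are the "up to" figures quoted in this post (40% for Infrequent Access and 68% for Archive Instant Access), so the result is a best-case estimate. The base price per GB and the data volumes are placeholders, not actual Amazon S3 prices or Zurich’s figures.

```python
# Best-case discounts relative to the frequent access tier, taken from
# the "up to" figures quoted in the text.
TIER_DISCOUNT = {
    "frequent": 0.0,
    "infrequent": 0.40,
    "archive_instant": 0.68,
}


def monthly_storage_cost(gb_by_tier, base_price_per_gb):
    """Estimate the monthly storage cost for data spread across tiers.

    `gb_by_tier` maps a tier name to the number of GB stored in it.
    `base_price_per_gb` is a placeholder price for the frequent tier.
    """
    total = 0.0
    for tier, gigabytes in gb_by_tier.items():
        total += gigabytes * base_price_per_gb * (1 - TIER_DISCOUNT[tier])
    return total


# Example with placeholder volumes and a placeholder price of 1.0 per GB.
example = monthly_storage_cost(
    {"frequent": 100, "infrequent": 100, "archive_instant": 100}, 1.0
)
```

With equal volumes in each tier, the example costs about 64% of what the same data would cost if all of it stayed in the frequent access tier.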

The team at Zurich analyzed priority 2 log sources, and based on historical analytics and query patterns, determined that only the most recent 7 days of logs are typically required. Therefore, OpenSearch Service was right-sized for retaining 7 days of logs in a hot tier. Rather than configuring UltraWarm and cold storage tiers for OpenSearch Service, copies of the remaining logs were simultaneously being sent to Amazon S3 for long-term retention and could be queried using Athena.
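
A one-time audit query over the logs retained in Amazon S3 might look like the following sketch, which only builds the SQL text for a date range. It assumes a table registered in the AWS Glue Data Catalog with a `DATE` column named `event_date`. The table name, the column name, and the date range are hypothetical and would need to match the real schema.

```python
from datetime import date


def build_audit_query(table, start, end, limit=100):
    """Build an Athena SQL query for logs between two dates (inclusive).

    `table` is a hypothetical `database.table` name; only letters, digits,
    underscores, and dots are accepted to keep the SQL text safe.
    """
    if end < start:
        raise ValueError("end must not be before start")
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError("unexpected characters in table name")
    return (
        f"SELECT * FROM {table} "
        f"WHERE event_date BETWEEN DATE '{start.isoformat()}' "
        f"AND DATE '{end.isoformat()}' "
        f"LIMIT {int(limit)}"
    )


# Example with a hypothetical table and a one-week window.
query = build_audit_query(
    "security_logs.p3_logs", date(2024, 1, 1), date(2024, 1, 7)
)
```

The resulting string could then be submitted through the Athena console or the boto3 Athena client’s `start_query_execution` call.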

The combination of cost-optimization options is projected to reduce the cost per GB of log data ingested and stored for 13 months by 53% when compared to the previous approach.

Flexibility

Another key consideration for the architecture was the flexibility to integrate with existing alerting systems and data pipelines, as well as the ability to incorporate new technology into Zurich’s log management approach. For example, Zurich also configured a connector for their existing SIEM to query OpenSearch, which further allows distributed processing from on premises and enables aggregation of data across data sources.

Within the OpenSearch Service software, there are options to expand log analysis using security analytics with predefined indicators of compromise across common log types. OpenSearch Service also offers the capability to integrate with ML capabilities such as anomaly detection and alert correlation to enhance log analysis.

With the introduction of Amazon Security Lake, there is another opportunity to expand the solution to more efficiently manage AWS logging sources and add to this architecture. For example, you can use Amazon OpenSearch Ingestion to generate security insights on security data from Amazon Security Lake.

Summary

In this post, we reviewed how Zurich was able to build a log data management architecture that provided the scalability, flexibility, performance, and cost-optimization mechanisms needed to meet their requirements.

To learn more about components of this solution, visit the Centralized Logging with OpenSearch implementation guide, review Querying AWS service logs, or run through the SIEM on Amazon OpenSearch Service workshop.


About the Authors

Clarisa Tavolieri is a Software Engineering graduate with qualifications in Business, Audit, and Strategy Consulting. With an extensive career in the financial and tech industries, she specializes in data management and has been involved in initiatives ranging from reporting to data architecture. She currently serves as the Global Head of Cyber Data Management at Zurich Group. In her role, she leads the data strategy to support the protection of company assets and implements advanced analytics to enhance and monitor cybersecurity tools.

Austin Rappeport is a Computer Engineer who graduated from the University of Illinois Urbana/Champaign in 2011 with a focus in Computer Security. After graduation, he worked for the Federal Energy Regulatory Commission in the Office of Electric Reliability, working with the North American Electric Reliability Corporation’s Critical Infrastructure Protection Standards on both the audit and enforcement side, as well as standards development. Austin currently works for Zurich Insurance as the Global Head of Detection Engineering and Automation, where he leads the team responsible for using Zurich’s security tools to detect suspicious and malicious activity and improve internal processes through automation.

Samantha Gignac is a Global Security Architect at Zurich Insurance. She graduated from Ferris State University in 2014 with a Bachelor’s degree in Computer Systems & Network Engineering. With experience in the insurance, healthcare, and supply chain industries, she has held roles such as Storage Engineer, Risk Management Engineer, Vulnerability Management Engineer, and SOC Engineer. As a Cybersecurity Architect, she designs and implements secure network systems to protect organizational data and infrastructure from cyber threats.

Claire Sheridan is a Principal Solutions Architect with Amazon Web Services working with global financial services customers. She holds a PhD in Informatics and has more than 15 years of industry experience in tech. She loves traveling and visiting art galleries.

Jake Obi is a Principal Security Consultant with Amazon Web Services based in South Carolina, US, with over 20 years’ experience in information technology. He helps financial services customers improve their security posture in the cloud. Prior to joining Amazon, Jake was an Information Assurance Manager for the US Navy, where he worked on a large satellite communications program as well as hosting government websites using the public cloud.

Srikanth Daggumalli is an Analytics Specialist Solutions Architect in AWS. Out of 18 years of experience, he has over a decade of experience in architecting cost-effective, performant, and secure enterprise applications that improve customer reachability and experience, using big data, AI/ML, cloud, and security technologies. He has built high-performing data platforms for major financial institutions, enabling improved customer reach and exceptional experiences. He specializes in services like cross-border transactions and architecting robust analytics platforms.

Freddy Kasprzykowski is a Senior Security Consultant with Amazon Web Services based in Florida, US, with over 20 years’ experience in information technology. He helps customers adopt AWS services securely according to industry best practices, standards, and compliance regulations. He is a member of the Customer Incident Response Team (CIRT), helping customers during security events, a seasoned speaker at AWS re:Invent and AWS re:Inforce conferences, and a contributor to open source projects related to AWS security.
