Improve monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight


In Part 2 of this series, we discussed how to enable AWS Glue job observability metrics and integrate them with Grafana for real-time monitoring. Grafana provides powerful customizable dashboards to view pipeline health. However, to analyze trends over time, aggregate from different dimensions, and share insights across the organization, a purpose-built business intelligence (BI) tool like Amazon QuickSight may be more effective for your business. QuickSight makes it easy for business users to visualize data in interactive dashboards and reports.

In this post, we explore how to connect QuickSight to Amazon CloudWatch metrics and build graphs to uncover trends in AWS Glue job observability metrics. Analyzing historical patterns allows you to optimize performance, identify issues proactively, and improve planning. We walk through ingesting CloudWatch metrics into QuickSight using a CloudWatch metric stream and QuickSight SPICE. With this integration, you can use line charts, bar charts, and other graph types to uncover daily, weekly, and monthly patterns. QuickSight lets you perform aggregate calculations on metrics for deeper analysis. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient.

Solution overview

The following architecture diagram illustrates the workflow to implement the solution.

The workflow includes the following steps:

  1. AWS Glue jobs emit observability metrics to CloudWatch metrics.
  2. CloudWatch streams metric data through a metric stream into Amazon Data Firehose.
  3. Data Firehose uses an AWS Lambda function to transform data and ingest the transformed records into an Amazon Simple Storage Service (Amazon S3) bucket.
  4. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog.
  5. QuickSight periodically runs Amazon Athena queries to load query results to SPICE and then visualize the latest metric data.

All of the resources are defined in a sample AWS Cloud Development Kit (AWS CDK) template. You can deploy the end-to-end solution to visualize and analyze trends of the observability metrics.
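To make step 3 of the workflow more concrete, the following is a minimal sketch of what a Firehose transformation Lambda function for metric stream data could look like. It is not the function shipped in the sample template. It assumes the metric stream uses the JSON output format, where each Firehose record carries newline-delimited JSON objects, and it flattens the nested value and dimensions fields into single-level rows. The helper name flatten_metric and the dim_ column prefix are illustrative choices.

```python
import base64
import json


def flatten_metric(metric: dict) -> dict:
    """Flatten one CloudWatch metric stream JSON object into a single-level row."""
    value = metric.get("value", {})
    flat = {
        "account_id": metric.get("account_id"),
        "region": metric.get("region"),
        "namespace": metric.get("namespace"),
        "metric_name": metric.get("metric_name"),
        "timestamp": metric.get("timestamp"),
        "unit": metric.get("unit"),
        "max": value.get("max"),
        "min": value.get("min"),
        "sum": value.get("sum"),
        "count": value.get("count"),
    }
    # Promote each dimension (for example, the job name) to its own column.
    # The "dim_" prefix is an assumption made for this sketch.
    for name, dim_value in metric.get("dimensions", {}).items():
        flat[f"dim_{name.lower()}"] = dim_value
    return flat


def lambda_handler(event, context):
    """Firehose data transformation handler: decode, flatten, re-encode."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        # One Firehose record can hold several newline-delimited JSON objects.
        lines = [line for line in payload.splitlines() if line.strip()]
        rows = [json.dumps(flatten_metric(json.loads(line))) for line in lines]
        data = ("\n".join(rows) + "\n").encode("utf-8")
        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            }
        )
    return {"records": output}
```

Writing one flat JSON object per line keeps the S3 objects easy for an AWS Glue crawler to infer a table schema from, which is why the sketch avoids nested fields.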

Sample AWS CDK template

This post provides a sample AWS CDK template for a dashboard using AWS Glue observability metrics.

Typically, you have multiple accounts to manage and run resources for your data pipeline.

In this template, we assume the following accounts:

  • Monitoring account – This hosts the central S3 bucket, central Data Catalog, and QuickSight-related resources
  • Source account – This hosts individual data pipeline resources on AWS Glue and the resources to send metrics to the monitoring account

The template works even when the monitoring account and source account are the same.

This sample template consists of four stacks:

  • Amazon S3 stack – This provisions the S3 bucket
  • Data Catalog stack – This provisions the AWS Glue database, table, and crawler
  • QuickSight stack – This provisions the QuickSight data source, dataset, and analysis
  • Metrics sender stack – This provisions the CloudWatch metric stream, Firehose delivery stream, and Lambda function for transformation

Prerequisites

You should have the following prerequisites:

  • Python 3.9 or later
  • AWS accounts for the monitoring account and source account
  • An AWS named profile for the monitoring account and source account
  • The AWS CDK Toolkit 2.87.0 or later

Initialize the CDK project

To initialize the project, complete the following steps:

  1. Clone the CDK template to your workplace:
    $ git clone git@github.com:aws-samples/aws-glue-cdk-baseline.git
    
    $ cd aws-glue-cdk-baseline

  2. Create a Python virtual environment specific to the project on the client machine:
    $ python3 -m venv .venv

We use a virtual environment in order to isolate the Python environment for this project and not install software globally.

  3. Activate the virtual environment according to your OS:
    • On MacOS and Linux, use the following code:
      $ source .venv/bin/activate

    • On a Windows platform, use the following code:
      % .venv\Scripts\activate.bat

After this step, the subsequent steps run within the bounds of the virtual environment on the client machine and interact with the AWS account as needed.

  4. Install the required dependencies described in requirements.txt to the virtual environment:
    $ pip install -r requirements.txt

  5. Edit the configuration file default-config.yaml based on your environments (replace each account ID with your own):
    create_s3_stack: false
    create_metrics_sender_stack: false
    create_catalog_stack: false
    create_quicksight_stack: true
    
    s3_bucket_name: glue-observability-demo-dashboard
    
    firehose_log_group_name: /aws/kinesisfirehose/observability-demo-metric-stream
    firehose_lambda_buffer_size_mb: 2
    firehose_lambda_buffer_interval_seconds: 60
    firehose_s3_buffer_size_mb: 128
    firehose_s3_buffer_interval_seconds: 300
    
    glue_database_name: observability_demo_db
    glue_table_name: metric_data
    glue_crawler_name: observability_demo_crawler
    glue_crawler_cron_schedule: "cron(42 * * * ? *)"
    
    athena_workgroup_name: primary

Bootstrap your AWS environments

Run the following commands to bootstrap your AWS environments:

  1. In the monitoring account, provide your monitoring account number, AWS Region, and monitoring profile:
    $ cdk bootstrap aws://<MONITORING-ACCOUNT-NUMBER>/<REGION> --profile <MONITORING-PROFILE> \
    --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess

  2. In the source account, provide your source account number, Region, and source profile:
    $ cdk bootstrap aws://<SOURCE-ACCOUNT-NUMBER>/<REGION> --profile <SOURCE-PROFILE> \
    --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess

When you use only one account for all environments, you can just run the cdk bootstrap command one time.

Deploy your AWS resources

Run the following commands to deploy your AWS resources:

  1. Run the following command using the monitoring account to deploy resources defined in the AWS CDK template:
    $ cdk deploy '*' --profile <MONITORING-PROFILE>

  2. Run the following command using the source account to deploy resources defined in the AWS CDK template:
    $ cdk deploy MetricSenderStack --profile <SOURCE-PROFILE>

Configure QuickSight permissions

Initially, the new QuickSight resources including the dataset and analysis created by the AWS CDK template are not visible to you because there are no QuickSight permissions configured yet.

To make the dataset and analysis visible to you, complete the following steps:

  1. On the QuickSight console, navigate to the user menu and choose Manage QuickSight.
  2. In the navigation pane, choose Manage assets.
  3. Under Browse assets, choose Analysis.
  4. Search for GlueObservabilityAnalysis, and select it.
  5. Choose SHARE.
  6. For User or Group, select your user, then choose SHARE (1).
  7. Wait for the share to be complete, then choose DONE.
  8. On the Manage assets page, choose Datasets.
  9. Search for observability_demo.metrics_data, and select it.
  10. Choose SHARE.
  11. For User or Group, select your user, then choose SHARE (1).
  12. Wait for the share to be complete, then choose DONE.
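If you prefer to script the sharing steps, the following is a hedged sketch that grants a QuickSight user access to an analysis through the QuickSight UpdateAnalysisPermissions API with boto3. It is not part of the sample template. The analysis ID, account ID, and principal ARN are placeholders that you need to look up in your own account, the function names are illustrative, and the action list is one reasonable owner-level set rather than a requirement.

```python
# Owner-level actions for an analysis; trim this list to grant less access.
ANALYSIS_ACTIONS = [
    "quicksight:RestoreAnalysis",
    "quicksight:UpdateAnalysisPermissions",
    "quicksight:DeleteAnalysis",
    "quicksight:DescribeAnalysisPermissions",
    "quicksight:QueryAnalysis",
    "quicksight:DescribeAnalysis",
    "quicksight:UpdateAnalysis",
]


def build_analysis_grant(account_id: str, analysis_id: str, principal_arn: str) -> dict:
    """Build the request parameters for UpdateAnalysisPermissions."""
    return {
        "AwsAccountId": account_id,
        "AnalysisId": analysis_id,
        "GrantPermissions": [
            {"Principal": principal_arn, "Actions": ANALYSIS_ACTIONS},
        ],
    }


def share_analysis(account_id, analysis_id, principal_arn, profile_name=None):
    """Apply the grant using a named profile (for example, the monitoring profile)."""
    import boto3  # imported here so the request builder works without boto3 installed

    session = boto3.Session(profile_name=profile_name)
    client = session.client("quicksight")
    return client.update_analysis_permissions(
        **build_analysis_grant(account_id, analysis_id, principal_arn)
    )
```

The dataset can be shared in the same way with the update_data_set_permissions call, which takes a DataSetId and dataset-specific actions instead.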

Explore the default QuickSight analysis

Now your QuickSight analysis and dataset are visible to you. You can return to the QuickSight console and choose GlueObservabilityAnalysis under Analysis. The following screenshot shows your dashboard.

The sample analysis has two tabs: Monitoring and Insights. By default, the Monitoring tab has the following charts:

  • [Reliability] Job Run Errors Breakdown
  • [Reliability] Job Run Errors (Total)
  • [Performance] Skewness Job
  • [Performance] Skewness Job per Job

  • [Resource Utilization] Worker Utilization
  • [Resource Utilization] Worker Utilization per Job
  • [Throughput] BytesRead, RecordsRead, FilesRead, PartitionRead (Avg)
  • [Throughput] BytesWritten, RecordsWritten, FilesWritten (Avg)

  • [Resource Utilization] Disk Available GB (Min)
  • [Resource Utilization] Max Disk Used % (Max)

  • [Driver OOM] OOM Error Count
  • [Driver OOM] Max Heap Memory Used % (Max)
  • [Executor OOM] OOM Error Count
  • [Executor OOM] Max Heap Memory Used % (Max)

By default, the Insights tab has the following insights:

  • Bottom Ranked Worker Utilization
  • Top Ranked Skewness Job

  • Forecast Worker Utilization
  • Top Mover readBytes

You can add any new graph charts or insights using the observability metrics based on your requirements.

Publish the QuickSight dashboard

When the analysis is ready, complete the following steps to publish the dashboard:

  1. Choose PUBLISH.
  2. Select Publish new dashboard as, and enter GlueObservabilityDashboard.
  3. Choose Publish dashboard.

Then you can view and share the dashboard.

Visualize and analyze with AWS Glue job observability metrics

Let’s use the dashboard to make AWS Glue usage more performant.

Looking at the Skewness Job per Job visualization, there was a spike on November 1, 2023. The skewness metric of the job multistage-demo showed 9.53, which is significantly higher than the others.

Let’s drill down into details. You can choose Controls, and change filter conditions based on date time, Region, AWS account ID, AWS Glue job name, job run ID, and the source and sink of the data stores. For now, let’s filter with the job name multistage-demo.

The filtered Worker Utilization per Job visualization shows 0.5, and its minimum value was 0.16. It seems that there is room for improvement in resource utilization. This observation guides you to enable auto scaling for this job to increase the worker utilization.
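The per-job aggregation behind this kind of visualization can be sketched in a few lines of Python. This is only an illustration of the calculation, not how QuickSight computes it internally. The row layout (job_name, metric_name, value), the helper names, and the 0.6 threshold are assumptions for this example, and the sample values are made up to resemble the numbers above.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_job(rows, metric_name):
    """Return {job_name: {"avg", "min", "max"}} for one metric."""
    values = defaultdict(list)
    for row in rows:
        if row["metric_name"] == metric_name:
            values[row["job_name"]].append(row["value"])
    return {
        job: {"avg": mean(vals), "min": min(vals), "max": max(vals)}
        for job, vals in values.items()
    }


def underutilized_jobs(summary, threshold=0.6):
    """List jobs whose average worker utilization is below the threshold."""
    return sorted(job for job, stats in summary.items() if stats["avg"] < threshold)


# Hypothetical rows, shaped like flattened metric records.
rows = [
    {"job_name": "multistage-demo", "metric_name": "workerUtilization", "value": 0.16},
    {"job_name": "multistage-demo", "metric_name": "workerUtilization", "value": 0.50},
    {"job_name": "multistage-demo", "metric_name": "workerUtilization", "value": 0.84},
    {"job_name": "other-job", "metric_name": "workerUtilization", "value": 0.90},
    {"job_name": "other-job", "metric_name": "skewness", "value": 1.2},
]

summary = summarize_by_job(rows, "workerUtilization")
candidates = underutilized_jobs(summary)
```

Jobs that land in the candidates list are the ones worth reviewing for auto scaling, which mirrors the Bottom Ranked Worker Utilization insight in the sample analysis.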

Clean up

Run the following commands to clean up your AWS resources:

  1. Run the following command using the monitoring account to clean up resources:
    $ cdk destroy '*' --profile <MONITORING-PROFILE>

  2. Run the following command using the source account to clean up resources:
    $ cdk destroy MetricSenderStack --profile <SOURCE-PROFILE>

Considerations

QuickSight integration is designed for analysis and better flexibility. You can aggregate metrics based on any fields. When dealing with many jobs at once, QuickSight insights help you identify problematic jobs.

QuickSight integration is achieved with more resources in your environments. The monitoring account needs an AWS Glue database, table, crawler, and S3 bucket, and the ability to run Athena queries to visualize metrics in QuickSight. Each source account needs to have one metric stream and one Firehose delivery stream. This can incur additional costs.

All the required resources are templatized in AWS CDK.

Conclusion

In this post, we explored how to visualize and analyze AWS Glue job observability metrics on QuickSight using CloudWatch metric streams and SPICE. By connecting the new observability metrics to interactive QuickSight dashboards, you can uncover daily, weekly, and monthly patterns to optimize AWS Glue job usage. The rich visualization capabilities of QuickSight allow you to analyze trends in metrics like worker utilization, error categories, throughput, and more. Aggregating metrics and slicing data by different dimensions such as job name can provide deeper insights.

The sample dashboard showed metrics over time, top errors, and comparative job analytics. These visualizations and reports can be securely shared with teams across the organization. With data-driven insights on the AWS Glue observability metrics, you can gain a deeper understanding of performance bottlenecks, common errors, and more.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his new road bike.

Chuhan Liu is a Software Development Engineer on the AWS Glue team. He is passionate about building scalable distributed systems for big data processing, analytics, and management. In his spare time, he enjoys playing tennis.

XiaoRun Yu is a Software Development Engineer on the AWS Glue team. He is working on building new features for AWS Glue to help customers. Outside of work, Xiaorun enjoys exploring new places in the Bay Area.

Sean Ma is a Principal Product Manager on the AWS Glue team. He has a track record of more than 18 years innovating and delivering enterprise products that unlock the power of data for users. Outside of work, Sean enjoys scuba diving and college football.

Mohit Saxena is a Senior Software Development Manager on the AWS Glue team. His team focuses on building distributed systems to enable customers with interactive and simple-to-use interfaces to efficiently manage and transform petabytes of data seamlessly across data lakes on Amazon S3, databases, and data warehouses on cloud.
