Use AWS Glue Knowledge Catalog views to research information


On this publish, we present you the way to use the brand new views characteristic the AWS Glue Knowledge Catalog. SQL views are a strong object used throughout relational databases. You need to use views to lower the time to insights of knowledge by tailoring the info that’s queried. Moreover, you should use the facility of SQL in a view to precise advanced boundaries in information throughout a number of tables that may’t be expressed with easier permissions. Knowledge lakes present prospects the pliability required to derive helpful insights from information throughout many sources and lots of use instances. Knowledge customers can eat information the place they should throughout traces of enterprise, growing the speed of insights technology.

Clients use many various processing engines of their information lakes, every of which have their very own model of views with totally different capabilities. The AWS Glue Knowledge Catalog and AWS Lake Formation present a central location to handle your information throughout information lake engines.

AWS Glue has launched a new characteristic, SQL views, which lets you handle a single view object within the Knowledge Catalog that may be queried from SQL engines. You may create a single view object with a special SQL model for every engine you need to question, similar to Amazon Athena, Amazon Redshift, and Spark SQL on Amazon EMR. You may then handle entry to those sources utilizing the identical Lake Formation permissions which can be used to regulate tables within the information lake.

Resolution overview

For this publish, we use the Girls’s E-Commerce Clothes Evaluation. The target is to create views within the Knowledge Catalog so you possibly can create a single widespread view schema and metadata object to make use of throughout engines (on this case, Athena). Doing so allows you to use the identical views throughout your information lakes to suit your use case. We create a view to masks the customer_id column on this dataset, then we are going to share this view to a different person in order that they’ll question this masked view.

Stipulations

Earlier than you possibly can create a view within the AWS Glue Knowledge Catalog, just be sure you have an AWS Id and Entry Administration (IAM) position with the next configuration:

  • The next belief coverage:
    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": [
               "glue.amazonaws.com",
               "lakeformation.amazonaws.com"
            ]
          },
          "Motion": "sts:AssumeRole"
        }
      ]
    }

  • The next move position coverage:
    {
      "Model": "2012-10-17",
      "Assertion": [
        {
          "Sid": "Stmt1",
          "Action": [
            "iam:PassRole"
          ],
          "Impact": "Permit",
          "Useful resource": "*",
          "Situation": {
             "StringEquals": {
               "iam:PassedToService": [
                 "glue.amazonaws.com",
                 "lakeformation.amazonaws.com"
               ]
             }
           }
         }
       ]
    }

  • Lastly, additionally, you will want the next permissions:
    • "Glue:GetDatabase",
    • "Glue:GetDatabases",
    • "Glue:CreateTable",
    • "Glue:GetTable",
    • "Glue:UpdateTable",
    • "Glue:DeleteTable",
    • "Glue:GetTables",
    • "Glue:SearchTables",
    • "Glue:BatchGetPartition",
    • "Glue:GetPartitions",
    • "Glue:GetPartition",
    • "Glue:GetTableVersion",
    • "Glue:GetTableVersions"

Run the AWS CloudFormation template

You may deploy the AWS CloudFormation template glueviewsblog.yaml to create the Lake Formation database and desk. The dataset can be loaded into an Amazon Easy Storage Service (Amazon S3) bucket.

For step-by-step directions, discuss with Making a stack on the AWS CloudFormation console.

When the stack is full, you possibly can see a desk referred to as clothing_parquet on the Lake Formation console, as proven within the following screenshot.

Create a view on the Athena console

Now that you’ve your Lake Formation managed desk, you possibly can open the Athena console and create a Knowledge Catalog view. Full the next steps:

  1. Within the Athena question editor, run the next question on the Parquet dataset:
SELECT * FROM "clothing_reviews"."clothing_parquet" restrict 10;

Within the question outcomes, the customer_id column is at the moment seen.

Subsequent, you create a view referred to as hidden_customerID and masks the customer_id column.

  1. Create a view referred to as hidden_customerID:
CREATE PROTECTED MULTI DIALECT VIEW clothing_reviews.hidden_customerid SECURITY DEFINER AS 
SELECT * FROM clothing_reviews.clothing_parquet

Within the following screenshot, you possibly can see a view referred to as hidden_customerID was efficiently created.

  1. Run the next question to masks the primary 4 characters of the customer_id column for the newly generated view:
ALTER VIEW clothing_reviews.hidden_customerid UPDATE DIALECT AS
SELECT '****' || substring(customer_id, 4) as customer_id,clothing_id,age,title,review_text,score,recommend_ind,positive_feedback,division_name,department_name,class_name 
FROM clothing_reviews.clothing_parquet

You may see within the following screenshot that the view hidden_customerID has the customer_id column’s first 4 characters masked.

The unique desk clothing_parquet stays the identical unmasked.

Grant entry of the view to a different person to question

Knowledge Catalog views help you use Lake Formation to regulate entry. On this step, you grant this view to a different person referred to as amazon_business_analyst after which question from that person.

  1. Sign up to the Lake Formation console as admin.
  2. Within the navigation pane, select Views.

As proven within the following screenshot, you possibly can see the hidden_customerid view.

  1. Sign up because the amazon_business_analyst person and navigate to the Views web page.

This person has no visibility to the view.

  1. Grant permission to the amazon_business_analyst person from the info lake admin.
  1. Sign up once more as amazon_business_analyst and navigate to the Views web page.

  1. On the Athena console, question the hidden_customerid view.

You’ve efficiently shared a view to the person and queried it from the Athena console.

Clear up

To keep away from incurring future costs, delete the CloudFormation stack. For directions, discuss with Deleting a stack on the AWS CloudFormation console.

Conclusion

On this publish, we demonstrated the way to use the AWS Glue Knowledge Catalog to create views. We then confirmed the way to alter the views and masks the info. You may share the view with totally different customers to question utilizing Athena. For extra details about this new characteristic, discuss with Utilizing AWS Glue Knowledge Catalog views.


In regards to the Authors

Leonardo Gomez is a Principal Analytics Specialist Options Architect at AWS. He has over a decade of expertise in information administration, serving to prospects across the globe deal with their enterprise and technical wants. Join with him on LinkedIn

Michael Chess – is a Product Supervisor on the AWS Lake Formation workforce based mostly out of Palo Alto, CA. He focuses on permissions and information catalog options within the information lake.

Derek Liu – is a Senior Options Architect based mostly out of Vancouver, BC. He enjoys serving to prospects resolve massive information challenges via AWS analytic companies.

Recent Articles

Related Stories

Leave A Reply

Please enter your comment!
Please enter your name here

Stay on op - Ge the daily news in your inbox