5 Tasks to Automate Using Scheduled Query Lambdas in Rockset


Why and what to automate

As application developers and architects, whenever we see repeating tasks, we immediately think about how to automate them. This simplifies our daily work and allows us to be more efficient and focused on delivering value to the business.


scheduled-query-lambda-meme

Typical examples of repeating tasks include scaling compute resources to optimize their utilization from a cost and performance perspective, sending automated emails or Slack messages with results of a SQL query, materializing views or doing periodic copies of data for development purposes, exporting data to S3 buckets, and so on.

How Rockset helps with automation

Rockset provides a set of powerful features to help automate common tasks in building and managing data solutions:

  • a rich set of APIs so that every aspect of the platform can be controlled via REST
  • Query Lambdas – REST API wrappers around your parametrized SQL queries, hosted on Rockset
  • scheduling of Query Lambdas – a recently released feature where you can create schedules for automatic execution of your query lambdas and post results of those queries to webhooks
  • compute-compute separation (along with a shared storage layer) which allows isolation and independent scaling of compute resources

Let’s dive deeper into why these features are helpful for automation.

Rockset APIs allow you to interact with all of your resources – from creating integrations and collections, to creating virtual instances, resizing, pausing and resuming them, to running query lambdas and plain SQL queries.
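For example, everything you can do in the console can also be scripted against the REST API. Here is a minimal sketch that runs a plain SQL query via the API – the regional base URL, API key and SQL are placeholders you would swap for your own:

import requests

API_KEY = "YOUR_API_KEY"                     # placeholder API key
BASE_URL = "https://api.usw2a1.rockset.com"  # use your region's API server

# Run an ad hoc SQL query through the Rockset REST API.
response = requests.post(
    f"{BASE_URL}/v1/orgs/self/queries",
    headers={"Authorization": f"ApiKey {API_KEY}"},
    json={"sql": {"query": 'SELECT COUNT(*) AS c FROM "Demo-Ecommerce".ShopEvents'}},
)
response.raise_for_status()
print(response.json()["results"])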

Query Lambdas offer a nice and easy-to-use way to decouple consumers of data from the underlying SQL queries, so that you can keep your business logic in one place, with full source control, versioning and hosting on Rockset.

Scheduled execution of query lambdas lets you create cron schedules that will automatically execute query lambdas and optionally post the results of those queries to webhooks. These webhooks can be hosted externally to Rockset (to further automate your workflow, for example to write data back to a source system or send an email), but you can also call Rockset APIs and perform tasks like virtual instance resizing or even creating or resuming a virtual instance.

Compute-compute separation allows you to have dedicated, isolated compute resources (virtual instances) per use case. This means you can independently scale and size your ingestion VI and multiple secondary VIs that are used for querying data. Rockset is the first real-time analytics database to offer this feature.

With the combination of these features, you can automate everything you need (except maybe brewing your coffee)!

Typical use cases for automation

Let’s now have a look at typical use cases for automation and show how you would implement them in Rockset.

Use case 1: Sending automated alerts

Oftentimes, there are requirements to send automated alerts throughout the day with results of SQL queries. These can be either business related (like common KPIs that the business is interested in) or more technical (like finding out how many queries ran slower than 3 seconds).

Using scheduled query lambdas, we can run a SQL query against Rockset and post the results of that query to an external endpoint such as an email provider or Slack.

Let’s look at an e-commerce example. We have a collection called ShopEvents with raw real-time events from a webshop. Here we track every click on every product in our webshop, and then ingest this data into Rockset via Confluent Cloud. We’re interested in knowing how many items were sold on our webshop today, and we want to send this data via email to our business users every six hours.


scheduled-query-lambda-use-case-1

We’ll create a query lambda with the following SQL query on our ShopEvents collection:

SELECT
    COUNT(*) AS ItemsSold
FROM
    "Demo-Ecommerce".ShopEvents
WHERE
    Timestamp >= CURRENT_DATE() AND EventType = 'Checkout';

We’ll then use SendGrid to send an email with the results of that query. We won’t go through the steps of setting up SendGrid; you can follow their documentation for that.

Once you’ve got an API key from SendGrid, you can create a schedule for your query lambda like this, with a cron schedule of 0 */6 * * * for every 6 hours:


scheduled-query-lambda-use-case-1a

This will call the SendGrid REST API every 6 hours and trigger sending an email with the total number of sold items that day.

{{QUERY_ID}} and {{QUERY_RESULTS}} are template values that Rockset provides automatically for scheduled query lambdas, so that you can use the ID of the query and the resulting dataset in your webhook calls. In this case, we’re only interested in the query results.
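The webhook configuration boils down to three things: SendGrid’s mail-send URL, a bearer authorization header, and a JSON payload that embeds the template values. A minimal sketch of testing that same payload directly against SendGrid – the sender, recipient and subject are placeholders you’d replace:

import requests

SENDGRID_API_KEY = "YOUR_SENDGRID_API_KEY"  # placeholder

# The same JSON body you would paste into the schedule's webhook payload;
# in Rockset, "{{QUERY_RESULTS}}" is substituted automatically at run time.
payload = {
    "personalizations": [{"to": [{"email": "business@example.com"}]}],
    "from": {"email": "alerts@example.com"},
    "subject": "Items sold today",
    "content": [{"type": "text/plain", "value": "{{QUERY_RESULTS}}"}],
}

# Calling SendGrid directly like this is a handy way to verify the payload
# shape before wiring it into the schedule.
resp = requests.post(
    "https://api.sendgrid.com/v3/mail/send",
    headers={"Authorization": f"Bearer {SENDGRID_API_KEY}"},
    json=payload,
)
print(resp.status_code)  # 202 means SendGrid accepted the message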

After enabling this schedule, this is what you’ll get in your inbox:


scheduled-query-lambda-use-case-1b

You could do the same with the Slack API or any other provider that accepts POST requests and Authorization headers, and you’ve got your automated alerts set up!

If you’re interested in sending alerts for slow queries, have a look at setting up Query Logs, where you can see a list of historical queries and their performance.

Use case 2: Creating materialized views or development datasets

Rockset supports automatic real-time rollups on ingestion for some data sources. However, if you need to create additional materialized views with more complex logic, or if you need a copy of your data for other purposes (like archival, development of new features, etc.), you can do it periodically by using an INSERT INTO scheduled query lambda. INSERT INTO is a nice way to insert the results of a SQL query into an existing collection (it could be the same collection or a completely different one).

Let’s again look at our e-commerce example. We have a data retention policy set on our ShopEvents collection, so that events older than 12 months automatically get removed from Rockset.


scheduled-query-lambda-use-case-2a

However, for sales analytics purposes, we want to keep a copy of specific events – those where the event was a product order. For this, we’ll create a new collection called OrdersAnalytics without any data retention policy. We’ll then periodically insert data into this collection from the raw events collection before the data gets purged.


scheduled-query-lambda-use-case-2

We can do this by creating a SQL query that will get all Checkout events for the previous day:

INSERT INTO "Demo-Ecommerce".OrdersAnalytics
SELECT
    e.EventId AS _id,
    e.Timestamp,
    e.EventType,
    e.EventDetails,
    e.GeoLocation
FROM
    "Demo-Ecommerce".ShopEvents e
WHERE
    e.Timestamp BETWEEN CURRENT_DATE() - DAYS(1) AND CURRENT_DATE()
    AND e.EventType = 'Checkout';

Note the _id field we’re using in this query – this will ensure that we don’t get any duplicates in our orders collection. Check out how Rockset automatically handles upserts here.

Then we create a query lambda with this SQL query, and create a schedule to run it once a day at 1 AM, with a cron schedule of 0 1 * * *. We don’t need to do anything with a webhook, so that part of the schedule definition is empty.


scheduled-query-lambda-use-case-2b

That’s it – now we’ll have daily product orders saved in our OrdersAnalytics collection, ready for use.

Use case 3: Periodic exporting of data to S3

You can use scheduled query lambdas to periodically execute a SQL query and export the results of that query to a destination of your choice, such as an S3 bucket. This is useful for scenarios where you need to export data regularly, such as backing up data, creating reports or feeding data into downstream systems.

In this example, we’ll again work with our e-commerce dataset, and we’ll leverage AWS API Gateway to create a webhook that our query lambda can call to export the results of a query into an S3 bucket.


scheduled-query-lambda-use-case-3

Similar to our previous example, we’ll write a SQL query to get all events from the previous day, join that with product metadata, and save this query as a query lambda. This is the dataset we want to periodically export to S3.

SELECT
    e.Timestamp,
    e.EventType,
    e.EventDetails,
    e.GeoLocation,
    p.ProductName,
    p.ProductCategory,
    p.ProductDescription,
    p.Price
FROM
    "Demo-Ecommerce".ShopEvents e
    INNER JOIN "Demo-Ecommerce".Products p ON e.EventDetails.ProductID = p._id
WHERE
    e.Timestamp BETWEEN CURRENT_DATE() - DAYS(1) AND CURRENT_DATE();

Next, we’ll need to create an S3 bucket and set up AWS API Gateway with an IAM role and policy so that the API gateway can write data to S3. In this blog, we’ll focus on the API gateway part – be sure to check the AWS documentation on how to create an S3 bucket and the IAM role and policy.

Follow these steps to set up AWS API Gateway so it’s ready to communicate with our scheduled query lambda:

  1. Create a REST API application in the AWS API Gateway service; we can call it rockset_export:


scheduled-query-lambda-use-case-3a

  2. Create a new resource which our query lambdas will use; we’ll call it webhook:


scheduled-query-lambda-use-case-3b

  3. Create a new POST method using the settings below – this essentially allows our endpoint to communicate with an S3 bucket called rockset_export:


scheduled-query-lambda-use-case-3c

  • AWS Region: region of your S3 bucket
  • AWS Service: Simple Storage Service (S3)
  • HTTP method: PUT
  • Action Type: Use path override
  • Path override (optional): rockset_export/{query_id} (replace with your bucket name)
  • Execution role: arn:aws:iam::###:role/rockset_export (replace with your role ARN)

  4. Set up URL Path Parameters and Mapping Templates for the Integration Request – this will extract a parameter called query_id from the body of the incoming request (we’ll use this as the name for files saved to S3) and query_results, which we’ll use for the contents of the file (this is the result of our query lambda):


scheduled-query-lambda-use-case-3d

Once that’s done, we can deploy our API Gateway to a stage, and we’re ready to call this endpoint from our scheduled query lambda.
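To sanity-check the integration before handing it over to Rockset, you can call the deployed endpoint yourself. A minimal sketch, assuming a hypothetical invoke URL and sample values:

import requests

# Hypothetical invoke URL of the deployed stage – replace with your own.
WEBHOOK_URL = "https://abc123.execute-api.us-west-2.amazonaws.com/prod/webhook"

# The same shape of body that the scheduled query lambda will send:
# query_id becomes the S3 object name, query_results its contents.
body = {
    "query_id": "test-export-001",
    "query_results": [{"Timestamp": "2023-01-01T00:00:00Z", "EventType": "Checkout"}],
}

resp = requests.post(WEBHOOK_URL, json=body)
print(resp.status_code)
# On success, an object named test-export-001 should appear in the
# rockset_export S3 bucket.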

Let’s now configure the schedule for our query lambda. We can use a cron schedule of 0 2 * * * so that our query lambda runs at 2 AM in the morning and produces the dataset we need to export. We’ll call the webhook we created in the previous steps, and we’ll supply query_id and query_results as parameters in the body of the POST request:


scheduled-query-lambda-use-case-3e

We’re using {{QUERY_ID}} and {{QUERY_RESULTS}} in the payload configuration and passing them to the API Gateway, which will use them when exporting to S3 as the name of the file (the ID of the query) and its contents (the result of the query), as described in step 4 above.
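For reference, the webhook payload in the schedule configuration can be as simple as the following (shown here as a Python dict for readability):

# Webhook payload template for the schedule – Rockset substitutes both
# template values automatically on every scheduled execution.
payload_template = {
    "query_id": "{{QUERY_ID}}",
    "query_results": "{{QUERY_RESULTS}}",
}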

Once we save this schedule, we have an automated job that runs every morning at 2 AM, grabs a snapshot of our data and sends it to an API Gateway webhook, which exports it to an S3 bucket.

Use case 4: Scheduled resizing of virtual instances

Rockset has support for auto-scaling virtual instances, but if your workload has predictable or well-understood usage patterns, you can benefit from scaling your compute resources up or down based on a set schedule.

That way, you can optimize both spend (so that you don’t over-provision resources) and performance (so that you’re ready with more compute power when your users want to use the system).

An example could be a B2B use case where your customers work primarily during business hours, let’s say 9 AM to 5 PM on work days, and so you need more compute resources during those times.

To handle this use case, you can create a scheduled query lambda that will call Rockset’s virtual instance endpoint and scale it up and down based on a cron schedule.


scheduled-query-lambda-use-case-4

Follow these steps:

  1. Create a query lambda with just a select 1 query, since we don’t actually need any specific data for this to work.
  2. Create a schedule for this query lambda. In our case, we want to execute once a day at 9 AM, so our cron schedule will be 0 9 * * *, and we’ll set an unlimited number of executions so that it runs every day indefinitely.
  3. We’ll call the update virtual instance webhook for the specific VI that we want to scale up. We need to supply the virtual instance ID in the webhook URL, the authentication header with the API key (it needs permissions to edit the VI) and the parameter with NEW_SIZE set to something like MEDIUM or LARGE in the body of the request, as in the sketch after this list.
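The webhook call itself looks roughly like this – a minimal sketch, assuming a regional API server and a placeholder virtual instance ID; double-check the endpoint path and parameter name against Rockset’s current API reference:

import requests

API_KEY = "YOUR_API_KEY"                     # needs permissions to edit the VI
BASE_URL = "https://api.usw2a1.rockset.com"  # use your region's API server
VI_ID = "YOUR_VIRTUAL_INSTANCE_ID"           # placeholder

# Scale the virtual instance up; the scale-down schedule would send
# a smaller size such as "SMALL" instead.
resp = requests.post(
    f"{BASE_URL}/v1/orgs/self/virtualinstances/{VI_ID}",
    headers={"Authorization": f"ApiKey {API_KEY}"},
    json={"new_size": "MEDIUM"},
)
resp.raise_for_status()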


scheduled-query-lambda-use-case-4a

We can repeat steps 1-3 to create a new schedule for scaling the VI down, changing the cron schedule to something like 5 PM and using a smaller size for the NEW_SIZE parameter.

Use case 5: Setting up data analyst environments

With Rockset’s compute-compute separation, it’s easy to spin up dedicated, isolated and scalable environments for your ad hoc data analysis. Each use case can have its own virtual instance, ensuring that a production workload stays stable and performant, with the best price-performance for that workload.

In this scenario, let’s assume we have data analysts or data scientists who want to run ad hoc SQL queries to explore data and work on various data models as part of a new feature the business wants to roll out. They need access to collections and they need compute resources, but we don’t want them to create or scale these resources on their own.

To cater to this requirement, we can create a new virtual instance dedicated to data analysts, ensure that they can’t edit or create VIs by creating a custom RBAC role and assigning analysts to that role, and then create a scheduled query lambda that will resume the virtual instance every morning so that data analysts have an environment ready when they log into the Rockset console. We could even couple this with use case 2 and create a daily snapshot of production into a separate collection and have the analysts work on that dataset from their virtual instance.


scheduled-query-lambda-use-case-5

The steps for this use case are similar to the one where we scale the VIs up and down:

  1. Create a query lambda with just a select 1 query, since we don’t actually need any specific data for this to work.
  2. Create a schedule for this query lambda, let’s say daily at 8 AM Monday to Friday, and we’ll limit it to 10 executions because we want this to only run for the next 2 working weeks. Our cron schedule will be 0 8 * * 1-5.
  3. We’ll call the resume VI endpoint. We need to supply the virtual instance ID in the webhook URL and the authentication header with the API key (it needs permissions to resume the VI). We don’t need any parameters in the body of the request – see the sketch after this list.
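A minimal sketch of that resume call, again assuming a regional API server and a placeholder virtual instance ID; verify the endpoint path against Rockset’s current API reference:

import requests

API_KEY = "YOUR_API_KEY"                     # needs permissions to resume the VI
BASE_URL = "https://api.usw2a1.rockset.com"  # use your region's API server
VI_ID = "ANALYST_VIRTUAL_INSTANCE_ID"        # placeholder

# Resume the suspended analyst virtual instance; no request body is needed.
resp = requests.post(
    f"{BASE_URL}/v1/orgs/self/virtualinstances/{VI_ID}/resume",
    headers={"Authorization": f"ApiKey {API_KEY}"},
)
resp.raise_for_status()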


scheduled-query-lambda-use-case-5a

That’s it! We now have a working environment for our data analysts and data scientists that’s up and running for them every work day at 8 AM. We can edit the VI to either auto-suspend after a certain number of hours, or we can have another scheduled execution which will suspend the VIs on a set schedule.

As demonstrated above, Rockset provides a set of useful features to automate common tasks in building and maintaining data solutions. The rich set of APIs combined with the power of query lambdas and scheduling allows you to implement and automate workflows that are completely hosted and running in Rockset, so that you don’t need to rely on third-party components or set up infrastructure to automate repeating tasks.

We hope this blog gave you a few ideas on how to do automation in Rockset. Give this a try and let us know how it works!


