This is a guest blog post co-authored with Atul Khare and Bhupender Panwar from Salesforce.
Headquartered in San Francisco, Salesforce, Inc. is a cloud-based customer relationship management (CRM) software company building artificial intelligence (AI)-powered business applications that help businesses connect with their customers in new and personalized ways.
The Salesforce Trust Intelligence Platform (TIP) log platform team is responsible for the data pipeline and data lake infrastructure, providing log ingestion, normalization, persistence, search, and detection capabilities to keep Salesforce protected from threat actors. It runs various services that facilitate investigation, mitigation, and containment for security operations. The TIP team is critical to securing Salesforce's infrastructure, detecting malicious threat activity, and responding to security events in a timely manner. It achieves this by collecting and inspecting petabytes of security logs across dozens of organizations, some with thousands of accounts.
In this post, we discuss how the Salesforce TIP team optimized their architecture using Amazon Web Services (AWS) managed services to achieve better scalability, cost, and operational efficiency.
Bird's-eye view of the current TIP architecture and the scale of the platform
The main key performance indicator (KPI) for the TIP platform is its ability to ingest a high volume of security logs from a variety of Salesforce internal systems in real time and process them with high velocity. The platform ingests more than 1 PB of data per day, more than 10 million events per second, and more than 200 different log types. The platform ingests log data in JSON, text, and Common Event Format (CEF) formats.
The message bus in TIP's current architecture primarily uses Apache Kafka to ingest the different log types coming from the upstream systems. Kafka had a single topic for all the log types before they were consumed by different downstream applications, including Splunk, Streaming Search, and Log Normalizer. The normalized Parquet logs are stored in an Amazon Simple Storage Service (Amazon S3) data lake and cataloged into a Hive Metastore (HMS) on an Amazon Relational Database Service (Amazon RDS) instance based on S3 event notifications. The data lake consumers then use Apache Presto running on an Amazon EMR cluster to perform ad hoc queries. Other teams, including the data science and machine learning teams, use the platform to detect, analyze, and contain security threats.
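To make the single-topic pattern concrete, the following is a minimal sketch, not Salesforce's actual code, of a downstream application reading the shared Kafka topic and filtering client-side for the log types it needs. The topic name, broker address, consumer group, and the log_type field are illustrative assumptions, and the messages are assumed to be JSON.

```python
# Minimal sketch (not Salesforce's actual code) of a consumer on the shared topic.
# Topic, brokers, group ID, and the log_type field are illustrative assumptions.
import json
from kafka import KafkaConsumer  # pip install kafka-python


def process(event: dict) -> None:
    """Placeholder for downstream-specific handling (normalization, indexing, ...)."""
    print(event)


consumer = KafkaConsumer(
    "security-logs",                     # hypothetical single topic carrying all log types
    bootstrap_servers=["broker1:9092"],
    group_id="log-normalizer",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Every consumer receives every log type and filters client-side; parallelism
    # within the consumer group is capped by the topic's partition count.
    if event.get("log_type") in {"cloudtrail", "vpc_flow"}:
        process(event)
```

Because each consumer group reads the full stream from one topic, its parallelism is bounded by that topic's partition count, which is one of the constraints described in the next section.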
Challenges with the current TIP log platform architecture
Some of the main challenges that TIP's current architecture was facing include:
- Heavy operational overhead and maintenance cost of managing the Kafka cluster
- High cost to serve (CTS) to meet growing business needs
- Compute threads limited by the number of topic partitions
- Difficulty scaling out when traffic increases
- Weekly patching creating consumer lag
- Challenges with HMS scalability
All these challenges motivated the TIP team to embark on a journey to create a more optimized platform that is easier to scale, with less operational overhead and a lower CTS.
New TIP log platform architecture
The Salesforce TIP log platform engineering team, in collaboration with AWS, started building the new architecture to replace the Kafka-based message bus solution with the fully managed AWS messaging and notification services Amazon Simple Queue Service (Amazon SQS) and Amazon Simple Notification Service (Amazon SNS). In the new design, the upstream systems send their logs to a central Amazon S3 storage location, which invokes a process to partition the logs and store them in an S3 data lake. Consumer applications such as Splunk get the messages delivered to their systems through Amazon SQS. Similarly, the partitioned log data, by way of Amazon SQS events, initiates a log normalization process that delivers the normalized log data to open source Delta Lake tables in the S3 data lake. One of the major changes in the new architecture is the use of the AWS Glue Data Catalog to replace the earlier Hive Metastore. The ad hoc analysis applications use Apache Trino on an Amazon EMR cluster to query the Delta tables cataloged in AWS Glue. Other consumer applications also read the data from the S3 data lake files stored in Delta table format. More details on some of the important processes follow.
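The boto3 sketch below shows one way the S3-to-SNS-to-SQS fan-out at the heart of this design can be wired up. It is our illustration under assumed bucket, topic, queue, and prefix names, not Salesforce's deployment code, and it omits the IAM role, SNS topic policy, and SQS access policy a real setup needs.

```python
# Minimal sketch (illustrative names; policies omitted): fan out S3 "object created"
# notifications through SNS to per-consumer SQS queues.
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="tip-raw-logs")["TopicArn"]

# One queue per downstream consumer (for example, Splunk ingestion or the log partitioner).
queue_url = sqs.create_queue(QueueName="tip-log-partitioner")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Subscribe the queue to the topic; raw message delivery keeps the S3 event payload as-is.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=queue_arn,
    Attributes={"RawMessageDelivery": "true"},
)

# Publish S3 object-created events for the central landing prefix to the topic.
s3.put_bucket_notification_configuration(
    Bucket="tip-central-log-landing",
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "raw/"}]}
                },
            }
        ]
    },
)
```

Because SNS fans the same S3 event out to every subscribed queue, adding a new consumer is just another queue and subscription rather than a change to the producers.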
Log partitioner (Spark Structured Streaming)
This service ingests logs from the Amazon S3, SNS, and SQS-based store and writes them back to S3 partitioned by log type for further downstream consumption through Amazon SNS and SQS subscriptions. This is the bronze layer of the TIP data lake.
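The following PySpark Structured Streaming sketch approximates this partitioning step under assumed paths and an illustrative schema. In the actual pipeline the arrival of new objects is signaled through SNS and SQS; here Spark's built-in file stream source stands in for that trigger to keep the example self-contained.

```python
# Minimal sketch of the bronze-layer partitioner (paths, schema, and checkpoint
# locations are illustrative assumptions, not the production configuration).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tip-log-partitioner").getOrCreate()

raw = (
    spark.readStream.format("json")
    .option("maxFilesPerTrigger", 1000)                               # bound each micro-batch
    .schema("log_type STRING, timestamp STRING, payload STRING")      # illustrative schema
    .load("s3a://tip-central-log-landing/raw/")
)

# Bronze layer: persist the raw events partitioned by log type (and date) so each
# downstream consumer reads only the slice it needs.
query = (
    raw.withColumn("ingest_date", F.to_date(F.col("timestamp")))
    .writeStream.format("parquet")
    .partitionBy("log_type", "ingest_date")
    .option("checkpointLocation", "s3a://tip-bronze/_checkpoints/partitioner/")
    .option("path", "s3a://tip-bronze/logs/")
    .start()
)
query.awaitTermination()
```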
Log normalizer (Spark structured stream)
One of many downstream shoppers of log partitioner (Splunk Ingestor is one other one), the log normalizer ingests the information from Partitioned Output S3, utilizing Amazon SNS SQS notifications, and enriches them utilizing Salesforce customized parsers and tags. Lastly, this enriched knowledge is landed within the knowledge lake on S3. That is the silver layer of the TIP knowledge lake.
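A sketch of the normalization step follows, under the same assumed paths and schema. Salesforce's custom parsers and tags are proprietary, so a couple of generic enrichment columns stand in for them here; the example also assumes the delta-spark package is available to Spark.

```python
# Minimal sketch of the silver-layer normalizer (illustrative paths, columns, and
# parser logic; assumes delta-spark is on the Spark classpath).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tip-log-normalizer").getOrCreate()

bronze = (
    spark.readStream.format("parquet")
    .schema("log_type STRING, timestamp STRING, payload STRING, ingest_date DATE")
    .load("s3a://tip-bronze/logs/")
)

# Stand-in for Salesforce's custom parsers and tagging: promote a few common
# fields out of the raw payload and stamp each record with processing metadata.
normalized = (
    bronze
    .withColumn("event_time", F.to_timestamp("timestamp"))
    .withColumn("source_system", F.get_json_object("payload", "$.source"))
    .withColumn("normalized_at", F.current_timestamp())
)

# Silver layer: normalized events written to an open source Delta Lake table on S3.
(
    normalized.writeStream.format("delta")
    .partitionBy("log_type", "ingest_date")
    .option("checkpointLocation", "s3a://tip-silver/_checkpoints/normalizer/")
    .start("s3a://tip-silver/normalized_logs/")
    .awaitTermination()
)
```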
Machine learning and other data analytics consumers (Trino, Flink, and Spark jobs)
These consumers read from the silver layer of the TIP data lake and run analytics for security detection use cases. The earlier Kafka interface is now converted to Delta streams ingestion, which completes the elimination of the Kafka bus from the TIP data pipeline.
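As an example of what replacing the Kafka interface with Delta streams looks like for a consumer, the sketch below streams from the (assumed) silver Delta table and applies a toy volume-spike check; real detections would run the security team's rules or machine learning models instead.

```python
# Minimal sketch of a downstream detection consumer reading the silver Delta table
# as a stream (table path, window sizes, and threshold are illustrative assumptions).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tip-detection-consumer").getOrCreate()

silver = spark.readStream.format("delta").load("s3a://tip-silver/normalized_logs/")

# Toy detection: flag source systems emitting an unusually high event volume in
# 5-minute windows. Stand-in for real detection rules or ML scoring.
suspicious = (
    silver
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "source_system")
    .count()
    .where(F.col("count") > 10000)
)

(
    suspicious.writeStream.outputMode("update")
    .format("console")                              # a real job would write alerts downstream
    .option("checkpointLocation", "s3a://tip-detections/_checkpoints/volume_spike/")
    .start()
    .awaitTermination()
)
```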
Advantages of the new TIP log platform architecture
The main advantages realized by the Salesforce TIP team with this new architecture using Amazon S3, Amazon SNS, and Amazon SQS include:
- Cost savings of approximately $400,000 per month
- Automatic scaling to meet growing business needs
- Zero DevOps maintenance overhead
- No mapping of partitions to compute threads
- Compute resources that can be scaled up and down independently
- A fully managed Data Catalog that reduces the operational overhead of managing HMS
Summary
In this blog post, we discussed how the Salesforce Trust Intelligence Platform (TIP) optimized its data pipeline by replacing the Kafka-based message bus solution with the fully managed AWS messaging and notification services Amazon SQS and Amazon SNS. The Salesforce and AWS teams worked together to make sure this new platform seamlessly scales to ingest more than 1 PB of data per day, more than 10 million events per second, and more than 200 different log types. Reach out to your AWS account team if you have similar use cases and need help architecting your platform to achieve operational efficiency and scale.
About the authors
Atul Khare is a Director of Engineering at Salesforce Security, where he spearheads the Security Log Platform and Data Lakehouse initiatives. He supports diverse security customers by building robust big data ETL pipelines that are elastic, resilient, and easy to use, providing uniform and consistent security datasets for threat detection and response operations, AI, forensic analysis, analytics, and compliance needs across all Salesforce clouds. Beyond his professional endeavors, Atul enjoys performing music with his band to raise funds for local charities.
Bhupender Panwar is a Big Data Architect at Salesforce and a seasoned advocate for big data and cloud computing. His background encompasses developing data-intensive applications and pipelines, solving intricate architectural and scalability challenges, and extracting valuable insights from extensive datasets across the technology industry. Outside of his big data work, Bhupender likes to hike and bike, enjoys travel, and is a great foodie.
Avijit Goswami is a Principal Solutions Architect at AWS, specializing in data and analytics. He helps AWS strategic customers build high-performing, secure, and scalable data lake solutions on AWS using AWS managed services and open source solutions. Outside of work, Avijit likes to travel, hike the San Francisco Bay Area trails, watch sports, and listen to music.
Vikas Panghal is the Principal Product Manager leading the product management team for Amazon SNS and Amazon SQS. He has deep expertise in event-driven and messaging applications and brings a wealth of knowledge and experience to his role, shaping the future of messaging services. He is passionate about helping customers build highly scalable, fault-tolerant, and loosely coupled systems. Outside of work, he enjoys spending time with his family outdoors, playing chess, and running.