How Open Will Snowflake Go at Data Cloud Summit?


Snowflake is holding its Data Cloud Summit 24 conference next week, and the company is expected to make a slew of announcements, which you will be able to find on these Datanami pages. But among the most closely watched questions is how far Snowflake will go in embracing the Apache Iceberg table format and opening itself up to external query engines. And is it possible that Snowflake could try to “out open” its rival Databricks, whose conference is the following week?

Snowflake has evolved considerably since it burst onto the scene a handful of years ago as a cloud data warehouse. Founded by engineers experienced in developing analytics databases, the company delivered a top-flight data warehouse in the cloud with full separation of compute and storage, which was truly novel at the time. Companies frustrated with Hadoop flocked to Snowflake, where they found a much more welcoming and friendly user experience.

But while Snowflake lacked the technical complexity of Hadoop, it also lacked the openness of Hadoop. That was a tradeoff that many customers were willing to make back in 2018, when frustration with Hadoop was nearing its peak. But customers perhaps are not so willing to make that tradeoff in 2024, particularly as the high cost of cloud computing has become an issue with many CFOs.

Snowflake executives initially boasted about the increased revenue that its lock-in created, but the company soon found that customers were genuinely concerned about being locked in to a proprietary format. It took concrete steps to address the lock-in in February 2022, when it announced support for Apache Iceberg, although Snowflake has yet to make Iceberg support generally available.

The battle for lakehouse market dominance is being waged atop the open table formats

Iceberg, of course, is the open table format developed at Netflix to address data correctness concerns when accessing data stored in Parquet files using multiple query engines, including Hive and Presto. The metadata managed by table formats like Iceberg provides ACID transactionality to data interactions, alleviating concerns that queries would return incorrect data.
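How table-format metadata can deliver that kind of guarantee is easier to see in a small sketch. The Python below is a toy model, not Iceberg's actual implementation or API: the names `Snapshot`, `ToyCatalog`, and `commit_append` are hypothetical, and the single-process check stands in for the atomic pointer swap a real catalog would perform. It assumes each table version is an immutable list of data files and that a write succeeds only if the table has not changed since the writer last read it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable list of data files that make up one table version."""
    snapshot_id: int
    data_files: Tuple[str, ...]


class CommitConflict(Exception):
    """Raised when another writer committed first."""


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a catalog that tracks the current snapshot."""
    snapshots: List[Snapshot] = field(default_factory=lambda: [Snapshot(0, ())])

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit_append(self, base: Snapshot, new_files: Tuple[str, ...]) -> Snapshot:
        # Optimistic check: the commit succeeds only if the table has not
        # moved past the snapshot the writer started from.
        if base.snapshot_id != self.current().snapshot_id:
            raise CommitConflict("table changed since base snapshot; retry")
        new = Snapshot(base.snapshot_id + 1, base.data_files + new_files)
        self.snapshots.append(new)
        return new


catalog = ToyCatalog()
reader_view = catalog.current()    # a query engine pins a snapshot
writer_a_base = catalog.current()  # two writers start from the same version
writer_b_base = catalog.current()

catalog.commit_append(writer_a_base, ("part-0001.parquet",))
try:
    catalog.commit_append(writer_b_base, ("part-0002.parquet",))
    conflict = False
except CommitConflict:
    conflict = True  # writer B must re-read the table and retry

print(reader_view.data_files)        # the pinned reader still sees its version
print(catalog.current().data_files)  # the table now includes writer A's file
print(conflict)
```

The point of the sketch is that readers never see a half-written table: a query works from one complete snapshot, and a competing writer either commits a whole new version or fails and retries.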

Iceberg wasn’t the first table format; that honor goes to Apache Hudi, which engineers developed at Uber to manage data in their Hadoop stack. Meanwhile, the folks at Databricks created their own table format called Delta in 2017 and named the data platform it created the data lakehouse.

Two years into the Iceberg experiment, Snowflake has some decisions to make. While it allows customers to store data in externally managed Iceberg tables, it doesn’t offer many of the data management capabilities that you would expect from a full-fledged data lakehouse, such as table partitioning, data compaction, and cleanup. Will the company announce these during Data Cloud Summit next week?

If Snowflake goes all-in on Iceberg, it would help differentiate it from Databricks, which has put its chips on Delta (although it announced some capabilities to support Iceberg and Hudi with its Universal Format unveiled last June). The community seems to be coalescing around Iceberg from a popularity point of view, which could bolster Snowflake’s position as a top-tier Iceberg-based data lakehouse.

Another question is whether Snowflake will allow external SQL engines to query Iceberg data that it manages. Snowflake’s proprietary SQL engine is highly performant when run on data stored in the original Snowflake table format, and the company has benchmark results to back that up. But Snowflake doesn’t provide a lot of options when it comes to querying data with other engines.

Snowflake offers the Snowpark API, which lets you express queries using Python, Java, and Scala, but this is designed more for data engineering and building machine learning models than for SQL query processing. It also offers an Apache Spark connector that lets you read from and write to Snowflake using Spark 3.2 through 3.4 (it also offers an Apache Kafka connector). But what customers may really want is the ability to run another SQL query engine against their data.

Snowflake returns to San Francisco for Data Cloud Summit 24

One person who will be closely watching Snowflake next week is Justin Borgman, the CEO and founder of Starburst, which does a lot of work developing the Trino query engine that forked from Presto and runs its own lakehouse offering that supports Iceberg and Trino. Borgman notes that many of the first workloads run by Netflix after creating Iceberg used Presto.

“We feel like it kind of resets the playing field,” Borgman says of the impact of Iceberg. “It’s almost like the battlefield moves from this very traditional ‘get data ingested into your proprietary database and then you have your customer locked in,’ to more of a free for all where the data is unlocked, which is undoubtedly best for customers. And then we’ll battle it out on the query engine layer, the execution layer rather than on the storage layer. And I think that’s just a really interesting development.”

Borgman is understandably eager to be able to get into Snowflake’s massive 9,800-strong customer base via Iceberg. He claims benchmark tests show Trino outperforming Snowflake’s query engine on Iceberg tables while being about one-third of the cost. Starburst has a number of big customers, such as Lyft, LinkedIn, and Netflix, using the combination of Trino and Iceberg, he says.

Snowflake could go all-in on Iceberg and open itself up to external query engines, but it could still exercise some control over customer workloads through other means. For instance, it could require that customers access data through its data catalog, Borgman predicts.

“They might try to lock you in on a few peripheral features,” he tells Datanami. “But I think at the end of the day, customers won’t tolerate that. I think Iceberg is undoubtedly in their best interests, and I think that they’re going to just start moving tons of data into Iceberg format from where they have the opportunity to choose a different query engine.”

Snowflake CEO Sridhar Ramaswamy must balance the company’s openness with growth

If Snowflake does go fully open and allows external query engines such as Trino, Presto, Dremio, and even Spark SQL to access data that it manages for customers in Iceberg tables, Snowflake customers won’t likely move all their data to Iceberg at once, says Borgman, who was a 2023 Datanami Person to Watch. They’ll likely move their lowest SLA (service level agreement) queries into Iceberg first, while keeping the more important data in Snowflake’s native format and using Snowflake’s native query engine, which is faster but more expensive.

That sets up an interesting dynamic where Snowflake could potentially be hurting its ability to generate revenues while giving customers what they want, which is more openness. But on the flip side, customers may actually reward Snowflake by moving more data into Iceberg and letting Snowflake manage it for them. That could generate higher revenues for the Bozeman, Montana company, although probably not at the same per-customer rate as if they kept all the data locked into a proprietary format. That’s a rate-of-growth factor that new Snowflake CEO Sridhar Ramaswamy must account for.

When Ramaswamy replaced Frank Slootman in February, it was expected that the company would shift some focus to AI, where it was seen as trailing its rival Databricks. The company’s April launch of its Arctic large language models (LLMs) shows the company is able to move quickly on that front, but its core competitive advantage over Databricks remains with SQL-based analytics and data warehousing workloads.

The changing nature of data warehousing presents both challenges and opportunities for Snowflake. “Basically, the data warehouse now is fully decoupled,” Borgman says. “Snowflake talked a lot about that 12 to 15 years ago, whenever they first came out, of separation of storage and compute. But the key was it was always their storage and their compute.”

Whatever Snowflake chooses to do next week, the cloud giants will undoubtedly be watching. So far, they haven’t really picked sides in the open table format war that’s being waged between Iceberg and Delta, with Hudi in a distant third (although the Apache XTable format developed by Hudi-backer Onehouse threatens to make it all moot). If the market solidifies behind Iceberg, AWS, Microsoft Azure, and Google Cloud could try to cut out the middleman by offering their own soup-to-nuts data lakehouse offering.

Borgman says this looks like a replay of the mid-2010s, when Teradata’s massive data warehousing installed base was slowly eaten into by Hadoop, but with one big difference.

“I think you’re going to see a similar kind of model play out where customers are motivated to try to reduce their data warehouse costs and using more of this lake model,” says Borgman, who was the CEO of Hadoop software vendor Hadapt when it was acquired by Teradata in 2014. “But I think one thing that’s different this time around is that the engines themselves, like Trino and [Presto], have improved dramatically over what Hive or Impala was back then.”

Related Items:

Snowflake Looks to AI to Bolster Growth

Onehouse Breaks Data Catalog Lock-In with More Openness

Teradata Acquires Revelytix, Hadapt

 
