Big Data Driven #11
Returning after a long break with a fresh batch of data-intensive news
Schema evolution in PubSub
GCP has announced the schema evolution feature for the Pub/Sub service. All schema changes can now be made easily from the GCP console. Anyone who works with an event bus regularly has run into mismatched schemas, data-reading errors on the consumer side, and the plain difficulty of managing schema changes, especially when dealing with many topics and consumers.
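When a new schema revision is committed, Pub/Sub checks that it is compatible with the previous one; for Avro schemas the simplest rule is that newly added fields must carry a default so old consumers can still read new messages. A minimal sketch of that rule in plain Python (the schemas and the helper function are illustrative, not the Pub/Sub API):

```python
import json

# Revision 1 of an illustrative Avro schema for an "order" event.
OLD_SCHEMA = json.loads("""
{"type": "record", "name": "Order", "fields": [
  {"name": "order_id", "type": "string"},
  {"name": "amount", "type": "double"}
]}
""")

# Revision 2 adds a field. To stay backward compatible,
# the added field must declare a default value.
NEW_SCHEMA = json.loads("""
{"type": "record", "name": "Order", "fields": [
  {"name": "order_id", "type": "string"},
  {"name": "amount", "type": "double"},
  {"name": "currency", "type": "string", "default": "USD"}
]}
""")

def added_fields_have_defaults(old: dict, new: dict) -> bool:
    """Check the simplest Avro evolution rule: every field present in
    the new revision but not in the old one must declare a default."""
    old_names = {f["name"] for f in old["fields"]}
    return all(
        "default" in f
        for f in new["fields"]
        if f["name"] not in old_names
    )

print(added_fields_have_defaults(OLD_SCHEMA, NEW_SCHEMA))  # True
```

In the real service you would upload the new revision (for example via the console or `gcloud pubsub schemas commit`) and Pub/Sub performs this kind of validation for you; the snippet just makes the compatibility rule concrete.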
Databricks in AI race
While I've been dragging my feet on writing about the battle of AI services like ChatGPT, Microsoft's AI-powered Bing, and Google's Bard, I've missed a lot: the AI work from Stanford and, the real icing on the cake, Databricks announcing its own LLM (large language model) called Dolly.
Developing Snowpark pipelines is now easier than ever
Snowflake has started caring not only about its core product but also about doing good things for the developer community. Template projects for the Java, Python, and Scala programming languages are now available to make developing Snowpark pipelines faster and easier. The templates include a project structure, minimal documentation, and GitHub Actions workflows.
How Oracle is catching up with rivals on embedded ML features
I somehow completely missed that Oracle added built-in ML capabilities to HeatWave (its answer to Redshift and BigQuery). But that's not all: Oracle recently announced that HeatWave, in addition to the new ML capabilities, is also targeting smaller companies' data. Such a step is not typical at all for a company like Oracle, which holds a hammer (Oracle Database) in its hands so that everything around looks like a nail. I believe competition is always a good sign, so a new player who can show something new is always welcome, especially in our domain.
Crawling through your Data Mesh with AWS
As you can see, all cloud providers are expanding their sets of services to support implementing the Data Mesh pattern. This time, AWS explained in its blog how you can organize cross-account crawling with the help of Glue Crawlers and Lake Formation. The approach is useful when each department or part of a company has a completely separate AWS account, with its own set of rules, not managed through AWS Organizations, yet the departments still need to share and use each other's data.
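The core of such a setup is a Lake Formation grant from the producer account to the consumer account, after which a Glue Crawler or query engine in the consumer account can use the shared tables. A sketch of how that grant could look, assuming boto3 (the account ID and database name are made up, and the request is only constructed here, not sent; with real credentials you would pass it to `boto3.client("lakeformation").grant_permissions(**request)`):

```python
# Producer-side grant: let a separate consumer account describe and
# select from all tables of a database. Names below are illustrative.
PRODUCER_DATABASE = "sales_db"        # database in the producer account
CONSUMER_ACCOUNT_ID = "222222222222"  # the separate consumer AWS account

def build_cross_account_grant(database: str, consumer_account: str) -> dict:
    """Build a Lake Formation grant_permissions request that shares
    every table in `database` with another AWS account."""
    return {
        # The external account itself is the principal being granted.
        "Principal": {"DataLakePrincipalIdentifier": consumer_account},
        # A table wildcard covers all tables of the database.
        "Resource": {
            "Table": {"DatabaseName": database, "TableWildcard": {}}
        },
        "Permissions": ["SELECT", "DESCRIBE"],
        # Let the consumer account re-grant within its own account,
        # e.g. to the role its Glue Crawler runs under.
        "PermissionsWithGrantOption": ["SELECT", "DESCRIBE"],
    }

request = build_cross_account_grant(PRODUCER_DATABASE, CONSUMER_ACCOUNT_ID)
print(request["Principal"])
```

On the consumer side, the shared database is then linked via a resource link and crawled or queried as if it were local; the grant-with-grant-option is what makes delegating to the crawler's role possible without touching the producer account again.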
The movement toward Data Mesh is obvious: every cloud provider now has, somewhere in its arsenal of services, at least a minimal solution for cross-domain communication in data matters. The problem itself may not seem obvious, yet on a large enterprise project there is almost a 100% probability that you will run into it.
