Big Data Driven #10

A new portion of data news is coming right up! Enjoy your reading. And your data.

Mar 24, 2023

Redshift anniversary

An informative and interesting article about the evolution of Amazon Redshift. It covers all the notable dates in the history of this cloud Data Warehouse. Some of them reflect changes in the data industry. Others reflect the evolving way we treat data: for instance, the introduction of the Data Lake, the decoupling of storage and processing, or even the integration of Machine Learning capabilities straight into the Data Warehouse. Amazon Redshift has come a long way from the announcement of the service to becoming a modern cloud Data Warehouse solution. And it does not plan to stop. Not when there are powerful competitors like BigQuery or Snowflake.

Data Project requirements

A not-so-technical but very important article about data project requirements. How are data projects different from other projects? What should we carefully take into account before starting a data project? We need to bear in mind that the most important purpose of any analytical system is to bring business value to the company. As a Data Engineer, you need to worry not only about your data pipelines or your Spark code with sophisticated aggregations, but also to ask yourself every time what value your pipeline or your code will bring to your company.

Small surprises from AWS

New minor features recently introduced by AWS that warm my heart:

When using Change Data Capture mode in Database Migration Service with an S3 sink, you can now automatically validate rows between the source database and the S3 data.
Also, when using S3 as a sink for Database Migration Service, you can now automatically run an AWS Glue crawler to identify the schema of the S3 files and create an AWS Glue Catalog record for the S3 sink. This allows you to query the migrated data with Amazon Athena right away.
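To make the two features a bit more concrete, here is a minimal sketch of the settings involved. It only builds the configuration and does not call AWS. The helper names, bucket, folder and role ARN are hypothetical. The keys `GlueCatalogGeneration` and `ValidationSettings.EnableValidation` are what I believe DMS expects, so check them against the current DMS documentation before relying on them.

```python
import json


def build_s3_target_settings(bucket_name, role_arn, bucket_folder="cdc"):
    """Build a dict shaped like the S3Settings argument of a DMS S3 endpoint.

    Hypothetical helper: all argument values come from the caller.
    """
    return {
        "BucketName": bucket_name,
        "BucketFolder": bucket_folder,
        "ServiceAccessRoleArn": role_arn,
        "DataFormat": "parquet",
        # Assumption: this flag asks DMS to register the S3 data in the Glue Catalog.
        "GlueCatalogGeneration": True,
    }


def build_task_settings():
    """Build the task settings JSON string with data validation switched on."""
    settings = {
        # Assumption: this section enables row validation for the task.
        "ValidationSettings": {"EnableValidation": True},
    }
    return json.dumps(settings)


s3_settings = build_s3_target_settings(
    bucket_name="example-migration-bucket",                       # hypothetical
    role_arn="arn:aws:iam::123456789012:role/example-dms-role",  # hypothetical
)
task_settings = build_task_settings()
```

With boto3, such values would typically be passed as `S3Settings` to the DMS client's `create_endpoint` call and as `ReplicationTaskSettings` to `create_replication_task`.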

Being a Data Engineer for the past 10 years

A comic retrospective article about what it was like to be a Data Engineer starting in 2013. An easy-to-read text filled with humor, in two acts.


Meta's ambitious project

Meta's engineering team has been developing a very ambitious project called Velox. It is a unified execution engine for popular data processing frameworks like Presto or Apache Spark. Velox optimizes computation at the worker level and is written in C++, unlike other engines, which are usually written in Java. There is not much information about the framework yet, and it is still in early beta. But Meta has passed ownership of the project to the open-source community.
