Big Data Driven #10
A new portion of data news is coming right up! Enjoy your reading. And your data.
Redshift anniversary
An informative and interesting article about the evolution of Amazon Redshift. It lists all the notable dates in the history of the cloud Data Warehouse. Some of them are tied to changes in the data industry as a whole. Others reflect the evolving way we treat data: the introduction of Data Lakes, the decoupling of storage and processing, or even the integration of Machine Learning capabilities straight into the Data Warehouse. Amazon Redshift has come a long, long way from the announcement of the service to becoming a modern cloud Data Warehouse solution. And it does not plan to stop, not with powerful competitors like BigQuery or Snowflake around.
Data Project requirements
A not-so-technical but very important article about data project requirements. How are data projects different from other projects? What should we carefully take into account before starting one? We need to bear in mind that the most important purpose of any analytical system is to bring business value to the company. As a Data Engineer, you should not only worry about your data pipelines or your Spark code with sophisticated aggregations. You should also ask yourself every time what value your pipeline or your code will bring to your company.
Small surprises from AWS
New minor features recently introduced by AWS that warm my heart:
When using Change Data Capture (CDC) mode in Database Migration Service with an S3 sink, you can now automatically validate rows between the source database and the data written to S3.
Also, when using S3 as a sink for Database Migration Service, you can now automatically run an AWS Glue crawler to detect the schema of the S3 files and create an AWS Glue Data Catalog entry for the S3 sink. This lets you query the migrated data with Amazon Athena right away.
Being a Data Engineer for the past 10 years
A comic retrospective on what it has been like to be a Data Engineer since 2013. An easy read, full of humor, in two acts.
Meta's ambitious project
The Meta engineering team has been developing a very ambitious project called Velox: a unified execution engine for popular data processing frameworks like Presto or Apache Spark. Velox optimizes computations at the worker level and is written in C++, unlike other engines, which are usually written in Java. There is not much information about the framework yet, and it is still in early beta. Still, Meta has already handed ownership of the project over to the open-source community.
