Data
DuckDB

In this post, I want to explore the features and capabilities of DuckDB, an open-source, in-process SQL OLAP database management system written in C++11 that has been gaining popularity recently. According to what people have said, DuckDB is designed to be easy to use and flexible, allowing you to run complex queries on relational datasets using either local, file-based DuckDB instances or the cloud service MotherDuck.

DuckDB
Data
Airflow control the parallelism and concurrency (draw)

How to control parallelism and concurrency

Airflow control the parallelism and concurrency (draw)
Rust 🦀
Fossil Data Platform Rewritten in Rust 🦀

My data engineering team at Fossil recently released some of Rust-based components of our Data Platform after faced performance and maintenance challenges of the old Python codebase. I would like to share the insights and lessons learned during the process of migrating Fossil's Data Platform from Python to Rust.

Fossil Data Platform Rewritten in Rust 🦀
Data
Running Spark in GitHub Actions

This post provides a quick and easy guide on how to run Apache Spark in GitHub Actions for testing purposes

Running Spark in GitHub Actions