To schedule a Python script or Python function in Airflow, we use `PythonOperator`.
The problem with running Spark on Kubernetes is the logs go away once the job completes. Spark has tool called the Spark History Server that provides a UI for your past Spark jobs. In this post, I will show you how to use Spark History Server on Kubernetes.
Spark can run on clusters managed by Kubernetes. This feature makes use of native Kubernetes scheduler that has been added to Spark.
In order to make Airflow Webserver stateless, Airflow >=1.10.7 supports DAG Serialization and DB Persistence.
Normally, with BigQuery as a data source of Data Studio, users (of Data Studio Dashboard) might end up generating a lot of queries on your behalf — and that means you can end up with a huge BigQuery bill. It’s taken so long to refresh data when you change something in development mode. How to solve this problem with Spreadsheet, for free?
Tiếp theo về các chỉ số đánh giá các hệ thống Information Retrieval, bài này sẽ bàn về đánh giá hệ thống tìm kiếm với Ground truth là non-binary (không phải nhị phân), và đánh giá hệ thống large scale.