Airflow+Spark

Apache Airflow is used to define and manage a Directed Acyclic Graph (DAG) of tasks. Data engineers use it to programmatically orchestrate and schedule data pipelines, and to configure retries and alerts for when a task fails. A single task can use any of a wide range of operators, such as a Bash script, a PostgreSQL function, a Python function, SSH, Email, etc., or even a Sensor, which waits (polls) for a certain time, file, database row, S3 key, etc.
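To make the ideas of dependency order, retries, and failure alerts concrete, here is a minimal plain-Python sketch. It is not Airflow's API: `run_pipeline`, the task names, and the `on_failure` callback are hypothetical names chosen for illustration, and the sketch stops the whole run on a final failure, which is a simplification.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, retries=2, on_failure=print):
    """Run callables in dependency order, retrying each failed task.

    tasks: {name: callable}; dependencies: {name: set of upstream task names}.
    Returns the names of the tasks that completed, in execution order.
    """
    # Every task becomes a node; tasks without an entry have no upstream tasks.
    graph = {name: dependencies.get(name, set()) for name in tasks}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == retries + 1:
                    # Out of retries: send the "alert" and stop the run here
                    # (simplification: remaining tasks are not attempted).
                    on_failure(f"task {name!r} failed after {attempt} attempts: {exc}")
                    return completed
    return completed


# Hypothetical three-step pipeline whose first task fails once, then succeeds.
attempts = {"extract": 0}


def extract():
    attempts["extract"] += 1
    if attempts["extract"] < 2:
        raise RuntimeError("transient error")


tasks = {"extract": extract, "transform": lambda: None, "load": lambda: None}
dependencies = {"transform": {"extract"}, "load": {"transform"}}
result = run_pipeline(tasks, dependencies)
```

In a real deployment the scheduler, the retry policy, and the alerting would come from Airflow itself; the sketch only shows the control flow that those settings describe.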

As you may already be aware, failures in Apache Spark applications are inevitable for various reasons. One of the most common failures is OOM (out of memory at the driver…


WHY?

Data sent over a network can be intercepted, so SSL is a must-have: encrypting the traffic prevents data leakage on the network (LAN or internet).

WHAT?

SSL stands for Secure Sockets Layer. It ensures that any data exchanged between the server and its clients over the network cannot be read by third parties.

HOW?

In this article I enable SSL for PostgreSQL and then connect securely with psql and with JDBC (i.e. by configuring the driver to connect in a secure manner).
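As a rough sketch of what the two secure client connections could look like, the helpers below build a libpq connection URI (which psql accepts) and a PostgreSQL JDBC URL. The parameters `sslmode` and `sslrootcert` (libpq and pgJDBC) and `ssl` (pgJDBC) are real driver options; the host, database, user, and certificate path are hypothetical, and the choice of `verify-full` is an assumption rather than necessarily the mode used in the article.

```python
from urllib.parse import urlencode


def libpq_uri(host, db, user, root_cert, port=5432):
    """Build a libpq URI for psql that requires a verified, encrypted connection."""
    params = urlencode(
        {"sslmode": "verify-full", "sslrootcert": root_cert},
        safe="/",  # keep the certificate path readable
    )
    return f"postgresql://{user}@{host}:{port}/{db}?{params}"


def jdbc_url(host, db, root_cert, port=5432):
    """Build a pgJDBC URL with the equivalent SSL settings."""
    params = urlencode(
        {"ssl": "true", "sslmode": "verify-full", "sslrootcert": root_cert},
        safe="/",
    )
    return f"jdbc:postgresql://{host}:{port}/{db}?{params}"


# Hypothetical values, for illustration only.
psql_target = libpq_uri("db.example.com", "appdb", "app", "/etc/ssl/root.crt")
jdbc_target = jdbc_url("db.example.com", "appdb", "/etc/ssl/root.crt")
print(psql_target)
print(jdbc_target)
```

The first string can be passed to psql as its connection argument, and the second to `DriverManager.getConnection` together with the user's credentials.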

1- PostgreSQL Server Side [1]

First, install openssl if it is not already installed, then generate the server certificates, and finally the client key.

Note that the…

Mahdi Nematpour

Distributed systems, Apache Spark, Big Data, Scala | Java | Python, and functional programming lover
