Database Configuration

Database Connector

In order for fireant to connect to your database, a database connector must be used. This takes the form of an instance of a concrete subclass of fireant’s fireant.database.Database class. Database connectors are shipped with fireant for all of the supported databases, but it is also possible to write your own. See below on how to extend fireant to support additional databases.

To configure a database, instantiate a subclass of fireant.database.Database. You will use this instance to create a DataSet. It is possible to use multiple databases simultaneous, but fireant.DataSet can only use a single database, since they inherently model the structure of a table in the database.

Vertica

import fireant.settings
from fireant.database import VerticaDatabase

database = VerticaDatabase(
    host='example.com',
    port=5433,
    database='example',
    user='user',
    password='password123',
)

MySQL

import fireant.settings
from fireant.database import MySQLDatabase

database = MySQLDatabase(
    database='testdb',
    host='mysql.example.com',
    port=3308,
    user='user',
    password='password123',
    charset='utf8mb4',
)

MySQL additionally requires a custom function that fireant uses to rollup date values to specific intervals, equivalent to the TRUNC_DATE function available in other database platforms. To install the TRUNC_DATE function in your MySQL database, run the script found in fireant/scripts/mysql_functions.sql. Further information is provided in this script on how to grant permissions on this function to your MySQL users.

PostgreSQL

import fireant.settings
from fireant.database import PostgreSQLDatabase

database = PostgreSQLDatabase(
    database='testdb',
    host='example.com',
    port=5432,
    user='user',
    password='password123',
)

Amazon Redshift

import fireant.settings
from fireant.database import RedshiftDatabase

fireant.settings = RedshiftDatabase(
    database='testdb',
    host='example.com',
    port=5439,
    user='user',
    password='password123',
)

Using a different Database

Instead of using one of the built in database connectors, you can provide your own by extending fireant.database.Database.

import vertica_python
from pypika import VerticaQuery
from fireant import Database

class MyVertica(Database):
    # Vertica client that uses the vertica_python driver.

    # Override the custom PyPika Query class (Not necessary but perhaps helpful)
    query_cls = VerticaQuery

    def __init__(self, host='localhost', port=5433, database='vertica',
                 user='vertica', password=None,
                 read_timeout=None):
        self.host = host
        self.port = port
        self.database = database
        self.user = user
        self.password = password
        self.read_timeout = read_timeout

    def connect(self):
        return vertica_python.connect(
            host=self.host, port=self.port, database=self.database,
            user=self.user, password=self.password,
            read_timeout=self.read_timeout,
        )

    def trunc_date(self, field, interval):
        return Trunc(...)  # custom Trunc function

    def date_add(self, date_part, interval, field):
        return DateAdd(...)  # custom DateAdd function

Once a Database connector has been set up, it can be used when instantiating fireant.DataSet.

from fireant import DataSet

my_vertica = MyVertica(
    host='example.com',
    port=5433,
    database='example',
    user='user',
    password='password123',
)

DataSet(
    database=my_vertica,
    ...
)

In a custom database connector, the connect function must be overridden to provide a connection to the database. The trunc_date and date_add functions must also be overridden since are no common ways to truncate/add dates in SQL databases.

Middleware

In order to provide extra functionality as well as flexibility the database connectors allow the setup of middleware. Default configurable middleware implementations are provided by fireant but it’s also possible to extend the middleware classes for custom functionality.

Concurrency Middleware

When executing queries on the database the operations are tunneled through a concurrency middleware. By default the fireant.middleware.ThreadPoolConcurrencyMiddleware is used when no custom middleware is configured in the database connector. This middleware implementation will parallelize multiple queries using a ThreadPool. The maximum amount of simultaneously active threads is then defined by the max_processes parameter of the database connector.

A custom middleware can easily be created by implementing fireant.middleware.BaseConcurrencyMiddleware. For example a concurrency middleware that would simply execute a group of queries synchronously would look like this:

from fireant.middleware import BaseConcurrencyMiddleware
from fireant.queries import fetch_as_dataframe

class HueyConcurrencyMiddleware(BaseConcurrencyMiddleware):
    def fetch_queries_as_dataframe(self, queries, database):
        return [fetch_as_dataframe(query, database) for query in queries]