Querying with a DataSet¶
After configuring an instance of
fireant.DataSet, you’ll have access to its powerful query API. This follows the builder design pattern, allowing for function calls to be chained to build up a context.
The result of a query is one or more widgets, which the query defines as well. Furthermore, data rendered in widgets can be grouped by setting dimensions, filtered, compared over time intervals using references, and sorted and limited using pagination.
- A widget is product of executing a fireant query. This can come as a Chart or Table, but custom widgets can also be defined.
- Dimensions are references to fields in the context of a fireant query. They indicate how the data should be broken up into groups and ultimately expressed as a
GROUP BYin the SQL query generated by fireant.
- Filters are expressions that determine which data to include in the result, expressed as either a
HAVINGclause in the SQL query generated by fireant. There are several types of filters and they are in a later section.
- A reference is a means to compare data to itself over a time interval. Concretely, these are Week-over-week or Year-over-year analyses. They are used in a fireant query referencing a Field of type
A DataSet query uses a dataset’s configuration to execute a SQL query and to transform the result into a data visualization, or a widget. A single fireant query translates into a single SQL query, but it is possible to transform the a single query into multiple widgets. Queries are constructed with a builder pattern API.
A query is initialized by accessing the
query attribute on the
fireant.DataSet instance like
dataset.query. Subsequent function calls can be chained in order to build up a query.
When fetch is called, the SQL query is executed and the resulting data set is transformed for each widget in the query. The fetch() method returns an array containing the data for each widget in the order that the widget was added to the query.
from fireant.dataset import * from fireant.database.vertica import VerticaDatabase from pypika import Tables, functions as fn query = dataset.query \ .widget( ... ) \ .dimension( ... ) \ .filter( ... ) \ .reference( ... ) \ .orderby( ... ) query.fetch()
All builder methods can be called multiple times and in any order.
Many of the functions in the query builder accept arguments referencing fields defined in the DataSet. These can be accessed from your DataSet instance via the
fields property as such:
dataset = DataSet( ... fields=[ Field(alias='date', ...), ] ) # Access the date field dataset.fields.date
Add a widget to a query. A widget is what is returned from the call to
fetchcontaining data for a visualization. If a query does not contain a widget, then calling the
fetchmethod will return the raw data returned from the database. See Visualizing the data with Widgets below for more details.
The function takes one or more arguments that should be instances of subclasses of
from fireant import Pandas dataset.data \ ... .widget( Pandas(dataset.fields.clicks, dataset.fields.cost, dataset.fields.revenue) )
Add a (non-aggregate) field as dimension to the query. This adds a grouping level to the query that will be used to break the data down. See Grouping data with Dimensions below for more details.
The function takes one or more arguments of fields in a data set.
dataset.data \ ... .dimension( dataset.fields.device, dataset.fields.account )
Add a filter to the query. This constrains the results of the data by certain criteria. There are many types of filters, see Filtering the query below for more details.
The function takes one or more arguments of filter expressions using elements of the dataset that the query is being built from.
dataset.data \ ... .filter( dataset.fields.date.between(date(2000, 1, 1), date(2009, 12, 31)) ) \ .filter( dataset.fields.device.isin(['m', 't']) ) \ .filter( dataset.fields.clicks > 10 )
- A reference is a way of comparing the data to itself over an interval of time using a Date/Time dimension, such as Week-over-Week. See Comparing Data to Previous Values using References below for more details.
from fireant import WeekOverWeek dataset.data \ ... .reference( WeekOverWeek() )
- This function allows the results of the SQL query to be ordered by fields. Please note that this will only order the results of the SQL query and that the order may be affected by the Widget. Ordering is entirely optional. The default order will be by all of the dimensions used in the query in the order that they were added.
from pypika import Order dataset.data \ ... .orderby( dataset.fields.clicks, Order.asc )
- A call to fetch exits the build function chain and returns the results of the query. An optional hint parameter is accepted which will used in the query if monitoring the queries triggered from fireant fireant is needed.
from pypika import Order dataset.data \ ... .fetch()
All of the Data Set query functions accept one or more arguments. Passing in multiple arguments is synonymous as calling the function successively with one argument each.
Grouping data with Dimensions¶
Dimensions are referenced using the alias defined for them when instantiating the dataset via the
dimensions attribute on the dataset instance.
dataset = dataset( ... dimensions=[ UniqueDimension('customer', ... ), ... ], ... ) # Reference to customer dimension dataset.fields.customer
A dimension can be used in a dataset query by calling the .dimension( … ) method when building a query. A reference to one or more dimensions must be passed as an argument.
The order of the dimensions is important. The dimensions are grouped in the order that they are added and displayed in the widgets in that order.
dataset.data \ ... .dimension( dataset.fields.device, dataset.fields.account )
Using intervals with Date/Time dimensions¶
All continuous dimensions require an interval to group into. For a Date/Time dimension, these intervals are common temporal intervals, such as hours, days, quarters, etc. These dimensions have a default interval and can be used without explicitly setting one. To set the interval, use the reference to the dimension as a function and pass the interval as an arguement
from fireant import monthly dataset.data \ ... .dimension( dataset.fields.date(monthly) )
The following intervals are available for Date/Time dimensions and can be imported directly from the
It is also possible to define a custom interval as an instance of
Roll Up (Totals)¶
Rolling up a dimension allows the totals across a dimension to be displayed in addition to the breakdown for each dimension value. To enable rollup for a dimension, call the rollup method on the dimension reference. Rollup is available for all dimension types.
dataset.data \ ... .dimension( dataset.fields.date(hourly).rollup() ) \ .dimension( dataset.fields.device.rollup() )
Filtering the query¶
A query can be filtered using several different filters.
TODO TODO TODO TODO
A metric filter is synonomous with the
HAVING clause in a SQL query whereas a dimension filter corresponds to the
WHERE clause. Dimension filters can also be applied to the display definition of Unique Dimensions.
When more than one filter is applied to a query, the results will be filtered to all rows/groups matching all of the conditions like a boolean
AND. Some filters accept multiple values which create multiple conditions and filter data to rows/groups matching any of the conditions like a boolean
Comparator filters are created using standard operators:
dataset.data \ ... .filter( dataset.fields.clicks >= 100 ) \ .filter( dataset.fields.conversions == 1 )
Boolean (Boolean Dimensions)¶
Boolean filters only apply to boolean dimensions and filter whether the value of that boolean dimension is True or False` using the .is_( True/False ) method on a
dataset.data \ ... .filter( dataset.fields.is_member.is_(True) )
Range (Date/Time dimensions)¶
Range filters apply to
fireant.dataset.dimensions.DatetimeDimension dimensions using the .between( start, end ) method. This is equivalent to a BETWEEN expression in the SQL query.
dataset.data \ ... .filter( dataset.fields.date.between( datetime(2018, 8, 21), datetime(2019, 8, 20) ) )
Includes (Category and Unique dimensions)¶
Includes filters apply to
fireant.dataset.dimensions.UniqueDimension dimensions using the .isin( list ) method. Results will be included if they are equal to any of the values in the argument supplied.
Combining multiple include filters makes it possible to use both
OR filter logic.
dataset.data \ ... .filter( dataset.fields.accounts.isin([1, 2, 3]) )
Excludes (Category and Unique dimensions)¶
Excludes filters are equivalent to Includes filters with negative logic. The same conditions apply using the .notin( list ) method.
dataset.data \ ... .filter( dataset.fields.accounts.notin([1, 2, 3]) )
Pattern (Category and Unique dimensions)¶
Pattern filters apply to
fireant.dataset.dimensions.UniqueDimension dimensions using the .like( *patterns ) method. They are the equivalent of a SQL
ILIKE expression. The method accepts one or more pattern arguments which should be formatted for SQL LIKE https://www.w3schools.com/sql/sql_like.asp. With multiple arguments, results are returned that match any of the patterns.
Combining multiple pattern filters makes it possible to use both
OR filter logic.
dataset.data \ ... .filter( dataset.fields.device.like('desk%', 'mob%') )
Anti-Pattern filters are equivalent to Pattern filters with negative logic. The same conditions apply using the .not_like( *patterns ) method.
dataset = dataset( ... dimensions=[ UniqueDimension('customer', definition=customers.id, display_definition=fn.Concat(customers.fname, ' ', customers.lname)) ], ... ) dataset.data \ ... .filter( dataset.fields.device.not_like('desk%', 'mob%') )
Filtering on Display Definitions¶
When using a
fireant.dataset.dimensions.UniqueDimension with the
display_defintion attribute, it is also possible to filter based on display values instead of the definition.
display attribute on an instance of
fireant.dataset.dimensions.UniqueDimension returns a
fireant.dataset.dimensions.DisplayDimension which works like a normal dataset dimension. It works with the following filters: `
Visualizing the data with Widgets¶
At least one widget must be added to every query before calling the fetch() builder chain method. Each dataset query can return multiple widgets of different types, but because a dataset query resolves to a single SQL query, other parts of the query must be shared across all widgets, such as filters and dimensions.
Metrics are selected for each widget. Widgets can use the same metrics or different metrics in a query. The instance API for each widget is different since each widget uses metrics in different ways.
dataset.data \ ... .widget( ... )
The Pandas widget will return a Pandas data frame which is useful when displaying results in a Jupyter notebook. The Pandas widget is used by instantiating a
fireant.widgets.pandas.Pandas class and passing one or more instances of
fireant.Field as arguments.
The data frame will be structured with an index level for each dimension.
The Pandas widget takes additional arguments.
- pivot : list[dimension]
- A list of dimensions which should be pivoted as columns. If all dimensions are pivoted, the result will be identical to setting the
- transpose : bool
True, the data frame will be transposed.
- sort : list[int]
- A list of column indices to sort by. This sorts the data frame after it’s been pivoted and transposed. Which columns are present depends on the selected dimensions and metrics as well as the
from fireant import Pandas Pandas( *metrics )
dataset.data \ ... .dimension( dataset.dimension.date, dataset.dimension.device ) .widget( Pandas(dataset.fields.clicks, dataset.fields.cost, dataset.fields.revenue, pivot=(dataset.dimension.device, ) transpose=True) )
A HighCharts widget transforms the results into a HighCharts JSON config object. The widget is used by instantiating
fireant.widgets.highcharts.HighCharts and calling the
axis method with instances of
fireant.widgets.highcharts.Series arguments. The axis method can be chained to create multiple axes.
fireant.widgets.highcharts.Series instance is constructed with one or more metrics.
from fireant import HighCharts HighCharts( title ) \ .axis ( HighCharts.LineChart( *metrics ), HighCharts.LineChart( *metrics ), ... ) \ .axis ( HighCharts.BarChart( *metrics ) ) ...
The React Table widget’s instance API is identical to the Pandas widget, although it transforms results into a JSON config object meant for React-Table. See the section above on pandas for more information on the instance API.
from fireant import ReactTable dataset.data \ ... .dimension( dataset.dimension.date, dataset.dimension.device ) .widget( ReactTable(dataset.fields.clicks, dataset.fields.cost, dataset.fields.revenue, pivot=(dataset.dimension.device, ) transpose=True) )
Comparing Data to Previous Values using References¶
In some cases it is useful to compare the selected metrics over a period time such as in a Week-over-Week report. A Reference can be used to achieve this. Reference is a built-in function which can be chosen from the subclasses of
A Reference can be used as a fixed comparison, a change in value (delta), or a change in value as a percentage.
The Reference compares the currently selected data with itself shifted by the amount of the Reference.
The following options are available
- Day Over Day - Shifts by 1 day.
- Week Over Week - Shifts by 1 week.
- Month over Month - Shifts by 1 month.
- Quarter over Quarter - Shifts by 1 quarter or 3 months depending on whether the database backend supports quarter intervals.
- Year over Year - Shifts by 1 year.
For each Reference, there are the following variations:
- Delta - Difference in value
- Delta Percentage - Difference in value as a percentage of the previous value
A Date/Time dimension is required.
from fireant.dataset.references import WeekOverWeek # Use a Week-over-Week reference dataset.data \ ... .reference( WeekOverWeek(dataset.fields.date) ) # Compare Week-over-Week change (delta) dataset.data \ ... .reference( WeekOverWeek(dataset.fields.date, delta=True) ) # Compare Week-over-Week change as a percentage (delta percentage) dataset.data \ ... .reference( WeekOverWeek(dataset.fields.date, delta=True, percent=True) )
For any reference, the comparison is made for the same days of the week.
Operations include extra computations applied in python to the result of the SQL query to modify the result.