Data Engine Overview

The Data Engine framework consists of a set of Java classes for creating data providers. Although you can create a data provider without using Data Engine, the framework offers two key benefits:

  • Standardization and code separation for better maintenance - The framework provides a set of standard interfaces that define the provider and, for data sources backed by databases, separate its components (modeling, metrics, properties, and database access code) along logical lines. This improves clarity, maintainability, and consistency across multiple providers and minimizes the boilerplate code that the provider writer needs to create.
  • Data Engine Features - The framework provides built-in features such as batching for better performance. Also, your provider benefits from any future improvements to the framework.

If you are new to data providers, first read Data Platform Overview and work through the Simple Data Provider Devkit.

Understanding Data Engine

Unlike other data providers, a Data Engine-based provider does not directly implement data provider methods such as getModels and getPropertyValues. Instead, you expose data by creating a definition, an exposer, and one or more accessors for each metric. The Data Engine framework uses these three components to fulfill a data request. Let's walk through the life cycle of a data request to understand how Data Engine interacts with each component.

  1. When a request comes in, Data Engine uses the definition to find the metric's corresponding exposer.
  2. Data Engine uses the exposer to locate its corresponding accessors.
  3. Data Engine calls each accessor to read data from the database.
  4. Data Engine calls the exposer to transform the data from the accessors into a form that fulfills the request and then returns it to the client.

There are two things to notice about Data Engine's architecture:

  1. Data Engine drives the action to fulfill the request; you simply define the components it needs. This declarative architecture (as opposed to an imperative one) is what lets you create Data Engine providers via Spring XML files. More on that later.
  2. Three layers sit between the request and the database. The definition maps to the metric in Palantir, and the accessor maps to the structure of the database. You may be wondering why there is an exposer layer in the middle. In other words, why doesn't the definition map directly to the accessor? The reason is batching. We won't go into the details now, but if multiple exposers require data from an overlapping set of accessors, Data Engine automatically batches those accessor reads together.

Now, let's see how these three components are structured in a provider. Assume that we are creating a provider that lets users request data, such as IBM.ticker() and IBM.close(), from the client. For this provider, we create a definition for each metric and group them into a DefinitionProvider. Each definition refers to an exposer. These exposers then refer to a set of accessors that read data from the database. Our provider looks like this:

Notice that in this provider, close() and vol() read data from the same accessor. This enables Data Engine to batch requests for those metrics.
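The following plain-Java sketch mirrors this structure. It is not the Data Engine API: `SketchAccessor`, `SketchExposer`, the map used as a definition provider, and the accessor names `securityInfo` and `dailyPrices` are all hypothetical. It shows why sharing an accessor matters: asking for close() and vol() together resolves to a single accessor.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a data accessor: one per distinct database read.
final class SketchAccessor {
    final String name;
    SketchAccessor(String name) { this.name = name; }
}

// Hypothetical stand-in for an exposer: here it only records which accessors it needs.
final class SketchExposer {
    final List<SketchAccessor> accessors;
    SketchExposer(List<SketchAccessor> accessors) { this.accessors = accessors; }
}

final class IbmProviderSketch {
    // Hypothetical accessors: one for security reference data, one for daily prices.
    static final SketchAccessor SECURITY_INFO = new SketchAccessor("securityInfo");
    static final SketchAccessor DAILY_PRICES = new SketchAccessor("dailyPrices");

    // Plays the role of the definition provider: each metric maps to one exposer.
    static final Map<String, SketchExposer> DEFINITIONS = new LinkedHashMap<>();
    static {
        DEFINITIONS.put("ticker", new SketchExposer(List.of(SECURITY_INFO)));
        DEFINITIONS.put("close", new SketchExposer(List.of(DAILY_PRICES)));
        DEFINITIONS.put("vol", new SketchExposer(List.of(DAILY_PRICES)));
    }

    // Distinct accessors needed to serve a set of metrics; a shared accessor appears once.
    static Set<String> accessorsFor(List<String> metrics) {
        Set<String> names = new LinkedHashSet<>();
        for (String metric : metrics) {
            for (SketchAccessor accessor : DEFINITIONS.get(metric).accessors) {
                names.add(accessor.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // close() and vol() share dailyPrices, so only one accessor is involved.
        System.out.println(accessorsFor(List.of("close", "vol")));
    }
}
```

Because definitions point only at exposers, and exposers point only at accessors, the set of reads can be worked out before any data is fetched. In Data Engine, combining those reads is the framework's job, not yours.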

Flow of Data in Data Engine

Providers based on Data Engine are initially difficult to visualize because it is unclear when and how the definitions, exposers, and accessors are called. The key is to realize that the Data Engine framework contains the logic that drives the action. Your provider inherits this Data Engine logic from the EngineDataStore class (which your DataStore must extend).

Let's analyze in detail what happens when a user requests data by entering IBM.ticker() in a client application. The following diagram shows the provider's order of execution when it receives the request:

There are a few things to note about this diagram:

  • The dark blue boxes are components you define.
  • Your DataStore extends the EngineDataStore class.
  • Some methods and details have been omitted to keep the diagram simple.

Here is what happens at each step (recall that the user just requested IBM.ticker() from the client):

  1. The provider receives the request (which contains IBM and ticker()) from the Data Router as a call to Provider.getPropertyValues. Because Provider does not implement getPropertyValues itself, it delegates the request to DataStore.getPropertyValues, which is implemented by its base class, EngineDataStore.
  2. The getPropertyValues method looks up the ticker's property definition from DefinitionProvider. The getPropertyValues method then calls the evaluate method (also implemented by EngineDataStore), which grabs the exposer from the definition and drives the rest of the action.
  3. The evaluate method calls DataExposer.getDataAccessors to get the accessors that provide the raw data required to fulfill the request. In this case, the ticker exposer returns a single accessor that reads the ticker property from the database.
  4. The evaluate method calls the accessor to read the data.
  5. The accessor calls the database API to get the data from the database. You have full control over how you implement the database API; for example, you might use JDBC with c3p0 for connection pooling. In this case, the API returns the ticker "IBM" to the accessor.
  6. DataAccessor returns the raw data ("IBM") back to the evaluate method.
  7. The evaluate method calls DataExposer.translateRawData() with the raw data. This method converts and aggregates the raw data returned from the accessor(s) into a type that Palantir understands, such as Number, TimeSeries, or Model. In this case, both the raw type and the Palantir type are Strings, so no conversion is required.
  8. The evaluate method passes the data back to the getPropertyValues method, which passes it back to the Provider, which in turn returns it to the Data Router.
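The eight steps above follow the template-method pattern: the base class owns the control flow and calls into the components you supply. The sketch below illustrates that reading under stated assumptions. `EngineDataStoreSketch`, `TraceExposer`, `TraceAccessor`, and `getPropertyValue` are hypothetical simplifications of EngineDataStore, DataExposer, DataAccessor, and getPropertyValues; their signatures are invented for illustration, and an in-memory map stands in for the database API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of DataAccessor: reads raw data for one model.
interface TraceAccessor {
    Object read(String model);
}

// Hypothetical simplification of DataExposer's two required methods.
interface TraceExposer {
    List<TraceAccessor> getDataAccessors();
    Object translateRawData(List<Object> rawData);
}

// Hypothetical simplification of EngineDataStore: it drives every request.
abstract class EngineDataStoreSketch {
    // Supplied by the provider writer; plays the role of the DefinitionProvider.
    protected abstract Map<String, TraceExposer> definitions();

    // Step 2: look up the definition, then hand off to evaluate().
    public final Object getPropertyValue(String model, String property) {
        TraceExposer exposer = definitions().get(property);
        if (exposer == null) {
            throw new IllegalArgumentException("No definition for " + property);
        }
        return evaluate(model, exposer);
    }

    // Steps 3 to 7: get the accessors, read raw data, translate it, return it.
    private Object evaluate(String model, TraceExposer exposer) {
        List<Object> raw = new ArrayList<>();
        for (TraceAccessor accessor : exposer.getDataAccessors()) {
            raw.add(accessor.read(model));
        }
        return exposer.translateRawData(raw);
    }
}

// The provider writer's DataStore: it declares components but contains no control flow.
final class TickerDataStore extends EngineDataStoreSketch {
    // Stand-in for the database API (for example, a JDBC-backed data access object).
    private final Map<String, String> tickerTable = Map.of("IBM", "IBM");

    private final TraceExposer tickerExposer = new TraceExposer() {
        @Override
        public List<TraceAccessor> getDataAccessors() {
            TraceAccessor tickerAccessor = model -> tickerTable.get(model);
            return List.of(tickerAccessor);
        }

        @Override
        public Object translateRawData(List<Object> rawData) {
            return rawData.get(0); // String in, String out: no conversion needed
        }
    };

    @Override
    protected Map<String, TraceExposer> definitions() {
        return Map.of("ticker", tickerExposer);
    }

    public static void main(String[] args) {
        System.out.println(new TickerDataStore().getPropertyValue("IBM", "ticker"));
    }
}
```

The subclass holds no request-handling logic, which is the point of the design: adding a metric means adding a definition, an exposer, and accessors, while the flow itself stays in the base class.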

Component Description

This section describes each Data Engine component.

Core Components

This section describes each of the core components and their relationship with one another.

Definition Provider - groups all of the property and metric definitions in the provider into a single class. Data Engine accesses this class when it needs to look for a definition.

  • Property Definitions - map a property metric, such as ticker(), to an exposer. There is one property definition for every property type (AKA property metric, such as symbol() or ticker()) in your provider. Property definitions implement the PropertyDefinition interface and provide a property type and an exposer.
  • Metric Definitions - are equivalent to property definitions, but for data metrics (AKA time series data metrics, such as close() and volume()). Metric definitions implement the MetricDefinition interface and provide a data metric and an exposer.
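Conceptually, a definition provider is two lookup tables, one keyed by property type and one keyed by data metric. In this minimal sketch, `DefinitionProviderSketch` and its methods are hypothetical, and exposers are reduced to name strings such as `TickerExposer`; the actual method names of PropertyDefinition and MetricDefinition are not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class DefinitionProviderSketch {
    // Hypothetical definition: a property type or data metric paired with an exposer.
    static final class Definition {
        final String key;
        final String exposerName; // stands in for a real exposer object
        Definition(String key, String exposerName) {
            this.key = key;
            this.exposerName = exposerName;
        }
    }

    // Property definitions and metric definitions are kept in separate tables.
    private final Map<String, Definition> propertyDefinitions = new HashMap<>();
    private final Map<String, Definition> metricDefinitions = new HashMap<>();

    void addPropertyDefinition(String propertyType, String exposerName) {
        propertyDefinitions.put(propertyType, new Definition(propertyType, exposerName));
    }

    void addMetricDefinition(String metric, String exposerName) {
        metricDefinitions.put(metric, new Definition(metric, exposerName));
    }

    Optional<Definition> findProperty(String propertyType) {
        return Optional.ofNullable(propertyDefinitions.get(propertyType));
    }

    Optional<Definition> findMetric(String metric) {
        return Optional.ofNullable(metricDefinitions.get(metric));
    }

    public static void main(String[] args) {
        DefinitionProviderSketch defs = new DefinitionProviderSketch();
        defs.addPropertyDefinition("ticker", "TickerExposer");
        defs.addPropertyDefinition("symbol", "SymbolExposer");
        defs.addMetricDefinition("close", "ClosePriceExposer");
        defs.addMetricDefinition("volume", "VolumeExposer");
        System.out.println(defs.findProperty("ticker").map(d -> d.exposerName).orElse("none"));
        System.out.println(defs.findProperty("close").isPresent()); // false: close is a metric
    }
}
```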

Exposers - serve two functions: (1) they specify a set of one or more accessors, which supply the raw data needed to fulfill a request, and (2) they transform the raw data from those accessors into a form that can be returned to the client. Exposers must implement two methods: getDataAccessors and translateRawData.
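As an illustration of an exposer's two jobs, here is a hypothetical close() exposer. It assumes its accessor returns raw rows as (ISO date string, price string) pairs, returns accessor names rather than accessor objects, and uses a `SortedMap<LocalDate, Double>` as a stand-in for Palantir's TimeSeries type. `ClosePriceExposer` and both method signatures are illustrative, not the DataExposer interface, and the prices are made up.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class ClosePriceExposer {
    // Function 1: name the accessor(s) whose raw data this exposer needs.
    // The accessor name is hypothetical; a real exposer would return accessor objects.
    List<String> getDataAccessors() {
        return List.of("dailyPrices");
    }

    // Function 2: convert raw rows into a date-ordered series for the client.
    // Each raw row is assumed to look like {"2014-01-02", "100.0"}.
    SortedMap<LocalDate, Double> translateRawData(List<String[]> rawRows) {
        SortedMap<LocalDate, Double> series = new TreeMap<>();
        for (String[] row : rawRows) {
            series.put(LocalDate.parse(row[0]), Double.parseDouble(row[1]));
        }
        return series;
    }

    public static void main(String[] args) {
        List<String[]> raw = List.of(
                new String[] {"2014-01-03", "101.5"},
                new String[] {"2014-01-02", "100.0"});
        System.out.println(new ClosePriceExposer().translateRawData(raw));
    }
}
```

Rows may arrive in any order from the database; sorting by date happens in the exposer, so the accessor can stay a thin wrapper around the query.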

Accessors - read data from the database (or other data source). Data Engine batches the data requests based on the accessors that are required by the exposers. For example, in the following screenshot, the five exposers require seven pieces of raw data. However, since the raw data is exposed by two accessors, Data Engine would batch this into just two calls and then parcel the result sets out to the relevant exposers. This minimizes the number of database calls while keeping the logic for each property type and/or metric cleanly separated.

To maximize batching, create one accessor for every batchable query (a query that can fetch many fields for many models in a single call), and keep queries that cannot be batched with one another in separate accessors. Designing a proper set of accessors is critical to using Data Engine efficiently.
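Batching can be modeled as grouping every exposer's field requests by accessor and then issuing one read per accessor. This is a simplified sketch of the idea, not Data Engine's implementation. `BatchingSketch`, `FieldRequest`, and all exposer, accessor, and field names are hypothetical, chosen so that five exposers need seven pieces of raw data from two accessors.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class BatchingSketch {
    // Hypothetical: one piece of raw data that an exposer needs from a given accessor.
    static final class FieldRequest {
        final String accessor;
        final String field;
        FieldRequest(String accessor, String field) {
            this.accessor = accessor;
            this.field = field;
        }
    }

    // Groups all requested fields by accessor; each map entry becomes one read.
    static Map<String, Set<String>> batch(Map<String, List<FieldRequest>> needsByExposer) {
        Map<String, Set<String>> fieldsByAccessor = new TreeMap<>();
        for (List<FieldRequest> needs : needsByExposer.values()) {
            for (FieldRequest request : needs) {
                fieldsByAccessor.computeIfAbsent(request.accessor, k -> new TreeSet<>())
                        .add(request.field);
            }
        }
        return fieldsByAccessor;
    }

    // Five hypothetical exposers requesting seven pieces of raw data in total.
    static Map<String, List<FieldRequest>> example() {
        return Map.of(
                "ticker", List.of(new FieldRequest("securityInfo", "ticker")),
                "name", List.of(new FieldRequest("securityInfo", "name")),
                "close", List.of(new FieldRequest("dailyPrices", "close")),
                "range", List.of(new FieldRequest("dailyPrices", "high"),
                        new FieldRequest("dailyPrices", "low")),
                "dollarVol", List.of(new FieldRequest("dailyPrices", "close"),
                        new FieldRequest("dailyPrices", "volume")));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> reads = batch(example());
        System.out.println(reads.size() + " reads: " + reads);
    }
}
```

Seven requests collapse into two reads, and the duplicate request for close becomes a single column in the dailyPrices read. Had range and dollarVol each used their own accessor, the same data would cost four reads.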

Other Components

This section describes other components you will see when writing Data Engine providers.

Model Type Provider - A component that lists the model types of all the models that the data provider exposes. For example, if the provider has data for MSFT and US, its Model Type Provider must return {MSFT, Type: Stock} and {US, Type: Country}.
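In its simplest form, a model type provider is a table from model to model type. The `ModelTypeProviderSketch` class and its `listModelTypes` method below are hypothetical, and the two models are hard-coded from the example above; a real provider would more likely load this list from its data source.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class ModelTypeProviderSketch {
    // Hypothetical source of truth: model ID mapped to model type.
    private final Map<String, String> typesByModel = new LinkedHashMap<>();

    ModelTypeProviderSketch() {
        typesByModel.put("MSFT", "Stock");
        typesByModel.put("US", "Country");
    }

    // Returns every exposed model with its type; callers cannot modify the table.
    Map<String, String> listModelTypes() {
        return Collections.unmodifiableMap(typesByModel);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : new ModelTypeProviderSketch().listModelTypes().entrySet()) {
            System.out.println("{" + e.getKey() + ", Type: " + e.getValue() + "}");
        }
    }
}
```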

Environment - Every object in the data provider has access to the environment object, which makes it a good place to store resources that your components need, such as:

  • A handle to the database API
  • Helper methods to parse or process data
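One way to give every component access to the environment is to pass it in through constructors. In this hypothetical sketch, `ProviderEnvironment` holds a `DatabaseHandle` (an invented interface standing in for your database API) plus a helper that parses a compact `yyyyMMdd` date format, and `ListingDateAccessor` reads through both. None of these names come from the Data Engine API, and the table name `listings` is made up.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical stand-in for your database API (for example, JDBC with a c3p0 pool).
interface DatabaseHandle {
    String lookup(String table, String key);
}

// Hypothetical environment: shared resources for every component in the provider.
final class ProviderEnvironment {
    final DatabaseHandle db;

    ProviderEnvironment(DatabaseHandle db) {
        this.db = db;
    }

    // Shared helper: parses dates stored as yyyyMMdd, e.g. "20140102".
    LocalDate parseDbDate(String raw) {
        return LocalDate.parse(raw, DateTimeFormatter.BASIC_ISO_DATE);
    }
}

// Hypothetical accessor that takes both its database handle and helpers from the environment.
final class ListingDateAccessor {
    private final ProviderEnvironment env;

    ListingDateAccessor(ProviderEnvironment env) {
        this.env = env;
    }

    LocalDate read(String model) {
        return env.parseDbDate(env.db.lookup("listings", model));
    }

    public static void main(String[] args) {
        Map<String, String> fakeTable = Map.of("IBM", "20140102"); // made-up value
        ProviderEnvironment env = new ProviderEnvironment((table, key) -> fakeTable.get(key));
        System.out.println(new ListingDateAccessor(env).read("IBM"));
    }
}
```

Because the accessor depends only on the environment, a test can swap in a fake `DatabaseHandle`, as `main` does, without touching a real database.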

Model Field Batch - The ModelFieldBatch class contains a set of models and a set of fields. Data Engine creates this object when it batches the data fields that the exposers need. It then passes the object to the accessors so that they know what raw data fields to read.

Field - A field is a parameter that is passed from an exposer to an accessor (via the ModelFieldBatch class). It usually contains an accessor-specific identifier that refers to a single per-model value, such as a database column name, a string key used in a JOIN to isolate per-model values, or a key used in a webservice to request values for a certain property or metric. It is most often a String but can be any object, such as a set of Strings (which is used if the metric has more than one parameter).
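Because a field is often just a column name, an accessor can turn an entire batch into one query. The sketch assumes a simplified `ModelFieldBatchSketch` holding two sets of Strings and a hypothetical `daily_prices` table with `symbol` and `trade_date` columns; the real ModelFieldBatch API is not documented on this page. Column names are concatenated into the SQL only because they come from your own exposers, never from user input, while model IDs are left as `?` placeholders to bind with a JDBC PreparedStatement.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical simplification of ModelFieldBatch: which models and which fields to read.
final class ModelFieldBatchSketch {
    final SortedSet<String> models;
    final SortedSet<String> fields;

    ModelFieldBatchSketch(Set<String> models, Set<String> fields) {
        this.models = Collections.unmodifiableSortedSet(new TreeSet<>(models));
        this.fields = Collections.unmodifiableSortedSet(new TreeSet<>(fields));
    }
}

// Hypothetical accessor: each field is assumed to be a column of daily_prices.
final class DailyPricesAccessor {
    // Builds one parameterized query that covers every model and field in the batch.
    static String buildQuery(ModelFieldBatchSketch batch) {
        String columns = String.join(", ", batch.fields);
        String placeholders = String.join(", ", Collections.nCopies(batch.models.size(), "?"));
        return "SELECT symbol, trade_date, " + columns
                + " FROM daily_prices WHERE symbol IN (" + placeholders + ")";
    }

    // Values to bind, in order, to the ? placeholders (e.g. with PreparedStatement.setString).
    static List<String> bindValues(ModelFieldBatchSketch batch) {
        return List.copyOf(batch.models);
    }

    public static void main(String[] args) {
        ModelFieldBatchSketch batch = new ModelFieldBatchSketch(
                Set.of("MSFT", "IBM"), Set.of("volume", "close"));
        System.out.println(buildQuery(batch));
        System.out.println(bindValues(batch));
    }
}
```

Sorting both sets makes the generated SQL and the bind order deterministic, which keeps the query stable for logging and testing.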

Next Steps

For a step-by-step guide on how to create a Data Engine-based provider, see USTreas Devkit.

© 2014 Palantir Technologies