As big data platforms become more popular, two models of data management, aggregated and federated, are emerging as the most common patterns. But which one is better?

The main difference between the two models is how they store, access, and surface data. The aggregated model (also known as a centralised model) stores data within its own system and gives users access to it in one centralised repository, where they can manipulate the data, normalise it, combine it and create new insights.

While this model is popular, it doesn’t necessarily meet every organisation’s requirements. There can be legal and technical barriers when aggregating data in one centralised location, which is where the federated model becomes attractive.

In contrast, the federated approach surfaces just enough information about the data to tell users where the complete information sits, in the same way that a phonebook will show a person’s name, address and phone number but not much more. This adds an extra step in retrieving full files such as lab results, but if enough summary information is available a user can potentially avoid ever using the full file in the first place.

In the case of lab results, the portal that users interact with could surface enough metadata to display key pieces of information, such as the date of a result, the type of test, and whether it’s conclusive or not, so that users can often avoid opening the full file. A user who needs to drill into a result can then follow the index to the original information. The federated model is essentially a well-informed index that provides a path to the original data.
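The "well-informed index" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (`IndexEntry`, `FederatedIndex`), the metadata fields, and the `lab_a` source system are all hypothetical, and a real platform would add access control, error handling and network calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class IndexEntry:
    """Summary metadata kept in the index (hypothetical fields)."""
    record_id: str
    result_date: str    # date of the result, e.g. "2023-04-01"
    test_type: str      # type of test
    conclusive: bool    # whether the result is conclusive
    source_system: str  # name of the system holding the full file


class FederatedIndex:
    """Holds summaries only; full records stay in their source systems."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}
        # Maps a source-system name to a function that fetches a full record.
        self._sources: Dict[str, Callable[[str], dict]] = {}

    def register_source(self, name: str, fetch: Callable[[str], dict]) -> None:
        self._sources[name] = fetch

    def add_entry(self, entry: IndexEntry) -> None:
        self._entries[entry.record_id] = entry

    def summary(self, record_id: str) -> IndexEntry:
        # Answered from the index alone; no source system is contacted.
        return self._entries[record_id]

    def fetch_full(self, record_id: str) -> dict:
        # The extra step: follow the index entry back to the source system.
        entry = self._entries[record_id]
        return self._sources[entry.source_system](record_id)


# Usage with a stand-in for an external lab system (hypothetical data).
lab_a_store = {"r1": {"record_id": "r1", "body": "full lab report"}}

index = FederatedIndex()
index.register_source("lab_a", lambda rid: lab_a_store[rid])
index.add_entry(IndexEntry("r1", "2023-04-01", "blood panel", True, "lab_a"))

entry = index.summary("r1")          # enough for most portal views
full_record = index.fetch_full("r1")  # only when a user drills in
```

The design choice to note is that `summary` never touches a source system, while `fetch_full` always does, which is where the federated model's latency and availability trade-offs come from.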

Aggregated Data Model:

Pros:

  • Lower latency than many federated systems
  • High data availability due to a reduced reliance on external systems
  • Simpler to normalise the data
  • All the data is 'at hand' for complex transformations or analytics 

Cons:

  • More storage required as the aggregated model stores everything that is required (this can result in added cost and complexity)
  • Creates an additional copy of data – potential sync challenges
  • Could end up storing more data than is actually needed, especially if requirements for data are initially uncertain

Federated Data Model:

Pros:

  • Often a lighter weight platform than aggregated
  • Avoids creating additional full copies of data
  • Easier to introduce new systems/data sources/data fields

Cons:

  • Potential for greater latency, especially if the index or summary information is insufficient and source systems must be tapped the majority of the time
  • Potential for lower availability due to dependency on the health of external systems 
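The latency point above can be made concrete with a rough back-of-the-envelope model. The sketch below assumes every request pays for an index lookup and only a fraction of requests also pay for a round trip to the source system. The function name and the millisecond figures are hypothetical, and the model ignores caching, parallel calls and outages.

```python
def expected_latency_ms(index_ms: float, source_ms: float,
                        source_fraction: float) -> float:
    """Average latency of a federated lookup under a simple model.

    index_ms:        cost of consulting the index (paid on every request)
    source_ms:       cost of a round trip to the source system
    source_fraction: share of requests (0.0 to 1.0) that need the full record
    """
    if not 0.0 <= source_fraction <= 1.0:
        raise ValueError("source_fraction must be between 0.0 and 1.0")
    return index_ms + source_fraction * source_ms


# Hypothetical figures: a 5 ms index lookup and a 200 ms source round trip.
rich_index = expected_latency_ms(5.0, 200.0, 0.1)    # summaries usually suffice
sparse_index = expected_latency_ms(5.0, 200.0, 0.9)  # sources tapped most of the time
```

With these made-up numbers the average lookup is about 25 ms when the summaries usually suffice and about 185 ms when they rarely do, which is why the quality of the index matters so much in a federated design.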

So, which of the models is better? It’s subjective and really depends on a user’s situation. Each model has its pros and cons, and the two are different sides of the same coin. Federated data is lighter weight but can have latency issues; aggregated data takes up more storage but is typically faster. In many respects, one model’s pros are the other’s cons.

While there are different approaches to storing, distributing and accessing data, there’s no single right way to do it, as each has its benefits and shortfalls. Ultimately, it’s about ensuring secure and timely access to the right information, wherever that data may reside.

The market is starting to demand that large-scale data platforms incorporate the best aspects of both models. To capitalise on this, developers need to incorporate ideas from both models to truly meet the needs of their users both now and in the future.