The FHIR® specification (Release 4) has been published, and the scope of FHIR continues to grow. Ancillary specifications are under development that build on the core FHIR specification to meet specific needs. These specifications take common use cases and describe them in detail, along with architectural options and supporting FHIR artifacts. This helps prevent the same problem from being solved in multiple ways by different implementers.
The FHIR Bulk Data Export Specification
One of these is the FHIR Bulk Data Extract, which is designed for use cases where information about multiple patients needs to be extracted from a data source such as an Electronic Health Record (EHR) or FHIR server, to support requirements such as:
- Obtaining data: extracting data from an EHR for population-based research, quality metrics, or training data for a machine learning algorithm.
- Moving data: transferring data in bulk between servers – perhaps when medical practices and EHRs merge, or when an organisation changes its EHR. Another example would be moving directory data between servers, as described in the Validated Healthcare Directory (VHDir) Implementation Guide.
- Extracting data: pulling entire data sets on patient groups for analytics – for example, patients on a clinical trial or under Case Management.
The data that is returned might include Personal Health Information (PHI), or it may have been de-identified where returning identifiable data is not appropriate. It’s important to appreciate that this specification is in the early stages of development and, like FHIR itself, is being tested at connectathons as it is developed. Many major EHR vendors, as well as plenty of others, are involved in the FHIR bulk data extract development. Here are the high-level results from the connectathon held in January at the 2019 Working Group Meeting in San Antonio.
The client submits a request for data, and the server will create a file containing the data for the client to consume. The actual request can be specified in a number of ways:
- Firstly, the extract could be across all the patients in the server, or just a subset of patients, as defined by a Group resource.
- You can also specify which resource types you are interested in, and a ‘resource modified’ date so that only resources changed since that date are included.
- There’s also a new ‘type filter’ under discussion that allows more granular requests (e.g., include only blood pressure and pulse observations from the last six months).
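To make the request options above concrete, here is a minimal sketch of how a client might assemble a kickoff URL. The article does not name the operation or its parameters, so the `$export` operation name and the `_type` and `_since` parameter names are assumptions taken from my reading of the draft specification; the function `build_kickoff_url` and the example server address are purely illustrative. Check the current specification before relying on any of these names.

```python
from urllib.parse import urlencode


def build_kickoff_url(base_url, group_id=None, resource_types=None, since=None):
    """Build a bulk export kickoff URL (illustrative sketch only).

    Assumed, not taken from the article: the `$export` operation name and
    the `_type` / `_since` query parameter names.
    """
    base = base_url.rstrip("/")

    # Either a subset of patients defined by a Group, or all patients.
    if group_id is not None:
        path = f"{base}/Group/{group_id}/$export"
    else:
        path = f"{base}/Patient/$export"

    params = {}
    if resource_types:
        # Comma-separated list of the resource types to include.
        params["_type"] = ",".join(resource_types)
    if since:
        # The 'resource modified' date: only include newer resources.
        params["_since"] = since

    if not params:
        return path
    # Keep commas and colons readable instead of percent-encoding them.
    return f"{path}?{urlencode(params, safe=',:')}"


if __name__ == "__main__":
    print(build_kickoff_url(
        "https://example.org/fhir",
        group_id="123",
        resource_types=["Patient", "Observation"],
        since="2019-01-01T00:00:00Z",
    ))
```

The ‘type filter’ is left out of the sketch because it is still under discussion and its final shape may change.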
The Asynchronous Approach
Because of the large size of the data that can result from these searches, the specification describes an ‘asynchronous’ approach – in other words, the client doesn’t wait for the server to create the extract, as this could take a long time. The details are in the specification, but in brief:
- The client makes a request of the server, called the ‘kickoff’ request. If the request is accepted, the server returns a location that the client can query to determine when the extraction has been completed.
- The client periodically polls the server, using the address it has been given, to monitor the progress of the extraction. When the extraction is complete, the server returns an ‘outcome’ response that has the location of the file(s) containing the extracted data, any errors that may have occurred, security details for actually downloading the data, and other details.
- The client then downloads the files containing the extracted data (in ndjson format, which describes how to have multiple JSON objects in a single file) from the specified location, observing whatever security measures were specified. For example, it may have to provide a specific access token that was previously negotiated with the server.
- Finally, the client informs the server that the data has been downloaded and the files can be deleted. (Of course, nothing stops the server from doing this automatically after a time interval.)
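The four steps above can be sketched as a small client loop. This is a rough illustration under stated assumptions, not the specification's definitive flow: the HTTP details (a 202 response with a `Content-Location` header on kickoff, 202 while the extract is in progress, 200 with an `output` list when complete, and a DELETE to signal clean-up) reflect my understanding of the draft and may differ in the published version. The `send` argument is a hypothetical transport function, so the sketch runs without committing to any particular HTTP library, and authentication is omitted entirely.

```python
import json
import time


def parse_ndjson(text):
    """Parse ndjson: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_export(send, kickoff_url, poll_seconds=5.0, max_polls=100, sleep=time.sleep):
    """Drive the kickoff / poll / download / clean-up sequence.

    `send(method, url, headers)` is a hypothetical transport callable that
    returns a (status_code, response_headers, body_text) tuple.
    """
    # 1. Kickoff: ask the server to start building the extract.
    status, headers, _body = send(
        "GET",
        kickoff_url,
        {"Accept": "application/fhir+json", "Prefer": "respond-async"},
    )
    if status != 202:
        raise RuntimeError(f"kickoff not accepted: HTTP {status}")
    status_url = headers["Content-Location"]  # where to poll (assumed header)

    # 2. Poll until the server reports that the extract is complete.
    for _attempt in range(max_polls):
        status, _headers, body = send("GET", status_url, {"Accept": "application/json"})
        if status == 200:
            break
        if status != 202:
            raise RuntimeError(f"export failed: HTTP {status}")
        sleep(poll_seconds)
    else:
        raise TimeoutError("export did not complete within the polling limit")

    outcome = json.loads(body)

    # 3. Download each listed file and parse its ndjson content.
    resources = []
    for entry in outcome.get("output", []):
        status, _headers, text = send(
            "GET", entry["url"], {"Accept": "application/fhir+ndjson"}
        )
        if status != 200:
            raise RuntimeError(f"download failed: HTTP {status}")
        resources.extend(parse_ndjson(text))

    # 4. Tell the server the files are no longer needed.
    send("DELETE", status_url, {})
    return resources
```

Passing the transport in as a function keeps the sketch independent of the network, so the sequence can be exercised against a stub before pointing it at a real server. A production client would also need to handle the errors and security details that the outcome response carries.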
The image below shows this process (from the presentation referenced below).