A. Stoewer edited this page Apr 18, 2013 · 20 revisions

Overview

The Pandora project started as an initiative of the Electrophysiology Task Force, which is part of the INCF Datasharing Program. As such, the project aims to develop standardized methods and models for storing electrophysiology and other neuroscience data together with their metadata in one common file format based on HDF5.

To achieve this, Pandora uses highly generic models for data as well as for metadata, and defines standard schemata for HDF5 files that can represent those models. Last but not least, Pandora aims to provide a convenient C++ library that simplifies access to the defined format.

Basic considerations

Currently there are numerous models and file formats available that aim at describing electrophysiological data. Some of them are well established and widely used among scientists or vendors of recording equipment. Therefore, before proposing a new standard, we should first check whether an existing file format or data model already meets our demands.

But what are the requirements for such a standard?

  • It should define a clear data model that exists independently of the concrete implementation of a file format, because other tools such as web services or databases may place the same requirements on the underlying model. Ideally, this allows a consolidation that goes beyond the use of a common file format.
  • It should be able to represent any data and metadata that are in use today or will be used in the near future. Thus, it should support not only analog signals but also derived or related data such as histograms and figures.
  • It should not exclude any feature that is already available in existing models or formats.
  • It should be easily convertible to, and compatible with, other formats.
  • Existing formats have no or only limited support for metadata. A new standard should therefore allow the integration of arbitrary metadata and provide means to annotate data with them.
  • It should be as flexible as possible but still suited for automated processing and evaluation.

Which data formats are on the market?

In this section we give a short overview of a few existing file formats and data models. This overview is not meant to be a comprehensive evaluation, but merely a summary of the capabilities of existing solutions.

NEO

  • Well-established standard for neurophysiological data, used by several projects such as Open Electrophy, G-Node, PyNN and NeuroTools.
  • Very limited support for data annotation or metadata.
  • Native support for multielectrode recordings and spike sorting, but no support for other kinds of data.
  • Not fully compatible with Neuroshare.

Neuroshare

  • Vendor-supported library limited to electrophysiological data.
  • No specified data format. Only interfaces to proprietary formats.

CARMEN NDF

  • Matlab-based format that requires knowledge of Matlab's internal representations of, e.g., cell arrays.
  • Basic support for certain kinds of metadata.
  • Matlab is widespread, which may increase acceptance.

MTSF

  • Limited to time-series data.
  • Proven ability to handle very large files.
  • Designed for simulation data.
  • Limited annotation capabilities.

From this short comparison of available formats and models we conclude that none of them fulfills all the demands of a prospective standard file format for electrophysiology-related data and metadata.

What should be defined by the standard?

Data model: A well-defined data model is needed to identify all kinds of entities that can be stored in a file and to understand their relations. From our point of view it is beneficial if this model can be applied not only to the implementation of a file format but also to situations where the same kind of data is kept in non-file-based storage, e.g. a database.
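To illustrate the idea of a model that exists outside any particular storage, here is a minimal Python sketch. It is not part of Pandora: the names `Signal`, `Backend`, `InMemoryBackend` and `round_trip` are hypothetical and chosen only for this example. The in-memory backend stands in for what could equally be an HDF5 file or a database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Signal:
    """Hypothetical model entity: one sampled analog signal."""
    name: str
    unit: str
    sampling_interval: float  # seconds between two samples
    samples: List[float] = field(default_factory=list)


class Backend(Protocol):
    """Anything that can store and retrieve a Signal by name."""

    def save(self, signal: Signal) -> None: ...

    def load(self, name: str) -> Signal: ...


class InMemoryBackend:
    """Stand-in for a file-based or database-based backend (assumption)."""

    def __init__(self) -> None:
        self._store: Dict[str, Signal] = {}

    def save(self, signal: Signal) -> None:
        self._store[signal.name] = signal

    def load(self, name: str) -> Signal:
        return self._store[name]


def round_trip(backend: Backend, signal: Signal) -> Signal:
    """Depends only on the model and the Backend interface, not on storage."""
    backend.save(signal)
    return backend.load(signal.name)
```

Code written against `Signal` and `Backend` would not need to change if the in-memory store were replaced by another storage implementation with the same two methods.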

File format: The way each entity or relation is physically stored in the file must be described in detail. Otherwise, interoperability between different software solutions using this format cannot be guaranteed. In this draft we do not attempt to give such a definition. However, at the end of this document, we will briefly demonstrate how the suggested data model could be implemented based on HDF5.

API specification: Since there will probably be more than one library implementation accessing this file format, it might be beneficial to also provide a recommendation for an API specification.

Linking data and metadata

Linking data and metadata is, in our view, essential for an INCF recommendation. Neurophysiological data is meaningful only in the context of its metadata. These descriptions range from settings and properties of the applied hardware and environmental conditions to rather complex descriptions of the applied (multimodal) stimuli. It must be possible to annotate the data with sufficient metadata. On the other hand, providing the metadata should not be obligatory. In other words, the data model on its own must provide enough information to make some sense of the stored data. Thus, the approach presented here allows extended annotation with arbitrary metadata and linking it to the data, but neither depends on it nor forces its use.
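The following Python sketch shows one way such an optional link could look. It is only an illustration under assumptions: `MetadataSection`, `Recording` and `describe` are hypothetical names, not Pandora's actual model. The point is that the data entity carries enough information (name, unit, samples) to be interpreted on its own, while the metadata link may be left empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MetadataSection:
    """Hypothetical container for arbitrary key/value metadata."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Recording:
    """Hypothetical data entity that is usable without any metadata."""
    name: str
    unit: str
    samples: List[float]
    metadata: Optional[MetadataSection] = None  # the link is optional


def describe(rec: Recording) -> str:
    """Summarize a recording; metadata is appended only if it is linked."""
    summary = f"{rec.name}: {len(rec.samples)} samples in {rec.unit}"
    if rec.metadata is not None:
        props = sorted(rec.metadata.properties.items())
        extras = ", ".join(f"{key}={value}" for key, value in props)
        summary += f" [{extras}]"
    return summary
```

A `Recording` created without a `MetadataSection` is still fully described by `describe`, and attaching a section later adds annotation without changing how the data itself is read.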
