Context
To contribute to EDITO, a few requirements must be respected when onboarding applications, computational services and model components. These requirements ensure consistency, interoperability and operational reliability within EDITO's cloud-native environment.
Technical requirements for Datasets
Self-Describing Data Principle
All datasets onboarded to EDITO must follow a self-describing data approach. This means that datasets must either contain embedded metadata sufficient to interpret the data without external documentation, or they must be accompanied by comprehensive external metadata files.
The objective of this requirement is to prevent ambiguities and to ensure that datasets remain interpretable over time, even when accessed independently from their original context. Self-describing formats significantly reduce the need for custom parsing tools and promote interoperability within cloud-based processing environments.
Spatial and Temporal Referencing
As marine and environmental datasets are inherently spatial and temporal, adequate geospatial and time descriptors are mandatory. Datasets must include explicit coordinate information, preferably expressed in the WGS 84 coordinate reference system (EPSG:4326), unless a justified alternative projection is required. In all cases, the coordinate reference system must be clearly documented.
Temporal information must follow the ISO 8601 standard for date and time formatting. Where relevant, depth or elevation information must also be included and clearly specified. Spatial bounding boxes and temporal extents must be provided within the dataset metadata to support indexing and discovery.
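As an illustration of the spatial and temporal descriptors described above, the following minimal sketch derives a WGS 84 bounding box and an ISO 8601 temporal extent from a handful of records. The sample observations, the `describe_extent` helper and the dictionary keys are assumptions made for this example; they are not part of any EDITO API.

```python
from datetime import datetime, timezone

# Hypothetical observations: (longitude, latitude, depth in metres, UTC timestamp).
observations = [
    (2.35, 51.20, 5.0, datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)),
    (3.10, 51.45, 10.0, datetime(2023, 6, 3, 8, 30, tzinfo=timezone.utc)),
    (2.90, 51.05, 2.5, datetime(2023, 6, 2, 18, 15, tzinfo=timezone.utc)),
]


def describe_extent(records):
    """Return the bounding box and temporal extent of (lon, lat, depth, time) records."""
    lons = [r[0] for r in records]
    lats = [r[1] for r in records]
    times = [r[3] for r in records]
    return {
        # The coordinate reference system is stated explicitly alongside the extent.
        "crs": "EPSG:4326",
        # Bounding box in west, south, east, north order.
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        # isoformat() on a timezone-aware datetime yields an ISO 8601 string.
        "start": min(times).isoformat(),
        "end": max(times).isoformat(),
    }


extent = describe_extent(observations)
print(extent)
```

Using timezone-aware timestamps avoids ambiguity about which time zone a value refers to, because the offset is written into the ISO 8601 string.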
Controlled vocabularies and semantic interoperability
To ensure semantic consistency across datasets, parameter names and variable descriptions should rely on recognized controlled vocabularies. Recommended vocabularies include the CF Standard Name Table, Darwin Core, and the BODC Parameter Usage Vocabulary.
Using controlled vocabularies enhances machine readability, improves cross-dataset integration, and supports harmonization across marine data infrastructures.
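A vocabulary check of this kind could be sketched as follows. The three names in `CF_STANDARD_NAMES` are only a small excerpt of the CF Standard Name Table, and the `find_unknown_names` helper and the sample variables are hypothetical; a real check would load the full published table.

```python
# A tiny excerpt of CF standard names; a real check would load the full table.
CF_STANDARD_NAMES = {
    "sea_water_temperature",
    "sea_water_salinity",
    "sea_surface_height_above_geoid",
}


def find_unknown_names(variables):
    """Return variables whose standard_name is missing or not in the excerpt."""
    return sorted(
        name
        for name, attrs in variables.items()
        if attrs.get("standard_name") not in CF_STANDARD_NAMES
    )


# Hypothetical variable attributes, as they might appear in a dataset header.
variables = {
    "temp": {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
    "sal": {"standard_name": "sea_water_salinity", "units": "1e-3"},
    "chl": {"long_name": "chlorophyll"},  # no standard_name attribute
}

unknown = find_unknown_names(variables)
print(unknown)
```

Variables reported by such a check are candidates for mapping to a controlled term before submission.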
Accepted Data Formats
Standard Formats
EDITO supports widely adopted, open, and interoperable data formats. For gridded environmental data, NetCDF files conforming to CF conventions (version 1.6 or later) are strongly recommended. NetCDF is preferred due to its embedded metadata capabilities and suitability for multidimensional environmental datasets.
Tabular datasets may be submitted in CSV format, provided that spatial and temporal fields are included and properly structured. Since CSV does not inherently support extensive metadata, a separate metadata document must accompany the dataset. Vector geospatial data may be provided in formats such as Shapefile, GeoJSON or GeoPackage, provided that coordinate reference system information is included and documented.
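For the CSV case, one possible arrangement is a JSON sidecar file that documents every column. The sketch below is an assumption about how such a pairing might look: the column names, the sidecar keys and the `undocumented_columns` helper are illustrative and do not represent a metadata schema prescribed by EDITO.

```python
import csv
import io
import json

# Hypothetical CSV content with explicit spatial and temporal columns.
csv_text = """time,longitude,latitude,depth_m,temperature_degC
2023-06-01T12:00:00Z,2.35,51.20,5.0,14.2
2023-06-02T18:15:00Z,2.90,51.05,2.5,14.8
"""

# Hypothetical sidecar metadata describing the dataset and each column.
sidecar = {
    "title": "Example temperature observations",
    "crs": "EPSG:4326",
    "columns": {
        "time": {"description": "Observation time", "format": "ISO 8601, UTC"},
        "longitude": {"units": "degrees_east"},
        "latitude": {"units": "degrees_north"},
        "depth_m": {"units": "m", "positive": "down"},
        "temperature_degC": {"units": "degree_Celsius"},
    },
}


def undocumented_columns(csv_source, metadata):
    """Return CSV header fields that the sidecar metadata does not describe."""
    header = next(csv.reader(io.StringIO(csv_source)))
    return [name for name in header if name not in metadata["columns"]]


missing = undocumented_columns(csv_text, sidecar)
sidecar_json = json.dumps(sidecar, indent=2)  # text to store next to the CSV
print(missing)
```

Checking the header against the sidecar before submission helps catch columns that were added to the data but never documented.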
📌 Note: the list of supported formats may evolve over time in response to technological developments and community needs.
Analysis Ready Cloud Optimized Formats (ARCO)
For large-scale datasets or datasets intended for high-performance cloud processing, Analysis Ready Cloud Optimized (ARCO) formats are strongly encouraged, and preferred over the standard formats. These formats are designed for scalable access in distributed computing environments.
Supported ARCO formats include Cloud Optimized GeoTIFF (COG), GeoParquet, and Zarr.
📌 Note (COMING SOON): where necessary, the EDITO Publishing Toolkit may assist in converting standard formats into ARCO-compliant formats. The adoption of ARCO formats ensures improved performance, partial data access, and reduced data transfer overhead.
Metadata catalogue registration
All datasets intended for discovery or public access must be registered in the EDITO SpatioTemporal Asset Catalogue (STAC). Metadata must be fully compliant with the STAC specification and sufficiently complete to ensure technical validity, interoperability, and discoverability. This includes not only descriptive metadata (e.g., title, abstract, keywords, spatial and temporal extent, licensing, version, and contact details), but also the necessary structural and relational metadata required for a valid STAC representation.
All STAC entities must be correctly structured, linked, and validated in accordance with the STAC core specification and applicable extensions.
💡 EDITO Pro Tip: see the STAC specifications to ensure that your metadata will fit into the EDITO STAC.
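To give a sense of the structural metadata involved, here is a minimal sketch of a STAC Item expressed as a Python dictionary, with a shallow check for the top-level fields of the STAC Item specification. The identifier, asset href and values are hypothetical, and the check is only a pre-flight aid: it does not replace validation against the official STAC JSON Schemas.

```python
import json

# A minimal STAC Item sketch; the id, href and values are hypothetical.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-temperature-2023-06",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [2.35, 51.05], [3.10, 51.05], [3.10, 51.45],
            [2.35, 51.45], [2.35, 51.05],
        ]],
    },
    # bbox is required whenever geometry is not null.
    "bbox": [2.35, 51.05, 3.10, 51.45],
    "properties": {"datetime": "2023-06-01T12:00:00Z"},
    # In a real catalogue, links would point to the parent collection and catalogue.
    "links": [],
    "assets": {
        "data": {
            "href": "s3://example-bucket/example/temperature.zarr",
            "title": "Temperature (Zarr)",
        }
    },
}

REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry",
    "bbox", "properties", "links", "assets",
)


def missing_item_fields(candidate):
    """Return the top-level Item fields absent from the candidate dictionary."""
    return [field for field in REQUIRED_ITEM_FIELDS if field not in candidate]


print(missing_item_fields(item))
print(json.dumps(item)[:60])
```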
EDITO can ingest and process metadata provided according to standards and conventions commonly used across the European marine data ecosystem, particularly those developed under EMODnet and Copernicus Marine Service. These include CF conventions for gridded data, ISO 19115 and INSPIRE metadata standards.
By supporting these formats, EDITO can use, transform, and harmonize incoming (meta)data within the EDITO publishing toolkit, enabling efficient integration with the broader European data landscape.
Storage and directory structure
Upon approval, data providers are assigned a designated S3 object storage bucket within the EDITO Data Lake. Providers must follow the recommended directory structure to ensure logical organization, traceability, and version control.
Any deviation from the recommended structure must be clearly described in the metadata. Public datasets must follow versioning principles and remain immutable once published. Updates must result in a new version with corresponding metadata adjustments.
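The immutability rule can be illustrated with a short sketch. The `<dataset>/v<version>/<filename>` key layout used here is purely an assumption for the example and is not the recommended EDITO directory structure; the `versioned_key` and `publish` helpers are likewise hypothetical and work on an in-memory set rather than on real S3 storage.

```python
def versioned_key(dataset, version, filename):
    """Build an object key under a hypothetical <dataset>/v<version>/ layout."""
    return f"{dataset}/v{version}/{filename}"


def publish(published_keys, key):
    """Record a key as published and refuse to overwrite an existing one."""
    if key in published_keys:
        raise ValueError(f"{key} is already published; create a new version instead")
    published_keys.add(key)
    return key


published = set()
publish(published, versioned_key("sea-temperature", "1.0.0", "data.zarr"))
# An update goes under a new version prefix instead of replacing v1.0.0.
publish(published, versioned_key("sea-temperature", "1.1.0", "data.zarr"))
print(sorted(published))
```

Attempting to publish the same key a second time raises `ValueError`, which mirrors the principle that a published version is never modified in place.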
The default allocation of 20 GB of S3 object storage per EDITO user may be increased upon request through the EDITO User Support.
Responsibilities of data providers
Data providers remain responsible for the quality, integrity, and correctness of their datasets. They are expected to maintain accurate documentation, respond to user queries, and notify EDITO of significant structural or licensing changes.
⚠️ Failure to maintain compliance with technical or legal requirements may result in temporary suspension or removal of datasets from the platform.
Questions & Answers
Is it possible to publish datasets on EDITO with a DOI?
There is currently no integrated way to publish datasets on EDITO with a DOI. You can add a dataset that already has a DOI and record that DOI in the attached metadata, but this is not a straightforward task.
What about versioning?
Versioning is not yet supported. Both DOI support and versioning are planned for integration into EDITO.
What's next?
In the coming months, a new process will be implemented for integrating data into EDITO’s public catalogue, ensuring traceability and quality assurance from submission to publication. Contributions may be integrated for restricted use or public dissemination through the EDITO catalogue, with scientific validation processes in place to safeguard data quality and reliability.
If you have any questions, problems, or suggestions, please feel free to contact us via chat using the widget available at the bottom right of the page.
