Context
EDITO hosts its own cloud computing cluster, allowing users to run services and processes without having to think too much about the underlying infrastructure. The EDITO data lake, however, is not a data lake in the usual sense: it is rather the "data access" component of EDITO, composed of both the EDITO data storage and the EDITO data catalogue.
Computing cluster
Elastic cluster
As mentioned above, EDITO is designed to spare users from having to worry about its underlying infrastructure (e.g. virtual machines, CPUs, GPUs). The cluster is configured to optimize computational resource usage at all times by automatically scaling computation nodes up and down.
For example, when a user starts a service, it runs by default on one of these nodes: the system automatically chooses a node that has enough resources (CPU, RAM, disk storage) to host the new service.
📌Note: If no active node has enough resources, a new node is automatically provisioned to host the service. In that case, users may experience some latency before their new service is up and running.
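The placement and scaling behavior described above can be pictured with a toy model. The sketch below is hypothetical and much simpler than the real cluster scheduler and autoscaler: assuming placement is decided by resource requests (as in Kubernetes), it puts each service on the first active node whose remaining capacity covers the service's requests, provisions a new node when none fits (up to a maximum node count), and removes idle nodes while keeping a minimum count. The class and method names are invented for the example, and the capacities in the usage lines (taken from the CPU node row of the node configuration table on this page, ignoring any capacity reserved for the system) are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A computation node and the requests of the services it hosts."""
    cpu_m: int                                    # allocatable CPU, in mCPU
    mem_gb: int                                   # allocatable RAM, in GB
    requests: list = field(default_factory=list)  # (cpu_m, mem_gb) per hosted service

    def fits(self, cpu_m: int, mem_gb: int) -> bool:
        # A new service fits if the summed requests stay within the node's capacity.
        used_cpu = sum(c for c, _ in self.requests)
        used_mem = sum(m for _, m in self.requests)
        return used_cpu + cpu_m <= self.cpu_m and used_mem + mem_gb <= self.mem_gb


class ToyCluster:
    """Hypothetical first-fit placement with automatic scale-up and scale-down."""

    def __init__(self, node_cpu_m: int, node_mem_gb: int, min_nodes: int, max_nodes: int):
        self.node_cpu_m = node_cpu_m
        self.node_mem_gb = node_mem_gb
        self.min_nodes = min_nodes
        self.max_nodes = max_nodes
        self.nodes = [Node(node_cpu_m, node_mem_gb) for _ in range(min_nodes)]

    def start_service(self, cpu_m: int, mem_gb: int) -> int:
        """Place a service and return the index of the node hosting it."""
        for i, node in enumerate(self.nodes):  # 1. try the active nodes first
            if node.fits(cpu_m, mem_gb):
                node.requests.append((cpu_m, mem_gb))
                return i
        # 2. No active node has room: provision a new one (the source of start-up latency).
        if len(self.nodes) >= self.max_nodes:
            raise RuntimeError("cluster already at its maximum node count")
        new_node = Node(self.node_cpu_m, self.node_mem_gb)
        if not new_node.fits(cpu_m, mem_gb):
            raise ValueError("request exceeds the capacity of a single node")
        new_node.requests.append((cpu_m, mem_gb))
        self.nodes.append(new_node)
        return len(self.nodes) - 1

    def stop_service(self, node_index: int, cpu_m: int, mem_gb: int) -> None:
        self.nodes[node_index].requests.remove((cpu_m, mem_gb))

    def scale_down(self) -> None:
        """Drop idle nodes while keeping at least `min_nodes` nodes (indices may change)."""
        busy = [n for n in self.nodes if n.requests]
        idle = [n for n in self.nodes if not n.requests]
        self.nodes = busy + idle[: max(0, self.min_nodes - len(busy))]


cluster = ToyCluster(node_cpu_m=8000, node_mem_gb=32, min_nodes=1, max_nodes=30)
print(cluster.start_service(4000, 16))  # 0: fits on the existing node
print(cluster.start_service(4000, 16))  # 0: the node is now full
print(cluster.start_service(1000, 4))   # 1: a new node had to be provisioned
cluster.stop_service(1, 1000, 4)
cluster.scale_down()
print(len(cluster.nodes))               # 1: the idle node was removed
```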
Resource requests and limits
Requests are the minimum guaranteed amount of a resource that is reserved for a service/process.
Limits, on the other hand, are the maximum amount of a resource to be used by a service/process. This means that the service/process can never consume more than the memory amount or CPU amount indicated.
📌Note: A service/process can use more resources than its request, but never more than its limit. For CPU, the limit acts as a threshold: when it is reached, your service/process is throttled (slowed down).
For memory, on the other hand, when a service/process tries to consume more than its limit, the kernel terminates the process that attempted the allocation with an out-of-memory (OOM) error.
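To make the difference between requests and limits concrete, the sketch below models what happens when a service's demand crosses its limits. It is a hypothetical illustration, not EDITO code: the `resources` layout and the quantity notation (`"500m"` for 500 mCPU, `"2Gi"` for 2 GiB) are assumed to follow Kubernetes conventions, which the mention of pods on this page suggests but does not confirm, and the helper names are invented for the example.

```python
# Hypothetical model of CPU and memory limits.
# Assumption: Kubernetes-style quantities ("500m" CPU, "2Gi" memory).

def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity ("500m", "2" or "0.5") to mCPU."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity with a binary suffix ("512Mi", "2Gi") to bytes."""
    factors = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
    for suffix, factor in factors.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


resources = {
    "requests": {"cpu": "500m", "memory": "2Gi"},  # reserved for the service
    "limits": {"cpu": "2", "memory": "4Gi"},       # hard ceilings
}


def outcome(cpu_demand_mcpu: int, memory_demand_bytes: int) -> str:
    """What happens when the service tries to use the given amounts."""
    if memory_demand_bytes > parse_memory(resources["limits"]["memory"]):
        return "OOM-killed"  # memory above the limit: the kernel kills the process
    if cpu_demand_mcpu > parse_cpu(resources["limits"]["cpu"]):
        return "throttled"   # CPU above the limit: slowed down, not killed
    return "running"         # any usage up to the limits, even above the requests


print(outcome(1500, 3 * 2**30))  # running: above the requests, below the limits
print(outcome(3000, 1 * 2**30))  # throttled
print(outcome(1000, 5 * 2**30))  # OOM-killed
```

The asymmetry is the point of the example: exceeding the CPU limit only slows a service down, while exceeding the memory limit stops it.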
Virtual CPUs, milli CPUs
The CPU resources a service needs to run on the cluster are expressed in mCPU (milli CPU).
The mCPU is used in cloud environments to express the amount of CPU time a service requests to run. 1000 mCPU corresponds to 1 vCPU (virtual CPU; users can think of a vCPU as one CPU), so requesting 500 mCPU means the service requests 50% of one CPU's time. This time is not tied to a single core: all of the CPU's cores can be used at any time, which allows parallelization/multi-threading. In other words, requesting less than 1000 mCPU does not prevent you from running a multi-threaded service.
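The arithmetic behind mCPU can be made concrete as follows. This sketch assumes the Linux CFS bandwidth mechanism, in which a CPU allowance becomes a budget of CPU time per scheduling period (100 ms by default), which is how Kubernetes enforces CPU limits; EDITO's exact settings are not described on this page, and the function names are hypothetical.

```python
# Assumption: CPU time is budgeted per scheduling period, as with the Linux CFS
# bandwidth controller (default period: 100 ms). Illustrative only.
CFS_PERIOD_MS = 100


def mcpu_to_vcpu(mcpu: int) -> float:
    """1000 mCPU = 1 vCPU."""
    return mcpu / 1000


def cpu_budget_ms(mcpu: int, period_ms: int = CFS_PERIOD_MS) -> float:
    """CPU time, summed over all cores, available to the service in each period."""
    return mcpu_to_vcpu(mcpu) * period_ms


def busy_wall_time_ms(mcpu: int, threads: int, period_ms: int = CFS_PERIOD_MS) -> float:
    """Wall-clock time per period during which `threads` fully busy threads
    (one per core) can run in parallel before the budget is used up."""
    return min(period_ms, cpu_budget_ms(mcpu, period_ms) / threads)


print(mcpu_to_vcpu(500))          # 0.5
print(cpu_budget_ms(500))         # 50.0 ms of CPU time per 100 ms period
print(busy_wall_time_ms(500, 1))  # 50.0: one thread runs for half of each period
print(busy_wall_time_ms(500, 4))  # 12.5: four threads run in parallel, for less time
```

With 500 mCPU, a service with four busy threads still receives its 50 ms of CPU time per period; it simply consumes it on four cores at once, in 12.5 ms of wall-clock time, which is why an allowance below 1000 mCPU does not rule out multi-threading.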
CPU nodes and GPU nodes
Currently, two types of computational nodes are available in the EDITO cloud computing cluster: CPU nodes and GPU nodes. GPU nodes have specific resources, such as VideoRAM. To use a GPU, a user needs to start a service or process that is configured for GPUs.
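Once a GPU-configured service is running, it can be useful to check that a GPU is actually visible to it. The sketch below assumes NVIDIA GPUs with the `nvidia-smi` tool available inside the service image, which this page does not state; `list_gpus` is a hypothetical helper. It relies on the `--query-gpu` and `--format` options of `nvidia-smi` and returns an empty list when no GPU can be queried, as on a CPU node.

```python
import shutil
import subprocess


def list_gpus() -> list[tuple[str, int]]:
    """Return (name, total memory in MiB) for each GPU visible to this service.

    Returns an empty list when `nvidia-smi` is missing (typical on CPU nodes)
    or when the query fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        name, sep, memory = line.rpartition(",")
        if not sep:
            continue  # unexpected line format
        try:
            gpus.append((name.strip(), int(memory.strip())))
        except ValueError:
            continue  # memory not reported as a number (e.g. "[N/A]")
    return gpus


if __name__ == "__main__":
    found = list_gpus()
    if found:
        for gpu_name, memory_mib in found:
            print(f"GPU: {gpu_name}, {memory_mib} MiB of video RAM")
    else:
        print("No GPU visible: this service is probably running on a CPU node.")
```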
Current node configurations
The following table summarizes the current cloud computing cluster configuration:
Node type | vCores | RAM (GB) | Web Disk Storage SSD (GB) | VideoRAM (GB) | min. node count | max. node count |
CPU | 8 | 32 | 128 | N/A | 1 | 30 |
GPU | 16 | 112 | 320 | 48 | 1 | 4 |
This configuration is arbitrary and does not reflect the capacity of the platform once fully operational.
Please contact the EDITO User Support using the widget at the bottom right if you need nodes with higher capabilities.
Distributed computing frameworks
The EDITO architecture supports distributed computing frameworks, such as Dask or Spark, on which services or processes can rely.
Please contact the EDITO User Support using the widget at the bottom right if you need access to a particular framework.
Quotas
EDITO is a public platform whose shared resources are funded by the European Commission. To prevent abuse, usage is subject to quotas. The following table summarizes the current quotas for personal and group projects:
Project kind | CPU vCores | RAM (GB) | Web Disk Storage SSD (GB) | GPU vCores | max. pod count |
Personal project | 8 | 32 | 50 | 1 | 10 |
Group project | 16 | 64 | 100 | 1 | 100 |
📌Note: “pods” are the entities in which services and processes run. For simplicity, you can consider that each service or process needs one pod to run.
If your services or processes never launch, face performance issues, or are stopped without any action on your part, these restrictions may be the root cause. Please contact the EDITO User Support using the widget at the bottom right if you need increased quotas.
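Before starting several services in a personal project, a quick check against the quota table above can save a failed launch. The sketch below sums the resources of a planned set of services and reports which personal-project quotas would be exceeded. It assumes, without confirmation from this page, that quotas are counted against the resources requested by services and that each service uses one pod; the dictionary keys and the `quota_overruns` name are hypothetical.

```python
# Personal-project quotas from the quota table (RAM and disk in GB).
# Assumption: usage is counted as the sum of what the services request.
PERSONAL_QUOTA = {"cpu_vcores": 8, "ram_gb": 32, "disk_gb": 50, "gpu": 1, "pods": 10}


def quota_overruns(services: list[dict], quota: dict = PERSONAL_QUOTA) -> dict:
    """Return {resource: (planned total, quota)} for every quota that would be exceeded."""
    totals = {
        "cpu_vcores": sum(s.get("cpu_vcores", 0) for s in services),
        "ram_gb": sum(s.get("ram_gb", 0) for s in services),
        "disk_gb": sum(s.get("disk_gb", 0) for s in services),
        "gpu": sum(s.get("gpu", 0) for s in services),
        "pods": len(services),  # assumed: one pod per service
    }
    return {name: (total, quota[name]) for name, total in totals.items() if total > quota[name]}


planned = [
    {"cpu_vcores": 4, "ram_gb": 16, "disk_gb": 20},
    {"cpu_vcores": 4, "ram_gb": 16, "disk_gb": 20},
    {"cpu_vcores": 2, "ram_gb": 4, "disk_gb": 5},
]
print(quota_overruns(planned))  # {'cpu_vcores': (10, 8), 'ram_gb': (36, 32)}
```

Here the three services together would need 10 vCores and 36 GB of RAM, so one of them would likely fail to start until another is stopped or the quotas are increased.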
Data lake
As mentioned above, the EDITO data lake is not a data lake in the classical sense; it is rather the “data access” component of EDITO, composed of both the EDITO data storage and the EDITO data catalogue.
Data storage
EDITO provides an elastic cloud object storage allowing users to store personal, group and public data.
As a user, you have access to your personal storage, which only you can manage (you can make part of it publicly accessible). There are also group storages, managed by the group members, and a “public” storage, managed by the EDITO Team, in which everything is public.
Although it does not run on Amazon Web Services, the object storage is compatible with the AWS S3 API:
Storage kind | Technology | Governance / Management / Ownership |
Personal project | One S3 bucket | User |
Group project | One S3 bucket | Group members |
Public | Several S3 buckets | Administrators (for now) |
Owners of personal or group project storage can decide on the visibility of their storage content: they can share data or make it publicly available. Learn more about interacting with your storage here.
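Because the storage speaks the S3 API, an object that has been made public can be addressed with a plain HTTPS URL. The sketch below builds a path-style S3 URL (`<endpoint>/<bucket>/<key>`), assuming the EDITO object storage accepts path-style addressing, as most S3-compatible servers do; the endpoint, bucket and key are hypothetical placeholders to replace with real values.

```python
from urllib.parse import quote

# Hypothetical placeholder: replace with the actual EDITO S3 endpoint.
S3_ENDPOINT = "https://s3.example.org"


def public_object_url(bucket: str, key: str, endpoint: str = S3_ENDPOINT) -> str:
    """Path-style S3 URL for an object that has been made publicly accessible."""
    # Keep "/" separators in the key; percent-encode spaces and other special characters.
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key, safe='/')}"


print(public_object_url("my-bucket", "results/sea temperature.nc"))
# https://s3.example.org/my-bucket/results/sea%20temperature.nc
```

The resulting URL can be downloaded with any HTTP client (for example `urllib.request.urlopen`) without credentials, as long as the object is public. Private objects require signed S3 requests, which an S3 client library such as boto3 handles when given the storage endpoint through its `endpoint_url` parameter.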
Quotas
EDITO is a public platform whose shared resources are funded by the European Commission. To prevent abuse, usage is subject to quotas. The following table summarizes the current quotas for personal and group projects, as well as for the public storage:
Storage kind | Volume amount (GB) |
Personal project | 20 |
Group project | 20 |
Public | N/A |
Please contact the EDITO User Support using the widget at the bottom right if you need increased quotas.
External data storage
External S3-compatible storage can be configured in the project settings, allowing you to work with it seamlessly alongside (or instead of) the EDITO data storage.
Learn more about connecting to external storage here.
Data catalogue
Referencing data
The EDITO data catalogue can reference data inside EDITO data storage or external data.
For example, it references Copernicus Marine data, which are actually stored and managed by the Copernicus Marine Service. It also references the ARCO (analysis-ready, cloud-optimized) versions of EMODnet data, which are stored in the EDITO public data storage.
You can also reference data (hosted on your personal storage or any external services/platforms/projects) by interacting with the Data API.
Browsing/searching data
You can browse and search for data in the EDITO data catalogue graphically with the EDITO viewer or programmatically with the Data API.
Location
Currently, the main EDITO cloud resources are provided by CloudFerro in its Warsaw region, WAW3-1.
What's next?
If you have any questions, problems, or suggestions, please feel free to contact us via chat using the widget available at the bottom right of the page.
