PhD Position in Artificial Intelligence for Data Lakes Management


Published on: 17/04/2023



Job offer details

Published: 17/04/2023

Work arrangement:

Contract type:

Working hours:

Job offer description

The global data lake market is projected to triple between 2019 and 2024, reaching $20.1 billion [1], with the European share at about one quarter of that amount [2]. Although data lakes are more than a promising approach, today’s solutions do not unleash the full potential of data analysis, especially at a large cross-organization scale, for several reasons. Firstly, data lakes are usually deployed and managed by a single party, and such a centralized approach can fail because of the complexity of the diverse data sources [3]. Secondly, the computing continuum (i.e., the resources located at the edge, fog, and cloud) is not fully exploited. To minimize the impact of data transfer, data should be processed where they are generated, but doing so raises security, privacy, and governance concerns. Thirdly, data sovereignty [4] must be preserved: personal or business-sensitive data cannot leave the boundaries of the organization unless they are first transformed in compliance with the organization’s policies and general norms, which often limits data sharing. Finally, data lake implementations are not sustainable: because of the illusion created by low-cost storage devices and the assumption that all data have huge value for companies [5], operators do not check whether a data item is already stored or whether it is useful, resulting in duplicated data and the storage of unused data.

Stretched Data Lakes aim to leverage Data Mesh and Data Fabric [6] concepts to address these challenges by enabling trusted, verifiable, and energy-efficient data flows across the edge-cloud continuum. They are based on a shared but decentralized approach for defining, enforcing, and tracking data governance requirements, with specific emphasis on privacy and confidentiality. Moreover, by applying the principles of the circular economy to data governance, i.e., by reusing data, applications, and computation resources, Stretched Data Lakes will enable the creation of platforms for more energy-efficient and sustainable data analytics.

Minimum requirements


Stretched Data Lakes, Data Mesh, Data Fabric, Cloud-Edge Continuum, Trustworthiness, Privacy-aware Data Management, Energy-efficient Data Operations

How to apply for the training?

Apply at

Contact person: Fundació i2cat