Original

Data Lake Management System based on Topic Modeling

By
Amine El Haddadi

Data Science and Competitive Intelligence Team (DSCI), ENSAH, Abdelmalek Essaâdi University (UAE), Tetouan, Morocco

Oumaima El Haddadi

Data Science and Competitive Intelligence Team (DSCI), ENSAH, Abdelmalek Essaâdi University (UAE), Tetouan, Morocco

Mohamed Cherradi

Data Science and Competitive Intelligence Team (DSCI), ENSAH, Abdelmalek Essaâdi University (UAE), Tetouan, Morocco

Fadwa Bouhafer

Data Science and Competitive Intelligence Team (DSCI), ENSAH, Abdelmalek Essaâdi University (UAE), Tetouan, Morocco

Anass El Haddadi

Data Science and Competitive Intelligence Team (DSCI), ENSAH, Abdelmalek Essaâdi University (UAE), Tetouan, Morocco

Ahmed El Allaoui

Data Science and Competitive Intelligence Team (DSCI), ENSAH, Abdelmalek Essaâdi University (UAE), Tetouan, Morocco


Abstract

In a highly competitive environment, data is a valuable asset for any company looking to grow, and a real economic and strategic lever. Leading companies are concerned not only with collecting data from heterogeneous sources, but also with analyzing these datasets and turning them into better decisions. In this context, the data lake remains a powerful solution for storing large amounts of data and providing analytics for decision support. In this paper, we examine an intelligent data lake management system that addresses the drawbacks of traditional business intelligence, which can no longer keep up with data-driven demands. Data lakes are well suited to analyzing data from a variety of sources, particularly when data cleaning is time-consuming. However, ingesting heterogeneous data sources without any schema is a major issue, and a data lake can easily turn into a data swamp. In this study, we implement the Latent Dirichlet Allocation (LDA) topic model to manage the storage, processing, analysis, and visualization of big data. To assess the usefulness of our proposal, we evaluated its performance using the topic coherence metric. The experimental results showed our approach to be more accurate on the tested datasets.
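The abstract does not say which topic coherence variant was used, so the following is only a minimal sketch of one common choice, the UMass measure, which scores a topic by how often its top words co-occur in the same documents. The function name `umass_coherence`, the toy corpus, and the two example topics are hypothetical and are not taken from the paper; in practice the top words would come from a trained LDA model.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence of a single topic (illustrative sketch).

    top_words: the topic's top words, ordered from most to least probable.
    documents: tokenized documents, each a list of strings.
    Returns the sum over ordered word pairs of
    log((co-document frequency + 1) / document frequency of the higher-ranked word).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations() keeps input order, so `earlier` always outranks `later`.
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            raise ValueError(f"{earlier!r} does not occur in the corpus")
        score += math.log((doc_freq(earlier, later) + 1) / denom)
    return score


# Hypothetical toy corpus standing in for documents stored in a data lake.
documents = [
    ["data", "lake", "storage", "metadata"],
    ["data", "lake", "ingestion", "schema"],
    ["topic", "model", "lda", "coherence"],
    ["topic", "model", "metadata", "data"],
]

# Hypothetical top-word lists, as a trained topic model might produce them.
coherent_topic = ["data", "lake", "metadata"]
mixed_topic = ["data", "lda", "schema"]

print("coherent topic:", umass_coherence(coherent_topic, documents))
print("mixed topic:   ", umass_coherence(mixed_topic, documents))
```

Scores closer to zero indicate words that tend to appear together, so the topic whose words share documents scores higher than the one mixing unrelated terms. Averaging this score over all topics gives a single figure for comparing models, for example across different numbers of topics.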

How to Cite

1. El Haddadi A, El Haddadi O, Cherradi M, Bouhafer F, El Haddadi A, El Allaoui A. Data Lake Management System based on Topic Modeling. Data and Metadata [Internet]. 2023 Dec. 28 [cited 2024 May 17];2:183. Available from: https://dm.saludcyt.ar/index.php/dm/article/view/183

The article is distributed under the Creative Commons Attribution 4.0 License. Unless otherwise stated, associated published material is distributed under the same licence.

The statements, opinions and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). We stay neutral with regard to jurisdictional claims in published maps and institutional affiliations.
