Recuperación, Extracción y Clasificación de Información de Saber UCV

Avendaño Infante, José Miguel

Please use this identifier to cite or link to this item: https://saber.ucv.ve/jspui/handle/10872/23400

Title:	Recuperación, Extracción y Clasificación de Información de Saber UCV
Authors:	Avendaño Infante, José Miguel
Keywords:	information retrieval; natural language processing; artificial intelligence; embeddings; semantic search; knowledge maps
Issue Date:	3-Feb-2025
Abstract:	This work presents the research Recuperación, Extracción y Clasificación de Información de Saber UCV (Recovery, Extraction and Classification of Information from Saber UCV), which carries out classification, storage and retrieval of information on the theses and degree works published in the institutional repository Saber UCV. A system is implemented that classifies 96% of the 9,982 published works according to the academic area in which the author studied. The abstracts of the works, together with the classifications obtained, form a corpus to which natural language processing and text mining techniques are applied, and pre-trained artificial intelligence models are used to create embeddings from the documents. All the processed information is then loaded into an indexed database that contains an inverted index. The system also includes a web application for information retrieval, where the user can explore the corpus through semantic search and full-text search by specifying the text to search for, a date range, the area in which the research was produced, and the academic level. The most relevant works are then retrieved, and the results are presented in interactive tables, knowledge maps, and recommendations of documents that may be of interest. The implementation is a distributed system with a client-server architecture, supported by orchestrated containers.
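The two retrieval modes named in the abstract (full-text search over an inverted index, and semantic search over document vectors, both narrowed by area and date range) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the function names, and in particular the `embed` function, which uses plain term counts as a stand-in for the pre-trained embedding models the work actually relies on. It is not the thesis implementation.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy records; the real corpus holds thesis abstracts from Saber UCV.
DOCS = [
    {"id": 1, "area": "Ciencias", "year": 2019, "text": "semantic search with embeddings"},
    {"id": 2, "area": "Ingeniería", "year": 2021, "text": "bridge design and structural analysis"},
    {"id": 3, "area": "Ciencias", "year": 2023, "text": "information retrieval with an inverted index"},
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for term in tokenize(doc["text"]):
            index[term].add(doc["id"])
    return index

def embed(text):
    """Stand-in embedding: a sparse term-count vector (NOT a pre-trained model)."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def passes_filters(doc, area=None, years=None):
    """Apply the optional area and inclusive (start, end) year filters."""
    if area is not None and doc["area"] != area:
        return False
    if years is not None and not (years[0] <= doc["year"] <= years[1]):
        return False
    return True

def full_text_search(index, docs, query, **filters):
    """Return ids of filtered documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(d["id"] for d in docs if d["id"] in hits and passes_filters(d, **filters))

def semantic_search(docs, query, top_k=2, **filters):
    """Rank filtered documents by cosine similarity to the query vector."""
    q = embed(query)
    scored = [(cosine(q, embed(d["text"])), d["id"]) for d in docs if passes_filters(d, **filters)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]

INDEX = build_inverted_index(DOCS)
```

For example, `full_text_search(INDEX, DOCS, "with", area="Ciencias", years=(2020, 2025))` keeps only the record that contains the term and also satisfies both filters. In a real deployment the count vectors would be replaced by dense vectors from a pre-trained model, and the index would live in the database rather than in memory.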
URI:	http://hdl.handle.net/10872/23400
Appears in Collections:	Maestría

Files in This Item:

File	Description	Size	Format
TG_JMAI_veredicto-RECUPERACIÓN, EXTRACCIÓN Y CLASIFICACIÓN DE INFORMACIÓN DE SABER UCV.pdf		4.25 MB	Adobe PDF
