Original

TextRefine: A Novel approach to improve the accuracy of LLM Models

By
Ekta Dalal

Deenbandhu Chhotu Ram University of Science and Technology, Computer Science and Engineering. Sonipat, India

Parvinder Singh

Deenbandhu Chhotu Ram University of Science and Technology, Computer Science and Engineering. Sonipat, India


Abstract

Natural Language Processing (NLP) is an interdisciplinary field that studies human language with the goal of building computational models and algorithms that can comprehend, produce, and analyze natural language in a human-like way. Despite their outstanding performance on NLP tasks, large language models (LLMs) still struggle with noisy, unrefined input text. To address this problem, TextRefine offers a thorough preprocessing pipeline that cleans and refines text data before it is passed to an LLM. The pipeline includes removing social tags, normalizing whitespace, changing all lowercase letters to uppercase, removing stopwords, fixing Unicode issues, unpacking contractions, removing punctuation and accents, and general text cleanup. Together, these steps strengthen the integrity and quality of the input data, which in turn improves the efficiency and precision of LLMs. Extensive testing and comparisons with standard techniques show TextRefine's effectiveness, with a reported accuracy of 99%.
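
The steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `refine`, the order of the steps, the regular expressions, and the tiny `STOPWORDS` and `CONTRACTIONS` tables are all assumptions made for illustration, and only the standard library is used.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny tables (assumptions, not from the paper).
STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "in"}
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "it's": "it is",
}


def refine(text: str) -> str:
    """Illustrative cleanup pass covering the steps named in the abstract."""
    # Fix Unicode issues: fold compatibility forms, unify curly apostrophes.
    text = unicodedata.normalize("NFKC", text).replace("\u2019", "'")

    # Remove social tags (@mentions and #hashtags) that start a token.
    text = re.sub(r"(?<!\w)[@#]\w+", " ", text)

    # Unpack contractions; lowercasing first keeps the lookup table small.
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)

    # Remove accents: decompose characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Remove punctuation by replacing it with spaces.
    text = re.sub(r"[^\w\s]", " ", text)

    # Remove stopwords; split() also collapses runs of whitespace.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]

    # Case conversion: the abstract describes lowercase-to-uppercase.
    return " ".join(tokens).upper()


if __name__ == "__main__":
    print(refine("@user I can\u2019t wait for the caf\u00e9   opening!!! #excited"))
```

In practice, the stopword and contraction tables would come from a fuller resource, and the step order may differ from the pipeline evaluated in the paper.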

How to Cite

1.
Dalal E, Singh P. TextRefine: A Novel approach to improve the accuracy of LLM Models. Data and Metadata [Internet]. 2024 May 20 [cited 2024 Jun. 24];3:331. Available from: https://dm.saludcyt.ar/index.php/dm/article/view/331

The article is distributed under the Creative Commons Attribution 4.0 License. Unless otherwise stated, associated published material is distributed under the same licence.

