Implementation of the Naive Bayes classification algorithm for sentiment analysis of Twitter users towards ChatGPT using the Python programming language

ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot that is widely used by the public. This technology is based on Artificial Intelligence and is capable of holding conversational interactions with its users much like a human, but in the form of automated text. Because of this capability, online forums such as Brainly and the like may be overtaken by these smart chatbots. Therefore, this study was conducted to determine positive and negative sentiments towards ChatGPT among 5000 Twitter users using the Naive Bayes classification algorithm. Data was collected by web scraping, and the Python programming language was used for data analysis. The results showed that the majority of Twitter users had a positive sentiment towards ChatGPT (57.6%), while negative sentiment reached 42.4%. The resulting classification model had an accuracy of 80%, indicating a good classification model for determining sentiment probabilities. These findings provide insights into user sentiment towards ChatGPT and can serve as a reference for the development of better AI chatbot technology that meets user needs.


INTRODUCTION
As of January 2023, global internet users have reached 5.15 billion people. This number accounts for 64.4% of the global population, which totals 8.01 billion people. The number of global internet users in January 2023 increased by 1.9% year-on-year, up from 5.01 billion people in the same period last year. (1)

Figure 1. Graph of internet users around the world
One of the technologies currently being adopted by the public is an AI chatbot called ChatGPT. ChatGPT stands for Chat Generative Pre-trained Transformer, a chatbot based on Artificial Intelligence technology that can carry out sophisticated conversational interactions with its users. This chatbot is able to answer user questions in the same way a human would, but in the form of automated text. (2) On the other hand, there are concerns about the impact of the presence of ChatGPT, namely that some learning forums will become less attractive as users start to switch to the help of ChatGPT.
Beyond learning forums, human jobs may be replaced by Artificial Intelligence technology such as ChatGPT, and it can undermine the homework system, because students may copy answers directly from ChatGPT without attempting the assignment independently. Because of these concerns, it is necessary to analyse the sentiment of Twitter users towards ChatGPT. In this way, the positive and negative views of ChatGPT can be identified, which is useful both for the community in using ChatGPT and for ChatGPT's developers in improving the application.
The Naïve Bayes classification algorithm was used, with the research subjects being Twitter users who made uploads (tweets) containing the word ChatGPT. The Naïve Bayes algorithm was chosen because its reliability has been proven by previous researchers in "Sentiment analysis of economic recovery in Indonesia after the covid-19 pandemic on twitter using the naive bayes classifier algorithm". That study showed that the Naive Bayes Classifier algorithm was able to classify tweet data with an accuracy of 78%, a precision of 96% for positive predictions and 31% for negative predictions, and a recall of 78% for true positives and 75% for true negatives. The sentiment classification results obtained using the Naive Bayes Classifier on public tweets met expectations. (3)

METHODS
The implementation of the Naive Bayes classification algorithm on Twitter user sentiment towards ChatGPT involves several stages, shown in the diagrams below.
1. Data collection stage: at this stage, third-party software called Snscrape is used to collect data from Twitter through web scraping techniques. Although Snscrape is not part of the official Twitter API, it is reliable for retrieving items such as user profiles, hashtags, or relevant search results. "Snscrape is a scraper for social network services (SNS) that has the ability to retrieve posts that are relevant and related to the research objectives". (4) The stages of data collection are shown in the diagram below.
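The collection step can be sketched as follows. This is a minimal, illustrative sketch, assuming snscrape is installed (`pip install snscrape`); the helper names `build_query` and `collect_tweets` are not from the study, and the tweet attribute `rawContent` may be named `content` in older snscrape versions.

```python
import itertools

def build_query(keyword: str, since: str, until: str) -> str:
    """Compose an English-only Twitter search query for snscrape."""
    return f"{keyword} since:{since} until:{until} lang:en"

def collect_tweets(query: str, limit: int = 5000):
    """Scrape up to `limit` tweets matching `query`."""
    # Imported lazily so build_query stays usable without snscrape installed.
    import snscrape.modules.twitter as sntwitter
    rows = []
    for tweet in itertools.islice(
            sntwitter.TwitterSearchScraper(query).get_items(), limit):
        rows.append({
            "date": tweet.date,
            "user": tweet.user.username,
            "content": tweet.rawContent,  # `content` in older snscrape versions
        })
    return rows
```

Calling `collect_tweets(build_query("#ChatGPT", "2023-01-01", "2023-03-09"))` would then mirror the study's keyword and collection window.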

2. Preprocessing stage: preprocessing is a technique used to transform raw data into a useful and efficient format. This stage is necessary because raw data is often incomplete and inconsistently formatted, and data quality has a direct correlation with the success of any project involving data analysis. (5) Preprocessing itself comprises several steps: case folding, cleaning, tokenizing, text normalisation, stopword removal, stemming, removing duplicate data, and removing NaN (Not a Number) data. The preprocessing stages are shown in the diagram below.
3. Sentiment labelling stage: performed after preprocessing is complete. Sentiment labelling uses the Opinion Lexicon approach. An opinion lexicon is a list of words or phrases categorised by sentiment polarity, such as positive, negative, or neutral. Opinion lexicons are commonly used in sentiment analysis to help machines or computer programs understand and extract opinion polarity or sentiment from text. The opinion lexicon compiled by Hu and Liu consists of approximately 6800 words. (6)
4. Implementation of the Naive Bayes classification model stage: after the tweet data has been labelled, the next step is the implementation of the Naive Bayes classification model. The purpose of this stage is to evaluate how accurately the model that has been built can determine positive and negative sentiments in the sentiment analysis of Twitter users towards ChatGPT. Naïve Bayes is fast in modelling, has predictive ability, and also provides a new way of exploring and understanding data. (7) Due to the large amount of tweet data, an additional optimisation is applied to the classification model, namely cross-validation. Cross-validation (CV) is a statistical method for evaluating the performance of a model or algorithm in which the data is separated into two subsets: learning (training) data and validation/evaluation data.
The model or algorithm is trained on the learning subset and validated on the validation subset. The choice of CV type can be based on the size of the dataset; usually, K-fold CV is used because it reduces computation time while maintaining the accuracy of the estimation. (8) The diagram below shows the stages of implementing the Naive Bayes classification model. Figure 5. Implementation of the Naive Bayes classification model stages
5. Data visualisation stage: a bar chart will be used to show the top 10 most frequent words in the tweet data, so that the most frequent words can be identified. In addition, a word cloud will be used for visualisation. A word cloud is a visualisation technique consisting of the words that appear most often in the analysed dataset; the size of each word is determined by how frequently it is used, so the more often a word is used, the larger it appears. (9) The information obtained from this visualisation will be useful for understanding the data that has been collected.
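As a sketch of what the Naive Bayes stage computes, the following is a minimal multinomial Naive Bayes classifier with add-alpha smoothing (the `alpha` parameter plays the same role as in scikit-learn's `MultinomialNB`). The class name and any example corpus are illustrative, not the study's implementation.

```python
import math
from collections import Counter

class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha

    def fit(self, docs, labels):
        """docs: list of token lists; labels: list of class names."""
        self.classes = sorted(set(labels))
        # Log prior probability of each class from label frequencies.
        self.priors = {c: math.log(labels.count(c) / len(labels))
                       for c in self.classes}
        # Per-class word counts.
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        """Return the class with the highest posterior log-probability."""
        best, best_lp = None, -math.inf
        v = len(self.vocab)
        for c in self.classes:
            lp = self.priors[c]
            for t in doc:
                # Smoothed per-class word likelihood.
                lp += math.log((self.counts[c][t] + self.alpha) /
                               (self.totals[c] + self.alpha * v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

After `model = MultinomialNaiveBayes(alpha=0.1).fit(docs, labels)` on token lists with "positive"/"negative" labels, `model.predict(tokens)` returns the more probable class.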

RESULTS AND DISCUSSION
1. Data collection results: 5000 tweets were collected, with the time window set from 1 January 2023 to 9 March 2023. The data consisted of tweet dates, usernames, and the tweets themselves, retrieved with the keyword #ChatGPT; the contents of the tweets are in English. After the data was successfully scraped, it was saved in CSV format. The 5000 tweets about ChatGPT were then ready to enter the preprocessing stage. Table 1 shows the result of data collection.

2. Preprocessing results: in addition to the previously mentioned noise, scraped datasets may also contain unwanted characters or irregular sentences due to abbreviated writing. These characters can degrade the quality and accuracy of the dataset, while irregular sentences can make data processing more difficult.
The following are the results of preprocessing, covering case folding, cleaning, tokenizing, text normalisation, stopword removal, stemming, duplicate removal, and removal of NaN (Not a Number) data. Table 2 shows the results of case folding. Case folding is a text preprocessing step performed to make the characters in the data uniform by converting all letters to lowercase; in this process, the characters 'A'-'Z' in the data are converted to 'a'-'z'. (10) Table 3 shows the results of cleaning, which removes noise characters such as links, @ mentions, hashtags, and so on. Data cleaning in sentiment analysis is the process of removing redundant and incorrect values from the data intended for analysis, and it is an important step in the sentiment analysis process. (11) Table 4 shows the results of tokenizing, which breaks tweets down into words to facilitate text normalisation. Tokenizing is the operation of separating text into pieces called tokens, which can be letters, words, or sentences, before further analysis; entities that can be referred to as tokens include words, numbers, symbols, and punctuation marks. (12) Table 5 shows the results of text normalisation, which converts text into a standard or common form that can be processed and interpreted more easily, for example by converting mistyped words into their correct form. With normalisation, text becomes more consistent, easier to read, and easier to process. Table 6 shows the results of stopword removal, a process that removes words that carry no meaning of their own but appear frequently in the collected tweets.
A sample of a processed tweet: ['technology', 'will', 'not', 'replace', 'people', 'need', 'human', 'touch', 'these', 'complex', 'situation', 'telephone', 'operator', 'elevator', 'operator', 'radiologist', 'radiologist', 'radiology', 'chatgpt']. Table 7 shows the results of stemming. Stemming is the process of reducing words to their base form by removing affixes at the beginning or end of a word. After the stemming process was complete, the next step was to remove duplicate data: the amount of data before deleting duplicates was 5000 and after deletion was 4950, meaning there were 50 duplicate entries. NaN (Not a Number) data was then removed; the number of empty entries found was 0, meaning the data was clean and ready for sentiment labelling.
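The preprocessing chain described above can be sketched as a small dependency-free pipeline. This is a minimal sketch under stated assumptions: the stopword set and function names are illustrative stand-ins (the study would use a full stopword list, text normalisation, and a stemmer such as NLTK's), and only the steps shown here are implemented.

```python
import re
import string

# Illustrative stand-in; the study would use a full stopword list.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "will"}

def case_folding(text: str) -> str:
    """Convert every character 'A'-'Z' to 'a'-'z'."""
    return text.lower()

def cleaning(text: str) -> str:
    """Remove links, @ mentions, hashtags, and punctuation noise."""
    text = re.sub(r"https?://\S+", " ", text)   # links
    text = re.sub(r"[@#]\w+", " ", text)        # mentions and hashtags
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text: str) -> list:
    """Split cleaned text into word tokens."""
    return text.split()

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def preprocess(texts):
    """Run the chain and drop empty (NaN-like) and duplicate tweets."""
    seen, out = set(), []
    for t in texts:
        if t is None:
            continue                    # remove NaN entries
        cleaned = cleaning(case_folding(t))
        if cleaned in seen:
            continue                    # remove duplicate tweets
        seen.add(cleaned)
        out.append(remove_stopwords(tokenize(cleaned)))
    return out
```

For example, two identical raw tweets collapse to one token list, and `None` rows are dropped, mirroring the duplicate and NaN removal steps above.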

3. Sentiment labelling results: the sentiment labelling process uses a supervised learning method. Supervised learning is a machine learning approach that uses labelled data, or datasets whose labels are already known to the designer. These pre-labelled data are expected to train ("supervise") algorithms to classify or predict a case accurately. (13) The labelling yielded 3681 tweets with positive sentiment and 918 with negative sentiment, so the total amount of data after eliminating neutral tweets is 4599. Table 8 shows a sample of the results of the sentiment labelling process.
4. Implementation of the Naive Bayes classification model results: after sentiment labelling, the final stage of the data analysis of Twitter user sentiment towards ChatGPT begins. The implementation of the Naive Bayes classification model comprises several sub-stages: data splitting, TF-IDF word weighting, cross-validation optimisation, Naive Bayes model classification, and confusion matrix evaluation.
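The lexicon-based labelling can be sketched as follows. The study uses the Hu and Liu opinion lexicon of roughly 6800 words; the tiny word sets here are illustrative stand-ins, and the function name is not from the study.

```python
# Illustrative stand-ins for the Hu and Liu opinion lexicon (~6800 words).
POSITIVE_WORDS = {"good", "great", "love", "helpful", "accurate", "well"}
NEGATIVE_WORDS = {"bad", "wrong", "incorrect", "confused", "unsure", "hate"}

def label_sentiment(tokens):
    """Label a tokenised tweet by counting lexicon hits.
    Neutral tweets (score 0) are later dropped from the dataset."""
    score = (sum(t in POSITIVE_WORDS for t in tokens)
             - sum(t in NEGATIVE_WORDS for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Applying this to every preprocessed tweet and discarding the neutral ones yields the positive/negative dataset used for classification.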
Data splitting is a method of dividing data into two or more parts that form subsets of the data. Generally, data splitting separates the data into two parts: one part is used to evaluate or test the model and the other is used to train it. (14) The dataset was divided into training and testing subsets with a ratio of 80:20, where 20% of the data is used for testing. The division was done randomly using random_state with a random number seed of 42. From the results of the division, the training data amounted to 3679 tweets and the test data to 920.
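The study's split (80:20, random seed 42, as in scikit-learn's `train_test_split(..., test_size=0.2, random_state=42)`) can be sketched without dependencies; the function name is illustrative.

```python
import random

def split_data(rows, test_ratio=0.2, seed=42):
    """Shuffle reproducibly, then cut into train and test subsets."""
    rng = random.Random(seed)   # fixed seed, analogous to random_state=42
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]
```

Splitting the 4599 labelled tweets this way gives 3679 training and 920 test items, matching the counts reported above.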
In TF-IDF word weighting, a Naïve Bayes classification model using TF-IDF results gives better average accuracy than one without TF-IDF. (15) There are 920 documents in the test dataset, each represented by a vector with 8486 dimensions (columns), corresponding to the unique words in the whole dataset. A sample output of X_test_tfidf such as (0, 8230) 0.17023511025684257 shows that in the first row and 8230th column of the matrix, the TF-IDF weight of the word represented by that column is 0.17023511025684257. Sample data from TF-IDF word weighting can be seen in figure 6 below. Cross-validation optimisation resulted in best parameters {'alpha': 0.1, 'fit_prior': True}, referring to the best parameters selected when tuning the hyperparameters of the Naive Bayes model. Hyperparameter tuning is done to find the optimal parameters that improve model performance; in this case, the tuned parameters are alpha and fit_prior.
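The TF-IDF weighting itself can be sketched without libraries. This uses the smoothed idf formula log((1+N)/(1+df))+1 that scikit-learn's `TfidfVectorizer` applies by default, but omits the l2 normalisation, so it is an illustration of the weighting rather than the study's exact pipeline.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency in this doc
        weights.append({t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t in tf})
    return weights
```

A term that appears in every document gets the minimum idf, while rarer terms are weighted more heavily, which is why distinctive words dominate the feature vectors.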
The classification results of the Naive Bayes model show an accuracy of 80%, which indicates that this model is able to classify sentiment on the dataset well. In more detail, the confusion matrix test results are shown in figure 7 below. The figure shows the model performance evaluation results: in the precision column, the precision for negative sentiment is 63% and for positive sentiment is 81%; precision measures how accurate the model is in identifying the correct sentiment. The recall for negative sentiment is 14% and for positive sentiment is 98%; recall measures how much text with a particular sentiment can be identified by the model. The F1-score for negative sentiment is 23% and for positive sentiment is 89%.
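The reported precision, recall, and F1 values are related by the standard formulas, sketched below; plugging the reported per-class precision and recall into `f1_score` reproduces the reported F1 values (about 0.23 for the negative class and 0.89 for the positive class).

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that are recovered."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)
```

Because F1 is a harmonic mean, the very low negative-class recall (14%) drags the negative F1 down to 23% even though the negative precision is 63%.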
The model shows that the precision and recall values are higher for positive sentiment than for negative sentiment, so the model identifies positive sentiment better. The F1-score for positive sentiment is also higher than for negative sentiment, indicating better performance in classifying positive sentiment, and the low recall for negative sentiment suggests that Twitter users tend to give more positive responses to ChatGPT. In the figure above, the word "chatgpt" is the most mentioned word with a total of 5458 occurrences, followed by "use" with 543, "like" with 493, "write" with 395, "make" with 385, "openai" with 333, "well" with 302, "think" with 284, "time" with 277, and "good" with 258. In the word cloud visualisation, the words often used by Twitter users about ChatGPT can be seen by skipping the word "chatgpt"; the result is shown in figure 9 below. From the visualisation, it appears that Twitter users use the ChatGPT technology from OpenAI as an efficient tool for work needs, because this technology can think for itself and thus saves time.
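The top-10 word counts that feed the bar chart and word cloud can be produced from a simple frequency table; this is a minimal sketch (the function name is illustrative), with `skip` mirroring the exclusion of "chatgpt" from the word cloud.

```python
import itertools
from collections import Counter

def top_words(token_lists, n=10, skip=("chatgpt",)):
    """Count word frequencies across all tweets and return the n most
    common, excluding the words in `skip`."""
    counts = Counter(itertools.chain.from_iterable(token_lists))
    for word in skip:
        counts.pop(word, None)
    return counts.most_common(n)
```

The resulting (word, count) pairs can then be passed to a plotting library for the bar chart, or to a word cloud generator as a frequency mapping.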

Data visualisation
Figure 10 below shows the percentage of positive versus negative sentiment as a pie chart. For future researchers who wish to conduct research related to ChatGPT, it is recommended to include an assessment of the weaknesses of ChatGPT and summarise them in a questionnaire that can be filled out by the public. This would help determine whether people still hold a positive sentiment towards ChatGPT even after its weaknesses are exposed. Future researchers are encouraged to carry out such research.

CONCLUSIONS
The implementation of the Naive Bayes algorithm to classify positive and negative sentiments in tweet data about ChatGPT resulted in an accuracy of 80%, which means the model built represents the classification well. The sentiment analysis and classification in this study show that positive sentiment reached 57.6% and negative sentiment 42.4%. From this percentage, it can be concluded that Twitter users tend to have more positive sentiments towards ChatGPT. In addition, the bar chart of the top 10 most frequent words and the word cloud show that the word "ChatGPT" is used most, followed by words such as "use", "like", "write", "make", "OpenAI", "well", "think", "time", and "good". To demonstrate the usefulness of ChatGPT, the author asked ChatGPT to combine these 10 words into a complete sentence. ChatGPT's answer shows that ChatGPT is an advanced language model created by OpenAI that is used for various tasks such as writing and generating content, making it a good tool with good performance and the ability to think at any time.