Journal of Internet Computing and Services
    ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
    https://jics.or.kr/

Fine-tuning BERT-based NLP Models for Sentiment Analysis of Korean Reviews: Optimizing the sequence length


Sunga Hwang, Seyeon Park, Beakcheol Jang, Journal of Internet Computing and Services, Vol. 25, No. 4, pp. 47-56, Aug. 2024
DOI: 10.7472/jksii.2024.25.4.47
Keywords: BERT, hyperparameter fine-tuning, input sequence length, topic modeling, sentiment analysis, Korean review analysis

Abstract

This paper proposes a method for fine-tuning BERT-based natural language processing models to perform sentiment analysis on Korean review data. By varying the input sequence length during fine-tuning and comparing the resulting performance, we identify the sequence length that performs best. Text review data was collected via web scraping from the clothing shopping platform M. During preprocessing, the positive and negative satisfaction scores were recalibrated to improve the accuracy of the analysis: the GPT-4 API was used to relabel the reviews so that the labels reflect the actual sentiment of the texts, and class imbalance was addressed by adjusting the data to a 6:4 ratio. Reviews on the platform averaged about 12 tokens in length; to find the model best suited to such short texts, five BERT-based pre-trained models were compared in the modeling stage, with input sequence length and memory usage as the main axes of comparison. The experimental results indicate that an input sequence length of 64 generally offers the best balance of performance and memory usage. In particular, the KcELECTRA model performed best at an input sequence length of 64, achieving accuracy above 92% in sentiment analysis of Korean review data. Furthermore, using BERTopic, we provide a Korean review sentiment analysis pipeline that classifies newly incoming reviews by category and extracts a sentiment score for each category with the final model.
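
To make the fine-tuning setup concrete, the following is a minimal Python sketch (not the authors' published code) of fine-tuning a KcELECTRA-style checkpoint for binary sentiment classification with the input sequence length capped at 64 tokens, the setting the abstract reports as optimal. The checkpoint name, learning rate, epoch count, and toy data are illustrative assumptions.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "beomi/KcELECTRA-base"  # assumed checkpoint; any Korean BERT-family model works
MAX_LEN = 64  # the input sequence length the paper found to balance accuracy and memory

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)


class ReviewDataset(Dataset):
    """(review text, 0/1 sentiment label) pairs, tokenized up front."""

    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=MAX_LEN, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


# Toy examples; the paper uses scraped reviews relabeled with the GPT-4 API
# and rebalanced to a 6:4 ratio.
train = ReviewDataset(["배송이 빨라요", "품질이 별로예요"], [1, 0])
loader = DataLoader(train, batch_size=2, shuffle=True)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)  # illustrative hyperparameters

model.train()
for epoch in range(3):
    for batch in loader:
        optim.zero_grad()
        loss = model(**batch).loss  # the model computes cross-entropy when labels are given
        loss.backward()
        optim.step()
```

Re-running this with different MAX_LEN values (e.g., 32, 128, 256) reproduces the kind of length-versus-memory comparison the paper performs.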
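
The categorization step described at the end of the abstract could be sketched with BERTopic's standard API as follows; the embedding backend, checkpoint name, and the way sentiment scores are aggregated per topic are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the final pipeline: cluster new reviews into topics with
# BERTopic, then score each topic by the classifier's mean positive-class
# probability. Names and settings here are illustrative assumptions.
import torch
from bertopic import BERTopic
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "beomi/KcELECTRA-base"  # assumed; in practice, the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
clf = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).eval()


def topic_sentiment(reviews: list[str]) -> dict[int, float]:
    """Assign each review a topic, then return the mean positive probability per topic.

    BERTopic needs a reasonably large corpus (hundreds of documents) for its
    UMAP/HDBSCAN stages to produce stable topics.
    """
    topic_model = BERTopic(language="multilingual")  # multilingual embeddings cover Korean
    topics, _ = topic_model.fit_transform(reviews)

    enc = tokenizer(reviews, truncation=True, padding=True,
                    max_length=64, return_tensors="pt")
    with torch.no_grad():
        pos = clf(**enc).logits.softmax(dim=-1)[:, 1]  # P(positive) per review

    per_topic: dict[int, list[float]] = {}
    for t, p in zip(topics, pos.tolist()):
        per_topic.setdefault(t, []).append(p)
    return {t: sum(ps) / len(ps) for t, ps in per_topic.items()}
```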




Cite this article
[APA Style]
Hwang, S., Park, S., & Jang, B. (2024). Fine-tuning BERT-based NLP Models for Sentiment Analysis of Korean Reviews: Optimizing the sequence length. Journal of Internet Computing and Services, 25(4), 47-56. DOI: 10.7472/jksii.2024.25.4.47.

[IEEE Style]
S. Hwang, S. Park, B. Jang, "Fine-tuning BERT-based NLP Models for Sentiment Analysis of Korean Reviews: Optimizing the sequence length," Journal of Internet Computing and Services, vol. 25, no. 4, pp. 47-56, 2024. DOI: 10.7472/jksii.2024.25.4.47.

[ACM Style]
Sunga Hwang, Seyeon Park, and Beakcheol Jang. 2024. Fine-tuning BERT-based NLP Models for Sentiment Analysis of Korean Reviews: Optimizing the sequence length. Journal of Internet Computing and Services 25, 4 (2024), 47-56. DOI: 10.7472/jksii.2024.25.4.47.