Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
https://jics.or.kr/

Optimizing Language Models through Dataset-Specific Post-Training: A Focus on Financial Sentiment Analysis


Hui Do Jung, Jae Heon Kim, Beakcheol Jang, Journal of Internet Computing and Services, Vol. 25, No. 1, pp. 57-67, Feb. 2024
DOI: 10.7472/jksii.2024.25.1.57
Keywords: BERT, FinBERT, Financial Sentiment Analysis, Post-training, Pre-training Dataset

Abstract

This research investigates training methods that enable large language models to accurately identify sentiment and to understand expressions describing increases and decreases in the financial domain. The main goal is to identify datasets suitable for post-training, so that models can effectively interpret expressions of financial rises and falls. To this end, we selected sentences from the Wall Street Journal containing relevant financial terms, as well as sentences generated by GPT-3.5-turbo-1106, for post-training. We assessed the impact of these datasets on language model performance using Financial PhraseBank, a benchmark dataset for financial sentiment analysis. Our findings show that post-trained FinBERT, a finance-specialized model, outperformed the similarly post-trained BERT, a general-domain model. Moreover, post-training on actual financial news proved more effective than post-training on generated sentences, although in scenarios requiring greater generalization, models trained on generated sentences performed better. This suggests that aligning the model's domain with the target task domain, and choosing an appropriate post-training dataset, are crucial for improving a language model's comprehension and sentiment prediction accuracy. These results offer a methodology for optimizing language model performance in financial sentiment analysis tasks and suggest directions for future research on more nuanced language understanding and sentiment analysis in finance. The findings provide valuable insights not only for the financial sector but also for language model training across other domains.
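For readers who want a concrete picture of the post-training step described in the abstract, the sketch below shows domain-adaptive masked-language-model (MLM) training with the Hugging Face transformers and datasets libraries. This is a minimal illustration, not the paper's exact configuration: the checkpoint names, corpus file (one financial sentence per line, e.g., WSJ excerpts or GPT-generated sentences), and hyperparameters are all assumptions.

```python
# Minimal sketch of domain post-training via masked-language modeling (MLM).
# Assumes Hugging Face transformers/datasets; all names and settings below
# are illustrative, not the paper's reported setup.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# General-domain BERT; a FinBERT checkpoint (e.g., "yiyanghkust/finbert-pretrain",
# an assumed Hub name) could be swapped in for the finance-specialized model.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical corpus file: one financial sentence per line.
corpus = load_dataset("text", data_files={"train": "financial_sentences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-post-trained",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

After this step, the post-trained encoder would be fine-tuned and evaluated on a sentiment benchmark such as Financial PhraseBank to measure the effect of the chosen post-training corpus.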



Cite this article
[APA Style]
Jung, H. D., Kim, J. H., & Jang, B. (2024). Optimizing Language Models through Dataset-Specific Post-Training: A Focus on Financial Sentiment Analysis. Journal of Internet Computing and Services, 25(1), 57-67. DOI: 10.7472/jksii.2024.25.1.57.

[IEEE Style]
H. D. Jung, J. H. Kim, B. Jang, "Optimizing Language Models through Dataset-Specific Post-Training: A Focus on Financial Sentiment Analysis," Journal of Internet Computing and Services, vol. 25, no. 1, pp. 57-67, 2024. DOI: 10.7472/jksii.2024.25.1.57.

[ACM Style]
Hui Do Jung, Jae Heon Kim, and Beakcheol Jang. 2024. Optimizing Language Models through Dataset-Specific Post-Training: A Focus on Financial Sentiment Analysis. Journal of Internet Computing and Services, 25, 1, (2024), 57-67. DOI: 10.7472/jksii.2024.25.1.57.