Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
https://jics.or.kr/

Multi-Variate Tabular Data Processing and Visualization Scheme for Machine Learning based Analysis: A Case Study using Titanic Dataset


Juhyoung Sung, Kiwon Kwon, Kyoungwon Park, Byoungchul Song, Journal of Internet Computing and Services, Vol. 25, No. 4, pp. 121-130, Aug. 2024
DOI: 10.7472/jksii.2024.25.4.121
Keywords: data processing, data visualization, Kaggle, machine learning, statistical analysis, tabular data, Titanic dataset

Abstract

As information and communication technology (ICT) advances exponentially, the types and amount of available data also increase. Although data analysis, including statistics, is essential for utilizing this large amount of data, conventional approaches are inevitably limited in processing varied and complex data. Meanwhile, improvements in computational performance and growing demand for autonomous systems have led to many attempts to apply machine learning (ML) to problems in various fields. In particular, processing the data for model input and designing the model to solve the objective function are critical to model performance. Many studies have presented data processing methods suited to the type and properties of the data, and ML performance varies greatly depending on the method. Nevertheless, deciding which data processing method to use for data analysis is difficult because the types and characteristics of data have become more diverse. Specifically, multi-variate data processing is essential for solving non-linear problems with ML. In this paper, we present a multi-variate tabular data processing scheme for ML-aided data analysis using the Titanic dataset from Kaggle, which includes various kinds of data. We present methods such as input variable filtering based on statistical analysis and normalization according to the data properties. In addition, we analyze the data structure using visualization. Lastly, we design an ML model and train it by applying the proposed multi-variate data processing. We then analyze the passenger survival prediction performance of the trained model. We expect that the proposed multi-variate data processing and visualization can be extended to various environments for ML-based analysis.
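
The abstract does not specify which statistical test or normalization the authors use, so the following is only an illustrative sketch of the general idea of filtering input variables and normalizing by data property. It assumes Pearson correlation with the survival label as the filtering statistic, min-max scaling for continuous columns, and one-hot encoding for categorical columns. The toy rows, the column names (`fare`, `noise`, `sex`), the helper names, and the 0.3 threshold are all hypothetical and are not taken from the paper or from the Kaggle data.

```python
from statistics import mean, pstdev

# Hypothetical toy rows shaped like Titanic records (not real dataset values).
ROWS = [
    {"fare": 7.0,  "noise": 1.0, "sex": "male",   "survived": 0},
    {"fare": 8.0,  "noise": 2.0, "sex": "male",   "survived": 0},
    {"fare": 10.0, "noise": 3.0, "sex": "female", "survived": 0},
    {"fare": 50.0, "noise": 1.0, "sex": "female", "survived": 1},
    {"fare": 70.0, "noise": 2.0, "sex": "female", "survived": 1},
    {"fare": 80.0, "noise": 3.0, "sex": "male",   "survived": 1},
]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def select_numeric(rows, candidates, target, threshold):
    """Keep numeric columns whose |correlation| with the target meets the threshold."""
    ys = [r[target] for r in rows]
    return [c for c in candidates
            if abs(pearson([r[c] for r in rows], ys)) >= threshold]


def min_max(xs):
    """Scale a continuous column to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    cats = sorted(set(values))
    return cats, [[1.0 if v == c else 0.0 for v in values] for c in cats]


def build_features(rows, numeric, categorical, target, threshold=0.3):
    """Filter numeric inputs, then normalize each kept column by its data type."""
    kept = select_numeric(rows, numeric, target, threshold)
    names = list(kept)
    columns = [min_max([r[c] for r in rows]) for c in kept]
    for c in categorical:
        cats, indicator_columns = one_hot([r[c] for r in rows])
        names += [f"{c}={cat}" for cat in cats]
        columns += indicator_columns
    # Transpose column lists into one feature vector per passenger.
    matrix = [list(vec) for vec in zip(*columns)]
    return names, matrix


NAMES, MATRIX = build_features(ROWS, ["fare", "noise"], ["sex"], "survived")
print(NAMES)
for vec in MATRIX:
    print(vec)
```

In this toy setup the `noise` column is uncorrelated with the label and is dropped, while `fare` is kept and rescaled. A real pipeline would more likely use pandas or scikit-learn and a statistic chosen to match each variable type.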


Cite this article
[APA Style]
Sung, J., Kwon, K., Park, K., & Song, B. (2024). Multi-Variate Tabular Data Processing and Visualization Scheme for Machine Learning based Analysis: A Case Study using Titanic Dataset. Journal of Internet Computing and Services, 25(4), 121-130. DOI: 10.7472/jksii.2024.25.4.121.

[IEEE Style]
J. Sung, K. Kwon, K. Park, B. Song, "Multi-Variate Tabular Data Processing and Visualization Scheme for Machine Learning based Analysis: A Case Study using Titanic Dataset," Journal of Internet Computing and Services, vol. 25, no. 4, pp. 121-130, 2024. DOI: 10.7472/jksii.2024.25.4.121.

[ACM Style]
Juhyoung Sung, Kiwon Kwon, Kyoungwon Park, and Byoungchul Song. 2024. Multi-Variate Tabular Data Processing and Visualization Scheme for Machine Learning based Analysis: A Case Study using Titanic Dataset. Journal of Internet Computing and Services, 25, 4, (2024), 121-130. DOI: 10.7472/jksii.2024.25.4.121.