International Journal of Data Science https://ijods.org/index.php/ds <p><img src="/public/site/images/ijodsadmin/WebsiteHeader-Rev.jpg" width="100%"></p> <div class="well" style="text-align: justify;"> <p style="text-align: center;"><img style="padding: 10px 15px; float: left;" src="/public/site/images/ijodsadmin/CoverWebsite-Rev.jpg" height="250"></p> <p>Data science combines data inferences, algorithm developments, and technology to solve analytically complex problems. Data is the core of discussions. Advanced capabilities can be built with it.</p> <p>The International Journal of Data Science (IJoDS) is an open-access periodical that <a href="https://ijods.org/index.php/ds/fs" target="_blank" rel="noopener">focuses</a> its discussions on the aspects of data capture, data maintenance, data processing, and how to communicate and analyze the data. The journal is an open-access, <a href="https://ijods.org/index.php/ds/prp">peer-reviewed</a> periodical published biannually. Authors should read <a href="https://ijods.org/index.php/ds/ag">the author's guidelines</a> and agree to the <a href="https://ijods.org/index.php/ds/copyright" target="_blank" rel="noopener">copyright and licensing</a> terms prior to <a href="https://ijods.org/index.php/ds/about/submissions">submitting the articles</a>.</p> <p>The Indonesian Society for Knowledge and Human Development (INSIGHT) is a community of scientists. The community office is at the <a href="https://www.pnp.ac.id/" target="_blank" rel="noopener">Padang State Polytechnic</a>, West Sumatra, Indonesia. Its members are professionals and researchers in science, engineering, and technology. The society agreed with EBSCO Information Services to maintain our publication dissemination and license. 
Click on the EBSCO logo in the right menu or <a href="https://www.ijods.org/publicdoc/INSIGHT-EBSCO.pdf" target="_blank" rel="noopener">this link to read the agreement</a>.</p> </div> INSIGHT - Indonesian Society for Knowledge and Human Development en-US International Journal of Data Science 2722-2039 <p><a href="https://ijods.org/index.php/ds/copyright" rel="noopener"><button class="btn btn-primary btn-md btn-block" type="button">Click for the Copyright and License Terms</button></a></p> Application of Different Python Libraries for Visualisation of Female Genital Mutilation https://ijods.org/index.php/ds/article/view/71 <p>Data visualization facilitates the analysis and comprehension of common data provided by the media, individuals, governments, and other sectors. Python is a well-known programming language that excels at scientific data visualization. This thesis utilizes a variety of Python modules, including Pandas, NumPy, Matplotlib, Seaborn, Plotly, and Bokeh, to illustrate female genital mutilation. The purpose of this thesis is to illustrate female genital mutilation and explain its performance pattern using a complex, interactive diagram that integrates multiple types of Python libraries. In comparison to other libraries, Plotly is the simplest, yet it performs at the highest level. NumPy and Matplotlib are combined to produce hexbin charts. NumPy provides the N-dimensional array data for the plot, and Matplotlib allows the plot's colours to be customized. Despite its limited customization options, the Seaborn library is suitable for both data visualization and statistical modelling. Because of this limitation, the Seaborn library is frequently combined with Matplotlib to generate superior visualizations. This thesis is therefore recommended to both specialists and novices as worthwhile reading. In addition, it will assist the government in drafting legislation to end female genital mutilation. 
Readers will comprehend the significance of combining multiple Python modules to generate intricate interactive diagrams for data visualization in the field of data science. This information will be posted online to contribute to the corpus of knowledge.</p> Seun Adebanjo Emmanuel Banchani Copyright (c) 2023 International Journal of Data Science https://creativecommons.org/licenses/by-sa/4.0 2023-12-19 2023-12-19 4 2 67 83 10.18517/ijods.4.2.67-83.2023 Long-term Hydrometeorological Time-series Analysis over the Central Highland of West Papua https://ijods.org/index.php/ds/article/view/74 <p>This article presents an innovative data-driven approach for examining long-term temporal rainfall patterns in the central highlands of West Papua, Indonesia. We utilized wavelet transforms to identify signs of a negative temporal correlation between the El Niño-Southern Oscillation (ENSO) and the 12-month Standardized Precipitation Index (SPI-12). Based on this cause-and-effect relationship, we employed dynamic causality modeling using the Nonlinear Autoregressive with Exogenous input (NARX) model to predict SPI-12. The Multivariate ENSO Index (MEI) was used as an attribute variable in this predictive framework. Consequently, this dynamic neural network model effectively captured common patterns within the SPI-12 time series. The implications of this study are significant for advancing data-driven precipitation models in regions characterized by intricate topography within the Indonesian Maritime Continent (IMC).</p> Sandy H. S Herho Dasapta E. Irawan Rubiyanto Kapid Siti N. 
Kaban Copyright (c) 2023 International Journal of Data Science https://creativecommons.org/licenses/by-sa/4.0 2023-12-19 2023-12-19 4 2 84 96 10.18517/ijods.4.2.84-96.2023 Utilizing Model Residuals to Identify Rental Properties of Interest: The Price Anomaly Score (PAS) and Its Application to Real-time Data in Manhattan https://ijods.org/index.php/ds/article/view/77 <p>Determining whether a property is priced fairly is difficult for buyers and sellers, since they usually do not have an objective view of the price distribution across the overall market of interest. Drawing on data collected for all available rental properties in Manhattan as of September 2023, this paper aims to strengthen our understanding of model residuals, specifically for machine learning models that generalize to the majority of a well-proportioned dataset. Deviations from predicted values are generally treated as mere inaccuracies; this paper proposes a different vantage point: when a model generalizes to at least 75% of the dataset, the remaining deviations reveal significant insights. To harness these insights, we introduce the Price Anomaly Score (PAS), a metric capable of capturing boundaries between irregularly predicted prices. By combining relative pricing discrepancies with statistical significance, the PAS offers a multifaceted view of rental valuations. 
This metric allows experts to identify overpriced or underpriced properties within a dataset by aggregating PAS values, then fine-tuning upper and lower boundaries to any threshold to set indicators of choice.</p> Youssef Sultan Jackson Rafter Huyen Nguyen Copyright (c) 2023 International Journal of Data Science https://creativecommons.org/licenses/by-sa/4.0 2023-12-19 2023-12-19 4 2 97 106 10.18517/ijods.4.2.97-106.2023 Modelling Infant Mortality Rate using Time Series Models https://ijods.org/index.php/ds/article/view/76 <p>The world’s main indicator of children’s health and general development is the infant mortality rate for infants under the age of five. Infant mortality is the term used to describe the death of a child before their first birthday. The infant mortality rate (IMR), which is the number of deaths of infants under one year of age per 1,000 live births, can be used to describe the prevalence of infant mortality in a population. The comparable death rate for children under the age of five is the child mortality rate, commonly referred to as the under-five mortality rate. Nigeria had a high under-five mortality rate of 117 per 1,000 live births in 2019, placing it among the five nations with the highest mortality rate for children under five that year. This study aims to model the infant mortality rate (live birth and stillbirth) using time series models and to predict the mortality rate using these models. Adeoyo Maternity Hospital Yemetu in Ibadan provided the data for this study. The data set consists of monthly secondary data spanning a period of 12 years (2009 to 2020). Visual inspection of the time plot indicated non-stationarity. Differencing was then applied, and unit root tests were performed for comparison. 
The Augmented Dickey-Fuller and Phillips-Perron unit root tests were then used to establish stationarity in order to address the main objectives. Three time series methods, the Autoregressive Integrated Moving Average (ARIMA) model, exponential smoothing, and the Holt-Winters method, were used to model and predict the infant mortality rate data. The results show that ARIMA order=c(0,0,1) with zero mean for stillbirth and ARIMA order=c(1,0,2) for live birth had the smallest Akaike Information Criterion (AIC) values (9.102 and 13.991). AIC values of (9.289, 14.139) and (9.102, 13.991) for live birth and stillbirth, respectively, were derived by exponential smoothing and the Holt-Winters technique. This means that the Holt-Winters technique, which yielded the lowest AIC when compared to ARIMA and exponential smoothing, is the most accurate predictor of both stillbirth and live birth data. Given the high mortality rate for children under the age of five, it is crucial for the government to place more of an emphasis on health issues and to solve the problems plaguing Nigeria's child health care system.</p> Tayo P. Ogundunmade Akintola O. Daniel Abdulazeez M. Awwal Copyright (c) 2023 International Journal of Data Science https://creativecommons.org/licenses/by-sa/4.0 2023-12-19 2023-12-19 4 2 107 115 10.18517/ijods.4.2.107-115.2023 Cluster Analysis of Personality Types Using Respondents’ Big Five Personality Traits https://ijods.org/index.php/ds/article/view/67 <p>This study utilized a mixed model approach, incorporating <em>k</em>-means clustering analysis for data examination, discriminant analysis for classification, and multilayer perceptron neural network analysis for prediction. After removing inadequate samples and outliers, 19,692 observations remained for this study; they were collected through an interactive online personality test (i.e., Big Five Personality Traits) in 2012. 
The empirical results from the <em>k</em>-means clustering analysis identified four distinct personality clusters using the total scores of the Big Five Personality Traits (Extraversion, Neuroticism, Agreeableness, Conscientiousness, and Openness to Experience). The accuracy of the clustering analysis was further tested using discriminant analysis, which indicated a significant difference among the cluster means and correctly classified 95.5% of the original grouped cases. For predictive modeling, a multilayer perceptron neural network framework was used. The network had a 5-6-4 structure and was employed to determine the personality classification of participants. Notably, the model achieved 99.4% accuracy in correctly classifying the training grouped cases and 99.2% accuracy for the testing grouped cases. The results of this study offer valuable insights into understanding the personalities of participants, with implications for various domains such as psychology, social sciences, cultural studies, and economics.</p> Jennifer Chi Yeong Nain Chi Copyright (c) 2023 International Journal of Data Science https://creativecommons.org/licenses/by-sa/4.0 2023-12-19 2023-12-19 4 2 116 135 10.18517/ijods.4.2.116-135.2023
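<p>The clustering step described in the abstract above can be illustrated with a minimal sketch of <em>k</em>-means (Lloyd's algorithm) on five-value score vectors with four clusters. This is not the authors' code or data: the scores are synthetic, and the names <code>make_synthetic_scores</code>, <code>kmeans</code>, the group centres, and the seeds are all hypothetical choices made for illustration only.</p>

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_synthetic_scores(per_group=50, seed=1):
    """Hypothetical stand-in for trait scores: four groups of five-value rows."""
    rng = random.Random(seed)
    # Assumed group centres (one value per trait); not taken from the study.
    centres = [
        [1.5, 1.5, 1.5, 1.5, 1.5],
        [4.5, 4.5, 1.5, 1.5, 3.0],
        [1.5, 4.5, 4.5, 3.0, 1.5],
        [4.5, 1.5, 3.0, 4.5, 4.5],
    ]
    return [
        [rng.gauss(mean, 0.3) for mean in centre]
        for centre in centres
        for _ in range(per_group)
    ]


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    # Initialise centroids as copies of k randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = make_synthetic_scores()
labels, centroids = kmeans(points, k=4)
sizes = [labels.count(c) for c in range(4)]
print(sizes)  # number of synthetic respondents assigned to each cluster
```

<p>In practice a library implementation with multiple restarts would usually be preferred, since a single random initialisation of <em>k</em>-means can settle on a poor local optimum.</p>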