PY-METEO-NUM: Dockerized Python Notebook Environment For Portable Data Analysis Workflows In Indonesian Atmospheric Science Communities

Reproducibility and replicability in data analysis are among the main requirements for the advancement of scientific fields that rely heavily on computational data analysis, such as atmospheric science. However, very few research activities in this field in Indonesia emphasize the principle of transparency of code and data in the dissemination of results. This is a major challenge for the Indonesian scientific community in verifying the output of research activities from their peers. One common obstacle to the reproducibility of data-driven research is the portability of the computing environment used to reproduce the results. Therefore, in this article, we offer a solution in the form of a Debian-based dockerized Jupyter Notebook preinstalled with several Python libraries that are often used in atmospheric science research. Through this containerized computing environment, we expect to overcome the portability and dependency constraints that atmospheric scientists often face and to encourage the growth of the research ecosystem in Indonesia through an open and replicable environment.


Introduction
Computation and numerical modelling have been an integral part of atmospheric science since its beginnings in the 1950s [1]. This tendency is further strengthened by the general scientific trend towards an era that uses big data as a playground for implementing statistical learning concepts. Indonesian atmospheric science communities, as part of the worldwide scientific community, have a computing history far longer than the current big-data trend, and they would not want to be left behind in applying the statistical learning algorithms that are currently popular for analyzing weather and climate data. Several case studies applying deep learning algorithms to weather and climate data in Indonesia have already been conducted [2,3,4,5]. This growing culture of quantitative research should be appreciated, given that statistical learning methods could sharpen the analysis of areas prone to meteorological disasters, such as Indonesia.
To support the need for statistical learning research in atmospheric science, openness of data and source code that can be reproduced in subsequent research is needed to create a sustainable science system. Reproducibility here means openness in the process and making all components (datasets, code, analysis) publicly available, which is one of the four principles listed on The Open Science Project website [6]. Unfortunately, none of the atmospheric science journals in Indonesia applies these four principles as a whole. We believe this situation is due to the diverse educational backgrounds, skills, and infrastructure in the atmospheric science ecosystem. Given this situation, reproducibility and portability of computational processing would benefit the ecosystem.
In this article, we offer a solution to this problem based on Python, currently the most popular scripting language for data processing in the atmospheric science communities [7]. Python is run through the Jupyter Notebook environment, which in turn runs on Linux Container (LXC) virtualization using Docker. Docker is cross-platform (operating system-agnostic) and has been widely applied as a computational container in various scientific domains as a tool for reproducible research [8,9,10,11]. We have documented the results of this initial trial in a free and open-source docker image that we call py-meteo-num.

Scripting skill
As we mainly use a command-line scripting language to develop the platform, basic to intermediate Python scripting skill is needed. We understand that this involves a steep learning curve. Over time, the effort pays off for the user: the resulting work will be more sustainable and more easily reused by interested parties as the Python community in the atmospheric sciences grows. Many free online tutorials offer reusable code for weather and climate data processing using Python [12,13]. Potential users of this docker platform should spend a short time on self-driven Python training to become familiar with the scripting environment and workflow.

Jupyter Notebook
Jupyter Notebook (Figure 1) is a web-based interactive computing environment that allows us to create, execute, and disseminate code, graphics, and human-readable text (in Markdown and LaTeX formats). In py-meteo-num, only the Python 3 kernel is installed, using the Anaconda distribution. In addition to the default Python libraries from Anaconda, py-meteo-num also provides several additional libraries for atmospheric science data processing, as suggested by the Python for Atmospheric and Ocean Science (PyAOS) community [14].
Figure 2 shows the components of Docker. The main component is the docker host (docker engine), which manages the other components of the system. Users can pull pre-built docker images (in our case, Debian Buster) from public repositories (e.g., DockerHub and GitHub) via the docker engine. An image is a series of Linux commands together with the required binary and data files. The docker engine caches downloaded images in its local repository. To run an image, the docker engine allocates an isolated container on the Linux kernel to the image. An instance of a running image is called a docker container (or simply a container).
Figure 3 shows the lifecycle of a container. It starts when a container is created from an image and ends when the container is killed. A container can also be paused/unpaused or stopped/restarted. We can manage the container lifecycle via the Docker command line interface (CLI). Although a docker container is launched from an image, images and containers are different entities inside Docker. An image is an artifact of the development phase, whereas a docker container is a run-time object. Commands such as pull, push, and commit are image-specific, while exec, run, and pause are container-specific.
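The lifecycle described above can be pictured as a small state machine. The sketch below is only an illustrative model of the transitions named in the text (create, pause/unpause, stop/restart, kill). The state names, the `TRANSITIONS` table, and the helper functions are hypothetical and are not part of Docker or its API; the real Docker engine has more states and allows more transitions.

```python
from enum import Enum


class State(Enum):
    """Simplified container states, following the description in the text."""
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"
    KILLED = "killed"


# Allowed transitions, keyed by (current state, action).
# This table is an assumption made for illustration only.
TRANSITIONS = {
    (State.CREATED, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "unpause"): State.RUNNING,
    (State.RUNNING, "stop"): State.STOPPED,
    (State.STOPPED, "restart"): State.RUNNING,
    (State.RUNNING, "kill"): State.KILLED,
}


def apply(state, action):
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action} a container that is {state.value}"
        ) from None


def run_sequence(actions, state=State.CREATED):
    """Apply a list of actions in order, starting from a freshly created container."""
    for action in actions:
        state = apply(state, action)
    return state


final = run_sequence(["start", "pause", "unpause", "stop", "restart", "kill"])
print(final.value)
```

In this simplified model, a paused container must be unpaused before it can be stopped, which is one way to read the paired operations (paused/unpaused, stopped/restarted) in the text.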

Method
The py-meteo-num image was built on top of a Debian-based docker image. We then added several layers of basic applications that users need, such as wget and bzip (both needed to install the Anaconda distribution), sudo (used to access root), and the Anaconda distribution itself. In addition to Anaconda's built-in libraries, we also installed other Python libraries that are most likely to be needed for weather and climate data processing. These libraries are shown in Table 1.
Table 1. Extended Python libraries in the container.

Library     Description
basemap     Plotting 2D data on maps in Python [15].
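Once the container is running, it can be useful to confirm that the extended libraries are actually importable in the notebook kernel. The following is a minimal sketch using only the Python standard library. The `is_available` helper and the list of module names are assumptions for illustration: `mpl_toolkits.basemap` is the import name of basemap and `xarray` is the library used in the case study, and the list should be adapted to the full contents of Table 1.

```python
import importlib.util


def is_available(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the module has no spec.
        return False


# Hypothetical checklist; extend it with the other libraries in Table 1.
modules_to_check = ["mpl_toolkits.basemap", "xarray"]

for name in modules_to_check:
    status = "found" if is_available(name) else "MISSING"
    print(f"{name}: {status}")
```

Running this in a notebook cell gives a quick report without importing the heavier libraries in full.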

Fig. 4 The setup of py-meteo-num Dockerfile
We built this docker image using the following command, where the trailing dot is the build context (the directory containing the Dockerfile):

    docker build -t py-meteo-num .

Results and Discussions
In this section, we present one case study that illustrates the use of Python for visualizing the precipitation anomaly over the Maritime Continent from the CMIP5 CSIRO ACCESS 1.3 historical simulation output [39], within a Jupyter notebook that uses py-meteo-num as a docker container. The corresponding notebook is available from our GitHub repository.
The first step is to pull the image from the DockerHub repository via the CLI with the following command:

    docker pull herholabs/py-meteo-num

To start a Jupyter Notebook session, run the container, then open your browser at port 8888: http://localhost:8888/. You will be asked to enter a password. For the password requested, enter: root.
We use Jupyter Notebook on the docker container to display the average monthly rainfall anomalies from the historical ACCESS 1.3 climate model over the last 16 years of the data period (January 1990 to December 2005) over the Maritime Continent, using the xarray library [37].
We begin by importing several libraries from the Python scientific computing environment. Because the precipitation DataArray is still in units of kg m^-2 s^-1 (equivalent to mm/s), we convert it to mm/month. Next, we extract the DataArray at the Maritime Continent coordinates [40]. We then use the precipitation data from January 1961 to December 1990 as a baseline for measuring the modern rainfall anomalies over the Maritime Continent.
Based on this case study, we can confirm that the Jupyter Notebook running in this docker container can be used as a computing resource for analyzing weather and climate data in the form of multidimensional arrays.
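The steps of the case study can be sketched with xarray as follows. This is not the authors' notebook: it is a minimal illustration under stated assumptions. The data are synthetic random values standing in for the ACCESS 1.3 output, the variable name `pr`, the coordinate names `lat` and `lon`, and the bounding box used for the Maritime Continent are all hypothetical, and the anomaly is computed per calendar month, which is one common convention.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the model output (an assumption): monthly
# precipitation flux in kg m-2 s-1 on a coarse global grid,
# January 1961 to December 2005.
time = pd.date_range("1961-01-01", "2005-12-01", freq="MS")
lat = np.arange(-90.0, 91.0, 10.0)
lon = np.arange(0.0, 360.0, 10.0)
rng = np.random.default_rng(0)
flux = rng.uniform(0.0, 1e-4, size=(time.size, lat.size, lon.size))
pr = xr.DataArray(
    flux,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="pr",
    attrs={"units": "kg m-2 s-1"},
)

# 1 kg m-2 of water is a 1 mm layer, so kg m-2 s-1 equals mm/s.
# Multiply by the seconds per day and by the days in each month.
seconds_per_day = 86400
pr_mm = pr * seconds_per_day * pr["time"].dt.days_in_month
pr_mm.attrs["units"] = "mm/month"

# Hypothetical bounding box for the Maritime Continent (an assumption).
region = pr_mm.sel(lat=slice(-20, 20), lon=slice(90, 160))

# Baseline: mean of each calendar month over January 1961 to December 1990.
baseline = region.sel(time=slice("1961-01", "1990-12"))
climatology = baseline.groupby("time.month").mean("time")

# Anomaly for January 1990 to December 2005 relative to the baseline,
# then averaged over time to give one map.
recent = region.sel(time=slice("1990-01", "2005-12"))
anomaly = recent.groupby("time.month") - climatology
mean_anomaly = anomaly.mean("time")

print(mean_anomaly.dims, mean_anomaly.shape)
```

With real model output, the synthetic `pr` array would be replaced by a variable read from a NetCDF file, for example via `xr.open_dataset`, and `mean_anomaly` could then be plotted on a map.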

Conclusion
PY-METEO-NUM is a prototype of a containerized computing environment for analyzing weather and climate data, and it still needs further development. Here we have aimed to demonstrate the importance of a portable computing environment in the atmospheric sciences. To support the development of open and reproducible atmospheric science research in Indonesia, we encourage users to submit pull requests on GitHub with changes and improvements to our Dockerfile under the terms of the GNU GPLv3 License [41].