This guide will walk you through the process of setting up a Python or R project with Docker. This is particularly useful for data science projects where you need to ensure that your code runs in a consistent environment.
Step 1: Organize Your Python Scripts
Organize your Python scripts so that each script is responsible for a specific part of your project. For example:
- import_and_clean.py: Responsible for importing and cleaning your data.
- explore.py: Responsible for exploring your data (e.g., generating descriptive statistics, creating visualizations).
- modeling.py: Responsible for building and evaluating your models.
- create_api.py: Responsible for creating an API for your model (if applicable).
Step 2: Create a Main Script
Create a main script that imports and runs the functions from your other scripts in the necessary order. For example:
# main.py
# Import the functions from your other scripts
from import_and_clean import import_and_clean
from explore import explore
from modeling import modeling
from create_api import create_api


def main():
    # Call the functions in the necessary order
    import_and_clean()
    explore()
    modeling()
    create_api()


if __name__ == "__main__":
    main()
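main.py can only import import_and_clean from import_and_clean.py if that file defines a function with exactly that name. As a minimal sketch of what such a module could contain: the CSV text, the column names, and the cleaning rules here are all made up for illustration, and a real project would read from its own data source.

```python
# import_and_clean.py (hypothetical contents, for illustration only)
import csv
import io

# Stand-in data so the sketch runs on its own; a real project would read a file.
RAW_CSV = """name,score
 Ada ,91
Grace,
Linus,78
"""


def import_and_clean(source=RAW_CSV):
    """Parse CSV text, strip stray whitespace, and drop rows with no score."""
    rows = []
    for row in csv.DictReader(io.StringIO(source)):
        name = row["name"].strip()
        score = row["score"].strip()
        if not score:
            # Skip incomplete records instead of guessing a value
            continue
        rows.append({"name": name, "score": int(score)})
    return rows


if __name__ == "__main__":
    print(import_and_clean())
```

Because the argument has a default, the bare import_and_clean() call in main.py still works. The if __name__ == "__main__" guard lets you run the module on its own while developing it without affecting the import.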
Step 3: Create a requirements.txt File
Create a requirements.txt file that lists the Python packages your project depends on. You can generate it by running pip freeze > requirements.txt in your virtual environment.
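To get a feel for what ends up in that file, the sketch below prints name==version lines for the packages installed in the current environment using the standard library. It only approximates pip freeze (it ignores editable installs and other special cases), and the function name pinned_requirements is made up for this example.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Pinning exact versions like this is what makes the Docker image reproducible: pip installs the same versions inside the container that you had in your virtual environment.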
Step 4: Create a Dockerfile
Create a Dockerfile that sets up the environment for your project. Here’s an example:
# Use an official Python runtime as a parent image
FROM python:3.12-slim
# Set the working directory in the container to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Run main.py when the container launches
CMD ["python", "main.py"]
Step 5: Build and Run Your Docker Container
To build the Docker image, run the following command in your project directory (the same directory where the Dockerfile is located):
docker build -t your-image-name .
To run the Docker container, run the following command:
docker run your-image-name
This will run your Python script in a Docker container with an environment that matches the one specified in your Dockerfile.
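If you rebuild often, you may prefer to drive both commands from a small Python helper. The sketch below is one possible way to do that, assuming Docker is installed and on your PATH. The function names are made up for this example, and your-image-name is the same placeholder used in the commands above.

```python
import shlex
import subprocess

IMAGE_NAME = "your-image-name"  # placeholder tag; replace with your own


def build_command(image=IMAGE_NAME, context="."):
    """Return the docker build command as an argument list."""
    return ["docker", "build", "-t", image, context]


def run_command(image=IMAGE_NAME):
    """Return the docker run command as an argument list."""
    return ["docker", "run", image]


def build_and_run(image=IMAGE_NAME):
    """Build the image, then run it; stop at the first failing step."""
    subprocess.run(build_command(image), check=True)
    subprocess.run(run_command(image), check=True)


# Show the commands that build_and_run() would execute
print(shlex.join(build_command()))
print(shlex.join(run_command()))
```

Calling build_and_run() from the project directory performs both steps. With check=True, a failed build raises an exception, so the run step is never attempted against a stale image.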
Step 6: If Building a Project in R
Create a Dockerfile
- Docker Base Image: Instead of using a Python base image, you’d use an R base image. For example, you might use FROM r-base:4.4.0 to use R version 4.4.0.
- Installing Packages: Instead of using a requirements.txt file and pip install, you’d install R packages using the install.packages() function in R. You can do this directly in your Dockerfile. For example:
RUN R -e "install.packages(c('dplyr', 'ggplot2'), repos='http://cran.rstudio.com/')"
- Running Your Script: Instead of running a Python script, you’d run an R script. For example:
CMD ["Rscript", "your_script.R"]
Here’s what a full Dockerfile might look like for an R project:
# Use an official R runtime as a parent image
FROM r-base:4.4.0
# Set the working directory in the container to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages
RUN R -e "install.packages(c('dplyr', 'ggplot2'), repos='http://cran.rstudio.com/')"
# Run your_script.R when the container launches
CMD ["Rscript", "your_script.R"]
As with the Python example, if you have multiple R scripts that need to be run in a specific order, you can create a main R script that sources and runs your other scripts in the necessary order, and call that script in the CMD line.
Happy coding!
~Kyle