Setting Up Your Python Environment for ML

What You Need Before You Start

Machine learning in Python requires a working environment with the right tools. This tutorial walks through setting up a clean, reproducible ML environment from scratch.

Python Version

Use Python 3.10 or 3.11. Avoid the very latest release; major ML libraries (PyTorch, TensorFlow, scikit-learn) often take a while to support a new Python version. Check each library's installation docs or release notes for its supported Python versions before upgrading.
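As a quick sanity check, the interpreter can report whether it falls in the recommended range. This is a minimal sketch; the `SUPPORTED_MINORS` set and the `is_recommended_version` helper are illustrative names that simply encode the 3.10/3.11 recommendation above, not anything the libraries themselves define.

```python
import sys

# Assumed range, copied from the recommendation above (3.10 or 3.11).
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def is_recommended_version(version_info=sys.version_info):
    """Return True if the interpreter's major.minor is in the recommended set."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


current = f"{sys.version_info.major}.{sys.version_info.minor}"
if is_recommended_version():
    print(f"Python {current}: within the recommended range")
else:
    print(f"Python {current}: outside the recommended range; check library compatibility")
```

Adjust the set if the libraries you rely on document support for other versions.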

Virtual Environments

Always work in a virtual environment. This isolates your project's dependencies from other projects and from your system Python.

python -m venv ml-env             # on some systems the command is python3
source ml-env/bin/activate        # Mac/Linux
ml-env\Scripts\activate           # Windows (run this instead of the line above)

Alternatively, use conda:

conda create -n ml-env python=3.11
conda activate ml-env
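To confirm that activation worked, Python can inspect its own prefix. The sketch below assumes a standard venv or conda setup; `active_environment` is a hypothetical helper name, and the conda check relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation.

```python
import os
import sys


def active_environment():
    """Describe the isolated environment the interpreter runs in, or return None."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    if sys.prefix != sys.base_prefix:
        return f"venv at {sys.prefix}"
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda env '{conda_env}'"
    return None


env = active_environment()
print(env or "No virtual environment detected; packages would install globally")
```

Note that conda's default `base` environment also sets the variable, so a result of `conda env 'base'` means no project-specific environment is active.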

Core Packages

Install these first:

pip install numpy pandas scikit-learn matplotlib seaborn jupyter

For deep learning, install either PyTorch or TensorFlow depending on your needs. PyTorch is widely used in research and is increasingly common in production:

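# PyTorch (default build; simplest setup)
# Note: on Linux the default pip wheel typically bundles CUDA support;
# on Windows and macOS it is CPU-only.
pip install torch torchvision

# Check pytorch.org/get-started for CPU-only or GPU-specific install commands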
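Whichever build ends up installed, a script usually decides at runtime which device to use. The following is a minimal sketch, assuming only the CUDA and CPU cases matter; `pick_device` is an illustrative helper, and it falls back to "cpu" if PyTorch is not installed at all.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch is installed and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)
```

With PyTorch installed, the returned string can be passed to calls such as `tensor.to(device)` or `model.to(device)`.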

Jupyter Notebooks

Jupyter notebooks are the standard tool for ML exploration. They let you run code in cells, see output immediately, and mix code with notes.

jupyter notebook        # opens in browser
# or
jupyter lab             # more modern interface

For production code and reusable scripts, move from notebooks to .py files. Notebooks are for exploration; scripts are for reproducibility.
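To illustrate what moving from a notebook to a script involves, here is a rough sketch that pulls the code cells out of a notebook's JSON. It assumes the usual `.ipynb` layout (a top-level `cells` list whose entries have `cell_type` and `source`); the `notebook_to_source` helper and the sample notebook are made up for illustration.

```python
import json


def notebook_to_source(notebook_json):
    """Join the code cells of a notebook (given as JSON text) into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook, invented for this example.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Notes"]},
        {"cell_type": "code", "source": ["import math\n", "x = math.sqrt(16)"]},
        {"cell_type": "code", "source": "print(x)"},
    ]
})
print(notebook_to_source(example))
```

In practice, `jupyter nbconvert --to script` handles this conversion; the sketch only shows the idea. Cells containing notebook magics (lines starting with `%`) would still need manual cleanup.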

Managing Dependencies

Save your environment so others (or your future self) can reproduce it:

pip freeze > requirements.txt
pip install -r requirements.txt   # to recreate

With conda:

conda env export > environment.yml
conda env create -f environment.yml
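A recreated environment can be checked against its pins from Python. This is a sketch under simplifying assumptions: it only understands exact `name==version` lines of the kind `pip freeze` writes, and `check_pins` and the sample pins are hypothetical names and values chosen for illustration.

```python
from importlib import metadata


def check_pins(requirement_lines):
    """Compare 'name==version' pins against the installed packages.

    Returns {name: (pinned, installed)} for mismatches only, where
    installed is None if the package is missing.
    """
    problems = {}
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # this sketch only handles exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems


# Hypothetical pins; in practice, pass the lines of requirements.txt.
sample = ["numpy==0.0.1", "hypothetical-pkg-for-demo-xyz==1.0", "# a comment"]
for name, (pinned, installed) in check_pins(sample).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```

To check a real file, something like `check_pins(open("requirements.txt").read().splitlines())` would work; an empty result means every exact pin matches.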

Verifying Your Setup

Run this in a notebook or Python script to confirm everything is installed:

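import numpy as np
import pandas as pd
import sklearn
import matplotlib
import seaborn as sns
import torch

# Print each version; an ImportError above means that package is missing
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("sklearn:", sklearn.__version__)
print("matplotlib:", matplotlib.__version__)
print("seaborn:", sns.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())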
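A plain import check stops at the first missing package. If a full report is more useful, a variant along these lines lists every package and flags the missing ones. The `PACKAGES` list and `report_versions` helper are illustrative; the names are import names, so scikit-learn appears as `sklearn`.

```python
import importlib

# Import names (not pip names): scikit-learn imports as "sklearn".
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn", "torch"]


def report_versions(names):
    """Return {name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Most packages expose __version__; fall back if one does not.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions(PACKAGES).items():
    print(f"{name:12} {version if version else 'MISSING'}")
```

Any line marked MISSING points to a package to install in the active environment.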

Recommended IDE

VS Code with the Python and Jupyter extensions is a common setup. It supports notebooks, debugging, and linting in one tool. PyCharm is an alternative with strong refactoring tools.

A Note on Google Colab

Google Colab provides a free, cloud-hosted Jupyter environment with limited GPU access. It is a good option for running experiments that need more compute than your laptop can provide. The downside is that the environment resets when the session ends, so any packages you install beyond the preinstalled ones must be reinstalled each session.
