Setting Up Your Python Environment for ML
What You Need Before You Start
Machine learning in Python requires a working environment with the right tools. This tutorial walks through setting up a clean, reproducible ML environment from scratch.
Python Version
Use Python 3.10 or 3.11. Avoid the very latest release: major ML libraries (PyTorch, TensorFlow, scikit-learn) often need time to add support for a new Python version. Check each library's installation docs or PyPI page for its supported Python versions before upgrading.
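If you want to confirm which interpreter you are actually running, a small check like the following can help. It is a minimal sketch: the 3.10 to 3.11 range is simply the recommendation above, and the function name version_in_range is made up for this example.

```python
import sys

# Assumed range, taken from the recommendation above (3.10 or 3.11).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 11)


def version_in_range(version=None, low=MIN_VERSION, high=MAX_VERSION):
    """Return True if the (major, minor) part of version lies within [low, high]."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return low <= major_minor <= high


current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if version_in_range() else "outside the recommended range"
print(f"Python {current}: {status}")
```

Adjust MIN_VERSION and MAX_VERSION if the libraries you depend on support a different range.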
Virtual Environments
Always work in a virtual environment. This isolates your project's dependencies from other projects and from your system Python.
python -m venv ml-env
source ml-env/bin/activate # Mac/Linux
ml-env\Scripts\activate # Windows
Alternatively, use conda:
conda create -n ml-env python=3.11
conda activate ml-env
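After activating, you can check from inside Python that the interpreter really belongs to an environment. The sketch below compares sys.prefix with sys.base_prefix, which differ inside a venv. The conda check is only a heuristic (it assumes conda environments contain a conda-meta directory, and it will also report "conda" for the conda base environment). The function name active_environment is hypothetical.

```python
import os
import sys


def active_environment():
    """Return 'venv', 'conda', or None for the running interpreter."""
    # venv/virtualenv: sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    if sys.prefix != sys.base_prefix:
        return "venv"
    # Heuristic (assumption): conda environments contain a conda-meta directory.
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    return None


kind = active_environment()
print("Interpreter:", sys.executable)
print("Environment:", kind or "none detected (possibly system Python)")
```

If this reports no environment, the activation step probably did not take effect in the current shell.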
Core Packages
Install these first:
pip install numpy pandas scikit-learn matplotlib seaborn jupyter
For deep learning, install either PyTorch or TensorFlow depending on your needs. PyTorch is widely used in research and is increasingly common in production:
# PyTorch (default build, simplest setup)
pip install torch torchvision
# The default build differs by platform (CPU-only or CUDA-enabled).
# Check pytorch.org/get-started for CPU-only and GPU-specific install commands
Jupyter Notebooks
Jupyter notebooks are the standard tool for ML exploration. They let you run code in cells, see output immediately, and mix code with notes.
jupyter notebook # opens in browser
# or
jupyter lab # more modern interface
For production code and reusable scripts, move from notebooks to .py files. Notebooks are for exploration; scripts are for reproducibility.
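As an illustration of what a notebook cell might look like once moved into a .py file, here is a minimal script skeleton. Everything in it is hypothetical: run_experiment stands in for your own notebook code, and the --seed and --n-samples options are only examples of parameters you might expose. The idea is that all inputs, including the random seed, come in through arguments, so the same command gives the same result.

```python
import argparse
import random


def run_experiment(seed: int, n_samples: int) -> float:
    """Hypothetical stand-in for notebook code: mean of seeded random draws."""
    rng = random.Random(seed)  # local generator, so the result depends only on the seed
    draws = [rng.random() for _ in range(n_samples)]
    return sum(draws) / n_samples


def main(argv=None) -> float:
    parser = argparse.ArgumentParser(description="Example experiment script")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--n-samples", type=int, default=1000)
    args = parser.parse_args(argv)
    result = run_experiment(args.seed, args.n_samples)
    print(f"seed={args.seed} n_samples={args.n_samples} mean={result:.4f}")
    return result


if __name__ == "__main__":
    main()
```

Keeping the logic in functions rather than at module level also lets you import it back into a notebook when you want to explore again.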
Managing Dependencies
Save your environment so others (or your future self) can reproduce it:
pip freeze > requirements.txt
pip install -r requirements.txt # to recreate
With conda:
conda env export > environment.yml
conda env create -f environment.yml
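To see whether an existing environment still matches a requirements.txt, you can compare the pinned versions against what is installed. The following is a rough sketch that only understands the simple name==version lines that pip freeze typically writes, and it skips anything else (editable installs, URLs). The function names parse_pins and find_mismatches are made up for this example.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines, as written by pip freeze, into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and anything that is not a simple pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for pins that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

A typical use would be find_mismatches(parse_pins(open("requirements.txt").read())), where an empty result means every pinned package is installed at the pinned version.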
Verifying Your Setup
Run this in a notebook or Python script to confirm everything is installed:
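# Each import fails with ImportError if the package is missing
import numpy as np
import pandas as pd
import sklearn
import matplotlib
import matplotlib.pyplot as plt
import torch
# Print versions so you can compare them with requirements.txt
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("sklearn:", sklearn.__version__)
print("matplotlib:", matplotlib.__version__)
print("torch:", torch.__version__)
# False is expected on a machine without an NVIDIA GPU or with a CPU-only build
print("CUDA available:", torch.cuda.is_available())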
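The check above stops at the first missing package. If you would rather see a full report in one pass, a variant along these lines may be useful. It is a sketch, not a complete diagnostic: it only catches ImportError, the module list is the one from this tutorial, and check_imports is a hypothetical helper name.

```python
import importlib

# Import names used in this tutorial; note that scikit-learn imports as "sklearn".
MODULES = ["numpy", "pandas", "sklearn", "matplotlib", "torch"]


def check_imports(names):
    """Try to import each module; map name -> version string, or None if it fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Not every module defines __version__, so fall back to a placeholder.
        report[name] = getattr(module, "__version__", "unknown")
    return report


for name, version in check_imports(MODULES).items():
    print(f"{name:12s} {version if version is not None else 'MISSING'}")
```

Any line marked MISSING points to a package that still needs to be installed in the active environment.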
Recommended IDE
VS Code with the Python and Jupyter extensions is a popular setup. It supports notebooks, debugging, and linting in one tool. PyCharm is an alternative with strong refactoring tools.
A Note on Google Colab
Google Colab provides a free cloud Jupyter environment with GPU access. It is a good option for running experiments that need more compute than your laptop can provide. The downside is that the environment resets when the session ends, so you need to reinstall any packages that are not preinstalled at the start of each session.
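If the same notebook is meant to run both locally and on Colab, it can be handy to detect which one you are in. A common idiom, shown here as a sketch, is to try importing the google.colab module, which is assumed to be importable only inside a Colab runtime. The function name running_in_colab is made up for this example.

```python
def running_in_colab() -> bool:
    """Return True if the google.colab module can be imported."""
    try:
        import google.colab  # noqa: F401  (assumed to exist only on Colab)
    except ImportError:
        return False
    return True


if running_in_colab():
    print("Running on Colab: reinstall any extra packages for this session.")
else:
    print("Not running on Colab: using the local environment.")
```

You could use the result to decide whether a notebook's first cell should install extra packages.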