3  Setting up your environment with VS Code

3.1 Learning Objectives

By completing this lecture, you will be able to:

  • Set up your Python coding environment with VS Code.
  • Create and manage Python virtual environments using both pip and conda.
  • Install and verify packages within these environments.
  • Export and recreate environments using environment files.
  • Use Jupyter Notebook for data science tasks in VS Code.

3.2 Introduction to Visual Studio Code (VS Code)

Visual Studio Code (VS Code) is a free, open-source, and lightweight code editor developed by Microsoft. It’s widely used for coding, debugging, and working with various programming languages and frameworks. Here’s an overview of its key features and functionalities:

3.2.1 Core Features

  • Multi-language Support: VS Code supports a wide range of programming languages out of the box, including Python, JavaScript, TypeScript, HTML, CSS, and more. Additional language support can be added via extensions.
  • Extensibility: The editor has a rich ecosystem of extensions available through the Visual Studio Code Marketplace. These extensions add support for additional programming languages, themes, debuggers, and tools like Git integration.
  • IntelliSense: Provides intelligent code completion, parameter info, quick info, and code navigation for many languages, enhancing productivity and reducing errors.
  • Integrated Terminal: Allows you to run command-line tools directly from the editor, making it easy to execute scripts, install packages, and more without leaving the coding environment.
  • Version Control Integration: Seamless integration with Git and other version control systems, allowing you to manage source code repositories, stage changes, commit, and view diffs within the editor.
  • Debugging: Supports debugging with breakpoints, call stacks, and an interactive console for various languages and frameworks.

3.2.2 User Interface

  • Editor: The main area to edit your files. You can open as many editors as you like side by side vertically and horizontally.
  • Primary Side Bar: Contains different views like the Explorer to assist you while working on your project.
  • Activity Bar: Located on the far left-hand side. Lets you switch between views and gives you additional context-specific indicators, like the number of outgoing changes when Git is enabled. You can change the position of the Activity Bar.
  • Panel: An additional space for views below the editor region. By default, it contains output, debug information, errors and warnings, and an integrated terminal. The Panel can also be moved to the left or right for more vertical space.

  • Command Palette: Accessed with Ctrl+Shift+P (or Cmd+Shift+P on macOS), it provides a quick way to execute commands, switch themes, change settings, and more.


3.2.3 Extensions

  • Language Extensions: Add support for additional languages such as Rust, Go, C++, and more.
  • Linters and Formatters: Extensions like ESLint, Prettier, and Pylint help with code quality and formatting.
  • Development Tools: Extensions for Docker, Kubernetes, database management, and more.
  • Productivity Tools: Extensions for snippets, file explorers, and workflow enhancements.


3.2.4 Use Cases

  • Web Development: VS Code is popular among web developers for its robust support for HTML, CSS, JavaScript, and front-end frameworks like React, Angular, and Vue.
  • Python Development: With the Python extension, it provides features like IntelliSense, debugging, linting, and Jupyter Notebook support.
  • Data Science: Supports Jupyter notebooks, allowing data scientists to write and run Python code interactively.
  • DevOps and Scripting: Useful for writing and debugging scripts in languages like PowerShell, Bash, and YAML for CI/CD pipelines.

3.2.5 Cross-Platform

  • Available on Windows, macOS, and Linux, making it accessible to developers across different operating systems.

Overall, VS Code is a versatile and powerful tool for a wide range of development activities, from simple scripting to complex software projects.

3.3 Installing Visual Studio Code

  • Step 1: Download VS Code:
    • Download the installer for your operating system from the official VS Code website.
  • Step 2: Install VS Code:
    • Run the installer and follow the prompts to complete the installation.
  • Step 3: Launch VS Code:
    • Open VS Code after installation to ensure it’s working correctly.

3.4 Setting Up Python Development Environment in VS Code using python venv

Unlike Spyder and PyCharm, which are specifically designed for Python development, VS Code is a versatile code editor with multi-language support. As a result, setting up the Python environment requires some additional configuration.

This step-by-step guide will walk you through setting up your Python environment in Visual Studio Code from scratch using venv.

3.4.1 Install Python

  1. Download Python:
    • Go to the official Python website and download the latest version of Python for your operating system.
    • Ensure that you check the box “Add Python to PATH” during installation.
  2. Verify Python Installation:
    • Open a terminal (Command Prompt on Windows, Terminal on macOS/Linux) and type:

      python --version
    • You should see the installed Python version.

3.4.2 Install Visual Studio Code Extensions

  1. Open VS Code.
  2. Go to Extensions:
    • Click on the Extensions icon on the sidebar or press Ctrl+Shift+X.
  3. Install Python Extension:
    • Search for the “Python” extension by Microsoft and install it.
  4. Install Jupyter Extension:
    • Search for the “Jupyter” extension by Microsoft and install it.

3.4.3 Set Up a Python Workspace for this course

  1. Create a New Folder:
    • Create a new folder on your computer where you want to store your Python code for this course.
  2. Open Folder in VS Code:
    • Go to File > Open Folder and select the newly created folder.

3.4.4 Create a Notebook for your work

  • In VS Code, go to File > New File and select Jupyter Notebook.

3.4.5 Create a Python environment for your work - GUI method

  • When you start a Jupyter Notebook in VS Code, you need to choose a kernel. A kernel is the “engine” that runs the code in your Jupyter notebook, and it is tied to a specific Python interpreter or environment.

    • What’s the difference between an interpreter and an environment? An interpreter is a program that runs your Python code. An environment, on the other hand, is a standalone “space” where your code runs. It’s like a container that holds its own interpreter and environment-specific libraries/dependencies, so each project can have its own environment setup without affecting others.
    • Why do we prefer creating an environment for this course rather than using the global interpreter that comes with your Python installation? As a data scientist, you may work on multiple projects and attend different courses that require different sets of packages, dependencies, or even Python versions. A separate environment prevents conflicts between libraries, dependencies, and Python versions across your projects (often called “dependency hell”) and also helps make your code reproducible. It is always good practice to work within Python environments, especially when you have several projects going on.
    • Let’s create a Python environment for the upcoming coursework.

  • Create using venv in the current workspace


    Key Differences between venv and conda

    1. Ecosystem:
      • venv is specifically for Python and is part of the standard library.
      • conda is part of the broader Anaconda ecosystem, which supports multiple languages and is focused on data science.
    2. Package Management:
      • venv relies on pip for package management.
      • conda has its own package management system, which can sometimes resolve dependencies better, especially for data science libraries that require non-Python dependencies.
    3. Environment Creation:
      • venv creates lightweight virtual environments tied to a specific version of Python.
      • conda allows you to specify not just Python but also other packages during environment creation, which can save time and ensure compatibility.
    4. Cross-Platform:
      • Both tools are cross-platform, but conda is often favored in data science for its ability to manage complex dependencies.

    How to choose

    • Use venv for lightweight, Python-only projects where you want a simple way to manage dependencies. We are going with venv for our course.
    • Use conda for data science projects, or when you need to manage packages across multiple languages and require better dependency management.
  • Choose a Python interpreter for your environment.

Congratulations! A virtual environment named .venv has been successfully created in your project folder.
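To confirm from Python itself that code is running inside a virtual environment, one option is to compare sys.prefix with sys.base_prefix. The following is a minimal sketch; the function name in_virtual_environment is our own and not part of any library. Note that this check applies to venv-style environments; a conda environment is organized differently and may not be detected this way.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter:      ", sys.executable)
print("Environment root: ", sys.prefix)
print("Base installation:", sys.base_prefix)
print("Inside a virtual environment:", in_virtual_environment())
```

Run in a notebook cell after selecting the .venv kernel, the environment root should point into your project folder.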

3.4.6 Create a Python environment for your work - Command Line Method

Instead of using the VS Code GUI, we can also create a venv environment from the command line.

  1. Create a Virtual Environment:
    • Open the terminal in VS Code and run:

      python -m venv venv
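    • This creates a virtual environment named venv in your project folder. (The GUI method above names the environment .venv instead; either name works as long as you use it consistently.)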

  2. Activate the Virtual Environment:
    • Windows:

      venv\Scripts\activate
    • macOS/Linux:

      source venv/bin/activate
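The same creation step can also be scripted with the standard-library venv module, which provides the functionality behind python -m venv. The sketch below is illustrative only: the helper name create_environment and the folder name demo_env are our own choices, and the demo builds the environment in a temporary folder so that no project folder is touched.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return the path to its config file."""
    # venv.create builds the environment folder, similar to `python -m venv <env_dir>`.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"


# Demonstration in a throwaway folder.
# with_pip=False keeps the demo fast; keep pip enabled for real work.
with tempfile.TemporaryDirectory() as scratch:
    config_file = create_environment(Path(scratch) / "demo_env", with_pip=False)
    demo_env_created = config_file.exists()
    print("pyvenv.cfg created:", demo_env_created)
```

The pyvenv.cfg file is what marks a folder as a virtual environment; activation from the terminal still uses the commands shown above.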

3.4.7 Choose the .venv environment as the kernel to run the notebook

For all your upcoming work in this project, you can select this environment to ensure a consistent setup.

3.4.8 Installing ipykernel for your notebook

Create a code cell in the notebook and run it. The first time you run a code cell, VS Code will prompt you to install the ipykernel package, which the environment needs in order to run notebook cells.

After installing ipykernel, you should be able to run the following cell.

Code
import sys
print("Current Python executable:", sys.executable)
Current Python executable: c:\Users\lsi8012\OneDrive - Northwestern University\FA24\303-1\test_env\.venv\Scripts\python.exe

sys.executable is an attribute in the Python sys module that returns the path to the Python interpreter that is currently executing your code.

However, none of the data science packages are installed in the environment by default. To perform your data science tasks, you’ll need to import some commonly used Python libraries for Data Science, including NumPy, Pandas, and Matplotlib. While Anaconda typically has these libraries installed automatically, in VS Code, you’ll need to install them for your specific environment. If you don’t, you may encounter errors when trying to use these libraries.

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'numpy'

3.4.9 Install Data Science packages within the created Environment

Packages are collections of pre-written code that provide specific functionality, such as scientific computing, linear algebra, visualization, or machine learning models, so that you do not have to write everything from scratch. You will come across many packages in this course sequence, so make sure that you know how to install the required packages when you see a ModuleNotFoundError.
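Rather than waiting for a ModuleNotFoundError, you can check ahead of time which modules are missing from the active environment. The following is a small sketch using importlib.util.find_spec from the standard library; the helper name missing_modules is hypothetical, and the check is intended for top-level module names only.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names in module_names that cannot be imported in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in module_names if find_spec(name) is None]


required = ["numpy", "pandas", "matplotlib"]
missing = missing_modules(required)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required modules are available.")
```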

You have two primary ways to install DS packages:

  • Installing from the Terminal
  • Installing from the Notebook

3.4.9.1 Installing from the terminal

  1. Open a New Terminal: Check the terminal prompt to see if the active environment is consistent with the kernel you’ve chosen for your notebook.
    • If you see (.venv) at the beginning of the prompt, it means the virtual environment .venv is active and matches the notebook kernel.
    • If you see something else, for example (base), at the beginning of the prompt, it indicates that a different environment is active; (base) is the base conda environment installed by Anaconda.
    • You can also check which interpreter the terminal will use. On macOS/Linux, run: which python. On Windows, run: where.exe python

Note that when you have both Anaconda and VS Code installed on your system, sometimes the environments can conflict with each other. If the terminal environment is inconsistent with the notebook kernel, packages may be installed in a different environment than intended. This can lead to issues where the notebook cannot access the installed packages.

  2. Install with pip (if you created a venv environment):

pip install numpy pandas matplotlib
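One way to reduce the risk of the terminal and the notebook kernel pointing at different environments is to invoke pip through the interpreter that is running your code. The sketch below only builds and prints the command; the helper name pip_install_command is our own, and the line that would actually perform the installation is left commented out.

```python
import subprocess
import sys


def pip_install_command(packages):
    """Build a pip command that targets the interpreter running this code."""
    # Running `-m pip` through sys.executable installs into the same
    # environment as the current kernel, whatever the terminal has activated.
    return [sys.executable, "-m", "pip", "install", *packages]


command = pip_install_command(["numpy", "pandas", "matplotlib"])
print("Would run:", " ".join(command))

# Uncomment the next line to actually perform the installation.
# subprocess.check_call(command)
```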

3.4.9.2 Installing from the Notebook

You can also install packages directly from a Jupyter Notebook cell using a magic command. This is often convenient because it allows you to install packages without leaving the notebook interface.

Code
%pip install numpy pandas matplotlib

Let’s rerun the import cell and see whether the error is resolved.

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Key takeaway

Both methods are valid, and the choice depends on your preference and workflow. Installing from the terminal is common for batch installations or when setting up a new environment, while installing from the notebook can be handy for quick additions during your data analysis work.

3.4.10 Create a requirement.txt file to back up your environment or share it with your collaborators

pip freeze outputs the packages and their versions installed in the current environment in a format that can be used as requirements.txt, which allows you to easily recreate the environment or share it with others for consistent setups.

Code
pip freeze
asttokens==2.4.1
colorama==0.4.6
comm==0.2.2
contourpy==1.3.0
cycler==0.12.1
debugpy==1.8.6
decorator==5.1.1
executing==2.1.0
fonttools==4.54.1
ipykernel==6.29.5
ipython==8.27.0
jedi==0.19.1
jupyter_client==8.6.3
jupyter_core==5.7.2
kiwisolver==1.4.7
matplotlib==3.9.2
matplotlib-inline==0.1.7
nest-asyncio==1.6.0
numpy==2.1.1
packaging==24.1
pandas==2.2.3
parso==0.8.4
pillow==10.4.0
platformdirs==4.3.6
prompt_toolkit==3.0.48
psutil==6.0.0
pure_eval==0.2.3
Pygments==2.18.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
pytz==2024.2
pywin32==306
pyzmq==26.2.0
six==1.16.0
stack-data==0.6.3
tornado==6.4.1
traitlets==5.14.3
tzdata==2024.2
wcwidth==0.2.13
Note: you may need to restart the kernel to use updated packages.
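To see roughly where this listing comes from, the standard-library importlib.metadata module can enumerate the distributions installed in the active environment. This is only an approximation of pip freeze (pip treats editable and URL-based installs differently), and the function name freeze_lines is hypothetical.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' strings for the installed distributions."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is incomplete
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in freeze_lines():
    print(line)
```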

Using the redirection operator >, you can save the output of pip freeze to a file. This example names the file requirement.txt; the conventional name is requirements.txt. The file can be used to install the same versions of packages in a different environment.

Code
pip freeze > requirement.txt
Note: you may need to restart the kernel to use updated packages.

Let’s check whether requirement.txt is in the current working directory.

Code
%ls
 Volume in drive C is Windows
 Volume Serial Number is A80C-7DEC

 Directory of c:\Users\lsi8012\OneDrive - Northwestern University\FA24\303-1\test_env

09/27/2024  02:25 PM    <DIR>          .
09/27/2024  02:25 PM    <DIR>          ..
09/27/2024  07:44 AM    <DIR>          .venv
09/27/2024  01:42 PM    <DIR>          images
09/27/2024  02:25 PM               695 requirement.txt
09/27/2024  02:25 PM            21,352 venv_setup.ipynb
               2 File(s)         22,047 bytes
               4 Dir(s)  166,334,562,304 bytes free

You can copy the requirement.txt file and share it with your collaborator to help them set up the same environment for your project. They can quickly install the necessary dependencies from the file using the following command:

Code
pip install -r requirement.txt
Requirement already satisfied: asttokens==2.4.1 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 1)) (2.4.1)
Requirement already satisfied: colorama==0.4.6 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 2)) (0.4.6)
Requirement already satisfied: comm==0.2.2 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 3)) (0.2.2)
Requirement already satisfied: contourpy==1.3.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 4)) (1.3.0)
Requirement already satisfied: cycler==0.12.1 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 5)) (0.12.1)
Requirement already satisfied: debugpy==1.8.6 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 6)) (1.8.6)
Requirement already satisfied: decorator==5.1.1 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 7)) (5.1.1)
Requirement already satisfied: executing==2.1.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 8)) (2.1.0)
Requirement already satisfied: fonttools==4.54.1 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 9)) (4.54.1)
Requirement already satisfied: ipykernel==6.29.5 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 10)) (6.29.5)
Requirement already satisfied: ipython==8.27.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 11)) (8.27.0)
Requirement already satisfied: jedi==0.19.1 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 12)) (0.19.1)
Requirement already satisfied: jupyter_client==8.6.3 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 13)) (8.6.3)
Requirement already satisfied: jupyter_core==5.7.2 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 14)) (5.7.2)
Requirement already satisfied: kiwisolver==1.4.7 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 15)) (1.4.7)
Requirement already satisfied: matplotlib==3.9.2 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 16)) (3.9.2)
Requirement already satisfied: matplotlib-inline==0.1.7 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 17)) (0.1.7)
Requirement already satisfied: nest-asyncio==1.6.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 18)) (1.6.0)
Requirement already satisfied: numpy==2.1.1 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 19)) (2.1.1)
Requirement already satisfied: packaging==24.1 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 20)) (24.1)
Requirement already satisfied: pandas==2.2.3 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 21)) (2.2.3)
Requirement already satisfied: parso==0.8.4 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 22)) (0.8.4)
Requirement already satisfied: pillow==10.4.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 23)) (10.4.0)
Requirement already satisfied: platformdirs==4.3.6 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 24)) (4.3.6)
Requirement already satisfied: prompt_toolkit==3.0.48 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 25)) (3.0.48)
Requirement already satisfied: psutil==6.0.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 26)) (6.0.0)
Requirement already satisfied: pure_eval==0.2.3 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 27)) (0.2.3)
Requirement already satisfied: Pygments==2.18.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 28)) (2.18.0)
Requirement already satisfied: pyparsing==3.1.4 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 29)) (3.1.4)
Requirement already satisfied: python-dateutil==2.9.0.post0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 30)) (2.9.0.post0)
Requirement already satisfied: pytz==2024.2 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 31)) (2024.2)
Requirement already satisfied: pywin32==306 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 32)) (306)
Requirement already satisfied: pyzmq==26.2.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 33)) (26.2.0)
Requirement already satisfied: six==1.16.0 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 34)) (1.16.0)
Requirement already satisfied: stack-data==0.6.3 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 35)) (0.6.3)
Requirement already satisfied: tornado==6.4.1 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 36)) (6.4.1)
Requirement already satisfied: traitlets==5.14.3 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 37)) (5.14.3)
Requirement already satisfied: tzdata==2024.2 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 38)) (2024.2)
Requirement already satisfied: wcwidth==0.2.13 in c:\users\lsi8012\onedrive - northwestern university\fa24\303-1\test_env\.venv\lib\site-packages (from -r requirement.txt (line 39)) (0.2.13)
Note: you may need to restart the kernel to use updated packages.

This will ensure that both of you are working with the same setup.
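If you want to double-check that an environment really matches a shared file, you can compare the pinned versions with what is installed. The sketch below handles only simple name==version lines, as produced by pip freeze above; the helper names parse_pins and compare_with_installed and the two-line sample text are illustrative assumptions.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse 'name==version' lines into a dict, ignoring blanks and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def compare_with_installed(pins):
    """Map each pinned package to 'ok', 'missing', or the differing installed version."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
        else:
            report[name] = "ok" if installed == pinned else f"installed {installed}"
    return report


# Sample text standing in for the contents of the shared file.
sample = "numpy==2.1.1\npandas==2.2.3\n# a comment line\n"
print(compare_with_installed(parse_pins(sample)))
```

To check a real file, read its text first (for example with pathlib.Path("requirement.txt").read_text()) and pass it to parse_pins.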

3.5 Jupyter Notebooks in VS Code

After setting up your environment, follow these instructions to become familiar with the native support for Jupyter Notebooks in VS Code.

3.6 Independent Study

Setting Up Your Python Data Science Environment using pip and conda

Objective: Practice creating and managing Python packages and environments using pip and conda, and verifying your setup.

Note: Feel free to use ChatGPT to find the commands you need.

3.6.1 Using pip

3.6.1.1 Instructions

Step 1: Create a New Workspace

  • Create a folder named test_pip_env.
  • Open the folder in VS Code.

Step 2: Create a pip environment named stat303_pip_env in the current workspace

  • Within the test_pip_env workspace, create the environment from the command line: python -m venv stat303_pip_env
  • Activate the environment.
    • Windows: .\stat303_pip_env\Scripts\activate
    • macOS/Linux: source stat303_pip_env/bin/activate

After activation, your command prompt will change to show the virtual environment’s name (e.g., (stat303_pip_env))

Step 3: Install Required Packages for the env

  • Inside the stat303_pip_env, install the following packages using pip:
    • numpy
    • pandas
    • matplotlib
  • Install the following packages, also using pip (a venv environment is managed with pip rather than conda):
    • scikit-learn
    • statsmodels

Step 4: Export the Environment Configuration

  • Export the configuration of your stat303_pip_env environment to a file named stat303_env.txt.

Step 5: Deactivate and Remove the Environment

  • Deactivate the stat303_pip_env environment.
  • Remove the stat303_pip_env environment to ensure you understand how to clean up.

Step 6: Recreate the Environment Using the Exported Environment File

  • Create a new environment using the stat303_env.txt file.
  • Activate the stat303_pip_env environment.
  • Verify that the packages numpy, pandas, matplotlib, and scikit-learn are installed.

Step 7: Run a Jupyter Notebook

  • Create a Jupyter Notebook within the created environment.
  • Import numpy, pandas, matplotlib, and scikit-learn.
  • Print the versions of these packages.
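As a starting point for the last step, package versions can be looked up without importing the packages, using importlib.metadata from the standard library. This is one possible sketch rather than the expected solution; the helper name installed_versions is our own. Note that the lookup uses the distribution name scikit-learn, even though the package is imported as sklearn.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distribution_names):
    """Return {name: version string, or None when the package is not installed}."""
    found = {}
    for name in distribution_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


packages = ["numpy", "pandas", "matplotlib", "scikit-learn"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```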

3.6.2 Using conda

3.6.2.1 Instructions

Step 1: Create a New Workspace

  • Create a folder named test_conda_env.
  • Open the folder in VS Code.

Step 2: Create a conda environment named stat303_conda_env

  • Within the workspace, create the environment using the conda command: conda create --name stat303_conda_env
  • Activate the environment: conda activate stat303_conda_env

Step 3: Install Required Packages for the env

  • Inside the stat303_conda_env, install the following packages using conda:
    • numpy
    • pandas
    • matplotlib
  • Install the following packages using pip
    • scikit-learn
    • statsmodels

Step 4: Export the Environment Configuration

  • Export the configuration of your stat303_conda_env environment to a file named stat303_env.yml: conda env export --name stat303_conda_env > stat303_env.yml

Step 5: Deactivate and Remove the Environment

  • Deactivate the stat303_conda_env environment.

  • Remove the stat303_conda_env environment to ensure you understand how to clean up.

Step 6: Recreate the Environment Using the Exported YML File

  • Create a new environment using the stat303_env.yml file.
  • Activate the stat303_conda_env environment.
  • Verify that the packages numpy, pandas, matplotlib, scikit-learn, and statsmodels are installed.

Step 7: Run a Jupyter Notebook

  • Create a new notebook and write a simple Python script to
    • Import numpy, pandas, matplotlib, and scikit-learn.
    • Print the versions of these packages.

3.7 Reference