3  Enhancing Workflow in Jupyter Notebooks

3.1 Learning Objectives

In this lecture, we’ll explore how to optimize your workflow in Jupyter notebooks by leveraging magic commands and shell commands, understanding file paths, and interacting with the filesystem using the os module.

By completing this lecture, you will be able to:

  • Run shell commands directly within a notebook.
  • Navigate and manipulate files and directories efficiently.
  • Integrate external tools and scripts seamlessly into your workflow.

By mastering these techniques, you’ll be able to work more efficiently and handle complex data science tasks with ease.

3.2 Magic Commands

3.2.1 What are Magic Commands?

Magic commands in Jupyter are shortcuts that extend the functionality of the notebook environment. There are two types of magic commands:

  • Line magics: Commands that operate on a single line.
  • Cell magics: Commands that operate on the entire cell.

You can access the full list of magic commands by typing:

Code
# show all the available magic commands on the system
%lsmagic  
Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cd  %clear  %cls  %code_wrap  %colors  %conda  %config  %connect_info  %copy  %ddir  %debug  %dhist  %dirs  %doctest_mode  %echo  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %macro  %magic  %mamba  %matplotlib  %micromamba  %mkdir  %more  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %ren  %rep  %rerun  %reset  %reset_selective  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %uv  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%cmd  %%code_wrap  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

3.2.2 Line Magic

In Jupyter notebooks, line magic commands are invoked by placing a single percentage sign (%) in front of the statement, allowing for quick, inline operations.

3.2.2.1 %time: Timing the execution of code

In data science, it is often important to measure the performance of specific code snippets or algorithms. The %time magic command measures the execution time of a single statement, which helps you identify bottlenecks and optimize your code.

Code
def my_dot(a, b):
    """
    Compute the dot product of two vectors

    Args:
      a (ndarray (n,)):  input vector
      b (ndarray (n,)):  input vector with same dimension as a

    Returns:
      x (scalar): dot product of a and b
    """
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x
Code
import numpy as np
np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

Let’s use %time to measure the execution time of a single line of code.

Code
# Example: Timing NumPy's vectorized dot product
%time np.dot(a, b)
CPU times: total: 15.6 ms
Wall time: 32.9 ms
2501072.5816813153
Code
%time my_dot(a, b)
CPU times: total: 1.88 s
Wall time: 1.86 s
2501072.5816813707

The %time magic only prints its measurement to the notebook’s output, so the timing itself cannot be assigned to a variable (%timeit can return a result object when called with its -o option). To store an execution time in a variable, you can time the code manually with Python’s built-in time module.

Here’s how you can do it using the time module:

Code
import time
tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del a, b  # remove these big arrays from memory
np.dot(a, b) =  2501072.5817
Vectorized version duration: 2.9922 ms 
my_dot(a, b) =  2501072.5817
loop version duration: 2004.3225 ms 
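
If you want the timing as a number you can reuse, a minimal sketch with the standard library might look like the following. It uses time.perf_counter(), a clock intended for measuring short durations, and timeit.repeat() to run the same call several times. The function slow_sum is a hypothetical stand-in for whatever code you want to time.

```python
import time
import timeit


def slow_sum(n):
    """Hypothetical example function: sum the integers 0..n-1 with a loop."""
    total = 0
    for i in range(n):
        total += i
    return total


# Single measurement with a high-resolution clock
tic = time.perf_counter()
result = slow_sum(100_000)
toc = time.perf_counter()
elapsed_ms = 1000 * (toc - tic)
print(f"slow_sum took {elapsed_ms:.4f} ms")

# Repeated measurement: 3 rounds of 5 calls each. Every entry in `rounds`
# is the total time in seconds for one round of 5 calls.
rounds = timeit.repeat(lambda: slow_sum(100_000), repeat=3, number=5)
best_per_call_ms = 1000 * min(rounds) / 5
print(f"best of 3 rounds: {best_per_call_ms:.4f} ms per call")
```

Taking the minimum over several rounds is a common way to reduce the influence of other activity on your machine, though the numbers will still vary from run to run.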

3.2.2.2 %matplotlib inline: Displaying plots inline

This command allows you to embed plots within the notebook.

Code
# Example: Using %matplotlib inline to display a plot
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y);

3.2.3 Cell Magic Commands

Cell magic commands are denoted with double percentage signs (%%) at the beginning of the cell, enabling you to apply a command to the entire cell for more complex tasks. Note that a cell magic command must be the first line in the code cell!

3.2.3.1 %%time: Timing cell execution

This cell magic is useful for measuring the execution time of an entire cell.

Code
%%time
# Example: Timing a cell with matrix multiplication

A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)
C = np.dot(A, B)
CPU times: total: 219 ms
Wall time: 23 ms

Question: Several timing magic commands are easy to confuse because of their similar names: %time, %timeit, %%time, and %%timeit. Do your own research on the differences among them.

3.3 Shell Commands in Jupyter Notebooks

What are Shell Commands?

Shell commands let you interact directly with your computer’s operating system from within a Jupyter notebook. This means you can manage files, check your environment, and run system tools without leaving your notebook. To run a shell command in Jupyter, start the line with an exclamation mark (!). For example, !ls lists files on macOS/Linux, while !dir does the same on Windows.

How does it work?

  • Python code is executed by the IPython kernel inside the notebook.
  • Shell commands (lines starting with !) are sent to your computer’s shell (like Bash, Command Prompt, or PowerShell), not to Python.
  • This separation means you can use both Python and shell commands in the same notebook, but they run in different environments.
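
The separation described above can be illustrated from the Python side with the standard-library subprocess module, which also starts a separate process and hands its output back to Python. This is only a rough analogy, not a description of how the ! syntax is implemented. The child command here is the Python interpreter itself, chosen so that the sketch runs the same way on every operating system.

```python
import subprocess
import sys

# Launch a separate process (here: another Python interpreter) and capture
# what it prints, much as a shell command's output comes back to the notebook.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError if the command fails
)

output = completed.stdout.strip()
print(output)
print("exit code:", completed.returncode)
```

The child process has its own memory, so variables defined in the notebook are not visible to it unless you pass them in explicitly.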

3.3.1 Using Shell Commands

3.3.1.2 !ls (!dir on Windows): Listing files and directories

To view all files and folders in your current directory, use a shell command:

  • On macOS/Linux: !ls

  • On Windows: !dir

This is useful for quickly checking what data, scripts, or notebooks are available in your workspace. If you want to see hidden files (those starting with a dot), use !ls -a on macOS/Linux or !dir /a on Windows.

Tip: If you get an error, double-check that you are using the correct command for your operating system.

3.4 Installing Required Packages Within Your Notebook

When working in Jupyter notebooks, you often need to install new Python packages. There are two main ways to do this:

  1. Shell Command (not recommended):

    • Use an exclamation mark (!) before the command:

      • !pip install package_name
    • This runs the command in your system shell, which may not always install the package in the same environment as your notebook kernel.

  2. Magic Command:

    • Use a percent sign (%) before the command:

      • %pip install package_name
    • This is Jupyter-specific and ensures the package is installed in the environment where your notebook is running.

Example:

  • %pip install numpy # Installs numpy in the notebook’s environment

  • !pip install numpy # Installs numpy using the system shell (may differ from notebook environment)

Unlike shell commands, the %pip magic command is designed specifically for Jupyter notebooks. It automatically installs packages into the exact environment your notebook kernel is using, reducing confusion and installation errors.

Code
%pip install numpy
Requirement already satisfied: numpy in c:\users\lsi8012\appdata\local\anaconda3\lib\site-packages (1.26.4)
Note: you may need to restart the kernel to use updated packages.
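
To see which environment the notebook kernel is actually using, and whether a package can be imported from it, a quick check along these lines can help. It relies only on the standard library. The helper is_installed is a hypothetical name introduced for this sketch, and the package names passed to it are just examples.

```python
import importlib.util
import sys

# Path of the Python interpreter that is running this kernel
print("Interpreter:", sys.executable)
# Root folder of the active environment
print("Environment:", sys.prefix)


def is_installed(package_name):
    """Return True if the package can be imported from this environment."""
    return importlib.util.find_spec(package_name) is not None


print("json available:", is_installed("json"))
print("numpy available:", is_installed("numpy"))
```

If the interpreter path does not point into the environment you expected, packages installed elsewhere will not be visible to the notebook.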

In most cases, you don’t need to type % before pip install numpy because Jupyter’s automagic feature is enabled by default. Automagic lets you use line magics without the % (unless their name conflicts with Python variables or commands).

How automagic works:

  • If you type pip install numpy, Jupyter will interpret it as %pip install numpy automatically.

Best practice:

  • Use %pip install ... for clarity and to avoid conflicts.
  • Automagic is convenient, but explicit magic commands are safer in shared or complex notebooks.

Note: Cell magics still require the %% prefix.

3.5 File Paths in Python

Understanding how to specify file paths in Python is essential for loading and saving data. File paths tell Python where to find or store your files (e.g., datasets, results, or scripts).

There are two main types of file paths: absolute paths and relative paths.

  • Absolute Path: Gives the complete location of a file or folder from the root of your file system. It always points to the same place, no matter where your code is running.
  • Relative Path: Specifies the location of a file or folder in relation to your current working directory. It is shorter and more flexible, making your code easier to share and reuse.

Choosing the right type of path helps avoid errors and makes your code more portable across different computers and operating systems.

3.5.1 Absolute Path

An absolute path provides the full address to a file or directory, starting from the root of your system. It does not depend on where your code is running.

Example (Windows):

Code
# Absolute path example (Windows)
file_path = r"C:\Users\Username\Documents\data.csv"
Code
!conda env list
# conda environments:
#
base                  *  C:\Users\lsi8012\AppData\Local\anaconda3

The path listed for each conda environment is an absolute path.

3.5.2 Relative Path

A relative path describes the location of a file or folder in relation to your current working directory. It is shorter, more flexible, and makes your code easier to share and run on different computers.

Relative paths are especially useful in projects with organized folder structures, because they allow you to move your code and data together without changing file references.

To find your current working directory in your notebook, you can use a magic command:

  • Magic command: %pwd

Knowing your current working directory helps you understand where Python will look for files when using relative paths.

Code
# Magic command
%pwd
'c:\\Users\\lsi8012\\Documents\\Courses\\FA24\\DataScience_Intro_python_fa24_Sec20_21'

Example of a Relative Path:

Code
# Example of relative path
file_path = "sample.txt"  # Relative to the current directory

The relative path sample.txt refers to a file named sample.txt in the current working directory.
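
As a small sketch of how a relative path is resolved, the os.path functions below turn the relative path into the absolute path Python would actually use. The file does not need to exist for this to work, since only the path string is computed.

```python
import os

relative_path = "sample.txt"  # same relative path as above

# os.path.abspath joins the relative path onto the current working directory
resolved = os.path.abspath(relative_path)
print(resolved)

print(os.path.isabs(relative_path))  # False: depends on the working directory
print(os.path.isabs(resolved))       # True: full location from the root
```

Running the same lines after changing the working directory gives a different resolved path, which is why the working directory matters for relative paths.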

3.5.3 Methods to Specify File Paths in Windows

Specifying file paths correctly is crucial for avoiding errors and making your code portable. Windows uses backslashes (\) to separate folders, but in Python, a single backslash is an escape character (e.g., \n for newline), which can lead to mistakes if not handled properly.

Here are four reliable ways to specify file paths in Windows:

3.5.3.1 Method 1: Using Escaped Backslashes

Use double backslashes (\\) to prevent Python from interpreting them as escape characters.

Code
file_path = "C:\\Users\\Username\\Documents\\data.csv"

3.5.3.2 Method 2: Using Raw Strings

Prefix the path with r to tell Python to treat backslashes as literal characters.

Code
file_path = r"C:\Users\Username\Documents\data.csv"
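
To see why Methods 1 and 2 matter, the sketch below compares an unprotected path with the two safe spellings. The path C:\temp\new.csv is a hypothetical example chosen because \t and \n are both escape sequences.

```python
# "\t" and "\n" inside an ordinary string are a tab and a newline,
# so this path silently contains two control characters.
broken = "C:\temp\new.csv"
escaped = "C:\\temp\\new.csv"   # Method 1: escaped backslashes
raw = r"C:\temp\new.csv"        # Method 2: raw string

print(repr(broken))             # repr makes the hidden characters visible
print(escaped == raw)           # the two safe spellings are the same string
print(len(broken), len(raw))    # the broken path is shorter
```

Python raises no error for the broken version, which is what makes this mistake easy to miss until a file cannot be found.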

3.5.3.3 Method 3: Using forward slashes (/)

Python accepts forward slashes (/) in file paths, even on Windows. This is often the simplest and most portable method.

Code
file_path = "C:/Users/Username/Documents/data.csv"

3.5.3.4 Method 4: Using os.path.join

Use os.path.join() to build paths programmatically. This method automatically uses the correct separator for your operating system, making your code cross-platform.

Code
import os
# On Windows, a bare "C:" is not followed by a separator when joined,
# so include the separator after the drive letter explicitly.
file_path = os.path.join("C:" + os.sep, "Users", "Username", "Documents", "data.csv")

This method works across different operating systems because os.path.join automatically uses the correct path separator (\ for Windows and / for Linux/Mac).

Using these methods helps prevent bugs, makes your code easier to share, and ensures it works on different computers and operating systems.
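
The same function is also handy for relative paths inside a project. The following is a minimal sketch; the folder and file names are hypothetical.

```python
import os

# Build a relative path from its parts
file_path = os.path.join("datasets", "raw", "data.csv")
print(file_path)

# os.sep is the separator that os.path.join inserted
print(repr(os.sep))

# Split the path back into its folder part and file name
folder, filename = os.path.split(file_path)
print(folder, filename)
```

The printed path uses backslashes on Windows and forward slashes on macOS/Linux, without any change to the code.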

3.5.4 File paths in macOS and Linux

macOS and Linux use forward slashes (/) as path separators, which is exactly what Python expects by default. This means you can specify file paths directly, like /Users/yourname/Documents/data.csv, without worrying about escape characters or compatibility issues.

Why is this helpful?

  • Forward slashes work seamlessly in Python on macOS, Linux, and even Windows.
  • You avoid common errors caused by backslashes (which are escape characters in Python).
  • Code written with forward slashes is portable and recommended for all platforms.

3.5.4.1 Best Practices for File Paths in Data Science

  • Prefer relative paths within your project folders—this makes your code portable and easy to share or move.
  • Use absolute paths only for files outside your project or when you need a fixed location.
  • Always check your current working directory before reading or writing files to avoid confusion and errors.
  • Avoid hardcoding file paths directly in your code; use variables or configuration files for flexibility.
  • For cross-platform compatibility, use forward slashes (/) in file paths or build paths programmatically with os.path.join().
  • Document your file structure and path conventions in your project README to help collaborators.
  • When sharing code, test file paths on both Windows and macOS/Linux to ensure portability.
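
One possible way to apply several of these practices together is sketched below: the data folder is defined once in a variable, paths are built from it, and the code reports clearly when a file is missing. DATA_DIR, dataset_path, and the file name are hypothetical names chosen for illustration.

```python
import os

# Single place to configure where the data lives
DATA_DIR = "datasets"


def dataset_path(filename):
    """Build the path to a file inside DATA_DIR."""
    return os.path.join(DATA_DIR, filename)


print("Working directory:", os.getcwd())

path = dataset_path("data.csv")
if os.path.exists(path):
    print("Found:", path)
else:
    print("Missing:", path, "- check the working directory or DATA_DIR")
```

If the data moves, only the DATA_DIR line needs to change.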

3.6 Interacting with the OS and Filesystem

In data science projects, you often work with data files (such as CSV, Excel, or JSON) stored in various folders. Managing these files efficiently is essential for reproducible workflows and organized projects.

The Python os module provides powerful tools to interact with your operating system and manage files and directories directly from your notebook. With os, you can:

  • Check your current working directory
  • List files and folders
  • Create, rename, or delete directories
  • Move between folders
  • Check if files or folders exist before using them

Let’s import the os module and explore some of its most useful functions for data science tasks.

Code
import os

We can get the location of the current working directory using the os.getcwd function.

Code
os.getcwd()
'c:\\Users\\lsi8012\\mydev\\STAT303-1-class-notes'

The command os.chdir('..') in Python changes the current working directory to the parent directory of the current one.

Note that .. denotes the parent directory on all major operating systems, including Windows, macOS, and Linux. It lets you move one level up in the directory hierarchy, which is useful when navigating directories programmatically in scripts.

Code
os.chdir('..')
Code
os.getcwd()
'c:\\Users\\lsi8012\\mydev'

os.chdir() is used to change the current working directory.

For example: os.chdir('./week2')

./week2 is a relative path:

  • . refers to the current directory.
  • week2 is a folder inside the current directory.
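
The round trip with . and .. can be tried safely in a throwaway folder, as in the sketch below. It assumes nothing about your own directory layout: a temporary directory with a week2 subfolder is created just for the demonstration and removed afterwards.

```python
import os
import tempfile

start_dir = os.getcwd()  # remember where we started

# Work in a throwaway folder so nothing in your project is touched
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "week2"))

    os.chdir(tmp)
    os.chdir("./week2")                      # down into the subfolder
    inside = os.path.basename(os.getcwd())   # name of the folder we are in

    os.chdir("..")                           # back up to the parent
    same_as_tmp = os.path.samefile(os.getcwd(), tmp)

    os.chdir(start_dir)  # leave the temp folder before it is deleted

print(inside, same_as_tmp)
```

Restoring the original working directory at the end keeps later cells in the notebook from reading or writing files in an unexpected place.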

The os.listdir() function in Python returns a list of all files and directories in the specified path. If no path is provided, it returns the contents of the current working directory.

Code
os.listdir()
['limonstar-web',
 'llm2005spring',
 'llm2005spring_back',
 'mcdc_2025su',
 'mcdc_2025su_new',
 'STAT201',
 'STAT303-1-class-notes',
 'STAT303-2-class-notes',
 'STAT303-3-class-notes',
 'stat359_su25',
 'stat362']

Check whether a specific folder or file exists in the current working directory:

Code
'data' in os.listdir('.')
False
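
The membership test above only tells you whether a name appears in the listing, not whether it is a file or a folder. The os.path functions exists, isdir, and isfile answer that directly. The sketch below tries them on a hypothetical data folder and a.csv file created inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "data")
    file_path = os.path.join(folder, "a.csv")

    before = os.path.exists(folder)   # nothing has been created yet

    os.mkdir(folder)
    with open(file_path, "w") as f:
        f.write("x,y\n1,2\n")

    folder_is_dir = os.path.isdir(folder)
    file_is_file = os.path.isfile(file_path)
    folder_is_file = os.path.isfile(folder)

print(before, folder_is_dir, file_is_file, folder_is_file)
```

These checks also accept full or relative paths, so they are not limited to the current working directory.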

3.8 Independent Study

3.8.1 Setting Up Your Data Science Workspace

To reinforce and apply the skills from this lecture, complete the following hands-on tasks:

  1. Set Up Your Workspace
    • Create a folder named STAT303-1 to organize all course materials.

    • Set up a dedicated .venv environment for your coursework to keep dependencies isolated.

    • Organize your files into subfolders for datasets, assignments, project, quizzes, and lectures. The layout should look like this:

      STAT303-1/
      ├── .venv/               # Python virtual environment
      ├── datasets/
      ├── project/
      ├── homework assignments/
      ├── small assignments/
      └── lectures/
    • Use the os module or shell commands in your notebook to create these directories programmatically.

  2. Practice Magic Commands
    • Use %timeit to measure the execution time of a simple Python function in your notebook.
    • Explore %lsmagic to discover all available magic commands and experiment with a few.
  3. Run Shell Commands
    • Use !ls (or !dir on Windows) to list the contents of the directories you created.
    • Use !pwd (or !cd on Windows) to print your current working directory.
  4. Navigate between directories using . and ..
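
As a starting point for the first task, one possible approach to creating the subfolders with the os module is sketched below. It builds the layout inside a temporary directory so that running it does not touch your real files. In practice you would replace the temporary parent with the folder where your course material should live.

```python
import os
import tempfile

# Subfolder names taken from the layout above
subfolders = ["datasets", "project", "homework assignments",
              "small assignments", "lectures"]

# A temporary parent folder keeps this sketch from touching your real files
with tempfile.TemporaryDirectory() as parent:
    root = os.path.join(parent, "STAT303-1")
    for name in subfolders:
        # exist_ok=True makes the call safe to re-run; missing parent
        # folders (here: STAT303-1 itself) are created automatically
        os.makedirs(os.path.join(root, name), exist_ok=True)

    created = sorted(os.listdir(root))

print(created)
```

The .venv folder is left out here because a virtual environment is normally created by a dedicated tool, not as an empty directory.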

Tip: Document your process and any issues you encounter. This will help you troubleshoot and share your workflow with others.

3.9 References and Further Learning