Skip to main content

Motivating Recursion

Prerequisites

This tutorial assumes you are familiar with Python lists (including nested lists) and Python functions (including passing values to functions and calling functions from other functions).

Introduction

In Python, functions are typically implemented using iteration instead of recursion. A recursive function calls itself, while an iterative function does not.

For a small number of use cases, implementing a function recursively will be a better choice. This tutorial explains the motivation behind using a recursive implementation.

Traversing a directory tree

This tutorial demonstrates how to write a function that displays the names of all of the files in a directory tree. The function is implemented both iteratively and recursively, and both implementations are compared.

The structure of a directory tree and an example

Sequencing

Layer 1 of the general-to-specific sequence: Shows an example directory tree and defines terminology. These help the learner to understand the problem.

A directory tree contains entries, which are either files or subdirectories. Such a tree has a recursive structure, because a subdirectory is itself a directory tree.

The example directory tree example_dir_tree, which is used throughout this tutorial, is shown below.

├── file1.txt
├── dir1
│ ├── file2.txt
│ ├── file3.txt
├── dir2
│ ├── dir3
│ │ ├── file4.txt
│ └── file5.txt
└── dir4

example_dir_tree contains these entries:

Level 1 (root-level) directory entries

  • file1.txt
  • dir1
  • dir2
  • dir4

Level 2 directory entries

  • dir1/file2.txt
  • dir1/file3.txt
  • dir2/dir3
  • dir2/file5.txt

Level 3 directory entries

  • dir2/dir3/file4.txt

A starting point for implementing the directory tree traversal

Sequencing

Layer 2 of the general-to-specific sequence: Show the algorithm for traversing the root level of the directory tree. This establishes a framework that will be used by the iterative and recursive algorithm that traverses the entire tree. It also shows the learner the file/directory APIs that are needed.

Let's first write the function traverse_root_level_dir(...) that displays the files in a root directory. Files in the subdirectories are not displayed.

When this function is called with the directory example_dir_tree (introduced in the previous section), the output is as follows.

example_dir/file1.txt

The code for the function is shown below.

import os

def traverse_root_level_dir(dir):
for dir_entry in os.listdir(dir):
path = os.path.join(dir, dir_entry)
if not os.path.isdir(path):
print(path)

We use os.listdir() to create a list of the root-level directory entries. Then, we iterate through each entry using a for loop. If the directory entry is a file, then we display it. If the directory entry is a subdirectory, we do not display it.

note

The ordering of the directory entries returned by os.listdir() is not guaranteed. For this example, assume that os.listdir() returns the directory entries in the same order as they are shown in the directory structure above.

Iteratively traversing the directory tree

Sequencing

Layer 3a of the general-to-specific sequence: Give a high level overview of iteratively traversing the directory tree. The algorithm is not yet specified.

The end goal of this section is to create an iterative version of the directory tree traversal function traverse_dir_tree_iterative(...). An iterative function does not call itself.

We will later compare the iterative version to the recursive version.

Before examining the code for traverse_dir_tree_iterative(...) let's understand how it works.

Compared to traverse_root_level_dir(...), the iterative algorithm for traversing an entire directory tree is more complicated. This is because a directory tree is hierarchical rather than linear.

Let's use example_dir_tree to demonstrate how the iterative algorithm works. This tree was introduced earlier, and is duplicated here to make the demonstration easier to follow.

├── file1.txt
├── dir1
│ ├── file2.txt
│ ├── file3.txt
├── dir2
│ ├── dir3
│ │ ├── file4.txt
│ └── file5.txt
└── dir4

Processing the first entry (file1.txt) in example_dir_tree is straightforward because it is a file; we just display the filename.

The next entry is a directory (dir1). Before processing the entries under dir1, we need to store the remaining root level entries, so we can process them later. These are dir2 and dir4. After processing the entries in dir1, we backtrack to the root-level directory and process the remaining entries.

Similarly, before processing the dir3 subdirectory, we store the remaining entry (file5.txt) that we haven't yet processed in dir2. After processing dir3, we return to processing that remaining entry.

Using a stack to store the directory entries to be processed

Sequencing

Layer 3a.1 of the general-to-specific sequence

A stack is a last-in-first-out data structure. In this tutorial, a stack is implemented as a list of lists. Each inner-list contains the entries in a directory level.

To process a subdirectory, traverse_dir_tree_iterative(...) pushes (adds) a new list to the stack.

When all entries in a directory level have been processed, traverse_dir_tree_iterative(...):

  • Pops (removes) the list containing the directory level from the stack.
  • Backtracks to the next entry to process on the parent directory level.

Looking at the stack as each directory entry is processed

Sequencing

Layer 3a.2 of the general-to-specific sequence

We start by initalizing the stack to an empty list. Then we start processing the root directory by pushing an inner-list to the stack, which contains the entries in that directory.

[['file1.txt','dir1','dir2','dir4']]

Directory entry to processStack result after directory entry is processedExplanation of result
file1.txt[['dir1','dir2','dir4']]Because this entry is a file, it is displayed. After processing a directory entry, such as file1.txt, it is always removed from the front of the inner-list containing the current directory being processed .
dir1[['dir2','dir4'],['file2.txt','file3.txt']]Because this entry is a directory name, a new list is added to the stack. This list contains all of the entries under dir1.
dir1/file2.txt[['dir2,'dir4'],['file3.txt']]Because this entry is a file, it is displayed.
dir1/file3.txt[['dir2','dir4']]Because this entry is a file, it is displayed. The second inner-list is removed since all entries in dir1 have been processed.
dir2[['dir4'],['dir3','file5.txt']]This entry is a directory name, so a new list is added to the stack. This list contains all of the entries under dir2.
dir2/dir3[['dir4'],['file5.txt'],['file4.txt']]This entry is a directory name, so a new list is added to the stack. This list contains all of the entries under dir3.
dir2/dir3/file4.txt[['dir4'],['file5.txt']]The third inner-list is removed since all entries in dir3 have been processed.
dir2/file5.txt[['dir4']]The second inner-list is removed since all entries in dir2 have been processed.
dir4[]

The code

Sequencing

Layer 3a.3 of the general-to-specific sequence

Now that you have an understanding of how the iterative traversal algorithm works, let's take a look at the code. You do not need to fully understand the code. The main takeway is to see that's its more complex than the recursive version, which will be presented shortly.

import os 

def traverse_dir_tree_iterative(dir):
stack = []
current_dir = dir
stack.append(os.listdir(current_dir))
while stack != []:
dir_entry = stack[len(stack) - 1].pop(0)
dir_entry_path = os.path.join(current_dir, dir_entry)
if os.path.isdir(dir_entry_path):
current_dir = dir_entry_path
# If the entry is a subdirectory, push a new list
# to the stack. This list contains the entries in the
# subdirectory.
stack.append(os.listdir(current_dir))
else:
print(dir_entry_path)

# If there are no more entries to process in the current
# subdirectory, backtrack up the directory chain until
# there are entries to process.
while stack != [] and stack[len(stack) - 1] == []:
# Remove the list from top of the stack (because there
# are no more entries to process in the subdirectory
# we are currently looking at).
stack.pop()
# Set the current directory to the parent directory
current_dir = current_dir[0:current_dir.rfind("/")]

traverse_dir_tree_iterative("<root dir>")

Towards a recursive implemention of the directory tree traversal

Sequencing

Layer 3b of the general-to-specific sequence

Before seeing the recursive implementation, let's look at an iterative implementation that is closer to being recursive than the traverse_dir_tree_iterative() function shown in the previous section. This implementation is for instructional purposes only and not one you should use in production code.

This implementation traverses at most three directory levels. There is one function to traverse each directory level. The code for each function is nearly identical.

import os

# When this function is called from traverse_dir_level_2("example_dir_tree"),
# this function traverses "dir3".
def traverse_dir_level_3(dir):
for dir_entry in os.listdir(dir):
path = os.path.join(dir, dir_entry)
# if dir_entry is a subdirectory, then don't do anything
if not os.path.isdir(path):
print(path)


# When this function is called from traverse_dir_level_1("example_dir_tree"),
# this function traverses "dir1", "dir2", and" dir4".
def traverse_dir_level_2(dir):
for dir_entry in os.listdir(dir):
path = os.path.join(dir, dir_entry)
if os.path.isdir(path):
traverse_dir_level_3(path)
else:
print(path)

# This function is called by the main program to traverse the root dir
def traverse_dir_level_1(dir):
for dir_entry in os.listdir(dir):
path = os.path.join(dir, dir_entry)
if os.path.isdir(path):
traverse_dir_level_2(path)
else:
print(path)

With this implementation, we leverage Python's internal stack to backtrack after calls to traverse_dir_level_2(...) and traverse_dir_level_3(...) complete. We do not write any code to implement our own stack.

Although this implementation has more lines of code than traverse_dir_tree_iterative(...), you will soon see how we can shorten it.

Tracing the traversal

Sequencing

Layer 3b.1 of the general-to-specific sequence

The following trace shows how the call to traverse_dir_level_1("example_dir_tree") executes. Only calls to the traverse_dir_level functions and print statements are shown.

traverse_dir_level_1("example_dir_tree")
print("file1.txt")
traverse_dir_level_2("dir1")
print("file2.txt")
print("file3.txt")
traverse_dir_level_2("dir2")
traverse_dir_level_3("dir3")
print("file4.txt")
print("file5.txt")
traverse_dir_level_2("dir4")

The recursive implemention of the directory tree traversal

Sequencing

Layer 4 of the general-to-specific sequence

In addition to reducing the amount of code, this code allows the traversal to go n levels deep.

import os

def traverse_dir_tree_recursive(dir):
for dir_entry in os.listdir(dir):
path = os.path.join(dir, dir_entry)
if os.path.isdir(path):
traverse_dir_tree_recursive(path)
else:
print(path)

traverse_dir_tree_recursive, as do all recursive functions, handles two cases:

  • The base case: The function does not call itself in this case. When the base case completes, the function execution ends, and the calling function resumes execution. In traverse_dir_tree_recursive(...) the base case displays all file names on the current directory level that the function is processing.

  • The recursive case: Solves a smaller version of the problem that the function solves. traverse_dir_tree_recursive(...) is first invoked with the root directory to process (example_dir_tree). When that invocation executes, traverse_dir_tree_recursive(...) is called with a subdirectory, which solves a smaller version of the problem.

Tracing the traversal

Sequencing

Layer 4.1 of the general-to-specific sequence

The following trace shows how the call to traverse_dir_tree_recursive("example_dir_tree") executes. Note that it is nearly identical to the trace of the traverse_dir_level_<level number> functions; only the function names are different.

traverse_dir_tree_recursive("example_dir_tree")
print("file1.txt")
traverse_dir_tree_recursive("dir1")
print("file2.txt")
print("file3.txt")
traverse_dir_tree_recursive("dir2")
traverse_dir_tree_recursive("dir3")
print("file4.txt")
print("file5.txt")
traverse_dir_tree_recursive("dir4")

Comparing the iterative and recursive implementations of a function

IterationRecursion
Code complexityMore complicated and more lines of codeLess complicated and fewer lines of code, because the function does not have to implement its own stack
Running timeLess timeMore time, due to the repeated function calls
Memory usageLess memory usageMore memory usage, because the built-in stack is utilized on each function call

Steps for solving a recursive problem

note

A good indication that a problem has a recursive solution is that the solution traverses a hierarchical structure, such as a tree.

After determining if a problem has a recursive solution that you want to solve recursively, you can follow the recommended steps below.

  1. Write the non-recursive part of the function.

  2. Write the recursive part of the function. In this part, you will call the function to solve a smaller version of the problem. You can assume this call works as expected.

  3. Verify that the function works as expected.

Let's use this approach to solve a variation of the traverse_dir_tree_recursive(...). We want to write the function num_of_files_in_dir_tree(...), which counts the number of files in a directory tree.

Write the non-recursive part of the function

Let's think about how to count the number of files on the top level of the directory tree. We will create a variable num_of_files that stores the number of files. Then, in a loop that iterates through all of the entries in the top level directory, we check if the current entry is a file. If so, then we add 1 to num_of_files. Finally, we return num_of_files at the end of the function. The following code implements the non-recursive part of the function just described.

def num_of_files_in_dir_tree(dir):
num_of_files = 0
for dir_entry in os.listdir(dir):
path = os.path.join(dir, dir_entry)
if os.path.isdir(path):
...
else:
num_of_files = num_of_files + 1

return num_of_files

Write the recursive part of the function

Now, given that we have implemented the code for num_of_files_in_dir_tree(...) as above, let's think about how to use this code to count the number of files in any subdirectory on the top level of the directory tree.

That is, what goes in the ... part of code? It is a call to num_of_files_in_dir_tree(...) with the subdirectory added to the running total of the number of files in the root directory. The complete code for the function is below.

def num_of_files_in_dir_tree(dir):
num_of_files = 0
for dir_entry in os.listdir(dir):
path = os.path.join(dir, dir_entry)
if os.path.isdir(path):
num_of_files = num_of_files + num_of_files_in_dir_tree(path)
else:
num_of_files = num_of_files + 1
return num_of_files

Verify the recursive solution works as expected

We can verify the solution by running the function to see if it produces the expected output. Another way is to trace through the function to inspect the intermediary values of variables and then confirm the final output is correct.

The following trace shows how the call to traverse_dir_tree_recursive("example_dir_tree") executes. Only calls to the traverse_dir_tree_recursive(...) function, statements where num_files is updated, and the function return value are shown.

traverse_dir_tree_recursive("example_dir_tree")
num_files = 0
num_files = num_files + 1

num_files = num_files + traverse_dir_tree_recursive("dir1")
num_files = 0
num_files = num_files + 1
num_files = num_files + 1
return 2

num_files = num_files + traverse_dir_tree_recursive("dir2")
num_files = 0
num_files = num_files + traverse_dir_tree_recursive("dir3")
num_files = 0
num_files = num_files + 1
return 1
num_files = num_files + 1
return 2

num_files = num_files + traverse_dir_tree_recursive("dir4")
num_files = 0
return 0
return 4