Binary Search

Published by

on December 10 2020

Algorithms

Python

How does Binary Search work?
Binary search is an algorithm used to quickly search through a sorted list for a specific target value. The algorithm returns True if the target value exists in the list, and False otherwise.
Here, the algorithm is solved with a beautiful recursive approach, however can also just as well be implemented iteratively. 
The main idea of this algorithm is to recursively compare the target value to the median value of the list (since the list is sorted, this is the middle element in the list), and pick the half of the list in which the target value could lie. This is based on whether the target value is less than or greater than the median. 
We'll explore the recursive algorithm below, and then walk through an example showing Binary Search in action.
This tutorial assumes some prior knowledge of how recursive functions work

binarySearch.py

Continue scrolling

↓

The function takes in two arguments. 
lst is the sorted list we are searching through, and target is the target value we are searching the list for. 

We begin by defining high, which is assigned to the maximal index value of the lst. This is the same as the length of the list - 1, because Python lists are zero-indexed. 
For example, the following lst = [-5, 0, 1, 3, 8, 22, 100] would produce a high value of 6, where lst[high] would then be 100.

We check if the lst has at least one value in it. That means that high must at least be 0, which would imply the lst has a length of at least one. 
If high is less than 0, that must mean we have searched through our entire lst in previous steps & are now at the base case of our recursion. In this base case, we return False, because we were unable to find our target value in the lst.

If we pass the test of having at least one element in our lst, we must continue our recursion to see if any of the remaining elements are our target. 
We begin this search by calculating mid, which is the index of the median of our lst. Because our lst is given as sorted, the median is just the middle element of the lst, and this is given by floor-dividing high.
By virtue of lst being sorted, in picking the median element we also divide our list in two equal halves. This will prove to be important in calculating the runtime of our algorithm. More on this below.
Side note: floor-dividing // in Python is like dividing but rounds down to the nearest integer. e.g. 5 // 2 = 2, because 2.5 gets rounded to 2. 
So, if our list contains an even number of elements, high is odd because high = len(lst) - 1, and then mid is subsequently rounded down to select the left-middle of the list. If we had lst = [-5, 0, 1, 4, 6, 8, 22, 100], this would mean the length is 8, producing a high = 7, setting mid = 7 // 2, which produces a mid = 3. In our list, this would be the value 4, the left-middle of the list.
Had the length of our list been odd (which would mean that high is even, because high = len(lst) - 1), we pick the exact middle of our list. If we had lst = [-5, 0, 1, 4, 8, 22, 100], this would mean the length is 7, producing a high = 6, producing a mid of 3, which would be the value 4. 
﻿

We now check the value of the median lst[mid], and compare it to our target. 
If the median happens to be our target value, we can immediately return True, since we have found the existence of our target. 

If instead, the median happens to be less than target, this must mean our target lies in the right half of our list. Again, this is by virtue of our list being sorted.
Using a list slice, we recursively call the binary search function on now the right half of the list. We use the same target, since the value we are searching for has not changed.
By comparing the target against the median, we narrowed down where the target element may exist in the lst by half. This new half will become the lst in the next step of the recursion. 
For example, with arguments lst = [-5, 0, 1, 4, 8, 22, 100] and target = 77, we'd compare 77 to the median value of 4, narrowing down that if the target exists in our list, it must be in the right half of the list, [8, 22, 100]. 

The final case is that the median is greater than target, meaning that target must lie in the left half of the list. We make a similar recursive call to the binary search function, this time using the list slice to grab just the left half of the list. 
For example, with arguments lst = [-5, 0, 1, 4, 8, 22, 100] and target = 0, we'd compare 0 to the median value of 4, narrowing down that if the target exists in our list, it must be in the left half of the list, [-5, 0, 1].

Runtime Analysis
This analysis assumes some knowledge of Big-O notation.
We have concluded the algorthmic portion of this tutorial. Here, we take a brief moment to analyze the runtime of Binary Search. 
On each step of the binary search recursion, we divide our input lst by 2. This comes from the fact that on each step, we only select one half of the list to continue searching in. 
To understand the runtime, consider an initial sorted lst of size N. Then, on the subsequent recursive call, the input size of lst would be N / 2. The next recursive call would have input size of lst as N / 4. This continues until the list size becomes empty (our base case), or we get lucky and the median happens to be our target in the second conditional check. 
This means that the number of steps in our recursion is equal to how many times we can divide our list size by 2 until we get to an input list size of 0. This precisely gives us a runtime, in the worst-case, of O(logN). 
For example, if our list was of size 8, we'd have log(8) = 3 steps in our recursion. Below we'll see an example which demonstrates this.
An Example 
Below, we run through an example of Binary Search starting with lst = [-5, 0, 1, 4, 6, 8, 22, 100] & target = 1

binarySearch.py

Continue scrolling

↓

lst = [-5, 0, 1, 4, 6, 8, 22, 100] and target = 1. 

high = 7, since the length of our list is 8.

high is not less than 0, meaning our list is not empty so we can proceed.

mid = 7 // 2, which rounds down to 3.

lst[mid], or equivalentally lst[3], gives the left-middle element: 4. This value is greater than our target, so we continue searching by making a recursive call to binary search with the left half of our list, lst = [-5, 0, 1, 4]. 

Now, entering the second step of recursion, we start with lst = [-5, 0, 1, 4] and target = 1. 

high = 3, since the length of our list is 4.

high is not less than 0, meaning our list is not empty so we can proceed.

mid = 3 // 2, which rounds down to 1.

lst[mid], or equivalentally lst[1], gives the left-middle element: 0. This value is less than our target, so we continue searching by making a recursive call to binary search with the right half of our list, lst = [1, 4].

Now, entering the third step of recursion, we start with lst = [1, 4] and target = 1.

high = 1, since the length of our list is 2.

high is not less than 0, meaning our list is not empty so we can proceed.

mid = 1 // 2, which rounds down to 0.

lst[mid], or equivalently lst[0], gives the left-middle element: 1. This value is equal to our target, so we enter into the if statement and return True because we found our target value.

We can validate our runtime analysis as well. Recall, our initial list, [-5, 0, 1, 4, 6, 8, 22, 100], was of size 8. Our algorithm ran for exactly log(8) = 3 steps. 
For an initial list size of 1,000,000, the algorithm would conclude in just log(1,000,000) = 20 steps. This means that as the list input size asymptotically grows, the runtime stays pretty cheap and fast.