Binary search

From CodeCodex

A binary search algorithm (or binary chop) is a technique for finding a particular value in a linear array, by ruling out half of the data at each step, widely but not exclusively used in computer science. A binary search finds the median, makes a comparison to determine whether the desired value comes before or after it, and then searches the remaining half in the same manner. A binary search is an example of a divide and conquer algorithm (more specifically a decrease and conquer algorithm) and a dichotomic search (more at Search algorithm).

Contents

Implementations

Pseudocode

This determines the index of a given value in a sorted list a between indices left and right:

function binarySearch(a, value, left, right)
    if right < left
        return not found
    mid := floor((right-left)/2)+left
    if a[mid] = value
        return mid
    if value < a[mid]
        return binarySearch(a, value, left, mid-1)
    else
        return binarySearch(a, value, mid+1, right)

Because the calls are tail-recursive, this can be rewritten as a loop, making the algorithm in-place:

function binarySearch(a, value, left, right)
    while left ≤ right
        mid := floor((right-left)/2)+left
        if a[mid] = value
            return mid
        if value < a[mid]
            right := mid-1
        else
            left  := mid+1
    return not found

In both cases, the algorithm terminates because on each recursive call or iteration, the range of indexes right minus left always gets smaller, and so must eventually become negative.

Algol pseudocode

  %Array A, elements 1 to N, to be searched for value X.
  L:=0;                        %Establish outer bounds.
  R:=N + 1;                    %One before, and one after, the first and last.
  while(P:=(R - L)/2) > 0 do   %Truncate to integer. Beware integer overflow with (L + R)/2.
   Case Sign(X - A(P:=P + L))  %Perform the comparison once.
    +1:L:=P;                   %Left bound up:    X follows A(P).
    -1:R:=P;                   %Right bound down: X precedes A(P).
     0:Return(P);              %X is found, here! X equals A(P).
   Otherwise;                  %No other cases are possible.
  Return(-L);                  %Failed.

Returning -L rather than a constant such as -1 is a service for the caller, indicating where the not-found value X would belong in the array should there be an intention to insert it. Note that Algol allows assignments within an expression. Without that facility the while-loop testing becomes difficult and the method has to be recast if repeated calculation is to be avoided, somewhat as follows:

       L:=0;                       %Establish outer bounds.
       R:=N + 1;                   %One before, and one after, the first and last.
 Probe:P:=(R - L)/2;               %Truncate to integer. Beware integer overflow with (L + R)/2.
       if P <= 0 then Return(-L);  %Aha! Nowhere!
       P:=P + L;                   %Probe the middle of the span L:R.
       Case Sign(X - A(P))         %Perform the comparison once.
     +1:L:=P, go to Probe;         %Left bound up:    X follows A(P).
     -1:R:=P, go to Probe;         %Right bound down: X precedes A(P).
      0:Return(P);                 %X is found, here! X equals A(P).
       Otherwise;                  %No other cases are possible.

Here, some "go to" statements are explicit.

Assembly

Formatted in MASM. Input: term is the term being searched for, array is the pointer to the array, and asize indicates the size of array in bytes.

bsearch proc term:DWORD,array:DWORD,asize:DWORD
	mov eax,array
	mov ecx,array
	add ecx,asize
	@@:
	cmp eax,ecx
	jg not_found
	mov edx,eax
	add edx,ecx
	shr edx,1
	xchg DWORD PTR [edx],eax
	cmp eax,term
	xchg DWORD PTR [edx],eax
	jg search_right
	jl search_left
	mov eax,edx
	sub eax,array
	ret
	search_right:
	mov ecx,edx
	jmp @B
	search_left:
	mov eax,edx
	jmp @B
	not_found:
	mov eax,-1
	ret
bsearch endp

ss

C++

C++ has a template-based binary_search() in its standard library, which is what should normally be used, as well as the function pointer based bsearch inherited from C.

In C++, we can take advantage of templates to easily generalize the C implementation described at Binary search (C) to arrays of any type. We simply replace the element type everywhere by a template parameter "T" and replace "<" by a user-supplied less-than comparison operator.

<<array binary search>>=
template< typename T, typename compare_less >
int array_binary_search(T a[], int low, int high, T target) {
    while (low <= high) {
        int middle = low + (high - low)/2;
        if (compare_less(target, a[middle]))
            high = middle - 1;
        else if (compare_less(a[middle], target))
            low = middle + 1;
        else
            return middle;
    }
    return -1;
}

But we need not stop at arrays — we can generalize the implementation even further to any container permitting random access, such as std::vector. The conventional way of doing this, as done by the standard library's binary_search, is to pass in a begin and end iterator specifying the range. This is very similar to the above implementation, but with the indexes low and high and the return value replaced by iterators. Also, by convention, we make "end" point one past the last element to be searched instead of using an inclusive range, resulting in each instance of "end" being adjusted by 1; if the element is not found we can return the initial value of end, since this is outside the range being searched and so not a normal return value.

<<generic binary search>>=
template< typename T, typename IterT, typename compare_less >
IterT generic_binary_search(IterT begin, IterT end, T target) {
    IterT initial_end = end;
    while (begin < end) {
        IterT middle = begin + (end - begin - 1)/2;
        if (compare_less(target, *middle))
            end = middle;
        else if (compare_less(*middle, target))
            begin = middle + 1;
        else
            return middle;
    }
    return initial_end;
}

For convenience, we supply an overload that doesn't take a comparison operator, just using "<". We can't use std::less because C++ doesn't support default template parameters on functions:

<<generic binary search with two template arguments>>=
template< typename T, typename IterT >
IterT generic_binary_search(IterT begin, IterT end, T target) {
    IterT initial_end = end;
    while (begin < end) {
        IterT middle = begin + (end - begin - 1)/2;
        if (target < *middle)
            end = middle;
        else if (*middle < target)
            begin = middle + 1;
        else
            return middle;
    }
    return initial_end;
}

This simple test code tries out the final binary search on an array and a vector, using std::sort to sort them first:

<<binary_search.cpp>>=
#include <algorithm>
#include <iostream>
#include <vector>
#include <cstdlib>  // atoi

<<generic binary search>>
<<generic binary search with two template arguments>>

int main(int argc, char* argv[]) {
    int num_elements = std::atoi(argv[1]);
    int* a = new int[num_elements];
    std::vector<int> v;
    for(int i=0; i<num_elements; i++) {
        do {
            a[i] = rand() % num_elements;
        } while(a[i] == num_elements/7);
        v.push_back(a[i]);
    }
    std::sort(a, a + num_elements);
    std::sort(v.begin(), v.end());

    {
        int present_val = a[rand() % num_elements];
        int* found_at = generic_binary_search(a, a + num_elements, present_val);
        if (found_at == a + num_elements || *found_at != present_val) {
            puts("ERROR");
        }
    }
    {
        int present_val = v[rand() % num_elements];
        std::vector<int>::iterator found_at =
            generic_binary_search(v.begin(), v.end(), present_val);
        if (found_at == v.end() || *found_at != present_val) {
            puts("ERROR");
        }
    }
    if (generic_binary_search(a, a + num_elements, num_elements/7) != a + num_elements ||
        generic_binary_search(v.begin(), v.end(),  num_elements/7) != v.end())
    {
        puts("ERROR");
    }
    return 0;
}

Erlang

NOTE: This particular implementation returns 0 if the value cannot be found.

binary_search(Tuple, Key) ->
    binary_search(Tuple, Key, 1, size(Tuple)).

binary_search(_, _, Low, High) when Low > High -> 0;
binary_search(Tuple, Key, Low, High) ->
    Mid = (Low + High) div 2,
    M = element(Mid, Tuple),
    if
        M > Key -> binary_search(Tuple, Key, Low, Mid-1);
        M < Key -> binary_search(Tuple, Key, Mid+1, High);
        true    -> Mid
    end.

Java

The method takes two arguments: The value to search and the list of values to search in. The input list must be sorted in ascending order prior to searching, else the result of this method is undefined. It must also contain at least one element. Through Java generics, the method can be used with any elements that extend Comparable, like Java number objects or strings, making Java 5 a requirement.

While we are searching, the interval [0, left] is empty or contains only list values < value. The interval [right, n-1] is empty or contains only list values > value. The remaining search interval is [left, right]. The parts left and right of that interval have been searched.

<<search>>=

do {
    int mid = left + (right - left) / 2; 
// mid values is modified - bug fix - Ref @ http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

    T midVal = values.get(mid);
    if (value.compareTo(midVal) < 0)
        right = mid - 1;
    else if (value.compareTo(midVal) > 0)
        left = mid + 1;
    else
        return mid;
} while (left <= right);


Now left > right and the value was not found in the list. We now want to determine the index at which the element should be inserted to maintain the ordering. As the remaining searching interval is empty now, there is an interval i [0, n-1] with two disjunct intervals i1 [0, left] and i2 [right, n-1] with the following attributes: all indices in i1 have list values < value, all indices in i2 have list values > value and i1 merged with i2 equals i. We now have two possible cases:

Case a) value < midVal & right = mid-1. From the first part follows that mid is in interval [right, n-1] and from the second part follows that mid is at the left edge of the interval, so mid is the smallest index with list value > value. So in case a) an insertion must happen on index mid.

Case b) value > midVal & left = mid+1. From the first part follows that mid is in interval [0, left]. From the second part follows that mid is at the right edge of the interval. This means mid is the largest index with a list value < value, so in case b) we need to insert at mid + 1.

<<return>>=

return (value.compareTo(midVal) < 0) ? -mid : -(mid + 1);

Usage

Illustration of the functionality using JUnit 4 unit tests for strings and integers:

<<test_strings>>=

assertEquals(-1, BinarySearch.search("Billy", Arrays.asList("Anny","Emmy","Grammy")));
<<test_integers>>=

assertEquals(-1, BinarySearch.search(2, Arrays.asList(1, 3, 4)));

The complete program:

<<BinarySearch.java>>=

import static org.junit.Assert.assertEquals;
import org.junit.Test;

import java.util.Arrays;
import java.util.List;

public class BinarySearch {
	static <T extends Comparable<? super T>> int search(T value, List<T> values) {
		int n = values.size();
		int left = 0, right = n - 1;
		<<search>>
		<<return>>
	}
	@Test
	public void testStringSearch() {
		<<test_strings>>
	}
	
	@Test
	public void testIntegerSearch() {
		<<test_integers>>
	}
}

OCaml

NOTE: This particular implementation raises the Not_found exception if the value cannot be found.

# let rec search a ?(l=0) ?(u=Array.length a - 1) k =
    if l > u then raise Not_found;
    let m = l + (u - l) / 2 in
    match compare k a.(m) with
    | -1 -> search a ~l ~u:(m - 1) k
    | 1 -> search a ~l:(m + 1) ~u k
    | _ -> m;;
val search : 'a array -> ?l:int -> ?u:int -> 'a -> int = <fun>

For example:

# search [|0;1;2;3;4;5;6;7;8;9|] 3;;
- : int = 3

Perl

1:

my @list = qw(Cat Elephant Giraffe Lion Orangutan Zebra);

print binarySearch(\@list, 'Orangutan'), "\n";

sub binarySearch {
    my ($list, $query) = @_;
    my ($low, $high)   = ($[, $#$list);

    while ( $low <= $high ) {
        my $try = int( ($low + $high) / 2 );

        # note: "lt" and "gt" because we are comparing strings
        # if we were comparing numbers then we would use "<" and ">"
        $list->[$try] lt $query and do { $low  = $try + 1; next };
        $list->[$try] gt $query and do { $high = $try - 1; next };

        return $try; # By this point it must be equal!
    }
}

2:

sub binary_search_recursive {
    my ($arr, $value, $left, $right) = @_;
    return if $right < $left; # not found
    my $mid = int(($right-$left)/2)+$left;
    return $mid if $arr->[$mid] == $value;
    if ($value < $arr->[$mid]) {
        return binary_search_recursive($arr, $value, $left, $mid-1);
    } else {
        return binary_search_recursive($arr, $value, $mid+1, $right);
    }
}

sub binary_search_inplace {
    my ($arr, $value, $left, $right) = @_;
    while ($left <= $right) {
        my $mid = int(($right-$left)/2)+$left;
        return $mid if $arr->[$mid] == $value;
        if ($value < $arr->[$mid]) {
            $right = $mid-1;
        } else {
            $left  = $mid+1;
        }
    }
}

my @l = qw(0 1 2 3 4 5 6 7 8 9);
print binary_search_recursive  \@l, 7, $l[0], $l[-1];
print binary_search_inplace    \@l, 7, $l[0], $l[-1];

There is another way to do it, using the indexes of the vector. TODO: Isn't this version better?

use warnings;

sub binary_search(@$) {
 my ($arr, $value) = @_;
 $left = $[;
 $right = %#$arr;
 while ($left <= $right) {
   my $mid = int($right + $left) >> 1;
      if ($arr->[$mid] eq $value)
      {
        return 1
      }
      if ($value lt $arr->[$mid]) {
         $right = $mid - 1         
      } else {
         $left  = $mid + 1
      }
 }
 return 0;
}

my @arr = qw(1 2 3 4 5 6 7 8);

for (0 .. 9)
{
  print binary_search(\@arr, "$_") . "\n";
}

PHP

NOTE: This particular implementation returns -1 if the value cannot be found.

// $low and $high have to be integers

function BinarySearch( $array, $key, $low, $high )
{
    if( $low > $high ) // termination case
    {
        return -1;
    }

    $middle = intval( ( $low+$high )/2 ); // gets the middle of the array

    if ( $array[$middle] == $key ) // if the middle is our key
    {
        return $middle;
    }
    elseif ( $key < $array[$middle] ) // our key might be in the left sub-array
    {
        return BinarySearch( $array, $key, $low, $middle-1 );
    }	

    return BinarySearch( $array, $key, $middle+1, $high ); // our key might be in the right sub-array
}


Python

NOTE: This particular implementation returns -1 if the value cannot be found.

# low and high have to be integers

def binarySearch(array, key, low, high):
    if low > high: # termination case
        return -1

    middle = (low + high) / 2 # gets the middle of the array

    if array[middle] == key:  # if the middle is our key
        return middle
    elif key < array[middle]: # our key might be in the left sub-array
        return binarySearch(array, key, low, middle-1)
    else:                     # our key might be in the right sub-array
        return binarySearch(array, key, middle+1, high)

Ruby

NOTE: This particular implementation returns -1 if the value cannot be found.

def binary_search(array, key, low=0, high=array.size-1)
  return -1 if low > high
  mid = (low + high) / 2
  return mid if array[mid]==key
  if array[mid] > key
    high = mid - 1
  else
    low = mid + 1
  end
  binary_search(array, key, low, high) 
end

ary = [1,2,3,4,5,6,7,8,9]
puts binary_search(ary, 8)      #=> 7

Scheme

NOTE: This particular list implementation returns -1 if the value cannot be found.

(define (binary-search list key low high)
  (let ((mid (quotient (+ low high) 2)))
    (cond ((> low high) -1)
          ((= (list-ref list mid) key) mid)
          ((> (list-ref list mid) key) (binary-search list key low (- mid 1)))
          (else (binary-search list key (+ mid 1) high)))))


Seed7

NOTE: This particular implementation returns 0 if the value cannot be found.

const func integer: binarySearch (in array elemType: arr, in elemType: aKey, in integer: low, in integer: high) is func
  result
    var integer: result is 0;
  begin
    if low <= high then
      result := (low + high) div 2;
      if aKey < arr[result] then
        result := binarySearch(arr, aKey, low, pred(result)); # search left
      elsif aKey > arr[result] then
        result := binarySearch(arr, aKey, succ(result), high); # search right
      end if;
    end if;
  end func;

Tcl

Binary search in arbitrary list. A compare procedure should be given

proc _bs {comp l key low high} {
  if {$low > $high} {return -1} else {
    set middle [expr {($low + $high)/2}]
    set test [$comp $key [lindex $l $middle]]
    if {$test == 0} {
      return $middle
    } elseif {$test < 0} {
      incr middle -1
      return [_bs $comp $l $key $low $middle] 
    } else {
      incr middle
      return [_bs $comp $l $key $middle $high]
    }
  }
}
 
proc binary_search {comp l key} {_bs $comp $l $key 0 [expr {[llength $l] - 1}]}

Searching in an integer list

proc compare_integer {x y} {expr {$x - $y}}
binary_search compare_integer {0 10 20 30 40 50 60 70 80 90 100 110} 90 ;# -> 9

also

lsearch -integer -sorted {0 10 20 30 40 50 60 70 80 90 100 110} 90

Searching in a string list

proc compare_string {x y} {string compare $x $y}
binary_search compare_string {aa bb cc dd ee ff gg hh} cc ;# -> 2

also

lsearch -sorted {aa bb cc dd ee ff gg hh} cc