CSCI570/Class notes

From Driscollwiki

Jump to: navigation, search

CSCI571 Analysis of Algorithms

Contents

Time terms

  • O(1), constant
  • O(n), linear
  • O(logn), logarithmic
  • O(nk), polynomial
  • O(kn), exponential

May 24

Apparently this course began last week and I missed the first class.

  • Last week ended with a 4 step matching algorithm

Marriage algorithm

States

  • Perfect matching: each M matched with exactly 1 W
  • Stable matching: Perfect matching and no further changes (goal!)
    • Does not have to be unique
  • Proof by contradiction:

Assumptions

  • N quantity of m and w
  • m and w have lists of preferences
  • m will propose first to top of his list
  • Instability between two pairs: (m,w) and (m\prime, w\prime)
  • m's last proposal was to w
  •  ? did m propose to w' at some point in the execution?
    • N: then w must be higher than w'
    • Y: then he must have been rejected in favor of m\'\'

Simple scenario

N = 2

  • m prefers w
  • m' prefers w'
  • w prefers m'
  • w' prefers m

If m proposes first

  • (m, w)
  • (m', w')

If w proposes first

  • (w, m')
  • (w', m)

Complexity analysis

  • Most important measure of performance: worst case scenario
  • We know that it will take maximum N^2 iterations
    • But we dont' know how long each iteration will take!
    • Constant time? N^2
    • Linear time? N^3
  • We need to identify the data structures before we can know the length of an iteration

Steps in single iteration

  1. Identify a free/single man (m)
  2. For m, identify highest ranked woman (w) to whom he has not yet proposed
  3. Determine if w is engaged yet and to whom?
  4. If w is engaged, compare rankings of m and m\prime, determine which is preferred.
  5. If necessary, change w's partner
  6. If necessary, add m\prime back into the pool of free/single mens

Data structures

Need a data structure for the single men:

  • Array, linked list, queue
  • Performance is constant for all

Operations in a single iteration

Step 1,6: ID single man, O(1)
  • Hold list of men in an array or linked list (Steps 1, 6)
    • O(1) to find a single man
    • O(1) to extract a single man
    • O(1) to add a single man
Step 2: ID highest ranked woman to whom he has not yet proposed, O(1)
  • Keep an array Next[1..N]
    • Next[m] points to the position of the next woman he will propose to
  • Keep 2d arrays ManPref[1..N, 1..N] and WomanPref[1..N, 1..N]
    • ManPref[m, i] denotes the ith woman on m's preference list
    • To find next woman on m's prefer list, w = ManPref[m, Next[m]]
Step 3: Determine if w is engaged yet and to whom, O(1)
  • Keep an array Current[1..N]
Step 4: Determine w's preference, O(1)
  • If we keep WomanPref as a 2d array constructed like ManPref, the time is linear
    • Raises the complexity from N^2 to N^3
  • So we inverse the indexes and values
  • Create NxN array called Ranking
    • Where Ranking[w, m] contains the rank of m in the sorted order of w's preferences
    • This array is created during a prep phase (outside the loop), O(N2)
  • Once the array is constructed, finding and comparing m and \prime takes O(1)
Step 5: Update w's partner, O(1)
  • Update array Current[w]
Overall time?
  • Preparation: O(N2)
  • Iterations: O(N2)
  • Total == O(N2)

Summary

Four components of a solution:

  • Solution
  • Proof of correctness
  • Complexity analysis

Reviewing asymptotic notation

Upper bound uses O() notation

O(g(n)) = { f(n), 0 \leq f(n) \leq c*g(n) for all n \geq n_{0}}

O notation represents the upper bound

  • We're seeking the tightest upper bound
  • Often we compare two algorithms by comparing their worst-case scenarios

Test examples

  • Any quadratic funciton is O(n2)
  • Any linear function is O(n2)

Lower bound uses ω() notation

  • Best case scenario

"Sandwich", θ() notation

  • Only if lower bound and upper bound are a constant function apart

Common examples

Linear search

  • O(n)
  • ω(1)
  • θ(), does not apply

Binary search

On an array of size n,

  • O(logn)
  • ω(1)
  • θ(), does not apply

Merge sort

On an array of size n,

  • O(n * logn)
  • ω(n * logn)
  • θ(n * logn)
    • Possible to use because the upper and lower are a constant function apart

Insertion sort

On an array of size n,

  • O(n2)
  • ω(n)
  • θ(), does not apply

Example comparison

Two functions

  • O(2nn3log2n)
  • O(3nnlogn)

Need to break functions into their components

  • Exponential (fastest growth)
  • Polynomial
  • Logarithmic (slowest growth)

Compare worst case scenarios:

  • Compare the exponential components (fastest growth) first
    • If one grows faster than the other one, stop
    • No way for the polynomial or logarithmic components to "make up" for it
  • Compare polynomial components (next fastest)

Note: we tend to compare worst case scenarios for class but in other cases, it might be more important to compare average cases, e.g. an algorithm that runs over and over and we have data on it.

Why might we use an algorithm with the greater complexity?

  • Both are already written, equal implementation cost
  • Do you always use lower complexity?
  • Look at the size of the typical n
    • In some cases, the bounds may compare differently with different n's
    • Looking for a "sweet spot", could change according to different machines
      • System dependencies
    • Most often this data is based on heuristics from previous execution
  • Also worth examining the data structures

In some cases, you may also combine two or more algorithms depending on circumstances, input

June 2, 7

Note: didn't do reading and am totally lost for first hour.

Tangent: Heap data structures

Priority queue

  • Priority queue: a data structure which provides two things:
    • Efficient insertion of elements into a set
    • Efficient finding of smallest element
Implementation
  • Array?
    • Takes constant time to insert an element
    • Linear time to find smallest element
    • Not ideal
  • Sorted array?
    • Binary search (\log n) to find where to insert
    • Might have to shift elements (order n)
    • Not ideal

Binary tree

FIG19.GIF

  • Binary tree of depth k which has exactly 2^k nodes is called a full binary tree
  • Binary tree with n nodes and of depth k is complete iff its nodes correspond to the nodes which are numbered one to n in the full binary tree of depth k
Implementation
  • Complete binary tree can be stored in an array very easily
    • Nodes correspond to the indeces
    • You don't need any pointers
    • You can compute the address (index)
    • parent(i) is at floor(i/2) if i != 1
      • if i == 1, then i is the root
    • left_child(i) is at 2i if 2i <= n, otherwise i has no left child
    • right_child(i) is at 2i + 1, if 2i + 1 <= n, otherwise i has no right child
  • No pointer = saving space
  • Operations are very fast: left shift and increment

Binary heap

  • Max heap is a complete binary tree with the property that the value of the key at each node is at least as large as the values of its children (if they exist)
    • Min heap has inverse relationship of key to child value
Implementation
  • Insertion?
    • Add to end of tree
    • Work backwards to ensure max heap
    • O(logn)
  • Find max?
    • Always at the root
    • Constant time, O(1)
  • Extract max?
    • Pop the root element
    • You know the result will have 1 fewer nodes
    • Move the last node to the root
    • Compare root with children, swap if necessary
    • Continue to compare and swap down the binary tree
    • This way you can be certain that the result will remain a complete tree
    • O(logn)
  • Delete?
Construct a binary heap of size n?

Construct a binary heap of size n?

  • Failed to parse (unknown function\logn): O(n\logn)
if you are adding them one at a time from the top down
  • But O(n) is possible by efficiently building from the bottom up
    • Very few elements will need to travel up the tree which takes Failed to parse (unknown function\logn): O(\logn)
time
    • Details are in the CRLS text
  • In a given binary tree, 75% of the elements are in the bottom two rows of leaves
    • n / 2 are all the bottom leaves
    • n / 4 are the 2nd to last row of leaves

Binomial heap

Binomial tree
  • Binomial tree, Bk, is an ordered tree defined recursively
    • B0 consists of a single node
    • Bk consists of two binomial trees Bk − 1 linked together
      • Root of one is the left-most child of the root of the other
  • Very rigid in their shape and number of nodes
Binomial heap
  • Binomial heap, H, is a set of binomial trees that satisfies the following properties:
    • Each binomial trea H obeys the min-heap property (see above)
    • For any non-negative integer k, there is at most one binomial tree in H with root of degree k
Implementation
  • Search
    • Finding minimum value takes O(logn)
  • Insertion takes O(logn)

Fibonacci heap

  • Min value of the heap is the head
  • Cost in O(*) is very low:
    • O(1): insert, find min, union, decrease key
    • Ologn): extract min, delete
    • O(n): construct
  • But there may be considerable "overhead"
    • As represented by the constants obscured in O() notation
    • If your elements in the tree are just integers but you need multiple pointers for each, your storage overhead is proportionally huge
    • But if objects at the nodes are very large already, it may not be too bad

Tangent, amortized time

  • Uncommon alternative to worst-case scenario

Hypothetical

Existing code
 
For i = 1 to n
    push or pop
endFor

Complexity = O(n)

New feature introduced (multipop)
For i = 1 to n
    push or pop or multipop
endFor

How has the complexity changed?

  • Is it now O(n2)?
  • No, because the max number of pushes/pops is n
  • Can't penalize the whole algorithm for a single expensive operation
  • In this case, the worst-case is very rare

Routing (greedy algorithm)

Dijkstra's algorithm

Graph:

  • Nodes: exits
  • Edges: highway segments between exits

Problem statement

  • Given G = (V,E) where W(U, V) \geq 0 for each edge (U, V) \in E, find the shortest path from s to V-s.
General approach

Note: need to review this in the text book.

  1. Start from a set S of vertices containing final shortest path
  2. At each step find a vertex u \in V-S with shortest distances from s
  3. Add u to S, repeat from step 2.
Implementation

Note: this is wrong... can't read the board.

  • Initially S = {s} and d(s) = 0
  • For all nodes Failed to parse (unknown function\infinity): d(u) = \infinity
  • While S \neq V
    • Select a node v of S with at least one edge from S for which d(v)

Minimum Spanning Tree problem (MST)

Note: June 9 class

  • Tree is a graph that is connected by doesn't have any cycles
  • How do we find a tree within a graph that spans all the nodes?

Problem statement

  • for a graph G(V, E) with positive edge costs associated with each edge, the problem is to find a subset of edges T (of E) so that graph (V, T) is connected and σCe is as small as possible

Prim's algorithm

Proof

  • Fact: let S be any subset of nodes that is neither empty nor equal to all of V and let edge c be the minimum cost edge with one end in S and the other end in V-S.
  • Then every MST contains the edge e.

Implementation

  • (Almost identical to Dijkstra's)
  • Seeking lowest cost edge between subsets S and V-S.

Kruskal's algorithm

Implementation

  • Create an independent set for each node
  • Sort the edges of E into non-decreasing order of weight / cost
  • For each edge (U,V) in E, taken in non-decreasing order of cost
    • If U & V are not in the same set, then
      • Merge the two sets: A = A Union {(U, V)}
    • EndIf
  • EndFor

Note: we say non-decreasing to be more specific

  • It might stay flat... but it is not decreasing

Two possible implementations: arrays or pointers:

  • Make set, O(1) for set size = 1
  • Find set, O(1) or Ologn)
  • Union, Ologn) or O(1)

More specific pseudocode

  • A=Null
  • For each vertex v in V
    • Make_set(v)
  • EndFor
  • Sort the edges of E into non-decreasing order of cost
  • For each edge (U, V), taken in non-decreasing order
    • If Find Set (U) != Find Set (V)
      • Union (U, V)
      • A = A union {(U, V)}
    • EndIf
  • EndFor

Complexity analysis

  • Cost, O(m\log n) + O(m\log m), or O(mlogn)
    • No asymptotic difference between m log n and m log m

Reverse-delete

Proof

  • To proof: Highest cost edge in a cycle cannot be in the MST

Divide and Conquer (June 14)

  • 2nd of three big classes of algorithms (Greedy, D&C, Dynamic programming)

Big picture: Divide and conquer

  • Take a large problem and split it into n smaller sub-problems
  • Conquer, i.e. solve the subproblems recursively, or if trivial, solve the problem itself
  • Combine, the solutions to the subproblems

Sorting example

  • [1,2,5,7,3,9,8,6]
  • [1,2,5,7], [3,9,8,6]
  • [1,2], [5,7], [3,9], [8,6]
  • Sort the pairs
  • [1,2], [5,7], [3,9], [6,8]
  • Merge the sorted pairs
  • [1,2,5,7], [3,6,8,9]
  • [1,2,3,5,6,7,8,9]

Reviewing the example

  • Why split in 2? Easier to re-combine.
  • In this case, we can use mergesort

Analzing the example

  • Divide takes θ(1)
  • Conquer if the original problem takes T(n) then the 2 subprovlesm takes 2T(n / 2)
  • Recursive equation for mergesort:
    • T(n) = \begin{cases} \theta(1), & \mbox{if }n = 1 \\ \end{cases}

Three cases to memorize

  • Failed to parse (syntax error): \mbox{If }f(n) = O(n^{\log_ba - \epsilon}), & \mbox{then} T(n) = \theta(n^{\log_ba}\log n)
  • Failed to parse (syntax error): \mbox{If }f(n) = \theta(n^{\log_ba}), & \mbox{then} T(n) = \theta(n^{\log_ba}\log n)
  • Didn't get the third case...

Easy c&ampld problem

  • We have a graph of stock value over time
  • Need to identify the optimal buying and selling times

Brute force solution

  • Evaluate all possible solutions
    • Start at t = 1, check all remaining t's for the highest price
    • Do the same for t = 2, 3, etc.
    • Time, O(n2) = (n + 1) + (n + 2) + ... + 1
  • Find the best

Divide and conquer solution

  • Divide t in half
  • For each subset,
    • Find Max and Min
    • Optimal Buy (B) and Sell (S) points
Analysis
  • Divide: θ(1)
  • Combine: θ(1)
  • Recursion equation: \begin{cases}T(n) = \theta(1), & \mbox{if } n \leq \\ 2T(n/2) + \theta(1)\end{cases}

Harder problem

  • Have an unsorted array of size n
  • Want to find Min and Max
  • To find Min and Max has a barrier of O(n)
  • Because you must touch every element in the array
  • But how many comparisons are required?

Brute force, linear time

  • Find Min takes n − 1 comparisons
  • Find Max takes n − 1 comparisons
  • Total: 2n − 2 comparisons

Binary tree of n-terms?

  • Bottom level, n nodes
  • Total nodes for the rest of the tree = (n-1)
    • Each node requires two comparisons
    • (2n − 2) comparisons
Personal tools