CSCI570/Class notes
From Driscollwiki
CSCI571 Analysis of Algorithms
- Shahriar Shamsian
- Class page: http://www-scf.usc.edu/~zhenglia/
Contents
|
Time terms
- O(1), constant
- O(n), linear
- O(logn), logarithmic
- O(nk), polynomial
- O(kn), exponential
May 24
Apparently this course began last week and I missed the first class.
- Last week ended with a 4 step matching algorithm
Marriage algorithm
States
- Perfect matching: each M matched with exactly 1 W
- Stable matching: Perfect matching and no further changes (goal!)
- Does not have to be unique
- Proof by contradiction:
Assumptions
- N quantity of m and w
- m and w have lists of preferences
- m will propose first to top of his list
- Instability between two pairs: (m,w) and
- m's last proposal was to w
- ? did m propose to w' at some point in the execution?
- N: then w must be higher than w'
- Y: then he must have been rejected in favor of m\'\'
Simple scenario
N = 2
- m prefers w
- m' prefers w'
- w prefers m'
- w' prefers m
If m proposes first
- (m, w)
- (m', w')
If w proposes first
- (w, m')
- (w', m)
Complexity analysis
- Most important measure of performance: worst case scenario
- We know that it will take maximum N^2 iterations
- But we dont' know how long each iteration will take!
- Constant time? N^2
- Linear time? N^3
- We need to identify the data structures before we can know the length of an iteration
Steps in single iteration
- Identify a free/single man (m)
- For m, identify highest ranked woman (w) to whom he has not yet proposed
- Determine if w is engaged yet and to whom?
- If w is engaged, compare rankings of m and
, determine which is preferred.
- If necessary, change w's partner
- If necessary, add
back into the pool of free/single mens
Data structures
Need a data structure for the single men:
- Array, linked list, queue
- Performance is constant for all
Operations in a single iteration
Step 1,6: ID single man, O(1)
- Hold list of men in an array or linked list (Steps 1, 6)
- O(1) to find a single man
- O(1) to extract a single man
- O(1) to add a single man
Step 2: ID highest ranked woman to whom he has not yet proposed, O(1)
- Keep an array Next[1..N]
- Next[m] points to the position of the next woman he will propose to
- Keep 2d arrays ManPref[1..N, 1..N] and WomanPref[1..N, 1..N]
- ManPref[m, i] denotes the ith woman on m's preference list
- To find next woman on m's prefer list, w = ManPref[m, Next[m]]
Step 3: Determine if w is engaged yet and to whom, O(1)
- Keep an array Current[1..N]
Step 4: Determine w's preference, O(1)
- If we keep WomanPref as a 2d array constructed like ManPref, the time is linear
- Raises the complexity from N^2 to N^3
- So we inverse the indexes and values
- Create NxN array called Ranking
- Where Ranking[w, m] contains the rank of m in the sorted order of w's preferences
- This array is created during a prep phase (outside the loop), O(N2)
- Once the array is constructed, finding and comparing m and
takes O(1)
Step 5: Update w's partner, O(1)
- Update array Current[w]
Overall time?
- Preparation: O(N2)
- Iterations: O(N2)
- Total == O(N2)
Summary
Four components of a solution:
- Solution
- Proof of correctness
- Complexity analysis
Reviewing asymptotic notation
Upper bound uses O() notation
O notation represents the upper bound
- We're seeking the tightest upper bound
- Often we compare two algorithms by comparing their worst-case scenarios
Test examples
- Any quadratic funciton is O(n2)
- Any linear function is O(n2)
Lower bound uses ω() notation
- Best case scenario
"Sandwich", θ() notation
- Only if lower bound and upper bound are a constant function apart
Common examples
Linear search
- O(n)
- ω(1)
- θ(), does not apply
Binary search
On an array of size n,
- O(logn)
- ω(1)
- θ(), does not apply
Merge sort
On an array of size n,
- O(n * logn)
- ω(n * logn)
- θ(n * logn)
- Possible to use because the upper and lower are a constant function apart
Insertion sort
On an array of size n,
- O(n2)
- ω(n)
- θ(), does not apply
Example comparison
Two functions
- O(2nn3log2n)
- O(3nnlogn)
Need to break functions into their components
- Exponential (fastest growth)
- Polynomial
- Logarithmic (slowest growth)
Compare worst case scenarios:
- Compare the exponential components (fastest growth) first
- If one grows faster than the other one, stop
- No way for the polynomial or logarithmic components to "make up" for it
- Compare polynomial components (next fastest)
Note: we tend to compare worst case scenarios for class but in other cases, it might be more important to compare average cases, e.g. an algorithm that runs over and over and we have data on it.
Why might we use an algorithm with the greater complexity?
- Both are already written, equal implementation cost
- Do you always use lower complexity?
- Look at the size of the typical n
- In some cases, the bounds may compare differently with different n's
- Looking for a "sweet spot", could change according to different machines
- System dependencies
- Most often this data is based on heuristics from previous execution
- Also worth examining the data structures
In some cases, you may also combine two or more algorithms depending on circumstances, input
June 2, 7
Note: didn't do reading and am totally lost for first hour.
Tangent: Heap data structures
Priority queue
- Priority queue: a data structure which provides two things:
- Efficient insertion of elements into a set
- Efficient finding of smallest element
Implementation
- Array?
- Takes constant time to insert an element
- Linear time to find smallest element
- Not ideal
- Sorted array?
- Binary search (\log n) to find where to insert
- Might have to shift elements (order n)
- Not ideal
Binary tree
- Binary tree of depth k which has exactly 2^k nodes is called a full binary tree
- Binary tree with n nodes and of depth k is complete iff its nodes correspond to the nodes which are numbered one to n in the full binary tree of depth k
Implementation
- Complete binary tree can be stored in an array very easily
- Nodes correspond to the indeces
- You don't need any pointers
- You can compute the address (index)
- parent(i) is at floor(i/2) if i != 1
- if i == 1, then i is the root
- left_child(i) is at 2i if 2i <= n, otherwise i has no left child
- right_child(i) is at 2i + 1, if 2i + 1 <= n, otherwise i has no right child
- No pointer = saving space
- Operations are very fast: left shift and increment
Binary heap
- Max heap is a complete binary tree with the property that the value of the key at each node is at least as large as the values of its children (if they exist)
- Min heap has inverse relationship of key to child value
Implementation
- Insertion?
- Add to end of tree
- Work backwards to ensure max heap
- O(logn)
- Find max?
- Always at the root
- Constant time, O(1)
- Extract max?
- Pop the root element
- You know the result will have 1 fewer nodes
- Move the last node to the root
- Compare root with children, swap if necessary
- Continue to compare and swap down the binary tree
- This way you can be certain that the result will remain a complete tree
- O(logn)
- Delete?
Construct a binary heap of size n?
Construct a binary heap of size n?
- Failed to parse (unknown function\logn): O(n\logn)
if you are adding them one at a time from the top down
- But O(n) is possible by efficiently building from the bottom up
- Very few elements will need to travel up the tree which takes Failed to parse (unknown function\logn): O(\logn)
time
- Details are in the CRLS text
- In a given binary tree, 75% of the elements are in the bottom two rows of leaves
- n / 2 are all the bottom leaves
- n / 4 are the 2nd to last row of leaves
Binomial heap
Binomial tree
- Binomial tree, Bk, is an ordered tree defined recursively
- B0 consists of a single node
- Bk consists of two binomial trees Bk − 1 linked together
- Root of one is the left-most child of the root of the other
- Very rigid in their shape and number of nodes
Binomial heap
- Binomial heap, H, is a set of binomial trees that satisfies the following properties:
- Each binomial trea H obeys the min-heap property (see above)
- For any non-negative integer k, there is at most one binomial tree in H with root of degree k
Implementation
- Search
- Finding minimum value takes O(logn)
- Insertion takes O(logn)
Fibonacci heap
- Min value of the heap is the head
- Cost in O(*) is very low:
- O(1): insert, find min, union, decrease key
- Ologn): extract min, delete
- O(n): construct
- But there may be considerable "overhead"
- As represented by the constants obscured in O() notation
- If your elements in the tree are just integers but you need multiple pointers for each, your storage overhead is proportionally huge
- But if objects at the nodes are very large already, it may not be too bad
Tangent, amortized time
- Uncommon alternative to worst-case scenario
Hypothetical
Existing code
For i = 1 to n
push or pop
endFor
Complexity = O(n)
New feature introduced (multipop)
For i = 1 to n
push or pop or multipop
endFor
How has the complexity changed?
- Is it now O(n2)?
- No, because the max number of pushes/pops is n
- Can't penalize the whole algorithm for a single expensive operation
- In this case, the worst-case is very rare
Routing (greedy algorithm)
Dijkstra's algorithm
Graph:
- Nodes: exits
- Edges: highway segments between exits
Problem statement
- Given G = (V,E) where
for each edge
, find the shortest path from s to V-s.
General approach
Note: need to review this in the text book.
- Start from a set S of vertices containing final shortest path
- At each step find a vertex
with shortest distances from s
- Add u to S, repeat from step 2.
Implementation
Note: this is wrong... can't read the board.
- Initially S = {s} and d(s) = 0
- For all nodes Failed to parse (unknown function\infinity): d(u) = \infinity
- While
- Select a node v of S with at least one edge from S for which d(v)
Minimum Spanning Tree problem (MST)
Note: June 9 class
- Tree is a graph that is connected by doesn't have any cycles
- How do we find a tree within a graph that spans all the nodes?
Problem statement
- for a graph G(V, E) with positive edge costs associated with each edge, the problem is to find a subset of edges T (of E) so that graph (V, T) is connected and σCe is as small as possible
Prim's algorithm
- http://en.wikipedia.org/wiki/Prim%27s_algorithm
- Similar to Dijkstra's
- Start with a node of set S on which a spanning tree has been constructed so far
Proof
- Fact: let S be any subset of nodes that is neither empty nor equal to all of V and let edge c be the minimum cost edge with one end in S and the other end in V-S.
- Then every MST contains the edge e.
Implementation
- (Almost identical to Dijkstra's)
- Seeking lowest cost edge between subsets S and V-S.
Kruskal's algorithm
- http://en.wikipedia.org/wiki/Kruskal%27s_algorithm
- Sort all edges in increasing order of cost
- Add edges to T in this order as long as it does not create a cycle
- If it does, discard the edge
Implementation
- Create an independent set for each node
- Sort the edges of E into non-decreasing order of weight / cost
- For each edge (U,V) in E, taken in non-decreasing order of cost
- If U & V are not in the same set, then
- Merge the two sets: A = A Union {(U, V)}
- EndIf
- If U & V are not in the same set, then
- EndFor
Note: we say non-decreasing to be more specific
- It might stay flat... but it is not decreasing
Two possible implementations: arrays or pointers:
- Make set, O(1) for set size = 1
- Find set, O(1) or Ologn)
- Union, Ologn) or O(1)
More specific pseudocode
- A=Null
- For each vertex v in V
- Make_set(v)
- EndFor
- Sort the edges of E into non-decreasing order of cost
- For each edge (U, V), taken in non-decreasing order
- If Find Set (U) != Find Set (V)
- Union (U, V)
- A = A union {(U, V)}
- EndIf
- If Find Set (U) != Find Set (V)
- EndFor
Complexity analysis
- Cost, O(m\log n) + O(m\log m), or O(mlogn)
- No asymptotic difference between m log n and m log m
Reverse-delete
- http://en.wikipedia.org/wiki/Reverse-delete_algorithm
- Start with a full graph (V, E)
- Begin deleting edges in order of decreasing cost as long as it does not disconnect the graph
Proof
- To proof: Highest cost edge in a cycle cannot be in the MST
Divide and Conquer (June 14)
- 2nd of three big classes of algorithms (Greedy, D&C, Dynamic programming)
Big picture: Divide and conquer
- Take a large problem and split it into n smaller sub-problems
- Conquer, i.e. solve the subproblems recursively, or if trivial, solve the problem itself
- Combine, the solutions to the subproblems
Sorting example
- [1,2,5,7,3,9,8,6]
- [1,2,5,7], [3,9,8,6]
- [1,2], [5,7], [3,9], [8,6]
- Sort the pairs
- [1,2], [5,7], [3,9], [6,8]
- Merge the sorted pairs
- [1,2,5,7], [3,6,8,9]
- [1,2,3,5,6,7,8,9]
Reviewing the example
- Why split in 2? Easier to re-combine.
- In this case, we can use mergesort
Analzing the example
- Divide takes θ(1)
- Conquer if the original problem takes T(n) then the 2 subprovlesm takes 2T(n / 2)
- Recursive equation for mergesort:
-
Three cases to memorize
- Failed to parse (syntax error): \mbox{If }f(n) = O(n^{\log_ba - \epsilon}), & \mbox{then} T(n) = \theta(n^{\log_ba}\log n)
- Failed to parse (syntax error): \mbox{If }f(n) = \theta(n^{\log_ba}), & \mbox{then} T(n) = \theta(n^{\log_ba}\log n)
- Didn't get the third case...
Easy c&ld problem
- We have a graph of stock value over time
- Need to identify the optimal buying and selling times
Brute force solution
- Evaluate all possible solutions
- Start at t = 1, check all remaining t's for the highest price
- Do the same for t = 2, 3, etc.
- Time, O(n2) = (n + 1) + (n + 2) + ... + 1
- Find the best
Divide and conquer solution
- Divide t in half
- For each subset,
- Find Max and Min
- Optimal Buy (B) and Sell (S) points
Analysis
- Divide: θ(1)
- Combine: θ(1)
- Recursion equation:
Harder problem
- Have an unsorted array of size n
- Want to find Min and Max
- To find Min and Max has a barrier of O(n)
- Because you must touch every element in the array
- But how many comparisons are required?
Brute force, linear time
- Find Min takes n − 1 comparisons
- Find Max takes n − 1 comparisons
- Total: 2n − 2 comparisons
Binary tree of n-terms?
- Bottom level, n nodes
- Total nodes for the rest of the tree = (n-1)
- Each node requires two comparisons
- (2n − 2) comparisons

