Assignment 5: Cloud-based Bacon Services
Due: Thursday 4/14 (11:59pm)
Some folks used to play a game called "Six Degrees of Kevin Bacon." The idea is that you were given a name of a person, and you had to link that person up to Kevin Bacon based on films they both appeared in. For example, let's say you were challenged to link Kiefer Sutherland. One possible answer is: Kiefer Sutherland was in A Time To Kill with his dad, Donald Sutherland, who was in JFK with Kevin Bacon. But this isn't the best answer, as Kevin Bacon was in Flatliners with Kevin Bacon.
Algorithm-wise, this is called bidirectional search. You have a source and sink node given as input, and you need to provide the shortest path that links them up. (Yes, you could get the answer using Dijkstra's algorthim to get the same answer - bidirectional search typically uses less memory and takes less time). The pseudocode goes like this:
- Given SOURCE and SINK, initialize set S and T with SOURCE and SINK, respectively.
- While the intersection of SOURCE and SINK is empty:
- For each x in S, add its neighbors to S
- For each y in T, add its neighbors to T
- Return the shortest path that goes though the intersection of S and T.
There is adjacency information available here:
Each line represents an person, which is the first field (separated by a tab). All of the other names on the line (separated by pipes) are other people they have worked with on a movie. The letters in parentheses discriminate actors vs. writers vs. directors.
- Describe your algorithm, covering how you:
- Represent Nodes
- Handle serialization
- Detect Termination
- Distinguish the source from the sink
- Compute some paths!
- What is the path between "Hanks, Tom (a)" and "Bacon, Kevin (I) (a)"? (Hint: this is super-easy and should be a sanity check.)
- What is the path between "Murray, Bill (I) (a)" and "Bacon, Kevin (I) (a)"?
- What is the path between "Schreck, Max (a)" (who played Nosferatu, one of the first and best film vampires) and "Bacon, Kevin (I) (a)"?
- What is the path between "Jun, Gianna (a)" (who played Saya, a vampire-hunter in one of the worst recent vampire movies) and "Bacon, Kevin (I) (a)"
- What is the path between "Schreck, Max (a)" and "Jun, Gianna (a)"?
- Can you find a longer path than any of the ones you found above? (We're not asking you to compute the diameter of the graph, just play around and demonstrate a path longer than the ones above.)
- How would your code change if the edges were weighted? (e.g. if television counted less than movies, or if weighting was by billing)
This assignment is due by 11:59pm, Thursday 4/14. Please send us (both Jordan and Yingying) an email with "[CCC Assignment 5]" as the subject. In the body of the email put answers to the questions above and any source code you created or modified.
Hints / Tips
- Dumbo does not have an easy way of accessing counters in driver programs, so determing convergence in Dumbo might be difficult (if you can figure out how, great, otherwise, you might want to consider the more mature Java interface).
- Before debugging on the whole dataset, first check to make sure you can handle small data.
- Be sure to differentiate between your two types of search frontiers.
- Don't forget that you'll need to be able to recover the actual paths you discovered.
- Be sure that you handle disconnected graphs well (think about how you'd detect that your algorithm isn't making progress).
- Nodes might not be unique - be careful!