The following are general guidelines for the projects. 1. A project could be a report surveying a collection of papers on a topic. (i.e. describe the results and possibly a little about the techniques). A minimum of 5 papers would need to be surveyed. 2. A project could report on a smaller set of papers, and in addition implement and compare algorithms discussed in these papers, or inspired by these papers. For example, it is desirable to combine ideas from two papers or one paper and the ideas in the class, design a new heuristic algorithm, implement, and evaluate it on some data sets. 3. A project could report on several papers and seek to extend their results in a theoretical manner (e.g. obtain and analyze a new algorithm). We recommend forming teams of 2 or 3 people so that you have someone with whom to discuss the papers you are reading and the algorithms you are developing. The requirements above are based on a 2-person team. Clearly more work is expected from a larger team. A) Social and Information Network Analysis: The following is a list of potential projects related to social network analysis. If you decide to explore a new heuristic algorithm on some new data sets, you can find some real-world networks at the Stanford Network Lab. Some useful code for computing PageRank or clustering can be found here. A.1) [Community Detection and Graph Clustering] Large-scale graph clustering: Survey a list of implemented algorithms and tools for large-scale clustering of huge social networks. Focus on a handful of metrics such as modularity, conductance, density, clustering coefficients etc. Come up with one or two new heuristic algorithms, implement them for a set of large social networks, and compare with the result of publicly available tools (like Metis and GRACLUS) in terms of the above metrics. The following are the set of relevant papers (the ones with a ** below are especially useful to read): - Reid Andersen: A local algorithm for finding dense subgraphs. ACM Transactions on Algorithms 6(4): (2010)
- **Reid Andersen, Fan R. K. Chung, Kevin J. Lang: Local Partitioning for Directed Graphs Using PageRank. Internet Mathematics 5(1): 3-22 (2008)
# Link communities reveal multiscale complexity in networks, Nature.- Aaron Clauset, M Newman, and Cristopher Moore. Finding community structure in very large networks. Phys. Rev. E, 70(6):066111, Dec 2004.
- **GW Flake, S Lawrence, and CL Giles. Effi.cient identi.cation of web communities. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, page 160, 2000.
- S Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75{174, 2010.-- R Kumar, P Raghavan, S Rajagopalan, and A Tomkins. Trawling the web for emerging cyber-communities. Computer Networks, 31(11-16):1481{1493, 1999.
- Jure Leskovec, Kevin Lang, Anirban Dasgupta, and Michael Mahoney. Statistical properties of community structure in large social and information networks. WWW '08:Proceeding of the 17th international conference on World Wide Web, Apr 2008.
- **Jure Leskovec, Kevin Lang, and Michael Mahoney. Empirical comparison of algorithms for network community detection. WWW '10: Proceedings of the 19th international conference on World wide web, Apr 2010.
- MEJ Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577, 2006.
- **MEJ Newman and M Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):26113, 2004.
- Gergely Palla, Imre Der.enyi, Ill.es Farkas, and Tam.as Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814{818, Jun 2005.
- Y Zhang, J Wang, Y Wang, and L Zhou. Parallel community detection on large networks with propinquity dynamics. Proceedings of the 15th ACM SIGKDD. Dec 2009.
The following clustering tools may prove useful if you want to compare your heuristic algorithm with an existing tool: The following two projects are two more specific topics in the area of A1. A.2) [Modulariy and Local Clustering] Combining modularity-based clustering and local clustering. Local clustering and modularity-based clustering are two important methods for clustering large networks. See the paper by Andersen et al. and the paper by Newman and Girvan for the specification of these algorithms. It would be interesting to combine them as follows: run a local clustering method to compute some large clusters and then run a modularity-based clustering method to sub-cluster these large clusters. The project is about a good implementation of this algorithm and comparing the outputs with local clusterings and modularity-based clustering only, and other possible clustering methods like Metis and GRACLUS. For related papers and tools, see A.1 above. A.3) [Overlapping Graph Clustering] A large part of the previous literature about clustering networks is concerned with non-overlapping clustering or partitioning of the network. In reality, communities in social networks may overlap. Developing formal algorithms to discover such overlapping communities is important. The goal of this project is to formalize new problems capturing the quality of overlapping clusters, design heuristic algorithms for them, and compare formal metrics from these heuristics with the metrics after running well-known tools like Metis and GRACLUS. For related papers and tools, see A.1 above. In particular, the paper by Leskovec, Lang and Mahoney is a good place to read about some relevant metrics and interesting algorithms. For this project, other than some of the papers mentioned above, it helps to read the following papers: - Reid Andersen, David F. Gleich, Vahab S. Mirrokni: Overlapping clusters for distributed computation. WSDM 2012: 273-282.
- Nina Mishra, Robert Schreiber, Isabelle Stanton, Robert Endre Tarjan: Finding Strongly Knit Clusters in Social Networks. Internet Mathematics 5(1): 155-174 (2008)
A.4) [Evolving Social Networks] In the class, we discussed various generative models for social networks. There are a number of models and empirical studies describing the evolution of social networks over time. The following papers are a subset of papers in this direction. The desirable goal is to survey these models, and empirical studies and possibly study it for a new setting. - TY Berger-Wolf and J Saia. A framework for analysis of dynamic social networks. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 523-528, 2006.
- Ronald Burt. Structural holes versus network closure as social capital. Chapter in Social Capital: Theory and Research, pages 1-30, Aug 2000.
- Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener. A large-scale study of the evolution of web pages. WWW '03: Proceedings of the 12th international conference on World Wide Web, May 2003.
- S Garg, T Gupta, N Carlsson, and A Mahanti. Evolution of an online social aggregation network: An empirical study. Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, pages 315-321, 2009.
- Jon Kleinberg, Siddharth Suri, Ev. a Tardos, and Tom Wexler. Strategic network formation with structural holes. SIGecom Exchanges, 7(3), Nov 2008.
- Jon Kleinberg, Siddharth Suri, Eva Tardos, and Tom Wexler. Strategic network formation with structural holes. EC '08: Proceedings of the 9th ACM conference on Electronic commerce, Jul 2008.
- G Kossinets and Duncan Watts. Empirical analysis of an evolving social network. Science, 311(5757):88, 2006.
- **R Kumar, J Novak, and A Tomkins. Structure and evolution of online social networks. Link Mining: Models, Algorithms, and Applications, pages 337-357, 2010.
- **Jure Leskovec, Lars Backstrom, Ravi Kumar, Andrew Tomkins: Microscopic evolution of social networks. KDD 2008: 462-470
- **Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, Aug 2005.
- C Tantipathananandh and T Berger-Wolf. Constant-factor approximation algorithms for identifying dynamic communities. Proceedings of the 15th ACM SIGKDD.
- C Tantipathananandh, T Berger-Wolf, and David Kempe. A framework for community identification in dynamic social networks. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 717-726, 2007.
A.5) [Link Prediction in Social Networks]: An interesting and challenging problem in social network analysis is to predict the missing links in a social network. These techniques have applications in friend suggestions and recommendation systems. It is interesting to survey the current empirical and algorithmic approaches to this problem, and possibly to implement some new methods on new data sets and compare them with other new techniques. - **Lars Backstrom, Jure Leskovec: Supervised random walks: predicting and recommending links in social networks. WSDM 2011: 635-644
- Moore, and M Newman. Hierarchical structure and the prediction of missing links in networks. Nature, Dec 2008.
- J.erome Kunegis and Andreas Lommatzsch. Learning spectral graph transformations for link prediction. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, Jun 2009.
- Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Predicting positive and negative links in online social networks. WWW '10: Proceedings of the 19th international conference on World wide web, Apr 2010.
- **David Liben-Nowell and Jon Kleinberg. The link prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019-1031, Jan 2007.
- S Redner. Networks: Teasing out the missing links. Nature, Dec 2008.
- HH Song, TW Cho, V Dave, Y Zhang, and L Qiu. Scalable proximity estimation and link prediction in online social networks. Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, pages 322-335, 2009.
- Panagiotis Symeonidis, Eleftherios Tiakas, and Yannis Manolopoulos. Transitive node similarity for link prediction in social networks with positive and negative links. RecSys'10: Proceedings of the fourth ACM conference on Recommender systems, Sep 2010.
Some useful data sets for real-world social networks can be found here. B) Practical/Hybrid Problems in E-commerce, Algorithmic Game Theory, and Online Advertising: B.1) [Externalities in Ad Auctions: The simplest form of the second-price ad auctions does not consider externalities in click-through rate among ads. In reality, there are various types of externalities (both negative and positive) and there is some research work on this topic. The goal of the project is to survey these results and possibly report some data analysis on some synthetic data set regarding the difference between the ad auctions with and without externalities. The following are examples of relevant papers for this topic: - David Kempe, Mohammad Mahdian: A Cascade Model for Externalities in Sponsored Search. WINE 2008: 585-596
- Arpita Ghosh, Mohammad Mahdian: Externalities in online advertising. WWW 2008: 161-168
- Renato Gomes, Nicole Immorlica, Evangelos Markakis: Externalities in Keyword Auctions: An Empirical and Theoretical Assessment. WINE 2009: 172-183
B.2) [Rank Aggregation] An important algorithmic topic in search that we will not cover in the course is rank aggregation. Considering this problem from the perspective of an axiomatic approach, it is related to Arrow’s impossibility result, and from the perspective of an optimization approach, it includes methods related to Kendall and Footrule distances, and Kemeny aggregation. The desirable goal is to survey these results and possibly compare some of them on some synthetic data sets. - **C. Dwork, R. Kumar, M. Naor, D. Sivakumar. Rank aggregation methods for the Web. 10th International World Wide Web Conference, May 2001
- **N. Ailon, M. Charikar, and A. Newman, Aggregating Inconsistent Information: Ranking and Clustering, Proceedings of the 37th ACM Symposium on Theory of Computing, 2005.
- ** R. Fagin, R. Kumar, and D. Sivakumar, Efficient similarity search and classification via rank aggregation. Proc. ACM SIGMOD Conference (SIGMOD '03), pp. 301-312, 2003.
- R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee, Comparing and aggregating rankings with ties. Proc. ACM Symposium on Principles of Database Systems (PODS '04), pp. 47-58, 2004.
- Nir Ailon, Aggregation of Partial Rankings, p-Ratings and Top-m Lists, SODA 2007.
- Claire Kenyon-Mathieu and Warren Schudy. How to rank with few errors: A PTAS for Weighted Feedback Arc Set on Tournaments, STOC 2007
C)Theoretical (or Hybrid) Problems: The following is a list of open theoretical problems. It is sufficient to survey the current results on a topic, and it would be great to make some even modest progress toward solving these new directions. C.1) [Algorithmic Game Theory] In a recent work, Mirrokni, Nithum, and Vetta study look-ahead best-response dynamics in which players players play non-myopic best responses by predicting a constant number of look-ahead actions. We have already analyzed congestion games, valid-utility games, adword auction games, and Cournot duopoly. It would be nice to study this game or other potential games such as network design games, market sharing games, or other non-potential games such as anonymous games. The paper is soon going to be on Arxiv, and a link to the paper will be posted when the paper is available online. C.2) [Graph Clustering, Spectral Graph Theory] In the following paper in WSDM 2012, - Reid Andersen, David F. Gleich, Vahab S. Mirrokni: Overlapping clusters for distributed computation. WSDM 2012: 273-282.
we have explored the advantage of using overlapping clustering for distributed computation. For details of why overlapping clustering helps in distributed computation, read the paper. The rough intuition is as follows: Consider a clustering of a graph where each cluster has volume at most B, and a random surfer going through nodes of the graph. Whenever the surfer needs to move to another cluster, it does so. We are interested in computing a clustering such that the random surfer needs to change clusters as linfrequently as possible, i.e., the goal is to minimize the fraction of times that the random surfer changes cluster in a long enough random walk. If we are allowed to choose an overlapping set of clusters, we can simulate the above process by associating each node v to a master cluster (among the clusters that include this node v), and simulate the random surfer as follows: whenever the surfer is moving to a node v outside the current cluster, it would move to the master cluster of node v. Such overlapping clustering may need fewer changes of clusters compared to non-overlapping clustering, especially if we are allowed to choose arbitrary number of clusters. The question is for what type of graphs can one gain a lot by using overlapping clustering. We can only prove a gain for special graphs like cycles, but empirically observe it for other graphs. It would be interesting to explore this question for other families of graphs, report more complete empirical results, and more importantly report analytical results in this regard. One conjecture is that this is provably true for planer graphs. C.3) [Algorithmic Game Theory] (VERY HARD) Efficient Coordination mechanisms have been developed for machine scheduling problems both for the makespan social function and for the sum of completion times. We discussed these results in the class, and mentioned the recent papers on the topic. Extending these coordination mechanisms to the selfish routing game is an interesting problem. To understand the context for applying the idea of coordination mechanisms to selfish games, see the WINE 2009 paper by Hoefer, Mirrokni, Roeglin, and Teng on "Selfish Routing over time". An interesting and important research direction in algorithmic game theory is to come up with a coordination mechanism for edges of a network such that the price of anarchy of the resulting network is good. C.4) [Computational Game Theory](HARD) In a STOC 2008 paper, named "The Myth of the Folk Theorem", we present two complexity results about computing equilibria in repeated games: 1) we show computing a NE in repeated 3-player games is PPAD-hard, and 2) we show that finding the threat point for a family of games including congestion games is NP-hard. To learn about the threat point definition, see the STOC 2008 paper. An interesting question is to study the complexity of computing NE in other special games, e.g., anonymous games, market sharing games, valid-utility games, network design games, etc. C.5) [Combinatorial Optimization, Social Networks] (VERY HARD) Submodular maximization applied to Social Networks. Both monotone and nonmonotone submodular maximization have been used in the context of maximizing influence & maximizing revenue over social networks [in two papers: influence maximization in a paper by Kempe, Kleinberg, Tardos, and revenue maximization in a paper by Hartline, Mirrokni, Sundararajan]. The algorithms for these special cases achieve approximations of 0.63 and 0.40, respectively, in line with the known results for submodular maximization. Can one improve the approximation factor for these problems? C.6) [Scheduling for data centers] This topic is not covered in the class but is related to the algorithms and economics of the Internet. Consider a set J of jobs to be scheduled on a set M of machines. Each job j \in J has a size s_j a release time r_i, and a processing time p_j. Each machine i \in M has a total capacity C_i. Our goal is to process the jobs on these machines and minimize the average (or the maximum?) number of machines needed. As a starting point, we wish to solve the problem for unit-size jobs with arbitrary processing times. Online Scheduling: Consider the above scheduling problem in the online setting where jobs arrive online, i.e., their release time is equal to their arrival time. At the time of arrival (or within a short deadline), we should assign each job to a machine to be scheduled on that machine. Our goal is to do this online allocation of jobs to machines and to minimize the average number of machines used in this process. We are interested in both offline and online scheduling algorithms that minimize the number of machines used. This has important applications in the cost-effective running of large data centers. C.7) [Algorithmic Game Theory, Algorithms, Combinatorics] Consider the following game manipulation problem. Let G be a directed graph where the nodes represent players of a game, and an edge from u to v means that u can beat v in the game. (If an edge (u, v) is not present, one cannot have u and v play each other.) Given G and a “favorite” node A, is it possible to set up the bracket of a balanced single-elimination tournament so that A is guaranteed to win, if match results occur as predicted by G? In a recent paper (available here), it has been shown that the problem is NPcomplete for general graphs. For the case when G is a tournament graph, several conditions has been given on the desired winner A which ensure that there exists a balanced single-elimination tournament which A wins, and it can be found in polynomial time. Question: Can one give an approximation algorithm to maximize the rank of a "favorite node" in a tournament graph G within a constant additive ranking error? Is the problem poly-time solvable for other graphs? |