Online social networks (OSNs) such as Facebook and Twitter are a new generation of Internet applications and account for a large share of user dwell time on the Web. Meanwhile, social network users also contribute a huge amount of content, often known as user-generated content (UGC), such as blogs, digital photos, and videos. My research in this area includes three aspects:
User behavior analysis and security/privacy issues in online social networks;
Algorithms and systems for social search;
Algorithms and models for social content triggering and placement in Web search engines.
Most of my work is based on industry systems and large-scale user data.
Users are the basic elements of OSNs. Understanding how users consume and generate UGC objects is imperative for the social network industry to succeed in business. My research on social user analysis focuses on the statistical properties of user behavior patterns in online social networks, such as patterns of user content generation and online dwell time. We find that user posting behavior in social networks follows stretched exponential distributions instead of the commonly assumed power law distributions, where the stretch factor has a strong negative correlation with the effort required to create a UGC object. Our studies on instant messaging (IM) traffic also show similar distribution patterns of user activities in IM networks. Building on this distribution model, we have proposed a statistical model to identify top users or top contributors in a social network system.
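As a rough illustration of the distinction drawn above, the following sketch ranks per-user post counts and checks how straight the data look under two transforms. For a stretched exponential with stretch factor c, the c-th power of the count is linear in the logarithm of the rank; for a power law, the logarithm of the count is linear in the logarithm of the rank. The data here are synthetic, and the values c = 0.4 and x0 = 3.0 as well as all function names are assumptions chosen for illustration, not values from our measurements.

```python
import math
import random

def stretched_exponential_sample(n, c, x0, rng):
    """Draw n values whose CCDF is exp(-(x / x0) ** c) by inverse-transform sampling."""
    return [x0 * (-math.log(1.0 - rng.random())) ** (1.0 / c) for _ in range(n)]

def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

def compare_fits(counts, c):
    """Return (R^2 in stretched exponential scale, R^2 in log-log scale) for ranked data."""
    ranked = sorted((y for y in counts if y > 0), reverse=True)
    log_rank = [math.log(i) for i in range(1, len(ranked) + 1)]
    se_scale = [y ** c for y in ranked]        # stretched exponential: y^c vs. log(rank)
    log_scale = [math.log(y) for y in ranked]  # power law: log(y) vs. log(rank)
    return r_squared(log_rank, se_scale), r_squared(log_rank, log_scale)

rng = random.Random(42)  # fixed seed so the synthetic data are reproducible
posts = stretched_exponential_sample(5000, c=0.4, x0=3.0, rng=rng)
r2_se, r2_power = compare_fits(posts, c=0.4)
print(f"R^2 in stretched exponential scale: {r2_se:.3f}")
print(f"R^2 in log-log scale:               {r2_power:.3f}")
```

On real traces the stretch factor is not known in advance, so one would typically scan a range of c values and keep the one that gives the straightest line.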
Online social networks have become an important vehicle for distributing spam, malicious content, and phishing sites on the Web and in social media. Current security approaches based on abnormality detection respond slowly to newly generated spam and malware. With advanced information retrieval technologies and graph theory methods, it is possible to analyze the social connections of spam/malware distributors and their customers that are “hidden” in the Web and social media. Our purpose is to proactively crawl the “customer networks” of spam and malware, build information systems to index and process the crawled graphs, and identify malicious content and prevent its distribution at an early stage, before it spreads widely. Using the analysis of user behavior features and patterns, we are currently working on spam and malicious content detection in social media.
My research on social search algorithms and systems focuses on UGC search and high-quality knowledge discovery in social media. UGC and social content are usually less structured than Web pages, so link-based algorithms such as PageRank and HITS are not effective. However, social users often organize, rate, and comment on the content that they consume or produce on the Web through a number of methods, such as bookmarking, tagging, and liking. We have developed ISID, a tag-based social interest discovery system, to discover common user interests and to organize users and their content by interest topic. Currently, we are working on algorithms and models for high-quality knowledge mining and information retrieval in social media, such as query/answer and named entity extraction.
My most recent work on social networks gives users a social and personalized experience in Bing Search, based on the Likes of their Facebook friends. Facebook contains a huge amount of information about web pages liked by its users, which provides useful clues to help a user make decisions when searching. We have developed machine learning models and algorithms for personalized search result triggering/placement and like-farm filtering of Facebook Likes in Bing Search, which was launched in October 2010 (see the Bing announcement, New York Times report, and Wall Street Journal report on this system).
Enhua Tan, Lei Guo, Songqing Chen, Xiaodong Zhang, and Yihong (Eric) Zhao, “Online Spam Detection in Blogs: A Behavior-based Approach”, in submission.
Lei Guo, Enhua Tan, Songqing Chen, Xiaodong Zhang, and Yihong (Eric) Zhao, “Analyzing Patterns of User Content Generation in Online Social Networks”, Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2009, research track), Paris, France, June 28-July 1, 2009, pp.369-377 (long research paper acceptance rate: 9.5%). [Slides] [Citations]
Xin Li, Lei Guo, and Yihong (Eric) Zhao, “Tag-based Social Interest Discovery”, Proceedings of the 17th International World Wide Web Conference (WWW 2008), Beijing, China, April 21-25, 2008, pp. 675-684 (acceptance rate: 11%). [Slides] [Citations]
Zhen Xiao, Lei Guo, and John Tracey, “Understanding Instant Messaging Traffic Characteristics”, Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS 2007), Toronto, Canada, June 25-29, 2007 (acceptance rate: 13.5%). [Slides] [Citations]
With the explosive increase of video content on the Web, video traffic has come to dominate the Internet and keeps growing, posing a number of problems for both academia and industry, from system design and traffic engineering to data management:
Streaming media delivery places high demands on CPU and bandwidth resources. What is the most important issue in designing a scalable streaming system? Is caching an effective way to improve the performance of a media system?
To provide high-quality streaming services to users, content delivery networks (CDNs) have been widely used, but their cost is high. P2P networks are scalable and inexpensive, but their service quality is often unstable. Can P2P networks provide high-quality streaming services?
Social networks such as YouTube have generated a huge amount of video content and video traffic. How can we manage these video objects and search for related content in the network?
All these problems can be attributed to, or are closely related to, the activity patterns of Internet users: in today’s Internet, not only is media traffic driven by user requests, but media content is also created by ordinary users. Furthermore, the majority of Internet traffic is conveyed via overlay networks self-organized by ordinary users, i.e., peer-to-peer networks. Thus, understanding traffic patterns, especially user activity patterns, is essential for designing, managing, and evaluating Internet media delivery systems as well as content systems.
By analyzing workloads collected through large-scale Internet measurements, we have proposed a general model of Internet media access patterns and studied the performance of media caching systems. We found that, unlike Web objects, whose access pattern is Zipf-like, requests for media objects follow the stretched exponential distribution, whose parameters are determined by file size, client request rate, object birth rate, and time-related factors in a media system. Our model indicates that media caching is far less effective than Web caching unless the cache size is extremely large. In the long term there is great potential to improve caching performance, but realizing it may take a long time and consume a large amount of storage.
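To make the caching implication concrete, here is a minimal sketch that compares the best-case hit ratio of a cache holding the most popular objects under a Zipf-like popularity curve and under a stretched exponential one. It assumes unit-size objects and static popularity, so it ignores the file size, birth rate, and time-related factors mentioned above. The parameter values (alpha = 1.0, c = 0.2, a = 0.25) and the function names are illustrative assumptions rather than fitted values from our workloads.

```python
import math

def zipf_popularity(n, alpha):
    """Request counts proportional to rank ** -alpha (Zipf-like)."""
    return [i ** -alpha for i in range(1, n + 1)]

def stretched_exponential_popularity(n, c, a):
    """Request counts from the rank form y_i ** c = b - a * log(i), with b set so y_n = 1."""
    b = 1.0 + a * math.log(n)
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n + 1)]

def ideal_hit_ratio(popularity, cache_slots):
    """Fraction of requests served if the cache holds the cache_slots most popular objects."""
    ranked = sorted(popularity, reverse=True)
    return sum(ranked[:cache_slots]) / sum(ranked)

n_objects = 10_000
zipf = zipf_popularity(n_objects, alpha=1.0)
se = stretched_exponential_popularity(n_objects, c=0.2, a=0.25)

for fraction in (0.01, 0.05, 0.20):
    slots = round(n_objects * fraction)
    print(f"cache = {fraction:4.0%} of objects: "
          f"Zipf-like hit ratio {ideal_hit_ratio(zipf, slots):.2f}, "
          f"stretched exponential hit ratio {ideal_hit_ratio(se, slots):.2f}")
```

With these illustrative parameters, the stretched exponential curve puts much less of the request mass on the top-ranked objects, so a small cache captures a smaller share of requests than it does under the Zipf-like curve.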
The service quality of P2P systems, such as BitTorrent, is often unstable. Although the total amount of media traffic on the Internet keeps increasing, the request rate of an individual media file decreases with time. By analyzing representative BitTorrent traffic, we have modeled the evolution of single-torrent systems and derived the torrent lifespan, which is constrained by the decay of file popularity. By modeling interactions among multiple torrents, we find that inter-torrent collaboration is much more effective than stimulating seeds to serve longer for addressing the service quality problems. We have also proposed an approach to inter-torrent collaboration under a “tit-for-tat” based instant incentive mechanism, in order to make BitTorrent a reliable and efficient content delivery vehicle.
A number of techniques for streaming media have been proposed and utilized in commercial media systems. In order to gain insights into current streaming services and thus provide guidance on designing resource-efficient and high quality streaming media systems, we have collected a large streaming media workload from thousands of broadband home users and business users hosted by a major ISP, and analyzed the most commonly used streaming techniques such as pseudo streaming, automatic protocol switch, Fast Streaming, MBR encoding and rate adaptation. Our measurement and analysis results show that with these techniques, current streaming systems tend to over-utilize CPU and bandwidth resources to provide better services to end users, which may not be a desirable and effective way to improve the quality of streaming media delivery.
Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, and Xiaodong Zhang, “The Stretched Exponential Distribution of Internet Media Access Patterns”, Proceedings of the 27th Annual ACM Symposium on Principles of Distributed Computing (PODC 2008), Toronto, Canada, August 18-21, 2008, pp. 283-294 (acceptance rate: 30.3%). [Slides] [Technical Report] [Citations]
Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, and Xiaodong Zhang, “Does Internet Media Traffic Really Follow Zipf-like Distribution?”, Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2007, extended abstract), San Diego, California, USA, June 12-16, 2007, pp. 359-360. [Poster] [Citations]
Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and Xiaodong Zhang, “A Performance Study of BitTorrent-like Peer-to-Peer Systems”, IEEE Journal on Selected Areas in Communications (JSAC), Vol. 25, No. 1, 2007, pp. 155-169. [Citations]
Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, Oliver Spatscheck, and Xiaodong Zhang, “Delving into Internet Streaming Media Delivery: A Quality and Resource Utilization Perspective”, Proceedings of Internet Measurement Conference (IMC 2006, long paper), Rio de Janeiro, Brazil, October 25-27, 2006, pp. 217-230 (long paper acceptance rate: 12.3%). [Slides] [Citations]
Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and Xiaodong Zhang, “Measurements, Analysis, and Modeling of BitTorrent-like Systems”, Proceedings of Internet Measurement Conference (IMC 2005, long paper), Berkeley, California, USA, October 19-21, 2005, pp. 35-48 (long paper acceptance rate: 14.9%). [Slides] [Citations]
Lei Guo, Songqing Chen, Zhen Xiao, and Xiaodong Zhang, “Analysis of Multimedia Workloads with Implications for Internet Streaming”, Proceedings of the 14th International World Wide Web Conference (WWW 2005), Chiba, Japan, May 10-14, 2005, pp. 519-528 (acceptance rate: 14%). [Slides] [Citations]
In a P2P system, two neighbors in the overlay can be far from each other in the underlying network, causing extra message delays and redundant cross-domain traffic on the Internet. We have also studied topology-aware P2P overlays on the Internet for VoIP and file sharing. We have investigated the performance of the Skype VoIP system with intensive Internet measurements and found that its relay peer selection does not take Autonomous System (AS) topology into consideration, resulting in long waiting times and unnecessary probing. We propose an AS-aware peer-relay protocol called ASAP, which significantly improves VoIP quality and system scalability with low overhead. For large file distribution, we have proposed TopBT, a topology-aware and infrastructure-independent BitTorrent client that minimizes redundant traffic in data transmission. The software has become an open-source project and is free to download.
Shansi Ren, Enhua Tan, Tian Luo, Songqing Chen, Lei Guo, and Xiaodong Zhang, “TopBT: A Topology-Aware and Infrastructure-Independent BitTorrent Client”, Proceedings of the 29th IEEE Conference on Computer Communications (INFOCOM 2010), San Diego, California, March 15-19, 2010 (acceptance rate: 17.5%). TopBT is open-source software (download). [Citations]
Shansi Ren, Lei Guo, and Xiaodong Zhang, “ASAP: an AS-Aware Peer-Relay Protocol for High Quality VoIP”, Proceedings of 26th International Conference on Distributed Computing Systems (ICDCS 2006), Lisboa, Portugal, July 4-7, 2006 (acceptance rate: 13.8%). [Slides] [Citations]
Shansi Ren, Lei Guo, Song Jiang, and Xiaodong Zhang, “SAT-Match: A Self-Adaptive Topology Matching Method to Achieve Low Lookup Latency in Structured P2P Overlay Networks”, Proceedings of 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), Santa Fe, New Mexico, USA, April 26-30, 2004 (acceptance rate: 31.8%). [Slides] [Citations]
With the advancement of mobile technologies, WiFi and cellular networks have become an important part of Internet access, demanding high-performance, energy-efficient data communication protocols. We have proposed PSM-throttling for power-efficient bulk data transmission, such as streaming media, over the wireless Internet. By synchronizing the sleep period of the network interface on a mobile client with the data sending rate of the server, PSM-throttling minimizes the energy consumption of large data transmissions without compromising communication performance. PSM-throttling has been re-implemented to support different wireless devices (Ahmad Nazir Raja, “Energy Efficient Client-centric Shaping of Multi-flow TCP Traffic”).
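The energy argument can be sketched with simple duty-cycle arithmetic: if data arrive in periodic bursts at the link rate, the radio only needs to stay awake for roughly the fraction of time given by the stream rate divided by the link rate. The sketch below is not the PSM-throttling protocol itself. The power figures, the wake-up overhead, and the function name are hypothetical, and TCP dynamics are ignored.

```python
def average_power_mw(stream_kbps, link_kbps, awake_mw, sleep_mw, wake_overhead=0.0):
    """Average radio power when the interface is awake only while bursts arrive.

    wake_overhead is an assumed extra fraction of time spent awake around bursts.
    """
    if not 0 < stream_kbps <= link_kbps:
        raise ValueError("stream rate must be positive and no larger than the link rate")
    duty_cycle = min(1.0, stream_kbps / link_kbps + wake_overhead)
    return duty_cycle * awake_mw + (1.0 - duty_cycle) * sleep_mw

AWAKE_MW, SLEEP_MW = 1000.0, 50.0  # hypothetical radio power draw, not measured values

# A 1 Mbps stream over a 4 Mbps wireless link, with 5% assumed wake-up overhead.
throttled = average_power_mw(1000, 4000, AWAKE_MW, SLEEP_MW, wake_overhead=0.05)
saving = 1.0 - throttled / AWAKE_MW
print(f"average power with synchronized sleep: {throttled:.0f} mW "
      f"({saving:.0%} below an always-awake radio)")
```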
For multi-rate WLANs, we have proposed Cooperative Relay Service (CRS), a protocol that utilizes the idle communication power of high channel rate stations to relay data frames between a low channel rate mobile client and the Access Point. By improving the throughput and energy per bit of both client and proxy stations, CRS is a win-win solution for all wireless stations in the system. We have also proposed CUBS, an application-independent and ISP-transparent system that coordinates the use of idle upload bandwidth among neighbors in a residential network.
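The intuition behind relaying in a multi-rate WLAN can be shown with a small airtime comparison: sending a frame over two fast hops can take less channel time than sending it once over a slow link. This is only a sketch of that intuition, not the CRS protocol. The function names and the optional relay overhead parameter are assumptions, and per-frame MAC and PHY overheads are ignored unless supplied.

```python
def airtime_us(frame_bits, rate_mbps):
    """Time to send frame_bits at rate_mbps, in microseconds (1 Mbps = 1 bit per microsecond)."""
    return frame_bits / rate_mbps

def relay_helps(frame_bits, direct_mbps, client_to_relay_mbps, relay_to_ap_mbps,
                relay_overhead_us=0.0):
    """True if two-hop relaying uses less airtime than direct transmission."""
    direct = airtime_us(frame_bits, direct_mbps)
    relayed = (airtime_us(frame_bits, client_to_relay_mbps)
               + airtime_us(frame_bits, relay_to_ap_mbps)
               + relay_overhead_us)
    return relayed < direct

frame_bits = 1500 * 8  # a 1500-byte data frame

# A client stuck at 1 Mbps with a nearby station that reaches both ends at 11 Mbps.
print(relay_helps(frame_bits, direct_mbps=1, client_to_relay_mbps=11, relay_to_ap_mbps=11))
# A client that already reaches the Access Point at 11 Mbps gains nothing from a relay.
print(relay_helps(frame_bits, direct_mbps=11, client_to_relay_mbps=11, relay_to_ap_mbps=11))
```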
Enhua Tan, Lei Guo, Songqing Chen, and Xiaodong Zhang, “CUBS: Coordinated Upload Bandwidth Sharing in Residential Networks”, Proceedings of the 17th IEEE International Conference on Network Protocols (ICNP 2009), Princeton, New Jersey, USA, October 13-16, 2009, pp. 193-202 (acceptance rate: 18.2%). [Slides] [Citations]
Enhua Tan, Lei Guo, Songqing Chen, and Xiaodong Zhang, “PSM-throttling: Minimizing Energy Consumption for Bulk Data Communications in WLANs”, Proceedings of the 15th IEEE International Conference on Network Protocols (ICNP 2007), Beijing, China, October 16-19, 2007, pp. 123-132 (acceptance rate: 14.5%). [Slides] [Citations]
Enhua Tan, Lei Guo, Songqing Chen, and Xiaodong Zhang, “SCAP: Smart Caching in Wireless Access Points to Improve P2P Streaming”, Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS 2007), Toronto, Canada, June 25-29, 2007 (acceptance rate: 13.5%). [Slides] [Citations]
Lei Guo, Xiaoning Ding, Haining Wang, Qun Li, Songqing Chen, and Xiaodong Zhang, “Cooperative Relay Service in a Wireless LAN”, IEEE Journal on Selected Areas in Communications (JSAC), Vol. 25, No. 2, 2007, pp. 355-368. [Citations]
Lei Guo, Xiaoning Ding, Haining Wang, Qun Li, Songqing Chen, and Xiaodong Zhang, “Exploiting Idle Communication Power to Improve Wireless Network Performance and Energy Efficiency”, Proceedings of the 25th IEEE Conference on Computer Communications (INFOCOM 2006), Barcelona, Spain, April 23-29, 2006 (acceptance rate: 18%). [Slides] [Citations]
Streaming media accounts for the majority of network traffic on the Internet. Based on our research on Internet media traffic modeling and analysis, we have studied current streaming systems on the Internet, including media proxy systems, streaming servers, and live streaming systems. To deliver streaming media efficiently at large scale, we have proposed PROP, a scalable and reliable P2P-assisted media proxy system, which utilizes P2P sharing to provide redundancy and scalability, and a dedicated proxy to provide reliability. To support interactive access to a streaming media object, we have proposed DISC, a dynamic interleaved segment caching algorithm on a media proxy that speeds up the jump operations of streaming media clients. By analyzing PPLive data exchange patterns, we have proposed IDEA, an Improved peer Data Exchange Algorithm, which significantly improves the efficiency of live video delivery by randomizing chunk requests and reducing data contention.
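The effect of randomizing chunk requests on contention can be illustrated with a toy experiment: many peers want the same chunk, several neighbors hold it, and we measure the heaviest load placed on any one neighbor. This is a simplified sketch of the contention idea only, not the IDEA algorithm. The peer and neighbor counts and the function name are assumptions.

```python
import random
from collections import Counter

def max_neighbor_load(num_peers, num_neighbors, choose):
    """Largest number of simultaneous requests that any single neighbor receives."""
    holders = list(range(num_neighbors))  # neighbors that hold the wanted chunk
    load = Counter(choose(holders) for _ in range(num_peers))
    return max(load.values())

rng = random.Random(7)  # fixed seed so the run is reproducible

# Every peer asks the first holder in its list, so all requests collide on one neighbor.
first_listed = max_neighbor_load(100, 10, choose=lambda holders: holders[0])
# Every peer asks a holder chosen uniformly at random, which spreads the requests.
randomized = max_neighbor_load(100, 10, choose=rng.choice)

print(f"worst-case load when always asking the first holder: {first_listed}")
print(f"worst-case load when asking a random holder:         {randomized}")
```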
Yao Liu, Fei Li, Lei Guo, and Songqing Chen, “Reducing Data Request Contentions for Improved Streaming Quality”, Proceedings of the 20th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2010), Amsterdam, the Netherlands, June 2-4, 2010. [Citations]
Yao Liu, Lei Guo, Fei Li, and Songqing Chen, “A Case Study of Traffic Locality in Internet P2P Live Streaming Systems”, Proceedings of the 29th International Conference on Distributed Computing Systems (ICDCS 2009), Montreal, Quebec, Canada, June 22-26, 2009, pp. 423-432 (acceptance rate: 16.3%). [Citations]
Lei Guo, Songqing Chen, and Xiaodong Zhang, “Design and Evaluation of a Scalable and Reliable P2P Assisted Proxy for On-demand Streaming Media Delivery”, IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 18, No. 5, 2006, pp. 669-682. [Citations]
Lei Guo, Songqing Chen, Zhen Xiao, and Xiaodong Zhang, “DISC: Dynamic Interleaved Segment Caching for Interactive Streaming”, Proceedings of the 25th International Conference on Distributed Computing Systems (ICDCS 2005), Columbus, Ohio, USA, June 6-10, 2005, pp. 763-772 (acceptance rate: 13.8%). [Citations]
Lei Guo, Songqing Chen, Shansi Ren, Xin Chen, and Song Jiang, “PROP: a Scalable and Reliable P2P Assisted Proxy Streaming System”, Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 23-26, 2004, pp. 778-786 (acceptance rate: 17.7%). [Slides] [Citations]
Peer-to-peer networks are self-organized systems without a central point for global peer information. Flooding is the most commonly used method for content search and message propagation. We have proposed LightFlood, a lightweight broadcast scheme that minimizes redundant traffic at the overlay level without compromising the coverage of reached peers. We have also proposed CAP-SPIRP, a fast and low-cost P2P search algorithm that exploits content localities in peer communities.
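The redundancy problem that motivates this work can be seen in a small simulation of plain flooding, where each peer forwards a query to all neighbors except the one it came from. The sketch below implements only this baseline, not LightFlood. The five-peer overlay and the function name are made up for illustration.

```python
from collections import deque

def flood(graph, source, ttl):
    """Simulate plain flooding; return (peers reached, messages sent, redundant messages)."""
    seen = {source}
    queue = deque([(source, None, ttl)])
    messages = redundant = 0
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue                 # TTL exhausted, do not forward further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue             # never echo a query back to where it came from
            messages += 1
            if neighbor in seen:
                redundant += 1       # the neighbor already has this query
            else:
                seen.add(neighbor)
                queue.append((neighbor, node, hops_left - 1))
    return len(seen), messages, redundant

# Hypothetical overlay: adjacency lists of an undirected graph with five peers.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

reached, messages, redundant = flood(overlay, "A", ttl=3)
print(f"reached {reached} peers with {messages} messages, {redundant} of them redundant")
```

Even in this tiny overlay, half of the messages deliver a query to a peer that already has it, and the share grows as the overlay becomes more densely connected.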
Song Jiang, Lei Guo, Xiaodong Zhang, and Haodong Wang, “LightFlood: Minimizing Redundant Messages and Maximizing Scope of Peer-to-Peer Search”, IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 19, No. 5, 2008, pp. 601-614. [Citations]
Lei Guo, Song Jiang, Li Xiao, and Xiaodong Zhang, “Fast and Low Cost Search Schemes by Exploiting Localities in P2P Networks”, Journal of Parallel and Distributed Computing (JPDC), Vol. 65, Issue 6, 2005, pp. 729-742. [Citations]
Lei Guo, Song Jiang, Li Xiao, and Xiaodong Zhang, “Exploiting Content Localities for Efficient Search in P2P Systems”, Proceedings of 18th International Symposium on Distributed Computing (DISC 2004), Amsterdam, Netherlands, October 4-8, 2004, pp. 349-364 (acceptance rate: 21.8%). [Slides] [Citations]
Song Jiang, Lei Guo, and Xiaodong Zhang, “LightFlood: an Efficient Flooding Scheme for File Search in Unstructured Peer-to-Peer Systems”, Proceedings of 32nd International Conference on Parallel Processing (ICPP 2003), Kaohsiung, Taiwan, China, October 6-9, 2003, pp. 627-635 (acceptance rate: 35.9%). [Slides] [Citations]