Research

Artificial Intelligence

Fairness in Machine Learning

[Paper] "FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification," (ECCV'22) [GitHub]

Existing pruning techniques preserve deep neural networks' overall ability to make correct predictions but may also amplify hidden biases during the compression process. We propose a novel pruning method, Fairness-aware GRAdient Pruning mEthod (FairGRAPE), that minimizes the disproportionate impacts of pruning on different sub-groups. Our method calculates the per-group importance of each model weight and selects a subset of weights that maintain the relative between-group total importance in pruning.
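The selection idea described above can be illustrated with a small sketch. This is not the FairGRAPE implementation: it assumes a precomputed, nonnegative importance matrix with one row per group and one column per weight, and it uses a simple greedy rule (a hypothetical stand-in) that always serves the group whose share of retained importance is furthest below its share before pruning. The function name `select_weights_group_aware` is illustrative only.

```python
import numpy as np

def select_weights_group_aware(importance, keep):
    """Greedy sketch: keep `keep` weights while tracking per-group importance shares.

    importance: array-like of shape (n_groups, n_weights), nonnegative, with a
                positive total (assumed to be computed elsewhere).
    keep:       number of weights to retain.
    Returns the sorted indices of the retained weights.
    """
    importance = np.asarray(importance, dtype=float)
    n_groups, n_weights = importance.shape
    if not 0 <= keep <= n_weights:
        raise ValueError("keep must be between 0 and the number of weights")

    # Share of total importance held by each group before pruning.
    target_share = importance.sum(axis=1) / importance.sum()

    kept_total = np.zeros(n_groups)          # importance retained so far, per group
    available = np.ones(n_weights, dtype=bool)
    selected = []

    for _ in range(keep):
        total = kept_total.sum()
        current_share = kept_total / total if total > 0 else np.zeros(n_groups)
        # Group that is most under-represented relative to its target share.
        g = int(np.argmax(target_share - current_share))
        # Among unselected weights, take the one most important to that group.
        scores = np.where(available, importance[g], -np.inf)
        w = int(np.argmax(scores))
        selected.append(w)
        available[w] = False
        kept_total += importance[:, w]

    return sorted(selected)

# Two groups, four weights: weights 0 and 1 matter only to group 0,
# weights 2 and 3 matter only to group 1.
toy_importance = [[5, 4, 0, 0],
                  [0, 0, 3, 2]]
print(select_weights_group_aware(toy_importance, keep=2))  # [0, 2]
```

In this toy case, ranking weights by total importance alone would keep weights 0 and 1 and discard everything that matters to the second group, whereas the group-aware rule keeps one weight for each group.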

[Tutorial] "Fairness in Computer Vision: Datasets, Algorithms, and Implications," (FAccT'22) [Video]

This tutorial reviews the emerging literature on fairness in computer vision so that the FAccT community, and the AI and CS communities more broadly, can stay informed of the latest technical developments as well as the challenging research questions on the topic.

Explainable Artificial Intelligence

[Paper] "Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention," (CVPR'22) [GitHub] [Video]

We propose a general framework, Latent Visual Semantic Explainer (LaViSE), to teach any existing convolutional neural network to generate text descriptions of its own latent representations at the filter level. Our method constructs a mapping between the visual and semantic spaces from generic image datasets, using images and their category names. It then transfers the mapping to a target domain that has no semantic labels. The proposed framework is modular and can analyze any trained network, whether or not its original training data is available.

Applied Data Science

AI for Influencer Marketing

Brands have recently paid close attention to social influencer marketing, which relies on 'influencers': individuals who may be experts, and thus convincing, and who have a large number of followers. In this research project, we study social influencer marketing on Instagram. The ultimate goal of this project is to find qualified influencers for brands by evaluating various aspects of influencers, including their topics, effectiveness, credibility, and social relationships.

[Paper] "InfluencerRank: Discovering Effective Influencers via Graph Convolutional Attentive Recurrent Neural Networks," (ICWSM'23)

Hiring effective influencers is crucial in social influencer marketing, but it is challenging to find the right influencers among hundreds of millions of social media users. In this paper, we propose InfluencerRank, which ranks influencers by their effectiveness based on their posting behaviors and social relations over time. To represent the posting behaviors and social relations, graph convolutional neural networks are applied to model influencers with heterogeneous networks during different historical periods. By learning the network structure with the embedded node features, InfluencerRank can derive informative representations for influencers at each period.

[Paper] "Evaluating audience loyalty and authenticity in influencer marketing via multi-task multi-relational learning," (ICWSM'21)

Since influencer marketing has become an essential marketing method, influencer fraud, such as buying fake followers and engagements to manipulate popularity, has come under the spotlight. To address this issue, we propose a multi-task audience evaluation model that can assess both the loyalty and authenticity of influencers’ audiences. More specifically, the proposed model takes engagement information of an influencer’s audience, including likes and comments on social media posts, and predicts (i) the retention rate of the influencer's audience and (ii) how the influencer is associated with fake audiences (or engagement bots).

Influencer A has a loyal audience that engages consistently, whereas Influencer B is connected to an inauthentic audience (engagement bots) that generates fake engagements.

[Paper] "Discovering Undisclosed Paid Partnership on Social Media via Aspect-Attentive Sponsored Post Learning," (WSDM'21) [Dataset]

The lack of transparency in sponsorship disclosure in social media advertising posts has become a significant problem in influencer marketing. We propose a learning-to-rank based model, Sponsored Post Detector (SPoD), to detect sponsorship of social media posts by learning various aspects of the posts such as text, image, and the social relationship among influencers and brands. We apply an attention mechanism over different aspects of the posts to emphasize the features that matter most for discovering undisclosed sponsorship. We further optimize the model by conducting manifold regularization based on temporal information and mentioned brands in posts.

An example of paid media that fails to disclose sponsorship.

[Paper] "Detecting Engagement Bots on Social Influencer Marketing," (SocInfo'20)

We analyze a social network of influencers and their audiences to identify bots that generate fake engagements with influencers. Based on an analysis of 65M engagements (e.g., likes and comments) from 9.2M users, we find that bots tend to have low local clustering coefficients and write short comments that are similar to each other. We further propose a neural network-based model that learns text, behavior, and graph representations of social media users to detect engagement bots among the audiences of influencers.
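As a reminder of the graph feature mentioned above, the local clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves connected. The sketch below uses the standard definition on an undirected graph stored as a dictionary of neighbor sets; how the study actually built its engagement graph is not described here, so the graph representation and the toy accounts are assumptions.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    adj: dict mapping each node to the set of its neighbors (assumed symmetric).
    Returns 0.0 for nodes with fewer than two neighbors.
    """
    neighbors = adj.get(node, set()) - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count edges that exist between pairs of neighbors.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj.get(u, set()))
    return 2.0 * links / (k * (k - 1))

# Toy graph: a, b, c form a triangle; d is attached only to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(local_clustering(toy_graph, "b"))  # 1.0
print(local_clustering(toy_graph, "d"))  # 0.0
```

An account whose neighbors rarely know each other gets a low value, which is the pattern the analysis reports for bots.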

[Paper] "Multimodal Post Attentive Profiling for Influencer Marketing," (WWW'20) [Dataset]

We propose a multimodal deep learning model that uses text and image information from social media posts (i) to classify influencers into specific interests/topics (e.g., fashion, beauty) and (ii) to classify their posts into certain categories. We use an attention mechanism to select the posts that are more relevant to influencers’ topics, thereby generating useful representations of influencers. We conduct experiments on data from Instagram, the most popular social media platform for influencer marketing. The experimental results show that our proposed model achieves 98% and 96% accuracy in classifying influencers and their posts, respectively. Our model significantly outperforms existing user profiling methods.
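The role of attention in building an influencer representation can be sketched generically. The code below is not the model from the paper: it assumes each post is already encoded as a fixed-size vector and uses plain dot-product attention against a hypothetical query vector, so that posts scoring higher contribute more to the pooled representation.

```python
import numpy as np

def attention_pool(post_embeddings, query):
    """Softmax attention pooling over post vectors (generic sketch).

    post_embeddings: array-like of shape (n_posts, dim).
    query:           array-like of shape (dim,), an assumed relevance vector
                     (learned in a real model).
    Returns (weights, pooled), where weights sum to 1 and pooled has shape (dim,).
    """
    posts = np.asarray(post_embeddings, dtype=float)
    q = np.asarray(query, dtype=float)
    scores = posts @ q
    scores = scores - scores.max()           # subtract max for numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()
    pooled = weights @ posts                  # weighted sum of post vectors
    return weights, pooled

# Three toy post vectors; the query favors the first dimension.
weights, pooled = attention_pool([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]], [1.0, 0.0])
print(weights)
print(pooled)
```

Here the first and third posts receive equal weights that are larger than the weight of the second post, so the pooled vector leans toward the first dimension.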

[Paper] "How do influencers mention brands in social media? sponsorship prediction of Instagram posts," (ASONAM'19)

We study the brand-mentioning practices of influencers. We find that (i) most influencers mention only a few brands in their posts; (ii) popular influencers tend to mention only popular brands, while micro-influencers show no preference regarding brand popularity; (iii) audiences have highly similar reactions to sponsored and non-sponsored posts; and (iv) compared to non-sponsored posts, sponsored brand-mentioning posts favor fewer usertags and more hashtags with longer captions to exclusively promote the specific products.

[Paper] "How Are Social Influencers Connected in Instagram?," Social Informatics (SocInfo'17)

We analyze social relationships and interactions among influencers on Instagram. We find that influencers tend to have a large number of followers who are potential customers of brands, make reciprocal relationships with other influencers, and share common followers with other influencers. We also reveal that influencers who are connected to each other tend to share common followers.

AI for Mental Health

[Paper] "D-Vlog: Multimodal Vlog Dataset for Depression Detection," The 36th AAAI Conference on Artificial Intelligence (AAAI'22) [Dataset]

We present a multimodal depression dataset, D-Vlog, which consists of 961 vlogs (around 160 hours) collected from YouTube. The dataset can be used to develop depression detection models based on the non-verbal behavior of individuals in real-world scenarios. We also develop a multimodal deep learning model that uses acoustic and visual features extracted from the collected data to detect depression.

Computer Networking Research (BitTorrent, DTN, etc.)

Social Network Routing Protocol in Delay Tolerant Networks (2015~2016)

Social-based routing has emerged as one of the most efficient routing solutions for Delay Tolerant Networks. In this research, we study a novel social network routing protocol that exploits human friendship information collected from a popular online social network service, Instagram, in order to perform routing. We use both social relationship information and location information by collecting lists of followers and location tags from randomly selected users who reside around the UCLA campus. Using the data from the online social network service, we generated a new dataset trace that includes real social relation information in addition to users’ mobility patterns.

[Paper] "Socio-Geo: Social Network Routing Protocol in Delay Tolerant Networks," ICNC 2017

[Paper] "Hitchhiker: A wireless routing protocol in a delay tolerant network using density-based clustering," VTC-Fall 2018

Designing the Future Network Architectures and Protocols (2011~2014)

From 2011, I worked for three years as a research engineer at the KAIST Institute for Information Technology Convergence (KIITC). I participated in several research projects aimed at researching and developing next-generation networks. My central role in the projects was to design network architectures and protocols and to implement system-level simulators. The targeted networks in these projects are characterized by

  • A high density of nodes

  • Large amounts of data

  • A large number of personalized devices

Content Publishing and Downloading Practice (2011~2012)

BitTorrent has been popular over the last decade because of its good performance in terms of throughput and the availability of popular content (torrents) at no cost. However, few studies have made serious efforts to understand who publishes torrents and why, and what strategies publishers adopt.

In this research, I study the current content publishing practice in BitTorrent from a socio-economic point of view, by unraveling

  • How files are published by publishers

  • What strategies are adopted by publishers

  • How effective those strategies are

[Paper] "Content Publishing and Downloading Practice in BitTorrent," IFIP Networking 2012

Content Bundling Practice in BitTorrent (2010~2012)

BitTorrent has attracted the research community to investigate its behavior in terms of throughput, fairness and incentive issues, revealing valuable insights into the performance aspects of BitTorrent. However, most of these studies paid little attention to the internal structures of the torrents, rendering the following research questions under-appreciated by the research community: How are torrents structured by human beings, and for what purposes? Are there any differences in the way people participate in the swarms depending on the structures of the torrents?

I analyzed the collected datasets*, and particularly focused on:

  • How prevalent content bundling is

  • How and what files are bundled into torrents

  • What motivates publishers to bundle files

  • How users access the bundled files

* Datasets: 120 K torrents, 14.8 M peers, 77 days

(http://mmlab.snu.ac.kr/traces/bundling/)

[Paper] "Strategic Bundling for Content Availability and Fast Distribution in BitTorrent," COMCOM 2014

[Paper] "Bundling Practice in BitTorrent: What, How, and Why," ACM SIGMETRICS 2012

[Paper] "How Prevalent is Content Bundling in BitTorrent?," ACM SIGMETRICS 2011

[Paper] "An Empirical Study on Content Bundling in BitTorrent Swarming System," arXiv:1008.2574v1

Peer-to-Peer Networks in Mobile Environment (2009~2010)

Mobile peer-to-peer (P2P) traffic is rapidly growing, but present P2P applications are not specifically designed to operate under mobile conditions. To assess the performance of the prevalent file sharing application BitTorrent in a mobile WiMAX network, I carried out empirical traffic measurements of the BitTorrent service in various settings (static, bus, and subway) in commercial WiMAX networks.

I analyzed the collected datasets*, and particularly focused on:

  • How BitTorrent peers perform from the aspect of connectivity, stability, and capability

  • How the BitTorrent protocol behaves depending on user mobility

  • How BitTorrent peers are distributed under mobile conditions

I found that the drawbacks of BitTorrent operation over the mobile Internet are characterized by

  • Lower connection ratio (than wired hosts)

  • Unstable connections among peers

  • Higher control message overhead

* Datasets: (http://crawdad.org/snu/bittorrent)

[Paper] "Unveiling the BitTorrent Performance via the Mobile WiMAX Networks," PAM 2011

[Paper] "Measurement and Analysis of BitTorrent Traffic in Mobile WiMAX," IEEE P2P 2010

[Paper] "A Classification Scheme for the Peer Population of BitTorrent-like Peer-to-Peer Networks," ETS 2011

[Paper] "Damming the Torrent: Adjusting BitTorrent-like Peer-to-Peer Networks to Mobile and Wireless Environments," Advances in Electronics and Telecommunications 2011

Delay Tolerant Networks with Interconnected Nodes (2009)

In this research, I evaluated three routing protocols (Epidemic, Spray & Wait, and Prophet) in Delay Tolerant Networks (DTN) under a special environment: when some supporting infrastructure nodes are Internet-accessible, which protocol is best in terms of delivery probability? The results of this work may help in proposing a new routing protocol for this specific environment. To evaluate performance, I used the ONE simulator and implemented additional code.

I found that the Spray & Wait routing protocol is the most appropriate in the proposed DTN environment because it has the highest delivery probability as well as lower overhead than Prophet.
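For reference, the two metrics compared above are commonly computed from simulator message counts as shown in this sketch. It assumes the usual DTN definitions (delivery probability as delivered over created messages, overhead ratio as extra relays per delivered message); whether the study used exactly these formulas is an assumption, and the counts in the example are illustrative values, not results from the study.

```python
def delivery_probability(created, delivered):
    """Fraction of created messages that reached their destination."""
    if created <= 0:
        raise ValueError("created must be positive")
    return delivered / created

def overhead_ratio(relayed, delivered):
    """Extra message relays spent per delivered message.

    Returns NaN when nothing was delivered, since the ratio is undefined.
    """
    if delivered == 0:
        return float("nan")
    return (relayed - delivered) / delivered

# Illustrative counts only (not measured values).
print(delivery_probability(created=200, delivered=150))  # 0.75
print(overhead_ratio(relayed=600, delivered=150))        # 3.0
```

A protocol is preferable under these metrics when it combines a high delivery probability with a low overhead ratio.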