1. Promote internet freedom: The goal is to investigate potential adversaries' behaviors in privacy systems and IoT devices by leveraging advances in AI, and to propose defenses that protect user privacy. Traffic analysis is a powerful technique that infers online user activities, such as "the user visits google.com" or "the user watches a particular video," from the packet sizes and timing patterns exposed in network traffic. This style of attack has been shown to apply to most secure network systems, including HTTPS and Tor. We apply advanced machine learning models to identify user interactions with devices based on network traffic.
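To make the traffic-analysis threat concrete, the sketch below turns a raw packet trace (timestamps plus signed packet sizes, with positive sizes outgoing and negative sizes incoming) into a small fixed-length feature vector a classifier could consume. The trace format and the particular statistics are illustrative assumptions, not the exact pipeline from our papers.

```python
# A toy trace: (timestamp_seconds, signed_size); + = outgoing, - = incoming.
# Both the format and the chosen statistics are illustrative assumptions.

def trace_features(trace):
    """Map a packet trace to a fixed-length feature vector."""
    times = [t for t, _ in trace]
    sizes = [s for _, s in trace]
    n_out = sum(1 for s in sizes if s > 0)
    n_in = sum(1 for s in sizes if s < 0)
    total = len(sizes) or 1
    duration = times[-1] - times[0] if len(times) > 1 else 0.0
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return [
        n_out,                       # outgoing packet count
        n_in,                        # incoming packet count
        n_in / total,                # fraction of incoming packets
        sum(abs(s) for s in sizes),  # total bytes observed
        duration,                    # trace duration in seconds
        mean_gap,                    # mean inter-packet time
    ]

trace = [(0.00, 512), (0.05, -1500), (0.06, -1500), (0.30, 512), (0.35, -1500)]
feats = trace_features(trace)
```

In practice, richer feature sets (burst patterns, packet ordering) and learned representations typically replace such hand-picked statistics.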

[S&P 2022] DeepCoFFEA adapts the triplet network architecture as a feature extractor to enable full pairwise comparisons at a cost that is linear, rather than quadratic, in the number of flows. Further, by splitting flows into a small number of windows and extracting features for each window, DeepCoFFEA creates multiple semi-independent correlation tests that can be combined to amplify the differences between matched and unmatched pairs of flows, thereby lowering the false positive rate.
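The windowed-voting idea can be sketched as follows: given per-window embeddings for an entry flow and an exit flow, count how many windows look similar and declare the pair correlated only if enough windows agree. The embeddings, similarity threshold, and vote count below are illustrative stand-ins for DeepCoFFEA's learned feature extractor, not its actual code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def correlated(entry_windows, exit_windows, sim_threshold=0.9, min_votes=3):
    """Vote per window; flag the flow pair only if enough windows agree."""
    votes = sum(
        1
        for e, x in zip(entry_windows, exit_windows)
        if cosine(e, x) >= sim_threshold
    )
    return votes >= min_votes

# Illustrative per-window "embeddings" (a real system would learn these).
entry = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
mismatch = [[0.0, 1.0], [1.0, 0.0], [1.0, -1.0], [0.0, 1.0]]
```

Requiring agreement across several semi-independent windows is what drives the false positive rate down: a mismatched pair must look similar in many windows at once to be flagged.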

[PETS 2021] We introduce a novel attack, GANDaLF, that uses GANs in the semi-supervised setting: the generator minimizes the difference between the real and fake trace distributions, while the discriminator is trained both to distinguish real from fake samples and to improve classification over the labeled set by leveraging labeled and unlabeled traces together. Because it requires only a small amount of labeled data, we investigated the applicability of this variant of GANs to WF attacks in the low-data setting. Furthermore, we evaluated GANDaLF on both sites’ index and non-index pages under various experimental scenarios.
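The semi-supervised discriminator objective can be sketched numerically. Following the standard K-real-classes-plus-fake construction (an assumption about the setup, not GANDaLF's exact implementation), the discriminator emits K logits and D(x) = Z / (Z + 1) with Z the sum of exponentiated logits; labeled traces contribute a cross-entropy term, unlabeled real traces a -log D(x) term, and generated traces a -log(1 - D(G(z))) term.

```python
import math

def log_sum_exp(logits):
    """Numerically stable log of the sum of exponentiated logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits))

def softplus(z):
    # numerically stable log(1 + e^z)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def supervised_loss(logits, label):
    # cross-entropy over the K real classes, for a labeled real trace
    return log_sum_exp(logits) - logits[label]

def unsup_real_loss(logits):
    # -log D(x): an unlabeled real trace should land in *some* real class
    z = log_sum_exp(logits)
    return softplus(z) - z

def unsup_fake_loss(logits):
    # -log(1 - D(G(z))): a generated trace should be pushed toward "fake"
    return softplus(log_sum_exp(logits))
```

The unsupervised terms are what let unlabeled traces shape the decision boundary, which is why only a small labeled set is needed.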

[PETS 2019] We extensively explored the effectiveness of DNNs in three different applications: automated feature engineering, fingerprinting attacks, and fingerprintability prediction. As a feature extractor, the lower-dimensional representations learned by an autoencoder (AE) made state-of-the-art WF attacks both more effective and more efficient. For fingerprinting attacks, DNNs performed well across various traffic datasets and different fingerprinting tasks, as well as against recent WF defenses. Lastly, we showed that several features of a website’s HTML-level design influence its fingerprintability by DNN models, leaving the possibility for future work on WF defenses using HTML features.
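As a toy illustration of the AE-as-feature-extractor idea, the sketch below trains a tied-weight linear autoencoder by gradient descent to compress 2-D inputs onto a 1-D code; the data, dimensions, and hyperparameters are all illustrative, far smaller than any real WF setting.

```python
# Tied-weight linear autoencoder: encode h = w . x, decode x_hat = h * w.

def reconstruction_loss(w, data):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        h = w[0] * x[0] + w[1] * x[1]           # 1-D code
        r = [x[0] - h * w[0], x[1] - h * w[1]]  # reconstruction residual
        total += r[0] ** 2 + r[1] ** 2
    return total / len(data)

def train(w, data, lr=0.05, steps=5000):
    """Plain gradient descent on the tied-weight reconstruction loss."""
    for _ in range(steps):
        g = [0.0, 0.0]
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]
            r = [x[0] - h * w[0], x[1] - h * w[1]]
            rw = r[0] * w[0] + r[1] * w[1]
            for j in range(2):
                # d/dw_j of ||x - (w.x) w||^2 = -2 x_j (r.w) - 2 h r_j
                g[j] += (-2.0 * x[j] * rw - 2.0 * h * r[j]) / len(data)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w

# Synthetic 2-D samples lying on a 1-D subspace.
data = [[0.6 * t, 0.8 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w = train([1.0, 0.0], data)
```

After training, the 1-D code preserves nearly all of the variance, so a downstream classifier can operate on the compressed representation instead of the raw input.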

[PETS 2017] Keyword fingerprinting identifies search engine queries over Tor using new feature sets that focus on incoming packets in the response portion of a search query trace. We performed feature analysis to select appropriate new features for this classification task, and analyzed the effect of several variations on the attack, including the choice of classifier, the size and contents of the monitored set, the size and contents of the background training set, and the search engine and query method. Across these variations, the results show acceptable performance and suggest that new work is needed to understand how to defend against keyword fingerprinting attacks, given the importance of protecting the contents of search engine queries.
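A hypothetical sketch of the response-portion feature idea: isolate the packets after the outgoing query burst and summarize only the incoming (response) packets. The split rule and the statistics below are illustrative assumptions, not the paper's exact feature set.

```python
# trace: signed packet sizes in order; + = outgoing, - = incoming.

def response_features(trace):
    """Summarize incoming packets after the outgoing query burst."""
    # Response portion: everything from the first incoming packet onward
    # (an illustrative split rule).
    first_in = next((i for i, s in enumerate(trace) if s < 0), len(trace))
    incoming = [-s for s in trace[first_in:] if s < 0]
    return {
        "in_count": len(incoming),           # number of response packets
        "in_bytes": sum(incoming),           # total response volume
        "in_max": max(incoming, default=0),  # largest response packet
    }

query_trace = [600, 600, -1500, -1500, 600, -300]
feats = response_features(query_trace)
```

The intuition is that the response volume and packet-size profile vary with the keyword typed, while the outgoing query portion looks much the same for every query.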

2. Build privacy-enhancing AI: AI is increasingly provided "as a service," which exposes it to a variety of attacks. Our lab aims to thoroughly investigate these attacks and to design models that are robust against them. The best-known attack crafts an intentionally perturbed input (x') that fools the classifier into misclassifying it as some target label; such inputs are known as adversarial examples. In particular, we are interested in adversarial examples in channels beyond images, such as voice. In the short term, our goal is to generate effective adversarial audio examples "over the air"; in the long term, we pursue a universal adversarial audio patch (e.g., background music) that fools voice recognition systems.
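The adversarial-example idea can be illustrated on a toy 2-feature logistic classifier using the fast gradient sign method (FGSM): perturb the input by epsilon in the direction of the sign of the loss gradient so that a confidently classified point flips label. The weights and epsilon here are illustrative; our audio work involves far larger models and acoustic constraints.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """P(label = 1 | x) under a 2-feature logistic model."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def sign(g):
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x' = x + eps * sign(grad_x loss)."""
    # For the logistic loss, the gradient w.r.t. the input is (p - y) * w.
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

w, b = [2.0, -1.0], 0.0  # illustrative model weights
x, y = [1.0, 0.5], 1     # clean input, true label 1
x_adv = fgsm(w, b, x, y, eps=1.0)
```

The perturbation is bounded by eps in each coordinate, yet it is enough to flip the model's decision, which is exactly what makes adversarial examples hard to defend against.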

3. Deepfake: Generative Adversarial Networks (GANs) can be used to impersonate people by generating realistic deepfake images and videos. Recent deepfakes are so convincing that a typical person can no longer distinguish real from fake content. Cybercriminals can use such content maliciously for financial fraud, spreading misinformation, and discrediting celebrities and politicians. In response, researchers have developed deepfake detectors that identify object- or pixel-level artifacts created during generation. Despite this growing literature, the extent to which these techniques can detect content forgeries in the real world remains questionable.
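As a very crude illustration of the pixel-level-artifact intuition (not any published detector), one can score an image by how strongly each pixel deviates from its local neighborhood average; GAN upsampling tends to leave periodic high-frequency residue that inflates such a score. Real detectors learn these cues with CNNs.

```python
def artifact_score(img):
    """Mean |pixel - 4-neighbor average| over interior pixels of a 2-D grid."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            local = (img[i - 1][j] + img[i + 1][j]
                     + img[i][j - 1] + img[i][j + 1]) / 4.0
            total += abs(img[i][j] - local)
            count += 1
    return total / count

smooth = [[5.0] * 4 for _ in range(4)]                         # artifact-free
checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]  # periodic residue
```

Hand-crafted scores like this are easily evaded by post-processing, which is one reason real-world detection remains an open problem.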